The function nmf is an S4 generic that defines the main interface to run NMF
algorithms within the framework defined in package NMF.
It has many methods that facilitate applying, developing and testing NMF
algorithms.
The package vignette vignette('NMF') contains an introduction to the
interface, through a sample data analysis.
nmf(x, rank, method, ...)

## S4 method for signature 'matrix,numeric,NULL'
nmf(x, rank, method, seed = NULL, model = NULL, ...)

## S4 method for signature 'matrix,numeric,list'
nmf(x, rank, method, ..., .parameters = list())

## S4 method for signature 'matrix,numeric,function'
nmf(x, rank, method, seed, model = "NMFstd", ..., name, objective = "euclidean", mixed = FALSE)

## S4 method for signature 'matrix,NMF,ANY'
nmf(x, rank, method, seed, ...)

## S4 method for signature 'matrix,NULL,ANY'
nmf(x, rank, method, seed, ...)

## S4 method for signature 'matrix,matrix,ANY'
nmf(x, rank, method, seed, model = list(), ...)

## S4 method for signature 'formula,ANY,ANY'
nmf(x, rank, method, ..., model = NULL)

## S4 method for signature 'matrix,numeric,NMFStrategy'
nmf(x, rank, method, seed = nmf.getOption("default.seed"), rng = NULL,
    nrun = if (length(rank) > 1) 30 else 1, model = NULL, .options = list(),
    .pbackend = nmf.getOption("pbackend"), .callback = NULL, .tmpdir = getwd(), ...)
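rank: specification of the factorization rank. It is usually a single numeric
value, but other types of values (e.g. a matrix) are handled by specific
methods; see for example method nmf,matrix,matrix,ANY.
If rank is a numeric vector with more than one element, e.g. a range of ranks,
then nmf performs the estimation procedure described in nmfEstimateRank.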
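When rank is a range, several runs per rank value are combined into a consensus matrix that is used to select the rank (see the description of nrun and consensus). A minimal base-R sketch of that computation follows. It assumes, as in Brunet2004, that each sample is assigned to the component with the largest coefficient in its column of the coefficient matrix H; connectivity_matrix and consensus_matrix are hypothetical helpers, not the package's API.

```r
# Connectivity matrix of one fit: entry (i, j) is 1 when samples i and j
# are assigned to the same component, each sample (column of the
# coefficient matrix H) being assigned to its largest coefficient.
connectivity_matrix <- function(H) {
  cl <- apply(H, 2, which.max)
  1 * outer(cl, cl, "==")
}

# Consensus matrix: average of the connectivity matrices over all runs.
consensus_matrix <- function(H_list) {
  Reduce(`+`, lapply(H_list, connectivity_matrix)) / length(H_list)
}

# Two runs on 3 samples with rank 2 that disagree on samples 2 and 3
H1 <- matrix(c(1, 0,  0, 1,  1, 0), nrow = 2)
H2 <- matrix(c(1, 0,  1, 0,  0, 1), nrow = 2)
C <- consensus_matrix(list(H1, H2))
print(C)
```

Entries close to 1 or 0 indicate pairs of samples that are consistently clustered together or apart across runs.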
method: specification of the NMF algorithm. Besides the access key (a character
string) of a registered algorithm, methods exist that handle other types of
values, such as a function or list object. See their descriptions in section Methods.
If method is missing, the algorithm to use is obtained from the option
nmf.getOption('default.algorithm'), unless it can be inferred from the type of
NMF model to fit, if the latter is available from other arguments.
The factory-fresh default value is 'brunet', which corresponds to the standard NMF
algorithm from Brunet2004 (see section Algorithms).
Cases where the algorithm is inferred from the call are when an NMF model is
passed in arguments rank or seed (see description for nmf,matrix,numeric,NULL
in section Methods).
...: arguments that are not used by the internal chain of nmf methods are passed
to the function that effectively implements the algorithm that fits an NMF model on x.
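.parameters: list of method-specific parameters [only used when method is a list].
Its elements must have names matching a single method listed in method, and be
lists of named values that are passed to the corresponding method.
name: name associated with the NMF algorithm implemented by the function method
[only used when method is a function].
objective: specification of the objective function associated with the algorithm
implemented by the function method [only used when method is a function].
It may be either 'euclidean' or 'KL', for specifying the Euclidean distance
(Frobenius norm) or the Kullback-Leibler divergence respectively, or a function
with signature (x="NMF", y="matrix", ...) that computes the objective value for
an NMF model x on a target matrix y, i.e. the residuals between the target matrix
and its NMF estimate.
Any extra argument may be specified, e.g. function(x, y, alpha, beta=2, ...).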
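A minimal sketch of the two built-in choices, written on plain basis and coefficient matrices W and H rather than on an NMF object (euclidean_objective and kl_objective are hypothetical names). The KL form is the generalized Kullback-Leibler divergence commonly used for NMF; whether the package scales either quantity by a constant factor is not assumed here.

```r
# Squared Frobenius distance between the target y and its estimate W %*% H.
# Conventions differ by a constant factor (e.g. 1/2); none is assumed here.
euclidean_objective <- function(W, H, y) {
  sum((y - W %*% H)^2)
}

# Generalized Kullback-Leibler divergence D(y || WH), with the usual
# convention that entries where y is 0 contribute only (WH)_ij.
kl_objective <- function(W, H, y) {
  wh <- W %*% H
  sum(ifelse(y > 0, y * log(y / wh), 0) - y + wh)
}

W <- matrix(1, 1, 1)
H <- matrix(1, 1, 1)
y <- matrix(2, 1, 1)
euclidean_objective(W, H, y)
kl_objective(W, H, y)
```

To pass such a function as objective, it would need the documented signature (x="NMF", y="matrix", ...), e.g. function(x, y, ...) kl_objective(basis(x), coef(x), y), where basis() and coef() are the package's accessors for the two factor matrices.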
mixed: a logical that indicates whether the algorithm implemented by the function
method supports mixed-sign target matrices, i.e. matrices that may contain
negative values [only used when method is a function].
seed: specification of the starting point, or of the seeding method that computes
it. It may be given as:
a character string: giving the name of a registered seeding method. The
corresponding method will be called to compute the starting point.
Available methods can be listed via nmfSeed().
See its dedicated documentation (nmfSeed) for details on each registered method.
a list: giving the name of a registered seeding method and, optionally, extra
parameters to pass to it.
a numeric value: used to seed the random number generator, before generating a
random starting point.
Note that when performing multiple runs, L'Ecuyer's RNG is used to produce a
sequence of random streams, in a way that ensures that parallel computations
are fully reproducible.
an object that inherits from NMF-class: it should contain the data of an
initialised NMF model, i.e. it must contain valid basis and mixture coefficient
matrices, directly usable by the algorithm's workhorse function.
a function: that computes the starting point. It must have signature
(object="NMF", target="matrix", ...) and return an object that inherits from
class NMF.
It is recommended to use argument object as a template for the returned object,
by only updating the basis and coefficient matrices, using basis<- and coef<-
respectively.
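The reproducibility guarantee for numeric seeds can be illustrated with base R's parallel package, whose nextRNGStream() derives successive L'Ecuyer-CMRG streams from a seed. In the sketch below (make_streams and random_start are hypothetical helpers; the package's internal code is not reproduced), each run gets its own stream, so the random start of a run does not depend on the order in which runs are executed.

```r
library(parallel)  # nextRNGStream() ships with base R's parallel package

# Derive one L'Ecuyer-CMRG stream per run from a single numeric seed.
# Each run gets its own stream, so its random start does not depend on
# which worker runs it or in which order the runs complete.
make_streams <- function(seed, nrun) {
  old_kind <- RNGkind()[1]
  on.exit(RNGkind(old_kind))  # switch back to the caller's generator kind
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", nrun)
  for (i in seq_len(nrun)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)
  }
  streams
}

# Random starting matrix for one run, drawn from that run's stream
random_start <- function(stream, nr, nc) {
  assign(".Random.seed", stream, envir = globalenv())
  matrix(runif(nr * nc), nr, nc)
}

streams <- make_streams(123, 3)
forward <- lapply(1:3, function(i) random_start(streams[[i]], 2, 2))
backward <- lapply(3:1, function(i) random_start(streams[[i]], 2, 2))
```

Running the three runs in reverse order yields exactly the same starting matrices, which is the property that makes parallel execution reproducible.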
model: specification of the type of NMF model to use. It determines the object
inheriting from class NMF-class that will be passed to the seeding method.
The following values are supported:
NULL: the default model associated to the NMF algorithm is instantiated, and ...
is looked up for arguments with names that correspond to slots in the model
class, which are passed to the function nmfModel to instantiate the model.
Arguments in ... that do not correspond to slots are passed to the algorithm.
a character string: the name of the NMF model class to instantiate.
In this case, arguments in ... are handled in the same way as when model is NULL.
a list: named values that are passed to the function nmfModel to instantiate
the model.
In this case, ... is not looked up at all, and is passed entirely to the algorithm.
This means that all necessary model parameters must be specified in model.
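A hypothetical helper, split_dots, sketches these rules in base R. It is not the package's code, and theta and maxIter are only example argument names. The second call also illustrates the conflict rule stated next: theta appears both in model and in ..., so the model receives one value and the algorithm the other.

```r
# Hypothetical helper mirroring the rules above: decide which arguments in
# '...' configure the model and which are forwarded to the algorithm.
# 'slot_names' stands for the slot names of the model class.
split_dots <- function(model, dots, slot_names) {
  if (is.list(model)) {
    # list: the model is configured by 'model' alone; '...' goes to the algorithm
    return(list(model_args = model, algo_args = dots))
  }
  # NULL or a class name: '...' entries named after a slot configure the model
  is_slot <- names(dots) %in% slot_names
  list(model_args = dots[is_slot], algo_args = dots[!is_slot])
}

dots <- list(theta = 0.5, maxIter = 100)
a <- split_dots(NULL, dots, slot_names = "theta")
b <- split_dots(list(theta = 0.2), dots, slot_names = "theta")
str(a)
str(b)
```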
Argument/slot conflicts: if a parameter of the algorithm has the same name as a
slot of the model class, then model MUST be a list, possibly empty, for this
parameter to be effectively passed to the algorithm.
If a variable appears in both arguments model and ..., the former will be used
to initialise the NMF model, the latter will be passed to the NMF algorithm.
See code examples for an illustration of this situation.
nrun: number of runs to perform. It defaults to 1, unless rank is a numeric vector
with more than one element, in which case a default of 30 runs per value of the
rank are performed, allowing the computation of a consensus matrix that is used
in selecting the appropriate rank (see consensus).
When using a random seeding method, multiple runs are generally required to
achieve stability and avoid bad local minima.
.options: a list containing named options with their values, or, when only
boolean/integer options need to be set, a character string that specifies which
options are turned on/off or their values, in the style of unix command-line
arguments.
The string must be composed of characters that correspond to a given option
(see mapping below), and of the modifiers '+' and '-', which toggle options on
and off respectively.
E.g. .options='tv' will toggle on options track and verbose, while
.options='t-v' will toggle on option track and toggle off option verbose.
Modifiers '+' and '-' apply to all option characters found after them:
t-vp+k means track=TRUE, verbose=parallel=FALSE, and keep.all=TRUE.
The default behaviour is to assume that .options starts with a '+'.
For options that accept integer values, the value may be appended to the
option's character, e.g. 'p4' for asking for 4 processors or 'v3' for showing
verbosity messages up to level 3.
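A hypothetical base-R parser, parse_options_string, makes these encoding rules concrete. It is not the package's own parser, and it only maps the letters for track, verbose, parallel, parallel.required and keep.all that appear in this section.

```r
# Hypothetical parser for the string form of '.options' described above.
# Only the option letters named in this section are mapped.
parse_options_string <- function(s,
    letter_map = c(t = "track", v = "verbose", p = "parallel",
                   P = "parallel.required", k = "keep.all")) {
  chars <- strsplit(s, "")[[1]]
  value <- TRUE   # the string is assumed to start with '+'
  opts <- list()
  i <- 1L
  while (i <= length(chars)) {
    ch <- chars[i]
    if (ch == "+") {
      value <- TRUE
    } else if (ch == "-") {
      value <- FALSE
    } else if (ch %in% names(letter_map)) {
      # Digits right after the letter give an integer value, e.g. 'p4'
      j <- i
      while (j < length(chars) && grepl("[0-9]", chars[j + 1L])) j <- j + 1L
      opts[[letter_map[[ch]]]] <-
        if (j > i) as.integer(paste(chars[(i + 1L):j], collapse = "")) else value
      i <- j
    } else {
      stop("unknown option character: ", ch)
    }
    i <- i + 1L
  }
  opts
}

str(parse_options_string("t-vp+k"))
```

The example string decodes to track=TRUE, verbose=FALSE, parallel=FALSE and keep.all=TRUE, matching the mapping given above.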
The following options are available (the characters after - are those to use
to encode .options as a string):
debug - d: toggle debug mode (default: FALSE).
Like option verbose, but with more information displayed.
keep.all - k: used when performing multiple runs (nrun > 1): if TRUE, all
factorizations are saved and returned (default: FALSE).
Otherwise only the factorization achieving the minimum residuals is returned.
parallel - p: used when performing multiple runs (nrun > 1) (default: TRUE).
If TRUE, the runs are performed using the parallel foreach backend defined in
argument .pbackend.
If .pbackend is set to 'mc' or 'par', then nmf tries to perform the runs using
multiple cores with package doParallel, which therefore needs to be installed.
If equal to an integer, then nmf tries to perform the computation on the
specified number of processors.
When passing options as a string, the number is appended to the option's
character, e.g. 'p4' for asking for 4 processors.
If FALSE, then the computation is performed sequentially using the base
function sapply.
Unlike option 'P' (capital 'P'), if the computation cannot be performed in
parallel, then it will still be carried on sequentially.
IMPORTANT NOTE FOR MAC OS X USERS: the parallel computation is based on the doMC
and multicore packages, so the same care should be taken as stated in the
vignette of doMC: it is not safe to use doMC from R.app on Mac OS X. Instead,
you should use doMC from a terminal session, starting R from the command line.
parallel.required - P: same as p, but an error is thrown if the computation
cannot be performed in parallel or with the specified number of processors.
shared.memory - m: toggle the usage of shared memory. The default is as defined
by nmf.getOption('shared.memory').
TRUE
track - t: if TRUE, the returned object's slot residuals contains the trajectory
of the objective values, which can be retrieved via residuals(res, track=TRUE).
This tracking functionality is available for all built-in algorithms.
verbose - v: toggle verbosity (default: FALSE).
If TRUE, messages about the configuration and the state of the current run(s)
are displayed.
The level of verbosity may be specified with an integer value: the greater the
level, the more messages are displayed.
Value FALSE means no messages are displayed, while value TRUE is equivalent to
verbosity level 1.
.pbackend: specification of the foreach parallel backend to register and/or use
when running in parallel mode.
See options p and P in argument .options for how to enable this mode.
Note that any backend that is internally registered is cleaned up on exit, so
that the calling foreach environment should not be affected by a call to nmf,
except when .pbackend=NULL.
Currently it accepts the following values:
'par': use the backend(s) defined by the package doParallel;
a numeric value: use the specified number of cores with the doParallel backend;
'seq': use the foreach sequential backend doSEQ;
NULL: use the currently registered backend;
NA: do not compute using a foreach loop, and therefore not in parallel, but
rather use a call to the standard sapply.
This is useful when developing/debugging NMF algorithms, as foreach loop
handling may sometimes get in the way.
Note that this is equivalent to using .options='-p' or .options='p0', but takes
precedence over any option specified in .options: e.g.
nmf(..., .options='P10', .pbackend=NA) performs all runs sequentially using sapply.
Use nmf.options(pbackend=NA) to completely disable foreach/parallel computations
for all subsequent nmf calls.
.callback: used when option keep.all=FALSE (default). It allows passing a
callback function that is called after each run when performing multiple runs
(i.e. with nrun > 1).
This is useful, for example, if one is also interested in saving summary
measures or processing the result of each NMF fit before it gets discarded.
After each run, the callback function is called with two arguments, the
NMFfit-class object that has just been fitted and the run number: .callback(res, i).
For convenience, a function that takes only one argument or has signature
(x, ...) can still be passed in .callback.
It is wrapped internally into a dummy function with two arguments, only the
first of which is passed to the actual callback function (see example with summary).
The call is wrapped into a tryCatch so that callback errors do not stop the
whole computation (see below).
The results of the different calls to the callback function are stored in a
miscellaneous slot accessible using the method $ for NMFfit objects: res$.callback.
By default nmf tries to simplify the list of callback results using sapply,
unless option 'simplifyCB' is FALSE.
If no error occurs, res$.callback contains the list of values that resulted
from calling the callback function, ordered as the fits.
If any error occurs in one of the callback calls, then the whole computation is
not stopped, but the error message is stored in res$.callback, in place of the result.
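The wrapping, error capture and simplification steps just described can be sketched in base R as follows. wrap_callback and collect_callbacks are hypothetical helpers that operate on stand-in values rather than NMFfit objects, and the storage in res$.callback is not reproduced.

```r
# Hypothetical sketch of the callback handling described above.
# One-argument callbacks (or signature (x, ...)) are wrapped so that they
# can always be called as cb(res, i); only 'res' is forwarded to them.
wrap_callback <- function(cb) {
  args <- names(formals(cb))
  if (length(args) == 1L || identical(args, c("x", "..."))) {
    function(res, i) cb(res)
  } else {
    cb
  }
}

# Call the callback after each "fit"; errors are caught and their message
# is stored in place of the result, so the loop is never interrupted.
collect_callbacks <- function(fits, cb, simplify = TRUE) {
  cb2 <- wrap_callback(cb)
  out <- lapply(seq_along(fits), function(i) {
    tryCatch(cb2(fits[[i]], i), error = function(e) conditionMessage(e))
  })
  if (simplify) sapply(out, identity) else out
}

fits <- list(1, 2, 3)  # stand-ins for fitted NMF models
collect_callbacks(fits, function(x) x * 10)
```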
See the examples for sample code.
'doSEQ').
The returned value depends on the run mode:
Single run: an object of class NMFfit-class.
Multiple runs, single method: when nrun > 1 and method is not a list, this
method returns an object of class NMFfitX-class.
Multiple runs, multiple methods: when nrun > 1 and method is a list, this
method returns an object of class NMFList-class.
The nmf function has multiple methods that together compose a very flexible
interface.
The workhorse method is nmf,matrix,numeric,NMFStrategy, which is eventually
called by all other methods.
The other methods provide convenient ways of specifying the NMF algorithm(s),
the factorization rank, or the seed to be used.
Some allow NMF algorithms to be run directly on different types of objects,
such as data.frame or ExpressionSet objects.
signature(x = "data.frame", rank = "ANY", method = "ANY"): Fits an NMF model on
a data.frame.
The target data.frame is coerced into a matrix with as.matrix.
signature(x = "matrix", rank = "numeric", method = "NULL"): Fits an NMF model
using an appropriate algorithm when method is not supplied.
This method tries to select an appropriate algorithm amongst the NMF algorithms
stored in the internal algorithm registry, which records the type of NMF model
each algorithm can fit.
This is possible when the type of NMF model to fit is available from argument
seed, i.e. if it is an NMF model itself.
Otherwise the algorithm to use is obtained from nmf.getOption('default.algorithm').
This method is provided for internal usage, when called from other nmf methods
with argument method missing in the top call (e.g. nmf,matrix,numeric,missing).
signature(x = "matrix", rank = "numeric", method = "list"): Fits multiple NMF
models on a common matrix using a list of algorithms.
The models are fitted sequentially with nmf using the same options and
parameters for all algorithms.
In particular, irrespective of the way the computation is seeded, this method
ensures that all fits are performed using the same initial RNG settings.
This method returns an object of class NMFList-class, which is essentially a
list containing each fit.
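The guarantee of identical initial RNG settings can be pictured with a short sketch: each algorithm is called right after resetting the seed, so all of them draw the same random starting values. run_with_same_rng and the two stand-in algorithms are hypothetical; the package's own mechanism is not shown.

```r
# Sketch: give every algorithm the same initial RNG state by resetting the
# seed before each fit, so random starting points are identical across methods.
run_with_same_rng <- function(algorithms, seed, ...) {
  lapply(algorithms, function(algo) {
    set.seed(seed)
    algo(...)
  })
}

# Two stand-in "algorithms" that only draw their random starting values
algs <- list(first = function(n) runif(n), second = function(n) runif(n))
starts <- run_with_same_rng(algs, seed = 42, 3)
str(starts)
```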
signature(x = "matrix", rank = "numeric", method = "character"): Fits an NMF
model on x using an algorithm registered with access key method.
Argument method is partially matched against the access keys of all registered
algorithms (case insensitive).
Available algorithms are listed in section Algorithms below or in the
introduction vignette.
A vector of their names may be retrieved via nmfAlgorithm().
signature(x = "matrix", rank = "numeric", method = "function"): Fits an NMF
model on x using a custom algorithm defined by the function method.
The supplied function must have signature (x=matrix, start=NMF, ...) and return
an object that inherits from class NMF-class.
It will be called internally by the workhorse nmf method, with an NMF model to
be used as a starting point passed in its argument start.
Extra arguments in ... are passed to method from the top nmf call.
Extra arguments that have no default value in the definition of the function
method are required to run the algorithm (e.g. see argument alpha of myfun in
the examples).
If the algorithm requires a specific type of NMF model, this can be specified
in argument model, which is handled as in the workhorse nmf method (see
description for this argument).
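As an illustration of the (x, start, ...) contract, here is a sketch of a custom algorithm built on Lee and Seung's multiplicative updates for the Euclidean distance (Lee2001). The update loop works on plain matrices so that it runs in base R; my_lee is a hypothetical wrapper that relies on the package accessors basis() and coef() and their replacement forms, and it is only defined, not executed, below. It is not the package's built-in 'lee' implementation.

```r
# Multiplicative updates of Lee and Seung (2001) for the Euclidean distance,
# on plain matrices; 'eps' guards against division by zero.
lee_euclidean <- function(x, W, H, maxIter = 200L, eps = 1e-9) {
  for (iter in seq_len(maxIter)) {
    H <- H * crossprod(W, x) / (crossprod(W) %*% H + eps)  # t(W) x / t(W) W H
    W <- W * tcrossprod(x, H) / (W %*% tcrossprod(H) + eps)  # x t(H) / W H t(H)
  }
  list(W = W, H = H)
}

# Possible custom algorithm with the documented signature (x, start, ...).
# basis(), coef() and their replacement forms come from the NMF package;
# this wrapper is only defined here, not run.
my_lee <- function(x, start, maxIter = 200L, ...) {
  fit <- lee_euclidean(x, basis(start), coef(start), maxIter = maxIter)
  basis(start) <- fit$W
  coef(start) <- fit$H
  start
}

set.seed(1)
x <- matrix(runif(60), 10, 6)
W0 <- matrix(runif(20), 10, 2)
H0 <- matrix(runif(12), 2, 6)
fit <- lee_euclidean(x, W0, H0)
```

With the NMF package loaded, such a function could then be passed as method, e.g. nmf(x, 2, my_lee); objective keeps its default 'euclidean', which matches these updates.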
signature(x = "matrix", rank = "NMF", method = "ANY"): Fits an NMF model using
the NMF model rank to seed the computation, i.e. as a starting point.
This method is provided for convenience as a shortcut for
nmf(x, nbasis(rank), method, seed=rank, ...).
It discards any value passed in argument seed and uses the NMF model passed in
rank instead.
It throws a warning if argument seed is not missing.
If method is missing, this method will call the method nmf,matrix,numeric,NULL,
which will infer an algorithm suitable for fitting an NMF model of the class of rank.
signature(x = "matrix", rank = "NULL", method = "ANY"): Fits an NMF model using
the NMF model supplied in seed to seed the computation, i.e. as a starting point.
This method is provided for completeness and is equivalent to
nmf(x, seed, method, ...).
signature(x = "matrix", rank = "missing", method = "ANY"): Method defined to
ensure the correct dispatch to workhorse methods when argument rank is missing.
signature(x = "matrix", rank = "numeric", method = "missing"): Method defined to
ensure the correct dispatch to workhorse methods when argument method is missing.
signature(x = "matrix", rank = "matrix", method = "ANY"): Fits an NMF model,
partially seeding the computation with a given matrix passed in rank.
The matrix rank is used as the initial value either for the basis or for the
mixture coefficient matrix, depending on its dimension.
Currently, such a partial NMF model is directly used as a seed, meaning that the
remaining part is left uninitialised, which is not accepted by all NMF algorithms.
This should change in the future, where the missing part of the model will be
drawn from some random distribution.
Amongst built-in algorithms, only snmf/l and snmf/r support partial seeds, with
only the coefficient or basis matrix initialised respectively.
signature(x = "matrix", rank = "data.frame", method = "ANY"): Shortcut for
nmf(x, as.matrix(rank), method, ...).
signature(x = "formula", rank = "ANY", method = "ANY"): This method implements
the interface for fitting formula-based NMF models. See nmfModel.
Argument rank specifies the target matrix or the formula environment.
If not missing, model must be a list, a data.frame or an environment in which
formula variables are searched for.
Lee and Seung's multiplicative updates are used by several NMF algorithms. To improve speed and memory usage, a C++ implementation of the specific matrix products is used whenever possible. It directly computes the updates for each entry in the updated matrix, instead of using multiple standard matrix multiplications.
The algorithms that benefit from this optimization are: 'brunet', 'lee', 'nsNMF' and 'offset'. However, plain R versions of these methods still exist, which implement the updates as standard matrix products. These are accessible by adding the prefix '.R#' to their name: '.R#brunet', '.R#lee', '.R#nsNMF' and '.R#offset'.
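To make the difference concrete, the sketch below writes the Kullback-Leibler multiplicative update of the coefficient matrix H (Lee2001) in both styles. It is not the package's C++ code: the entry-wise version is slow in interpreted R and only shows that computing each entry directly, without storing the intermediate matrices WH and V/WH, gives the same result as the matrix-product form.

```r
# KL multiplicative update of H written with standard matrix products:
# H_aj <- H_aj * sum_i W_ia V_ij / (WH)_ij / sum_i W_ia
update_H_matrix <- function(V, W, H) {
  H * crossprod(W, V / (W %*% H)) / colSums(W)
}

# Same update computed entry by entry: each (WH)_ij is formed on the fly,
# so neither WH nor V / WH is ever stored as a full matrix.
update_H_entrywise <- function(V, W, H) {
  Hnew <- H
  for (a in seq_len(nrow(H))) {
    for (j in seq_len(ncol(H))) {
      num <- 0
      for (i in seq_len(nrow(V))) {
        wh_ij <- sum(W[i, ] * H[, j])
        num <- num + W[i, a] * V[i, j] / wh_ij
      }
      Hnew[a, j] <- H[a, j] * num / sum(W[, a])
    }
  }
  Hnew
}

set.seed(2)
V <- matrix(runif(20), 5, 4)
W <- matrix(runif(10), 5, 2)
H <- matrix(runif(8), 2, 4)
all.equal(update_H_matrix(V, W, H), update_H_entrywise(V, W, H))
```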
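All algorithms are accessible by their respective access key as listed below. The following algorithms are available:
brunet: standard NMF based on the Kullback-Leibler divergence, from Brunet2004,
using the multiplicative updates from Lee2001.
Default stopping criterion: invariance of the connectivity matrix
(see nmf.stop.connectivity).
lee: standard NMF based on the Euclidean distance, using the multiplicative
updates from Lee2001.
Default stopping criterion: invariance of the connectivity matrix
(see nmf.stop.connectivity).
ls-nmf: least-squares NMF from Wang2006.
Default stopping criterion: stationarity of the objective function
(see nmf.stop.stationary).
nsNMF: nonsmooth NMF from Pascual-Montano2006.
Default stopping criterion: invariance of the connectivity matrix
(see nmf.stop.connectivity).
offset: NMF with offset from Badea2008.
Default stopping criterion: invariance of the connectivity matrix
(see nmf.stop.connectivity).
pe-nmf: pattern-expression NMF.
Default stopping criterion: stationarity of the objective function
(see nmf.stop.stationary).
snmf/r, snmf/l: alternating least-squares approach from Kim2007, which applies
the fast combinatorial nonnegative least-squares algorithm from VanBenthem2004
to estimate the basis and coefficient matrices alternately (see fcnnls).
It minimises a Euclidean-based objective function that is regularized to
favour sparse basis matrices (for snmf/l) or sparse coefficient matrices
(for snmf/r).
Stopping criterion: built into the internal workhorse function nmf_snmf,
based on the KKT optimality conditions.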
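For the "stationarity of the objective function" criterion listed above, here is a minimal sketch of one simple form of the test, comparing the last two recorded objective values. is_stationary is a hypothetical helper; the window, threshold and exact rule used by nmf.stop.stationary may differ.

```r
# One simple form of a stationarity test: stop when the relative change
# of the objective between the last two recorded values is below 'tol'.
is_stationary <- function(objective, tol = 1e-6) {
  n <- length(objective)
  if (n < 2L) return(FALSE)
  prev <- objective[n - 1L]
  abs(prev - objective[n]) / max(abs(prev), .Machine$double.eps) < tol
}

obj_values <- c(10, 4, 3.5, 3.4999999)
is_stationary(obj_values[1:3])  # still decreasing noticeably
is_stationary(obj_values)       # change is now negligible
```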
The purpose of seeding methods is to compute initial values for the factor matrices in a given NMF model. This initial guess will be used as a starting point by the chosen NMF algorithm.
The seeding method to use in combination with the algorithm can be passed to
interface nmf through argument seed.
The seeding methods available in the registry are listed by the function
nmfSeed (see the list therein).
Detailed examples of how to specify the seeding method and its parameters can be found in the Examples section of this man page and in the package's vignette.
Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424.
Lee DD and Seung H (2001). "Algorithms for non-negative matrix factorization." _Advances in neural information processing systems_.
Wang G, Kossenkov AV and Ochs MF (2006). "LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates." _BMC bioinformatics_, *7*, pp. 175. ISSN 1471-2105.
Pascual-Montano A, Carazo JM, Kochi K, Lehmann D and Pascual-Marqui RD (2006). "Nonsmooth nonnegative matrix factorization (nsNMF)." _IEEE Trans. Pattern Anal. Mach. Intell_, *28*, pp. 403-415.
Badea L (2008). "Extracting gene expression profiles common to colon and pancreatic adenocarcinoma using simultaneous nonnegative matrix factorization." _Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing_, *290*, pp. 267-78. ISSN 1793-5091.
Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." _Bioinformatics (Oxford, England)_, *23*(12), pp. 1495-502. ISSN 1460-2059.
Van Benthem M and Keenan MR (2004). "Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems." _Journal of Chemometrics_, *18*(10), pp. 441-450. ISSN 0886-9383.
# Only basic calls are presented in this manpage.
# Many more examples are provided in the demo file nmf.R
## Not run:
##D demo('nmf')
## End(Not run)
# random data
x <- rmatrix(20,10)
# run default algorithm with rank 2
res <- nmf(x, 2)
# specify the algorithm
res <- nmf(x, 2, 'lee')
# get verbose message on what is going on
res <- nmf(x, 2, .options='v')
## NMF algorithm: 'brunet'
## NMF seeding method: random
## Iterations: 0/2000
## DONE (converged at 500/2000 iterations)
## Not run:
##D # more messages
##D res <- nmf(x, 2, .options='v2')
##D # even more
##D res <- nmf(x, 2, .options='v3')
##D # and so on ...
## End(Not run)
See also: nmfAlgorithm