Clustering and Prediction

Description

The predict methods for NMF models return the cluster membership of each sample or each feature. Classification/prediction of new data is not currently implemented.

Usage

predict(object, ...)

S4 (NMF)
`predict`(object, what = c("columns", "rows", "samples", "features"), prob = FALSE, 
  dmatrix = FALSE)

S4 (NMFfitX)
`predict`(object, what = c("columns", "rows", "samples", "features", "consensus", 
      "chc"), dmatrix = FALSE, ...)

Arguments

object
an NMF model
what
a character string that indicates the type of cluster membership to return: ‘columns’ or ‘rows’ for clustering the columns or the rows of the target matrix respectively. The values ‘samples’ and ‘features’ are aliases for ‘columns’ and ‘rows’ respectively.
prob
logical that indicates if the relative contributions of/to the dominant basis component should be computed and returned. See Details.
dmatrix
logical that indicates if a dissimilarity matrix should be attached to the result. This is notably used internally when computing NMF clustering silhouettes.
...
additional arguments affecting the predictions produced.

Details

The cluster membership is computed as the index of the dominant basis component for each sample (what='samples' or 'columns') or each feature (what='features' or 'rows'), based on their corresponding entries in the coefficient matrix or basis matrix respectively.

For example, if what='samples', then the dominant basis component is computed for each column of the coefficient matrix as the row index of the maximum within the column.
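The rule above can be sketched in base R. This is only an illustration, not the package's internal code: the matrices H (coefficients, one column per sample) and W (basis, one row per feature) are small hypothetical examples, and ties are resolved by which.max, which keeps the first maximum.

```r
# Hypothetical coefficient matrix: 3 basis components (rows) x 4 samples (columns).
# matrix() fills column by column, so each line below is one sample.
H <- matrix(c(0.1, 0.7, 0.2,
              0.6, 0.3, 0.1,
              0.2, 0.2, 0.6,
              0.5, 0.4, 0.1), nrow = 3)

# Sample clusters: row index of the maximum within each column of H
sample_cluster <- factor(apply(H, 2, which.max), levels = seq_len(nrow(H)))

# Hypothetical basis matrix: 5 features (rows) x 3 basis components (columns).
# Each line below is one column, i.e. one basis component.
W <- matrix(c(0.90, 0.1, 0.2, 0.3, 0.1,
              0.05, 0.8, 0.3, 0.1, 0.2,
              0.05, 0.1, 0.5, 0.6, 0.7), nrow = 5)

# Feature clusters: column index of the maximum within each row of W
feature_cluster <- factor(apply(W, 1, which.max), levels = seq_len(ncol(W)))

print(sample_cluster)
print(feature_cluster)
```

With these toy matrices, the samples are assigned to components 2, 1, 3, 1 and the features to components 1, 2, 3, 3, 3.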

If argument prob=FALSE (default), the result is a factor. Otherwise a list with two elements is returned: element predict contains the cluster membership index (as a factor) and element prob contains the relative contribution of the dominant component to each sample (resp. the relative contribution of each feature to the dominant basis component):

  • Samples:
    p(j) = x(k0) / sum_k x(k),
    for each sample 1 ≤ j ≤ p (p being the number of samples), where
    x(k) is the contribution of the k-th basis component to the j-th
    sample (i.e. H[k, j]), and x(k0) is the maximum of these contributions.

  • Features:
    p(i) = y(k0) / sum_k y(k),
    for each feature 1 ≤ i ≤ n (n being the number of features), where
    y(k) is the contribution of the k-th basis component to the i-th
    feature (i.e. W[i, k]), and y(k0) is the maximum of these contributions.
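The two ratios can be checked by hand with a short base R sketch. The matrices H and W are hypothetical, and the list built at the end only mimics the shape of the result described for prob=TRUE; it is not produced by the package.

```r
# Hypothetical coefficient matrix: basis components in rows, samples in columns
# (each line below is one sample, since matrix() fills column by column)
H <- matrix(c(1, 3, 1,
              4, 1, 0,
              2, 2, 6), nrow = 3)

# Samples: p(j) = max_k H[k, j] / sum_k H[k, j]
prob_samples <- apply(H, 2, max) / colSums(H)

# Hypothetical basis matrix: features in rows, basis components in columns
# (each line below is one basis component)
W <- matrix(c(2, 1,
              1, 1,
              1, 6), nrow = 2)

# Features: p(i) = max_k W[i, k] / sum_k W[i, k]
prob_features <- apply(W, 1, max) / rowSums(W)

# Same shape as the two-element list described above (illustration only)
res <- list(predict = factor(apply(H, 2, which.max)), prob = prob_samples)
print(res)
print(prob_features)
```

Here the samples get relative contributions 0.6, 0.8 and 0.6, and the two features get 0.5 and 0.75.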

Methods

  1. predict signature(object = "NMF"): Default method for NMF models

  2. predict signature(object = "NMFfitX"): Returns the cluster membership index from an NMF model fitted with multiple runs.

    Besides the types of clustering available for any NMF model ('columns', 'rows', 'samples', 'features'), this method can return the cluster membership index based on the consensus matrix, computed from the multiple NMF runs.

    Argument what accepts the following extra types:

    1. 'chc' returns the cluster membership based on the hierarchical clustering of the consensus matrix, as performed by consensushc.
    2. 'consensus' is the same as 'chc', except that the levels of the membership index are re-labeled to match the order in which the clusters are displayed on the associated dendrogram, i.e. as re-ordered on the default annotation track of the consensus heatmap produced by consensusmap.
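The idea of clustering on a consensus matrix can be illustrated with a base R sketch. Everything here is an assumption made for illustration: the three membership vectors are made up, and the distance (1 - consensus) and average linkage are one plausible choice, not necessarily what consensushc uses internally.

```r
# Hypothetical cluster memberships of 6 samples obtained from 3 separate runs
runs <- list(c(1, 1, 1, 2, 2, 2),
             c(2, 2, 2, 1, 1, 1),
             c(1, 1, 2, 2, 2, 2))

# Connectivity matrix of one run: 1 if two samples share a cluster, 0 otherwise
connectivity <- function(cl) outer(cl, cl, "==") * 1

# Consensus matrix: average of the connectivity matrices over all runs
consensus <- Reduce(`+`, lapply(runs, connectivity)) / length(runs)

# Hierarchical clustering on the dissimilarity 1 - consensus
# (average linkage is an assumption of this sketch)
hc <- hclust(as.dist(1 - consensus), method = "average")

# Cut the tree into 2 clusters and return the membership as a factor
membership <- factor(cutree(hc, k = 2))
print(membership)
```

Although the cluster labels differ between runs, the consensus matrix only records co-membership, so the first three samples end up in one cluster and the last three in the other.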

References

Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424.

Pascual-Montano A, Carazo JM, Kochi K, Lehmann D and Pascual-Marqui RD (2006). "Nonsmooth nonnegative matrix factorization (nsNMF)." _IEEE Trans. Pattern Anal. Mach. Intell._, *28*, pp. 403-415.

Examples



library(NMF)

# random target matrix (no seed is set, so the output below varies between runs)
v <- rmatrix(20, 10)
# fit an NMF model
x <- nmf(v, 5)

# predicted column and row clusters
predict(x)
##  [1] 2 1 3 3 2 2 4 3 5 5
## Levels: 1 2 3 4 5
predict(x, 'rows')
##  [1] 4 3 3 1 2 1 3 2 2 4 5 2 2 5 2 2 2 3 5 1
## Levels: 1 2 3 4 5
# with the relative contribution of the dominant basis component
predict(x, prob=TRUE)
## $predict
##  [1] 2 1 3 3 2 2 4 3 5 5
## Levels: 1 2 3 4 5
## 
## $prob
##  [1] 0.6548911 0.5736570 0.3991033 0.4738792 0.7041045 0.3985352 0.7131128
##  [8] 0.4525372 0.5564594 0.4987524
predict(x, 'rows', prob=TRUE)
## $predict
##  [1] 4 3 3 1 2 1 3 2 2 4 5 2 2 5 2 2 2 3 5 1
## Levels: 1 2 3 4 5
## 
## $prob
##  [1] 0.3316286 0.4483612 0.4585225 0.3756082 0.5058462 0.4518340 0.7303948
##  [8] 0.3822813 0.4583829 0.4653933 0.7779704 0.4404912 0.3925128 0.4034469
## [15] 0.4019369 0.5154141 0.4928875 0.3634886 0.3868426 0.7014758