The function featureScore
implements different methods to compute
basis-specificity scores for each feature in the data.
The function extractFeatures
implements different methods to select the
most basis-specific features of each basis component.
    featureScore(object, ...)

    ## S4 method for signature 'matrix'
    featureScore(object, method = c("kim", "max"))

    extractFeatures(object, ...)

    ## S4 method for signature 'matrix'
    extractFeatures(object, method = c("kim", "max"), format = c("list", "combine", "subset"), nodups = TRUE)
For extractFeatures, the argument method may also be numeric. It can be an integer vector that
indicates the number of top contributing features to
extract from each column of object, ranked by decreasing contribution.
It can also be a numeric value between 0 and 1 that indicates the minimum relative basis
contribution above which a feature is selected (i.e. a basis contribution threshold).
A single numeric value (integer or proportion) is used for all columns.
Note that extractFeatures(x, 1) means a relative contribution threshold of
100%. To select the top contributing feature, one must explicitly specify
an integer value, as in extractFeatures(x, 1L).
However, if all elements of method are > 1, they are automatically treated as
integers: extractFeatures(x, 2) means the top-2 most
contributing features in each component.

The argument format specifies the output format of extractFeatures:
'list' (default): a list with one element per column of object, each containing the indexes of the selected features as an integer vector. If object has row names, these are used to name each index vector. Components for which no feature was selected are assigned an NA value.
'combine': all selected indexes in a single integer vector. Duplicated indexes are removed if nodups=TRUE (default).
'subset': an object of the same class as object, but subset with the selected indexes, so that it contains data only from basis-specific features.

The argument nodups indicates whether duplicated indexes, i.e. features selected on more than one basis component, are removed from the result. It is only used when format='combine'.

featureScore returns a numeric vector whose length is the number
of rows in object (i.e. one score per feature).
extractFeatures returns the selected features as a list of indexes,
a single integer vector, or an object of the same class as object
that only contains the selected features.
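The numeric forms of method and the three output formats can be illustrated with a small Python sketch. It is not the NMF package's R code. The helper names select_numeric and format_selection are hypothetical. Python's int/float split stands in for R's integer/double distinction, so 1 plays the role of 1L and 1.0 the role of 1. The relative contribution of feature i to component q is assumed to be W(i,q) divided by the row sum of W, and None stands in for NA.

```python
from typing import List, Sequence, Union

Number = Union[int, float]
Matrix = List[List[float]]  # basis matrix: features in rows, components in columns


def select_numeric(W: Matrix, method: Union[Number, Sequence[Number]]) -> List[List[int]]:
    """Hypothetical helper: per-component feature indexes for a numeric `method`."""
    k = len(W[0])
    # A single value is recycled for all components.
    values = list(method) if isinstance(method, (list, tuple)) else [method] * k
    # Counts if every value is an int (like R's 1L) or every value is > 1 (like extractFeatures(x, 2)).
    as_counts = all(isinstance(v, int) for v in values) or all(v > 1 for v in values)
    selected: List[List[int]] = []
    for q, v in enumerate(values):
        if as_counts:
            # Top-v features ranked by decreasing contribution to component q.
            order = sorted(range(len(W)), key=lambda i: W[i][q], reverse=True)
            selected.append(order[: int(v)])
        else:
            # Assumption: relative contribution = W[i][q] / sum(W[i]); keep rows at or above v.
            selected.append([i for i, row in enumerate(W)
                             if sum(row) > 0 and row[q] / sum(row) >= v])
    return selected


def format_selection(W: Matrix, selected: List[List[int]], fmt: str = "list",
                     nodups: bool = True):
    """Hypothetical helper mimicking the 'list', 'combine' and 'subset' formats."""
    if fmt == "list":
        # None plays the role of NA for components with no selected feature.
        return [idx if idx else None for idx in selected]
    combined = [i for idx in selected for i in idx]
    if fmt == "combine":
        # nodups only affects this format: drop repeats, keep first-seen order.
        return list(dict.fromkeys(combined)) if nodups else combined
    if fmt == "subset":
        # Assumption: each selected feature appears once in the subset.
        return [W[i] for i in dict.fromkeys(combined)]
    raise ValueError(f"unknown format: {fmt!r}")
```

Consider W = [[5, 1], [4, 0], [1, 3], [2, 2]]. Here select_numeric(W, 1) picks the single top feature per component and returns [[0], [2]]. In contrast, select_numeric(W, 1.0) applies a 100% threshold and returns [[1], []], which the 'list' format reports as [[1], None].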
One of the properties of Nonnegative Matrix Factorization is that it tends to produce sparse representations of the observed data. This leads to a natural application to bi-clustering, which characterises groups of samples by a small number of features.
In NMF models, samples are grouped according to the basis
component that contributes the most to each sample, i.e. the basis
component with the greatest coefficient in the corresponding column of the coefficient
matrix (see predict,NMF-method).
Each group of samples is then characterised by a set of features selected
based on basis-specificity scores that are computed on the basis matrix.
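The grouping rule can be sketched in a few lines of Python. This illustrates the rule stated above and is not the implementation of predict,NMF-method. The function name predict_groups is hypothetical. The coefficient matrix H is assumed to be stored as k rows (one per basis component) by n columns (one per sample). Ties are assumed to go to the lowest-index component.

```python
from typing import List


def predict_groups(H: List[List[float]]) -> List[int]:
    """Hypothetical helper: 0-based component with the largest coefficient in each column of H."""
    k, n = len(H), len(H[0])
    groups = []
    for j in range(n):
        # max() returns the first index reaching the maximum, so ties go to the lowest component.
        groups.append(max(range(k), key=lambda q: H[q][j]))
    return groups


# Two components (rows) and three samples (columns).
H = [[0.9, 0.1, 0.4],
     [0.2, 0.8, 0.6]]
print(predict_groups(H))  # [0, 1, 1]
```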
Methods for extractFeatures:
signature(object = "matrix"): selects features on a given matrix that contains the basis components in columns.
signature(object = "NMF"): selects basis-specific features from an NMF model by applying the method extractFeatures,matrix to its basis matrix.
Methods for featureScore:
signature(object = "matrix"): computes feature scores on a given matrix that contains the basis components in columns.
signature(object = "NMF"): computes feature scores on the basis matrix of an NMF model.
The function featureScore can compute basis-specificity scores using
the following methods:
'kim': method defined by Kim and Park (2007). The score for feature $i$ is defined as:
$$ S_i = 1 + \frac{1}{\log_2 k} \sum_{q=1}^{k} p(i,q) \log_2 p(i,q), $$
where $k$ is the number of basis components and $p(i,q)$ is the probability that the $i$-th feature contributes to basis $q$:
$$ p(i,q) = \frac{W(i,q)}{\sum_{r=1}^{k} W(i,r)}. $$
The feature scores are real values within the range [0,1]. The higher the feature score, the more basis-specific the corresponding feature.
'max': the feature scores are defined as the row maximums of the basis matrix.
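The two scoring rules translate directly into code. The following Python sketch is not the package's R implementation, and its function names are hypothetical. It makes three assumptions: W is nonnegative, W has at least two columns (so that log2(k) > 0), and W has no all-zero rows. Terms with p(i,q) = 0 are taken as 0, following the usual convention 0 * log2(0) = 0.

```python
import math
from typing import List

Matrix = List[List[float]]  # basis matrix W: features in rows, components in columns


def feature_score_kim(W: Matrix) -> List[float]:
    """S_i = 1 + (1 / log2 k) * sum_q p(i,q) log2 p(i,q), with p(i,q) = W(i,q) / sum_r W(i,r)."""
    k = len(W[0])
    scores = []
    for row in W:
        total = sum(row)
        probs = [w / total for w in row]
        # Skip zero probabilities: 0 * log2(0) is taken as 0.
        plogp = sum(p * math.log2(p) for p in probs if p > 0)
        scores.append(1 + plogp / math.log2(k))
    return scores


def feature_score_max(W: Matrix) -> List[float]:
    """Row maximums of W."""
    return [max(row) for row in W]


W = [[1.0, 0.0],   # contributes to component 0 only -> score 1
     [1.0, 1.0],   # evenly spread -> score 0
     [3.0, 1.0]]   # mostly component 0 -> score in between
print([round(s, 3) for s in feature_score_kim(W)])  # [1.0, 0.0, 0.189]
print(feature_score_max(W))  # [1.0, 1.0, 3.0]
```

Since the sum equals minus the Shannon entropy of the row's probabilities, the 'kim' score is 1 minus the entropy normalised by log2(k). This is why it lies in [0, 1].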
The function extractFeatures can select features using the following
methods:
'kim': the features are first scored using the function featureScore with method 'kim'. Then only the features that fulfil both of the following criteria are retained:
the score is greater than $\hat{\mu} + 3 \hat{\sigma}$, where $\hat{\mu}$ and $\hat{\sigma}$ are the median and the median absolute deviation (MAD) of the scores, respectively;
the maximum contribution of the feature is greater than the median of all contributions (i.e. of all elements of W).
'max': the selection method used in the bioNMF software package and described in Carmona-Saez et al. (2006). For each basis component, the features are first sorted by decreasing contribution. Then only the first consecutive features whose highest contribution in the basis matrix is on the considered basis are selected.
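Both selection rules can be sketched in Python on top of the 'kim' score. This is not the NMF package's code, and the function names are hypothetical. The sketch makes three assumptions. First, a feature kept by the 'kim' rule is attached to the component that holds its largest contribution. Second, the MAD is scaled by 1.4826, the default constant of R's stats::mad; whether the package applies that scaling is not confirmed here. Third, ties go to the lowest-index component in both rules.

```python
import math
import statistics
from typing import List

Matrix = List[List[float]]  # basis matrix W: features in rows, components in columns


def argmax(row: List[float]) -> int:
    # First index reaching the maximum, so ties go to the lowest component.
    return max(range(len(row)), key=row.__getitem__)


def feature_score_kim(W: Matrix) -> List[float]:
    k = len(W[0])
    scores = []
    for row in W:
        total = sum(row)
        # 0 * log2(0) is taken as 0 by skipping zero entries.
        plogp = sum((w / total) * math.log2(w / total) for w in row if w > 0)
        scores.append(1 + plogp / math.log2(k))
    return scores


def select_kim(W: Matrix, mad_constant: float = 1.4826) -> List[List[int]]:
    """Keep features with score > median + 3 * MAD and max contribution > median of all of W."""
    s = feature_score_kim(W)
    mu = statistics.median(s)
    sigma = mad_constant * statistics.median([abs(x - mu) for x in s])
    median_w = statistics.median([w for row in W for w in row])
    selected: List[List[int]] = [[] for _ in W[0]]
    for i, row in enumerate(W):
        if s[i] > mu + 3 * sigma and max(row) > median_w:
            # Assumption: the feature is attached to its dominant component.
            selected[argmax(row)].append(i)
    return selected


def select_max(W: Matrix) -> List[List[int]]:
    """For each component, take features by decreasing contribution until one peaks elsewhere."""
    k = len(W[0])
    selected = []
    for q in range(k):
        order = sorted(range(len(W)), key=lambda i: W[i][q], reverse=True)
        picked = []
        for i in order:
            if argmax(W[i]) != q:
                break  # first feature whose highest contribution is on another component
            picked.append(i)
        selected.append(picked)
    return selected
```

Consider W = [[5, 1], [4, 3], [2, 6], [3, 0.5], [1, 2]]. Here select_max returns [[0, 1, 3], [2]]. Feature 4 peaks on component 1 but is never reached. Feature 1, which peaks on component 0, interrupts the run for component 1 first.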
Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." _Bioinformatics (Oxford, England)_, *23*(12), pp. 1495-1502. ISSN 1460-2059.
Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM and Pascual-Montano A (2006). "Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization." _BMC Bioinformatics_, *7*, pp. 78. ISSN 1471-2105.