Title: | FeatSeekR an R package for unsupervised feature selection |
---|---|
Description: | FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across replicates, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has a high replicate reproducibility and a high degree of uniqueness. |
Authors: | Tuemay Capraz [cre, aut] |
Maintainer: | Tuemay Capraz <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2024-10-30 07:18:44 UTC |
Source: | https://github.com/bioc/FeatSeekR |
This function ranks the features of a 2-dimensional array according to their reproducibility between conditions.
FeatSeek( data, conditions = NULL, max_features = NULL, init = NULL, verbose = TRUE )
data |
|
conditions | factor of length samples, indicating which sample belongs to which condition. Only required if |
max_features |
|
init |
|
verbose |
|
SummarizedExperiment containing one assay with the selected features. rowData stores for each selected feature the F-statistic under metric, the cumulative explained variance under explained_variance and the feature names under selected.
# run FeatSeek to select the top 20 features
data <- array(rnorm(100*30), dim=c(30, 100),
    dimnames = list(paste("feature", seq_len(30)), NULL))
# 50 conditions, each measured in 2 replicates (100 samples in total)
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)
# res stores the 20 selected features ranked by their replicate
# reproducibility
FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across conditions, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has a high replicate reproducibility and a high degree of uniqueness.
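The iterative procedure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers f_stat and greedy_select are hypothetical names introduced here, and the sketch assumes that reproducibility is scored with a one-way ANOVA F-statistic across conditions and that "projecting out" means removing the linear span of the already selected features from every feature.

```r
# Hypothetical helper (not part of FeatSeekR): one-way ANOVA F-statistic of
# the numeric vector x, grouped by the condition labels g.
f_stat <- function(x, g) {
  g <- as.factor(g)
  group_means <- tapply(x, g, mean)
  group_sizes <- tapply(x, g, length)
  ss_between <- sum(group_sizes * (group_means - mean(x))^2)
  ss_within <- sum((x - group_means[as.character(g)])^2)
  (ss_between / (nlevels(g) - 1)) / (ss_within / (length(x) - nlevels(g)))
}

# Hypothetical greedy loop: data is a features x samples matrix with row
# names, conds has one entry per sample, k is the number of features to pick.
greedy_select <- function(data, conds, k) {
  centered <- data - rowMeans(data)   # center every feature
  resid <- centered
  selected <- character(0)
  for (i in seq_len(k)) {
    candidates <- setdiff(rownames(data), selected)
    scores <- vapply(candidates, function(f) f_stat(resid[f, ], conds),
                     numeric(1))
    selected <- c(selected, names(which.max(scores)))
    # orthonormal basis of the span of the selected features (in sample space)
    q <- qr.Q(qr(t(centered[selected, , drop = FALSE])))
    # remove that span from every feature
    resid <- centered - (centered %*% q) %*% t(q)
  }
  selected
}

# Toy data: 20 conditions x 2 replicates. A and B share one signal (redundant),
# C carries a second, noisier signal, D is pure noise.
set.seed(1)
conds <- rep(seq_len(20), 2)
s1 <- rep(rnorm(20), 2)
s2 <- rep(rnorm(20), 2)
toy <- rbind(A = s1 + rnorm(40, sd = 0.05),
             B = s1 + rnorm(40, sd = 0.05),
             C = s2 + rnorm(40, sd = 0.3),
             D = rnorm(40))
sel <- greedy_select(toy, conds, k = 2)
```

In this toy setting the first pick is one of the two redundant features, and the second pick is expected to be C rather than the remaining redundant copy, because the copy has little reproducible signal left once the first feature is projected out.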
For information on how to use this package please type vignette("FeatSeekR-vignette").
Please post questions regarding the package to the Bioconductor Support Site:
https://support.bioconductor.org
Tümay Capraz
plot correlation matrix of selected feature sets
plotSelectedFeatures(res, n_features = NULL, assay = "selected")
res |
result |
n_features |
top |
assay |
assay slot to plot from result |
returns heatmap of selected features
# run FeatSeek to select the top 20 features
data <- array(rnorm(100*30), dim=c(30,100),
    dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)
# res stores the 20 selected features ranked by their replicate
# reproducibility
# plot the top 5 features
plotSelectedFeatures(res, n_features=5)
plot variance explained from 1 to max_features in res
plotVarianceExplained(res)
res |
result |
returns plot of variance explained vs number of features
# run FeatSeek to select the top 20 features
data <- array(rnorm(100*30), dim=c(30,100),
    dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)
# res stores the 20 selected features ranked by their replicate
# reproducibility
plotVarianceExplained(res)
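To give an intuition for a "variance explained vs number of features" curve, the following base R sketch computes one plausible version of it. It is an assumption, not the package's definition: the hypothetical helper explained_variance regresses all centered features on the first k selected features and reports the fraction of the total sum of squares that is reproduced.

```r
# Hypothetical helper (not part of FeatSeekR). data is a features x samples
# matrix with row names; selected is an ordered vector of feature names.
explained_variance <- function(data, selected) {
  centered <- data - rowMeans(data)
  total <- sum(centered^2)
  vapply(seq_along(selected), function(k) {
    top_k <- selected[seq_len(k)]
    # orthonormal basis of the span of the first k selected features
    q <- qr.Q(qr(t(centered[top_k, , drop = FALSE])))
    fitted <- (centered %*% q) %*% t(q)
    sum(fitted^2) / total
  }, numeric(1))
}

set.seed(2)
toy <- matrix(rnorm(5 * 40), nrow = 5,
              dimnames = list(paste("feature", seq_len(5)), NULL))
ev <- explained_variance(toy, rownames(toy))
# base graphics line plot of the cumulative curve
plot(seq_along(ev), ev, type = "b",
     xlab = "number of features", ylab = "fraction of variance explained")
```

Under this definition the curve never decreases and reaches 1 once the selected features span all features in the data.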
Simulate data with orthogonal feature clusters and replicated
samples. Each feature cluster corresponds to a different latent factor and
contains 10 redundant features. E.g. choosing conditions = 100,
n_latent_factors = 5 and replicates = 2
will simulate a 50 x 200 data matrix, where the first 100 samples belong to
replicate 1 and samples 101-200 belong to replicate 2.
simData(conditions, n_latent_factors, replicates)
conditions |
number of conditions to generate samples from |
n_latent_factors |
number of latent factors to generate |
replicates |
number of replicates to generate |
simData constructs n_latent_factors latent factors by generating a random matrix whose row vectors are orthonormal, each corresponding to a different latent factor. To simulate a set of redundant feature groups, it generates 10 features for each latent factor by scaling the latent factor by a random factor and adding replicate-specific noise, preserving orthogonality.
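The construction described above can be approximated in base R as follows. This is a rough sketch under stated assumptions, not the code behind simData: the orthonormal latent factors are obtained from a QR decomposition of a Gaussian matrix, the scaling factors are drawn from a uniform distribution, and the noise is plain Gaussian noise that is not constrained to preserve orthogonality. The noise level noise_sd and the scaling range are illustrative values.

```r
set.seed(42)
conditions <- 100
n_latent_factors <- 5
replicates <- 2
features_per_factor <- 10   # the text above states 10 features per factor
noise_sd <- 0.01            # assumed noise level, chosen for illustration

# n_latent_factors x conditions matrix with orthonormal rows
latent <- t(qr.Q(qr(matrix(rnorm(conditions * n_latent_factors),
                           nrow = conditions))))

# 10 redundant features per latent factor: the factor times a random scale
signal <- do.call(rbind, lapply(seq_len(n_latent_factors), function(j) {
  outer(runif(features_per_factor, 0.5, 2), latent[j, ])
}))

# one noisy copy of the signal per replicate, placed side by side, so that
# samples 1-100 belong to replicate 1 and samples 101-200 to replicate 2
sim <- do.call(cbind, lapply(seq_len(replicates), function(r) {
  signal + matrix(rnorm(length(signal), sd = noise_sd), nrow = nrow(signal))
}))
replicate_id <- rep(seq_len(replicates), each = conditions)
```

With these settings sim is a 50 x 200 matrix, matching the dimensions given in the description of simData.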
SummarizedExperiment object carrying simulated data, with colData indicating which sample belongs to which replicate
# simulate data 100 samples from 100 conditions, 20 features generated by 2
# latent factors and 2 replicates
simData(conditions=100, n_latent_factors=2, replicates=2)