Package 'FeatSeekR'

Title: FeatSeekR an R package for unsupervised feature selection
Description: FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across replicates, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has a high replicate reproducibility and a high degree of uniqueness.
Authors: Tuemay Capraz [cre, aut]
Maintainer: Tuemay Capraz <[email protected]>
License: GPL-3
Version: 1.5.0
Built: 2024-06-30 03:12:58 UTC
Source: https://github.com/bioc/FeatSeekR

Help Index


FeatSeek

Description

This function ranks the features of a two-dimensional array according to their reproducibility between conditions.
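As a rough illustration of what reproducibility between conditions can mean, the base R sketch below computes a one-way ANOVA F-statistic for each feature of a features x samples matrix. The helper feature_f_stat and the toy data are hypothetical and not part of FeatSeekR; the package's own computation may differ.

```r
# Hypothetical helper (not part of FeatSeekR): one-way ANOVA F-statistic
# for each feature (row) of a features x samples matrix.
feature_f_stat <- function(mat, conditions) {
  conditions <- factor(conditions)
  apply(mat, 1, function(x) {
    fit <- stats::lm(x ~ conditions)
    unname(summary(fit)$fstatistic["value"])
  })
}

set.seed(1)
conds <- rep(seq_len(50), 2)            # 50 conditions, 2 replicates each
signal <- rnorm(50)[conds]              # condition effect shared by replicates
mat <- rbind(
  reproducible = signal + rnorm(100, sd = 0.1),
  noisy        = rnorm(100)
)
f <- feature_f_stat(mat, conds)
f
```

A feature whose replicates agree within each condition gets a large F-statistic, while a feature that is pure noise stays near 1.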

Usage

FeatSeek(
  data,
  conditions = NULL,
  max_features = NULL,
  init = NULL,
  verbose = TRUE
)

Arguments

data

SummarizedExperiment with an assay named data, where samples belong to different conditions. Which sample belongs to which condition should be indicated in the conditions column of colData. Alternatively, a matrix with features x samples. Each condition has multiple samples from replicated measurements.

conditions

factor with one entry per sample, indicating which sample belongs to which condition. Only required if data is provided as a matrix.

max_features

integer number of features to rank

init

character vector with names of initial features. If NULL, the feature with the highest F-statistic will be used.

verbose

logical indicating whether messages should be printed
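The two input forms described above could be prepared roughly as follows. This is a sketch under assumptions: the assay name data and the colData column conditions are taken from the argument description, while the object names mat, conds and se are illustrative. The SummarizedExperiment part only runs if that Bioconductor package is installed.

```r
set.seed(1)
# Matrix input: features x samples (30 features, 100 samples).
mat <- matrix(rnorm(30 * 100), nrow = 30,
              dimnames = list(paste("feature", seq_len(30)), NULL))
# One entry per sample (column): 50 conditions, 2 replicates each.
conds <- factor(rep(seq_len(50), 2))

# Optional: the same data wrapped in a SummarizedExperiment, with the assay
# named "data" and the condition labels stored in colData.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(data = mat),
    colData = S4Vectors::DataFrame(conditions = conds)
  )
}
```

With the matrix form, conds would be passed as the conditions argument; with the SummarizedExperiment form, the labels travel inside the object.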

Value

SummarizedExperiment containing one assay with the selected features. For each selected feature, rowData stores the F-statistic under metric, the cumulative explained variance under explained_variance and the feature name under selected.

Examples

# run FeatSeek to select the top 20 features
data <-  array(rnorm(100*30), dim=c(30, 100),
            dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility

FeatSeekR an R package for unsupervised feature selection

Description

FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across conditions, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has a high replicate reproducibility and a high degree of uniqueness.
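The select-then-project idea can be sketched in base R as below. This is an illustration under assumptions, not the package's implementation: the helpers row_f_stat and greedy_select are hypothetical, reproducibility is approximated by a one-way ANOVA F-statistic, and projection is done by taking least-squares residuals on the already selected features.

```r
# Hypothetical helper: one-way ANOVA F-statistic for each row of `mat`.
row_f_stat <- function(mat, conditions) {
  apply(mat, 1, function(x) {
    unname(summary(stats::lm(x ~ conditions))$fstatistic["value"])
  })
}

# Hypothetical greedy loop; illustration only, not FeatSeekR's code.
greedy_select <- function(mat, conditions, max_features) {
  conditions <- factor(conditions)
  max_features <- min(max_features, nrow(mat))
  Y <- scale(t(mat))                 # samples x features, centered and scaled
  resid_mat <- Y
  selected <- character(0)
  for (k in seq_len(max_features)) {
    remaining <- setdiff(colnames(Y), selected)
    f <- row_f_stat(t(resid_mat[, remaining, drop = FALSE]), conditions)
    selected <- c(selected, names(which.max(f)))
    # Project out the span of the selected features: keep the residuals of
    # a least-squares fit of every feature on the selected ones.
    basis <- Y[, selected, drop = FALSE]
    resid_mat <- stats::lm.fit(basis, Y)$residuals
    dimnames(resid_mat) <- dimnames(Y)
  }
  selected
}

set.seed(1)
conds <- rep(seq_len(50), 2)         # 50 conditions, 2 replicates each
z1 <- rnorm(50)[conds]
z2 <- rnorm(50)[conds]
mat <- rbind(
  a1    = z1 + rnorm(100, sd = 0.1),
  a2    = z1 + rnorm(100, sd = 0.1), # redundant with a1
  b1    = z2 + rnorm(100, sd = 0.1),
  noise = rnorm(100)
)
sel <- greedy_select(mat, conds, max_features = 2)
sel
```

Because a1 and a2 carry the same signal, picking one of them leaves almost nothing reproducible in the other after projection, so the second pick comes from the other signal rather than from the redundant copy.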

Details

For information on how to use this package please type vignette("FeatSeekR-vignette").

Please post questions regarding the package to the Bioconductor Support Site:

https://support.bioconductor.org

Author(s)

Tümay Capraz


plotSelectedFeatures

Description

plot correlation matrix of selected feature sets

Usage

plotSelectedFeatures(res, n_features = NULL, assay = "selected")

Arguments

res

result SummarizedExperiment from FeatSeek function

n_features

top n_features to plot. If NULL, the maximum number of features in res will be plotted

assay

assay slot to plot from result SummarizedExperiment object, default is the selected features slot

Value

returns heatmap of selected features
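For orientation, a feature-by-feature correlation matrix of the kind such a heatmap displays can be computed in base R as sketched below. The toy matrix sel_mat is hypothetical and stands in for the selected features; how plotSelectedFeatures computes and draws its matrix internally is not shown here.

```r
set.seed(1)
# Toy features x samples matrix; rows stand in for selected features.
shared <- rnorm(100)
sel_mat <- rbind(
  f1 = shared + rnorm(100, sd = 0.1),
  f2 = shared + rnorm(100, sd = 0.1),   # redundant with f1
  f3 = rnorm(100)                       # unrelated to f1 and f2
)
# cor() correlates columns, so transpose to correlate features.
cmat <- stats::cor(t(sel_mat))
round(cmat, 2)
# stats::heatmap(cmat, symm = TRUE) would draw a simple heatmap of it.
```

Off-diagonal entries close to 0 indicate features that carry distinct information, while entries close to 1 or -1 indicate redundancy.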

Examples

# run FeatSeek to select the top 20 features
data <-  array(rnorm(100*30), dim=c(30,100),
            dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility
# plot the top 5 features
plotSelectedFeatures(res, n_features=5)

plotVarianceExplained

Description

plot variance explained from 1 to max_features in res

Usage

plotVarianceExplained(res)

Arguments

res

result SummarizedExperiment from FeatSeek function

Value

returns plot of variance explained vs number of features

Examples

# run FeatSeek to select the top 20 features
data <-  array(rnorm(100*30), dim=c(30,100),
            dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility
plotVarianceExplained(res)

simData

Description

Simulates data with orthogonal feature clusters and replicated samples. Each feature cluster corresponds to a different latent factor and contains 10 redundant features. E.g. choosing conditions = 100, n_latent_factors = 5 and replicates = 2 will simulate a 50 x 200 data matrix, where the first 100 samples belong to replicate 1 and samples 101-200 belong to replicate 2.

Usage

simData(conditions, n_latent_factors, replicates)

Arguments

conditions

number of conditions to generate samples from

n_latent_factors

number of latent factors to generate

replicates

number of replicates to generate

Details

simData constructs n_latent_factors by generating a random matrix $\mathbf{Q}$ whose row vectors $\mathbf{Q}_{i\cdot} \sim \mathcal{N}(0,1)$ with $n$ samples and $i \in \{1, \dots, \textrm{n\_latent\_factors}\}$ are orthonormal, each corresponding to a different latent factor. To simulate a set of redundant feature groups, it generates 10 features $X_{j\cdot}$ for each latent factor $\mathbf{Q}_{i\cdot}$ by scaling each latent factor by a random factor $\delta_j \sim \mathcal{N}(0,1)$ and adding replicate specific noise $\pmb{\epsilon}_c \sim \mathcal{N}(0,0.1)$ with $c \in \{1, \dots, \textrm{replicates}\}$ preserving orthogonality.
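The construction can be approximated in base R as sketched below. The helper sim_sketch is hypothetical and is not simData itself. It assumes that the orthonormal latent factors come from a QR decomposition of a Gaussian matrix and that the 0.1 in the noise distribution is a standard deviation; it makes no attempt to keep the noise orthogonality-preserving, and it returns a plain list rather than a SummarizedExperiment.

```r
# Hypothetical helper (not FeatSeekR's simData); see assumptions above.
sim_sketch <- function(conditions, n_latent_factors, replicates,
                       features_per_factor = 10, noise_sd = 0.1) {
  # Orthonormal latent factors: QR-decompose a Gaussian matrix and keep Q.
  G <- matrix(rnorm(conditions * n_latent_factors), nrow = conditions)
  Q <- t(qr.Q(qr(G)))                   # n_latent_factors x conditions
  # Each latent factor yields 10 features, each scaled by its own delta_j.
  factor_id <- rep(seq_len(n_latent_factors), each = features_per_factor)
  delta <- rnorm(length(factor_id))
  clean <- delta * Q[factor_id, , drop = FALSE]   # row j scaled by delta[j]
  # One noisy copy of the clean matrix per replicate, bound column-wise.
  reps <- lapply(seq_len(replicates), function(r) {
    clean + matrix(rnorm(length(clean), sd = noise_sd), nrow = nrow(clean))
  })
  list(data      = do.call(cbind, reps),
       replicate = rep(seq_len(replicates), each = conditions),
       latent    = Q)
}

set.seed(1)
sim <- sim_sketch(conditions = 100, n_latent_factors = 5, replicates = 2)
dim(sim$data)   # 50 200
```

The dimensions follow the arithmetic in the description: 5 latent factors times 10 features give 50 rows, and 100 conditions times 2 replicates give 200 columns, with the first 100 columns belonging to replicate 1.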

Value

SummarizedExperiment object carrying simulated data, with colData indicating which sample belongs to which replicate

Examples

# simulate data for 100 conditions with 2 replicates each and 20 features
# generated by 2 latent factors
simData(conditions=100, n_latent_factors=2, replicates=2)