Package 'FeatSeekR' reference manual

Title:	FeatSeekR: an R package for unsupervised feature selection
Description:	FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across replicates, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has high replicate reproducibility and a high degree of uniqueness.
Authors:	Tuemay Capraz [cre, aut]
Maintainer:	Tuemay Capraz <[email protected]>
License:	GPL-3
Version:	1.7.2
Built:	2025-03-18 06:06:02 UTC
Source:	https://github.com/bioc/FeatSeekR

FeatSeek

Description

This function ranks the features of a two-dimensional array according to their replicate reproducibility across conditions.
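FeatSeek reports an F-statistic per feature (see `init` and the Value section). The minimal base-R sketch below illustrates how such a score can rank features. It assumes a standard one-way ANOVA F-statistic with conditions as groups and the replicated samples of each condition as within-group observations. `f_statistic` is a hypothetical helper and is not exported by FeatSeekR.

```r
# Hypothetical helper (not part of FeatSeekR): one-way ANOVA F-statistic
# for each feature (row of x). Conditions are the groups, and the
# replicated samples of each condition are the within-group observations.
f_statistic <- function(x, conditions) {
  conditions <- as.factor(conditions)
  k <- nlevels(conditions)                 # number of conditions
  n <- ncol(x)                             # number of samples
  apply(x, 1, function(v) {
    group_means <- tapply(v, conditions, mean)
    ss_within <- sum((v - group_means[conditions])^2)  # replicate scatter
    ss_total <- sum((v - mean(v))^2)
    ss_between <- ss_total - ss_within                 # condition effect
    (ss_between / (k - 1)) / (ss_within / (n - k))
  })
}

# Rank the 30 features of the example data by decreasing F-statistic
set.seed(1)
data <- array(rnorm(100 * 30), dim = c(30, 100),
              dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)               # 50 conditions x 2 replicates
f <- f_statistic(data, conds)
ranking <- names(sort(f, decreasing = TRUE))
head(ranking)
```

A high value means that differences between conditions dominate the disagreement between replicates. In other words, the feature is reproducible.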

Usage

FeatSeek(
  data,
  conditions = NULL,
  max_features = NULL,
  init = NULL,
  verbose = TRUE
)

Arguments

`data`	`SummarizedExperiment` with an assay named `data`, whose samples belong to different conditions. The condition of each sample is given in the `colData` column `conditions`. Alternatively, a `matrix` of features x samples. Each condition has multiple samples from replicated measurements.
`conditions`	factor with one entry per sample, indicating which condition each sample belongs to. Only required if `data` is provided as a `matrix`.
`max_features`	`integer` number of features to rank
`init`	`character vector` with the names of the initial features. If `NULL`, the feature with the highest F-statistic is used.
`verbose`	`logical` indicating whether messages should be printed

Value

`SummarizedExperiment` containing one assay with the selected features. For each selected feature, `rowData` stores the F-statistic in column `metric`, the cumulative explained variance in column `explained_variance`, and the feature name in column `selected`.

Examples

# run FeatSeek to select the top 20 features
data <- array(rnorm(100*30), dim=c(30, 100),
              dimnames=list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility

FeatSeekR: an R package for unsupervised feature selection

Description

FeatSeekR performs unsupervised feature selection using replicated measurements. It iteratively selects features with the highest reproducibility across replicates, after projecting out those dimensions from the data that are spanned by the previously selected features. The selected set of features has high replicate reproducibility and a high degree of uniqueness.
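The iterative procedure can be illustrated with a minimal base-R sketch. It makes three assumptions:

* the reproducibility score is a one-way ANOVA F-statistic;
* "projecting out" means an orthogonal projection in sample space onto the complement of the span of the selected features;
* features that the selection already spans (numerically) are skipped.

`select_features` is hypothetical and omits package details such as `init`.

```r
# Hypothetical sketch (not FeatSeekR's code): greedy selection of
# reproducible, non-redundant features from a features x samples matrix
# with row names.
select_features <- function(x, conditions, max_features, tol = 1e-8) {
  conditions <- as.factor(conditions)
  k <- nlevels(conditions)
  n <- ncol(x)
  # Assumed score: one-way ANOVA F-statistic of one feature
  f_stat <- function(v) {
    group_means <- tapply(v, conditions, mean)
    ss_within <- sum((v - group_means[conditions])^2)
    ss_between <- sum((v - mean(v))^2) - ss_within
    (ss_between / (k - 1)) / (ss_within / (n - k))
  }
  x <- x - rowMeans(x)              # center each feature
  residual <- x                     # data with selected directions removed
  selected <- character(0)
  for (step in seq_len(max_features)) {
    candidates <- setdiff(rownames(x), selected)
    # Skip candidates that the current selection (numerically) spans
    norms <- sqrt(rowSums(residual[candidates, , drop = FALSE]^2))
    candidates <- candidates[norms > tol]
    if (length(candidates) == 0) break
    scores <- apply(residual[candidates, , drop = FALSE], 1, f_stat)
    selected <- c(selected, candidates[which.max(scores)])
    # Orthonormal basis (in sample space) of the selected features
    basis <- qr.Q(qr(t(x[selected, , drop = FALSE])))
    # Remove the span of the selected features from every feature
    residual <- x - t(basis %*% crossprod(basis, t(x)))
  }
  selected
}

set.seed(1)
data <- array(rnorm(100 * 30), dim = c(30, 100),
              dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
select_features(data, conds, max_features = 5)
```

The projection step is what yields uniqueness. After a feature is chosen, an exact or scaled copy of it has no residual left and cannot be selected again.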

Details

For information on how to use this package, please type vignette("FeatSeekR-vignette").

Please post questions regarding the package to the Bioconductor Support Site:

https://support.bioconductor.org

Author(s)

Tümay Capraz

plotSelectedFeatures

Description

Plots the correlation matrix of the selected features.

Usage

plotSelectedFeatures(res, n_features = NULL, assay = "selected")

Arguments

`res`	result `SummarizedExperiment` from `FeatSeek` function
`n_features`	number of top-ranked features to plot. If `NULL`, all features in `res` are plotted.
`assay`	name of the assay in the result `SummarizedExperiment` to plot from. The default is the selected features assay, `"selected"`.

Value

returns a heatmap of the correlations between the selected features
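A comparable plot can be sketched with base R's `stats::cor()` and `stats::heatmap()`. `plot_feature_correlation` is a hypothetical helper. It assumes that the rows of the input are already in rank order, as in the selected-features assay. Its styling may differ from the package's own plot.

```r
# Hypothetical helper (not FeatSeekR's plotting code): heatmap of the
# correlations between the top n_features rows of a features x samples
# matrix whose rows are assumed to be in rank order.
plot_feature_correlation <- function(x, n_features = nrow(x)) {
  top <- x[seq_len(n_features), , drop = FALSE]
  cor_mat <- stats::cor(t(top))          # features x features correlations
  # Rowv/Colv = NA keeps the rank order (no clustering);
  # zlim fixes the color scale to the full correlation range
  stats::heatmap(cor_mat, Rowv = NA, Colv = NA, scale = "none",
                 zlim = c(-1, 1), main = "Correlation of selected features")
  invisible(cor_mat)
}

set.seed(1)
ranked <- matrix(rnorm(20 * 100), nrow = 20,
                 dimnames = list(paste("feature", seq_len(20)), NULL))
cm <- plot_feature_correlation(ranked, n_features = 5)
```

For a well-chosen feature set, the off-diagonal entries should be close to zero. This reflects the uniqueness of the selected features.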

Examples

# run FeatSeek to select the top 20 features
data <-  array(rnorm(100*30), dim=c(30,100),
            dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility
# plot the top 5 features
plotSelectedFeatures(res, n_features=5)

plotVarianceExplained

Description

Plots the cumulative variance explained by the top 1 to `max_features` selected features in `res`.

Usage

plotVarianceExplained(res)

Arguments

`res`	result `SummarizedExperiment` from `FeatSeek` function

Value

returns a plot of the cumulative variance explained versus the number of selected features
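The quantity on the y-axis can be sketched in base R under an assumed definition: the share of the total variance of the centered data that lies in the span of the first s selected features. `cumulative_variance_explained` is hypothetical, and FeatSeekR's exact definition of `explained_variance` may differ.

```r
# Hypothetical definition (an assumption, not taken from FeatSeekR):
# share of the total variance of the centered data captured by projecting
# all features onto the span of the first s selected features.
cumulative_variance_explained <- function(x, selected) {
  x <- x - rowMeans(x)                    # center each feature
  total <- sum(x^2)
  vapply(seq_along(selected), function(s) {
    # Orthonormal basis (sample space) of the first s selected features;
    # assumes these features are linearly independent
    basis <- qr.Q(qr(t(x[selected[seq_len(s)], , drop = FALSE])))
    fitted <- basis %*% crossprod(basis, t(x))   # samples x features
    sum(fitted^2) / total
  }, numeric(1))
}

set.seed(1)
x <- matrix(rnorm(10 * 40), nrow = 10,
            dimnames = list(paste("feature", seq_len(10)), NULL))
ev <- cumulative_variance_explained(x, rownames(x))
plot(seq_along(ev), ev, type = "b", xlab = "Number of features",
     ylab = "Cumulative variance explained")
```

Under this definition, the curve is non-decreasing. It reaches 1 once the selected features span all centered features.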

Examples

# run FeatSeek to select the top 20 features
data <-  array(rnorm(100*30), dim=c(30,100),
            dimnames = list(paste("feature", seq_len(30)), NULL))
conds <- rep(seq_len(50), 2)
res <- FeatSeek(data, conds, max_features=20)

# res stores the 20 selected features ranked by their replicate 
# reproducibility
plotVarianceExplained(res)

simData

Description

Simulates data with orthogonal feature clusters and replicated samples. Each feature cluster corresponds to a different latent factor and contains 10 redundant features. For example, choosing conditions = 100, n_latent_factors = 5 and replicates = 2 simulates a 50 x 200 data matrix (5 x 10 features, 100 x 2 samples). The first 100 samples belong to replicate 1 and samples 101-200 belong to replicate 2.

Usage

simData(conditions, n_latent_factors, replicates)

Arguments

`conditions`	number of conditions to generate samples from
`n_latent_factors`	number of latent factors to generate
`replicates`	number of replicates to generate

Details

simData constructs n_latent_factors latent factors by generating a random matrix $\mathbf{Q}$ with $n$ samples. Its row vectors $\mathbf{Q}_{i\cdot}$, $i \in \{1, \dots, \textrm{n\_latent\_factors}\}$, are based on $\mathcal{N}(0,1)$ draws and are orthonormal, and each row corresponds to a different latent factor. To simulate a set of redundant feature groups, simData generates 10 features $X_{j\cdot}$ for each latent factor $\mathbf{Q}_{i\cdot}$. It scales the latent factor by a random factor $\delta_j \sim \mathcal{N}(0,1)$ and adds replicate-specific noise $\pmb{\epsilon}_c \sim \mathcal{N}(0,0.1)$ with $c \in \{1, \dots, \textrm{replicates}\}$, that is, $X^{(c)}_{j\cdot} = \delta_j \mathbf{Q}_{i\cdot} + \pmb{\epsilon}_c$. As a result, feature groups generated from different latent factors remain orthogonal up to the added noise.
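The generative model above can be sketched in base R. `sim_data_sketch` is hypothetical and returns a plain list rather than a `SummarizedExperiment`. The sketch makes three assumptions that the text above does not state:

* orthonormality is obtained with a QR decomposition;
* the 0.1 in $\mathcal{N}(0,0.1)$ is a standard deviation;
* the noise is drawn independently for every entry.

```r
# Hypothetical sketch of the simData model (not the package's code).
sim_data_sketch <- function(conditions, n_latent_factors, replicates,
                            features_per_factor = 10, noise_sd = 0.1) {
  # Latent factors: N(0,1) draws orthonormalized via QR (assumption);
  # q has one orthonormal row per latent factor, one column per condition
  q <- t(qr.Q(qr(matrix(rnorm(conditions * n_latent_factors),
                        nrow = conditions, ncol = n_latent_factors))))
  factor_of <- rep(seq_len(n_latent_factors), each = features_per_factor)
  delta <- rnorm(length(factor_of))               # one scale per feature
  signal <- delta * q[factor_of, , drop = FALSE]  # features x conditions
  # Each replicate is the same signal plus its own noise (assumed i.i.d.)
  blocks <- lapply(seq_len(replicates), function(r)
    signal + matrix(rnorm(length(signal), sd = noise_sd),
                    nrow = nrow(signal)))
  x <- do.call(cbind, blocks)
  rownames(x) <- paste0("feature", seq_len(nrow(x)))
  list(data = x,
       replicate = rep(seq_len(replicates), each = conditions),
       condition = rep(seq_len(conditions), times = replicates))
}

set.seed(1)
sim <- sim_data_sketch(conditions = 100, n_latent_factors = 5, replicates = 2)
dim(sim$data)   # 50 features x 200 samples
```

The replicate blocks are concatenated column-wise. This matches the layout described above, with replicate 1 in samples 1-100 and replicate 2 in samples 101-200.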

Value

SummarizedExperiment object carrying simulated data, with colData indicating which sample belongs to which replicate

Examples

# simulate 100 conditions with 2 replicates each (200 samples) and 20
# features generated by 2 latent factors
simData(conditions=100, n_latent_factors=2, replicates=2)

Help Index

FeatSeek

Description

Usage

Arguments

Value

Examples

FeatSeekR: an R package for unsupervised feature selection

Description

Details

Author(s)

plotSelectedFeatures

Description

Usage

Arguments

Value

Examples

plotVarianceExplained

Description

Usage

Arguments

Value

Examples

simData

Description

Usage

Arguments

Details

Value

Examples