Title: | A normalization-invariant minimum enclosing ball method to detect differentially expressed genes for RNA-seq and scRNA-seq data |
---|---|
Description: | This package provides a method to identify differential expression genes in the same or different species. Given that non-DE genes have some similarities in features, a scaling-free minimum enclosing ball (SFMEB) model is built to cover those non-DE genes in feature space, then those DE genes, which are enormously different from non-DE genes, being regarded as outliers and rejected outside the ball. The method on this package is described in the article 'A minimum enclosing ball method to detect differential expression genes for RNA-seq data'. The SFMEB method is extended to the scMEB method that considering two or more potential types of cells or unknown labels scRNA-seq dataset DEGs identification. |
Authors: | Yan Zhou, Jiadi Zhu |
Maintainer: | Jiadi Zhu <[email protected]>, Yan Zhou <[email protected]> |
License: | GPL-2 |
Version: | 1.21.0 |
Built: | 2024-10-30 08:21:01 UTC |
Source: | https://github.com/bioc/MEB |
Use a normalization-invariant minimum enclosing ball (NIMEB) method to discriminate differential expression (DE) genes in the same or different species.
NIMEB(countsTable, train_id, gamma, nu = 0.01, reject_rate = 0.1, ds = FALSE)
NIMEB(countsTable, train_id, gamma, nu = 0.01, reject_rate = 0.1, ds = FALSE)
countsTable |
Matrix or data.frame of short read counts for each genes in the same or different species. |
train_id |
A vector shows the position of housekeeping genes or conserved genes in countsTable. |
gamma |
A parameter needed for all kernels except linear. |
nu |
parameter needed for one-classification. |
reject_rate |
A value used in controling the scale of ball, default is 0.01. |
ds |
A value to show the data is for the same species or different species. If ds is FALSE, the data is the same species, else the data is the different species. |
list(.) A list of results, "model" represents the model of NIMEB, which could be used to discriminate a new gene, "gamma" represents the selected gamma parameters in model NIMEB, "train_error" represents the corresponding train_error when the value of gamma changed.
## Simulation data for the same species. library(SummarizedExperiment) data(sim_data_sp) gamma <- seq(1e-06,5e-05,1e-06) sim_model_sp <- NIMEB(countsTable = assay(sim_data_sp), train_id=seq_len(1000), gamma, nu = 0.01, reject_rate = 0.05, ds = FALSE) ## Real data for the same species. data(real_data_sp) gamma <- seq(1e-06,5e-05,1e-06) real_model_sp <- NIMEB(countsTable = assay(real_data_sp), train_id=seq_len(530), gamma, nu = 0.01, reject_rate = 0.1, ds = FALSE) ## Simulation data for the different species. data(sim_data_dsp) gamma <- seq(1e-07,2e-05,1e-06) sim_model_dsp <- NIMEB(countsTable = assay(sim_data_dsp), train_id=seq_len(1000), gamma, nu = 0.01, reject_rate = 0.1, ds = TRUE) ## Real data for the different species. data(real_data_dsp) gamma <- seq(5e-08,5e-07,1e-08) real_model_dsp <- NIMEB(countsTable = assay(real_data_dsp), train_id=seq_len(143), gamma, nu = 0.01, reject_rate = 0.1, ds = TRUE)
## Simulation data for the same species. library(SummarizedExperiment) data(sim_data_sp) gamma <- seq(1e-06,5e-05,1e-06) sim_model_sp <- NIMEB(countsTable = assay(sim_data_sp), train_id=seq_len(1000), gamma, nu = 0.01, reject_rate = 0.05, ds = FALSE) ## Real data for the same species. data(real_data_sp) gamma <- seq(1e-06,5e-05,1e-06) real_model_sp <- NIMEB(countsTable = assay(real_data_sp), train_id=seq_len(530), gamma, nu = 0.01, reject_rate = 0.1, ds = FALSE) ## Simulation data for the different species. data(sim_data_dsp) gamma <- seq(1e-07,2e-05,1e-06) sim_model_dsp <- NIMEB(countsTable = assay(sim_data_dsp), train_id=seq_len(1000), gamma, nu = 0.01, reject_rate = 0.1, ds = TRUE) ## Real data for the different species. data(real_data_dsp) gamma <- seq(5e-08,5e-07,1e-08) real_model_dsp <- NIMEB(countsTable = assay(real_data_dsp), train_id=seq_len(143), gamma, nu = 0.01, reject_rate = 0.1, ds = TRUE)
This data set includes two species and 19330 genes with corresponding short read counts, in which the first 143 genes are conserved genes.
real_data_dsp
real_data_dsp
A data.frame contains two species and 19330 genes.
Brawand, D., Soumillon, M., Necsulea, A. and Julien, P. et al. (2011). The evolution of gene expression levels in mammalian organs. Nature, 478, 343-348.
This data set includes two samples and each sample with five biological replicates and 16519 genes with corresponding short read counts, in which the first 530 genes are housekeeping genes.
real_data_sp
real_data_sp
A data.frame contains two samples and each sample with five biological replicates and 16519 genes.
Marioni J.C., Mason C.E., et al. (2008). RNA-seq: an assessment of technical reproducibility and comparisonwith gene expression arrays. Genome Res. 18(9), 1509–1517.
Using the Minimum Enclosing Ball (MEB) method to discriminate differential expression genes (DEGs) without prior cell clustering results.
scMEB(sce, stable_idx, filtered = FALSE, gamma = seq(1e-04,0.001,1e-05), nu = 0.01, reject_rate = 0.1)
scMEB(sce, stable_idx, filtered = FALSE, gamma = seq(1e-04,0.001,1e-05), nu = 0.01, reject_rate = 0.1)
sce |
A SingleCellExperiment class scRNA-seq data. |
stable_idx |
A vector shows the name of stable expressed gene in sce. |
filtered |
A logical value to show if the data have been filtered. |
gamma |
A parameter needed for all kernels except linear. |
nu |
A parameter needed for one-classification. |
reject_rate |
A value used in controling the scale of ball, default is 0.01. |
list(.) A list of results, "model" represents the model of scMEB, which could be used to discriminate a new gene, "dat_pca" represents the first 50 PCs of each genes, "gamma" represents the selected gamma parameters in model scMEB, "train_error" represents the corresponding train_error when the value of gamma changed, "dist" shows the distance between the points and the radius of the sphere in feature space.
## Simulation data for scRNA-seq data generated from splatter package. library(SingleCellExperiment) data(sim_scRNA_data) data(stable_gene) sim_scRNA <- scMEB(sce=sim_scRNA_data, stable_idx=stable_gene, filtered = FALSE, gamma = seq(1e-04,0.001,1e-05), nu = 0.01, reject_rate = 0.1)
## Simulation data for scRNA-seq data generated from splatter package. library(SingleCellExperiment) data(sim_scRNA_data) data(stable_gene) sim_scRNA <- scMEB(sce=sim_scRNA_data, stable_idx=stable_gene, filtered = FALSE, gamma = seq(1e-04,0.001,1e-05), nu = 0.01, reject_rate = 0.1)
This data set includes two species and 18472 genes with corresponding short read counts, in which the first 1000 genes are conserved genes.
sim_data_dsp
sim_data_dsp
A data.frame contains two species and 18472 genes.
Jiadi Zhu, Yan Zhou, Junhui Wang, Youlong Yang (2019, pending publication). A minimum enclosing ball method to detect differential expression genes for RNA-seq data.
This data set includes two samples and 10623 genes with corresponding short read counts, in which the first 1000 genes are housekeeping genes.
sim_data_sp
sim_data_sp
A data.frame contains two samples and 10623 genes.
Jiadi Zhu, Yan Zhou, Junhui Wang, Youlong Yang (2019, pending publication). A minimum enclosing ball method to detect differential expression genes for RNA-seq data.
This dataset is a SingleCellExperiment class, it includes 5,000 genes and 100 cells.
sim_scRNA_data
sim_scRNA_data
A SingleCellExperiment class data includes 5,000 genes and 100 cells.
A simulation data generated from spaltter package.
The name of the stable genes in all dataset that used to build the model.
stable_gene
stable_gene
A vector contains the name of the stable genes.
The simulation data (sim_scRNA_data) generated from spaltter package.