Title: | Differential Variability (DV) analysis for single-cell RNA sequencing data. (e.g. Identify Differentially Variable Genes across two experimental conditions) |
---|---|
Description: | A spline based scRNA-seq method for identifying differentially variable (DV) genes across two experimental conditions. Spline-DV constructs a 3D spline from 3 key gene statistics: mean expression, coefficient of variance, and dropout rate. This is done for both conditions. The 3D spline provides the “expected” behavior of genes in each condition. The distance of the observed mean, CV and dropout rate of each gene from the expected 3D spline is used to measure variability. As the final step, the spline-DV method compares the variabilities of each condition to identify differentially variable (DV) genes. |
Authors: | Shreyan Gupta [aut, cre] , James Cai [aut] |
Maintainer: | Shreyan Gupta <[email protected]> |
License: | GPL-2 |
Version: | 0.99.11 |
Built: | 2024-11-26 03:10:49 UTC |
Source: | https://github.com/bioc/SplineDV |
Plots 3D gene statistic scatter plot of target genes in two conditions.
DVPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)
DVPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)
df |
Resultant DataFrame after splineDV analysis |
targetgene |
An integer value. Defines the minimum reads required for a cell to be included in the analysis. |
ptSize |
An integer value. Defines point size for dots |
lwd |
An integer value. Defines line width for spline. |
dlwd |
An integer value. Defines Line width for target gene distance. |
3D plotly scatter plot
Shreyan Gupta <[email protected]>
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(Y) <- paste0('g', as.character(1:2000)) ## Run splineDV res <- splineDV(X, Y) fig <- DVPlot(res)
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(Y) <- paste0('g', as.character(1:2000)) ## Run splineDV res <- splineDV(X, Y) fig <- DVPlot(res)
Plots 3D scatter plot with HVGs or target gene(s) highlighted. The 3D spline represents the estimated expected behavior of genes in the sample.
HVGPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)
HVGPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)
df |
Resultant DataFrame after splineHVG analysis |
targetgene |
An integer value. Defines the minimum reads required for a cell to be included in the analysis. |
ptSize |
An integer value. Defines point size for dots |
lwd |
An integer value. Defines line width for spline. |
dlwd |
An integer value. Defines Line width for target gene distance. |
3D plotly scatter plot.
Shreyan Gupta <[email protected]>
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG res <- splineHVG(X) fig <- HVGPlot(res)
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG res <- splineHVG(X) fig <- HVGPlot(res)
QC filter scRNA-seq expression data. Removes lowly expressed genes and cells. Also removes cells with high mitochondrial gene expression.
hvgQC(X, ncounts = 1000, ncells = 15, mtPerc = 15)
hvgQC(X, ncounts = 1000, ncells = 15, mtPerc = 15)
X |
Count matrix or SingleCellExperiment. |
ncounts |
An integer value. Defines the minimum reads required for a cell to be included in the analysis. |
ncells |
An integer value. Defines the minimum cells required for a gene to be included in the analysis. |
mtPerc |
A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis. |
QC filtered SingleCellExperiment.
Shreyan Gupta <[email protected]>
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG QCres <- hvgQC(X)
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG QCres <- hvgQC(X)
Differential Variability (DV) analysis. 3 gene statistics are used - Mean, CV and Dropout rate of each gene. A smooth.spline is modeled using these statistics. First, the distance vectors of each gene from spline is computed using the splineHVG function for each condition. Then the vector distance between the distance vectors are computed. A high
splineDV( X, Y, ncounts = 1000, ncells = 15, spar = 0.5, mtPerc = 15, detailed = FALSE )
splineDV( X, Y, ncounts = 1000, ncells = 15, spar = 0.5, mtPerc = 15, detailed = FALSE )
X |
Count matrix or SingleCellExperiment (Test sample) |
Y |
Count matrix or SingleCellExperiment (Control sample) |
ncounts |
An integer value. Defines the minimum reads required for a cell to be included in the analysis. |
ncells |
An integer value. Defines the minimum cells required for a gene to be included in the analysis. |
spar |
A double value. Smoothing parameter for Spline. |
mtPerc |
A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis. |
detailed |
A boolean value. Defines whether to add individual splineHVG DataFrame to the output. |
A DataFrame with DV Statistics. The statistics include log1p(mean), log1p(CV), dropout rates, nearest point on spline (splinex, spliney and splinez) and distance from spline for each sample. The distance across conditions is stored in "vectorDist". P values are computed assuming normal distribution of distance differences.
Shreyan Gupta <[email protected]>
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(Y) <- paste0('g', as.character(1:2000)) ## Run splineDV res <- splineDV(X, Y)
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500)) rownames(Y) <- paste0('g', as.character(1:2000)) ## Run splineDV res <- splineDV(X, Y)
Compute Highly Variable Genes from an scRNAseq expression data. Uses 3 gene statstics - Mean, CV and Dropout rate to model a 3D spline to estimate the expected behavior of genes. A gene is considered highly variable if the actual gene expression is far from the estimated 3D spline.
splineHVG( X, QC = TRUE, ncounts = 1000, ncells = 15, mtPerc = 15, spar = 0.5, nHVGs = 2000, use.ndist = TRUE )
splineHVG( X, QC = TRUE, ncounts = 1000, ncells = 15, mtPerc = 15, spar = 0.5, nHVGs = 2000, use.ndist = TRUE )
X |
Count matrix or SingleCellExperiment |
QC |
A Boolean value (TRUE/FALSE), if TRUE, a quality control is applied over the data. |
ncounts |
An integer value. Defines the minimum reads required for a cell to be included in the analysis. |
ncells |
An integer value. Defines the minimum cells required for a gene to be included in the analysis. |
mtPerc |
A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis. |
spar |
A double value. Smoothing parameter for Spline. |
nHVGs |
An integer value. Number of top Highly Variable Genes (HVGs) to select. |
use.ndist |
A Boolean value (TRUE/FALSE), if TRUE, computes the nearest point on spline by nearest-neighbor search (TRUE Recommended). Else, uses the position of the corresponding gene on the spline for distance computation. |
A DataFrame with highly variable gene selection statistics. The statistics include log1p(mean), log1p(CV), dropout rates, nearest point on spline (splinex, spliney and splinez) and distance from spline. A higher distance signifies higher variability.
Shreyan Gupta <[email protected]>
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG res <- splineHVG(X)
# example code ## Generate example count data X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500)) rownames(X) <- paste0('g', as.character(1:2000)) ## Run splineHVG res <- splineHVG(X)