Package 'SplineDV'

Title: Differential Variability (DV) analysis for single-cell RNA sequencing data. (e.g. Identify Differentially Variable Genes across two experimental conditions)
Description: A spline based scRNA-seq method for identifying differentially variable (DV) genes across two experimental conditions. Spline-DV constructs a 3D spline from 3 key gene statistics: mean expression, coefficient of variance, and dropout rate. This is done for both conditions. The 3D spline provides the “expected” behavior of genes in each condition. The distance of the observed mean, CV and dropout rate of each gene from the expected 3D spline is used to measure variability. As the final step, the spline-DV method compares the variabilities of each condition to identify differentially variable (DV) genes.
Authors: Shreyan Gupta [aut, cre] , James Cai [aut]
Maintainer: Shreyan Gupta <[email protected]>
License: GPL-2
Version: 0.99.11
Built: 2024-11-26 03:10:49 UTC
Source: https://github.com/bioc/SplineDV

Help Index


Plot 3D scatter plot in two conditions

Description

Plots 3D gene statistic scatter plot of target genes in two conditions.

Usage

DVPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)

Arguments

df

Resultant DataFrame after splineDV analysis

targetgene

An integer value. Defines the minimum reads required for a cell to be included in the analysis.

ptSize

An integer value. Defines point size for dots

lwd

An integer value. Defines line width for spline.

dlwd

An integer value. Defines Line width for target gene distance.

Value

3D plotly scatter plot

Author(s)

Shreyan Gupta <[email protected]>

Examples

# example code
## Generate example count data
X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500))
rownames(X) <- paste0('g', as.character(1:2000))
Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500))
rownames(Y) <- paste0('g', as.character(1:2000))

## Run splineDV
res <- splineDV(X, Y)
fig <- DVPlot(res)

Plot HVG 3D scatter plot

Description

Plots 3D scatter plot with HVGs or target gene(s) highlighted. The 3D spline represents the estimated expected behavior of genes in the sample.

Usage

HVGPlot(df, targetgene = NULL, ptSize = 3, lwd = 5, dlwd = 7)

Arguments

df

Resultant DataFrame after splineHVG analysis

targetgene

An integer value. Defines the minimum reads required for a cell to be included in the analysis.

ptSize

An integer value. Defines point size for dots

lwd

An integer value. Defines line width for spline.

dlwd

An integer value. Defines Line width for target gene distance.

Value

3D plotly scatter plot.

Author(s)

Shreyan Gupta <[email protected]>

Examples

# example code
## Generate example count data
X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500))
rownames(X) <- paste0('g', as.character(1:2000))

## Run splineHVG
res <- splineHVG(X)
fig <- HVGPlot(res)

Quality control

Description

QC filter scRNA-seq expression data. Removes lowly expressed genes and cells. Also removes cells with high mitochondrial gene expression.

Usage

hvgQC(X, ncounts = 1000, ncells = 15, mtPerc = 15)

Arguments

X

Count matrix or SingleCellExperiment.

ncounts

An integer value. Defines the minimum reads required for a cell to be included in the analysis.

ncells

An integer value. Defines the minimum cells required for a gene to be included in the analysis.

mtPerc

A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis.

Value

QC filtered SingleCellExperiment.

Author(s)

Shreyan Gupta <[email protected]>

Examples

# example code
## Generate example count data
X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500))
rownames(X) <- paste0('g', as.character(1:2000))

## Run splineHVG
QCres <- hvgQC(X)

Spline-DV

Description

Differential Variability (DV) analysis. 3 gene statistics are used - Mean, CV and Dropout rate of each gene. A smooth.spline is modeled using these statistics. First, the distance vectors of each gene from spline is computed using the splineHVG function for each condition. Then the vector distance between the distance vectors are computed. A high

Usage

splineDV(
  X,
  Y,
  ncounts = 1000,
  ncells = 15,
  spar = 0.5,
  mtPerc = 15,
  detailed = FALSE
)

Arguments

X

Count matrix or SingleCellExperiment (Test sample)

Y

Count matrix or SingleCellExperiment (Control sample)

ncounts

An integer value. Defines the minimum reads required for a cell to be included in the analysis.

ncells

An integer value. Defines the minimum cells required for a gene to be included in the analysis.

spar

A double value. Smoothing parameter for Spline.

mtPerc

A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis.

detailed

A boolean value. Defines whether to add individual splineHVG DataFrame to the output.

Value

A DataFrame with DV Statistics. The statistics include log1p(mean), log1p(CV), dropout rates, nearest point on spline (splinex, spliney and splinez) and distance from spline for each sample. The distance across conditions is stored in "vectorDist". P values are computed assuming normal distribution of distance differences.

Author(s)

Shreyan Gupta <[email protected]>

Examples

# example code
## Generate example count data
X <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500))
rownames(X) <- paste0('g', as.character(1:2000))
Y <- abs(matrix(round(rpois(2000*500, lambda=0.5), 0), nrow=2000, ncol=500))
rownames(Y) <- paste0('g', as.character(1:2000))

## Run splineDV
res <- splineDV(X, Y)

Spline-HVG

Description

Compute Highly Variable Genes from an scRNAseq expression data. Uses 3 gene statstics - Mean, CV and Dropout rate to model a 3D spline to estimate the expected behavior of genes. A gene is considered highly variable if the actual gene expression is far from the estimated 3D spline.

Usage

splineHVG(
  X,
  QC = TRUE,
  ncounts = 1000,
  ncells = 15,
  mtPerc = 15,
  spar = 0.5,
  nHVGs = 2000,
  use.ndist = TRUE
)

Arguments

X

Count matrix or SingleCellExperiment

QC

A Boolean value (TRUE/FALSE), if TRUE, a quality control is applied over the data.

ncounts

An integer value. Defines the minimum reads required for a cell to be included in the analysis.

ncells

An integer value. Defines the minimum cells required for a gene to be included in the analysis.

mtPerc

A double value. Defines the minimum percent mitochondrial genes expression required for a cell to be excluded from the analysis.

spar

A double value. Smoothing parameter for Spline.

nHVGs

An integer value. Number of top Highly Variable Genes (HVGs) to select.

use.ndist

A Boolean value (TRUE/FALSE), if TRUE, computes the nearest point on spline by nearest-neighbor search (TRUE Recommended). Else, uses the position of the corresponding gene on the spline for distance computation.

Value

A DataFrame with highly variable gene selection statistics. The statistics include log1p(mean), log1p(CV), dropout rates, nearest point on spline (splinex, spliney and splinez) and distance from spline. A higher distance signifies higher variability.

Author(s)

Shreyan Gupta <[email protected]>

Examples

# example code
## Generate example count data
X <- abs(matrix(round(rpois(2000*500, lambda=0.5),0), nrow=2000, ncol=500))
rownames(X) <- paste0('g', as.character(1:2000))

## Run splineHVG
res <- splineHVG(X)