Package 'ROSeq' reference manual

Title:	Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-Seq data
Description:	ROSeq - A rank based approach to modeling gene expression with filtered and normalized read count matrix. ROSeq takes filtered and normalized read matrix and cell-annotation/condition as input and determines the differentially expressed genes between the contrasting groups of single cells. One of the input parameters is the number of cores to be used.
Authors:	Krishan Gupta [aut, cre], Manan Lalit [aut], Aditya Biswas [aut], Abhik Ghosh [aut], Debarka Sengupta [aut]
Maintainer:	Krishan Gupta <[email protected]>
License:	GPL-3
Version:	1.19.0
Built:	2025-03-04 05:35:21 UTC
Source:	https://github.com/bioc/ROSeq

Computes differential expression for the gene in question, by comparing the optimal parameters for sub-populations one and two

Description

Uses the (asymptotically) optimum two-sample Wald test based on the MLE of the parameters and its asymptotic variances given by the inverse of the Fisher information matrix

Usage

computeDEG(results_1, results_2)
computeDEG(results_1, results_2)

Arguments

`results_1`	A vector corresponding to sub-population one and containing 5 values (a, b, A, number of bins, R2)
`results_2`	A vector corresponding to sub-population two and containing 5 values (a, b, A, number of bins, R2)

Value

T The Wald test statistic for testing the null hypothesis

Finds the optimal values of parameters a and b that model the probability distribution of ranks, by Maximising the Log-Likelihood

Description

Takes in as input the read count data corresponding to one sub-population and the typical gene statistics. Then it splits the entire range into equally sized bins of size $k * \sigma$ , where k is a scalar with a default value of 0.05, and $\sigma$ is the standard deviation of the pulled expression estimates across the cell-groups. Each of these bins corresponds to a rank. Therefore, for each group, cell frequency for each bin maps to a rank. These frequencies are normalized group-wise by dividing by the total cell count within a concerned group.

Usage

findParams(ds, geneStats)
findParams(ds, geneStats)

Arguments

`ds`	The (normalized and filtered) read count data corresponding to a sub-population
`geneStats`	A vector containing 7 values corresponding to the gene data (maximum, minimum, mean, standard deviation, upper multiple of the standard deviation, lower multiple of standard deviation and log_2(fold change))

Value

results A vector containing 5 values (a, b, A, number of bins, R2)

Finds the double derivative of A

Description

Finds the double derivative of A with with respect to a, (a, b), b , (a, b) in respective templates from right to left. This first derivative is evaluated at the optimal (a_hat, b_hat). u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getd(u1, v, du1da, dvda)
getd(u1, v, du1da, dvda)

Arguments

`u1`	u1
`v`	v
`du1da`	First derivative of u1 with respect to parameter a
`dvda`	First derivative of v with respect to parameter a

Value

d2logAda2

Evaluates statistics of the read counts corresponding to the gene

Description

Takes in the complete read count vector corresponding to the gene (sp) and also the data corresponding to the two sub-populations (sp1 and sp2)

Usage

getDataStatistics(sp, spOne, spTwo)
getDataStatistics(sp, spOne, spTwo)

Arguments

`sp`	The complete (normalized and filtered) read count data corresponding to the gene in question
`spOne`	The (normalized and filtered) read count data corresponding to the first sub-population
`spTwo`	The (normalized and filtered) read count data corresponding to the second sub-population

Value

geneStats A vector containing 6 values corresponding to the gene data(maximum, minimum, mean, standard deviation, upper multiple of standard deviation and lower multiple of standard deviation)

Finds the first derivative of u1 with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdu1da(coefficients, r)
getdu1da(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

du1da

Finds the first derivative of u1 with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdu1db(coefficients, r)
getdu1db(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

du1db

Finds the first derivative of u2 with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdu2da(coefficients, r)
getdu2da(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

du2da

Finds the first derivative of u2 with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdu2db(coefficients, r)
getdu2db(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

du2db

Finds the first derivative of v with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdvda(coefficients, r)
getdvda(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

dvda

Finds the first derivative of v with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getdvdb(coefficients, r)
getdvdb(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

dvdb

Computes the Fisher Information Matrix

Description

The Fisher Information Matrix and its derivatives are essential to evulate the minima of log likelihood

Usage

getI(results)
getI(results)

Arguments

results

A vector containing 5 values (a, b, A, number of bins, R2)

Value

I The Fisher Information Matrix

Computes u1

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getu1(coefficients, r)
getu1(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

Computes u2

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getu2(coefficients, r)
getu2(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

Computes v

Description

u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b

Usage

getv(coefficients, r)
getv(coefficients, r)

Arguments

`coefficients`	the optimal values of a and b
`r`	the rank vector

Value

Computes differential analysis for a given gene

Description

Takes in the row index which corresponds to a gene and evaluates for differential expression across two cell types.

Usage

initiateAnalysis(gene, scdata, scgroups, classOne, classTwo)
initiateAnalysis(gene, scdata, scgroups, classOne, classTwo)

Arguments

`gene`	The row index of the normalised and filtered, read count matrix
`scdata`	The normalised and filtered, read count matrix
`scgroups`	The location of the two sub-populations
`classOne`	The location of the first sub-population, for example, sample names as given as columns names
`classTwo`	The location of thesecond sub-population, for example, sample names as given as columns names

Value

combinedResults A vector containing 12 values (gr1: a, g1: b, gr1: A, gr1: number of bins, gr1: R2, gr2: a, gr2: b, gr2: A, gr2: number of bins, gr2: R2, T, p)

Single cell samples for DE genes analysis

Description

Three replicates from three human induced pluripotent stem cell (iPSC) lines were considered. 96 single cells were considered in each of the three replicates corresponding to one of the three individuals (these shall be referred to by their labels NA19098,NA19101 and NA19239)

Usage

data("L_Tung_single")data("L_Tung_single")

Format

The format is: list of cells corresponding NA19098 versus NA19101 and groups labels.

Details

filtered and normalized data

Source

Tung, P.-Y.et al.Batch effects and the effective design of single-cell geneexpression studies.Scientific reports7, 39921 (2017).

References

Tung, P.-Y.et al.Batch effects and the effective design of single-cell geneexpression studies.Scientific reports7, 39921 (2017).

Examples

    data(L_Tung_single)
    ## summary(ROSeq::L_Tung_single)
data(L_Tung_single)
    ## summary(ROSeq::L_Tung_single)

Minimizes the Negative Log-Likelihood by iterating across values of parameters a and b

Description

Takes in as input a vector of values (coefficients), the number of bins and the normalized probability dsitribution of ranks

Usage

minimizeNLL(coefficients, r, readCount)
minimizeNLL(coefficients, r, readCount)

Arguments

`coefficients`	A vector containing two values for a and b
`r`	The number of bins
`readCount`	A vector of (normalized) frequency of read counts that fall within each bin

Value

NLL Negative-Log Likelihood for the input coefficients

Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-Seq data

Description

Takes in the complete filtered and normalized read count matrix, the location of the two sub-populations and the number of cores to be used

Usage

ROSeq(countData, condition, numCores = 1)
ROSeq(countData, condition, numCores = 1)

Arguments

`countData`	The normalised and filtered, read count matrix, with row names as genes name/ID and column names as sample id/name
`condition`	Labels for the two sub-populations
`numCores`	The number of cores to be used

Value

pValues and FDR adjusted p significance values

Examples

countData<-list()
countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count
countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group
head(countData$count)
gene_names<-rownames(countData$count)
countData$count<-apply(countData$count,2,function(x) as.numeric(x))
rownames(countData$count)<-gene_names
countData$count<-countData$count[,colSums(countData$count> 0) > 2000]
g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3)
countData$count<-countData$count[g_keep,]
countData$count<-limma::voom(ROSeq::TMMnormalization(countData$count))
output<-ROSeq(countData=countData$count$E, condition = countData$group)
output
countData<-list()
countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count
countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group
head(countData$count)
gene_names<-rownames(countData$count)
countData$count<-apply(countData$count,2,function(x) as.numeric(x))
rownames(countData$count)<-gene_names
countData$count<-countData$count[,colSums(countData$count> 0) > 2000]
g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3)
countData$count<-countData$count[g_keep,]
countData$count<-limma::voom(ROSeq::TMMnormalization(countData$count))
output<-ROSeq(countData=countData$count$E, condition = countData$group)
output

TMM Normalization.

Description

Trimmed Means of M values (TMM) normalization (on the basis of edgeR package)

Usage

TMMnormalization(countTable)
TMMnormalization(countTable)

Arguments

countTable

The filtered, read count matrix, with row names as genes name/ID and column names as sample id/name

Value

countTableTMM

Examples

countData<-list()
countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count
countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group
head(countData$count)
gene_names<-rownames(countData$count)
countData$count<-apply(countData$count,2,function(x) as.numeric(x))
rownames(countData$count)<-gene_names
countData$count<-countData$count[,colSums(countData$count> 0) > 2000]
g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3)
countData$count<-countData$count[g_keep,]
countTableTMM<-ROSeq::TMMnormalization(countData$count)
countTableTMM
countData<-list()
countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count
countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group
head(countData$count)
gene_names<-rownames(countData$count)
countData$count<-apply(countData$count,2,function(x) as.numeric(x))
rownames(countData$count)<-gene_names
countData$count<-countData$count[,colSums(countData$count> 0) > 2000]
g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3)
countData$count<-countData$count[g_keep,]
countTableTMM<-ROSeq::TMMnormalization(countData$count)
countTableTMM

Package 'ROSeq'

Help Index

Computes differential expression for the gene in question, by comparing the optimal parameters for sub-populations one and two

Description

Usage

Arguments

Value

See Also

Finds the optimal values of parameters a and b that model the probability distribution of ranks, by Maximising the Log-Likelihood

Description

Usage

Arguments

Value

Finds the double derivative of A

Description

Usage

Arguments

Value

Evaluates statistics of the read counts corresponding to the gene

Description

Usage

Arguments

Value

Finds the first derivative of u1 with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Finds the first derivative of u1 with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Finds the first derivative of u2 with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Finds the first derivative of u2 with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Finds the first derivative of v with respect to a. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Finds the first derivative of v with respect to b. This first derivative is evaluated at the optimal (a_hat, b_hat).

Description

Usage

Arguments

Value

Computes the Fisher Information Matrix

Description

Usage

Arguments

Value

Computes u1

Description

Usage

Arguments

Value

Computes u2

Description

Usage

Arguments

Value

Computes v

Description

Usage

Arguments

Value

Computes differential analysis for a given gene

Description

Usage

Arguments

Value

Single cell samples for DE genes analysis

Description