Title: | Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-Seq data |
---|---|
Description: | ROSeq - A rank based approach to modeling gene expression with filtered and normalized read count matrix. ROSeq takes filtered and normalized read matrix and cell-annotation/condition as input and determines the differentially expressed genes between the contrasting groups of single cells. One of the input parameters is the number of cores to be used. |
Authors: | Krishan Gupta [aut, cre], Manan Lalit [aut], Aditya Biswas [aut], Abhik Ghosh [aut], Debarka Sengupta [aut] |
Maintainer: | Krishan Gupta <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-12-04 05:59:57 UTC |
Source: | https://github.com/bioc/ROSeq |
Uses the (asymptotically) optimum two-sample Wald test based on the MLE of the parameters and its asymptotic variances given by the inverse of the Fisher information matrix
computeDEG(results_1, results_2)
computeDEG(results_1, results_2)
results_1 |
A vector corresponding to sub-population one and containing 5 values (a, b, A, number of bins, R2) |
results_2 |
A vector corresponding to sub-population two and containing 5 values (a, b, A, number of bins, R2) |
T The Wald test statistic for testing the null hypothesis
Takes in as input the read count data corresponding
to one sub-population and the typical gene statistics.
Then it splits the entire range into equally sized bins of
size , where k is a scalar with a default value
of 0.05, and
is the standard deviation of the pulled
expression estimates across the cell-groups.
Each of these bins corresponds to a rank. Therefore, for each group, cell
frequency for each bin maps to a rank. These frequencies are normalized
group-wise by dividing by the total cell count within a concerned group.
findParams(ds, geneStats)
findParams(ds, geneStats)
ds |
The (normalized and filtered) read count data corresponding to a sub-population |
geneStats |
A vector containing 7 values corresponding to the gene data (maximum, minimum, mean, standard deviation, upper multiple of the standard deviation, lower multiple of standard deviation and log_2(fold change)) |
results A vector containing 5 values (a, b, A, number of bins, R2)
Finds the double derivative of A with with respect to a, (a, b), b , (a, b) in respective templates from right to left. This first derivative is evaluated at the optimal (a_hat, b_hat). u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getd(u1, v, du1da, dvda)
getd(u1, v, du1da, dvda)
u1 |
u1 |
v |
v |
du1da |
First derivative of u1 with respect to parameter a |
dvda |
First derivative of v with respect to parameter a |
d2logAda2
Takes in the complete read count vector corresponding to the gene (sp) and also the data corresponding to the two sub-populations (sp1 and sp2)
getDataStatistics(sp, spOne, spTwo)
getDataStatistics(sp, spOne, spTwo)
sp |
The complete (normalized and filtered) read count data corresponding to the gene in question |
spOne |
The (normalized and filtered) read count data corresponding to the first sub-population |
spTwo |
The (normalized and filtered) read count data corresponding to the second sub-population |
geneStats A vector containing 6 values corresponding to the gene data(maximum, minimum, mean, standard deviation, upper multiple of standard deviation and lower multiple of standard deviation)
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdu1da(coefficients, r)
getdu1da(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
du1da
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdu1db(coefficients, r)
getdu1db(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
du1db
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdu2da(coefficients, r)
getdu2da(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
du2da
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdu2db(coefficients, r)
getdu2db(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
du2db
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdvda(coefficients, r)
getdvda(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
dvda
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getdvdb(coefficients, r)
getdvdb(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
dvdb
The Fisher Information Matrix and its derivatives are essential to evulate the minima of log likelihood
getI(results)
getI(results)
results |
A vector containing 5 values (a, b, A, number of bins, R2) |
I The Fisher Information Matrix
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getu1(coefficients, r)
getu1(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
u1
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getu2(coefficients, r)
getu2(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
u2
u1, v and u2 constitute the equations required for evaluating the first and second order derivatives of A with respect to parameters a and b
getv(coefficients, r)
getv(coefficients, r)
coefficients |
the optimal values of a and b |
r |
the rank vector |
v
Takes in the row index which corresponds to a gene and evaluates for differential expression across two cell types.
initiateAnalysis(gene, scdata, scgroups, classOne, classTwo)
initiateAnalysis(gene, scdata, scgroups, classOne, classTwo)
gene |
The row index of the normalised and filtered, read count matrix |
scdata |
The normalised and filtered, read count matrix |
scgroups |
The location of the two sub-populations |
classOne |
The location of the first sub-population, for example, sample names as given as columns names |
classTwo |
The location of thesecond sub-population, for example, sample names as given as columns names |
combinedResults A vector containing 12 values (gr1: a, g1: b, gr1: A, gr1: number of bins, gr1: R2, gr2: a, gr2: b, gr2: A, gr2: number of bins, gr2: R2, T, p)
Three replicates from three human induced pluripotent stem cell (iPSC) lines were considered. 96 single cells were considered in each of the three replicates corresponding to one of the three individuals (these shall be referred to by their labels NA19098,NA19101 and NA19239)
data("L_Tung_single")
data("L_Tung_single")
The format is: list of cells corresponding NA19098 versus NA19101 and groups labels.
filtered and normalized data
Tung, P.-Y.et al.Batch effects and the effective design of single-cell geneexpression studies.Scientific reports7, 39921 (2017).
Tung, P.-Y.et al.Batch effects and the effective design of single-cell geneexpression studies.Scientific reports7, 39921 (2017).
data(L_Tung_single) ## summary(ROSeq::L_Tung_single)
data(L_Tung_single) ## summary(ROSeq::L_Tung_single)
Takes in as input a vector of values (coefficients), the number of bins and the normalized probability dsitribution of ranks
minimizeNLL(coefficients, r, readCount)
minimizeNLL(coefficients, r, readCount)
coefficients |
A vector containing two values for a and b |
r |
The number of bins |
readCount |
A vector of (normalized) frequency of read counts that fall within each bin |
NLL Negative-Log Likelihood for the input coefficients
Takes in the complete filtered and normalized read count matrix, the location of the two sub-populations and the number of cores to be used
ROSeq(countData, condition, numCores = 1)
ROSeq(countData, condition, numCores = 1)
countData |
The normalised and filtered, read count matrix, with row names as genes name/ID and column names as sample id/name |
condition |
Labels for the two sub-populations |
numCores |
The number of cores to be used |
pValues and FDR adjusted p significance values
countData<-list() countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group head(countData$count) gene_names<-rownames(countData$count) countData$count<-apply(countData$count,2,function(x) as.numeric(x)) rownames(countData$count)<-gene_names countData$count<-countData$count[,colSums(countData$count> 0) > 2000] g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3) countData$count<-countData$count[g_keep,] countData$count<-limma::voom(ROSeq::TMMnormalization(countData$count)) output<-ROSeq(countData=countData$count$E, condition = countData$group) output
countData<-list() countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group head(countData$count) gene_names<-rownames(countData$count) countData$count<-apply(countData$count,2,function(x) as.numeric(x)) rownames(countData$count)<-gene_names countData$count<-countData$count[,colSums(countData$count> 0) > 2000] g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3) countData$count<-countData$count[g_keep,] countData$count<-limma::voom(ROSeq::TMMnormalization(countData$count)) output<-ROSeq(countData=countData$count$E, condition = countData$group) output
Trimmed Means of M values (TMM) normalization (on the basis of edgeR package)
TMMnormalization(countTable)
TMMnormalization(countTable)
countTable |
The filtered, read count matrix, with row names as genes name/ID and column names as sample id/name |
countTableTMM
countData<-list() countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group head(countData$count) gene_names<-rownames(countData$count) countData$count<-apply(countData$count,2,function(x) as.numeric(x)) rownames(countData$count)<-gene_names countData$count<-countData$count[,colSums(countData$count> 0) > 2000] g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3) countData$count<-countData$count[g_keep,] countTableTMM<-ROSeq::TMMnormalization(countData$count) countTableTMM
countData<-list() countData$count<-ROSeq::L_Tung_single$NA19098_NA19101_count countData$group<-ROSeq::L_Tung_single$NA19098_NA19101_group head(countData$count) gene_names<-rownames(countData$count) countData$count<-apply(countData$count,2,function(x) as.numeric(x)) rownames(countData$count)<-gene_names countData$count<-countData$count[,colSums(countData$count> 0) > 2000] g_keep <- apply(countData$count,1,function(x) sum(x>2)>=3) countData$count<-countData$count[g_keep,] countTableTMM<-ROSeq::TMMnormalization(countData$count) countTableTMM