Title: | OVESEG-test to detect tissue/cell-specific markers |
---|---|
Description: | An R package for multiple-group comparison to detect tissue/cell-specific marker genes among subtypes. It provides functions to compute OVESEG-test statistics, derive component weights in the mixture null distribution model and estimate p-values from weightedly aggregated permutations. Obtained posterior probabilities of component null hypotheses can also portrait all kinds of upregulation patterns among subtypes. |
Authors: | Lulu Chen <[email protected]> |
Maintainer: | Lulu Chen <[email protected]> |
License: | GPL-2 |
Version: | 1.23.0 |
Built: | 2024-12-18 03:45:44 UTC |
Source: | https://github.com/bioc/OVESEG |
Function OVESEGtest
performs OVESEG-test for expression
profiles from multiple groups to detect subtype-specific marker genes.
While it may take a long time to execute permutations for p-value estimation,
users can apply OVESEGtstat
to obtain OVESEG-test statistics
to rank genes and apply postProbNull
to obtain each gene's
posterior probabilities of component null hypotheses.
nullDistri
estimates probabilities of any one group being
upregulated under null hypotheses.
patternDistri
estimates probabilities of all kinds of
upregulation patterns among groups.
Chen, L., Herrington, D., Clarke, R., Yu, G., and Wang, Y. (2019). “Data-Driven Robust Detection of Tissue/Cell-Specific Markers.” bioRxiv. https://doi.org/10.1101/517961.
Three cell subtypes (B cells, CD4+ T cells, CD8+ T cells) were purified from 20 fresh blood samples. RNA was extracted from each of these cell subsets and processed into RNA sequencing libraries (Illumina TruSeq). Sequencing libraries were analyzed on an Illumina HiScan, with a target read depth of ~20M reads. Reads were demultiplexed, mapped to human gene models (ENSEMBL), and tabulated using HTSeq. We downsample the original data to 10000 genes. Subtype labels for purified populations are also included. (Data generation script can be found in ./data_raw folder.)
data(countBT)
data(countBT)
A list with one count mixture (count) and a categorical vector giving subtypes (group).
Linsley et al. PLoS One 2014;9(10):e109760. PMID: 25314013
This function estimates probabilities of any one group being upregulated than other groups under null hypotheses.
nullDistri(ppnull)
nullDistri(ppnull)
ppnull |
a list returned by |
The probability of one group being upregulated under null hypotheses is calculated by accumulating and normalizing genewise posterior probability of null hypotheses. The group with higher probability tends to get more False Positive MGs.
a numeric vector indicating probabilities of each group being upregulated than others under null hypotheses.
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') pk <- nullDistri(ppnull)
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') pk <- nullDistri(ppnull)
This function performs OVESEG-test to assess significance of molecular markers.
OVESEGtest(y, group, weights = NULL, alpha = "moderated", NumPerm = 999, seed = 111, detail.return = TRUE, BPPARAM = bpparam())
OVESEGtest(y, group, weights = NULL, alpha = "moderated", NumPerm = 999, seed = 111, detail.return = TRUE, BPPARAM = bpparam())
y |
a numeric matrix containing log-expression or logCPM (log2-counts per million) values. Data frame or SummarizedExperiment object will be internally coerced into a matrix. Rows correspond to probes and columns to samples. Missing values are not permitted. |
group |
categorical vector or factor giving group membership of columns of y. At least two categories need to be presented. |
weights |
optional numeric matrix containing prior weights for each spot. |
alpha |
parameter specifying within-group variance estimator to be used.
'moderated': empirical Bayes moderated variance estimator as used in
|
NumPerm |
an integer specifying the number of permutation resamplings (default 999). |
seed |
an integer seed for the random number generator. |
detail.return |
a logical indicating whether more details about posterior probability estimation will be returned. |
BPPARAM |
a BiocParallelParam object indicating whether parallelization should be used for permutation resamplings. The default is bpparam(). |
OVESEG-test is a statistically-principled method that can detect tissue/cell-specific marker genes among many subtypes. OVESEG-test statistics are designed to mathematically match the definition of molecular markers, and a novel permutation scheme are employed to estimate the corresponding distribution under null hypotheses where the expression patterns of non-markers can be highly complex.
a list containing the following components:
pv.overall |
a vector of p-values calculated by all permutations regardless of upregulated subtypes. |
pv.oneside |
a vector of subtype-specific p-values calculated specifically for the upregulated subtype of each probe. |
pv.oneside.max |
subtype-specific p-values when observed test statistic equal to zero. |
pv.multiside |
pv.oneside*K (K-time comparison correction) and truncated at 1. |
W |
a matrix of posterior probabilities for each component null hypothesis given an observed probe. Rows correspond to probes and columns to one hypothesis. |
label |
a vector of group labels. |
groupOrder |
a matrix with each row being group indexes ordered based
on decreasing expression levels.
Group indexes are positions in |
F.p.value |
a matrix with each column giving p-values corresponding to F-statistics on certain groups. |
lfdr |
a matrix with each column being local false discovery rates estimated based on one column of weighted F.p.value matrix. |
fit |
a |
F.p.value
, lfdr
and fit
are returned only when
detail.return
is TRUE.
data(RocheBT) rtest <- OVESEGtest(RocheBT$y, RocheBT$group, NumPerm=99, BPPARAM=BiocParallel::SerialParam()) ## Not run: #parallel computing rtest <- OVESEGtest(RocheBT$y, RocheBT$group, NumPerm=99, BPPARAM=BiocParallel::SnowParam()) ## End(Not run)
data(RocheBT) rtest <- OVESEGtest(RocheBT$y, RocheBT$group, NumPerm=99, BPPARAM=BiocParallel::SerialParam()) ## Not run: #parallel computing rtest <- OVESEGtest(RocheBT$y, RocheBT$group, NumPerm=99, BPPARAM=BiocParallel::SnowParam()) ## End(Not run)
This function computes OVESEG-test statistics.
OVESEGtstat(y, group, weights = NULL, alpha = "moderated", order.return = FALSE, lmfit.return = FALSE)
OVESEGtstat(y, group, weights = NULL, alpha = "moderated", order.return = FALSE, lmfit.return = FALSE)
y |
a numeric matrix containing log-expression or logCPM (log2-counts per million) values. Data frame or SummarizedExperiment object will be internally coerced into a matrix. Rows correspond to probes and columns to samples. Missing values are not permitted. |
group |
categorical vector or factor giving group membership of columns of y. At least two categories need to be presented. |
weights |
optional numeric matrix containing prior weights for each spot. |
alpha |
parameter specifying within-group variance estimator to be used.
'moderated': empirical Bayes moderated variance estimator as used in
|
order.return |
a logical indicating whether the order of groups will be returned. If FALSE, only the highest expressed group index is return for each probe. |
lmfit.return |
a logical indicating whether a |
OVESEG-test statistics are designed to mathematically match the definition of molecular markers:
where is probe index,
is the number of groups, and
is the mean expression of group
.
a list containing the following components:
tstat |
a vector of OVESEG-test statistics for probes. |
label |
a vector of group labels. |
groupOrder |
If |
fit |
a |
data(RocheBT) rtstat <- OVESEGtstat(RocheBT$y, RocheBT$group, alpha='moderated') rtstat <- OVESEGtstat(RocheBT$y, RocheBT$group, alpha=0.1)
data(RocheBT) rtstat <- OVESEGtstat(RocheBT$y, RocheBT$group, alpha='moderated') rtstat <- OVESEGtstat(RocheBT$y, RocheBT$group, alpha=0.1)
This function permutes group labels among highest expressed M groups and then computes new OVESEG-test statistics.
OVEtstatPermTopM(y, group, groupOrder, M, weights = NULL, alpha = "moderated", NumPerm = 999, seed = 111, BPPARAM = bpparam())
OVEtstatPermTopM(y, group, groupOrder, M, weights = NULL, alpha = "moderated", NumPerm = 999, seed = 111, BPPARAM = bpparam())
y |
a numeric matrix containing log-expression or logCPM (log2-counts per million) values. Data frame or SummarizedExperiment object will be internally coerced into a matrix. Rows correspond to probes and columns to samples. Missing values are not permitted. |
group |
categorical vector or factor giving group membership of columns of y. At least two categories need to be presented. |
groupOrder |
an integer matrix with each row giving group indexes ordered based on decreasing expression levels . |
M |
an integer indicating the number of groups being permutated.
The range is |
weights |
optional numeric matrix containing prior weights for each spot. |
alpha |
parameter specifying within-group variance estimator to be used.
'moderated': empirical Bayes moderated variance estimator as used in
|
NumPerm |
an integer specifying the number of permutation resamplings (default 999). |
seed |
an integer seed for the random number generator. |
BPPARAM |
a BiocParallelParam object indicating whether parallelization should be used for permutation resamplings. The default is bpparam(). |
Top M expressed groups will be involved in permutation. There are
probe patterns in which probes are highly expressed in certain M
groups among the total K groups. Probes of the same pattern share the same
shuffled labels.
To improve the time efficiency, some functions within permutation loops are implemented using c++.
a list containing the following components:
tstat.perm |
a numeric matrix with each column giving OVESEG-test statistics over the expressions after one permutation. |
topidx.perm |
a integer matrix with each colulmn giving the highest expressed group index over the expressions after one permutation. |
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, detail.return = TRUE) rperm <- OVEtstatPermTopM(RocheBT$y, RocheBT$group, ppnull$groupOrder, M=2, NumPerm=99, BPPARAM=BiocParallel::SerialParam()) ## Not run: #parallel computing rperm <- OVEtstatPermTopM(RocheBT$y, RocheBT$group, ppnull$groupOrder, M=2, NumPerm=99, BPPARAM=BiocParallel::SnowParam()) ## End(Not run)
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, detail.return = TRUE) rperm <- OVEtstatPermTopM(RocheBT$y, RocheBT$group, ppnull$groupOrder, M=2, NumPerm=99, BPPARAM=BiocParallel::SerialParam()) ## Not run: #parallel computing rperm <- OVEtstatPermTopM(RocheBT$y, RocheBT$group, ppnull$groupOrder, M=2, NumPerm=99, BPPARAM=BiocParallel::SnowParam()) ## End(Not run)
This function estimates probabilities of all kinds of upregulation patterns among subtypes.
patternDistri(ppnull)
patternDistri(ppnull)
ppnull |
a list returned by |
The probability of each upregulation pattern is calculated by accumulating and normalizing genewise posterior probability of null hypotheses and of alternative hypotheses.
a data frame object containing all possible upregulation patterns and corresponding probabilities.
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') pd<- patternDistri(ppnull)
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') pd<- patternDistri(ppnull)
This function computes posterior probabilities of each component null hypothesis given observed probes. Such probe-wise probabilities will be used as weights for aggregating permutations.
postProbNull(y, group, weights = NULL, alpha = "moderated", detail.return = TRUE)
postProbNull(y, group, weights = NULL, alpha = "moderated", detail.return = TRUE)
y |
a numeric matrix containing log-expression or logCPM (log2-counts per million) values. Data frame or SummarizedExperiment object will be internally coerced into a matrix. Rows correspond to probes and columns to samples. Missing values are not permitted. |
group |
categorical vector or factor giving group membership of columns of y. At least two categories need to be presented. |
weights |
optional numeric matrix containing prior weights for each spot. |
alpha |
parameter specifying within-group variance estimator to be used.
'moderated': empirical Bayes moderated variance estimator as used in
|
detail.return |
a logical indicating whether more details (e.g. lfdr) will be returned. |
Posterior probabilities of each component null hypothesis given
observed probes are estimated from ANOVA test on certain groups and local
fdr. There are totally () null hypotheses, where
is the
number of groups.
a list containing the following components:
W |
a matrix of posterior probabilities for each component null hypothesis given an observed probe. Rows correspond to probes and columns to one hypothesis. |
label |
a vector of group labels. |
groupOrder |
a matrix with each row being group indexes ordered based
on decreasing expression levels.
Group indexes are positions in |
F.p.value |
a matrix with each column giving p-values corresponding to F-statistics on certain groups. |
lfdr |
a matrix with each column being local false discovery rates estimated based on one column of weighted F.p.value matrix. |
fit |
a |
F.p.value
, lfdr
and fit
are returned only when
detail.return
is TRUE.
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha=0.1)
data(RocheBT) ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha='moderated') ppnull <- postProbNull(RocheBT$y, RocheBT$group, alpha=0.1)
This function estimates p-values by aggregating weighted permutations.
pvalueWeightedEst(tt, ttperm, w)
pvalueWeightedEst(tt, ttperm, w)
tt |
a vector of test statistics. |
ttperm |
a matrix of test statistics from permutations. Rows correspond to probes and columns to one permutation. |
w |
a matrix containing weights for each spot in |
P-values are estimated by weightedly accumulating test statistics from permutations that are larger than observations
p-values.
#generate some example data t.obs <- rnorm(100) t.perm <- matrix(rnorm(100*1000),nrow=100) w <- matrix(runif(100*1000),nrow=100) pv <- pvalueWeightedEst(t.obs, t.perm, w)
#generate some example data t.obs <- rnorm(100) t.perm <- matrix(rnorm(100*1000),nrow=100) w <- matrix(runif(100*1000),nrow=100) pv <- pvalueWeightedEst(t.obs, t.perm, w)
Three cell subtypes (B cells, CD4+ T cells, CD8+ T cells) were isolated from 5 pools of 5 healthy donors each. RNA obtained from these 15 purified populations were subsequently used for mRNA expression profiling by HG-U133Plus2.0 microarrays. We downsample the original data to 5000 probes/probesets. Subtype labels for purified populations are also included. (Data generation script can be found in ./data_raw folder.)
data(RocheBT)
data(RocheBT)
A list with one expression matrix (y) and a categorical vector giving subtypes (group).
Allantaz et al. PLoS One 2012;7(1):e29979. PMID: 22276136