Title: | Estimation of local false discovery rate |
---|---|
Description: | In a typical microarray setting with gene expression data observed under two conditions, the local false discovery rate describes the probability that a gene is not differentially expressed between the two conditions given its corrresponding observed score or p-value level. The resulting curve of p-values versus local false discovery rate offers an insight into the twilight zone between clear differential and clear non-differential gene expression. Package 'twilight' contains two main functions: Function twilight.pval performs a two-condition test on differences in means for a given input matrix or expression set and computes permutation based p-values. Function twilight performs a stochastic downhill search to estimate local false discovery rates and effect size distributions. The package further provides means to filter for permutations that describe the null distribution correctly. Using filtered permutations, the influence of hidden confounders could be diminished. |
Authors: | Stefanie Scheid <[email protected]> |
Maintainer: | Stefanie Scheid <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.83.0 |
Built: | 2024-10-31 06:22:19 UTC |
Source: | https://github.com/bioc/twilight |
Application of function twilight
on twilight object data(expval)
.
The function call was exfdr <- twilight(expval,B=1000)
.
data(exfdr)
data(exfdr)
A twilight object.
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Application of function twilight.pval
on leukemia data set of Golub et al. (1999), as given in data(Golub_Merge)
in library(golubEsets)
.
First step was the variance-stabilizing normalization of Huber et al. (2002) in library(vsn)
: golubNorm <- justvsn(Golub_Merge)
.
The function call was then expval <- twilight.pval(golubNorm,id)
with id <- as.numeric(Golub_Merge$ALL.AML)
.
data(expval)
data(expval)
A twilight object.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD and Lander ES (1999): Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science 286, 531–537.
Huber W, von Heydebreck A, Sultmann H, Poustka A and Vingron M (2002): Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics 18, suppl. 1, S96–S104.
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Tusher VG, Tibshirani R and Chu G (2001): Significance analysis of mircroarrays applied to the ionizing response, PNAS 98(9), 5116–5121.
Interface to plotting of twilight
objects. Produces one of five possible plots.
## S3 method for class 'twilight' plot(x, which = NULL, grayscale = FALSE, legend = TRUE, ...)
## S3 method for class 'twilight' plot(x, which = NULL, grayscale = FALSE, legend = TRUE, ...)
x |
An object of class twilight. |
which |
A character string specifying the plot to be made. |
grayscale |
Logical value. Specifying whether plots should be colored or grayscaled. Only necessary for "scores" and "fdr". |
legend |
Logical value. Produces a legend for "scores" and "effectsize". A legend for "fdr" is only available if bootstrap estimates exist. |
... |
Additional graphical arguments. |
Option which="scores"
plots the expected versus the observed test statistics and draws confidence lines calculated from permutations. This plot is similar to plots in Tusher et al. (2001).
Option which="qvalues"
plots -values versus the number of rejected hypotheses.
Option which="fdr"
plots -values versus 1 - local false discovery rate, that is the conditional probability of being significant given the corresponding
-value, plus bootstrap estimates if computed. Bottom ticks are 1%-quantiles of
-values.
Option which="volcano"
results in the volcano plot, that is observed score versus 1 - local false discovery rate. Bottom ticks are 1%-quantiles of scores.
Option which="effectsize"
plots the observed fold change equivalent score distribution overlaid by the estimated effect size distribution, that is distribution of scores under the alternative. This plot is only available if function twilight.pval
was applied with method="fc"
as fold change equivalent scores are computed from log ratios.
Option which="table"
tabulates histogram "effectsize"
.
No value is returned except for "table".
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Tusher VG, Tibshirani R and Chu G (2001): Significance analysis of mircroarrays applied to the ionizing response, PNAS 98(9), 5116–5121.
### contains a twilight object created by function twilight data(exfdr) plot(exfdr)
### contains a twilight object created by function twilight data(exfdr) plot(exfdr)
Extract and print information about a twilight object.
## S3 method for class 'twilight' print(x, ...)
## S3 method for class 'twilight' print(x, ...)
x |
Input object of class |
... |
Additional printing arguments. |
No value is returned.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
### contains a twilight object created by function twilight data(exfdr) print(exfdr)
### contains a twilight object created by function twilight data(exfdr) print(exfdr)
The function performs the successive exclusion procedure (SEP) as described in Scheid and Spang (2004).
twilight(xin, lambda = NULL, B = 0, boot.ci = 0.95, clus = NULL, verbose = TRUE)
twilight(xin, lambda = NULL, B = 0, boot.ci = 0.95, clus = NULL, verbose = TRUE)
xin |
Numerical vector of |
lambda |
Numerical value denoting the penalty factor. If not specified, the function searchs for an appropriate regularization parameter. |
B |
Numerical value specifying the number of bootstrap samples. If not specified, no bootstrap estimates are calculated. |
boot.ci |
Numerical value denoting the probability value for bootstrap confidence intervals of local false discovery rate and prior pi0. |
clus |
A list or numerical value to be passed to |
verbose |
Logical value for message printing. |
In short, the successive exclusion procedure divides the set of p-values into two parts. The first part is chosen such that it resembles a uniform distribution while containing as many p-values as possible. This set of p-values represents the set of p-values derived from non-induced genes. The height of the uninform distribution is a natural estimate for the mixture parameter pi0. The p-value not contained in the uniform part serve as representatives of p-values derived from induced genes. Their distribution is the basis of the final estimate of the local false discovery rate.
NOTE: Library(snow) has to be loaded manually. It is not loaded as part of 'suggests' or 'depends' because it is only available under UNIX. If twilight does not work with the current version of snow, please send a report.
Returns a twilight
object consisting of a data.frame
named result
with variables
pvalue |
Sorted input vector. |
qvalue |
|
index |
Index of the original ordering. |
fdr |
Local false discovery rate averaged over 10 runs of SEP. |
mean.fdr |
Bootstrap estimate of local false discovery rate. |
lower.fdr |
Lower |
upper.fdr |
Upper |
Values are sorted by pvalue
.
Additional output consists of
lambda |
Regularization parameter. |
pi0 |
SEP estimate of prior probability. |
boot.pi0 |
Bootstrap estimate and boot.ci -bootstrap confidence bounds. |
boot.ci |
Passes boot.ci for plotting. |
effect |
Histogram of effect size distributions averaged over 10 runs of SEP. |
If xin
is of class twilight
, the remaining slots are filled with corresponding input values. If xin
is not of class twilight
, these slots remain empty.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Storey JD and Tibshirani R (2003): Statistical significance for genomewide studies, PNAS 100(16), 9440–9445.
twilight.pval
, plot.twilight
, exfdr
### twilight object created with B=1000 bootstrap samples data(exfdr) print(exfdr) plot(exfdr)
### twilight object created with B=1000 bootstrap samples data(exfdr) print(exfdr) plot(exfdr)
For a given binary input vector, the function completely enumerates all possible permutations.
twilight.combi(xin, pin, bin)
twilight.combi(xin, pin, bin)
xin |
Binary input vector, e.g. class labels. |
pin |
Logical value. |
bin |
Logical value. |
Please note, that the resulting permutations are always as "balanced" as possible. The balancing is done for the smaller subsample. If its sample size is odd, say 5, twilight.combi
computes all permutations with 2 or 3 samples unchanged. In the paired case, the output matrix contains only one half of all permutations. The second half is simply 1-output
which leads to the same absolute test statistics in a paired test.
Returns a matrix where each row contains one permuted vector. Note that even for balanced permutations, the first row always contains the original vector. If the number of rows exceeds 10000, NULL
is returned.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
twilight.permute.pair
, twilight.permute.unpair
x <- c(rep(0,4),rep(1,3)) y <- twilight.combi(x,pin=FALSE,bin=FALSE)
x <- c(rep(0,4),rep(1,3)) y <- twilight.combi(x,pin=FALSE,bin=FALSE)
The function call invokes the filtering for permutations of class labels that produce a set of complete null scores. Depending on the test setting, the algorithm iteratively generates valid permutations of the class labels and computes scores. These are transformed to pooled p-values and each set of permutation p-values is tested for uniformity. Permutations with acceptable uniform p-value distributions are kept. The search stops if either the number num.perm
of wanted permutations is reached or if the number of possible unique(!) permutations is smaller than num.perm
. The default values are similar to function twilight.pval
but please note the details below.
twilight.filtering(xin, yin, method = "fc", paired = FALSE, s0 = 0, verbose = TRUE, num.perm = 1000, num.take = 50)
twilight.filtering(xin, yin, method = "fc", paired = FALSE, s0 = 0, verbose = TRUE, num.perm = 1000, num.take = 50)
xin |
Either an expression set ( |
yin |
A numerical vector containing class labels. The higher label denotes the case, the lower label the control samples to test case vs. control. For correlation scores, |
method |
Character string: |
paired |
Logical value. Depends on whether the samples are paired. Ignored if |
s0 |
Fudge factor for variance correction in the Z-test. Takes effect only if |
verbose |
Logical value for message printing. |
num.perm |
Number of permutations. Within |
num.take |
Number of permutations kept in each step of the iterative filtering. Within |
See vignette.
Returns a matrix with permuted input labels yin
as rows. Please note that this matrix is already translated into binary labels for two-sample testing or to ranks if Spearman's correlation was chosen. The resulting permutation matrix can be directly passed into function twilight.pval
. Please note that the first row always contains the original input yin
to be consistent with the other permutation functions in package twilight
.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
## Not run: ### Leukemia data set of Golub et al. (1999). library(golubEsets) data(Golub_Train) ### Variance-stabilizing normalization of Huber et al. (2002). library(vsn) golubNorm <- justvsn(Golub_Train) ### A binary vector of class labels. id <- as.numeric(Golub_Train$ALL.AML) ### Do an unpaired t-test. ### Let's have a quick example with 50 filtered permutations only. ### With num.take=10, we only need 5 iteration steps. yperm <- twilight.filtering(golubNorm,id,method="t",num.perm=50,num.take=10) dim(yperm) ### Let's check that the filtered permutations really produce uniform p-value distributions. ### The first row is the original labeling, so we try the second permutation. yperm <- yperm[-1,] b <- twilight.pval(golubNorm,yperm[1,],method="t",yperm=yperm) hist(b$result$pvalue) ## End(Not run)
## Not run: ### Leukemia data set of Golub et al. (1999). library(golubEsets) data(Golub_Train) ### Variance-stabilizing normalization of Huber et al. (2002). library(vsn) golubNorm <- justvsn(Golub_Train) ### A binary vector of class labels. id <- as.numeric(Golub_Train$ALL.AML) ### Do an unpaired t-test. ### Let's have a quick example with 50 filtered permutations only. ### With num.take=10, we only need 5 iteration steps. yperm <- twilight.filtering(golubNorm,id,method="t",num.perm=50,num.take=10) dim(yperm) ### Let's check that the filtered permutations really produce uniform p-value distributions. ### The first row is the original labeling, so we try the second permutation. yperm <- yperm[-1,] b <- twilight.pval(golubNorm,yperm[1,],method="t",yperm=yperm) hist(b$result$pvalue) ## End(Not run)
The function returns a matrix where each row is a (un)balanced permutation of the input twosample class labels.
twilight.permute.pair(v, m, bal = TRUE)
twilight.permute.pair(v, m, bal = TRUE)
v |
A binary vector representing class labels in original order. Pairs must be in the same order. |
m |
A numerical value giving the number of permutations. |
bal |
Logical value. Results in balanced or unbalanced permutations. |
Returns a matrix where each row contains one permuted vector of class labels. Note that even for balanced permutations, the first row always contains the original vector.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
twilight.permute.unpair
, twilight.combi
The function returns a matrix where each row is a (un)balanced permutation of the input twosample class labels.
twilight.permute.unpair(v, m, bal = TRUE)
twilight.permute.unpair(v, m, bal = TRUE)
v |
A binary vector representing class labels in original order. |
m |
A numerical value giving the number of permutations. |
bal |
Logical value. Results in balanced or unbalanced permutations. |
Returns a matrix where each row contains one permuted vector of class labels. Note that even for balanced permutations, the first row always contains the original vector.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
twilight.permute.pair
, twilight.combi
A function to compute two-sample t, Z and fold change equivalent test statistics (paired or unpaired) and correlation coefficients. Based on permutations, expected test statistics as given in Tusher et al. (2001) and empirical -values are computed. Additional output are
-values computed as given in Storey and Tibshirani (2003). The resulting object is of class
twilight
and can be passed to functions twilight
or plot.twilight
.
twilight.pval(xin, yin, method = "fc", paired = FALSE, B = 1000, yperm = NULL, balance = FALSE, quant.ci = 0.95, s0=NULL, verbose = TRUE, filtering = FALSE)
twilight.pval(xin, yin, method = "fc", paired = FALSE, B = 1000, yperm = NULL, balance = FALSE, quant.ci = 0.95, s0=NULL, verbose = TRUE, filtering = FALSE)
xin |
Either an expression set ( |
yin |
A numerical vector containing class labels. The higher label denotes the case, the lower label the control samples to test case vs. control. For correlation scores, |
method |
Character string: |
paired |
Logical value. Depends on whether the samples are paired. Ignored if |
B |
Numerical value specifying the number of permutations. |
yperm |
Optional matrix containing in each row a permutation of the class labels in binary(!) format for two-sample testing. For computation of correlation scores, the rows of |
balance |
Logical value. Depends on whether balanced or unbalanced permutations should be done. Ignored if |
quant.ci |
Probability value for confidence lines. Lines are symmetric and denote the |
s0 |
Fudge factor for variance correction in the Z-test. Takes effect only if |
verbose |
Logical value for message printing. |
filtering |
Logical value for filtering for permutations of class labels that produce a set of complete null scores. Invokes function |
Please see vignette for detailed information.
Returns a twilight
object consisting of a data.frame
named result
with variables
observed |
Observed test statistics. |
expected |
Mean of order statistics of the permutation statistics. |
candidate |
Binary vector. |
pvalue |
Empirical |
qvalue |
|
index |
Index of the original ordering. |
Values are sorted by absolute observed
scores.
Additional output consists of
ci.line |
Quantile corresponding to quant.ci , passed for plotting. |
pi0 |
Estimated prior probability. |
s0 |
Estimated fudge factor if method="z" . |
call |
Character string of function arguments. |
quant.ci |
Passes quant.ci for plotting. |
The remaining slots are left empty for function twilight
.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Storey JD and Tibshirani R (2003): Statistical significance for genomewide studies, PNAS 100(16), 9440–9445.
Tusher VG, Tibshirani R and Chu G (2001): Significance analysis of mircroarrays applied to the ionizing response, PNAS 98(9), 5116–5121.
twilight
, plot.twilight
, twilight.combi
, twilight.filtering
, expval
, twilight.teststat
### twilight object created from Golub data set data(expval) print(expval) plot(expval)
### twilight object created from Golub data set data(expval) print(expval) plot(expval)
A function to compute two-sample t, Z and fold change equivalent test statistics (paired or unpaired) and correlation coefficients.
twilight.teststat(xin, yin, method = "fc", paired = FALSE, s0 = NULL)
twilight.teststat(xin, yin, method = "fc", paired = FALSE, s0 = NULL)
xin |
Either an expression set ( |
yin |
A numerical vector containing class labels. The higher label denotes the case, the lower label the control samples to test case vs. control. For correlation scores, |
method |
Character string: |
paired |
Logical value. Depends on whether the samples are paired. Ignored if |
s0 |
Fudge factor for variance correction in the Z-test. Takes effect only if |
Please see vignette for detailed information.
Returns a list with two components: a numerical vector of observed test statistics observed
. Each entry corresponds to one row of the input data matrix. Also, the estimated fudge factor s0
is returned. In any other case except method="z"
, s0
is zero.
Stefanie Scheid
Scheid S and Spang R (2004): A stochastic downhill search algorithm for estimating the local false discovery rate, IEEE TCBB 1(3), 98–108.
Scheid S and Spang R (2005): twilight; a Bioconductor package for estimating the local false discovery rate, Bioinformatics 21(12), 2921–2922.
Scheid S and Spang R (2006): Permutation filtering: A novel concept for significance analysis of large-scale genomic data, in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347.
Tusher VG, Tibshirani R and Chu G (2001): Significance analysis of mircroarrays applied to the ionizing response, PNAS 98(9), 5116–5121.
### Z-test on random values M <- matrix(rnorm(20000),nrow=1000) id <- c(rep(1,10),rep(0,10)) stat <- twilight.teststat(M,id,method="z") ### Pearson correlation id <- 1:20 stat <- twilight.teststat(M,id,method="pearson")
### Z-test on random values M <- matrix(rnorm(20000),nrow=1000) id <- c(rep(1,10),rep(0,10)) stat <- twilight.teststat(M,id,method="z") ### Pearson correlation id <- 1:20 stat <- twilight.teststat(M,id,method="pearson")