Title: | Multiple Testing using SAM and Efron's Empirical Bayes Approaches |
---|---|
Description: | Identification of differentially expressed genes and estimation of the False Discovery Rate (FDR) using both the Significance Analysis of Microarrays (SAM) and the Empirical Bayes Analyses of Microarrays (EBAM). |
Authors: | Holger Schwender |
Maintainer: | Holger Schwender <[email protected]> |
License: | LGPL (>= 2) |
Version: | 1.81.0 |
Built: | 2024-12-18 06:11:47 UTC |
Source: | https://github.com/bioc/siggenes |
Generates the required statistics for an Empirical Bayes Analysis of Microarrays (EBAM) of categorical data such as SNP data.
Should not be called directly, but via ebam(..., method = chisq.ebam).
This function replaces cat.ebam
.
chisq.ebam(data, cl, approx = NULL, B = 100, n.split = 1, check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, n.interval = NULL, df.ratio = 3, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", rand = NA)
chisq.ebam(data, cl, approx = NULL, B = 100, n.split = 1, check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, n.interval = NULL, df.ratio = 3, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", rand = NA)
data |
a matrix, data frame, or list. If a matrix or data frame, then each row
must correspond to a variable (e.g., a SNP), and each column to a sample (i.e.\ an observation).
If the number of observations is huge it is better to specify |
cl |
a numeric vector of length |
approx |
should the null distribution be approximated by a |
B |
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected |
n.split |
number of chunks in which the variables are splitted in the computation
of the values of the test statistic. Currently, only available if |
check.for.NN |
if |
lev |
numeric or character vector specifying the codings of the levels of the
variables/SNPs. Can only be specified if |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
a numeric value indicating in how many subsets the |
fast |
if |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio |
df.ratio |
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations. Ignored
if |
df.dens |
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed
|
knots.mode |
if |
type.nclass |
character string specifying the procedure used to compute the
number of cells of the histogram. Ignored if |
rand |
numeric value. If specified, i.e. not |
For each variable, Pearson's Chi-Square statistic is computed to test if the distribution of the variable differs between several groups. Since only one null distribution is estimated for all variables as proposed in the original EBAM application of Efron et al. (2001), all variables must have the same number of levels/categories.
A list containing statistics required by ebam
.
This procedure will only work correctly if all SNPs/variables have the same number of levels/categories.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Schwender, H. and Ickstadt, K. (2008). Empirical Bayes Analysis of Single Nucleotide Polymorphisms. BMC Bioinformatics, 9, 144.
Schwender, H., Krause, A., and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls. cl <- rep(1:2, e=20) # Then an EBAM analysis for categorical data can be done by out <- ebam(mat, cl, method=chisq.ebam, approx=TRUE) out # approx is set to TRUE to approximate the null distribution # by the ChiSquare-distribution (usually, for such a small # number of observations this might not be a good idea # as the assumptions behind this approximation might not # be fulfilled). # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same EBAM analysis as above can then be # performed by out2 <- ebam(ltabs, method=chisq.ebam, approx=TRUE) out2 ## End(Not run)
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls. cl <- rep(1:2, e=20) # Then an EBAM analysis for categorical data can be done by out <- ebam(mat, cl, method=chisq.ebam, approx=TRUE) out # approx is set to TRUE to approximate the null distribution # by the ChiSquare-distribution (usually, for such a small # number of observations this might not be a good idea # as the assumptions behind this approximation might not # be fulfilled). # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same EBAM analysis as above can then be # performed by out2 <- ebam(ltabs, method=chisq.ebam, approx=TRUE) out2 ## End(Not run)
Generates the required statistics for a Significance Analysis of Microarrays of categorical data such as SNP data.
Should not be called directly, but via sam(..., method = chisq.stat).
Replaces cat.stat
chisq.stat(data, cl, approx = NULL, B = 100, n.split = 1, check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA)
chisq.stat(data, cl, approx = NULL, B = 100, n.split = 1, check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA)
data |
a matrix, data frame, or list. If a matrix or data frame, then each row
must correspond to a variable (e.g., a SNP), and each column to a sample (i.e.\ an observation).
If the number of observations is huge it is better to specify |
cl |
a numeric vector of length |
approx |
should the null distribution be approximated by a |
B |
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected |
n.split |
number of chunks in which the variables are splitted in the computation
of the values of the test statistic. Currently, only available if |
check.for.NN |
if |
lev |
numeric or character vector specifying the codings of the levels of the
variables/SNPs. Can only be specified if |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
a numeric value indicating how many permutations are considered
simultaneously when computing the expected |
rand |
numeric value. If specified, i.e. not |
For each SNP (or more general, categorical variable), Pearson's Chi-Square statistic is computed to test if the distribution of the SNP differs between several groups. Since only one null distribution is estimated for all SNPs as proposed in the original SAM procedure of Tusher et al. (2001) all SNPs must have the same number of levels/categories.
A list containing statistics required by sam
.
This procedure will only work correctly if all SNPs/variables have the same number of levels/categories. Therefore, it is stopped when the number of levels differ between the variables.
Holger Schwender, [email protected]
Schwender, H. (2005). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. In Weihs, C. and Gaul, W. (eds.), Classification – The Ubiquitous Challenge. Springer, Heidelberg, 370-377.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
SAM-class
,sam
, chisq.ebam
, trend.stat
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls. cl <- rep(1:2, e=20) # Then an SAM analysis for categorical data can be done by out <- sam(mat, cl, method=chisq.stat, approx=TRUE) out # approx is set to TRUE to approximate the null distribution # by the ChiSquare-distribution (usually, for such a small # number of observations this might not be a good idea # as the assumptions behind this approximation might not # be fulfilled). # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same SAM analysis as above can then be # performed by out2 <- sam(ltabs, method=chisq.stat, approx=TRUE) out2 ## End(Not run)
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls. cl <- rep(1:2, e=20) # Then an SAM analysis for categorical data can be done by out <- sam(mat, cl, method=chisq.stat, approx=TRUE) out # approx is set to TRUE to approximate the null distribution # by the ChiSquare-distribution (usually, for such a small # number of observations this might not be a good idea # as the assumptions behind this approximation might not # be fulfilled). # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same SAM analysis as above can then be # performed by out2 <- sam(ltabs, method=chisq.stat, approx=TRUE) out2 ## End(Not run)
Computes the required statistics for a Significance Analysis of Microarrays (SAM) using either a (modified) t- or F-statistic.
Should not be called directly, but via the function sam.
d.stat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, s.alpha = seq(0, 1, 0.05), include.zero = TRUE, n.subset = 10, mat.samp = NULL, B.more = 0.1, B.max = 30000, gene.names = NULL, R.fold = 1, use.dm = TRUE, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", rand = NA)
d.stat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, s.alpha = seq(0, 1, 0.05), include.zero = TRUE, n.subset = 10, mat.samp = NULL, B.more = 0.1, B.max = 30000, gene.names = NULL, R.fold = 1, use.dm = TRUE, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", rand = NA)
data |
a matrix, data frame or |
cl |
a numeric vector of length |
var.equal |
if |
B |
numeric value indicating how many permutations should be used in the estimation of the null distribution. |
med |
if |
s0 |
a numeric value specifying the fudge factor. If |
s.alpha |
a numeric vector or value specifying the quantiles of the
standard deviations of the genes used in the computation of |
include.zero |
if |
n.subset |
a numeric value indicating how many permutations are considered
simultaneously when computing the p-value and the number of falsely called
genes. If |
mat.samp |
a matrix having |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
gene.names |
a character vector of length |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
R.fold |
a numeric value. If the fold change of a gene is smaller than or
equal to |
use.dm |
if |
R.unlog |
if |
na.replace |
if |
na.method |
a character string naming the statistic with which missing values
will be replaced if |
rand |
numeric value. If specified, i.e. not |
An object of class SAM.
Holger Schwender, [email protected]
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
Generates both a plot of vs. the FDR and a plot of
vs.\ the number of identified genes in a SAM analysis.
delta.plot(object, delta = NULL, helplines = FALSE)
delta.plot(object, delta = NULL, helplines = FALSE)
object |
a object of class SAM. |
delta |
a vector of values for |
helplines |
if |
The plots are a visualization of the table generated
by
sam
that contains the estimated FDR and the number of identified
genes for a set of values.
Two plots in one graphsheet: The plot of vs. FDR and
the plot of
vs. the number of identified genes.
Holger Schwender, [email protected]
Tusher, V., Tibshirani, R., and Chu, G. (2001). Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. PNAS, 98, 5116-5121.
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Generate the Delta plots for the default set of Deltas computed by sam. delta.plot(sam.out) # Another way of generating the same plot. plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, ..., 2. plot(sam.out, seq(0.2, 2, 0.2)) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Generate the Delta plots for the default set of Deltas computed by sam. delta.plot(sam.out) # Another way of generating the same plot. plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, ..., 2. plot(sam.out, seq(0.2, 2, 0.2)) ## End(Not run)
Estimates the density of a vector of observations by a Poisson regression fit to histogram counts.
denspr(x, n.interval = NULL, df = 5, knots.mode = TRUE, type.nclass = c("wand", "scott", "FD"), addx=FALSE)
denspr(x, n.interval = NULL, df = 5, knots.mode = TRUE, type.nclass = c("wand", "scott", "FD"), addx=FALSE)
x |
a numeric vector containing the observations for which the density should be estimated. |
n.interval |
an integer specifying the number of cells for the histogram.
If |
df |
integer specifying the degrees of freedom of the natural cubic spline used in the Poisson regression fit. |
knots.mode |
if |
type.nclass |
character string specifying the procedure used to compute the
number of cells of the histogram. Ignored if |
addx |
should |
An object of class denspr
consisting of
y |
a numeric vector of the same length as |
center |
a numeric vector specifying the midpoints of the cells of the histogram |
counts |
a numeric vector of the same length as |
x.mode |
the estimated mode |
ns.out |
the output of |
type |
the method used to estimate the numbers of cells |
x |
the input vector |
Holger Schwender,[email protected]
Efron, B., and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Annals of Statistics, 24, 2431–2461.
Wand, M.P. (1997). Data-based choice of histogram bin width. American Statistician, 51, 59–64.
## Not run: # Generating some random data. x <- rnorm(10000) out <- denspr(x, addx=TRUE) plot(out) # Or for an asymmetric density. x <- rchisq(10000, 2) out <- denspr(x, df=3, addx=TRUE) plot(out) ## End(Not run)
## Not run: # Generating some random data. x <- rnorm(10000) out <- denspr(x, addx=TRUE) plot(out) # Or for an asymmetric density. x <- rchisq(10000, 2) out <- denspr(x, df=3, addx=TRUE) plot(out) ## End(Not run)
Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a EBAM procedure for categorical data such as SNP data and the possibility to employ an user-written score function.
ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, control = ebamControl(), gene.names = dimnames(x)[[1]], ...)
ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, control = ebamControl(), gene.names = dimnames(x)[[1]], ...)
x |
either a matrix, a data frame or an ExpressionSet object, or the output of |
cl |
a specification of the class labels of the samples. Ignored if Typically, In the one-class case, In the two class unpaired case, In the two class paired case, In the multiclass case and if For examples of how |
method |
a character string or name specifying the method or function that should be
used in the computation of the expression score If If For an analysis of categorical data such as SNP data,
If the variables are ordinal and a trend test should be applied
(e.g., in the two-class case, the Cochran-Armitage trend test), It is also possible to employ an user-written function for computing an user-specified expression score. For details, see the vignette of siggenes. |
delta |
a numeric vector consisting of probabilities for which the number of differentially
expressed genes and the FDR should be computed, where a gene is called differentially expressed
if its posterior probability is larger than |
which.a0 |
an integer between 1 and the length of |
control |
further arguments for controlling the EBAM analysis. For these arguments,
see |
gene.names |
a vector of length |
... |
further arguments of the specific EBAM methods. If |
An object of class EBAM.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.
Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. RNews, 6(5), 45-50.
Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.
EBAM-class
, find.a0
, z.ebam
,
wilc.ebam
, chisq.ebam
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Perform an EBAM analysis for the two class unpaired case assuming # unequal variances. Specify the fudge factor a0 by the suggested # choice of find.a0 find.out <- find.a0(golub, golub.cl, rand = 123) ebam.out <- ebam(find.out) ebam.out # Since a0 = 0 leads to the largest number of genes (i.e. the suggested # choice of a0), the following leads to the same results as the above # analysis (but only if the random number generator, i.e. rand, is set # to the same number). ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123) ebam.out2 # If fast is set to TRUE in ebam, a crude estimate of the number of # falsely called genes is used (see the help file for z.ebam). This # estimate is always employed in find.a0. # The exact number is used in ebam when performing ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123) ebam.out3 # Since this is the recommended way, we use ebam.out3 at the end of # the Examples section for further analyses. # Perform an EBAM analysis for the two class unpaired case assuming # equal group variances. Set a0 = 0, and use B = 50 permutations # of the class labels. ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50, rand = 123) ebam.out4 # Perform an EBAM analysis for the two class unpaired cased assuming # unequal group variances. Use the median (i.e. the 50% quantile) # of the standard deviations of the genes as fudge factor a0. And # obtain the number of genes and the FDR if a gene is called # differentially when its posterior probability is larger than # 0.95. ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95, rand = 123) ebam.out5 # For the third analysis, obtain the number of differentially # expressed genes and the FDR if a gene is called differentially # expressed if its posterior probability is larger than 0.8, 0.85, # 0.9, 0.95. print(ebam.out3, c(0.8, 0.85, 0.9, 0.95)) # Generate a plot of the posterior probabilities for delta = 0.9. plot(ebam.out3, 0.9) # Obtain the list of genes called differentially expressed if their # posterior probability is larger than 0.99, and gene-specific # statistics for these variables such as their z-value and their # local FDR. summary(ebam.out3, 0.99) ## End(Not run)
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Perform an EBAM analysis for the two class unpaired case assuming # unequal variances. Specify the fudge factor a0 by the suggested # choice of find.a0 find.out <- find.a0(golub, golub.cl, rand = 123) ebam.out <- ebam(find.out) ebam.out # Since a0 = 0 leads to the largest number of genes (i.e. the suggested # choice of a0), the following leads to the same results as the above # analysis (but only if the random number generator, i.e. rand, is set # to the same number). ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123) ebam.out2 # If fast is set to TRUE in ebam, a crude estimate of the number of # falsely called genes is used (see the help file for z.ebam). This # estimate is always employed in find.a0. # The exact number is used in ebam when performing ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123) ebam.out3 # Since this is the recommended way, we use ebam.out3 at the end of # the Examples section for further analyses. # Perform an EBAM analysis for the two class unpaired case assuming # equal group variances. Set a0 = 0, and use B = 50 permutations # of the class labels. ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50, rand = 123) ebam.out4 # Perform an EBAM analysis for the two class unpaired cased assuming # unequal group variances. Use the median (i.e. the 50% quantile) # of the standard deviations of the genes as fudge factor a0. And # obtain the number of genes and the FDR if a gene is called # differentially when its posterior probability is larger than # 0.95. ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95, rand = 123) ebam.out5 # For the third analysis, obtain the number of differentially # expressed genes and the FDR if a gene is called differentially # expressed if its posterior probability is larger than 0.8, 0.85, # 0.9, 0.95. print(ebam.out3, c(0.8, 0.85, 0.9, 0.95)) # Generate a plot of the posterior probabilities for delta = 0.9. plot(ebam.out3, 0.9) # Obtain the list of genes called differentially expressed if their # posterior probability is larger than 0.99, and gene-specific # statistics for these variables such as their z-value and their # local FDR. summary(ebam.out3, 0.99) ## End(Not run)
This is a class representation for the Empirical Bayes Analysis of Microarrays (EBAM) proposed by Efron et al. (2001).
Objects can be created using the function ebam
.
z
:Object of class "numeric"
representing the
expression scores of the genes.
posterior
:Object of class "numeric"
representing
the posterior probabilities of the genes.
p0
:Object of class "numeric"
specifying the
prior probability that a gene is not differentially expressed.
local
:Object of class "numeric"
consisting of the
local FDR estimates for the genes.
mat.fdr
:Object of class "matrix"
containing general
statistics such as the number of differentially expressed genes and
the estimated FDR for the specified values of delta
.
a0
:Object of class "numeric"
specifying the used
value of the fudge factor. If not computed, a0
will be
set to numeric(0)
.
mat.samp
:Object of class "matrix"
containing
the permuted group labels used in the estimation of the null
distribution. Each row represents one permutation, each column
one observation (pair). If no permutation procedure has been used,
mat.samp
will be set to matrix(numeric(0))
.
vec.pos
:Object of class "numeric"
consisting of
the number of positive permuted test scores that are absolutely
larger than the test score of a particular gene for each gene.
If not computed vec.pos
is set to numeric(0)
.
vec.neg
:Object of class "numeric"
consisting of
the number of negative permuted test scores that are absolutely
larger than the test score of a particular gene for each gene.
If not computed vec.neg
is set to numeric(0)
.
msg
:Object of class "character"
containing information
about, e.g., the type of analysis. msg
is printed when the functions
print
and summary
are called.
chip
:Object of class "character"
naming the microarray
used in the analysis. If no information about the chip is available,
chip
will be set to ""
.
signature(object = "EBAM")
: Generates a plot of the
posterior probabilities of the genes for a specified value of
. For details, see
help.ebam(plot)
. For
the arguments, see args.ebam(plot)
.
signature(object = "EBAM")
: Prints general information
such as the number of differentially expressed genes and the estimated
FDR for several values of . For details, see
help.ebam(print)
. Arguments can be listed by args.ebam(print)
.
signature(object = "EBAM")
: Shows the output of an
EBAM analysis.
signature(object = "EBAM")
: Summarizes the results
of an EBAM analysis for a specified value of .
For details, see
help.ebam(summary)
. For the arguments,
see args.ebam(summary)
.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Perform an EBAM analysis for the two class unpaired case assuming # unequal variances. Specify the fudge factor a0 by the suggested # choice of find.a0 find.out <- find.a0(golub, golub.cl, rand = 123) ebam.out <- ebam(find.out) ebam.out # Obtain the number of differentially # expressed genes and the FDR if a gene is called differentially # expressed if its posterior probability is larger than 0.8, 0.85, # 0.9, 0.95. print(ebam.out, c(0.8, 0.85, 0.9, 0.95)) # Generate a plot of the posterior probabilities for delta = 0.9. plot(ebam.out, 0.9) # Obtain the list of genes called differentially expressed if their # posterior probability is larger than 0.99, and gene-specific # statistics for these variables such as their z-value and their # local FDR. summary(ebam.out, 0.9) ## End(Not run)
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Perform an EBAM analysis for the two class unpaired case assuming # unequal variances. Specify the fudge factor a0 by the suggested # choice of find.a0 find.out <- find.a0(golub, golub.cl, rand = 123) ebam.out <- ebam(find.out) ebam.out # Obtain the number of differentially # expressed genes and the FDR if a gene is called differentially # expressed if its posterior probability is larger than 0.8, 0.85, # 0.9, 0.95. print(ebam.out, c(0.8, 0.85, 0.9, 0.95)) # Generate a plot of the posterior probabilities for delta = 0.9. plot(ebam.out, 0.9) # Obtain the list of genes called differentially expressed if their # posterior probability is larger than 0.99, and gene-specific # statistics for these variables such as their z-value and their # local FDR. summary(ebam.out, 0.9) ## End(Not run)
Specifies most of the optional arguments of ebam
and find.a0
.
ebamControl(p0 = NA, p0.estimation = c("splines", "interval", "adhoc"), lambda = NULL, ncs.value = "max", use.weights = FALSE) find.a0Control(p0.estimation = c("splines", "adhoc", "interval"), lambda = NULL, ncs.value = "max", use.weights = FALSE, n.chunk = 5, n.interval = 139, df.ratio = NULL)
ebamControl(p0 = NA, p0.estimation = c("splines", "interval", "adhoc"), lambda = NULL, ncs.value = "max", use.weights = FALSE) find.a0Control(p0.estimation = c("splines", "adhoc", "interval"), lambda = NULL, ncs.value = "max", use.weights = FALSE, n.chunk = 5, n.interval = 139, df.ratio = NULL)
p0 |
a numeric value specifying the prior probability |
p0.estimation |
either |
lambda |
a numeric vector or value specifying the |
ncs.value |
a character string. Only used if |
use.weights |
should weights be used in the spline based estimation of |
n.chunk |
an integer specifying in how many subsets the |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio |
df.ratio |
integer specifying the degrees of freedom of the natural cubic spline used in the logistic regression with repeated observations. |
These parameters should only be changed if they are fully understood.
A list containing the values of the parameters that are used in ebam
or find.a0
,
respectively.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.
Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.
Suggests an optimal value for the fudge factor in an EBAM analysis as proposed by Efron et al. (2001).
find.a0(data, cl, method = z.find, B = 100, delta = 0.9, quan.a0 = (0:5)/5, include.zero = TRUE, control = find.a0Control(), gene.names = dimnames(data)[[1]], rand = NA, ...)
find.a0(data, cl, method = z.find, B = 100, delta = 0.9, quan.a0 = (0:5)/5, include.zero = TRUE, control = find.a0Control(), gene.names = dimnames(data)[[1]], rand = NA, ...)
data |
a matrix, data frame or an ExpressionSet object.
Each row of |
cl |
a numeric vector of length In the one-class case, In the two class unpaired case, In the two class paired case, In the multiclass case and if For examples of how |
method |
the name of a function for computing the numerator and the denominator
of the test statistic of interest, and for specifying other objects required
for the identification of the fudge factor. The default function |
B |
the number of permutations used in the estimation of the null distribution. |
delta |
a probability. All genes showing a posterior probability that is
larger than or equal to |
quan.a0 |
a numeric vector indicating over which quantiles of the
standard deviations of the genes the fudge factor |
include.zero |
should |
control |
further arguments for controlling the EBAM analysis with |
gene.names |
a character vector of length |
rand |
integer. If specified, i.e. not |
... |
further arguments for the function specified by |
The suggested choice for the fudge factor is the value of that
leads to the largest number of genes showing a posterior probability larger
than
delta
.
Actually, only the genes having a posterior probability larger than delta
are called differentially expressed that do not exhibit a test score less extreme
than the score of a gene whose posterior probability is less than delta
.
So, let's say, we have done an EBAM analysis with a t-test and we have ordered
the genes by their t-statistic. Let's further assume that Gene 1 to Gene 5 (i.e.
the five genes with the lowest t-statistics), Gene 7 and 8, Gene 3012 to 3020,
and Gene 3040 to 3051 are the only genes that show a posterior probability larger
than delta
. Then, Gene 1 to 5, and 3040 to 3051 are called differentially
expressed, but Gene 7 and 8, and 3012 to 3020 are not called differentially
expressed, since Gene 6 and Gene 3021 to 3039 show a posterior probability less
than delta
.
An object of class FindA0.
The numbers of differentially expressed genes can differ between find.a0
and ebam
, even though the same value of the fudge factor is used, since
in find.a0
the observed and permuted test scores are monotonically
transformed such that the observed scores follow a standard normal distribution
(if the test statistic can take both positive and negative values) and
an F-distribution (if the test statistic can only take positive values) for each
possible choice of the fudge factor.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
ebam
, FindA0-class
, find.a0Control
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Obtain the number of differentially expressed genes and the FDR for the # default set of values for the fudge factor. find.out <- find.a0(golub, golub.cl, rand = 123) find.out # Obtain the number of differentially expressed genes and the FDR when using # the t-statistic assuming equal group variances find.out2 <- find.a0(golub, golub.cl, var.equal = TRUE, rand = 123) # Using the Output of the first analysis with find.a0, the number of # differentially expressed genes and the FDR for other values of # delta, e.g., 0.95, can be obtained by print(find.out, 0.95) # The logit-transformed posterior probabilities can be plotted by plot(find.out) # To avoid the logit-transformation, set logit = FALSE. plot(find.out, logit = FALSE) ## End(Not run)
## Not run: # Load the data of Golub et al. (1999) contained in the package multtest. data(golub) # golub.cl contains the class labels. golub.cl # Obtain the number of differentially expressed genes and the FDR for the # default set of values for the fudge factor. find.out <- find.a0(golub, golub.cl, rand = 123) find.out # Obtain the number of differentially expressed genes and the FDR when using # the t-statistic assuming equal group variances find.out2 <- find.a0(golub, golub.cl, var.equal = TRUE, rand = 123) # Using the Output of the first analysis with find.a0, the number of # differentially expressed genes and the FDR for other values of # delta, e.g., 0.95, can be obtained by print(find.out, 0.95) # The logit-transformed posterior probabilities can be plotted by plot(find.out) # To avoid the logit-transformation, set logit = FALSE. plot(find.out, logit = FALSE) ## End(Not run)
This is a class representation for the specification of the fudge factor in an EBAM analysis as proposed by Efron et al. (2001).
Objects can be created using the function find.a0
.
mat.z
:Object of class "matrix"
containing the expression
scores of the genes for each of the possible values for the fudge factor,
where each row corresponds to a gene, and each column to one of the values
for the fudge factor .
mat.posterior
:Object of class "matrix"
consisting of
the posterior probabilities of the genes for each of the possible values
for the fudge factor, where each row of mat.posterior
corresponds to
a gene, and each column to one of the values for . The
probabilities in
mat.posterior
are computed using the monotonically
transformed test scores (see the Details section of find.a0
).
mat.center
:Object of class "matrix"
representing the
centers of the nrow(mat.center)
intervals used in the logistic
regression with repeated observations for estimating
for each of the
ncol(mat.center)
possible values for the fudge
factor.
mat.success
:Object of class "matrix"
consisting of
the numbers of observed test scores in the nrow(mat.success)
intervals
used in the logistic regression with repeated observations for each
of the ncol(mat.success)
possible values for the fudge factor.
mat.failure
:Object of class "matrix"
containing the
numbers of permuted test scores in the nrow(mat.failure)
intervals
used in the logistic regression with repeated observations for each
of the ncol(mat.failure)
possible values for the fudge factor.
z.norm
:Object of class "numeric"
comprising the
values of the nrow(mat.z)
quantiles of the standard normal
distribution (if any mat.z<0
) or an F-distribution (if all
mat.z >= 0
).
p0
:Object of class "numeric"
specifying the prior
probability that a gene is not differentially expressed.
mat.a0
:Object of class "data.frame"
comprising
the number of differentially expressed genes and the estimated FDR
for the possible choices of the fudge factor specified by vec.a0
.
mat.samp
:Object of class "matrix"
consisting of the
nrow{mat.samp}
permutations of the class labels.
vec.a0
:Object of class "numeric"
representing the
possible values of the fudge factor .
suggested
:Object of class "numeric"
revealing the
suggested choice for the fudge factor, i.e. the value of vec.a0
that leads to the largest number of differentially expressed genes.
delta
:Object of class "numeric"
specifying the
minimum posterior probability that a gene must have to be called
differentially expressed.
df.ratio
:Object of class "numeric"
representing the
degrees of freedom of the natural cubic spline used in the logistic
regression with repeated observations.
msg
:Object of class "character"
containing information
about, e.g., the type of analysis. msg
is printed when
print
is called.
chip
:Object of class "character"
naming the microarray
used in the analysis. If no information about the chip is available,
chip
will be set to ""
.
signature(object = "FindA0")
: Generates a plot of the
(logit-transformed) posterior probabilities of the genes for a specified
value of and a set of possible values for the fudge
factor. For details, see
help.finda0(plot)
. For
the arguments, see args.finda0(plot)
.
signature(object = "FindA0")
: Prints the number of
differentially expressed genes and the estimated
FDR for each of the possible values of the fudge factor specified by
vec.a0
. For details, see help.finda0(print)
.
For arguments, see args.finda0(print)
.
signature(object = "FindA0")
: Shows the output of an
analysis with find.a0
.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
Computes the value of the threshold Delta for a given FDR or number of genes/variables in a SAM or EBAM analysis.
findDelta(object, fdr = NULL, genes = NULL, prec = 6, initial = NULL, verbose = FALSE)
findDelta(object, fdr = NULL, genes = NULL, prec = 6, initial = NULL, verbose = FALSE)
object |
either a SAM or an EBAM object. |
fdr |
numeric value between 0 and 1 for which the threshold Delta and thus the
number of genes/variables should be obtained. Only one of |
genes |
integer specifying the number of genes/variables for which the threshold
Delta and thus the estimated FDR should be obtained. Only one of |
prec |
integer indicating the precision of the considered Delta values. |
initial |
a numeric vector of length two containing the minimum and the maximum value of
Delta that is initially used in the search for Delta. Both values must be larger than 0.
If |
verbose |
should more information about the search process be shown? |
If a value of Delta is found for the exact value of fdr
or genes
, then a vector
of length 3 consisting of Delta and the corresponding number of genes and the estimated FDR.
If such a value is not found, then a matrix with two rows and three columns, where the two
rows contain the number of genes/variables and the estimated FDR for the two considered values
of Delta that provide the closest upper and lower bounds to the desired FDR (if fdr
is
specified) or number of genes/variables (if genes
is specified.)
Holger Schwender, [email protected]
Computes the fudge factor as described by Tusher et al. (2001).
fudge2(r, s, alpha = seq(0, 1, 0.05), include.zero = TRUE)
fudge2(r, s, alpha = seq(0, 1, 0.05), include.zero = TRUE)
r |
a numeric vector. The numerator of the test statistic computed for each gene is represented by one component of this vector. |
s |
a numeric vector. Each component of this vector corresponds to the denominator of the test statistic of a gene. |
alpha |
a numeric value or vector specifying quantiles of the
|
include.zero |
if |
s.zero |
the value of the fudge factor |
alpha.hat |
the optimal quantile of the |
vec.cv |
the vector of the coefficients of variations.
Following Tusher et al. (2001), the optimal |
msg |
a character string summarizing the most important information about the fudge factor. |
Holger Schwender, [email protected]
Tusher, V., Tibshirani, R., and Chu, G. (2001). Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. PNAS, 98, 5116-5121.
Computes the required statistics for an Empirical Bayes Analysis of Microarrays (EBAM; Efron et al., 2001) or a Significant Analysis of Microarrays (SAM; Tusher et al., 2001), respectively, based on the score statistic proposed by Louis et al. (2010) for fuzzy genotype calls or approximate Bayes Factors (Wakefield, 2007) determined using this score statistic.
Should not be called directly, but via ebam(..., method = fuzzy.ebam)
or sam(..., method = fuzzy.stat)
, respectively.
fuzzy.ebam(data, cl, type = c("asymptotic", "permutation", "abf"), W = NULL, logbase = exp(1), addOne = TRUE, df.ratio = NULL, n.interval = NULL, df.dens = 5, knots.mode = TRUE, type.nclass = c("FD", "wand", "scott"), fast = FALSE, B = 100, B.more = 0.1, B.max = 30000, n.subset = 10, rand = NA) fuzzy.stat(data, cl, type = c("asymptotic", "permutation", "abf"), W = NULL, logbase = exp(1), addOne = TRUE, B = 100, B.more = 0.1, B.max = 30000, n.subset = 10, rand = NA)
fuzzy.ebam(data, cl, type = c("asymptotic", "permutation", "abf"), W = NULL, logbase = exp(1), addOne = TRUE, df.ratio = NULL, n.interval = NULL, df.dens = 5, knots.mode = TRUE, type.nclass = c("FD", "wand", "scott"), fast = FALSE, B = 100, B.more = 0.1, B.max = 30000, n.subset = 10, rand = NA) fuzzy.stat(data, cl, type = c("asymptotic", "permutation", "abf"), W = NULL, logbase = exp(1), addOne = TRUE, B = 100, B.more = 0.1, B.max = 30000, n.subset = 10, rand = NA)
data |
a matrix containing fuzzy genotype calls. Such a matrix can, e.g., be generated by the function
|
cl |
a vector of zeros and ones specifying which of the columns of |
type |
a character string specifying how the analysis should be performed. If |
W |
the prior variance. Must be either a positive value or a vector of length |
logbase |
a numeric value larger than 1. If |
addOne |
should 1 be added to the ABF before it is log-transformed? If |
df.ratio |
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations for estimating the ratio |
n.interval |
the number of intervals used in the logistic regression with
repeated observations (if |
df.dens |
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed
|
knots.mode |
logical specifying whether the |
type.nclass |
character string specifying the procedure used to compute the
number of cells of the histogram. Ignored if |
fast |
if |
B |
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
a numeric value indicating in how many subsets the |
rand |
numeric value. If specified, i.e. not |
A list containing statistics required by ebam
or sam
.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Louis, T.A., Carvalho, B.S., Fallin, M.D., Irizarry, R.A., Li, Q., and Ruczinski, I. (2010). Association Tests that Accommodate Genotyping Errors. In Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., and West, M. (eds.), Bayesian Statistics 9, 393-420. Oxford University Press, Oxford, UK. With Discussion.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. PNAS, 98, 5116-5121.
Wakefield, J. (2007). A Bayesian Measure of Probability of False Discovery in Genetic Epidemiology Studies. AJHG, 81, 208-227.
ebam
, sam
, EBAM-class
, SAM-class
Displays the help page or the argument list, respectively, for a EBAM-specific method.
help.ebam(method) args.ebam(method)
help.ebam(method) args.ebam(method)
method |
a name or a character string specifying the method for which
the arguments or the help page, respectively, should be shown. Currently
available are |
The arguments of the specified method
are displayed or a html page
containing the help for the specified method
is opened, respectively.
Holger Schwender, [email protected]
## Not run: # Displays the arguments of the function summary args.ebam(summary) # Opens the help page in the browser help.ebam(summary) ## End(Not run)
## Not run: # Displays the arguments of the function summary args.ebam(summary) # Opens the help page in the browser help.ebam(summary) ## End(Not run)
Displays the help page or the argument list, respectively, for a FindA0-specific method.
help.finda0(method) args.finda0(method)
help.finda0(method) args.finda0(method)
method |
a name or a character string specifying the method for which
the arguments or the help page, respectively, should be shown. Currently
available are |
The arguments of the specified method
are displayed or a html page
containing the help for the specified method
is opened, respectively.
Holger Schwender, [email protected]
## Not run: # Displays the arguments of the function summary args.finda0(summary) # Opens the help page in the browser help.finda0(summary) ## End(Not run)
## Not run: # Displays the arguments of the function summary args.finda0(summary) # Opens the help page in the browser help.finda0(summary) ## End(Not run)
Displays the help page or the argument list, respectively, for a SAM-specific method.
help.sam(method) args.sam(method)
help.sam(method) args.sam(method)
method |
a name or a character string specifying the method for which
the arguments or the help page, respectively, should be shown. Currently
available are |
The arguments of the specified method
are displayed or a html page
containing the help for the specified method
is opened, respectively.
Holger Schwender, [email protected]
## Not run: # Displays the arguments of the function summary args.sam(summary) # Opens the help page in the browser help.sam(summary) ## End(Not run)
## Not run: # Displays the arguments of the function summary args.sam(summary) # Opens the help page in the browser help.sam(summary) ## End(Not run)
Transforms the output of an analysis with limma into a SAM
or EBAM
object, such that a SAM or EBAM analysis, respectively,
can be performed using the test statistics provided by limma.
limma2sam(fit, coef, moderate = TRUE, sam.control = samControl()) limma2ebam(fit, coef, moderate = TRUE, delta = 0.9, ebam.control = ebamControl())
limma2sam(fit, coef, moderate = TRUE, sam.control = samControl()) limma2ebam(fit, coef, moderate = TRUE, delta = 0.9, ebam.control = ebamControl())
fit |
an object of class |
coef |
column number or name corresponding to the coefficient or contrast of
interest. For details, see the argument |
moderate |
should the limma t-statistic be considered? If |
sam.control |
further arguments for the SAM analysis. See |
delta |
the minimum posterior probability for a gene to be called differentially
expressed (or more generally, for a variable to be called significant) in an EBAM
analysis. For details, see |
ebam.control |
further arguments for an EBAM analysis. See |
An object of class SAM
or EBAM
.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.
Smyth, G.K. (2004). Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology, 3(1), Article 3.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. PNAS, 98, 5116-5121.
sam
, ebam
, SAM-class
, EBAM-class
,
samControl
, ebamControl
Generates a htmlpage with links to several public repositories for a list of genes.
link.genes(genenames, filename, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, which.refseq = "NM", chipname = "", cdfname = NULL, refsnp = NULL, max.associated = 2, dataframe = NULL, title = NULL, bg.col = "white", text.col = "black", link.col = "blue", tableborder = 1, new.window = TRUE, load = TRUE)
link.genes(genenames, filename, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, which.refseq = "NM", chipname = "", cdfname = NULL, refsnp = NULL, max.associated = 2, dataframe = NULL, title = NULL, bg.col = "white", text.col = "black", link.col = "blue", tableborder = 1, new.window = TRUE, load = TRUE)
genenames |
a character vector containing the names of the interesting genes. |
filename |
a character string naming the file in which the output should be stored. Must have the suffix ".html". |
entrez |
logical indicating if Entrez links should be added to the output. |
refseq |
logical indicating if RefSeq links should be added to the output. |
symbol |
logical indicating if the gene symbols should be added to the output. |
omim |
logical indicating if OMIM links should be added to the output. |
ug |
logical indicating if UniGene links should be added to the output. |
fullname |
logical indicating whether the full gene names should be added to the output |
which.refseq |
character string or vector naming the first two letters of the RefSeq links that should be displayed in the html file. |
chipname |
character string specifying the chip type used in the analysis. Must
be specified as in the metadata section of Bioconductor (e.g., |
cdfname |
character string specifying the cdf name of the used chip. Must exactly follow
the nomenclatur of the Affymetrix chips (e.g., |
refsnp |
either a character vector or a data frame. If the former, |
max.associated |
integer specifying the maximum number of genes associated with the respective
SNP displayed in the html output. If all entries should be shown, set |
dataframe |
data frame having one row for each interesting gene, i.e.\ |
title |
character string naming the title that should be used in the html page. |
bg.col |
specification of the background color of the html page. See |
text.col |
specification of the color of the text used in the html page. See |
link.col |
specification of the color of the links used in the html file. See |
tableborder |
integer specifying the thickness of the border of the table. |
new.window |
logical indicating if the links should be opened in a new window. |
load |
logical value indicating whether to attempt to load the required annotation data package
if it is not already loaded. For details, see the man page of |
Holger Schwender, [email protected]
SAM-class
, sam
, link.siggenes
, sam2html
Generates a html page with links to several public repositories for a list of genes called differentially expressed when using a specific Delta value in a SAM or an EBAM analysis.
link.siggenes(object, delta, filename, gene.names = NULL, addDataFrame = TRUE, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, which.refseq = "NM", chipname = "", cdfname = NULL, refsnp = NULL, max.associated = 2, n.digits = 3, title = NULL, bg.col = "white", text.col = "black", link.col = "blue", tableborder = 1, new.window = TRUE, load = TRUE)
link.siggenes(object, delta, filename, gene.names = NULL, addDataFrame = TRUE, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, which.refseq = "NM", chipname = "", cdfname = NULL, refsnp = NULL, max.associated = 2, n.digits = 3, title = NULL, bg.col = "white", text.col = "black", link.col = "blue", tableborder = 1, new.window = TRUE, load = TRUE)
object |
a SAM or an EBAM object. |
delta |
a numerical value specifying the Delta value. |
filename |
character string naming the file in which the output should be stored. Must have the suffix ".html". |
gene.names |
a character vector of the same length as |
addDataFrame |
logical indicating if gene-specific information on the differentially expressed genes should be added to the output. |
entrez |
logical indicating if Entrez links should be added to the output. |
refseq |
logical indicating if RefSeq links should be added to the output. |
symbol |
logical indicating if the gene symbols should be added to the output. |
omim |
logical indicating if OMIM links should be added to the output. |
ug |
logical indicating if UniGene links should be added to the output. |
fullname |
logical indicating whether the full gene names should be added to the output. |
which.refseq |
character string or vector naming the first two letters of the RefSeq links that should be displayed in the html file. |
chipname |
character string specifying the chip type used in the analysis. Must
be specified as in the meta-data section of Bioconductor (e.g., |
cdfname |
character string specifying the cdf name of the used chip. Must exactly follow
the nomenclatur of the Affymetrix chips (e.g., |
refsnp |
either a character vector or a data frame. If the former, |
max.associated |
integer specifying the maximum number of genes associated with the respective
SNP displayed in the html output. If all entries should be shown, set |
n.digits |
integer specifying the number of decimal places used in the output. |
title |
character string naming the title that should be used in the html page. |
bg.col |
specification of the background color of the html page. See |
text.col |
specification of the color of the text used in the html page. See |
link.col |
specification of the color of the links used in the html file. See |
tableborder |
integer specifying the thickness of the border of the table. |
new.window |
logical indicating if the links should be opened in a new window. |
load |
logical value indicating whether to attempt to load the required annotation data package
if it is not already loaded. For details, see the man page of |
Holger Schwender, [email protected]
sam
, ebam
, link.genes
, sam2html
,
ebam2html
Lists the genes called differentially expressed by the SAM or the EBAM analysis for
a specified value of the threshold .
list.siggenes(object, delta, file = "", gene.names = NULL, order = TRUE, text = NULL, append = FALSE)
list.siggenes(object, delta, file = "", gene.names = NULL, order = TRUE, text = NULL, append = FALSE)
object |
either a SAM- or an EBAM-object. |
delta |
a numeric value specifying the threshold |
file |
a character string naming a file in which the output is stored.
If |
gene.names |
a character vector containing the names of the genes. Needs
only to be specified, if the gene names were not specified in |
order |
if |
text |
a character string specifying the heading of the gene list. By
default, the header specifies the type of analysis and the used value of
|
append |
If |
A list of significant genes either shown in the console or stored in a file.
Holger Schwender, [email protected]
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in \pkg{multtest}. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # List the genes called significant by SAM using Delta = 3.1. list.siggenes(sam.out, 3.1, gene.names=golub.gnames[,2]) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in \pkg{multtest}. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # List the genes called significant by SAM using Delta = 3.1. list.siggenes(sam.out, 3.1, gene.names=golub.gnames[,2]) ## End(Not run)
Generates an MD plot for a specified value of Delta.
Contrary to a SAM plot in which the observed values of the test
statistic are plotted against the expected ones, the difference
between the observed and the expected values are plotted against
the observed values in an MD plot.
md.plot(object, delta, pos.stats = 1, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, xsym = NULL, ysym = NULL, forceDelta = FALSE, includeZero = TRUE, lab = c(10, 10, 7), pch = NULL, sig.cex = 1, ...)
md.plot(object, delta, pos.stats = 1, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, xsym = NULL, ysym = NULL, forceDelta = FALSE, includeZero = TRUE, lab = c(10, 10, 7), pch = NULL, sig.cex = 1, ...)
object |
an object of class SAM. |
delta |
a numeric value specifying the value of |
pos.stats |
an integer between 0 and 2. If |
sig.col |
a specification of the color of the significant genes. If |
xlim |
a numeric vector of length 2 specifying the x limits (minimum and maximum) of the plot. |
ylim |
a numeric vector of length 2 specifying the y limits of the plot. |
main |
a character string naming the main title of the plot. |
xlab |
a character string naming the label of the x axis. |
ylab |
a character string naming the label of the y axis. |
xsym |
should the range of the plotted x-axis be symmetric about the origin? Ignored if |
ysym |
should the range of the plotted y-axis be symmetric about the origin? Ignored if |
forceDelta |
should the two horizontal lines at |
includeZero |
should |
lab |
a numeric vector of length 3 specifying the approximate number of tickmarks on the x axis and on the y axis and the label size. |
pch |
either an integer specifying a symbol or a single character to be used as the
default in plotting points. For a description of how |
sig.cex |
a numerical value giving the amount by which the symbols of the significant genes should be scaled relative to the default. |
... |
further graphical parameters. See |
A MD plot.
Holger Schwender, [email protected]
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) # Generate a SAM plot for Delta = 2 plot(sam.out, 2) # As an alternative, the MD plot can be generated. md.plot(sam.out, 2) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) # Generate a SAM plot for Delta = 2 plot(sam.out, 2) # As an alternative, the MD plot can be generated. md.plot(sam.out, 2) ## End(Not run)
Computes the number of cells in a histogram using the method of Wand (1994).
nclass.wand(x, level = 1)
nclass.wand(x, level = 1)
x |
numeric vector of observations. |
level |
integer specifying the number of levels of functional estimation used in
the estimation. For details, see the help page of |
nclass.wand
calls dpih
, and then computes the number of cells
corresponding to the optimal bin width returned by dpih
.
A numeric value specifying the number of cells for the histogram of x
.
Wand, M.P. (1997). Data-based choice of histogram bin width. American Statistician, 51, 59–64.
Estimates the prior probability that a gene is not differentially expressed by the natural cubic splines based method of Storey and Tibshirani (2003).
pi0.est(p, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL)
pi0.est(p, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL)
p |
a numeric vector containing the p-values of the genes. |
lambda |
a numeric vector or value specifying the |
ncs.value |
a character string. Only used if |
ncs.weights |
a numerical vector of the same length as |
For each value of lambda
, is
computed by the number of p-values
p
larger than
divided by
,
where
is the length of
p
.
If lambda
is a value, is the
estimate for the prior probabiltity
that a gene is
not differentially expressed.
If lambda
is a vector, a natural cubic spline with 3 degrees of
freedom is fitted through the data points
,
where each point is weighed by
ncs.weights
. is estimated
by
, where
if
ncs.value="max"
, and if
ncs.value="paper"
.
p0 |
the estimate of the prior probability that a gene is not differentially expressed. |
spline.out |
the output of |
Holger Schwender, [email protected]
Storey, J.D., and Tibshirani, R. (2003). Statistical Significance for Genome-wide Studies. PNAS, 100, 9440-9445.
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Estimate the prior probability that a gene is not significant pi0.est([email protected]) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Estimate the prior probability that a gene is not significant pi0.est(sam.out@p.value) ## End(Not run)
Utility function for generating a plot of a SAM or an EBAM object in an html output.
plotArguments(pos.stats = NULL, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, pty = "s", lab = c(10, 10, 7), pch = NULL, sig.cex = 1, stats.cex = 0.8, y.intersp = 1.3)
plotArguments(pos.stats = NULL, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, pty = "s", lab = c(10, 10, 7), pch = NULL, sig.cex = 1, stats.cex = 0.8, y.intersp = 1.3)
pos.stats |
an integer between 0 and 2 for a SAM plot, and between 0 and
4 for an EBAM plot. See |
sig.col |
a specification of the color of the significant genes. If |
xlim |
a numeric vector of length 2 specifying the x limits (minimum and maximum) of the plot. |
ylim |
a numeric vector of length 2 specifying the y limits of the plot. |
main |
a character string naming the main title of the plot. |
xlab |
a character string naming the label of the x axis. |
ylab |
a character string naming the label of the y axis. |
pty |
a character specifying the type of plot region to be used. |
lab |
a numeric vector of length 3 specifying the approximate number of tickmarks on the x axis and on the y axis and the label size. |
pch |
either an integer specifying a symbol or a single character to be used as the
default in plotting points. For a description of how |
sig.cex |
a numerical value giving the amount by which the symbols of the significant genes should be scaled relative to the default. |
stats.cex |
the size of the statistics printed in the plot relative to the default size. Only available for an EBAM plot. |
y.intersp |
a numeric value specifying the space between the rows in which the statistics are plotted. Only available for an EBAM plot. |
A list required by sam2html
or ebam2html
if addPlot = TRUE
.
Holger Schwender, [email protected]
Utility function for generating a plot of the posterior probabilities in an html file when searching for the optimal value of the fudge factor in an EBAM analysis.
plotFindArguments(onlyTab = FALSE, logit = TRUE, pos.legend = NULL, legend.cex = 0.8, col = NULL, main = NULL, xlab = NULL, ylab = NULL, only.a0 = FALSE, lty = 1, lwd = 1, y.intersp = 1.1)
plotFindArguments(onlyTab = FALSE, logit = TRUE, pos.legend = NULL, legend.cex = 0.8, col = NULL, main = NULL, xlab = NULL, ylab = NULL, only.a0 = FALSE, lty = 1, lwd = 1, y.intersp = 1.1)
onlyTab |
if |
logit |
should the posterior probabilities be logit-transformed before they are plotted? |
pos.legend |
an integer between 0 and 4. See |
legend.cex |
the size of the text in the legend relative to the default size |
col |
a vector specifying the colors of the lines for the different values
of the fudge factor. For a description of how colors can be specified, see
|
main |
a character string naming the main title of the plot. |
xlab |
a character string naming the label of the x axis. |
ylab |
a character string naming the label of the y axis. |
only.a0 |
if |
lty |
a value or vector specifying the line type of the curves. For details,
see |
lwd |
a numeric value specifying the width of the plotted lines. For details,
see |
y.intersp |
a numeric value specifying the space between the rows of the legend. |
A list required by ebam2html
if findA0
is specified.
Holger Schwender, [email protected]
Computes the q-values of a given set of p-values.
qvalue.cal(p, p0, version = 1)
qvalue.cal(p, p0, version = 1)
p |
a numeric vector containing the p-values. |
p0 |
a numeric value specifying the prior probability that a gene is not differentially expressed. |
version |
If |
Using version = 1
in qvalue.cal
corresponds to setting
robust = FALSE
in the function qvalue
of John Storey's
R package qvalue, while version = 2
corresponds to
robust = TRUE
.
A vector of the same length as p
containing the q-values
corresponding to the p-values in p
.
Holger Schwender, [email protected]
Storey, J.D. (2003). The positive False Discovery Rate: A Bayesian Interpretation and the q-value. Annals of Statistics, 31, 2013-2035.
Storey, J.D., and Tibshirani, R. (2003). Statistical Significance for Genome-wide Studies. PNAS, 100, 9440-9445.
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Estimate the prior probability that a gene is not significant. pi0 <- pi0.est([email protected])$p0 # Compute the q-values of the genes. q.value <- qvalue.cal([email protected], pi0) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis. sam.out<-sam(golub, golub.cl, B=100, rand=123) # Estimate the prior probability that a gene is not significant. pi0 <- pi0.est(sam.out@p.value)$p0 # Compute the q-values of the genes. q.value <- qvalue.cal(sam.out@p.value, pi0) ## End(Not run)
Computes either the Wilcoxon Rank Sum or Signed Rank Statistics for all rows of a matrix simultaneously.
rowWilcoxon(X, cl, rand = NA)
rowWilcoxon(X, cl, rand = NA)
X |
a matrix in which each row corresponds to a variable, and each column to an observation/sample. |
cl |
a numeric vector consisting of ones and zeros. The length of
|
rand |
Sets the random number generator into a reproducible state.
Ignored if Wilcoxon rank sums are computed, or |
If there are ties, then the ranks of the observations belonging to the same group of tied observations will be set to the maximum rank available for the corresponding group.
A numeric vector containing Wilcoxon rank statistics for each row of X
.
Holger Schwender, [email protected]
Performs a Significance Analysis of Microarrays (SAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a SAM procedure for categorical data such as SNP data and the possibility to employ an user-written score function.
sam(data, cl, method = d.stat, control=samControl(), gene.names = dimnames(data)[[1]], ...)
sam(data, cl, method = d.stat, control=samControl(), gene.names = dimnames(data)[[1]], ...)
data |
a matrix, a data frame, or an ExpressionSet object. Each row of Can also be a list (if |
cl |
a vector of length In the one-class case, In the two class unpaired case, In the two class paired case, In the multiclass case and if For examples of how |
method |
a character string or a name specifying the method/function that should be used
in the computation of the expression scores If If For an analysis of categorical data such as SNP data,
If the variables are ordinal and a trend test should be applied
(e.g., in the two-class case, the Cochran-Armitage trend test), It is also possible to use
an user-written function to compute the expression scores.
For details, see |
control |
further optional arguments for controlling the SAM analysis. For
these arguments, see |
gene.names |
a character vector of length |
... |
further arguments of the specific SAM methods. If |
sam
provides SAM procedures for several types of analysis (one and two class analyses
with either a modified t-statistic or a Wilcoxon rank statistic, a multiclass analysis
with a modified F statistic, and an analysis of categorical data). It is, however, also
possible to write your own function for another type of analysis. The required arguments
of this function must be data
and cl
. This function can also have other
arguments. The output of this function must be a list containing the following objects:
d
:a numeric vector consisting of the expression scores of the genes.
d.bar
:a numeric vector of the same length as na.exclude(d)
specifying
the expected expression scores under the null hypothesis.
p.value
:a numeric vector of the same length as d
containing
the raw, unadjusted p-values of the genes.
vec.false
:a numeric vector of the same length as d
consisting of
the one-sided numbers of falsely called genes, i.e. if the numbers
of genes expected to be larger than
under the null hypothesis, and if
, the number of genes expected to be smaller than
under the
null hypothesis.
s
:a numeric vector of the same length as d
containing the standard deviations
of the genes. If no standard deviation can be calculated, set s = numeric(0)
.
s0
:a numeric value specifying the fudge factor. If no fudge factor is calculated,
set s0 = numeric(0)
.
mat.samp
:a matrix with B rows and ncol(data)
columns, where B is the number
of permutations, containing the permutations used in the computation of the permuted
d-values. If such a matrix is not computed, set mat.samp = matrix(numeric(0))
.
msg
:a character string or vector containing information about, e.g., which type of analysis
has been performed. msg
is printed when the function print
or
summary
, respectively, is called. If no such message should be printed, set msg = ""
.
fold
:a numeric vector of the same length as d
consisting of the fold
changes of the genes. If no fold change has been computed, set fold = numeric(0)
.
If this function is, e.g., called foo
, it can be used by setting method = foo
in sam
. More detailed information and an example will be contained in the siggenes
manual.
An object of class SAM.
Holger Schwender, [email protected]
Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. RNews, 6(5), 45-50.
Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the the 28th Annual Conference of the GfKl.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
SAM-class
,d.stat
,wilc.stat
,
chisq.stat
, samControl
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # golub.cl contains the class labels. golub.cl # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) sam.out # Obtain the Delta plots for the default set of Deltas plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2 plot(sam.out, seq(0.2, 0.4, 2)) # Obtain the SAM plot for Delta = 2 plot(sam.out, 2) # Get information about the genes called significant using # Delta = 3. sam.sum3 <- summary(sam.out, 3, entrez=FALSE) # Obtain the rows of golub containing the genes called # differentially expressed [email protected] # and their names golub.gnames[[email protected], 3] # The matrix containing the d-values, q-values etc. of the # differentially expressed genes can be obtained by [email protected] # Perform a SAM analysis using Wilcoxon rank sums sam(golub, golub.cl, method="wilc.stat", rand=123) # Now consider only the first ten columns of the Golub et al. (1999) # data set. For now, let's assume the first five columns were # before treatment measurements and the next five columns were # after treatment measurements, where column 1 and 6, column 2 # and 7, ..., build a pair. In this case, the class labels # would be new.cl <- c(-(1:5), 1:5) new.cl # and the corresponding SAM analysis for the two-class paired # case would be sam(golub[,1:10], new.cl, B=100, rand=123) # Another way of specifying the class labels for the above paired # analysis is mat.cl <- matrix(c(rep(c(-1, 1), e=5), rep(1:5, 2)), 10) mat.cl # and the above SAM analysis can also be done by sam(golub[,1:10], mat.cl, B=100, rand=123) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # golub.cl contains the class labels. golub.cl # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) sam.out # Obtain the Delta plots for the default set of Deltas plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2 plot(sam.out, seq(0.2, 0.4, 2)) # Obtain the SAM plot for Delta = 2 plot(sam.out, 2) # Get information about the genes called significant using # Delta = 3. sam.sum3 <- summary(sam.out, 3, entrez=FALSE) # Obtain the rows of golub containing the genes called # differentially expressed sam.sum3@row.sig.genes # and their names golub.gnames[sam.sum3@row.sig.genes, 3] # The matrix containing the d-values, q-values etc. of the # differentially expressed genes can be obtained by sam.sum3@mat.sig # Perform a SAM analysis using Wilcoxon rank sums sam(golub, golub.cl, method="wilc.stat", rand=123) # Now consider only the first ten columns of the Golub et al. (1999) # data set. For now, let's assume the first five columns were # before treatment measurements and the next five columns were # after treatment measurements, where column 1 and 6, column 2 # and 7, ..., build a pair. In this case, the class labels # would be new.cl <- c(-(1:5), 1:5) new.cl # and the corresponding SAM analysis for the two-class paired # case would be sam(golub[,1:10], new.cl, B=100, rand=123) # Another way of specifying the class labels for the above paired # analysis is mat.cl <- matrix(c(rep(c(-1, 1), e=5), rep(1:5, 2)), 10) mat.cl # and the above SAM analysis can also be done by sam(golub[,1:10], mat.cl, B=100, rand=123) ## End(Not run)
This is a class representation for several versions of the SAM (Significance Analysis of Microarrays) procedure proposed by Tusher et al. (2001).
Objects can be created using the functions sam
, sam.dstat
,
sam.wilc
and sam.snp
.
d
:Object of class "numeric"
representing the
expression scores of the genes.
d.bar
:Object of class "numeric"
representing
the expected expression scores under the null hypothesis.
vec.false
:Object of class "numeric"
containing
the one-sided expected number of falsely called genes.
p.value
:Object of class "numeric"
consisting of
the p-values of the genes.
s
:Object of class "numeric"
representing the
standard deviations of the genes. If the standard deviations are
not computed, s
will be set to numeric(0)
.
s0
:Object of class "numeric"
representing the
value of the fudge factor. If not computed, s0
will be
set to numeric(0)
.
mat.samp
:Object of class "matrix"
containing
the permuted group labels used in the estimation of the null
distribution. Each row represents one permutation, each column
one observation (pair). If no permutation procedure has been used,
mat.samp
will be set to matrix(numeric(0))
.
p0
:Object of class "numeric"
representing the
prior probability that a gene is not differentially expressed.
mat.fdr
:Object of class "matrix"
containing general
information as the number of significant genes and the estimated FDR
for several values of . Each row represents one
value of
, each of the 9 columns one statistic.
q.value
:Object of class "numeric"
consisting of
the q-values of the genes. If not computed, q.value
will be
set to numeric(0)
.
fold
:Object of class "numeric"
representing the
fold changes of the genes. If not computed, fold
will be
set to numeric(0)
.
msg
:Object of class "character"
containing information
about, e.g., the type of analysis. msg
is printed when the functions
print
and summary
, respectively, are called.
chip
:Object of class "character"
naming the microarray
used in the analysis. If no information about the chip is available,
chip
will be set to ""
.
signature(x = "SAM")
: After generating a SAM plot,
identify
can be used to obtain information about the genes by
clicking on the symbols in the SAM plot. For details, see
help.sam(identify)
. Arguments are listed by args.sam(identify)
.
signature(x = "SAM")
: Generates a SAM plot or the Delta
plots. If the specified delta
in plot(object,delta)
is
a numeric value, a SAM plot will be generated. If delta
is either
not specified or a numeric vector, the Delta plots will be generated.
For details, see ?sam.plot2
, ?delta.plot
or
help.sam(plot)
,respectively. Arguments are listed by args.sam(plot)
.
signature(x = "SAM")
: Prints general information such as
the number of significant genes and the estimated FDR for a set of
. For details, see
help.sam(print)
. Arguments are
listed by args.sam(print)
.
signature(object = "SAM")
: Shows the output of the SAM
analysis.
signature(object = "SAM")
: Summarizes the results of
a SAM analysis. If delta
in summary(object,delta)
is not
specified or a numeric vector, the information shown by print and some
additional information will be shown. If delta
is a numeric
vector, the general information for the specific is
shown and additionally gene-specific information about the genes called
significant using this value of
. The output of summary
is an object of class sumSAM which has the slots
row.sig.genes
,
mat.fdr
, mat.sig
and list.args
. For details,
see help.sam(summary)
. All arguments are listed by args.sam(summary)
.
SAM was developed by Tusher et al. (2001).
!!! There is a patent pending for the SAM technology at Stanford University. !!!
Holger Schwender, [email protected]
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany. http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.
Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the the 28th Annual Conference of the GfKl.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
sam
,args.sam
,sam.plot2
,
delta.plot
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) sam.out # Alternative ways to show the output of sam. show(sam.out) print(sam.out) # Obtain a little bit more information. summary(sam.out) # Print the results of the SAM analysis for other values of Delta. print(sam.out, seq(.2, 2, .2)) # Again, the same with additional information. summary(sam.out, seq(.2, 2, .2)) # Obtain the Delta plots for the default set of Deltas. plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2. plot(sam.out, seq(0.2, 0.4, 2)) # Obtain the SAM plot for Delta = 2. plot(sam.out, 2) # Get information about the genes called significant using # Delta = 3. sam.sum3 <- summary(sam.out, 3) sam.sum3 # Obtain the rows of the Golub et al. (1999) data set containing # the genes called differentially expressed [email protected] # and their names golub.gnames[[email protected], 3] # The matrix containing the d-values, q-values etc. of the # differentially expressed genes can be obtained by [email protected] ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) sam.out # Alternative ways to show the output of sam. show(sam.out) print(sam.out) # Obtain a little bit more information. summary(sam.out) # Print the results of the SAM analysis for other values of Delta. print(sam.out, seq(.2, 2, .2)) # Again, the same with additional information. summary(sam.out, seq(.2, 2, .2)) # Obtain the Delta plots for the default set of Deltas. plot(sam.out) # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2. plot(sam.out, seq(0.2, 0.4, 2)) # Obtain the SAM plot for Delta = 2. plot(sam.out, 2) # Get information about the genes called significant using # Delta = 3. sam.sum3 <- summary(sam.out, 3) sam.sum3 # Obtain the rows of the Golub et al. (1999) data set containing # the genes called differentially expressed sam.sum3@row.sig.genes # and their names golub.gnames[sam.sum3@row.sig.genes, 3] # The matrix containing the d-values, q-values etc. of the # differentially expressed genes can be obtained by sam.sum3@mat.sig ## End(Not run)
Generates a SAM plot for a specified value of Delta.
sam.plot2(object, delta, pos.stats = NULL, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, pty = "s", lab = c(10, 10, 7), pch = NULL, sig.cex = 1, ...)
sam.plot2(object, delta, pos.stats = NULL, sig.col = 3, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, pty = "s", lab = c(10, 10, 7), pch = NULL, sig.cex = 1, ...)
object |
an object of class SAM. |
delta |
a numeric value specifying the value of |
pos.stats |
an integer between 0 and 2. If |
sig.col |
a specification of the color of the significant genes. If |
xlim |
a numeric vector of length 2 specifying the x limits (minimum and maximum) of the plot. |
ylim |
a numeric vector of length 2 specifying the y limits of the plot. |
main |
a character string naming the main title of the plot. |
xlab |
a character string naming the label of the x axis. |
ylab |
a character string naming the label of the y axis. |
pty |
a character specifying the type of plot region to be used. |
lab |
a numeric vector of length 3 specifying the approximate number of tickmarks on the x axis and on the y axis and the label size. |
pch |
either an integer specifying a symbol or a single character to be used as the
default in plotting points. For a description of how |
sig.cex |
a numerical value giving the amount by which the symbols of the significant genes should be scaled relative to the default. |
... |
further graphical parameters. See |
A SAM plot.
Holger Schwender, [email protected]
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) # Generate a SAM plot for Delta = 2 sam.plot2(sam.out, 2) # Alternatively way of generating the same SAM plot plot(sam.out, 2) # As an alternative, the MD plot can be generated. md.plot(sam.out, 2) ## End(Not run)
## Not run: # Load the package multtest and the data of Golub et al. (1999) # contained in multtest. library(multtest) data(golub) # Perform a SAM analysis for the two class unpaired case assuming # unequal variances. sam.out <- sam(golub, golub.cl, B=100, rand=123) # Generate a SAM plot for Delta = 2 sam.plot2(sam.out, 2) # Alternatively way of generating the same SAM plot plot(sam.out, 2) # As an alternative, the MD plot can be generated. md.plot(sam.out, 2) ## End(Not run)
Specifies most of the optional arguments of sam
.
samControl(delta = NULL, n.delta = 10, p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL, q.version = 1)
samControl(delta = NULL, n.delta = 10, p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL, q.version = 1)
delta |
a numeric vector specifying a set of values for the threshold
|
n.delta |
a numeric value specifying the number of |
p0 |
a numeric value specifying the prior probability |
lambda |
a numeric vector or value specifying the |
ncs.value |
a character string. Only used if |
ncs.weights |
a numerical vector of the same length as |
q.version |
a numeric value indicating which version of the q-value should
be computed. If |
These parameters should only be changed if they are fully understood.
A list containing the values of the parameters that are used in sam
.
Holger Schwender, [email protected]
Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. RNews, 6(5), 45-50.
Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
Generates a csv file for either a SAM or an EBAM object for the use in Excel. This csv file can contain general information as the number of differentially expressed genes and the estimated FDR, and gene-specific information on the differentially expressed genes.
sam2excel(object, delta, file, excel.version=1, n.digits = 3, what = "both", entrez = FALSE, chip = "", quote = FALSE) ebam2excel(object, delta, file, excel.version=1, n.digits = 4, what = "both", entrez = FALSE, chip = "", quote = FALSE)
sam2excel(object, delta, file, excel.version=1, n.digits = 3, what = "both", entrez = FALSE, chip = "", quote = FALSE) ebam2excel(object, delta, file, excel.version=1, n.digits = 4, what = "both", entrez = FALSE, chip = "", quote = FALSE)
object |
either a SAM or an EBAM object. |
delta |
a numerical value specifying the Delta value. |
file |
character string naming the file in which the output should be stored. Must have the suffix ".csv". |
excel.version |
either |
n.digits |
integer specifying the number of decimal places used in the output. |
what |
either |
entrez |
logical indicating if both the Entrez links and the symbols of the genes will be added to the output. |
chip |
character string naming the chip type used in this analysis. Must
be specified as in the meta-data section of Bioconductor (e.g., |
quote |
logical indicating if character strings and factors should be surrounded by
double quotes. For details see |
Holger Schwender, [email protected]
sam
, sam2html
, ebam
, ebam2html
Generates a html page for a SAM or an EBAM object. This html page can contain general information as the number of differentially expressed genes and the estimated FDR, the SAM or EBAM plot, and gene-specific information on the differentially expressed genes.
ebam2html(object, delta, filename, addStats = TRUE, addPlot = TRUE, addGenes = TRUE, findA0 = NULL, varName = NULL, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, chipname = "", cdfname = NULL, which.refseq = "NM", refsnp = NULL, max.associated = 2, n.digits = 3, bg.col = "white", text.col = "black", link.col = "blue", plotArgs = plotArguments(), plotFindArgs = plotFindArguments(), bg.plot.adjust = FALSE, plotname = NULL, plotborder = 0, tableborder = 1, new.window = TRUE, load = TRUE, ...) sam2html(object, delta, filename, addStats = TRUE, addPlot = TRUE, addGenes = TRUE, varName = NULL, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, bonf = FALSE, chipname = "", cdfname = NULL, which.refseq = "NM", refsnp = NULL, max.associated = 2, n.digits = 3, bg.col = "white", text.col = "black", link.col = "blue", plotArgs = plotArguments(), bg.plot.adjust = FALSE, plotname = NULL, plotborder = 0, tableborder = 1, new.window = TRUE, load = TRUE, ...)
ebam2html(object, delta, filename, addStats = TRUE, addPlot = TRUE, addGenes = TRUE, findA0 = NULL, varName = NULL, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, chipname = "", cdfname = NULL, which.refseq = "NM", refsnp = NULL, max.associated = 2, n.digits = 3, bg.col = "white", text.col = "black", link.col = "blue", plotArgs = plotArguments(), plotFindArgs = plotFindArguments(), bg.plot.adjust = FALSE, plotname = NULL, plotborder = 0, tableborder = 1, new.window = TRUE, load = TRUE, ...) sam2html(object, delta, filename, addStats = TRUE, addPlot = TRUE, addGenes = TRUE, varName = NULL, entrez = TRUE, refseq = TRUE, symbol = TRUE, omim = FALSE, ug = FALSE, fullname = FALSE, bonf = FALSE, chipname = "", cdfname = NULL, which.refseq = "NM", refsnp = NULL, max.associated = 2, n.digits = 3, bg.col = "white", text.col = "black", link.col = "blue", plotArgs = plotArguments(), bg.plot.adjust = FALSE, plotname = NULL, plotborder = 0, tableborder = 1, new.window = TRUE, load = TRUE, ...)
object |
a SAM or an EBAM object. |
delta |
a numerical value specifying the Delta value. |
filename |
character string naming the file in which the output should be stored. Must have the suffix ".html". |
addStats |
logical indicating if general information as the number of differentially expressed genes and the estimated FDR should be added to the html page. |
addPlot |
logical indicating if the SAM/EBAM plot should be added to the html page |
addGenes |
logical indicating if gene-specific information on the differentially expressed genes should be added to the html page. |
findA0 |
an object of class FindA0. If specified, the numbers of differentially expressed genes and the estimated FDRs for the different possible values of the fudge factor and the corresponding plot of the logit-transformed posterior probabilities are included in the html file. |
varName |
character string indicating how the variables should be named. If |
entrez |
logical indicating if Entrez links should be added to the output. Ignored if
|
refseq |
logical indicating if RefSeq links should be added to the output. Ignored
if |
symbol |
logical indicating if the gene symbols should be added to the output.
Ignored if |
omim |
logical indicating if OMIM links should be added to the output. Ignored
if |
ug |
logical indicating if UniGene links should be added to the output. Ignored if
|
fullname |
logical indicating whether the full gene names should be added to the output.
Ignored if |
bonf |
logical indicating whether Bonferroni adjusted p-values should be added to the
output. Ignored if |
chipname |
character string specifying the chip type used in the analysis. Must
be specified as in the meta-data section of Bioconductor (e.g., |
cdfname |
character string specifying the cdf name of the used chip. Must exactly follow
the nomenclatur of the Affymetrix chips (e.g., |
which.refseq |
character string or vector naming the first two letters of the RefSeq links that should be displayed in the html file. |
refsnp |
either a character vector or a data frame. If the former, |
max.associated |
integer specifying the maximum number of genes associated with the respective
SNP displayed in the html output. If all entries should be shown, set |
n.digits |
integer specifying the number of decimal places used in the output. |
bg.col |
specification of the background color of the html page. See |
text.col |
specification of the color of the text used in the html page. See
|
link.col |
specification of the color of the links used in the html file.
See |
plotArgs |
further arguments for generating the SAM/EBAM plot. These are the arguments used
by the SAM/EBAM specific |
plotFindArgs |
further arguments for generating the (logit-transformed) posterior
probabilities for the different values of the fudge factor. Ignored if |
bg.plot.adjust |
logical indicating if the background color of the SAM plot should be
the same as the background color of the html page. If |
plotname |
character string naming the file in which the SAM/EBAM plot is stored. This file
is needed when the SAM/EBAM plot should be added to the html page. If not specified the SAM/EBAM
plot will be stored as png file in the same folder as the html page. Ignored if |
plotborder |
integer specifying the thickness of the border around the plot. By default,
|
tableborder |
integer specifying the thickness of the border of the table. Ignored if
|
new.window |
logical indicating if the links should be opened in a new window. |
load |
logical value indicating whether to attempt to load the required annotation data package
if it is not already loaded. For details, see the man page of |
... |
further graphical arguments for the SAM/EBAM plot. See |
Holger Schwender, [email protected]
SAM-class
, sam
, EBAM-class
, ebam
,
link.genes
, link.siggenes
, plotArguments
,
plotFindArguments
These classes are just used for a nicer output of the summary of an object of class SAM or EBAM, respectively.
Objects can be created by calls of the form new("sumSAM", ...)
,
or by using the function summary(object)
when object is a
SAM-class object.
Objects can be created by calls of the form new("sumEBAM", ...)
,
or by using the function summary(object)
when object is an
EBAM-class object.
row.sig.genes
:Object of class "numeric"
consisting
of the row numbers of the significant genes in the data matrix.
mat.fdr
:Object of class "matrix"
containing general
information as the number of differentially expressed genes and the
estimated FDR for either one or several values of Delta.
mat.sig
:Object of class "data.frame"
containing gene-specific
statistics as the d-values (or z-values) and the q-values or (the local FDR)
of the differentially expressed genes.
list.args
:Object of class "list"
consisting of some of
the specified arguments of summary needed for internal use.
signature(x = "sumSAM")
: Prints the output of the SAM-specific
method summary.
signature(object = "sumSAM")
: Shows the output of the summary
of a SAM analysis.
signature(x = "sumEBAM")
: Prints the output of the EBAM-specific
method summary.
signature(object = "sumEBAM")
: Shows the output of the summary
of a EBAM analysis.
Holger Schwender, [email protected]
Generates the required statistics for an Empirical Bayes Analysis of Microarrays for a linear trend in (ordinal) data.
In the two-class case, the Cochran-Armitage trend statistic is computed. Otherwise, the statistic for the general test of trend described on page 87 of Agresti (2002) is determined.
Should not be called directly, but via ebam(..., method = trend.ebam).
## Default S3 method: trend.ebam(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, df.ratio = 3, rand = NA, ...) ## S3 method for class 'list' trend.ebam(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", ...)
## Default S3 method: trend.ebam(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, df.ratio = 3, rand = NA, ...) ## S3 method for class 'list' trend.ebam(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", ...)
data |
either a numeric matrix or data frame, or a list. If a matrix or data frame, then each row must correspond to a variable (e.g., a SNP), and each column to a sample (i.e.\ an observation). The values in the matrix or data frame are interpreted as the scores for the different levels of the variables. If the number of observations is huge it is better to specify |
cl |
a numeric vector of length |
catt |
should the Cochran-Armitage trend statistic be computed in the two-class case? If |
approx |
should the null distribution be approximated by the |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio |
df.dens |
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed
|
knots.mode |
if |
type.nclass |
character string specifying the procedure used to compute the
number of cells of the histogram. Ignored if |
B |
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
a numeric value indicating in how many subsets the |
fast |
if |
df.ratio |
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations. Ignored
if |
rand |
numeric value. If specified, i.e. not |
... |
ignored. |
A list containing statistics required by ebam
.
Holger Schwender, [email protected]
Agresti, A.\ (2002). Categorical Data Analysis. Wiley, Hoboken, NJ. 2nd Edition.
Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
EBAM-class
,ebam
, trend.stat
, chisq.ebam
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls, and that the values 1, 2, 3 in # mat can be interpreted as scores for the different levels # of the variables. cl <- rep(1:2, e=20) # Then an EBAM analysis of linear trend can be done by out <- ebam(mat, cl, method=trend.ebam) out # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same EBAM analysis as above can then be # performed by out2 <- ebam(ltabs, method=trend.ebam) out2 ## End(Not run)
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls, and that the values 1, 2, 3 in # mat can be interpreted as scores for the different levels # of the variables. cl <- rep(1:2, e=20) # Then an EBAM analysis of linear trend can be done by out <- ebam(mat, cl, method=trend.ebam) out # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same EBAM analysis as above can then be # performed by out2 <- ebam(ltabs, method=trend.ebam) out2 ## End(Not run)
Generates the required statistics for a Significance Analysis of Microarrays for a linear trend in (ordinal) data.
In the two-class case, the Cochran-Armitage trend statistic is computed. Otherwise, the statistic for the general test of trend described on page 87 of Agresti (2002) is determined.
Should not be called directly, but via sam(..., method = trend.stat).
## Default S3 method: trend.stat(data, cl, catt = TRUE, approx = TRUE, B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA, ...) ## S3 method for class 'list' trend.stat(data, cl, catt = TRUE, approx = TRUE, B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA, ...)
## Default S3 method: trend.stat(data, cl, catt = TRUE, approx = TRUE, B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA, ...) ## S3 method for class 'list' trend.stat(data, cl, catt = TRUE, approx = TRUE, B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, rand = NA, ...)
data |
either a numeric matrix or data frame, or a list. If a matrix or data frame, then each row must correspond to a variable (e.g., a SNP), and each column to a sample (i.e.\ an observation). The values in the matrix or data frame are interpreted as the scores for the different levels of the variables. If the number of observations is huge it is better to specify |
cl |
a numeric vector of length |
catt |
should the Cochran-Armitage trend statistic be computed in the two-class case? If |
approx |
should the null distribution be approximated by the |
B |
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
a numeric value indicating how many permutations are considered
simultaneously when computing the expected |
rand |
numeric value. If specified, i.e. not |
... |
ignored. |
A list containing statistics required by sam
.
Holger Schwender, [email protected]
Agresti, A.\ (2002). Categorical Data Analysis. Wiley, Hoboken, NJ. 2nd Edition.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
SAM-class
,sam
, chisq.stat
, trend.ebam
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls, and that the values 1, 2, 3 in mat # can be interpreted as scores for the different levels # of the variables represented by the rows of mat. cl <- rep(1:2, e=20) # Then an SAM analysis of linear trend can be done by out <- sam(mat, cl, method=trend.stat) out # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same SAM analysis as above can then be # performed by out2 <- sam(ltabs, method=trend.stat, approx=TRUE) out2 ## End(Not run)
## Not run: # Generate a random 1000 x 40 matrix consisting of the values # 1, 2, and 3, and representing 1000 variables and 40 observations. mat <- matrix(sample(3, 40000, TRUE), 1000) # Assume that the first 20 observations are cases, and the # remaining 20 are controls, and that the values 1, 2, 3 in mat # can be interpreted as scores for the different levels # of the variables represented by the rows of mat. cl <- rep(1:2, e=20) # Then an SAM analysis of linear trend can be done by out <- sam(mat, cl, method=trend.stat) out # The same results can also be obtained by employing # contingency tables, i.e. by specifying data as a list. # For this, we need to generate the tables summarizing # groupwise how many observations show which level at # which variable. These tables can be obtained by library(scrime) cases <- rowTables(mat[, cl==1]) controls <- rowTables(mat[, cl==2]) ltabs <- list(cases, controls) # And the same SAM analysis as above can then be # performed by out2 <- sam(ltabs, method=trend.stat, approx=TRUE) out2 ## End(Not run)
Generates the required statistics for an Empirical Bayes Analysis of Microarrays analysis using standardized Wilcoxon rank statistics.
Should not be called directly, but via ebam(..., method = wilc.ebam).
wilc.ebam(data, cl, approx50 = TRUE, ties.method = c("min", "random", "max"), use.offset = TRUE, df.glm = 5, use.row = FALSE, rand = NA)
wilc.ebam(data, cl, approx50 = TRUE, ties.method = c("min", "random", "max"), use.offset = TRUE, df.glm = 5, use.row = FALSE, rand = NA)
data |
a matrix or a data frame. Each row of
|
cl |
a numeric vector of length |
approx50 |
if |
ties.method |
either |
use.offset |
should an offset be used in the Poisson regression employed to estimate
the density of the observed Wilcoxon rank sums? If |
df.glm |
integer specifying the degrees of freedom of the natural cubic spline employed in the Poisson regression. |
use.row |
if |
rand |
numeric value. If specified, i.e. not |
Standardized versions of the Wilcoxon rank statistics are computed. This means that
is used as expression
score
, where
is the usual Wilcoxon rank sum statistic or Wilcoxon
signed rank statistic, respectively.
In the computation of these statistics, the ranks of ties are by default set to the minimum rank. In the computation of the Wilcoxon signed rank statistic, zeros are randomly set either to a very small positive or negative value.
If there are less than 50 observations in each of the groups, the exact null distribution
will be used. If there are more than 50 observations in at least one group, the null
distribution will by default be approximated by the standard normal distribution. It is,
however, still possible to compute the exact null distribution by setting approx50
to FALSE
.
A list of statistics required by ebam
.
Holger Schwender, [email protected]
Efron, B., Storey, J.D., Tibshirani, R.\ (2001). Microarrays, empirical Bayes methods, and the false discovery rate, Technical Report, Department of Statistics, Stanford University.
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
Generates the required statistics for a Significance Analysis of Microarrays analysis using standardized Wilcoxon rank statistics.
Should not be called directly, but via sam(..., method = wilc.stat).
wilc.stat(data, cl, gene.names = NULL, R.fold = 1, use.dm = FALSE, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", approx50 = TRUE, ties.method=c("min","random","max"), use.row = FALSE, rand = NA)
wilc.stat(data, cl, gene.names = NULL, R.fold = 1, use.dm = FALSE, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", approx50 = TRUE, ties.method=c("min","random","max"), use.row = FALSE, rand = NA)
data |
a matrix or a data frame. Each row of |
cl |
a numeric vector of length |
gene.names |
a character vector of length |
R.fold |
a numeric value. If the fold change of a gene is smaller than or
equal to |
use.dm |
if |
R.unlog |
if |
na.replace |
if |
na.method |
a character string naming the statistic with which missing values
will be replaced if |
approx50 |
if |
ties.method |
either |
use.row |
if |
rand |
numeric value. If specified, i.e. not |
Standardized versions of the Wilcoxon rank statistics are computed. This means that
is used as expression
score
, where
is the usual Wilcoxon rank sum statistic or Wilcoxon
signed rank statistic, respectively.
In the computation of these statistics, the ranks of ties are by default set to the minimum rank. In the computation of the Wilcoxon signed rank statistic, zeros are randomly set either to a very small positive or negative value.
If there are less than 50 observations in each of the groups, the exact null distribution
will be used. If there are more than 50 observations in at least one group, the null
distribution will by default be approximated by the standard normal distribution. It is,
however, still possible to compute the exact null distribution by setting approx50
to FALSE
.
A list containing statistics required by sam
.
Holger Schwender, [email protected]
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
Computes the required statistics for an Empirical Bayes Analysis with a modified t- or F-test.
Should not be called directly, but via
ebam(..., method = z.ebam)
or find.a0(..., method = z.find)
,
respectively.
z.ebam(data, cl, a0 = NULL, quan.a0 = NULL, B = 100, var.equal = FALSE, B.more = 0.1, B.max = 30000, n.subset = 10, fast = FALSE, n.interval = 139, df.ratio = NULL, rand = NA) z.find(data, cl, B = 100, var.equal = FALSE, B.more = 0.1, B.max = 30000)
z.ebam(data, cl, a0 = NULL, quan.a0 = NULL, B = 100, var.equal = FALSE, B.more = 0.1, B.max = 30000, n.subset = 10, fast = FALSE, n.interval = 139, df.ratio = NULL, rand = NA) z.find(data, cl, B = 100, var.equal = FALSE, B.more = 0.1, B.max = 30000)
data |
a matrix, data frame or ExpressionSet object. Each row of |
cl |
a numeric vector of length |
a0 |
a numeric value specifying the fudge factor. |
quan.a0 |
a numeric value between 0 and 1 specifying the quantile of the standard deviations of the genes that is used as fudge factor. |
B |
an integer indicating how many permutations should be used in the estimation of the null distribution. |
var.equal |
should the ordinary t-statistic assuming equal group variances
be computed? If |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+ |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to |
n.subset |
an integer specifying in how many subsets the |
fast |
if |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio |
df.ratio |
integer specifying the degrees of freedom of the natural cubic spline used in the logistic regression with repeated observations. |
rand |
integer. If specified, i.e. not |
A list of object required by find.a0
or ebam
, respectively.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.