Title: | PROspective Power Evaluation for RNAseq |
---|---|
Description: | This package provide simulation based methods for evaluating the statistical power in differential expression analysis from RNA-seq data. |
Authors: | Hao Wu |
Maintainer: | Hao Wu <[email protected]> |
License: | GPL |
Version: | 1.39.0 |
Built: | 2024-10-31 03:43:12 UTC |
Source: | https://github.com/bioc/PROPER |
This function take the simulation output from "runSims" function and compute a variety of power-related quantities.
comparePower(simOutput, alpha.type = c("fdr", "pval"), alpha.nominal=0.1, stratify.by = c("expr", "dispersion"), strata, filter.by = c("none","expr"), strata.filtered = 1, target.by=c("lfc", "effectsize"), delta=0.5)
comparePower(simOutput, alpha.type = c("fdr", "pval"), alpha.nominal=0.1, stratify.by = c("expr", "dispersion"), strata, filter.by = c("none","expr"), strata.filtered = 1, target.by=c("lfc", "effectsize"), delta=0.5)
simOutput |
The result from "runSims" function. |
alpha.type |
A string to represent the way to call DE genes. Available options are "fdr" and "pval", for calling DE genes based on FDR or p-values. Default is "fdr". |
alpha.nominal |
The nomial value for call DE genes. Recommend values are 0.1 when using FDR, and 0.01 for using p-values. |
stratify.by |
A string to represent the way to stratify genes. Available options are "expr" and "dispersion", for stratifying gene by average expression levels or over-dispersion. |
strata |
The strata used for stratification. |
filter.by |
A string to represent the way to filter genes. This is used in conjunction with strata.filtered for gene filtering. Available options are "none" and "expr". "none" stands for no filtering, thus all genes will be considered. "expr" stands for filtering based on average expression levels. |
strata.filtered |
The strata to be filtered out in computing power-related quantities. Genes fall into these strata will be excluded when computing power-related quantities. See "Details" for more description of gene filtering. |
target.by |
A string to specify the method to define "biologically important" DE genes. Available options are (1) "lfc": interesting genes are defined by absolute log fold changes. (2) "effectsize": interesting genes are defined by absolute log fold changes divided by the square root of dispersion. |
delta |
A threshold used for defining "biologically important" genes. Genes with absolute log fold changes (when target.by is "lfc") or effect sizes (when target.by is "effectsize") greater than this value are deemed DE in power calculations. See "Details" for more description. |
This is the main function to compute various power-related quantities, under stratification and filtering of all genes.
Gene stratification: we advocate to compute and visualize the powers at different stratification of genes. Because of the characteristics of RNA-seq data such as average expression level (counts), powers for calling DE genes are different even when the magnitude of changes are the same. The stratified results will provide a more comprehensive power assessment and better guide the investigators in experimental designs and analysis strategies.
Gene filtering: sometimes it is advisible to filter out some genes (such as the ones with very low count) before DE detection. The filtering option here provides an opportunity to compare the powers before and after filtering.
Define biologically interesting genes: we advocate to compute powers for "biologically interesting genes", because there might be many true DE genes with very low changes which are difficult to detect. In this sense, we are only interested in genes with adequate changes between groups (defined as "biologically interesting"). We provide two options to define biologically interesting genes: by absolute values of log fold changes or effect sizes (absolute values of log fold changes divided by the square root of dispersions). Genes with these quantities over a threshold are deemed intersting, and the power calculation are based on these genes.
A list with following fields:
TD , FD , alpha , FDR , power
|
3D array representing the number of true discoveries, false discoveries, type I error rate, FDR, and power for all simulation settings. The dimension of the arrays are nstrata * N * nsims. Here nstrata is number of specified strata. N is number of different sample sizes settings, and nsims is number of simulations. |
alpha.marginal , FDR.marginal , power.marginal
|
Matrix representing the marginal type I error rate, FDR, and power for all simulation settings. The dimension of the matrices are N * nsims. These are the marginalized (over all strata) values of the stratified quantities. |
stratify.by |
The input stratify.by. |
strata |
The input strata. |
Nreps |
Sample sizes one wants to perform simulation on. This is taken from the simulation options. |
target.by |
The input method to define "biologically important" DE genes. |
delta |
The input delta for biologically important genes. |
Hao Wu <[email protected]>
RNAseq.SimOptions.2grp, simRNAseq, runSims
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) summaryPower(powers) par(mfrow=c(2,2)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) ## filter out the genes with low counts (<10) and redo power calculation ## Marginal powers are significantly higher. powers = comparePower(simRes, filter.by="expr", strata.filtered=1) summaryPower(powers) par(mfrow=c(2,2)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) ## Provide higher threshold for log fold change to define true DE. ## This will result in higher power. powers2 = comparePower(simRes, delta=2) summaryPower(powers2) par(mfrow=c(2,2)) plotPower(powers2) plotPowerTD(powers2) plotFDR(powers2) plotFDcost(powers2) ## use effect size to define biologically interesting genes powers3 = comparePower(simRes, filter.by="expr", strata.filtered=1, target.by="effectsize", delta=1) summaryPower(powers3) par(mfrow=c(2,2)) plotPower(powers3) plotPowerTD(powers3) plotFDR(powers3) plotFDcost(powers3) ## End(Not run)
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) summaryPower(powers) par(mfrow=c(2,2)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) ## filter out the genes with low counts (<10) and redo power calculation ## Marginal powers are significantly higher. powers = comparePower(simRes, filter.by="expr", strata.filtered=1) summaryPower(powers) par(mfrow=c(2,2)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) ## Provide higher threshold for log fold change to define true DE. ## This will result in higher power. powers2 = comparePower(simRes, delta=2) summaryPower(powers2) par(mfrow=c(2,2)) plotPower(powers2) plotPowerTD(powers2) plotFDR(powers2) plotFDcost(powers2) ## use effect size to define biologically interesting genes powers3 = comparePower(simRes, filter.by="expr", strata.filtered=1, target.by="effectsize", delta=1) summaryPower(powers3) par(mfrow=c(2,2)) plotPower(powers3) plotPowerTD(powers3) plotFDR(powers3) plotFDcost(powers3) ## End(Not run)
This function estimate simulation parameters from a count table, which can be used to set simulation scenarios. The parameters estimated include baseline expression, biological variation in the form of dispersion paramter.
estParam(X,type = c(1,2))
estParam(X,type = c(1,2))
X |
An ExpressionSet or a matrix, with entries being RNA sequencing counts |
type |
|
This is a function that allows a user to establish simulation basis from his/her own data
set. Type 1 uses simple average count and sample variance to estimate
over dispersion, after normalizing by library size for each
sample. Estimated dispersion is bounded at minimum 0.001. Type 2 uses
edgeR
's AveLogCPM
as the estiamte for baseline expression
and tagwiseDispersion
as the estimate for dispersion.
A list with following fields:
seqDepth |
Estimated sequencing depth for each sample |
lmean |
A vector of baseline expression rate per gene in log
scale, with length |
lOD |
A vector of log dispersion for each gene, with length |
Jean Wu <[email protected]>
## Not run: lmu0=rnorm(20000,5,2) lOD0=rnorm(20000,-4,1) lmu=exp(rnorm(20000*10,lmu0,exp(lOD0))) X=matrix(rpois(20000*10,lmu),20000,10) param=estParam(X) ## End(Not run)
## Not run: lmu0=rnorm(20000,5,2) lOD0=rnorm(20000,-4,1) lmu=exp(rnorm(20000*10,lmu0,exp(lOD0))) X=matrix(rpois(20000*10,lmu),20000,10) param=estParam(X) ## End(Not run)
These are several datasets distributed with the package. They are simulation parameters estimated from existing RNA-seq data. The original raw count data were obtained from reCount website.
data(cheung) data(gilad) data(bottomly) data(maqc) data(GE.human) data(pbmc)
data(cheung) data(gilad) data(bottomly) data(maqc) data(GE.human) data(pbmc)
The data include:
cheung: parameters from Cheung data. They are measurements from unrelated individuals, so the dispersions are large.
gilad: parameters from Gilad data. They are for Human liver sample comparisons between male and female. This dataset has moderate dispersions.
bottomly: parameters from Bottomly data. They compare two strains of inbred mice and the within group dispersions are small.
maqc: MAQC data which are technical replicates. There are no biological variation from the replicates, so the dispersions are close to 0.
GE.human: a vector of aggregated counts from all samples in Cheung data. This is used for generating marginal expression distribution when provide sequnencing depth.
pbmc: data from gene expression microarray data. This is for an example of using historical data to estimate effect sizes.
Hao Wu <[email protected]>
Using histogram-alike graph to visualize the distribution of all genes and DE genes in all user specified strata.
plotPowerHist(powerOutput, simResult, main = "Histogram of power", return = FALSE)
plotPowerHist(powerOutput, simResult, main = "Histogram of power", return = FALSE)
powerOutput |
Result object from "comparePower" function. |
simResult |
Result object from "runSims" function. |
main |
Figure caption. |
return |
Return a matrix for average number of genes and true DE genes in each strata. |
A matrix of two rows. First row is for number of genes, and the second row is for number of DE genes. Columns are strata.
Hao Wu <[email protected]>
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) plotPowerHist(powers, simRes) ## End(Not run)
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) plotPowerHist(powers, simRes) ## End(Not run)
These are a group of functions to generate plot to visualize the stratified power metrics, including stratified power, FDR, true discoveries, false discoveries, false discovery cost, and type I error. There will be a line for each sample size. X-axis are strata and y-axis are the metrics. Error bars will be plotted if requested.
plotPower(powerOutput, cols = 1:ncol(powerOutput$FD), lty = 1:ncol(powerOutput$power), main = "", ylab = "Power", leg = TRUE, error.bar=TRUE) plotFDR(powerOutput, cols = 1:ncol(powerOutput$FDR), lty=1:ncol(powerOutput$FDR), main = "", ylab="FDR", leg = TRUE, error.bar=TRUE) plotPowerTD(powerOutput, cols = 1:ncol(powerOutput$TD), lty=1:ncol(powerOutput$TD), main="", ylab = "# True Discoveries", leg = TRUE, error.bar=TRUE) plotPowerFD(powerOutput, cols = 1:ncol(powerOutput$FD), lty=1:ncol(powerOutput$FD), main = "", ylab = "# False Discoverie", leg = TRUE, error.bar=TRUE) plotFDcost(powerOutput, cols = 1:ncol(powerOutput$FD), lty=1:ncol(powerOutput$FD), main="", ylab = "False discovery cost", leg = TRUE, error.bar=TRUE) plotPowerAlpha(powerOutput, cols = 1:ncol(powerOutput$alpha), lty=1:ncol(powerOutput$alpha), main = "", ylab = "False positive rate", leg = TRUE, error.bar=TRUE) plotAll(powerOutput)
plotPower(powerOutput, cols = 1:ncol(powerOutput$FD), lty = 1:ncol(powerOutput$power), main = "", ylab = "Power", leg = TRUE, error.bar=TRUE) plotFDR(powerOutput, cols = 1:ncol(powerOutput$FDR), lty=1:ncol(powerOutput$FDR), main = "", ylab="FDR", leg = TRUE, error.bar=TRUE) plotPowerTD(powerOutput, cols = 1:ncol(powerOutput$TD), lty=1:ncol(powerOutput$TD), main="", ylab = "# True Discoveries", leg = TRUE, error.bar=TRUE) plotPowerFD(powerOutput, cols = 1:ncol(powerOutput$FD), lty=1:ncol(powerOutput$FD), main = "", ylab = "# False Discoverie", leg = TRUE, error.bar=TRUE) plotFDcost(powerOutput, cols = 1:ncol(powerOutput$FD), lty=1:ncol(powerOutput$FD), main="", ylab = "False discovery cost", leg = TRUE, error.bar=TRUE) plotPowerAlpha(powerOutput, cols = 1:ncol(powerOutput$alpha), lty=1:ncol(powerOutput$alpha), main = "", ylab = "False positive rate", leg = TRUE, error.bar=TRUE) plotAll(powerOutput)
powerOutput |
Result object from "comparePower" function. |
cols , lty
|
Colors and line types of the TD curves. |
main |
Figure title. |
ylab |
Y-axis label. |
leg |
Indicator for having figure legend or not. |
error.bar |
Indicator for whether plotting error bars in the figures. |
Hao Wu <[email protected]>
comparePower
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) ## plot par(mfrow=c(2,3)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) plotPowerAlpha(powers) ## in one figure par(mfrow=c(2,3)) plotAll(powers) ## End(Not run)
## Not run: simOptions = RNAseq.SimOptions.2grp() ## run a few simulations simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") ## using FDR 0.1 to call DE, then look at power curves and summary powers = comparePower(simRes) ## plot par(mfrow=c(2,3)) plotPower(powers) plotPowerTD(powers) plotFDR(powers) plotFDcost(powers) plotPowerAlpha(powers) ## in one figure par(mfrow=c(2,3)) plotAll(powers) ## End(Not run)
The function helps study the power under different sequencing depth and sample sizes. It estimates the marginal powers based on existing simulation results, given new sequencing depth.
power.seqDepth(simResult, powerOutput, depth.factor = c(0.2, 0.5, 1, 2, 5, 10))
power.seqDepth(simResult, powerOutput, depth.factor = c(0.2, 0.5, 1, 2, 5, 10))
simResult |
Result object from "runSims" function. |
powerOutput |
Result object from "comparePower" function. |
depth.factor |
A vector of numbers specifying the *relative* sequencing depth, comparing to the depth used in the simulation. 1 means using the same number of total reads as the simulation. |
The powers under different sequencing depth and sample sizes provides important guidence in experimental design. Under the same total number of sequence reads, investigator can choose to use more replicates and shallower coverage for each, or less replicates and deeper coverage.
This function provides estimated marginal power holding all experimental variables fixed (biological variation, effect sizes, sample sizes, etc.) except the sequencing depth. Changing sequencing depth will only alter the marginal distribution of average counts. Since the stratified power (by average counts) won't change, those numbers are used in estimating the powers under different depth. This approaches allows skipping new simulations, which saves computation.
A matrix for marginal powers. Each row is for a sequencing depth, each columns is for a sample size.
Hao Wu <[email protected]>
comparePower, summary.power
## Not run: simOptions = RNAseq.SimOptions.2grp() simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") powers = comparePower(simRes) power.seqDepth(simRes, powers) ## End(Not run)
## Not run: simOptions = RNAseq.SimOptions.2grp() simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") powers = comparePower(simRes) power.seqDepth(simRes, powers) ## End(Not run)
This function takes user provided options for for simulating RNA-seq data, and return a list. The result of this function will be the input for "runSims" and "simRNAseq" function.
RNAseq.SimOptions.2grp(ngenes, seqDepth, lBaselineExpr, lOD, p.DE, lfc, sim.seed)
RNAseq.SimOptions.2grp(ngenes, seqDepth, lBaselineExpr, lOD, p.DE, lfc, sim.seed)
ngenes |
Number of genes in the simulation. |
lBaselineExpr |
log baseline expression for each gene. This can be: (1) a constant; (2) a vector with length ngenes; (3) a function that takes an integer n, and generate a vector of length n; (4) a string specifying the name of existing datasets, from which the mean expressions will be sampled. Details for this option is provide in "Details" section. |
lOD |
log over-dispersion for each gene. Available options are the same as for lBaselineExpr. |
seqDepth |
Sequencing depth, in terms of total read counts. This will be ignored if lBaselineExpr is specified. |
p.DE |
Percentage of genes being differentially expressed (DE). By default it's 5%. |
lfc |
log-fold change for DE genes. This can be: (1) a constant; (2) a vector with length being number of DE genes; (3) a function that takes an integer n, and generate a vector of length n. If the input is a vector and the length is not the number of DE genes, it will be sampled with replacement to generate log-fold change. |
sim.seed |
Simulation seed. |
The simulation of RNA-seq data requires a lot of parameters. This function provides users an interface to specify the simulation parameters. The result from this function will be used for simulating RNA-seq count data. By default, the simulation parameters are similar to that from Cheung data (for unrelated individuals, with large biological variance).
The baseline expression levels and log over-dispersions can be sampled from real data. There are parameters estimated from several real datasets distributed with the package. Available string options for "lBaselineExpr" and "lOD" include: (1) "cheung": parameters from Cheung data, which measures unrelated individuals, so the dispersions are large; (2) "gilad": from Gilad data which are for Human liver sample comparisons between male and female. This dataset has moderate dispersions; (3) "bottomly": from Bottmly data which are from comparing two strains of inbred mice. The dispersions are small. (4) "maqc": from MAQC data which are technical replicates. There are no biological variation from the replicates because the data are technical replicates. The dispersions from this dataset is very small.
The effect sizes (log fold changes of the DE genes) are arbitrarily specified. It is possible to estimate those from real data. We provide a simple example in the package vignette for doing so.
A list with following fields:
ngenes |
An integer for number of genes. |
p.DE |
Percentage of DE genes. |
lBaselineExpr |
A vector of length ngenes for log baseline expression. |
lOD |
A vector of length ngenes for log over-dispersion. |
lfc |
A vector of length (ngenes*p.DE) for log fold change of the DE genes. |
sim.seed |
The specified simulation seed. |
design |
A string representing the experimental design. From this function it is '2grp', standing for two-group comparison. |
Hao Wu <[email protected]>
simRNAseq,runSims
## default simOptions=RNAseq.SimOptions.2grp() summary(simOptions) ## specify some parameters: generate baseline expression and ## dispersion from Bottom data, and specify a function for ## alternative log fold changes. fun.lfc=function(x) rnorm(x, mean=0, sd=1.5) simOptions=RNAseq.SimOptions.2grp(ngenes=30000, lBaselineExpr="bottomly", lOD="bottomly", p.DE=0.05, lfc=fun.lfc) summary(simOptions)
## default simOptions=RNAseq.SimOptions.2grp() summary(simOptions) ## specify some parameters: generate baseline expression and ## dispersion from Bottom data, and specify a function for ## alternative log fold changes. fun.lfc=function(x) rnorm(x, mean=0, sd=1.5) simOptions=RNAseq.SimOptions.2grp(ngenes=30000, lBaselineExpr="bottomly", lOD="bottomly", p.DE=0.05, lfc=fun.lfc) summary(simOptions)
This is the "wrapper" function for running RNA-seq DE detection simulation. It runs simulations under different sample sizes (replicates in each group) for a certain numbers. In each simulation, the RNA-seq data are generated, and then DE detection (using user specified method/software) is performed. The return object contains DE test results (test statistics, p-values, FDRs, etc.) from all simulations.
runSims(Nreps = c(3, 5, 7, 10), Nreps2, nsims = 100, sim.opts, DEmethod = c("edgeR", "DSS", "DESeq2"), verbose =TRUE)
runSims(Nreps = c(3, 5, 7, 10), Nreps2, nsims = 100, sim.opts, DEmethod = c("edgeR", "DSS", "DESeq2"), verbose =TRUE)
Nreps |
Sample sizes one wants to perform simulation on. This is a vector, each element specifies the number of biological replicates in each group. Default value is c(3, 5, 7, 10), which means we want to perform simulation when there are 3, 5, 7, or 10 replicates in each group. |
Nreps2 |
Sample sizes for the second treatment group. If this is missing, it'll take the same value as Nreps and then two groups will be assumed to have the same sample size. This parameter allows two treatment groups have different sample sizes. When specified, it must be a vector of the same length as Nreps. |
nsims |
Number of simulations. |
sim.opts |
An object for simulation option. This should be the return object from "RNAseq.SimOptions.2grp" function. |
DEmethod |
String to specify the DE detection method to be used. Available options are "edgeR", "DSS", and "DESeq2". |
verbose |
Logical value to indicate whether to output some messages (progress report). |
This is the main simulation function in the packge. After specifying the simulation parameters (from "RNAseq.SimOptions.2grp" function), one wants to evaluate the power vs different sample sizes under that simulation setting. This function simulates the count data and performs statistical tests for DE detection. It only stores and returns the DE test results (test statistics, p-values, FDRs, etc.) but doesn't make inferences. The inferences will be conducted in the "comparePower" function. The advantage is that for one simulation setting, the simulation only need to be run once. The inferences using different critical values and type I error controls can then be drawn from the same results. This greatly save the computation because the simulation part is the most computationally intensive.
This function can be slow, depends on the setting (number of genes, replicates, simulations, etc). For the default (50000 genes, 100 simulations, for 3, 5, 7, or 10 replicates), it takes about an hour to run on a single core of i7 2.7GHz CPU. But again, this only need to be run once for a particular simulation setting.
A list with following fields:
pvalue , fdrs
|
3D array for p-values and FDR from each simulation. The dimension of the array is ngenes * N * nsims. Here N is length(Nreps), of the number of different sample sizes settings. |
xbar |
3D array for average read counts for genes. Dimension is the same as pvalue/fdr. This will be used in "comparePower" function for stratified power quantities. |
DEid |
A list of length nsims. Each contains the id of DE genes. |
lfcs |
A list of length nsims. Each contains the log fold changes of DE genes. |
Nreps |
The input Nreps. |
sim.opts |
The input sim.opts. |
Hao Wu <[email protected]>
RNAseq.SimOptions.2grp, simRNAseq
simOptions = RNAseq.SimOptions.2grp() ## using 3 different sample sizes, run 2 simulations, using edgeR simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=2, DEmethod="edgeR") names(simRes) ## Not run: ## using 5 different sample sizes, run 100 simulations, using edgeR. ## This will be slow. simRes = runSims(Nreps=c(3,5,7,10,15), sim.opts=simOptions, nsims=100, DEmethod="edgeR") ## for different sample sizes in two groups simRes = runSims(Nreps=c(3,5,7,10), Nreps2=c(5,7,9,11), sim.opts=simOptions, nsims=100, DEmethod="edgeR") ## End(Not run)
simOptions = RNAseq.SimOptions.2grp() ## using 3 different sample sizes, run 2 simulations, using edgeR simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=2, DEmethod="edgeR") names(simRes) ## Not run: ## using 5 different sample sizes, run 100 simulations, using edgeR. ## This will be slow. simRes = runSims(Nreps=c(3,5,7,10,15), sim.opts=simOptions, nsims=100, DEmethod="edgeR") ## for different sample sizes in two groups simRes = runSims(Nreps=c(3,5,7,10), Nreps2=c(5,7,9,11), sim.opts=simOptions, nsims=100, DEmethod="edgeR") ## End(Not run)
This function generate simulated RNA-seq count data for a two-group comparison design. It takes an object for simulation option (return object from 'RNAseq.SimOptions.2grp' function, and sample sizes (replicates in each condition, then generate a matrix of counts.
simRNAseq(simOptions, n1, n2)
simRNAseq(simOptions, n1, n2)
simOptions |
An object for simulation option. This should be the return object from 'RNAseq.SimOptions.2grp' function. |
n1 , n2
|
Sample size in two treatment groups. |
The count data are generated based a negative binomial (NB) distribution. Parameters for NB are provided in the input object.
A list with following fields:
counts |
A matrix of dimension ngenes x (n1+n2). Each row is for a gene, each column is for a sample. Columns 1 to n1 are for the first condition. The rest columns are for the other condition. |
designs |
A vector/matrix representing the experimental designs. In a two-group comparison, it's a simple 0/1 vector. |
DEid |
The ID (row index) for true DE genes. |
simOptions |
The simulation option object. This is exactly the same as the input simOptions. |
Hao Wu <[email protected]>
RNAseq.SimOptions.2grp, runSims
simOptions = RNAseq.SimOptions.2grp() data = simRNAseq(simOptions, n1=3, n2=3) names(data)
simOptions = RNAseq.SimOptions.2grp() data = simRNAseq(simOptions, n1=3, n2=3) names(data)
Takes a result object from "comparePower" function and print out a table to summarize important power-related quantities. The results are marginalized, meaning that they are averaged quantities over all strata and simulations. This provides a quick view of the marginal results for power-related quantities under all sample sizes.
summaryPower(powerOutput)
summaryPower(powerOutput)
powerOutput |
The result object from "comparePower" function. |
A matrix with multiple rows, each for a different sample size. Columns include sample size, specified nomial type I control value (for FDR or p-values), actual FDR, marginal power, averaged number of true and false discoveries, and false discovery costs.
Hao Wu <[email protected]>
runSims, comparePower
## Not run: simOptions = RNAseq.SimOptions.2grp() simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") powers = comparePower(simRes) summaryPower(powers) ## End(Not run)
## Not run: simOptions = RNAseq.SimOptions.2grp() simRes = runSims(Nreps=c(3,5,7), sim.opts=simOptions, nsims=5, DEmethod="edgeR") powers = comparePower(simRes) summaryPower(powers) ## End(Not run)