Title: | Efficient Power Analysis for EWAS with Continuous or Binary Outcomes |
---|---|
Description: | A quasi-simulation based approach to performing power analysis for EWAS (Epigenome-wide association studies) with continuous or binary outcomes. 'EpipwR' relies on empirical EWAS datasets to determine power at specific sample sizes while keeping computational cost low. EpipwR can be run with a variety of standard statistical tests, controlling for either a false discovery rate or a family-wise type I error rate. |
Authors: | Jackson Barth [aut, cre], Austin Reynolds [aut], Mary Lauren Benton [ctb], Carissa Fong [ctb] |
Maintainer: | Jackson Barth <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.1.0 |
Built: | 2024-10-30 07:16:55 UTC |
Source: | https://github.com/bioc/EpipwR |
Plots 95% error bars of EpipwR output. Different sample sizes are displayed along the x-axis, with lines representing distinct correlation means. This function should only be used with EpipwR output.
EpipwR_plot(df)
df | A data frame containing output from the |
A line plot with sample size on the x-axis and power on the y-axis. One line is plotted for each correlation or effect size mean specified.
out <- get_power_cont(100,10000,c(70,80,90,100),.05,c(.3,.4,.5))
EpipwR_plot(out)
Calculates power for EWAS with a binary outcome for multiple sample sizes and/or effect sizes. Data sets are only simulated for the non-null tests; p-values are generated directly for the null tests. Rather than specifying the number of data sets to calculate power, you specify the precision level (MOE) and a maximum number of data sets (Nmax). After 20 data sets, the function terminates when the desired precision level is reached or if the number of tested data sets reaches Nmax.
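The stopping rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not EpipwR's implementation: power_one_dataset() is a hypothetical stand-in for the power observed in one simulated data set, and the margin of error is assumed to be the usual normal-approximation half-width, qnorm(0.975) * sd / sqrt(N).

```r
# Hypothetical stand-in: power observed in one simulated data set
# (here just a noisy draw around an assumed true power of 0.8).
power_one_dataset <- function() {
  rbinom(1, size = 100, prob = 0.8) / 100
}

run_until_precise <- function(MOE = 0.03, Nmax = 1000, Nmin = 20) {
  powers <- numeric(0)
  repeat {
    powers <- c(powers, power_one_dataset())
    N <- length(powers)
    if (N < Nmin) next  # never stop before the first 20 data sets
    # Assumed margin of error: half-width of a normal-approximation 95% CI
    moe <- qnorm(0.975) * sd(powers) / sqrt(N)
    # Stop once the target precision is met or the cap is reached
    if (moe <= MOE || N >= Nmax) break
  }
  list(power = mean(powers), moe = moe, n_datasets = N)
}

set.seed(1)
res <- run_until_precise(MOE = 0.03, Nmax = 1000)
res
```

A smaller MOE gives a tighter estimate of average power at the cost of more simulated data sets, with Nmax acting as the upper bound on run time.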
get_power_cc(
  dm,
  Total,
  n,
  fdr_fwer,
  delta_mu,
  delta_sd = 0,
  n1_prop = 0.5,
  Tissue = "Saliva",
  Nmax = 1000,
  MOE = 0.03,
  test = "pooled",
  use_fdr = TRUE,
  Suppress_updates = FALSE
)
dm | Number of non-null tests. |
Total | The total number of tests (null and non-null). |
n | Sample size(s) for which power is calculated (accepts a vector). This is the total sample size. |
fdr_fwer | Either the false discovery rate or the family-wise type I error rate, depending on |
delta_mu | Average effect size (difference in mean methylation between groups) for non-null tests (accepts a vector). If multiple |
delta_sd | Standard deviation of the effect size for non-null tests. If 0, all effect sizes are fixed at |
n1_prop | Indicates the proportion of the total sample size ( |
Tissue | Tissue type of empirical EWAS to be used for data generation (see details for valid options). |
Nmax | The maximum number of data sets used to calculate power. |
MOE | The target margin of error of a 95% confidence interval for average power. This determines the stopping point of the algorithm. |
test | The type of t-test to be used. |
use_fdr | If |
Suppress_updates | If |
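To illustrate what the default test = "pooled" refers to, the sketch below runs a pooled-variance two-sample t-test in base R and contrasts it with Welch's unequal-variance version. Several things here are assumptions made for illustration only: the data are drawn from normal distributions (EpipwR generates data from an empirical EWAS), n1_prop is taken to be the share of n assigned to the first group, and the means and standard deviations are arbitrary.

```r
set.seed(42)

# Split a total sample size into two groups (assumed meaning of n1_prop)
n <- 60
n1_prop <- 0.5
n1 <- round(n * n1_prop)
n2 <- n - n1

# Illustrative methylation values for one CpG site in each group
g1 <- rnorm(n1, mean = 0.50, sd = 0.05)
g2 <- rnorm(n2, mean = 0.55, sd = 0.05)

# Pooled-variance t-test: assumes equal variances, df = n1 + n2 - 2
pooled <- t.test(g1, g2, var.equal = TRUE)

# Welch t-test: no equal-variance assumption, df is estimated from the data
welch <- t.test(g1, g2, var.equal = FALSE)

c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```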
Valid options for the Tissue argument are "Saliva", "Lymphoma", "Placenta", "Liver", "Colon", "Blood adult", "Blood 5 year olds", "Blood newborns", "Cord-blood (whole blood)", "Cord-blood (PBMC)", "Adult (PBMC)", and "Sperm". All data sets are publicly available on the Gene Expression Omnibus (see the EpipwR.data package for more details) and were identified by Graw et al. (2019). Please note that, due to some extreme values in this data set, the Lymphoma option will occasionally throw a warning related to data generation. At this time, we recommend using one of the other tissue options.
Unlike pwrEWAS (Graw et al., 2019), EpipwR enforces equality of precision (sum of the parameters) on the distributions of each group rather than equality of variance.
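To make the distinction concrete, the sketch below assumes that methylation values in each group follow a beta distribution with shape parameters a and b (an assumption made here for illustration, in line with the pwrEWAS approach), so that the precision is a + b. The helper names, the mean, the shift delta, and the precision value are all hypothetical. Holding the precision fixed while shifting the mean leaves the two groups with different variances.

```r
# Shape parameters of a beta distribution with a given mean and precision (a + b)
beta_shapes <- function(mu, precision) {
  c(a = mu * precision, b = (1 - mu) * precision)
}

# Variance of a beta distribution from its shape parameters
beta_var <- function(shapes) {
  a <- shapes[["a"]]
  b <- shapes[["b"]]
  a * b / ((a + b)^2 * (a + b + 1))
}

mu1 <- 0.30       # illustrative mean methylation in group 1
delta <- 0.05     # illustrative difference in means
precision <- 50   # illustrative common precision

g1 <- beta_shapes(mu1, precision)
g2 <- beta_shapes(mu1 + delta, precision)

# Same precision in both groups ...
c(precision_g1 = sum(g1), precision_g2 = sum(g2))
# ... but the variances differ because the means differ
c(var_g1 = beta_var(g1), var_g2 = beta_var(g2))
```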
A data frame with rows equal to the number of n and delta_mu combinations.
Graw, S., Henn, R., Thompson, J. A., and Koestler, D. C. (2019). pwrEWAS: A user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS). BMC Bioinformatics, 20(1):218.
# This example calculates power for 100 non-null tests out of 10,000 total
# with an FDR of 5%. Sample sizes of 40, 50, 60, 70 and fixed effect sizes
# of 0.01, 0.02, 0.05 are used. For improved accuracy, MOE=.01 is set to ensure
# that the 95% confidence interval for average power has a margin of error
# no larger than .01 (unless the maximum of 1,000 data sets is reached).
get_power_cc(100,10000,c(40,50,60,70),.05,c(.01,.02,.05), MOE=.01)
Calculates power for EWAS with a continuous outcome for multiple sample sizes and/or correlations. Data sets are only simulated for the non-null tests; p-values are generated directly for the null tests. Rather than specifying the number of data sets to calculate power, you specify the precision level (MOE) and a maximum number of data sets (Nmax). After 20 data sets, the function terminates when the desired precision level is reached or if the number of tested data sets reaches Nmax.
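The idea of simulating data only for the non-null tests can be sketched as follows. This is an illustration under stated assumptions rather than EpipwR's code: null p-values are drawn from a Uniform(0, 1) distribution, the non-null data are bivariate normal instead of being generated from an empirical EWAS, and the FDR is controlled with the Benjamini-Hochberg adjustment from p.adjust. The function name one_dataset_power is hypothetical.

```r
one_dataset_power <- function(dm, Total, n, fdr, rho) {
  # Simulate data only for the dm non-null tests
  p_nonnull <- replicate(dm, {
    x <- rnorm(n)                                  # phenotype
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n)      # methylation, correlation rho
    cor.test(x, y, method = "pearson")$p.value
  })

  # Null tests: draw p-values directly (assumed uniform under the null)
  p_null <- runif(Total - dm)

  # Adjust all p-values together, then count rejections among the non-null tests
  adj <- p.adjust(c(p_nonnull, p_null), method = "BH")
  mean(adj[seq_len(dm)] <= fdr)
}

set.seed(123)
pw <- one_dataset_power(dm = 100, Total = 10000, n = 100, fdr = 0.05, rho = 0.5)
pw
```

Because only dm data sets are simulated instead of Total, the cost per iteration stays low even when Total is in the hundreds of thousands.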
get_power_cont(
  dm,
  Total,
  n,
  fdr_fwer,
  rho_mu,
  rho_sd = 0,
  Tissue = "Saliva",
  Nmax = 1000,
  MOE = 0.03,
  test = "pearson",
  use_fdr = TRUE,
  Suppress_updates = FALSE
)
dm | Number of non-null tests. |
Total | The total number of tests (null and non-null). |
n | Sample size(s) for which power is calculated (accepts a vector). |
fdr_fwer | Either the false discovery rate or the family-wise type I error rate, depending on |
rho_mu | Mean correlation(s) of methylation and phenotype for non-null tests (accepts a vector). If multiple |
rho_sd | Standard deviation of the correlation between methylation and phenotype for non-null tests. If 0, all correlations are fixed at |
Tissue | Tissue type of empirical EWAS to be used for data generation (see details for valid options). |
Nmax | The maximum number of data sets used to calculate power. |
MOE | The target margin of error of a 95% confidence interval for average power. This determines the stopping point of the algorithm. |
test | The type of statistical test to be used. |
use_fdr | If |
Suppress_updates | If |
Valid options for the Tissue argument are "Saliva", "Lymphoma", "Placenta", "Liver", "Colon", "Blood adult", "Blood 5 year olds", "Blood newborns", "Cord-blood (whole blood)", "Cord-blood (PBMC)", "Adult (PBMC)", and "Sperm". All data sets are publicly available on the Gene Expression Omnibus (see the EpipwR.data package for more details) and were identified by Graw et al. (2019). Please note that, due to some extreme values in this data set, the Lymphoma option will occasionally throw a warning related to data generation. At this time, we recommend using one of the other tissue options.
Although this function only covers 3 types of statistical tests, it's worth noting that tests run using software packages such as limma will yield the same results as a Pearson correlation test in the absence of covariates or any dependence across CpG sites (as is the assumption here). Any users wanting to mimic an analysis done in limma should use test="pearson".
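The equivalence noted above can be checked for ordinary least squares in base R: with a single predictor and no covariates, the t-test on the regression slope gives the same p-value as the Pearson correlation test. The sketch uses lm rather than limma (a substitution made here so the example has no dependencies) and simulated data with arbitrary parameters.

```r
set.seed(7)
n <- 80

# Illustrative data for a single CpG site
phenotype <- rnorm(n)
methylation <- 0.3 * phenotype + rnorm(n)

# Pearson correlation test
p_pearson <- cor.test(methylation, phenotype, method = "pearson")$p.value

# Simple linear regression: t-test on the slope for phenotype
fit <- lm(methylation ~ phenotype)
p_slope <- summary(fit)$coefficients["phenotype", "Pr(>|t|)"]

# The two p-values agree up to numerical tolerance
all.equal(p_pearson, p_slope)
```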
A data frame with rows equal to the number of n and rho_mu combinations.
Graw, S., Henn, R., Thompson, J. A., and Koestler, D. C. (2019). pwrEWAS: A user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS). BMC Bioinformatics, 20(1):218.
# This example calculates power for 100 non-null tests out of 10,000 total
# with an FDR of 5%.
# Sample sizes of 70, 80, 90, 100 and fixed correlations of 0.3, 0.4, 0.5 are
# used.
get_power_cont(100,10000,c(70,80,90,100),.05,c(.3,.4,.5))