| Title: | Efficient Power Analysis for EWAS with Continuous or Binary Outcomes |
|---|---|
| Description: | A quasi-simulation based approach to performing power analysis for EWAS (Epigenome-wide association studies) with continuous or binary outcomes. 'EpipwR' relies on empirical EWAS datasets to determine power at specific sample sizes while keeping computational cost low. EpipwR can be run with a variety of standard statistical tests, controlling for either a false discovery rate or a family-wise type I error rate. |
| Authors: | Jackson Barth [aut, cre] (ORCID: <https://orcid.org/0009-0009-6307-9928>), Austin Reynolds [aut], Mary Lauren Benton [ctb], Carissa Fong [ctb] |
| Maintainer: | Jackson Barth <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.7.0 |
| Built: | 2026-05-30 08:31:50 UTC |
| Source: | https://github.com/bioc/EpipwR |
Plots 95% error bars of EpipwR output. Different sample sizes are displayed along the x-axis with lines representing distinct correlation means. This function should only be used with EpipwR output.
EpipwR_plot(df)
df |
A data frame containing output from the |
A line plot with sample size on the x-axis and power on the y-axis. One line is plotted for each correlation/effect size mean specified.
out <- get_power_cont(100,10000,c(70,80,90,100),.05,c(.3,.4,.5))
EpipwR_plot(out)
Calculates power for EWAS with a binary outcome for multiple
sample sizes and/or effect sizes based on Barth and Reynolds (2025).
Data sets are only simulated for the non-null tests; p-values are generated
directly for the null tests. Rather than specifying the number of data sets
to calculate power, you specify the precision level (MOE)
and a maximum number of data sets (Nmax). After 20 data sets, the
function terminates when the desired precision level is reached or if the
number of tested data sets reaches Nmax. Researchers who are familiar with
pwrEWAS can use pwrE_to_EpipwR to use pwrEWAS parameterization
in EpipwR.
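The stopping rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: `simulate_power_once()` is a hypothetical stand-in for "simulate one data set and compute its power", and the margin of error is assumed to be the normal-approximation half-width of a 95% confidence interval.

```r
# Hypothetical stand-in for one simulated data set: the proportion of
# 100 non-null tests rejected when each has a true power of 0.6.
simulate_power_once <- function() mean(rbinom(100, size = 1, prob = 0.6))

estimate_power <- function(MOE = 0.03, Nmax = 1000, Nmin = 20) {
  powers <- numeric(0)
  repeat {
    powers <- c(powers, simulate_power_once())
    N <- length(powers)
    if (N < Nmin) next  # never stop before Nmin (20) data sets
    # Assumed margin of error: half-width of a normal-approximation 95% CI
    moe_hat <- qnorm(0.975) * sd(powers) / sqrt(N)
    # Stop once the precision target is met or the cap is reached
    if (moe_hat <= MOE || N >= Nmax) break
  }
  list(power = mean(powers), moe = moe_hat, n_datasets = N)
}

set.seed(1)
res <- estimate_power(MOE = 0.01)
res
```

A smaller `MOE` makes the loop run longer, while `Nmax` bounds the cost when the target precision is hard to reach.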
get_power_cc(
  dm, Total, n, fdr_fwer, delta_mu,
  delta_sd = 0, n1_prop = 0.5, Tissue = "Saliva",
  Nmax = 1000, MOE = 0.03, test = "pooled", use_fdr = TRUE,
  det_limit = 0, Suppress_updates = FALSE, emp_data = NULL
)
dm |
Number of non-null tests. |
Total |
The total number of tests (null and non-null). |
n |
Sample size(s) for which power is calculated (accepts a vector). This is total sample size. |
fdr_fwer |
Either the false discovery rate or the family-wise type I
error rate, depending on |
delta_mu |
Average effect size (difference in mean methylation between
groups) for non-null tests (accepts a vector). If multiple |
delta_sd |
Standard deviation of the effect size for non-null tests.
If 0, all effect sizes are fixed at |
n1_prop |
Indicates the proportion of the total sample size ( |
Tissue |
Tissue type of Empirical EWAS to be used for data generation (see details for valid options). |
Nmax |
The maximum number of data sets used to calculate power. |
MOE |
The target margin of error of a 95% confidence interval for average power. This determines the stopping point of the algorithm. |
test |
The type of t-test to be used. |
use_fdr |
If |
det_limit |
The minimum mean difference for the effect size
distribution. Ignored if |
Suppress_updates |
If |
emp_data |
Reference data set in matrix or data frame format (Beta
values with CpG sites as rows, samples as columns). Ignored unless
|
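To illustrate the difference between the two error rates that `fdr_fwer` can represent, the sketch below compares a false discovery rate correction with a family-wise correction on made-up p-values. The choice of Benjamini-Hochberg and Bonferroni via `p.adjust()` is an assumption made for illustration; the source does not state which procedures EpipwR applies internally.

```r
set.seed(42)
Total <- 10000  # total number of tests
dm <- 100       # number of non-null tests

# Hypothetical p-values: z-tests with a shifted mean for the non-null
# tests, Uniform(0, 1) draws for the null tests.
p_nonnull <- 2 * pnorm(-abs(rnorm(dm, mean = 4)))
p_null <- runif(Total - dm)
p <- c(p_nonnull, p_null)

# Assumed procedures: Benjamini-Hochberg (FDR) and Bonferroni (FWER)
rej_fdr <- p.adjust(p, method = "BH") <= 0.05
rej_fwer <- p.adjust(p, method = "bonferroni") <= 0.05

# Power = proportion of non-null tests that are rejected
power_fdr <- mean(rej_fdr[seq_len(dm)])
power_fwer <- mean(rej_fwer[seq_len(dm)])
c(FDR = power_fdr, FWER = power_fwer)
```

Because a Benjamini-Hochberg adjusted p-value is never larger than the Bonferroni adjusted one, power under FDR control is at least as high as under FWER control at the same threshold.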
Valid options for the Tissue argument are "Saliva", "Lymphoma",
"Placenta", "Liver", "Colon", "Blood adult", "Blood 5 year olds",
"Blood newborns", "Cord-blood (whole blood)", "Cord-blood (PBMC)",
"Adult (PBMC)", and "Sperm". All data sets are publicly available on the
Gene Expression Omnibus (see the
EpipwR.data package for
more details) and were identified by Graw et al. (2019). Please note that,
due to some extreme values in this data set, the Lymphoma option will
occasionally throw a warning related to data generation. At this time, we
recommend using one of the other tissue options. Users who would like to use
their own reference data set should set Tissue="Custom" and provide the
data in matrix or data frame format with emp_data. Users who would like to
take advantage of this setting are responsible for formatting this data set
correctly.
Unlike pwrEWAS (Graw et al., 2019), EpipwR enforces equality of precision (sum of the parameters) on the distributions of each group rather than equality of variance.
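A small numeric sketch may make the distinction concrete. It assumes that "parameters" refers to the two shape parameters of a Beta distribution, so that precision is `a + b`; the specific values are hypothetical.

```r
# For Beta(a, b): mean = a / (a + b), precision = a + b
beta_var <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))

a1 <- 2; b1 <- 8          # group 1: mean 0.2, precision 10 (hypothetical CpG)
delta <- 0.05             # difference in mean methylation between groups
phi <- a1 + b1            # precision shared by both groups

mu2 <- a1 / phi + delta   # group 2 mean
a2 <- mu2 * phi           # group 2 shape parameters with the same precision
b2 <- (1 - mu2) * phi

c(precision1 = a1 + b1, precision2 = a2 + b2)
c(var1 = beta_var(a1, b1), var2 = beta_var(a2, b2))
```

Holding precision fixed while shifting the mean changes the variance, which is why the two constraints are not interchangeable.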
A data frame with rows equal to the number of n and delta_mu
combinations
Barth, J., and Reynolds, A. W. (2025). EpipwR: Efficient power analysis for EWAS with continuous outcomes. Bioinformatics Advances, 5(1), vbaf150.
Graw, S., Henn, R., Thompson, J. A., and Koestler, D. C. (2019). pwrEWAS: A user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS). BMC Bioinformatics, 20(1):218.
# This example calculates power for 100 non-null tests out of 10,000 total
# with an FDR of 5%. Sample sizes of 40, 50, 60, 70 and fixed effect sizes
# of 0.01, 0.02, 0.05 are used. For improved accuracy, MOE=.01 is used to ensure
# that the 95% confidence interval for average power has a margin of error
# no larger than .01 (unless the maximum of 1,000 data sets is reached).
get_power_cc(100,10000,c(40,50,60,70),.05,c(.01,.02,.05), MOE=.01)
Calculates power for EWAS with a continuous outcome for multiple
sample sizes and/or correlations based on Barth and Reynolds (2025). Data
sets are only simulated for the non-null tests; p-values are generated
directly for the null tests. Rather than specifying the number of data sets
to calculate power, you specify the precision level (MOE)
and a maximum number of data sets (Nmax). After 20 data sets, the
function terminates when the desired precision level is reached or if the
number of tested data sets reaches Nmax.
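The quasi-simulation idea (simulate only the non-null tests, draw null p-values directly) can be sketched for a single data set as follows. This is a simplified illustration, not the package's implementation: methylation and phenotype are drawn from a bivariate normal here, whereas EpipwR relies on empirical EWAS data, and the Benjamini-Hochberg adjustment is an assumption.

```r
set.seed(2025)
dm <- 100; Total <- 10000  # non-null and total number of tests
n <- 80                    # sample size
rho <- 0.4                 # fixed correlation for non-null tests
fdr <- 0.05

# Simulate data only for the non-null tests
p_nonnull <- vapply(seq_len(dm), function(i) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # cor(x, y) = rho in expectation
  cor.test(x, y, method = "pearson")$p.value
}, numeric(1))

# Under the null hypothesis p-values are Uniform(0, 1), so draw them directly
p_null <- runif(Total - dm)

# Adjust all p-values together, then compute power on the non-null tests
rejected <- p.adjust(c(p_nonnull, p_null), method = "BH") <= fdr
power_one_dataset <- mean(rejected[seq_len(dm)])
power_one_dataset
```

Skipping data generation for the 9,900 null tests is what keeps the computational cost low: only 100 of the 10,000 tests require simulated data.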
get_power_cont(
  dm, Total, n, fdr_fwer, rho_mu,
  rho_sd = 0, Tissue = "Saliva", Nmax = 1000, MOE = 0.03,
  test = "pearson", use_fdr = TRUE, det_limit = 0.03,
  Suppress_updates = FALSE, emp_data = NULL, phenotype_data = NULL
)
dm |
Number of non-null tests. |
Total |
The total number of tests (null and non-null). |
n |
Sample size(s) for which power is calculated (accepts a vector). |
fdr_fwer |
Either the false discovery rate or the family-wise type I
error rate, depending on |
rho_mu |
Mean correlation(s) of methylation and phenotype for non-null
tests (accepts a vector). If multiple |
rho_sd |
Standard deviation of the correlation between methylation and phenotype
for non-null tests. If 0, all correlations are fixed at |
Tissue |
Tissue type of Empirical EWAS to be used for data generation (see details for valid options). |
Nmax |
The maximum number of data sets used to calculate power. |
MOE |
The target margin of error of a 95% confidence interval for average power. This determines the stopping point of the algorithm. |
test |
The type of statistical test to be used. |
use_fdr |
If |
det_limit |
The minimum absolute correlation for the effect size
distribution. Ignored if |
Suppress_updates |
If |
emp_data |
Reference data set in matrix or data frame format (Beta
values with CpG sites as rows, samples as columns). Ignored unless
|
phenotype_data |
A sample of phenotype data to be used in the power calculation(s). Accepts a vector with length > 100, although we recommend that users make this at least as large as their maximum sample size. If left blank, a normal distribution is used to generate correlations. |
Valid options for the Tissue argument are "Saliva", "Lymphoma",
"Placenta", "Liver", "Colon", "Blood adult", "Blood 5 year olds",
"Blood newborns", "Cord-blood (whole blood)", "Cord-blood (PBMC)",
"Adult (PBMC)", and "Sperm". All data sets are publicly available on the
Gene Expression Omnibus (see the
EpipwR.data package for
more details) and were identified by Graw et al. (2019). Please note that,
due to some extreme values in this data, the Lymphoma option will
occasionally throw a warning related to data generation. At this time, we
recommend using one of the other tissue options. Users who would like to use
their own reference data set should set Tissue="Custom" and provide the
data in matrix or data frame format with emp_data. Similarly, users can
also now specify their own phenotype data, either from a real data set or by
generating samples from a known distribution (e.g., rt(1000,2)). Users who
would like to take advantage of either of these settings are responsible for
the quality and formatting of the data provided.
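As a usage sketch, heavy-tailed phenotype data could be supplied as follows. The argument names come from the usage section above; the particular values (two sample sizes, a single correlation of 0.3) are arbitrary, and the call is guarded so that it only runs when EpipwR is installed.

```r
set.seed(123)
# Hypothetical phenotype sample from a t distribution with 2 degrees of
# freedom; its length exceeds both 100 and the largest sample size below.
pheno <- rt(1000, df = 2)

if (requireNamespace("EpipwR", quietly = TRUE)) {
  out <- EpipwR::get_power_cont(
    dm = 100, Total = 10000, n = c(70, 80),
    fdr_fwer = 0.05, rho_mu = 0.3,
    phenotype_data = pheno, Suppress_updates = TRUE
  )
  print(out)
}
```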
Although this function only covers three types of statistical tests, it's worth
noting that tests run using software packages such as limma will yield the
same results as a Pearson correlation test in the absence of covariates
or any dependence across CpG sites (as is the assumption here).
Any users wanting to mimic an analysis done in limma should use
test="pearson".
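The equivalence for a single CpG site without covariates can be checked in base R. The sketch below uses `lm()` as a stand-in for a per-CpG linear model and does not call limma itself; the simulated values are hypothetical.

```r
set.seed(7)
n <- 60
phenotype <- rnorm(n)
# Hypothetical methylation values weakly related to the phenotype
methylation <- 0.5 + 0.05 * phenotype + rnorm(n, sd = 0.1)

# Pearson correlation test
p_cor <- cor.test(methylation, phenotype, method = "pearson")$p.value

# t-test on the slope of a simple linear regression
fit <- summary(lm(methylation ~ phenotype))
p_lm <- fit$coefficients["phenotype", "Pr(>|t|)"]

all.equal(p_cor, p_lm)
```

Both tests use the same t statistic with n - 2 degrees of freedom, so the p-values agree up to floating-point error.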
A data frame with rows equal to the number of n and rho_mu
combinations
Barth, J., and Reynolds, A. W. (2025). EpipwR: Efficient power analysis for EWAS with continuous outcomes. Bioinformatics Advances, 5(1), vbaf150.
Graw, S., Henn, R., Thompson, J. A., and Koestler, D. C. (2019). pwrEWAS: A user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS). BMC Bioinformatics, 20(1):218.
# This example calculates power for 100 non-null tests out of 10,000 total
# with an FDR of 5%.
# Sample sizes of 70,80,90,100 and fixed correlations at 0.3,0.4,0.5 are
# used.
get_power_cont(100,10000,c(70,80,90,100),.05,c(.3,.4,.5))
A function that determines the EpipwR inputs (delta_mu, delta_sd,
det_limit) for the effect-size distribution using pwrEWAS methodology
(centered at 0, truncated to exclude 0 +/- det_limit, with maximal_delta
as the 99.99th percentile of the distribution). The output can then
be used as inputs to get_power_cc.
pwrE_to_EpipwR(maximal_delta, det_limit, quantile = 0.9999)
maximal_delta |
The desired 99.99th percentile (unless quantile is specified) of the effect size distribution. Must be between 0 and 1. |
det_limit |
The minimum effect size to be used in a power calculation. |
quantile |
The quantile that maximal_delta represents. Note that
pwrEWAS uses |
As described in Graw, et al. (2019), users specify the "maximal delta" and the detection limit to set the effect size distribution. The purpose of this function is to provide a map between the pwrEWAS inputs and the EpipwR inputs to generate the same effect size distribution.
The main purpose of the function is to calculate the standard deviation of
the effect size distribution, since the mean is always 0 and the detection
limit is user-specified. Specifically, the standard deviation is calculated
such that the 99.99th percentile of a normal distribution (truncating out
0 +/- det_limit) is maximal_delta. For further justification, see Graw,
et al. (2019). Note that this function does allow users to specify quantiles
other than 0.9999. If users wish to use this function for continuous EpipwR,
we recommend specifying a smaller quantile (e.g., 0.95 or 0.99).
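The calculation described above can be sketched with a one-dimensional root search. This is an illustration of the idea, not the package's code: the helper names `upper_tail()` and `sd_for_maximal_delta()` are hypothetical, and the sketch assumes a N(0, s^2) distribution with the interval (-det_limit, det_limit) removed and no other truncation.

```r
# P(X > x) for X ~ N(0, s^2) conditioned on |X| > d, valid for x >= d.
# Upper tails are used directly for numerical stability.
upper_tail <- function(x, s, d) {
  pnorm(x / s, lower.tail = FALSE) / (2 * pnorm(d / s, lower.tail = FALSE))
}

# Find the standard deviation s for which maximal_delta is the requested
# quantile of the truncated distribution.
sd_for_maximal_delta <- function(maximal_delta, det_limit, quantile = 0.9999) {
  f <- function(s) upper_tail(maximal_delta, s, det_limit) - (1 - quantile)
  # Search interval chosen for this sketch; assumes det_limit is well
  # below maximal_delta.
  uniroot(f, interval = c(maximal_delta / 30, 1), tol = 1e-10)$root
}

s <- sd_for_maximal_delta(0.25, 0.005)
s
upper_tail(0.25, s, 0.005)  # should be close to 1 - 0.9999
```

Lowering `quantile` places `maximal_delta` less far into the tail, which yields a larger standard deviation for the same `maximal_delta`.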
A list with the calculated input values for the get_power_cc
function.
Graw, S., Henn, R., Thompson, J. A., and Koestler, D. C. (2019). pwrEWAS: A user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS). BMC Bioinformatics, 20(1):218.
# This example finds the correct EpipwR settings for a maximal (99.99th
# percentile) difference of 0.25 and a detection limit of .005.
pwrE_to_EpipwR(.25, .005)