MeRIP-Seq (methylated RNA immunoprecipitation followed by sequencing) is a method used to study the epigenetic regulation of gene expression by identifying methylated RNA molecules in a sample. It involves immunoprecipitation of methylated RNA using an antibody specific for methylated nucleotides, followed by high-throughput sequencing of the immunoprecipitated RNA. The resulting data provide a snapshot of the methylation status of RNA molecules, allowing researchers to investigate the role of methylation in the regulation of gene expression and other biological processes. After being collected from two or more biological conditions, MeRIP-Seq data is typically analyzed using computational tools to identify differentially methylated regions (DMRs). DMR detection can uncover the functional significance of m6A methylation and identify potential therapeutic targets for disease such as cancer.
To establish the statistical rigor of MeRIP-Seq experiments, it is important to carefully consider sample size during the study design process in order to ensure that the experiment is adequately powered to detect differentially methylated RNA (DMR) regions. However, there is no such tool available, so we developed the R package magpie, a simulation-based tool for performing power calculations on MeRIP-Seq data. magpie has two main functions:
Power calculation: Given a MeRIP-Seq dataset, various experimental scenarios, and a selected test method, magpie can calculate the statistical power of the experiment and output the results.
Results preservation and visualization: magpie can save the results of the power calculation as an Excel file, and it can also produce basic line plots that allow the user to visualize the results.
From GitHub:
install.packages("devtools") # if you have not installed "devtools" package
library(devtools)
install_github("https://github.com/dxd429/magpie",
build_vignettes = TRUE
)
From Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("magpie")
To view the package vignette in HTML format, run the following lines in R:
To get started with magpie, users are expected to provide MeRIP-Seq
data for parameter estimation. This includes paired input and IP .BAM
files for each experimental condition, along with the matching
annotation file in the *.sqlite
format. Since .BAM files
are generally large, it is also encouraged to use data of only one or a
few chromosomes.
magpie offers the function quickPower()
for users who
need quick power calculation results without their own simulations. This
function takes few seconds to run and uses pre-calculated results from
three publicly available MeRIP-seq datasets. The details of these
datasets can be found in @ref(quickPower)). To use this function and
save and plot the results:
library(magpie)
power.test <- quickPower(dataset = "GSE46705") # Options are 'GSE46705', 'GSE55575', and 'GSE94613'.
Save results into .xlsx files:
### write out .xlsx
writeToxlsx(power.test, file = "test_TRESS.xlsx")
### write out stratified results
writeToxlsx_strata(power.test, file = "test_strata_TRESS.xlsx")
Produce basic line plots for visulization:
In order to best mimic the characteristics of real data, magpie expects users to provide real MeRIP-Seq datasets, if not directly extracting our pre-calculated results.
magpie requires paired input and IP .BAM files for each replicate of both conditions. Example file names are as follows: “Ctrl1.input.bam” & Ctrl1.ip.bam”, “Ctrl2.input.bam” & Ctrl2.ip.bam”, “Case1.input.bam” & Case1.ip.bam”, “Case2.input.bam” & Case2.ip.bam”, for a 2 controls vs 2 cases DMR calling study. Note that names of provided .BAM files should be ordered carefully so that samples from different conditions will not be mistreated.
For illustration purpose, we include a sample dataset (GSE46705) from
a study investigating how METTL3-METTL14 complex mediates mammalian
nuclear RNA N6-adenosine methylation in our experimental data package
magpieData
on GitHub, which can be installed with:
install.packages("devtools") # if you have not installed "devtools" package
library(devtools)
install_github("https://github.com/dxd429/magpieData",
build_vignettes = TRUE
)
The data package contains four .BAM files of two wild type (WT) replicates, four .BAM files of two knockdown of complex METTL3 replicates only on chromosome 15, and one genome annotation file.
In terms of the annotation file, it should match the reference genome
version when obtaining the .BAM files, and be saved in the format of
*.sqlite
, which can be created with R function
makeTxDbFromUCSC()
from Bioconductor package
GenomicFeatures
:
magpie offers two functions for power evaluation:
powerEval()
and quickPower()
. If users prefer
to quickly examine the power evaluation results using the built-in
datasets, they can use the quickPower()
function.
Otherwise, users can use the powerEval()
function to
perform power evaluation on their own data and customize the simulation
settings. The output of these two functions is a list of power
measurements under various experimental settings, such as
FDR
, Precision
, and
Statistical power
.
With .BAM files and an annotation file, powerEval()
enables users to specify number of simulations (nsim
),
sample sizes (N.reps
), sequencing depths
(depth_factor
), FDR thresholds (thres
), and
testing methods (Test_method
).
Users should always indicate the percentage of the whole genome
covered by their dataset, whether it’s complete or partial
(bam_factor
). Here, bam_factor
allows for
simulating whole-genome data using a smaller subset of it, under the
assumption that DMR signals are relatively evenly distributed across
chromosomes.
We demonstrate the usage of powerEval()
with the example
dataset from R package magpieData
:
library(magpieData)
library(magpie)
### Get the example data
BAM_path <- getBAMpath()
### Call powerEval()
power.test <- powerEval(
Input.file = c("Ctrl1.chr15.input.bam", "Ctrl2.chr15.input.bam", "Case1.chr15.input.bam", "Case2.chr15.input.bam"),
IP.file = c("Ctrl1.chr15.ip.bam", "Ctrl2.chr15.ip.bam", "Case1.chr15.ip.bam", "Case2.chr15.ip.bam"),
BamDir = BAM_path,
annoDir = paste0(BAM_path, "/hg18_chr15.sqlite"),
variable = rep(c("Ctrl", "Trt"), each = 2),
bam_factor = 0.03,
nsim = 10,
N.reps = c(2, 3, 5, 7),
depth_factor = c(1, 2),
thres = c(0.01, 0.05, 0.1),
Test_method = "TRESS" ## TRESS or exomePeak2
)
To use your own data, replace Input.file
and
IP.file
with the names of your .BAM files, and set
BamDir
and annoDir
to the file paths for your
data. Make sure that the order of the variable
corresponds
to the order of your data files.
Another power calculation function is quickPower()
.
Unlike powerEval()
which often takes a while to run,
quickPower()
produces results in seconds, by directly
extracting results from three pre-evaluated datasets on GEO:
GSE46705
: Human HeLa cell line: Two replicates of
wild type (WT) and two replicates of knockdown (KD) of complex
METTL3.
GSE55575
: Mouse embryonic fibroblasts: Two
replicates of wild type (WT) and four replicates of knockdown (KD) of
WTAP.
GSE94613
: Human leukemia cell line: Four replicates
of wild type (WT) and eight replicates of knockdown (KD) of complex
METTL3.
Here, we use quickPower()
to get power evaluation
results of GSE46705
, tested by TRESS
:
Magpie computes power evaluation metrics for each experimental scenario defined in the function arguments, the results it generates include:
FDR: The ratio of number of false positives to the number of positive discoveries.
FDC: The ratio of number of false positives to the number of true positives.
Power: Statistical power.
Precision: The ratio of number of true positives to the number of positive discoveries.
Once users have obtained a list of power measurements using either
powerEval()
or quickPower()
, they can use
functions in magpie to save the results as a formatted .xlsx file and
generate basic line plots.
magpie provides two functions: writeToxlsx()
and
writeToxlsx_strata()
, for saving results to a formatted
.xlsx file. writeToxlsx()
allows you to save power
measurements for different sample sizes, FDR thresholds, and sequencing
depths, while writeToxlsx_strata()
writes out the results
stratified by input expression level and for different sample sizes.
Note that we only evaluate power under the original sequencing depth
(depth_factor = 1
) and FDR threshold of 0.05
(thres = 1
).
Here we save the output from quickPower()
to
corresponding .xlsx
files:
### write out .xlsx
writeToxlsx(power.test, file = "test_TRESS.xlsx")
### write out stratified results
writeToxlsx_strata(power.test, file = "test_strata_TRESS.xlsx")
The generated .xlsx
files are formatted as follows:
FDR | N.rep | 0.01 | 0.05 | 0.1 |
---|---|---|---|---|
2 | 0.36 | 0.48 | 0.57 | |
3 | 0.14 | 0.27 | 0.38 | |
5 | 0.06 | 0.13 | 0.21 | |
7 | 0.04 | 0.11 | 0.17 |
FDR | N.rep | (0, 27.68] | (27.68, 54.3] | (54.3, 92.64] | (92.64, Inf] |
---|---|---|---|---|---|
2 | 0.41 | 0.46 | 0.47 | 0.58 | |
3 | 0.23 | 0.30 | 0.27 | 0.28 | |
5 | 0.15 | 0.16 | 0.11 | 0.12 | |
7 | 0.12 | 0.14 | 0.08 | 0.11 |
As mentioned before, magpie also provides four plotting functions:
plotRes()
, plotAll()
,
plotStrata()
, and plotAll_Strata()
, for figure
generating. In general, plotRes()
and
plotStrata()
produce individual plots, while
plotAll()
and `plotAll_Strata()
produce four
plots in a 2 x 2 panel.
Again, we demonstrate these four functions with the previous output from `quickPower()``:
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magpie_1.7.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] viridisLite_0.4.2 blob_1.2.4
## [3] Biostrings_2.75.1 bitops_1.0-9
## [5] fastmap_1.2.0 RCurl_1.98-1.16
## [7] GenomicAlignments_1.43.0 XML_3.99-0.17
## [9] digest_0.6.37 mime_0.12
## [11] lifecycle_1.0.4 KEGGREST_1.47.0
## [13] RSQLite_2.3.8 magrittr_2.0.3
## [15] compiler_4.4.2 rlang_1.1.4
## [17] sass_0.4.9 tools_4.4.2
## [19] utf8_1.2.4 yaml_2.3.10
## [21] rtracklayer_1.67.0 knitr_1.49
## [23] S4Arrays_1.7.1 bit_4.5.0
## [25] curl_6.0.1 DelayedArray_0.33.2
## [27] xml2_1.3.6 plyr_1.8.9
## [29] RColorBrewer_1.1-3 abind_1.4-8
## [31] BiocParallel_1.41.0 BiocGenerics_0.53.3
## [33] sys_3.4.3 aod_1.3.3
## [35] grid_4.4.2 stats4_4.4.2
## [37] fansi_1.0.6 TRESS_1.13.0
## [39] colorspace_2.1-1 ggplot2_3.5.1
## [41] scales_1.3.0 SummarizedExperiment_1.37.0
## [43] cli_3.6.3 rmarkdown_2.29
## [45] crayon_1.5.3 generics_0.1.3
## [47] rstudioapi_0.17.1 reshape2_1.4.4
## [49] httr_1.4.7 rjson_0.2.23
## [51] DBI_1.2.3 cachem_1.1.0
## [53] stringr_1.5.1 zlibbioc_1.52.0
## [55] parallel_4.4.2 AnnotationDbi_1.69.0
## [57] BiocManager_1.30.25 XVector_0.47.0
## [59] restfulr_0.0.15 matrixStats_1.4.1
## [61] vctrs_0.6.5 Matrix_1.7-1
## [63] jsonlite_1.8.9 IRanges_2.41.1
## [65] S4Vectors_0.45.2 bit64_4.5.2
## [67] systemfonts_1.1.0 GenomicFeatures_1.59.1
## [69] maketools_1.3.1 locfit_1.5-9.10
## [71] jquerylib_0.1.4 glue_1.8.0
## [73] codetools_0.2-20 stringi_1.8.4
## [75] gtable_0.3.6 GenomeInfoDb_1.43.1
## [77] GenomicRanges_1.59.0 BiocIO_1.17.0
## [79] UCSC.utils_1.3.0 munsell_0.5.1
## [81] tibble_3.2.1 pillar_1.9.0
## [83] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13
## [85] R6_2.5.1 kableExtra_1.4.0
## [87] evaluate_1.0.1 lattice_0.22-6
## [89] Biobase_2.67.0 Rsamtools_2.23.0
## [91] png_0.1-8 openxlsx_4.2.7.1
## [93] memoise_2.0.1 bslib_0.8.0
## [95] zip_2.3.1 Rcpp_1.0.13-1
## [97] svglite_2.1.3 SparseArray_1.7.2
## [99] DESeq2_1.47.1 xfun_0.49
## [101] MatrixGenerics_1.19.0 buildtools_1.0.0
## [103] pkgconfig_2.0.3