RNAmodR: RiboMethSeq

Introduction

Among the various post-transcriptional RNA modifications, 2’-O methylations are commonly found in rRNA and tRNA. They promote the endo conformation of the ribose and confer resistance to alkaline degradation by preventing the nucleophilic attack on the 3’-phosphate that high pH facilitates, especially in flexible RNA. This property is exploited by a method called RiboMethSeq (Birkedal et al. 2015), in which RNA is treated under alkaline conditions and the resulting RNA fragments are used to prepare a sequencing library (Marchand et al. 2017).

At positions containing a 2’-O methylation, read ends are less frequent; this depletion is used to detect and score 2’-O methylations.
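The depletion of read ends at a protected position can be illustrated with a toy calculation in base R. This is only a sketch of the idea, not the scoreA, scoreB or scoreRMS formulas used by the package; the end counts, the candidate position and the flank size are made-up values chosen for illustration.

```r
# Hypothetical read-end counts along a short transcript (toy data)
ends <- c(120, 98, 110, 105, 8, 101, 115, 97, 108)
pos <- 5L    # candidate position to inspect
flank <- 2L  # neighbours considered on each side (illustrative choice)

# Mean end count of the flanking positions, excluding the candidate itself
neighbours <- c(seq.int(pos - flank, pos - 1L), seq.int(pos + 1L, pos + flank))
flank_mean <- mean(ends[neighbours])

# Simple protection ratio: about 0 means no depletion, close to 1 means
# that read ends are strongly depleted at the candidate position
protection <- 1 - ends[pos] / flank_mean
protection
```

A value close to 1 indicates that far fewer fragments end at the candidate position than at its neighbours, which is the signature RiboMethSeq looks for.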

The ModRiboMethSeq class uses the ProtectedEndSequenceData class to store and aggregate data along the transcripts. The calculated scores follow the nomenclature of Birkedal et al. (2015) and Galvanin et al. (2019) and are named scoreRMS (default), scoreA, scoreB and scoreMean.

library(rtracklayer)
library(GenomicRanges)
library(RNAmodR.RiboMethSeq)
library(RNAmodR.Data)

Example workflow

The example workflow is limited to two 2’-O methylated positions on 5.8S rRNA, since the size of the raw data is limited. For annotation data either a GFF file or a TxDb object can be used, and for sequence data either a FASTA file or a BSgenome object. The sequencing data is provided as BAM files.

annotation <- GFF3File(RNAmodR.Data.example.RMS.gff3())
sequences <- RNAmodR.Data.example.RMS.fasta()
files <- list("Sample1" = c(treated = RNAmodR.Data.example.RMS.1()),
              "Sample2" = c(treated = RNAmodR.Data.example.RMS.2()))

Analysis of data

The analysis is triggered by the construction of a ModSetRiboMethSeq object. Internally, parallelization is handled by the BiocParallel package, which allows performance to be tuned depending on the number and size of the input files (number of samples, replicates, transcripts, etc.).

msrms <- ModSetRiboMethSeq(files, annotation = annotation, sequences = sequences)
## Import genomic features from the file as a GRanges object ... OK
## Prepare the 'metadata' data frame ... OK
## Make the TxDb object ...
## Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
## is not available for this TxDb object
## OK
msrms
## ModSetRiboMethSeq of length 2
## names(2): Sample1 Sample2
## | Modification type(s):  Am / Cm / Gm / Um                                      
##                        Sample1 Sample2
## | Modifications found:      no yes (1)
## | Settings:
##         minCoverage minReplicate  find.mod maxLength minSignal flankingRegion
##           <integer>    <integer> <logical> <integer> <integer>      <integer>
## Sample1          10            1      TRUE        50        10              6
## Sample2          10            1      TRUE        50        10              6
##         minScoreA minScoreB minScoreRMS minScoreMean flankingRegionMean
##         <numeric> <numeric>   <numeric>    <numeric>          <integer>
## Sample1       0.6       3.6        0.75         0.75                  2
## Sample2       0.6       3.6        0.75         0.75                  2
##                 weights
##           <NumericList>
## Sample1 0.9,1.0,0.0,...
## Sample2 0.9,1.0,0.0,...
##         scoreOperator
##           <character>
## Sample1             &
## Sample2             &

Visualizing the results

To compare samples, we need to know which positions should be part of the comparison. These can be obtained either by aggregating the detected modifications over all samples and using their union or intersection, or by using published data. Here we take the latter route and assemble a GRanges object from the information in the snoRNAdb (Lestrade and Weber 2006).

In this specific example only information for the 5.8S rRNA is used, since the example data would be too big otherwise. The parent and seqname information must match the annotation data. Check that it matches the output of ranges() on a SequenceData, Modifier or ModifierSet object.

table <- read.csv2(RNAmodR.Data.snoRNAdb(), stringsAsFactors = FALSE)
## see ?RNAmodR.Data and browseVignettes('RNAmodR.Data') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
table <- table[table$hgnc_id == "53533",] # Subset to RNA5.8S
# keep only the current coordinates
table <- table[,1L:7L]
snoRNAdb <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = table$position,
                               width = 1),
              strand = "+",
              type = "RNAMOD",
              mod = table$modification,
              Parent = "1", #this is the transcript id
              Activity = IRanges::CharacterList(strsplit(table$guide,",")))
coord <- split(snoRNAdb,snoRNAdb$Parent)
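The alternative route mentioned earlier, combining the positions detected in the individual samples, can be sketched in base R. The position vectors below are hypothetical toy values, not output of the package, and the names detected, in_any and in_all are chosen only for this illustration.

```r
# Hypothetical detected positions per sample (toy values, not package output)
detected <- list(Sample1 = c(14L, 75L),
                 Sample2 = c(14L, 75L, 102L))

# Union: positions called in at least one sample
in_any <- sort(Reduce(union, detected))

# Intersection: positions called in every sample
in_all <- sort(Reduce(intersect, detected))

in_any
in_all
```

The union gives the most inclusive comparison, whereas the intersection restricts it to positions supported by all samples.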

In addition to the coordinates of published modifications, we also want to include more meaningful names for the transcripts. For this we provide a data.frame with two columns, tx_id and name. All values in the first column have to match transcript IDs.

ranges(msrms)
## GRangesList object of length 1:
## $`1`
## GRanges object with 1 range and 3 metadata columns:
##       seqnames    ranges strand |   exon_id   exon_name exon_rank
##          <Rle> <IRanges>  <Rle> | <integer> <character> <integer>
##   [1]     chr1     1-157      + |         1 NR_003285.2         1
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
alias <- data.frame(tx_id = "1", name = "5.8S rRNA", stringsAsFactors = FALSE)
plotCompareByCoord(msrms[c(2L,1L)], coord, alias = alias)

Heatmap showing RiboMethSeq scores for 2’-O methylated positions on the 5.8S rRNA.
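Because every tx_id in the alias table has to match a transcript ID, a quick check before plotting can catch mismatches early. The following is a minimal base R sketch; tx_ids is a hypothetical stand-in for the list names shown in the ranges(msrms) output above (here "1"), and the check itself is not part of the package.

```r
# Stand-in for the transcript IDs of the annotation (assumed value "1",
# taken from the ranges(msrms) output)
tx_ids <- "1"

alias <- data.frame(tx_id = "1", name = "5.8S rRNA", stringsAsFactors = FALSE)

# Alias entries that do not correspond to any known transcript ID
unmatched <- setdiff(alias$tx_id, tx_ids)
if (length(unmatched) > 0L) {
  stop("alias contains unknown transcript IDs: ",
       paste(unmatched, collapse = ", "))
}
```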

Results can also be compared on a sequence level, by selecting specific coordinates to compare.

singleCoord <- coord[[1L]][1L,]
plotDataByCoord(msrms, singleCoord)

RiboMethSeq scores around Um(14) on 5.8S rRNA.

By default, only the RiboMethSeq score and scoreMean are shown. The raw sequence data can be inspected as well.

singleCoord <- coord[[1L]][1L,]
plotDataByCoord(msrms, singleCoord, showSequenceData = TRUE)

RiboMethSeq scores around Um(14) on 5.8S rRNA. Sequence data is shown by setting showSequenceData = TRUE.

Performance

To assess the performance of the method in combination with the samples used, use the plotROC function.

plotROC(msrms,coord)

TPR versus FPR plot.
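To make the axes of such a plot concrete, the true positive rate (TPR) and false positive rate (FPR) can be computed by hand for a range of score thresholds. The sketch below uses base R and made-up scores and labels; it illustrates the general ROC idea and is not a description of how plotROC is implemented.

```r
# Hypothetical scores per position and known modification status (toy data)
score <- c(0.95, 0.10, 0.80, 0.30, 0.60, 0.05)
known <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Use each observed score as a threshold, from strict to lenient
thresholds <- sort(unique(score), decreasing = TRUE)

roc <- t(vapply(thresholds, function(th) {
  called <- score >= th
  c(threshold = th,
    TPR = sum(called & known) / sum(known),     # known sites recovered
    FPR = sum(called & !known) / sum(!known))   # unmodified sites called
}, numeric(3)))
roc
```

Each row of roc corresponds to one point of the curve; lowering the threshold recovers more known sites at the cost of more false calls.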

The example given here should be regarded as a proof of concept. Based on the results, the minimum scores for calling modified positions can be adjusted to individual requirements.

settings(msrms) <- list(minScoreMean = 0.7)
msrms
## ModSetRiboMethSeq of length 2
## names(2): Sample1 Sample2
## | Modification type(s):  Am / Cm / Gm / Um                                      
##                        Sample1 Sample2
## | Modifications found:      no yes (1)
## | Settings:
##         minCoverage minReplicate  find.mod maxLength minSignal flankingRegion
##           <integer>    <integer> <logical> <integer> <integer>      <integer>
## Sample1          10            1      TRUE        50        10              6
## Sample2          10            1      TRUE        50        10              6
##         minScoreA minScoreB minScoreRMS minScoreMean flankingRegionMean
##         <numeric> <numeric>   <numeric>    <numeric>          <integer>
## Sample1       0.6       3.6        0.75          0.7                  2
## Sample2       0.6       3.6        0.75          0.7                  2
##                 weights
##           <NumericList>
## Sample1 0.9,1.0,0.0,...
## Sample2 0.9,1.0,0.0,...
##         scoreOperator
##           <character>
## Sample1             &
## Sample2             &
## Warning: Settings were changed after data aggregation or modification search.
## Rerun with modify(x,force = TRUE) to update with current settings.

As the warning suggests, the results should be updated after changing the settings by running modify(x, force = TRUE).

msrms2 <- modify(msrms,force = TRUE)
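How changed thresholds can alter the calls may be sketched in base R. The minimum scores are taken from the settings printed above; the per-position scores are hypothetical, and the reading of scoreOperator "&" as "every score must reach its minimum" (with "|" as the lenient alternative) is an assumption made for this illustration, not a statement about the package internals.

```r
# Minimum scores from the settings shown above (minScoreMean lowered to 0.7)
min_scores <- c(scoreA = 0.6, scoreB = 3.6, scoreRMS = 0.75, scoreMean = 0.7)

# Hypothetical scores for two candidate positions (toy values)
scores <- rbind(
  pos14 = c(scoreA = 0.9, scoreB = 5.2, scoreRMS = 0.88, scoreMean = 0.72),
  pos75 = c(scoreA = 0.7, scoreB = 2.9, scoreRMS = 0.80, scoreMean = 0.91)
)

# Compare every score with its own minimum (column-wise)
passes <- sweep(scores, 2, min_scores[colnames(scores)], FUN = ">=")

# Assumed reading of "&": all minimum scores must be met
called_and <- apply(passes, 1, all)
# Assumed reading of "|": meeting any minimum score is sufficient
called_or <- apply(passes, 1, any)

called_and
called_or
```

In this toy case pos75 fails only the scoreB minimum, so it is called under the lenient reading but not under the strict one.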

Session info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] Rsamtools_2.23.1           RNAmodR.Data_1.20.0       
##  [3] ExperimentHubData_1.33.0   AnnotationHubData_1.37.0  
##  [5] futile.logger_1.4.3        ExperimentHub_2.15.0      
##  [7] AnnotationHub_3.15.0       BiocFileCache_2.15.0      
##  [9] dbplyr_2.5.0               RNAmodR.RiboMethSeq_1.21.0
## [11] RNAmodR_1.21.0             Modstrings_1.23.0         
## [13] Biostrings_2.75.1          XVector_0.47.0            
## [15] rtracklayer_1.67.0         GenomicRanges_1.59.1      
## [17] GenomeInfoDb_1.43.2        IRanges_2.41.1            
## [19] S4Vectors_0.45.2           BiocGenerics_0.53.3       
## [21] generics_0.1.3             BiocStyle_2.35.0          
## 
## loaded via a namespace (and not attached):
##   [1] BiocIO_1.17.1               bitops_1.0-9               
##   [3] filelock_1.0.3              tibble_3.2.1               
##   [5] graph_1.85.0                XML_3.99-0.17              
##   [7] rpart_4.1.23                lifecycle_1.0.4            
##   [9] httr2_1.0.7                 lattice_0.22-6             
##  [11] ensembldb_2.31.0            OrganismDbi_1.49.0         
##  [13] backports_1.5.0             magrittr_2.0.3             
##  [15] Hmisc_5.2-0                 sass_0.4.9                 
##  [17] rmarkdown_2.29              jquerylib_0.1.4            
##  [19] yaml_2.3.10                 RUnit_0.4.33               
##  [21] Gviz_1.51.0                 DBI_1.2.3                  
##  [23] buildtools_1.0.0            RColorBrewer_1.1-3         
##  [25] abind_1.4-8                 zlibbioc_1.52.0            
##  [27] purrr_1.0.2                 AnnotationFilter_1.31.0    
##  [29] biovizBase_1.55.0           RCurl_1.98-1.16            
##  [31] nnet_7.3-19                 VariantAnnotation_1.53.0   
##  [33] rappdirs_0.3.3              GenomeInfoDbData_1.2.13    
##  [35] AnnotationForge_1.49.0      maketools_1.3.1            
##  [37] codetools_0.2-20            DelayedArray_0.33.2        
##  [39] xml2_1.3.6                  tidyselect_1.2.1           
##  [41] farver_2.1.2                UCSC.utils_1.3.0           
##  [43] matrixStats_1.4.1           base64enc_0.1-3            
##  [45] GenomicAlignments_1.43.0    jsonlite_1.8.9             
##  [47] Formula_1.2-5               tools_4.4.2                
##  [49] progress_1.2.3              stringdist_0.9.12          
##  [51] Rcpp_1.0.13-1               glue_1.8.0                 
##  [53] gridExtra_2.3               SparseArray_1.7.2          
##  [55] BiocBaseUtils_1.9.0         xfun_0.49                  
##  [57] MatrixGenerics_1.19.0       dplyr_1.1.4                
##  [59] withr_3.0.2                 formatR_1.14               
##  [61] BiocManager_1.30.25         fastmap_1.2.0              
##  [63] latticeExtra_0.6-30         fansi_1.0.6                
##  [65] digest_0.6.37               mime_0.12                  
##  [67] R6_2.5.1                    colorspace_2.1-1           
##  [69] jpeg_0.1-10                 dichromat_2.0-0.1          
##  [71] biomaRt_2.63.0              RSQLite_2.3.8              
##  [73] utf8_1.2.4                  data.table_1.16.2          
##  [75] prettyunits_1.2.0           httr_1.4.7                 
##  [77] htmlwidgets_1.6.4           S4Arrays_1.7.1             
##  [79] pkgconfig_2.0.3             gtable_0.3.6               
##  [81] blob_1.2.4                  sys_3.4.3                  
##  [83] htmltools_0.5.8.1           RBGL_1.83.0                
##  [85] ProtGenerics_1.39.0         scales_1.3.0               
##  [87] Biobase_2.67.0              png_0.1-8                  
##  [89] colorRamps_2.3.4            knitr_1.49                 
##  [91] lambda.r_1.2.4              rstudioapi_0.17.1          
##  [93] reshape2_1.4.4              rjson_0.2.23               
##  [95] checkmate_2.3.2             curl_6.0.1                 
##  [97] biocViews_1.75.0            cachem_1.1.0               
##  [99] stringr_1.5.1               BiocVersion_3.21.1         
## [101] parallel_4.4.2              foreign_0.8-87             
## [103] AnnotationDbi_1.69.0        restfulr_0.0.15            
## [105] pillar_1.9.0                grid_4.4.2                 
## [107] vctrs_0.6.5                 cluster_2.1.6              
## [109] htmlTable_2.4.3             evaluate_1.0.1             
## [111] GenomicFeatures_1.59.1      cli_3.6.3                  
## [113] compiler_4.4.2              futile.options_1.0.1       
## [115] rlang_1.1.4                 crayon_1.5.3               
## [117] labeling_0.4.3              interp_1.1-6               
## [119] plyr_1.8.9                  stringi_1.8.4              
## [121] deldir_2.0-4                BiocParallel_1.41.0        
## [123] BiocCheck_1.43.2            txdbmaker_1.3.1            
## [125] munsell_0.5.1               lazyeval_0.2.2             
## [127] Matrix_1.7-1                BSgenome_1.75.0            
## [129] hms_1.1.3                   bit64_4.5.2                
## [131] ggplot2_3.5.1               KEGGREST_1.47.0            
## [133] SummarizedExperiment_1.37.0 ROCR_1.0-11                
## [135] memoise_2.0.1               bslib_0.8.0                
## [137] bit_4.5.0

References

Birkedal, Ulf, Mikkel Christensen-Dalsgaard, Nicolai Krogh, Radhakrishnan Sabarinathan, Jan Gorodkin, and Henrik Nielsen. 2015. “Profiling of Ribose Methylations in RNA by High-Throughput Sequencing.” Angewandte Chemie (International Ed. In English) 54 (2): 451–55. https://doi.org/10.1002/anie.201408362.
Galvanin, Adeline, Lilia Ayadi, Mark Helm, Yuri Motorin, and Virginie Marchand. 2019. “Mapping and Quantification of tRNA 2’-O-Methylation by RiboMethSeq.” Edited by Narendra Wajapeyee and Romi Gupta, 273–95. https://doi.org/10.1007/978-1-4939-8808-2_21.
Lestrade, Laurent, and Michel J. Weber. 2006. “snoRNA-LBME-db, a Comprehensive Database of Human H/ACA and C/D Box snoRNAs.” Nucleic Acids Research 34 (suppl_1): D158–62. https://doi.org/10.1093/nar/gkj002.
Marchand, Virginie, Lilia Ayadi, Aseel El Hajj, Florence Blanloeil-Oillo, Mark Helm, and Yuri Motorin. 2017. “High-Throughput Mapping of 2’-O-Me Residues in RNA Using Next-Generation Sequencing (Illumina RiboMethSeq Protocol).” Methods in Molecular Biology (Clifton, N.J.) 1562: 171–87. https://doi.org/10.1007/978-1-4939-6807-7_12.