ScanMiRApp offers a shiny interface to the scanMiR package, as well as convenience functions to simplify its use with common annotations.
Both the shiny app and the convenience functions rely on objects of the class ScanMiRAnno, which contain the different pieces of annotation relating to a species and genome build. Annotations for human (GRCh38), mouse (GRCm38) and rat (Rnor_6) can be obtained as follows:
library(scanMiRApp)
# anno <- ScanMiRAnno("Rnor_6")
# for this vignette, we'll work with a lightweight fake annotation:
anno <- ScanMiRAnno("fake")
anno
## Genome: /tmp/RtmpEBnMoQ/file230d54a9050a
## Annotation: Fake falsus (fake1)
## Models: KdModelList of length 1
You can also build your own ScanMiRAnno object by providing the function with the different components (minimally, a BSgenome and an ensembldb object; see ?ScanMiRAnno for more information). For minimal functioning with the shiny app, the models slot also needs to be populated with a KdModelList (see the corresponding vignette of the scanMiR package for more information).
In addition, ScanMiRAnno objects can contain pre-compiled scans and aggregations, which are mainly intended to speed up the shiny application. These should be saved as IndexedFst files, indexed by transcript and by miRNA respectively, and stored in the scan and aggregated slots of the object.
The transcript (or UTR) sequence for any (set of) transcript(s) in the annotation can be obtained with:
## DNAStringSet object of length 1:
## width seq names
## [1] 688 CGTATTAAATTTAGCAAGGTTCC...ACCTTCAGATTTCAGCAGACTAG ENSTFAKE0000056456
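A minimal sketch of retrieving such a sequence with the lightweight fake annotation follows. It assumes that scanMiRApp's getTranscriptSequence() takes the transcript ID(s) and the ScanMiRAnno object as its first two arguments. The transcript ID is the one shown in the output above. See ?getTranscriptSequence for further options.

```r
library(scanMiRApp)

# lightweight fake annotation, as used throughout this vignette
anno <- ScanMiRAnno("fake")

# assumption: getTranscriptSequence(tx, annotation, ...) returns a DNAStringSet
seqs <- getTranscriptSequence("ENSTFAKE0000056456", anno)
seqs
```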
Binding sites of a given miRNA on a transcript can be visualized with:
## Prepare miRNA model
## Get Transcript Sequence
## Scan
This will fetch the sequence, perform the scan, and plot the results.
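The following is a hedged sketch of such a call. It assumes that plotSitesOnUTR() accepts tx, annotation and miRNA arguments. It also reads the miRNA name from the fake annotation's single model instead of hard-coding one. The variable name mir is illustrative only.

```r
library(scanMiRApp)

anno <- ScanMiRAnno("fake")

# the fake annotation holds a KdModelList of length 1; take that model's name
mir <- names(anno$models)[1]

# assumption: plotSitesOnUTR(tx, annotation, miRNA, ...) fetches, scans and plots
plotSitesOnUTR(tx = "ENSTFAKE0000056456", annotation = anno, miRNA = mir)
```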
The runFullScan function can be used to launch the scan for all miRNAs on all protein-coding transcripts (or their UTRs) of a genome. These scans can then be used to speed up the shiny app (see below). They can simply be launched as:
## Loading annotation
## Extracting transcripts
## Scanning with 1 thread(s)
## GRanges object with 2 ranges and 4 metadata columns:
## seqnames ranges strand | type log_kd p3.score note
## <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <Rle>
## [1] ENSTFAKE0000056456 281-288 * | 8mer -4868 12 TDMD?
## [2] ENSTFAKE0000056456 482-489 * | 7mer-m8 -3702 0 -
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Multi-threading can be enabled through the ncores argument. See ?runFullScan for more options.
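The result is a GRanges object, so standard GenomicRanges subsetting applies. This is a minimal sketch on the fake annotation that tabulates site types and keeps only 8mer and 7mer-m8 sites. It relies only on the type metadata column visible in the output above. The name canonical is illustrative.

```r
library(scanMiRApp)
library(GenomicRanges)

anno <- ScanMiRAnno("fake")
fullScan <- runFullScan(anno)  # single-threaded by default

# number of sites per site type (the `type` metadata column)
table(fullScan$type)

# keep only the 8mer and 7mer-m8 sites
canonical <- fullScan[fullScan$type %in% c("8mer", "7mer-m8")]
canonical
```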
The enrichedMirTxPairs function identifies miRNA-target enrichments (which could indicate sponge- or cargo-like behaviors) using a binomial model. The model estimates the probability of the observed number of binding sites for a given pair, given the total number of binding sites for the miRNA in question (across all transcripts) and for the transcript in question (across all miRNAs). The output is a data.frame indicating, for each pair passing some lenient filtering, the transcript, the miRNA, the number of 7mer/8mer sites, and the binomial log(p-value) of the combination. We strongly recommend further filtering this list by expression of the transcript in the system of interest. Otherwise, some transcripts with very low expression (and hence biologically irrelevant) might come up as strongly enriched.
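The binomial reasoning can be illustrated on a toy count matrix in base R. This is a simplified sketch of the idea, not the package's own code. It assumes that the expected share of a miRNA's sites falling on a transcript equals that transcript's share of all sites. It then reports the upper-tail log p-value of observing at least the given number of sites. enrichedMirTxPairs() may model this differently. The counts, expression values and names (binomEnrichment, tx1, miR-a, ...) are hypothetical.

```r
# Toy 7mer/8mer site counts (hypothetical): rows = transcripts, columns = miRNAs
sites <- matrix(c(12, 1, 0,
                   2, 3, 1,
                   1, 2, 4),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("tx1", "tx2", "tx3"),
                                c("miR-a", "miR-b", "miR-c")))

# For each pair: n = all sites of the miRNA (across transcripts),
# p = the transcript's share of all sites (across miRNAs),
# k = observed sites for the pair; returns log P(X >= k) with X ~ Binom(n, p)
binomEnrichment <- function(sites) {
  txShare <- rowSums(sites) / sum(sites)
  mirTotal <- colSums(sites)
  res <- expand.grid(transcript = rownames(sites), miRNA = colnames(sites),
                     stringsAsFactors = FALSE)
  res$sites <- sites[cbind(res$transcript, res$miRNA)]
  res$logp <- pbinom(res$sites - 1, size = mirTotal[res$miRNA],
                     prob = txShare[res$transcript],
                     lower.tail = FALSE, log.p = TRUE)
  res <- res[order(res$logp), ]
  rownames(res) <- NULL
  res
}

res <- binomEnrichment(sites)
head(res, 3)

# Filter by (hypothetical) expression in the system of interest, e.g. TPM >= 1
expr <- c(tx1 = 0.2, tx2 = 35, tx3 = 12)
res[expr[res$transcript] >= 1, ]
```

In this toy example, tx1/miR-a is the most enriched pair. Because tx1 is barely expressed, the expression filter removes it, which is exactly the situation the recommendation above guards against.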
The features of the shiny app are organized into two main components:
- Transcript-centered (or sequence-centered) features are available in the "search in gene/sequence" tab. For instance, these allow you to scan custom sequences or selected transcript sequences for miRNA binding sites, visualize the sites on the transcript, and visualize the sequence pairing of specific matches.
- miRNA-centered features are available in the "miRNA-based" tab, which shows the general binding specificity of a given miRNA. If the ScanMiRAnno object contains aggregated data (see below), the tab also shows the top predicted targets of the miRNA.
A ScanMiRAnno object is the minimal input for the shiny app, and multiple such objects can be provided in the form of a named list.
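A minimal sketch of passing a named list follows, using the same scanMiRApp(list(...)) form as the not-run example later in this section. The fake annotation keeps it lightweight; real entries would be, for example, ScanMiRAnno("Rnor_6") or ScanMiRAnno("GRCm38"). The launch is wrapped in interactive() so that the script does not block when run non-interactively. The name annos is illustrative.

```r
library(scanMiRApp)

# named list of ScanMiRAnno objects (one entry here; more can be added)
annos <- list(fake = ScanMiRAnno("fake"))

# launching the app blocks the R session, so only do it interactively
if (interactive()) scanMiRApp(annos)
```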
Launched with such an object alone, the app will not have access to any pre-compiled scans or aggregated data. Scans will therefore be performed on the fly, which is slower, and the top targets based on aggregated repression estimates (in the miRNA-based tab) will not be available. To provide this additional information, you first need to prepare the objects as IndexedFst files. Assuming you’ve saved (or downloaded) the scans as scan.rds and the aggregated data as aggregated.rds, you can re-save them as IndexedFst files (here in the folder out_path) and add them to the anno object as follows:
# not run
anno <- ScanMiRAnno("Rnor_6")
saveIndexedFst(readRDS("scan.rds"), "seqnames", file.prefix="out_path/scan")
saveIndexedFst(readRDS("aggregated.rds"), "miRNA",
file.prefix="out_path/aggregated")
anno$scan <- loadIndexedFst("out_path/scan")
anno$aggregated <- loadIndexedFst("out_path/aggregated")
# then launch the app
scanMiRApp(list(Rnor_6=anno))
The same could be done for multiple ScanMiRAnno objects. If scanMiRApp is launched without any annotation argument, it will generate anno objects for the three base species (without any pre-compiled data).
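The IndexedFst round trip can also be tried offline. This sketch saves the fake annotation's own scan instead of a downloaded scan.rds and writes it to the session's temporary directory. It assumes that saveIndexedFst() accepts the GRanges returned by runFullScan(), just as the not-run example passes a scan read from scan.rds. The nthreads argument of loadIndexedFst() is the one mentioned for multi-threaded reading. The name prefix is illustrative.

```r
library(scanMiRApp)

anno <- ScanMiRAnno("fake")
fullScan <- runFullScan(anno)

# save the scan as IndexedFst, indexed by transcript (seqnames), in tempdir()
prefix <- file.path(tempdir(), "scan")
saveIndexedFst(fullScan, "seqnames", file.prefix = prefix)

# reload it and attach it to the annotation; nthreads controls reading threads
anno$scan <- loadIndexedFst(prefix, nthreads = 1)
anno$scan
```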
Multi-threading can be enabled in the shiny app by calling scanMiRApp() (or the underlying scanMiRserver()) with the BP argument, e.g.:
scanMiRApp(..., BP=BiocParallel::MulticoreParam(ncores))
where ncores is the number of threads to use. This enables multi-threading for the scanning functions, which makes a big difference when scanning for many miRNAs at a time. Multi-threading can also be used to read the IndexedFst files, through the nthreads argument of the loadIndexedFst function. However, reading is already quite fast with a single core, so the improvement there is typically marginal.
By default, the app has a caching system: if a user launches the same scan with the same parameters twice, the results are re-used instead of re-computed. The cache has a maximum size per user (10MB by default), beyond which older cache items are removed. The cache size can be adjusted through the maxCacheSize argument.
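The idea of a size-capped result cache can be illustrated with a generic base-R sketch. This is not the app's internal implementation. It uses an assumed first-in-first-out eviction, keyed on the deparsed scan parameters, with sizes measured by object.size(). makeResultCache(), fetch() and fakeScan() are hypothetical names, and nothing here implies anything about the units of maxCacheSize.

```r
# Generic size-capped result cache (hypothetical sketch, not scanMiRApp code)
makeResultCache <- function(maxBytes = 10 * 1024^2) {
  store <- new.env(hash = TRUE)
  keys <- character(0)  # insertion order, oldest first
  totalBytes <- 0
  hits <- 0L
  fetch <- function(params, compute) {
    # identical parameters produce identical keys
    key <- paste(deparse(params), collapse = "")
    if (exists(key, envir = store, inherits = FALSE)) {
      hits <<- hits + 1L
      return(get(key, envir = store, inherits = FALSE))
    }
    value <- compute(params)
    assign(key, value, envir = store)
    keys <<- c(keys, key)
    totalBytes <<- totalBytes + as.numeric(object.size(value))
    # drop the oldest entries while over the cap, always keeping the newest one
    while (totalBytes > maxBytes && length(keys) > 1) {
      oldest <- keys[1]
      totalBytes <<- totalBytes - as.numeric(object.size(get(oldest, envir = store)))
      rm(list = oldest, envir = store)
      keys <<- keys[-1]
    }
    value
  }
  list(fetch = fetch, hits = function() hits, n = function() length(keys))
}

# a stand-in for an expensive scan (hypothetical)
fakeScan <- function(p) numeric(p$n)

cache <- makeResultCache(maxBytes = 5000)
first <- cache$fetch(list(miRNA = "miR-1", n = 200), fakeScan)
again <- cache$fetch(list(miRNA = "miR-1", n = 200), fakeScan)  # re-used
other <- cache$fetch(list(miRNA = "miR-2", n = 500), fakeScan)  # evicts the oldest
```

Here the second request with identical parameters is served from the cache. The third request pushes the total size over the cap, so the oldest entry is dropped.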
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GenomicRanges_1.59.1 GenomeInfoDb_1.43.1 IRanges_2.41.1
## [4] S4Vectors_0.45.2 BiocGenerics_0.53.3 generics_0.1.3
## [7] fstcore_0.9.18 scanMiRApp_1.13.0 scanMiR_1.13.0
## [10] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] sys_3.4.3 jsonlite_1.8.9
## [3] magrittr_2.0.3 shinyjqui_0.4.1
## [5] GenomicFeatures_1.59.1 farver_2.1.2
## [7] rmarkdown_2.29 BiocIO_1.17.0
## [9] zlibbioc_1.52.0 vctrs_0.6.5
## [11] memoise_2.0.1 Rsamtools_2.23.0
## [13] RCurl_1.98-1.16 htmltools_0.5.8.1
## [15] S4Arrays_1.7.1 progress_1.2.3
## [17] AnnotationHub_3.15.0 curl_6.0.1
## [19] SparseArray_1.7.2 sass_0.4.9
## [21] bslib_0.8.0 htmlwidgets_1.6.4
## [23] httr2_1.0.6 plotly_4.10.4
## [25] cachem_1.1.0 buildtools_1.0.0
## [27] GenomicAlignments_1.43.0 mime_0.12
## [29] lifecycle_1.0.4 pkgconfig_2.0.3
## [31] Matrix_1.7-1 R6_2.5.1
## [33] fastmap_1.2.0 GenomeInfoDbData_1.2.13
## [35] MatrixGenerics_1.19.0 shiny_1.9.1
## [37] digest_0.6.37 colorspace_2.1-1
## [39] AnnotationDbi_1.69.0 shinycssloaders_1.1.0
## [41] RSQLite_2.3.8 labeling_0.4.3
## [43] seqLogo_1.73.0 filelock_1.0.3
## [45] fansi_1.0.6 httr_1.4.7
## [47] abind_1.4-8 compiler_4.4.2
## [49] withr_3.0.2 bit64_4.5.2
## [51] BiocParallel_1.41.0 DBI_1.2.3
## [53] biomaRt_2.63.0 rappdirs_0.3.3
## [55] DelayedArray_0.33.2 waiter_0.2.5
## [57] rjson_0.2.23 tools_4.4.2
## [59] httpuv_1.6.15 fst_0.9.8
## [61] glue_1.8.0 restfulr_0.0.15
## [63] promises_1.3.0 grid_4.4.2
## [65] gtable_0.3.6 tidyr_1.3.1
## [67] ensembldb_2.31.0 data.table_1.16.2
## [69] hms_1.1.3 xml2_1.3.6
## [71] utf8_1.2.4 XVector_0.47.0
## [73] stringr_1.5.1 BiocVersion_3.21.1
## [75] pillar_1.9.0 later_1.3.2
## [77] rintrojs_0.3.4 dplyr_1.1.4
## [79] BiocFileCache_2.15.0 lattice_0.22-6
## [81] rtracklayer_1.67.0 bit_4.5.0
## [83] tidyselect_1.2.1 maketools_1.3.1
## [85] Biostrings_2.75.1 knitr_1.49
## [87] ProtGenerics_1.39.0 SummarizedExperiment_1.37.0
## [89] xfun_0.49 shinydashboard_0.7.2
## [91] Biobase_2.67.0 matrixStats_1.4.1
## [93] DT_0.33 stringi_1.8.4
## [95] UCSC.utils_1.3.0 lazyeval_0.2.2
## [97] yaml_2.3.10 evaluate_1.0.1
## [99] codetools_0.2-20 tibble_3.2.1
## [101] BiocManager_1.30.25 cli_3.6.3
## [103] xtable_1.8-4 munsell_0.5.1
## [105] jquerylib_0.1.4 Rcpp_1.0.13-1
## [107] dbplyr_2.5.0 png_0.1-8
## [109] XML_3.99-0.17 parallel_4.4.2
## [111] ggplot2_3.5.1 blob_1.2.4
## [113] prettyunits_1.2.0 AnnotationFilter_1.31.0
## [115] bitops_1.0-9 pwalign_1.3.0
## [117] txdbmaker_1.3.0 viridisLite_0.4.2
## [119] scales_1.3.0 scanMiRData_1.12.0
## [121] purrr_1.0.2 crayon_1.5.3
## [123] rlang_1.1.4 cowplot_1.1.3
## [125] KEGGREST_1.47.0