Alternative splicing occurs in most human genes and novel splice isoforms may be associated to disease or tissue specific functions. However, functional interpretation of splicing requires the integration of heterogeneous data sources such as transcriptomic and proteomic information.
We developed maser (Mapping Alternative Splicing Events to pRoteins) package to enable functional characterization of splicing in both transcriptomic and proteomic contexts. Overall, maser allows a detailed analysis of splicing events identified by rMATS by implementing the following functionalities:
A key feature of the package is mapping of splicing events to protein features such as topological domains, motifs and mutation sites provided by the UniprotKB database and visualized along side the genomic location of splicing.
In this manner, maser can quickly identify splicing affecting known protein domains, extracellular and transmembrane regions, as well as mutation sites in the protein.
In this vignette, we describe a basic maser workflow for analyzing alternative splicing, starting with importing splicing events, filtering events based on their coverage and differential expression, and analyzing global or specific spling events with several types of graphics.
The second vignette describes how to use maser for annotation and visualization of protein features affected by splicing.
Throughout the text we use the following abbreviations to describe different types of splicing events:
We demonstrate the package with data generated to investigate alternative splicing in colorectal cancer cells undergoing hypoxia publication. RNA-seq was collected at 0h and 24h after hypoxia in the HCT116 cell line. We applied rMATS to detect splicing events using the release 25 GTF (GRCh38 build) from the Gencode website. The example dataset was reduced because of size constraints.
To import splicing events, we use the constructor function
maser()
indicating the path containing rMATS files, and
labels for conditions. The maser()
constructor returns an
object containing all events quantified by rMATS.
Note: The argument ftype
indicates
which rMATS output files to import. Newer rMATS versions (>4.0.1) use
JCEC
or JC
nomenclature, while
ReadsOnTargetandJunction
or JunctionCountOnly
are used in rMATS 3.2.5 or lower. See ?maser
for a
description of parameters.
library(maser)
library(rtracklayer)
# path to Hypoxia data
path <- system.file("extdata", file.path("MATS_output"),
package = "maser")
hypoxia <- maser(path, c("Hypoxia 0h", "Hypoxia 24h"), ftype = "ReadsOnTargetAndJunctionCounts")
hypoxia
#> A Maser object with 6022 splicing events.
#>
#> Samples description:
#> Label=Hypoxia 0h n=3 replicates
#> Label=Hypoxia 24h n=3 replicates
#>
#> Splicing events:
#> A3SS.......... 459 events
#> A5SS.......... 567 events
#> SE.......... 4081 events
#> RI.......... 722 events
#> MXE.......... 193 events
Splicing events have genomic locations, raw counts, PSI levels and
statistics (p-values) regarding their differential splicing between
conditions. Access to different data types is done using
annotation()
, counts()
, PSI()
,
and summary()
, which takes as argument the
maser
object and event type.
ID | GeneID | geneSymbol | PValue | FDR | IncLevelDifference | PSI_1 | PSI_2 |
---|---|---|---|---|---|---|---|
10022 | ENSG00000072682.18 | P4HA2 | 0 | 0 | -0.331 | 0.567,0.572,0.509 | 0.919,0.912,0.81 |
10161 | ENSG00000175592.8 | FOSL1 | 0 | 0 | -0.090 | 0.877,0.864,0.87 | 0.942,0.969,0.971 |
10162 | ENSG00000175592.8 | FOSL1 | 0 | 0 | -0.125 | 0.821,0.803,0.814 | 0.906,0.952,0.955 |
10344 | ENSG00000121931.15 | LRIF1 | 0 | 0 | 0.310 | 0.398,0.383,0.465 | 0.037,0.076,0.204 |
10730 | ENSG00000273559.4 | CWC25 | 0 | 0 | -0.335 | 0.075,0.105,0.071 | 0.394,0.424,0.439 |
11204 | ENSG00000103121.8 | CMC2 | 0 | 0 | 0.170 | 0.935,0.968,0.939 | 0.841,0.708,0.784 |
Low coverage splicing junctions are commonly found in RNA-seq data
and lead to low confidence PSI levels. We can remove low coverage events
using filterByCoverage()
, which may signficantly reduced
the number of splicing events.
The function topEvents()
allows to select statistically
significant events given a FDR cutoff and minimum PSI change. Default
values are fdr = 0.05
and deltaPSI = 0.1
(ie.
10% minimum change).
hypoxia_top <- topEvents(hypoxia_filt, fdr = 0.05, deltaPSI = 0.1)
hypoxia_top
#> A Maser object with 2882 splicing events.
#>
#> Samples description:
#> Label=Hypoxia 0h n=3 replicates
#> Label=Hypoxia 24h n=3 replicates
#>
#> Splicing events:
#> A3SS.......... 224 events
#> A5SS.......... 285 events
#> SE.......... 1944 events
#> RI.......... 337 events
#> MXE.......... 92 events
Gene specific events can be selected using geneEvents()
.
For instance, there are 8 splicing changes affecting MIB2 as seen
below.
hypoxia_mib2 <- geneEvents(hypoxia_filt, geneS = "MIB2", fdr = 0.05, deltaPSI = 0.1)
print(hypoxia_mib2)
#> A Maser object with 8 splicing events.
#>
#> Samples description:
#> Label=Hypoxia 0h n=3 replicates
#> Label=Hypoxia 24h n=3 replicates
#>
#> Splicing events:
#> A3SS.......... 0 events
#> A5SS.......... 3 events
#> SE.......... 3 events
#> RI.......... 2 events
#> MXE.......... 0 events
Events in a maser
object can be queried using an
interactive data table provided by display()
. The table
allows to look up event information such as gene names, identifiers and
PSI levels.
PSI levels for gene events can be plotted using
plotGenePSI()
, indicating a valid splicing type.
An overview of significant events can be obtained using either
dotplot()
or volcano()
functions, specifying
FDR levels, minimum change in PSI between conditions and splicing type.
Significant events in each condition will be highlighted.
If only significant events should be plotted, then use
topEvents()
combined with volcano()
or
dotplot()
for visualization.
Splicing patterns of individual replicates can also be inspected
using principal component analysis (PCA) and plotted using
pca()
. Also, boxplots of PSI levels distributions for all
events in the maser
object can be visualized using
boxplot_PSI_levels()
. The breakdown of splicing types can
be plotted using splicingDistribution()
and desired
significance thresholds. Please refer to help pages for examples on how
to use these functions.
Users can use maser to visualize Ensembl transcripts
affected by splicing. The core function for transcripts visualization is
plotTranscripts()
. This is a wrapper function for
performing both mapping of splicing events to the transcriptome, as well
as visualization of transcripts that are compatible with the splice
event. The function uses Gviz for
visualization of the genomic locus of splicing event along with involved
transcripts.
Internally, plotTranscripts()
uses
mapTranscriptsToEvents()
to identify compatible transcripts
by overlapping exons involved in splicing to the gene models provided in
the Ensembl GTF. Each type of splice event requires a specific
overlapping rule (described below) and a customized Gviz plot is
created for each splicing type.
plotTranscripts()
requires an Ensembl or Gencode GTF
using the hg38 build of the human genome. Ensembl GTFs can be retrieved
using AnnotationHub
or imported using import.gff()
from the rtracklayer
package. Several GTF releases are available, and maser is
compatible with any version using the hg38 build.
We are going to use a reduced GTF extracted from Ensembl Release 85 for running examples.
## Ensembl GTF annotation
gtf_path <- system.file("extdata", file.path("GTF","Ensembl85_examples.gtf.gz"),
package = "maser")
ens_gtf <- rtracklayer::import.gff(gtf_path)
The most common type of splicing event involves a cassette exon that
is either expressed or skipped. For instance, the splicing factor SRSF6
undergoes splicing during hypoxia by expressing an alternative exon.
plotTranscripts
identify the switch from isoform
SRSF6-001 to the isoform SRSF6-002 after 24hr
hypoxia.
## Retrieve SRSF6 splicing events
srsf6_events <- geneEvents(hypoxia_filt, geneS = "SRSF6", fdr = 0.05,
deltaPSI = 0.1 )
## Dislay affected transcripts and PSI levels
plotTranscripts(srsf6_events, type = "SE", event_id = 33209,
gtf = ens_gtf, zoom = FALSE, show_PSI = TRUE)
In the Gviz, the event track depicts location of exons involved in skipping event. The Inclusion track shows transcripts overlapping the cassette exon as well as both flanking exons (i.e upstream and downstream exons). On the other hand, the skipping track displays transcripts overlapping both flanking exons but missing the cassette exon.
In the exon skipping event, the PSI track displays the inclusion level for the cassette exon (a.k.a. alternative exon). This PSI track is included by default and can be turned off with .
In the example above, there is a significant increase of the cassette exon inclusion after 24h hypoxia. This allows one to predict the direction of the isoform switch, i.e. from isoform SRSF6-001 to the isoform SRSF6-002 in hypoxic conditions.
Intron retention is a type of splice event that are usually associated to decreased protein translation. In this case, the Retention track shows transcripts with an exact overlap of the retained intron, and the Non-retention tracks will display transcripts in which the intron is spliced out and overlap flanking exons. Here the PSI track refers to the inclusion level of the retained intron. The example below shows the increase in retained intron affecting STAT2 in hypoxia.
This event refers to adjacent exons that are mutually exclusive, i.e. are not expressed together. There are 36 MXE events in the hypoxia dataset, including one affecting the IL32 gene. Tracks will display transcripts harboring the first or second mutually exclusive exons, as well as both flanking exons.
The PSI track in the mutually exclusive exons event wil show two sets of boxplots. The first set refers to Exon 1 PSI levels while the second set refers to Exon 2 PSI levels in the two conditions. Therefore, the example below denotes increased Exon 2 PSI in hypoxia 24h.
Alternative 5’ splicing occur due to alternative donor sites, while altenative 3’ is caused by change in the acceptor sites. Practically, these splicing events will lead to expression of longer or shorter exons.
The short exon track shows transcripts overlapping both short and flanking exons, while the long exon track shows transcripts overlapping both long and flanking exons.
In these type of events, the PSI track indicates inclusion
levels for the longest exon. It might be useful for visualization to set
zoom = TRUE
when the alternative splicing generates exons
of similar sizes.
In the example below, BCS1L contains an exon with alternative 5’ splice site, being that the longer exon has a signficantly higher PSI in normal condition (0h), while the shorter exon has higher PSI in hypoxia.
Here is the output of sessionInfo()
on the system on
which this document was compiled:
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] rtracklayer_1.67.0 maser_1.25.0 GenomicRanges_1.59.1
#> [4] GenomeInfoDb_1.43.1 IRanges_2.41.1 S4Vectors_0.45.2
#> [7] BiocGenerics_0.53.3 generics_0.1.3 ggplot2_3.5.1
#> [10] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3
#> [3] rstudioapi_0.17.1 jsonlite_1.8.9
#> [5] magrittr_2.0.3 GenomicFeatures_1.59.1
#> [7] farver_2.1.2 rmarkdown_2.29
#> [9] BiocIO_1.17.0 zlibbioc_1.52.0
#> [11] vctrs_0.6.5 memoise_2.0.1
#> [13] Rsamtools_2.23.0 RCurl_1.98-1.16
#> [15] base64enc_0.1-3 htmltools_0.5.8.1
#> [17] S4Arrays_1.7.1 progress_1.2.3
#> [19] curl_6.0.1 SparseArray_1.7.2
#> [21] Formula_1.2-5 sass_0.4.9
#> [23] bslib_0.8.0 htmlwidgets_1.6.4
#> [25] plyr_1.8.9 Gviz_1.51.0
#> [27] httr2_1.0.6 cachem_1.1.0
#> [29] buildtools_1.0.0 GenomicAlignments_1.43.0
#> [31] lifecycle_1.0.4 pkgconfig_2.0.3
#> [33] Matrix_1.7-1 R6_2.5.1
#> [35] fastmap_1.2.0 GenomeInfoDbData_1.2.13
#> [37] MatrixGenerics_1.19.0 digest_0.6.37
#> [39] colorspace_2.1-1 AnnotationDbi_1.69.0
#> [41] crosstalk_1.2.1 Hmisc_5.2-0
#> [43] RSQLite_2.3.8 labeling_0.4.3
#> [45] filelock_1.0.3 fansi_1.0.6
#> [47] httr_1.4.7 abind_1.4-8
#> [49] compiler_4.4.2 bit64_4.5.2
#> [51] withr_3.0.2 htmlTable_2.4.3
#> [53] backports_1.5.0 BiocParallel_1.41.0
#> [55] DBI_1.2.3 biomaRt_2.63.0
#> [57] rappdirs_0.3.3 DelayedArray_0.33.2
#> [59] rjson_0.2.23 tools_4.4.2
#> [61] foreign_0.8-87 nnet_7.3-19
#> [63] glue_1.8.0 restfulr_0.0.15
#> [65] grid_4.4.2 checkmate_2.3.2
#> [67] reshape2_1.4.4 cluster_2.1.6
#> [69] gtable_0.3.6 BSgenome_1.75.0
#> [71] ensembldb_2.31.0 data.table_1.16.2
#> [73] hms_1.1.3 xml2_1.3.6
#> [75] utf8_1.2.4 XVector_0.47.0
#> [77] pillar_1.9.0 stringr_1.5.1
#> [79] dplyr_1.1.4 BiocFileCache_2.15.0
#> [81] lattice_0.22-6 deldir_2.0-4
#> [83] bit_4.5.0 biovizBase_1.55.0
#> [85] tidyselect_1.2.1 maketools_1.3.1
#> [87] Biostrings_2.75.1 knitr_1.49
#> [89] gridExtra_2.3 ProtGenerics_1.39.0
#> [91] SummarizedExperiment_1.37.0 xfun_0.49
#> [93] Biobase_2.67.0 matrixStats_1.4.1
#> [95] DT_0.33 stringi_1.8.4
#> [97] UCSC.utils_1.3.0 lazyeval_0.2.2
#> [99] yaml_2.3.10 evaluate_1.0.1
#> [101] codetools_0.2-20 interp_1.1-6
#> [103] tibble_3.2.1 BiocManager_1.30.25
#> [105] cli_3.6.3 rpart_4.1.23
#> [107] munsell_0.5.1 jquerylib_0.1.4
#> [109] Rcpp_1.0.13-1 dichromat_2.0-0.1
#> [111] dbplyr_2.5.0 png_0.1-8
#> [113] XML_3.99-0.17 parallel_4.4.2
#> [115] blob_1.2.4 prettyunits_1.2.0
#> [117] jpeg_0.1-10 latticeExtra_0.6-30
#> [119] AnnotationFilter_1.31.0 bitops_1.0-9
#> [121] VariantAnnotation_1.53.0 scales_1.3.0
#> [123] crayon_1.5.3 rlang_1.1.4
#> [125] KEGGREST_1.47.0