In this vignette, we describe a maser workflow for annotation and visualization of protein features affected by splicing.
Integrating protein features with splicing events may reveal the impact of alternative splicing on protein function. We developed maser to enable systematic mapping of protein annotation from UniprotKB to splicing events.
Protein features can be annotated and visualized along with the transcripts affected by the splice event. In this manner, maser can identify whether splicing affects regions of interest containing known domains or motifs, mutations, post-translational modifications and other described protein structural features.
We illustrate the workflow using the hypoxia dataset from the previous vignette.
Available UniprotKB protein annotation can be queried using availableFeaturesUniprotKB(). Currently there are 30 distinct features grouped into broader categories, including Domain and Sites, PTM (post-translational modifications), Molecule Processing, Topology, Mutagenesis and Structural Features. Ten of them are shown below.
Name | Description | Category |
---|---|---|
DNA-bind | UniProtKB DNA binding site sequence annotations | Domain_and_Sites |
Zn-fing | UniProtKB Zinc finger sequence annotations | Domain_and_Sites |
act-site | UniProtKB active site sequence annotations | Domain_and_Sites |
binding | UniProtKB binding site sequence annotations | Domain_and_Sites |
coiled | UniProtKB coiled coil sequence annotations | Domain_and_Sites |
domain | UniProtKB domain sequence annotations | Domain_and_Sites |
motif | UniProtKB motif of interest sequence annotations | Domain_and_Sites |
region | UniProtKB region of interest sequence annotations | Domain_and_Sites |
repeat | UniProtKB repeated motifs or domains sequence annotations | Domain_and_Sites |
site | UniProtKB single amino acid site sequence annotations | Domain_and_Sites |
Protein feature annotation of splicing events is performed in two steps.
1. Use mapTranscriptsToEvents() to add both transcript and protein IDs to all events in the maser object.
2. Use mapProteinFeaturesToEvents() for annotation specifying UniprotKB features or categories.
mapTranscriptsToEvents() identifies transcripts compatible with the splicing event by overlapping the exons involved in splicing with the gene models provided in the Ensembl GTF. Each type of splice event applies a specific overlapping rule (described in the Introduction vignette). The function also maps transcripts to their corresponding protein identifiers in Uniprot when available.
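As a rough illustration of what such an overlapping rule can look like, the base R sketch below classifies toy transcripts for an exon skipping event. The coordinates, the helper names exon_key() and classify_se(), and the exact-coordinate matching criterion are all assumptions made for illustration; the rule maser actually applies is the one described in the Introduction vignette. The labels mirror the txn_3exons and txn_2exons columns that appear in the annotation table later in this vignette.

```r
# Toy example (hypothetical coordinates): which transcripts are compatible
# with the inclusion or skipping form of an exon skipping (SE) event?
exon_key <- function(start, end) paste(start, end, sep = "-")

# The three exons that define the event
event <- c(upstream   = exon_key(100, 200),
           target     = exon_key(300, 350),
           downstream = exon_key(500, 600))

# Exons of three made-up transcripts
transcripts <- list(
  tx_inclusion = exon_key(c(100, 300, 500), c(200, 350, 600)),
  tx_skipping  = exon_key(c(100, 500), c(200, 600)),
  tx_unrelated = exon_key(700, 800)
)

# Assumed rule: both flanking exons must be present; the target exon
# decides between the 3-exon (inclusion) and 2-exon (skipping) form
classify_se <- function(tx_exons, event) {
  has_flanks <- all(event[c("upstream", "downstream")] %in% tx_exons)
  has_target <- event[["target"]] %in% tx_exons
  if (has_flanks && has_target) {
    "3exons"
  } else if (has_flanks) {
    "2exons"
  } else {
    "none"
  }
}

compat <- vapply(transcripts, classify_se, character(1), event = event)
print(compat)
```

Matching on exact exon coordinates keeps the sketch short; a real implementation would operate on genomic ranges from the GTF and would need one rule per event type.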
mapTranscriptsToEvents() requires an Ensembl or Gencode GTF using the hg38 build of the human genome. Ensembl GTFs can be retrieved using AnnotationHub or imported using import.gff() from the rtracklayer package. Several GTF releases are available, and maser is compatible with any version using the hg38 build.
We use a reduced GTF extracted from Ensembl Release 85 to run the examples.
## Ensembl GTF annotation
gtf_path <- system.file("extdata", file.path("GTF","Ensembl85_examples.gtf.gz"),
package = "maser")
ens_gtf <- rtracklayer::import.gff(gtf_path)
In the second step, mapProteinFeaturesToEvents() retrieves data from UniprotKB and overlaps splicing events with the genomic coordinates of protein features.
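The overlap idea can be sketched in a few lines of base R. The exon and feature coordinates below are hypothetical, and the closed-interval test is a simplification for illustration rather than the code maser runs; in practice, range overlaps in Bioconductor are usually computed on GRanges objects with findOverlaps() from GenomicRanges.

```r
# Hypothetical genomic coordinates, for illustration only
exon <- c(start = 1500, end = 1650)

features <- data.frame(
  name  = c("domain_A", "region_B", "domain_C"),
  start = c(1000, 1600, 2000),
  end   = c(1400, 1900, 2300),
  stringsAsFactors = FALSE
)

# Two closed intervals overlap when each one starts before the other ends
overlaps_exon <- features$start <= exon[["end"]] &
  features$end >= exon[["start"]]

hits <- features$name[overlaps_exon]
print(hits)
```

Only region_B shares positions with the exon here, so it is the only feature that would be reported for the event in this toy setting.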
The splicing factor SRSF6 undergoes splicing during hypoxia by expressing an alternative exon. We will annotate the exon skipping event with domain, sites and topology information. The first step is to obtain a maser object containing SRSF6 splicing information, and then map transcripts to splicing events.
# Retrieve gene specific splicing events
srsf6_events <- geneEvents(hypoxia_filt, "SRSF6")
srsf6_events
#> A Maser object with 1 splicing events.
#>
#> Samples description:
#> Label=Hypoxia 0h n=3 replicates
#> Label=Hypoxia 24h n=3 replicates
#>
#> Splicing events:
#> A3SS.......... 0 events
#> A5SS.......... 0 events
#> SE.......... 1 events
#> RI.......... 0 events
#> MXE.......... 0 events
# Map splicing events to transcripts using the Ensembl GTF imported above
srsf6_mapped <- mapTranscriptsToEvents(srsf6_events, ens_gtf)
head(annotation(srsf6_mapped, "SE"))
If transcript mapping worked correctly, Ensembl and Uniprot identifiers will be added to splicing events. Possible NA values indicate non-protein coding transcripts. In this case, the splicing involves two Ensembl transcripts coding for the Q13247 isoform of SRSF6.
ID | GeneID | geneSymbol | txn_3exons | txn_2exons | list_ptn_a | list_ptn_b |
---|---|---|---|---|---|---|
33209 | ENSG00000124193.14 | SRSF6 | ENST00000483871 | ENST00000244020 | Q13247 | Q13247 |
Now we are ready to call mapProteinFeaturesToEvents() for annotation. Feature annotation can be interactively displayed in a web browser using display() or retrieved as a data.frame using annotation().
mapProteinFeaturesToEvents() will add extra columns describing the feature name, feature description and protein identifiers for which the annotation has been assigned. Possible NA values indicate that the particular feature is not annotated for the splice event.
# Annotate splicing events with protein features
srsf6_annot <- mapProteinFeaturesToEvents(srsf6_mapped,
                                          c("Domain_and_Sites", "Topology"),
                                          by = "category")
ID | GeneID | geneSymbol | txn_3exons | txn_2exons | list_ptn_a | list_ptn_b | DNA-bind | Zn-fing | act-site | binding | coiled | domain | motif | region | repeat | site | intramem | topo-dom | transmem |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
33209 | ENSG00000124193.14 | SRSF6 | ENST00000483871 | ENST00000244020 | Q13247 | Q13247 | NA | NA | NA | NA | NA | Q13247:M1-G72;RRM1,A0A590UJK4:P2-G72;RRM,A0A590UJP7:P2-G72;RRM,A0A590UK01:P2-G72;RRM,A0A590UK80:X1-G66;RRM,A0A590UJK4:Y110-P183;RRM,A0A590UJP7:Y110-P183;RRM,A0A590UK01:Y110-P183;RRM,Q13247:Y110-P183;RRM2,A0A590UK80:Y104-P177;RRM | NA | A0A590UJK4:R75-G103;Disordered,A0A590UJP7:R75-G103;Disordered,A0A590UK01:R75-G103;Disordered,Q13247:R75-G103;Disordered,A0A590UK80:R69-G97;Disordered | NA | NA | NA | NA | NA |
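The annotation cells are dense, so it can help to unpack them. The base R sketch below works on a toy one-row table whose values are a subset copied from the SRSF6 row above. It first lists which feature columns are not NA, then splits one cell into a data.frame. The entry layout (protein:residue-residue;description, separated by commas) is inferred from the table rather than taken from maser documentation, and parse_feature_annot() is a hypothetical helper, not a maser function.

```r
# Toy one-row annotation table; values are a subset of the SRSF6 row above
annot <- data.frame(
  ID     = 33209,
  coiled = NA_character_,
  domain = "Q13247:M1-G72;RRM1,A0A590UJK4:P2-G72;RRM,Q13247:Y110-P183;RRM2",
  motif  = NA_character_,
  region = "Q13247:R75-G103;Disordered",
  stringsAsFactors = FALSE
)

# Feature columns that carry an annotation for this event
feature_cols <- setdiff(names(annot), "ID")
annotated <- feature_cols[!is.na(unlist(annot[1, feature_cols]))]
print(annotated)

# Hypothetical helper: split one cell into protein, residue range, description.
# Assumes entries look like <protein>:<aa><start>-<aa><end>;<description>
# and that descriptions contain no commas.
parse_feature_annot <- function(x) {
  entries <- strsplit(x, ",", fixed = TRUE)[[1]]
  pattern <- "^([^:]+):[A-Z]([0-9]+)-[A-Z]([0-9]+);(.*)$"
  m <- regmatches(entries, regexec(pattern, entries))
  data.frame(
    protein     = vapply(m, `[`, character(1), 2),
    start       = as.integer(vapply(m, `[`, character(1), 3)),
    end         = as.integer(vapply(m, `[`, character(1), 4)),
    description = vapply(m, `[`, character(1), 5),
    stringsAsFactors = FALSE
  )
}

domains <- parse_feature_annot(annot$domain)

# Keep only the entries for the reviewed SRSF6 accession
q13247 <- domains[domains$protein == "Q13247", ]
print(q13247)
```

Filtering on a single accession such as Q13247 removes the repeated entries that come from the other accessions listed in the same cell.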
By inspecting the results, we see that the SRSF6 exon skipping event is annotated with the Uniprot features domain and region; the other requested features are NA. Visualization of the splice event, transcripts and protein features is performed with plotUniprotKBFeatures(). In this example, exons in the splice event overlap the Serine/arginine-rich splicing factor 6 region of the protein, while the upstream and downstream exons overlap the RRM1 and RRM2 domains of SRSF6, respectively.
RIPK2 also has an exon skipping event in the hypoxia dataset. Following the example above, we map transcripts to splicing events and annotate protein features overlapping the splice event. The alternative exon overlaps the kinase domain of the protein, thus possibly changing the configuration of this domain during hypoxia. The ATP and proton acceptor binding sites overlap the exons flanking the alternative exon.
Here is the output of sessionInfo() on the system on which this document was compiled:
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] rtracklayer_1.67.0 maser_1.25.0 GenomicRanges_1.59.1
#> [4] GenomeInfoDb_1.43.1 IRanges_2.41.1 S4Vectors_0.45.2
#> [7] BiocGenerics_0.53.3 generics_0.1.3 ggplot2_3.5.1
#> [10] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3
#> [3] rstudioapi_0.17.1 jsonlite_1.8.9
#> [5] magrittr_2.0.3 GenomicFeatures_1.59.1
#> [7] farver_2.1.2 rmarkdown_2.29
#> [9] BiocIO_1.17.0 zlibbioc_1.52.0
#> [11] vctrs_0.6.5 memoise_2.0.1
#> [13] Rsamtools_2.23.0 RCurl_1.98-1.16
#> [15] base64enc_0.1-3 htmltools_0.5.8.1
#> [17] S4Arrays_1.7.1 progress_1.2.3
#> [19] curl_6.0.1 SparseArray_1.7.2
#> [21] Formula_1.2-5 sass_0.4.9
#> [23] bslib_0.8.0 htmlwidgets_1.6.4
#> [25] plyr_1.8.9 Gviz_1.51.0
#> [27] httr2_1.0.6 cachem_1.1.0
#> [29] buildtools_1.0.0 GenomicAlignments_1.43.0
#> [31] lifecycle_1.0.4 pkgconfig_2.0.3
#> [33] Matrix_1.7-1 R6_2.5.1
#> [35] fastmap_1.2.0 GenomeInfoDbData_1.2.13
#> [37] MatrixGenerics_1.19.0 digest_0.6.37
#> [39] colorspace_2.1-1 AnnotationDbi_1.69.0
#> [41] crosstalk_1.2.1 Hmisc_5.2-0
#> [43] RSQLite_2.3.8 labeling_0.4.3
#> [45] filelock_1.0.3 fansi_1.0.6
#> [47] httr_1.4.7 abind_1.4-8
#> [49] compiler_4.4.2 bit64_4.5.2
#> [51] withr_3.0.2 htmlTable_2.4.3
#> [53] backports_1.5.0 BiocParallel_1.41.0
#> [55] DBI_1.2.3 biomaRt_2.63.0
#> [57] rappdirs_0.3.3 DelayedArray_0.33.2
#> [59] rjson_0.2.23 tools_4.4.2
#> [61] foreign_0.8-87 nnet_7.3-19
#> [63] glue_1.8.0 restfulr_0.0.15
#> [65] grid_4.4.2 checkmate_2.3.2
#> [67] reshape2_1.4.4 cluster_2.1.6
#> [69] gtable_0.3.6 BSgenome_1.75.0
#> [71] ensembldb_2.31.0 data.table_1.16.2
#> [73] hms_1.1.3 xml2_1.3.6
#> [75] utf8_1.2.4 XVector_0.47.0
#> [77] pillar_1.9.0 stringr_1.5.1
#> [79] dplyr_1.1.4 BiocFileCache_2.15.0
#> [81] lattice_0.22-6 deldir_2.0-4
#> [83] bit_4.5.0 biovizBase_1.55.0
#> [85] tidyselect_1.2.1 maketools_1.3.1
#> [87] Biostrings_2.75.1 knitr_1.49
#> [89] gridExtra_2.3 ProtGenerics_1.39.0
#> [91] SummarizedExperiment_1.37.0 xfun_0.49
#> [93] Biobase_2.67.0 matrixStats_1.4.1
#> [95] DT_0.33 stringi_1.8.4
#> [97] UCSC.utils_1.3.0 lazyeval_0.2.2
#> [99] yaml_2.3.10 evaluate_1.0.1
#> [101] codetools_0.2-20 interp_1.1-6
#> [103] tibble_3.2.1 BiocManager_1.30.25
#> [105] cli_3.6.3 rpart_4.1.23
#> [107] munsell_0.5.1 jquerylib_0.1.4
#> [109] Rcpp_1.0.13-1 dichromat_2.0-0.1
#> [111] dbplyr_2.5.0 png_0.1-8
#> [113] XML_3.99-0.17 parallel_4.4.2
#> [115] blob_1.2.4 prettyunits_1.2.0
#> [117] jpeg_0.1-10 latticeExtra_0.6-30
#> [119] AnnotationFilter_1.31.0 bitops_1.0-9
#> [121] VariantAnnotation_1.53.0 scales_1.3.0
#> [123] crayon_1.5.3 rlang_1.1.4
#> [125] KEGGREST_1.47.0