Many proteins are present as alternative transcripts, where the same gene produces alternative forms of the protein through differential mRNA splicing or post-translational cleavage.
These are detailed in UniProt. When they are retrieved through the UniProt
API, the result contains lists of alternative forms followed by lists of features.
To plot each protein with the appropriate features, these need
to be separated in our dataframe. This is done using the extract_transcripts() function.
This vignette shows how this works and gives an example.
In the workflow, extract_transcripts() is used to generate a new dataframe.
Steps 1 and 2 of that workflow are illustrated in the drawProteins vignette, so only step 3 and the visualisation of step 4 will be shown here.
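The workflow can be sketched as a single helper. This is a minimal sketch, assuming the earlier steps are the get_features() and feature_to_dataframe() calls from the main drawProteins vignette. The name plot_all_transcripts() is hypothetical and not part of the package, and calling it needs internet access because it queries the UniProt API.

```r
library(drawProteins)

# Hypothetical helper (not part of drawProteins) chaining the workflow steps.
# `accessions` is one string of space-separated UniProt accession numbers.
plot_all_transcripts <- function(accessions) {
  features  <- get_features(accessions)        # query the UniProt API
  raw_data  <- feature_to_dataframe(features)  # one dataframe of all features
  prot_data <- extract_transcripts(raw_data)   # separate the alternative forms
  p <- draw_canvas(prot_data)                  # blank canvas sized to the data
  draw_chains(p, prot_data)                    # one row per chain
}
```

For example, plot_all_transcripts("P19838 Q00653") would request NFKB1 and NFKB2 from UniProt and return a ggplot object with one row per chain.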
The NFkappaB transcription factor family contains two proteins that
are present in two forms. The dataframe obtained from UniProt is
contained in the drawProteins package as “five_rel_data” and can be
loaded using the data() function.
When loaded, this has 320 observations of 9 variables and will plot five chains,
as shown by checking max(five_rel_data$order).
To plot all the transcripts, a new dataframe is produced using the
extract_transcripts() function. The new dataframe is called
prot_data; it has 430 observations of 9 variables and will plot seven chains, as
shown by checking max(prot_data$order).
# load the bundled data and check how many chains it will plot
data("five_rel_data")
max(five_rel_data$order)
[1] 5
# use extract_transcripts() to create a new data frame
prot_data <- extract_transcripts(five_rel_data)
max(prot_data$order)
[1] 7
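To see which chains were added, the rows of type "CHAIN" in the two dataframes can be compared side by side. This is a minimal sketch, assuming the column names (type, description, begin, end, entryName, order) that the drawProteins dataframes use. The variable names chain_cols, chains_before and chains_after are illustrative only.

```r
library(drawProteins)

data("five_rel_data")
prot_data <- extract_transcripts(five_rel_data)

# Rows of type "CHAIN" describe the chains that draw_chains() will plot
chain_cols    <- c("entryName", "description", "begin", "end", "order")
chains_before <- five_rel_data[five_rel_data$type == "CHAIN", chain_cols]
chains_after  <- prot_data[prot_data$type == "CHAIN", chain_cols]

chains_before
chains_after

# number of plotted rows in each dataframe
c(before = max(five_rel_data$order), after = max(prot_data$order))
```

Printing both tables makes it easy to check the begin and end positions of each form before plotting.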
Now, let’s check out the chains for the two objects for comparison purposes.
p1 <- draw_canvas(five_rel_data)
p1 <- draw_chains(p1, five_rel_data)
p1 <- p1 + ggtitle("Five chains plotted")
p2 <- draw_canvas(prot_data)
p2 <- draw_chains(p2, prot_data)
p2 <- p2 + ggtitle("Seven chains plotted")
p1
p2
The appropriate domains and phosphorylation sites can then be drawn correctly for each transcript.
Note that the names of the different transcripts are the same, so it’s wise to use the option to customize the labels.
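Drawing the features and customizing the labels might look like the following. This is a minimal sketch, assuming the option meant here is the labels argument of draw_chains(). The label format, which appends the residue range to the entry name, is an illustrative choice and not the one used by the package author.

```r
library(drawProteins)
library(ggplot2)

data("five_rel_data")
prot_data <- extract_transcripts(five_rel_data)

# Build one label per chain from the CHAIN rows, adding the residue range
# so that transcripts sharing an entryName can be told apart.
chains <- prot_data[prot_data$type == "CHAIN", ]
chain_labels <- paste0(chains$entryName, " (", chains$begin, "-", chains$end, ")")

p <- draw_canvas(prot_data)
p <- draw_chains(p, prot_data, labels = chain_labels)
p <- draw_domains(p, prot_data)   # domains for every transcript
p <- draw_phospho(p, prot_data)   # phosphorylation sites
p <- p + ggtitle("All transcripts with domains and phosphorylation sites")
p
```

Because the labels are built from the same CHAIN rows that draw_chains() uses by default, there is always exactly one label per plotted chain.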
Here is the output of sessionInfo() on the system on
which this document was compiled:
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.48 ggplot2_3.5.1 httr_1.4.7
[4] drawProteins_1.27.0 BiocStyle_2.35.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_1.8.9 highr_0.11
[4] dplyr_1.1.4 compiler_4.4.1 BiocManager_1.30.25
[7] tidyselect_1.2.1 jquerylib_0.1.4 scales_1.3.0
[10] yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
[13] labeling_0.4.3 generics_0.1.3 curl_5.2.3
[16] tibble_3.2.1 maketools_1.3.1 munsell_0.5.1
[19] bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
[22] utf8_1.2.4 cachem_1.1.0 xfun_0.48
[25] sass_0.4.9 sys_3.4.3 cli_3.6.3
[28] withr_3.0.2 magrittr_2.0.3 digest_0.6.37
[31] grid_4.4.1 lifecycle_1.0.4 vctrs_0.6.5
[34] evaluate_1.0.1 glue_1.8.0 farver_2.1.2
[37] buildtools_1.0.0 fansi_1.0.6 colorspace_2.1-1
[40] rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3
[43] htmltools_0.5.8.1