The package immunogenViewer is meant to support
researchers in comparing and choosing suitable antibodies, provided that
information on the immunogen used to raise the antibody is available.
When the immunogen of an antibody is known, its binding site within the
protein antigen is defined and can be examined in detail. As antibodies
raised against peptide immunogens often do not function properly when
used to detect natively folded proteins (Brown et
al. 2011), examination of the position of the immunogen within
the full-length protein can provide insights. immunogenViewer
provides an easy way to visualize,
evaluate and compare immunogens within the full-length sequence of a
protein. Information on structural and functional annotations of the
immunogen, and thus of the antibody binding site, can tell the user if an
antibody is potentially useful for native protein detection (Trier, Hansen, and Houen 2012; Waury et al. 2022).
Specifically, immunogenViewer can be used to retrieve
protein features for a protein of interest using API calls to the UniProtKB (Bateman et al. 2022) and PredictProtein (Bernhofer et al. 2021) databases. The features
are saved on a per-residue level in a dataframe. One or several
immunogens can be associated with the protein. The immunogen(s) can then
be visualized and evaluated regarding their structure and other
annotations that can influence successful antibody recognition within
the full-length protein. A summary report of the immunogen can be
created to easily compare and select favorable immunogens and their
respective antibodies. This package should be used as a pre-selection
step to exclude unsuitable antibodies early on. It does not replace
comprehensive antibody validation. For more information on validation,
please refer to other excellent resources (Roncador et al. 2015; Voskuil et al. 2020).
The package can be installed directly from Bioconductor.
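For reference, the usual Bioconductor installation route can be sketched as follows, assuming a working R installation and network access. `BiocManager` is the standard installer for Bioconductor packages.

```r
# Install BiocManager from CRAN if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install immunogenViewer from Bioconductor and attach it
BiocManager::install("immunogenViewer")
library(immunogenViewer)
```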
To retrieve the features for the protein of interest, the correct
UniProt ID (also known as accession number) is required. If the UniProt
ID is not yet known, one can search UniProtKB using the gene or protein
name. Be sure to select the UniProt ID of the correct organism and
preferably search within reviewed Swiss-Prot entries instead of
unreviewed TrEMBL entries. Our example protein is the human protein
TREM2 (UniProt ID: Q9NZC2). Using getProteinFeatures(), the
relevant features from UniProt and
PredictProtein are retrieved. Interaction with UniProt is done using the
Bioconductor package UniProt.ws.
To see how the dataframe is structured, we will look at the returned
dataframe.
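The per-residue layout can be pictured with a small stand-in table. The sketch below does not call the package (the real call needs network access and appears only as a comment). The accession `TOY001`, the ten-residue sequence and the `NA` feature values are invented for illustration; only the column names are taken from the output shown in this vignette.

```r
# Real retrieval (requires immunogenViewer and network access):
# protein <- getProteinFeatures("Q9NZC2")

# Offline stand-in with the same column names; all values are toy data.
toyResidues <- strsplit("MKTAYIAKQR", "")[[1]]
featureCols <- c("SecondaryStructure", "SolventAccessibility", "Membrane",
                 "ProteinBinding", "Disorder", "PTM", "DisulfideBridge")

toyProtein <- data.frame(
    Uniprot = "TOY001",
    Position = seq_along(toyResidues),
    Residue = toyResidues,
    stringsAsFactors = FALSE
)
# The feature encodings are not specified here, so they are left as NA.
toyProtein[featureCols] <- NA

# One row per residue, one column per annotation
dim(toyProtein)
head(toyProtein, 3)
```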
After creating the protein dataframe using getProteinFeatures(),
immunogens to be visualized and
evaluated can be added to the dataframe. For this purpose, we use
addImmunogen(). With every call to the function, one
immunogen can be added to the protein dataframe. Besides the protein
dataframe, we need to define the immunogen to be added by supplying
either its start and end position or its peptide sequence, together with a name.
Searching the antibody database Antibodypedia identifies three antibodies that were raised against known peptide immunogens. These immunogens are added to the dataframe by defining either their start and end position or the immunogen peptide sequence within the full protein sequence, and by naming them after their catalog identifiers. Each immunogen is added as an additional column to the protein dataframe, with the immunogen name used as the column name.
protein <- addImmunogen(protein, start = 142, end = 192, name = "ABIN2783734_")
protein <- addImmunogen(protein, start = 196, end = 230, name = "HPA010917")
protein <- addImmunogen(protein, seq = "HGQKPGTHPPSELD", name = "EB07921")
# check that immunogens were added as columns
colnames(protein)
[1] "Uniprot" "Position" "Residue"
[4] "SecondaryStructure" "SolventAccessibility" "Membrane"
[7] "ProteinBinding" "Disorder" "PTM"
[10] "DisulfideBridge" "ABIN2783734_" "HPA010917"
[13] "EB07921"
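When an immunogen is given as a peptide sequence rather than as positions, its coordinates follow from a plain substring search. The following is a minimal base-R sketch of that idea, not the package's implementation; `locatePeptide()` and the toy sequence are hypothetical.

```r
# Hypothetical helper (not part of immunogenViewer): locate a peptide
# within a full-length sequence and return its 1-based start and end.
locatePeptide <- function(fullSeq, peptide) {
    hit <- regexpr(peptide, fullSeq, fixed = TRUE)
    if (hit == -1) {
        stop("Peptide not found in the full-length sequence.")
    }
    start <- as.integer(hit)
    c(start = start, end = start + nchar(peptide) - 1L)
}

toySeq <- "MKTAYIAKQRQISFVKSHFSRQ"  # invented sequence
pos <- locatePeptide(toySeq, "QISFVK")
pos
```

Only the first occurrence is reported, so a peptide that appears more than once in the sequence would need explicit start and end positions instead.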
Already added immunogens can be renamed using renameImmunogen()
if the provided start and end positions
are correct but the name should be updated. This way a typo can be
corrected or a more informative name added instead of re-adding the
immunogen. The column name in the protein dataframe is then updated.
protein <- renameImmunogen(protein, oldName = "ABIN2783734_", newName = "ABIN2783734")
# check that immunogen name was updated
colnames(protein)
[1] "Uniprot" "Position" "Residue"
[4] "SecondaryStructure" "SolventAccessibility" "Membrane"
[7] "ProteinBinding" "Disorder" "PTM"
[10] "DisulfideBridge" "ABIN2783734" "HPA010917"
[13] "EB07921"
A previously added immunogen can be removed from the protein
dataframe using removeImmunogen(). The corresponding column
is dropped from the protein dataframe.
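In data-frame terms, removing an immunogen amounts to dropping its column. The base-R sketch below shows this on a toy table. The rows and the 0/1 flags are invented, and the package call in the comment assumes a `name` argument that is not confirmed in this text.

```r
# With the package, the call might look like this (argument name assumed):
# protein <- removeImmunogen(protein, name = "HPA010917")

# Toy table with two immunogen columns; flags are invented.
toyProtein <- data.frame(
    Position = 1:5,
    Residue = c("M", "K", "T", "A", "Y"),
    HPA010917 = c(0, 1, 1, 1, 0),
    EB07921 = c(1, 1, 0, 0, 0)
)

# Drop the immunogen column by name
toyProtein[["HPA010917"]] <- NULL
colnames(toyProtein)
```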
After retrieving the protein features and adding the relevant immunogens, the full protein sequence can be plotted with the features and the immunogens annotated along the sequence. The plot helps to understand the position of the immunogen peptide within the full-length sequence and to identify obstacles within the protein that might hinder or limit successful antibody binding.
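The idea of annotating immunogens along the sequence can be illustrated without the package. This base-R sketch draws one bar per immunogen over a residue axis. The table, the 0/1 flag columns and `flaggedRange()` are assumptions made for the example and do not reflect the package's own plotting function.

```r
# Toy per-residue table; flags are invented for illustration only.
toy <- data.frame(
    Position = 1:30,
    immunogenA = as.integer(1:30 %in% 5:12),
    immunogenB = as.integer(1:30 %in% 18:27)
)

# Hypothetical helper: first and last flagged position of one column
flaggedRange <- function(df, column) {
    pos <- df$Position[df[[column]] == 1]
    c(start = min(pos), end = max(pos))
}
ranges <- t(sapply(c("immunogenA", "immunogenB"), flaggedRange, df = toy))

# Draw one horizontal bar per immunogen along the sequence axis
plot(NULL, xlim = range(toy$Position), ylim = c(0.5, nrow(ranges) + 0.5),
     xlab = "Residue position", ylab = "", yaxt = "n")
axis(2, at = seq_len(nrow(ranges)), labels = rownames(ranges), las = 1)
segments(ranges[, "start"], seq_len(nrow(ranges)),
         ranges[, "end"], seq_len(nrow(ranges)), lwd = 6)
```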
If interested in one specific immunogen, one can visualize the relevant part of the protein sequence. In this plot the amino acid sequence of the immunogen is shown along the x axis while the same features as in the protein plot are included.
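Zooming in on one immunogen comes down to subsetting the per-residue table to the immunogen's positions. A minimal base-R sketch, with an invented sequence and a hypothetical helper `immunogenWindow()`:

```r
# Toy per-residue table (invented sequence)
toyProtein <- data.frame(
    Position = 1:10,
    Residue = strsplit("MKTAYIAKQR", "")[[1]]
)

# Hypothetical helper: rows of the table covered by an immunogen
immunogenWindow <- function(df, start, end) {
    df[df$Position >= start & df$Position <= end, ]
}

region <- immunogenWindow(toyProtein, start = 3, end = 7)

# Amino acid sequence that would label the x-axis of such a plot
immunogenSeq <- paste(region$Residue, collapse = "")
immunogenSeq
```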
Apart from visualizing specific immunogens, it is also possible to summarize the protein features within a specific immunogen. This can be done either for one immunogen of interest or for all immunogens added to a protein dataframe at once. The output is a summary dataframe that can be sorted by the feature columns. By sorting, the most suitable immunogen, e.g., the one with the highest fraction of exposed residues, can be selected.
immunogens <- evaluateImmunogen(protein)
[1] "No immunogen specified, evaluating all immunogens."
[1] "Immunogen name: SecondaryStructure"
[1] "Immunogen sequence: (Residues Inf - -Inf)"
[1] "Immunogen name: SolventAccessibility"
[1] "Immunogen sequence: (Residues Inf - -Inf)"
[1] "Immunogen name: ProteinBinding"
[1] "Immunogen sequence: SMKTHNLWLLCQSLH (Residues 40 - 114)"
[1] "Immunogen name: DisulfideBridge"
[1] "Immunogen sequence: CCCC (Residues 36 - 110)"
[1] "Immunogen name: ABIN2783734"
[1] "Immunogen sequence: WFPGESESFEDAHVEHSISRSLLEGEIPFPPTSILLLLACIFLIKILAASA (Residues 142 - 192)"
[1] "Immunogen name: EB07921"
[1] "Immunogen sequence: HGQKPGTHPPSELD (Residues 199 - 212)"
# check summary dataframe
DT::datatable(immunogens, width = "80%", options = list(scrollX = TRUE))
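The sorting step can be sketched on toy data. The example below computes the fraction of exposed residues per immunogen and ranks the immunogens by it. The `Exposed` column, the 0/1 immunogen flags and `fractionExposed()` are invented for illustration and are not the columns or code of the package's summary.

```r
# Toy per-residue table; exposure labels and immunogen flags are invented.
toyProtein <- data.frame(
    Position = 1:10,
    Exposed = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
    immunogenA = as.integer(1:10 %in% 1:4),
    immunogenB = as.integer(1:10 %in% 5:10)
)

# Hypothetical helper: fraction of exposed residues within one immunogen
fractionExposed <- function(df, column) {
    mean(df$Exposed[df[[column]] == 1])
}

immunogenNames <- c("immunogenA", "immunogenB")
summaryDF <- data.frame(
    Immunogen = immunogenNames,
    FractionExposed = vapply(immunogenNames, fractionExposed,
                             numeric(1), df = toyProtein)
)

# Sort so that the most exposed immunogen comes first
ranked <- summaryDF[order(summaryDF$FractionExposed, decreasing = TRUE), ]
ranked
```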
When calling getProteinFeatures(), the taxonomy ID
for the protein’s species has to be set. The default is human (ID:
9606). If the protein of interest is from a different species, the
correct taxonomy ID must be set as a parameter.
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] immunogenViewer_1.1.0 BiocStyle_2.35.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.47.0 gtable_0.3.6 xfun_0.49
[4] bslib_0.8.0 ggplot2_3.5.1 htmlwidgets_1.6.4
[7] Biobase_2.67.0 crosstalk_1.2.1 rjsoncons_1.3.1
[10] vctrs_0.6.5 tools_4.4.2 generics_0.1.3
[13] stats4_4.4.2 curl_6.0.1 tibble_3.2.1
[16] AnnotationDbi_1.69.0 RSQLite_2.3.9 blob_1.2.4
[19] pkgconfig_2.0.3 BiocBaseUtils_1.9.0 dbplyr_2.5.0
[22] S4Vectors_0.45.2 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
[25] farver_2.1.2 compiler_4.4.2 Biostrings_2.75.3
[28] progress_1.2.3 munsell_0.5.1 GenomeInfoDb_1.43.2
[31] htmltools_0.5.8.1 sys_3.4.3 buildtools_1.0.0
[34] sass_0.4.9 yaml_2.3.10 pillar_1.10.0
[37] crayon_1.5.3 jquerylib_0.1.4 DT_0.33
[40] cachem_1.1.0 tidyselect_1.2.1 digest_0.6.37
[43] dplyr_1.1.4 labeling_0.4.3 maketools_1.3.1
[46] UniProt.ws_2.47.1 fastmap_1.2.0 grid_4.4.2
[49] colorspace_2.1-1 cli_3.6.3 magrittr_2.0.3
[52] patchwork_1.3.0 httpcache_1.2.0 withr_3.0.2
[55] prettyunits_1.2.0 filelock_1.0.3 UCSC.utils_1.3.0
[58] scales_1.3.0 bit64_4.5.2 rmarkdown_2.29
[61] XVector_0.47.1 httr_1.4.7 bit_4.5.0.1
[64] png_0.1-8 hms_1.1.3 memoise_2.0.1
[67] evaluate_1.0.1 knitr_1.49 IRanges_2.41.2
[70] BiocFileCache_2.15.0 rlang_1.1.4 glue_1.8.0
[73] DBI_1.2.3 BiocManager_1.30.25 BiocGenerics_0.53.3
[76] jsonlite_1.8.9 R6_2.5.1