if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("UniProt.ws")
UniProt.ws
The UniProt.ws package provides a select interface to the UniProt web service.
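The load warnings and printed objects in this document come from a session along the following lines. This is a minimal sketch: the variable name up matches the one used in the select example later on, and the taxId value is an assumption taken from the default shown in the printed object (9606).

```r
# Load the package; this step may print import warnings like those below.
library(UniProt.ws)

# Create an interface object for Homo sapiens (taxonomy ID 9606).
# Constructing it may need a network connection.
up <- UniProt.ws(taxId = 9606)
```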
## Warning: multiple methods tables found for 'setequal'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'AnnotationDbi'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'IRanges'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'Biostrings'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'XVector'
## Warning: replacing previous import 'BiocGenerics::setequal' by
## 'S4Vectors::setequal' when loading 'GenomeInfoDb'
## Warning: multiple methods tables found for 'setequal'
If you already know about the select interface, you can immediately learn about the various methods for this object by looking at its help page.
You interact with the web service through a UniProt.ws object, referred to as up in this document. If you print the object you will see some helpful information about it.
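A short sketch of inspecting the object, assuming it was created with the UniProt.ws constructor and stored as up (the name is taken from the select example later in this document):

```r
library(UniProt.ws)

up <- UniProt.ws(taxId = 9606)

# Typing the object's name prints a summary via its show method.
up

# The taxId getter returns the taxonomy ID currently set on the object.
taxId(up)
```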
## UniProt.ws interface object:
## Taxonomy ID: 9606
## Species name: Homo sapiens (Human)
## List species with 'availableUniprotSpecies()'
By default, the UniProt.ws object is set to retrieve records from Homo sapiens, but you can change that. To do so, first look up the taxonomy ID for the species you are interested in. UniProt supports over 20 thousand species, so there are quite a few to choose from. To make this easier, the helper function availableUniprotSpecies lists all supported species along with their taxonomy IDs. When you call availableUniprotSpecies, it is recommended that you use the pattern argument to limit your query, like this:
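A minimal sketch of such a call. The pattern value "musculus" is an assumption inferred from the species names in the listing, and the variable name sp is only illustrative.

```r
library(UniProt.ws)

# Restrict the species table to rows whose name matches the pattern.
# The first call may download and cache the UniProt species list.
sp <- availableUniprotSpecies(pattern = "musculus")
sp
```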
## adding rname 'https://www.UniProt.org/docs/speclist.txt'
##       kingdom Taxon Node Official (scientific) name
## ANTMS       E     520121 Anthocoris musculus
## ANTMU       E     208057 Anthoscopus musculus
## APOMU       E     238007 Apomys musculus
## BAIMU       E     213557 Baiomys musculus
## BALMU       E       9771 Balaenoptera musculus
## BLEMU       E     197864 Blepharisma musculus
## MOUSE       E      10090 Mus musculus
## MUSMB       E      35531 Mus musculus bactrianus
## MUSMC       E      10091 Mus musculus castaneus
## MUSMM       E      57486 Mus musculus molossinus
## MUSMS       E     186842 Mus musculus x Mus spretus
## MUSMX       E     477816 Mus musculus musculus x Mus musculus castaneus
## POVM1       V    1891730 Mus musculus polyomavirus 1
Once you have learned the taxonomy ID for the species of interest, you can change the taxonomy ID of the UniProt.ws object using the taxId setter or by calling the UniProt.ws constructor.
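Both routes can be sketched as follows. The name mouseUp comes from the text below; up2 is a hypothetical name used only for this illustration, and the replacement form of taxId is assumed to behave as documented for the installed package version.

```r
library(UniProt.ws)

# Option 1: construct a new object directly for mouse (taxonomy ID 10090).
mouseUp <- UniProt.ws(taxId = 10090)
mouseUp

# Option 2: start from an existing object and change its taxonomy ID
# with the taxId setter.
up2 <- UniProt.ws(taxId = 9606)
taxId(up2) <- 10090
taxId(up2)
```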
## UniProt.ws interface object:
## Taxonomy ID: 10090
## Species name: Mus musculus (Mouse)
## List species with 'availableUniprotSpecies()'
As you can see, the species is different for the new mouseUp object.
UniProt.ws
Once you are satisfied that you have a UniProt.ws object that is using the appropriate organism, you can make use of the standard set of methods in a select interface. Specifically: columns, keytypes, keys and select.
You will probably notice that there are a large number of columns that can be retrieved.
## [1] "Allergome" "ArachnoServer" "Araport" "BioCyc"
## [5] "BioGRID" "BioMuta"
And most (but not all) of these fields can also be used as keytypes.
## [1] "absorption" "accession"
## [3] "annotation_score" "cc_activity_regulation"
## [5] "cc_allergen" "cc_alternative_products"
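Listings like the two above can be produced with the columns and keytypes methods. This is a minimal sketch assuming an object up for Homo sapiens; head is used only to keep the output short, and the names cols and kts are illustrative. Which identifiers each call returns depends on the package version, so it is worth inspecting both results.

```r
library(UniProt.ws)

up <- UniProt.ws(taxId = 9606)

# Fields that select() can return.
cols <- columns(up)
head(cols)

# Identifier types that select() accepts for its input keys.
kts <- keytypes(up)
head(kts)

# Entries available as columns but not as keytypes.
head(setdiff(cols, kts))
```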
If necessary, you can also look up the keys of a given type. Be warned that the web service is slow at this particular kind of lookup, so if you really need it, you will probably want to save the result in your R session.
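A sketch of such a lookup, assuming an object up for Homo sapiens and the "GeneID" keytype used later in this document. The variable name egs is illustrative, and the interactive() guard is a design choice to keep the slow request out of scripted runs.

```r
library(UniProt.ws)

up <- UniProt.ws(taxId = 9606)

# This request can take a long time, so assign the result once and
# reuse it for the rest of the session instead of calling keys() again.
if (interactive()) {
    egs <- keys(up, "GeneID")
    head(egs)
}
```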
Finally, you can look up whatever combination of columns, keytypes and keys you need when using select.
Note: ‘ENTREZ_GENE’ is now ‘GeneID’.
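# `up` is the UniProt.ws object for Homo sapiens (taxonomy ID 9606).
keys <- c("1", "2")                                  # GeneIDs to look up
columns <- c("xref_pdb", "xref_hgnc", "sequence")    # fields to return
kt <- "GeneID"                                       # type of the input keys
res <- select(up, keys = keys, columns = columns, keytype = kt)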
res
## From Entry PDB
## 1 1 P04217 <NA>
## 2 1 V9HWD8 <NA>
## 3 2 P01023 1BV8;2P9R;6TAV;7O7L;7O7M;7O7N;7O7O;7O7P;7O7Q;7O7R;7O7S;7VON;7VOO;
## HGNC
## 1 HGNC:5;
## 2 <NA>
## 3 HGNC:7;
## Sequence
## 1 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQLFKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPWLSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDFQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELILSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFELHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAVLRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTFESELSDPVELLVAES
## 2 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQLFKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPWLSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDFQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELILSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFELHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAVLRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTFESELSDPVELLVAES
## 3 MGKNKLLHPSLVLLLLVLLPTDASVSGKPQYMVLVPSLLHTETTEKGCVLLSYLNETVTVSASLESVRGNRSLFTDLEAENDVLHCVAFAVPKSSSNEEVMFLTVQVKGPTQEFKKRTTVMVKNEDSLVFVQTDKSIYKPGQTVKFRVVSMDENFHPLNELIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKKSGGRTEHPFTVEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICRKYSDASDCHGEDSQAFCEKFSGQLNSHGCFYQQVKTKVFQLKRKEYEMKLHTEAQIQEEGTVVELTGRQSSEITRTITKLSFVKVDSHFRQGIPFFGQVRLVDGKGVPIPNKVIFIRGNEANYYSNATTDEHGLVQFSINTTNVMGTSLTVRVNYKDRSPCYGYQWVSEEHEEAHHTAYLVFSPSKSFVHLEPMSHELPCGHTQTVQAHYILNGGTLLGLKKLSFYYLIMAKGGIVRTGTHGLLVKQEDMKGHFSISIPVKSDIAPVARLLIYAVLPTGDVIGDSAKYDVENCLANKVDLSFSPSQSLPASHAHLRVTAAPQSVCALRAVDQSVLLMKPDAELSASSVYNLLPEKDLTGFPGPLNDQDNEDCINRHNVYINGITYTPVSSTNEKDMYSFLEDMGLKAFTNSKIRKPKMCPQLQQYEMHGPEGLRVGFYESDVMGRGHARLVHVEEPHTETVRKYFPETWIWDLVVVNSAGVAEVGVTVPDTITEWKAGAFCLSEDAGLGISSTASLRAFQPFFVELTMPYSVIRGEAFTLKATVLNYLPKCIRVSVQLEASPAFLAVPVEKEQAPHCICANGRQTVSWAVTPKSLGNVNFTVSAEALESQELCGTEVPSVPEHGRKDTVIKPLLVEPEGLEKETTFNSLLCPSGGEVSEELSLKLPPNVVEESARASVSVLGDILGSAMQNTQNLLQMPYGCGEQNMVLFAPNIYVLDYLNETQQLTPEIKSKAIGYLNTGYQRQLNYKHYDGSYSTFGERYGRNQGNTWLTAFVLKTFAQARAYIFIDEAHITQALIWLSQRQKDNGCFRSSGSLLNNAIKGGVEDEVTLSAYITIALLEIPLTVTHPVVRNALFCLESAWKTAQEGDHGSHVYTKALLAYAFALAGNQDKRKEVLKSLNEEAVKKDNSVHWERPQKPKAPVGHFYEPQAPSAEVEMTSYVLLAYLTAQPAPTSEDLTSATNIVKWITKQQNAQGGFSSTQDTVVALHALSKYGAATFTRTGKAAQVTIQSSGTFSSKFQVDNNNRLLLQQVSLPELPGEYSMKVTGEGCVYLQTSLKYNILPEKEEFPFALGVQTLPQTCDEPKAHTSFQISLSVSYTGSRSASNMAIVDVKMVSGFIPLKPTVKMLERSNHVSRTEVSSNHVLIYLDKVSNQTLSLFFTVLQDVPVRDLKPAIVKVYDYYETDEFAIAEYNAPCSKDLGNA
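Cross-reference fields such as PDB come back as a single ';'-delimited string per row. The sketch below shows one way to split them with base R. It uses a small stand-in data frame (res_toy, a hypothetical name) built from the first columns of the result above, with the PDB list shortened to its first three identifiers, so it runs without the web service.

```r
# Stand-in for the select() result above (sequence column omitted,
# PDB list shortened for illustration).
res_toy <- data.frame(
    From  = c("1", "1", "2"),
    Entry = c("P04217", "V9HWD8", "P01023"),
    PDB   = c(NA, NA, "1BV8;2P9R;6TAV;"),
    stringsAsFactors = FALSE
)

# Split each ';'-delimited string into a character vector of identifiers.
pdb_ids <- strsplit(res_toy$PDB, ";", fixed = TRUE)
names(pdb_ids) <- res_toy$Entry

# Count cross-references per entry; rows with NA count as zero.
n_pdb <- vapply(pdb_ids, function(x) sum(!is.na(x)), integer(1))
n_pdb
```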
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] UniProt.ws_2.47.1 RSQLite_2.3.7 BiocGenerics_0.53.1
## [4] generics_0.1.3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.47.0 xfun_0.49 bslib_0.8.0
## [4] Biobase_2.67.0 rjsoncons_1.3.1 vctrs_0.6.5
## [7] tools_4.4.2 stats4_4.4.2 curl_6.0.0
## [10] tibble_3.2.1 fansi_1.0.6 AnnotationDbi_1.69.0
## [13] blob_1.2.4 pkgconfig_2.0.3 BiocBaseUtils_1.9.0
## [16] dbplyr_2.5.0 S4Vectors_0.45.0 lifecycle_1.0.4
## [19] GenomeInfoDbData_1.2.13 compiler_4.4.2 Biostrings_2.75.0
## [22] progress_1.2.3 GenomeInfoDb_1.43.0 htmltools_0.5.8.1
## [25] sys_3.4.3 buildtools_1.0.0 sass_0.4.9
## [28] yaml_2.3.10 pillar_1.9.0 crayon_1.5.3
## [31] jquerylib_0.1.4 cachem_1.1.0 tidyselect_1.2.1
## [34] digest_0.6.37 dplyr_1.1.4 purrr_1.0.2
## [37] maketools_1.3.1 fastmap_1.2.0 cli_3.6.3
## [40] magrittr_2.0.3 utf8_1.2.4 httpcache_1.2.0
## [43] withr_3.0.2 prettyunits_1.2.0 filelock_1.0.3
## [46] UCSC.utils_1.3.0 bit64_4.5.2 rmarkdown_2.29
## [49] XVector_0.47.0 httr_1.4.7 bit_4.5.0
## [52] png_0.1-8 hms_1.1.3 memoise_2.0.1
## [55] evaluate_1.0.1 knitr_1.48 IRanges_2.41.0
## [58] BiocFileCache_2.15.0 rlang_1.1.4 glue_1.8.0
## [61] DBI_1.2.3 BiocManager_1.30.25 jsonlite_1.8.9
## [64] R6_2.5.1 zlibbioc_1.52.0