| Title: | R Interface to UniProt Web Services |
|---|---|
| Description: | The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. This package provides a collection of functions for retrieving, processing, and re-packaging UniProt web services. The package makes use of UniProt's modernized REST API and allows mapping of identifiers across different databases. |
| Authors: | Marc Carlson [aut], Csaba Ortutay [ctb], Marcel Ramos [aut, cre] (ORCID: <https://orcid.org/0000-0002-3242-0582>) |
| Maintainer: | Marcel Ramos <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 2.53.0 |
| Built: | 2026-05-19 10:11:51 UTC |
| Source: | https://github.com/bioc/UniProt.ws |
These functions are the main workhorses for mapping identifiers from one database to another. They use the latest UniProt API (see https://www.uniprot.org/help/api).
```r
allFromKeys()

allToKeys(fromName = "UniProtKB_AC-ID")

returnFields()

mapUniProt(
  from = "UniProtKB_AC-ID",
  to = "UniRef90",
  columns = character(0L),
  query,
  verbose = FALSE,
  debug = FALSE,
  paginate = TRUE,
  pageSize = 500L
)

queryUniProt(
  query = character(0L),
  fields = c("accession", "id"),
  collapse = c("OR", "AND"),
  n = Inf,
  pageSize = 25L
)
```
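| Argument | Description |
|---|---|
| fromName | character(1) A "From" keytype, as listed by allFromKeys(), for which the possible "To" keytypes are returned (default "UniProtKB_AC-ID"). |
| from | character(1) The keytype to map from (default "UniProtKB_AC-ID"); see allFromKeys(). |
| to | character(1) The keytype to map to (default "UniRef90"); see allToKeys(). |
| columns, fields | character() The fields to return; use values from the name column of returnFields(). |
| query | character() or list() The identifiers or search terms, given either as a character vector or as a named list (e.g., list(taxId = 9606, ids = "TP53")). |
| verbose | logical(1) Whether to print more detailed messages (default FALSE). |
| debug | logical(1) Whether to print debugging information (default FALSE). |
| paginate | logical(1) Whether to retrieve results page by page (default TRUE). |
| pageSize | integer(1) The number of records per page (default 500L in mapUniProt, 25L in queryUniProt). |
| collapse | character(1) Either "OR" or "AND", the operator used to combine multiple query terms (default "OR"). |
| n | numeric(1) The maximum number of results to return (default Inf). |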
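The collapse argument determines how several query terms are joined. As a rough illustration only, the hypothetical helper build_query() below (it is not part of UniProt.ws) shows one way a character vector of field:value terms, or a named list, could be combined into a single UniProt search string with AND or OR. The exact string UniProt.ws builds internally may differ.

```r
## Hypothetical helper, not exported by UniProt.ws: join query terms into
## one UniProt-style search string using "AND" or "OR".
build_query <- function(query, collapse = c("OR", "AND")) {
    collapse <- match.arg(collapse)  # defaults to "OR", the first choice
    if (is.list(query)) {
        ## named list: each element becomes one or more "name:value" terms
        query <- unlist(
            Map(function(field, value) paste0(field, ":", value),
                names(query), query),
            use.names = FALSE
        )
    }
    paste(query, collapse = paste0(" ", collapse, " "))
}

build_query(c("accession:A5YMT3", "organism_id:9606"), collapse = "AND")
build_query(list(organism_id = 9606, gene_exact = "A2M"))
```

Both calls mirror the character and list forms used in the queryUniProt examples on this page.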
Note that mapUniProt is used internally by the select method but is also exported for API queries that need finer control. Use values from the name column of returnFields() as the columns input to either mapUniProt or the select method.
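Because columns must come from the name column of returnFields(), it can help to check them before sending a request. The sketch below uses a hypothetical helper, check_columns(), that is not part of UniProt.ws; with network access, its fields argument would be the data.frame returned by returnFields(), but here a small stand-in built from field names used in this page's mapUniProt examples keeps it offline.

```r
## Hypothetical helper: stop early if any requested column is not listed
## in the 'name' column of a returnFields()-like data.frame.
check_columns <- function(columns, fields) {
    unknown <- setdiff(columns, fields$name)
    if (length(unknown))
        stop("unknown return field(s): ", paste(unknown, collapse = ", "))
    invisible(columns)
}

## Offline stand-in for returnFields(); the real table has many more rows.
fields <- data.frame(
    name = c("accession", "id", "xref_pdb", "sequence"),
    stringsAsFactors = FALSE
)
check_columns(c("accession", "sequence"), fields)
```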
When using from='Gene_Name', you may restrict the search results to a
specific organism by including e.g., taxId=9606 in the query as a
named list element. See examples below.
mapUniProt: A data.frame of returned results
allToKeys: A sorted character vector of possible "To" keytypes based
on the given "From" type
allFromKeys: A sorted character vector of
possible "From" keytypes
returnFields: A data.frame of entries for
the columns input in mapUniProt; see 'name' column
M. Ramos
```r
library(UniProt.ws)

mapUniProt(
  from = "UniProtKB_AC-ID",
  to = "RefSeq_Protein",
  query = c("P13368", "Q9UM73", "P97793", "Q17192")
)

mapUniProt(
  from = "GeneID",
  to = "UniProtKB",
  query = c("1", "2", "3", "9", "10")
)

mapUniProt(
  from = "UniProtKB_AC-ID",
  to = "UniProtKB",
  columns = c("accession", "id"),
  query = list(organism_id = 10090, ids = c("Q7TPG8", "P63318"))
)

## restrict 'from = Gene_Name' result to taxId 9606
mapUniProt(
  from = "Gene_Name",
  to = "UniProtKB-Swiss-Prot",
  columns = c("accession", "id"),
  query = list(taxId = 9606, ids = "TP53")
)

mapUniProt(
  from = "UniProtKB_AC-ID",
  to = "UniProtKB",
  columns = c("accession", "id", "xref_pdb", "xref_hgnc", "sequence"),
  query = c("P31946", "P62258")
)

## query as character
queryUniProt(
  query = c("accession:A5YMT3", "organism_id:9606"),
  fields = c("accession", "id", "reviewed"),
  collapse = "AND"
)

## query as list
queryUniProt(
  query = list(organism_id = 9606, gene_exact = "A2M"),
  fields = c(
    "id", "accession", "gene_primary", "organism_name", "protein_name", "reviewed"
  ),
  collapse = "OR",
  n = 3,
  pageSize = 3
)

allToKeys(fromName = "UniRef100")
head(allFromKeys())
head(returnFields())
```
UniProt.ws is the base class for interacting with the UniProt
web services from Bioconductor.
```r
UniProt.ws(taxId = 9606, ...)

## S4 method for signature 'UniProt.ws'
show(object)

## S4 method for signature 'UniProt.ws'
taxId(x)

availableUniprotSpecies(pattern = "")

lookupUniprotSpeciesFromTaxId(taxId)

## S4 replacement method for signature 'UniProt.ws'
taxId(x) <- value

## S4 method for signature 'UniProt.ws'
species(object)
```
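| Argument | Description |
|---|---|
| taxId | The NCBI taxonomy ID of the species of interest (default 9606, Homo sapiens). |
| ... | other arguments |
| x, object | a UniProt.ws object |
| pattern | character string containing a regular expression (or character string for …) |
| value | the new taxonomy ID |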
UniProt.ws is a class that is used to interact with the UniProt
web services. It supports AnnotationDbi methods in the same way as
AnnotationDb objects.
The UniProt.ws class becomes available whenever you load the UniProt.ws
package. A UniProt.ws object is set up to retrieve information for Homo
sapiens by default (taxId = 9606), but this can be changed to any of the
species supported by UniProt. The species and taxId methods show which
species is currently being accessed, and taxId<- changes it.
species shows the genus and species label currently attached to the
UniProt.ws object's database.
taxId shows the NCBI taxonomy ID currently attached to the UniProt.ws
object's database. The equivalently named replacement method (taxId<-)
lets the user change the taxonomy ID, and with it the species represented.
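The getter and replacement pair taxId() / taxId<- follows the standard S4 accessor pattern. The following sketch illustrates that pattern with a hypothetical toy class, ToyTaxon, and hypothetical generics toyTaxId() / toyTaxId<-; it is not the UniProt.ws class and, unlike the real replacement method, it does not update the species.

```r
library(methods)

## Hypothetical toy class illustrating an S4 getter/replacement pair;
## it is NOT the UniProt.ws class.
setClass("ToyTaxon", slots = c(taxId = "numeric"))

setGeneric("toyTaxId", function(x) standardGeneric("toyTaxId"))
setGeneric("toyTaxId<-", function(x, value) standardGeneric("toyTaxId<-"))

## getter: return the stored taxonomy ID
setMethod("toyTaxId", "ToyTaxon", function(x) x@taxId)

## replacement method: store the new ID, validate, return the modified object
setReplaceMethod("toyTaxId", "ToyTaxon", function(x, value) {
    x@taxId <- as.numeric(value)
    validObject(x)
    x
})

obj <- new("ToyTaxon", taxId = 9606)
toyTaxId(obj) <- 10090   # calls `toyTaxId<-`(obj, value = 10090)
toyTaxId(obj)
```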
availableUniprotSpecies is a helper function that lists the species
available from UniProt along with their official taxonomy IDs. Because
UniProt represents so many species, a pattern argument can restrict the
results to species whose names match the search term. When using this
argument, remember that the genus is always capitalized and the species
never is.
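The capitalization note matters because regular-expression matching is case-sensitive by default. The sketch below assumes the pattern filter behaves like base R grepl() with default settings; the two-row species_tab and the helper filter_species() are hypothetical stand-ins, and the real availableUniprotSpecies() output may use different column names.

```r
## Offline stand-in for availableUniprotSpecies() output (assumed columns).
species_tab <- data.frame(
    taxId = c(9606, 10090),
    species = c("Homo sapiens", "Mus musculus"),
    stringsAsFactors = FALSE
)

## Hypothetical filter: keep rows whose species name matches the pattern.
filter_species <- function(tab, pattern) {
    tab[grepl(pattern, tab$species), , drop = FALSE]
}

filter_species(species_tab, "musculus")  # the Mus musculus row
filter_species(species_tab, "Musculus")  # no rows: species epithet is lowercase
filter_species(species_tab, "Homo")      # the Homo sapiens row
```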
lookupUniprotSpeciesFromTaxId is another helper that will look up the
species of any tax ID that is supported by UniProt.
species and lookupUniprotSpeciesFromTaxId each return a
character vector of possible values
taxId returns a numeric value that corresponds to the taxonomy ID
availableUniprotSpecies returns a data.frame
show(UniProt.ws): Show method for UniProt.ws objects
taxId(UniProt.ws): Get the taxonomy ID from a UniProt.ws object
taxId(UniProt.ws) <- value: Set or change the taxonomy ID for a UniProt.ws
object
species(UniProt.ws): Get the species name from a UniProt.ws object
Marc Carlson
```r
library(UniProt.ws)

## Make a UniProt.ws object
up <- UniProt.ws(taxId = 9606)
## look at the object
up
## get the current species
species(up)
## look up available species with their tax ids
availableUniprotSpecies("musculus")
## get the current taxId
taxId(up)
## look up the species that goes with a tax id
lookupUniprotSpeciesFromTaxId(9606)
## set the taxId to something else
taxId(up) <- 10090
up
```
Various methods from AnnotationDbi such as select, columns,
keys, keytypes, and species are made available for UniProt.ws
objects.
```r
## S4 method for signature 'UniProt.ws'
keytypes(x)

## S4 method for signature 'UniProt.ws'
columns(x)

## S4 method for signature 'UniProt.ws'
keys(x, keytype)

## S4 method for signature 'UniProt.ws'
select(x, keys, columns, keytype, ...)
```
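| Argument | Description |
|---|---|
| x | a UniProt.ws object |
| keytype | the keytype that matches the keys used; see keytypes() |
| keys | the keys to select records for |
| columns | the kinds of data to return; see columns() |
| ... | Additional arguments passed to lower level functions, mainly used for the … |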
In much the same way as an AnnotationDb object allows access to
select for many other annotation packages, UniProt.ws supports select and
its supporting methods so that data can be extracted easily from the
UniProt web services.
select, columns, and keys are used together to extract data via a
UniProt.ws object.
columns shows which kinds of data can be returned for the UniProt.ws
object.
keytypes allows the user to discover which keytypes can be passed in to
select or keys via the keytype argument.
keys returns keys for the database contained in the UniProt.ws object.
By default it returns the primary keys for the database, which are
UniProtKB keys; when used with the keytype argument, it returns the keys
of that keytype.
select retrieves the data as a data.frame, based on the keys, columns,
and keytype arguments.
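The division of labor among keytypes, keys, and select can be illustrated with a purely in-memory stand-in. Everything below (db, toy_keytypes(), toy_keys(), toy_select()) is hypothetical and uses placeholder values rather than UniProt data; it only mimics the contract described above: keys defaults to the primary UniProtKB keys, and select returns a data.frame with the keytype column plus the requested columns, where a single key may yield several rows.

```r
## Toy table: column names play the role of keytypes/columns.
## Values are placeholders, not UniProt data.
db <- data.frame(
    UniProtKB = c("acc1", "acc2", "acc3"),
    GeneID    = c("g1", "g2", "g2"),
    xref_pdb  = c("pdb1", NA, "pdb3"),
    stringsAsFactors = FALSE
)

toy_keytypes <- function(db) colnames(db)

## default keytype mirrors the primary UniProtKB keys
toy_keys <- function(db, keytype = "UniProtKB") unique(db[[keytype]])

toy_select <- function(db, keys, columns, keytype) {
    stopifnot(keytype %in% toy_keytypes(db), all(columns %in% colnames(db)))
    rows <- db[[keytype]] %in% keys
    db[rows, unique(c(keytype, columns)), drop = FALSE]
}

## one key ("g2") maps to two rows
toy_select(db, keys = "g2", columns = c("UniProtKB", "xref_pdb"),
           keytype = "GeneID")
```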
keys, columns, and keytypes each return a character vector
of possible values
select returns a data.frame
keytypes(UniProt.ws): Get keytypes for a UniProt.ws object
columns(UniProt.ws): Get columns for a UniProt.ws object
keys(UniProt.ws): Get keys for a UniProt.ws object
select(UniProt.ws): Select columns from keys
```r
library(UniProt.ws)

## Make a UniProt.ws object
up <- UniProt.ws(taxId = 9606)
## list the possible key types
head(keytypes(up))
## list of possible columns
head(columns(up))
## list all possible keys of type Entrez Gene ID
egs <- keys(up, "GeneID")
## use select to extract some data
res <- select(
  x = up,
  keys = c("22627", "22629"),
  columns = c("xref_pdb", "xref_hgnc", "sequence"),
  keytype = "GeneID"
)
res

univals <- c("A0A0C5B5G6", "A0A1B0GTW7", "A0JNW5", "A0JP26", "A0PK11")
res <- select(
  x = up,
  keys = univals,
  to = "Ensembl"
)
res
```
UniProt uses its own codes for the organisms whose protein sequences it stores. These taxon names are also used in protein names (but not in UniProt IDs). These functions translate such names into standard scientific (Latin) taxon names and other useful identifiers.
taxname2species(): converts UniProt taxonomy names to scientific species names
taxname2taxid(): converts UniProt taxonomy names to NCBI Taxonomy IDs
taxname2domain(): converts UniProt taxonomy names to the following taxonomical domains:
'A' for archaea (=archaebacteria)
'B' for bacteria (=prokaryota or eubacteria)
'E' for eukaryota (=eukarya)
'V' for viruses and phages (=viridae)
'O' for others (such as artificial sequences)
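The one-letter codes returned by taxname2domain() can be turned back into readable labels with a simple named-vector lookup. The sketch below is a hypothetical helper, describe_domain(), that is not part of UniProt.ws; its labels are taken from the list above, and unknown codes come back as NA.

```r
## Hypothetical lookup table built from the domain codes listed above.
domain_labels <- c(
    A = "archaea",
    B = "bacteria",
    E = "eukaryota",
    V = "viruses and phages",
    O = "others"
)

## Index the named vector by code; drop names so a plain vector is returned.
describe_domain <- function(codes) unname(domain_labels[codes])

describe_domain(c("E", "V", "B"))
```

For example, describe_domain(taxname2domain(c("PIG", "HUMAN"))) would label the codes returned for those taxon names.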
```r
taxname2species(taxname, specfile)
taxname2taxid(taxname, specfile)
taxname2domain(taxname, specfile)
```
| Argument | Description |
|---|---|
| taxname | Character string of up to 6 uppercase characters, such as HUMAN, MOUSE, or AERPX. Also works for a vector of such taxon names. |
| specfile | An optional local file where speclist.RData is saved from UniProt.org. When … |
taxname2species: a character vector of scientific taxon names
matching to the UniProt taxon names supplied as taxname.
taxname2taxid: a numeric vector of Taxonomy IDs matching to the
UniProt taxon names supplied as taxname.
taxname2domain: a character vector of one letter domain
symbols matching to the UniProt taxon names supplied as taxname.
Csaba Ortutay
UniProt controlled vocabulary of species, which defines the taxon names.
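To show how one vocabulary entry yields all three outputs (species name, taxonomy ID, domain), the sketch below parses entries in an assumed text layout, `CODE K TAXID: N=Official name`, where K is the one-letter domain code. Both the layout and the helper parse_speclist() are assumptions for illustration; the package itself reads a prepared speclist.RData file rather than parsing text.

```r
## Hypothetical parser for the assumed layout "CODE K TAXID: N=Official name";
## not the package's implementation. Non-matching lines (e.g. continuation
## lines) are dropped.
parse_speclist <- function(lines) {
    pattern <- "^([A-Z0-9]+)\\s+([ABEVO])\\s+([0-9]+):\\s+N=(.*)$"
    lines <- lines[grepl(pattern, lines, perl = TRUE)]
    data.frame(
        taxname = sub(pattern, "\\1", lines, perl = TRUE),
        domain  = sub(pattern, "\\2", lines, perl = TRUE),
        taxId   = as.numeric(sub(pattern, "\\3", lines, perl = TRUE)),
        species = sub(pattern, "\\4", lines, perl = TRUE),
        stringsAsFactors = FALSE
    )
}

spec <- parse_speclist(c(
    "HUMAN E    9606: N=Homo sapiens",
    "PIG   E    9823: N=Sus scrofa"
))
spec$species[spec$taxname == "PIG"]
```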
```r
library(UniProt.ws)

taxname2species("PIG")
taxname2species(c("PIG", "HUMAN", "TRIHA"))
taxname2taxid("PIG")
taxname2taxid(c("PIG", "HUMAN", "TRIHA"))
taxname2domain("PIG")
taxname2domain(c("PIG", "HUMAN", "TRIHA"))
```