Title: | Querying annotation data from the high performance Cellbase web services |
---|---|
Description: | This R package makes use of the exhaustive RESTful Web service API that has been implemented for the Cellbase database. It enables researchers to query and obtain a wealth of biological information from a single database, saving a lot of time. Another benefit is that researchers can easily make queries about different biological topics and link the results together, since all information is integrated. |
Authors: | Mohammed OE Abdallah |
Maintainer: | Mohammed OE Abdallah <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 1.31.0 |
Built: | 2024-10-30 04:33:25 UTC |
Source: | https://github.com/bioc/cellbaseR |
Querying annotation data from the high performance Cellbase web services
Documentation for the cellbaseR package
This R package makes use of the exhaustive RESTful Web service API that has been implemented for the Cellbase database. It enables researchers to query and obtain a wealth of biological information from a single database, saving a lot of time. Another benefit is that researchers can easily make queries about different biological topics and link the results together, since all information is integrated. Currently Homo sapiens, Mus musculus and 20 other species are available, and many others will be included soon. Results returned from the cellbase queries are parsed into R data.frames and other common R data structures so users can readily move on to downstream analysis.
Mohammed OE Abdallah
This method is a convenience method to annotate bgzipped, tabix-indexed VCF files. It is best suited to annotating small to medium-sized VCF files.
## S4 method for signature 'CellBaseR'
AnnotateVcf(object, file, batch_size, num_threads, BPPARAM = bpparam())
object | an object of class CellBaseR |
file | path to a bgzipped and tabix-indexed VCF file |
batch_size | integer; if multiple queries are raised by a single method call, e.g. getting annotation info for several genes, queries will be sent to the server in batches. This value indicates the size of each batch, e.g. 200 |
num_threads | number of asynchronous batches to be sent to the server |
BPPARAM | a BiocParallel class object |
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
fl <- system.file("extdata", "hapmap_exome_chr22_200.vcf.gz", package = "cellbaseR")
res <- AnnotateVcf(object = cb, file = fl, BPPARAM = bpparam(workers = 2), batch_size = 100)
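The batching described for batch_size can be pictured with a small base-R sketch. The helper name split_into_batches is hypothetical and is not part of cellbaseR; it only illustrates how a vector of query ids might be cut into batches of at most batch_size elements, and makes no claim about how the package does this internally.

```r
# Hypothetical helper (not part of cellbaseR): split ids into batches
split_into_batches <- function(ids, batch_size = 200L) {
  stopifnot(is.numeric(batch_size), length(batch_size) == 1L, batch_size >= 1)
  if (length(ids) == 0L) return(list())
  # Batch index for every element: 1, 1, ..., 2, 2, ...
  batch_index <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch_index))
}

ids <- paste0("rs", 1:450)   # made-up ids, for illustration only
batches <- split_into_batches(ids, batch_size = 200L)
lengths(batches)             # 200 200 50
```

Each element of batches could then be passed to a query method in turn, so that no single request carries more ids than the chosen batch size.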
A CellBaseParam object is used to control which results are returned from the CellBaseR methods.
CellBaseParam(
  assembly = character(),
  feature = character(),
  region = character(),
  rsid = character(),
  accession = character(),
  type = character(),
  mode_inheritance_labels = character(),
  clinsig_labels = character(),
  alleleOrigin = character(),
  consistency_labels = character(),
  so = character(),
  source = character(),
  trait = character(),
  include = character(),
  exclude = character(),
  limit = character()
)
assembly | A character string giving the assembly build to query, e.g. GRCh37 (default) |
feature | A character vector denoting the feature/s to be queried |
region | A character vector denoting the region/s to be queried; must be in the form 1:100000-1500000 |
rsid | A character vector denoting the rs ids to be queried |
accession | A character vector of ClinVar accessions |
type | A character vector of variant types |
mode_inheritance_labels | A character vector |
clinsig_labels | A character vector |
alleleOrigin | A character vector |
consistency_labels | A character vector |
so | A character vector denoting sequence ontology to be queried |
source | A character vector |
trait | A character vector denoting the trait to be queried |
include | A character vector denoting the fields to be returned |
exclude | A character vector denoting the fields to be excluded |
limit | A number limiting the number of results to be returned |
an object of class CellBaseParam
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cbParam <- CellBaseParam(assembly = "GRCh38", feature = c("TP73", "TET1"))
print(cbParam)
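Because the region argument must follow the form 1:100000-1500000, it can help to check strings before building a CellBaseParam. The following is a minimal sketch in base R; is_valid_region is a hypothetical helper, not a cellbaseR function, and the set of characters it accepts in chromosome names is an assumption.

```r
# Hypothetical helper (not part of cellbaseR): check 'chr:start-end' strings
is_valid_region <- function(region) {
  # Assumption: chromosome names use letters, digits, '_' or '.'
  pattern <- "^([A-Za-z0-9_.]+):([0-9]+)-([0-9]+)$"
  ok <- grepl(pattern, region)
  # For well-formed strings, also require start <= end
  start <- as.numeric(sub(pattern, "\\2", region[ok]))
  end   <- as.numeric(sub(pattern, "\\3", region[ok]))
  ok[ok] <- start <= end
  ok
}

is_valid_region(c("1:100000-1500000", "chr1:100000", "X:500-100"))
# TRUE FALSE FALSE
```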
This class defines a CellBaseParam object to hold filtering parameters.
This class stores parameters used for filtering the CellBaseR query and is available for all query methods. A CellBaseParam object is used to control which results are returned from the CellBaseR methods.
assembly
A character string giving the assembly build to query, e.g. GRCh37 (default)
feature
A character vector denoting the feature/s to be queried
region
A character vector denoting the region/s to be queried; must be in the form 1:100000-1500000
rsid
A character vector denoting the rs ids to be queried
accession
A character vector of ClinVar accessions
type
A character vector of variant types
mode_inheritance_labels
A character vector
clinsig_labels
A character vector
alleleOrigin
A character vector
consistency_labels
A character vector
so
A character vector denoting sequence ontology to be queried
source
A character vector
trait
A character vector denoting the trait to be queried
include
A character vector denoting the fields to be returned
exclude
A character vector denoting the fields to be excluded
limit
A number limiting the number of results to be returned
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
This is a constructor function for the CellBaseR object
CellBaseR(
  host = "https://ws.zettagenomics.com/cellbase/webservices/rest/",
  version = "v5",
  species = "hsapiens",
  batch_size = 200L,
  num_threads = 8L
)
host | A character string giving the host URL for the cellbase web services, e.g. "http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest/" |
version | A character string giving the cellbase API version, e.g. "V4" |
species | a character string specifying the species to be queried, e.g. "hsapiens" |
batch_size | integer; if multiple queries are raised by a single method call, e.g. getting annotation info for several genes, queries will be sent to the server in batches. This slot indicates the size of each batch, e.g. 200 |
num_threads | integer; number of batches to be sent to the server |
CellbaseR constructor function
This class defines the CellBaseR object. It holds the default configuration required by CellBaseR methods to connect to the cellbase web services. By default it is configured to query human data based on the GRCh37 genome assembly.
An object of class CellBaseR
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
print(cb)
This is an S4 class which defines the CellBaseR object
This S4 class holds the default configuration required by CellBaseR methods to connect to the cellbase web services. By default it is configured to query human data based on the GRCh37 assembly.
host
a character specifying the host url. Default "https://ws.zettagenomics.com/cellbase/webservices/rest/"
version
a character specifying the API version. Default "v5"
species
a character specifying the species to be queried. Default "hsapiens"
batch_size
if multiple queries are raised by a single method call, e.g. getting annotation info for several features, queries will be sent to the server in batches. This slot indicates the size of these batches. Default 200
num_threads
the number of threads. Default 8
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
A convenience function to construct a gene model
createGeneModel(object, region = NULL)
object | an object of class CellbaseResponse |
region | a character |
This function creates a gene model data frame, which can then be turned into a track for visualization by GeneRegionTrack.
A geneModel
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
test <- createGeneModel(object = cb, region = "17:1500000-1550000")
The generic method for querying CellBase web services.
## S4 method for signature 'CellBaseR'
getCellBase(object, category, subcategory, ids, resource, param = NULL)
object | an object of class CellBaseR |
category | a character specifying the category to be queried |
subcategory | a character specifying the subcategory to be queried |
ids | a character vector of the ids to be queried |
resource | a character specifying the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method allows the user to query the cellbase web services without any predefined categories, subcategories, or resources.
a dataframe holding the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getCellBase(object = cb, category = "feature", subcategory = "gene", ids = "TET1", resource = "info")
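To see how category, subcategory, ids and resource relate to one another, it may help to picture them as segments of a REST path. The sketch below is only an illustration: build_query_url is a hypothetical helper, and both the path layout (host/version/species/category/subcategory/ids/resource) and the comma-separated id list are assumptions about the web services, not something taken from the cellbaseR source.

```r
# Hypothetical helper (not part of cellbaseR); the path layout is an assumption
build_query_url <- function(host, version, species, category, subcategory,
                            ids, resource) {
  host <- sub("/+$", "", host)            # drop trailing slashes from the host
  id_part <- paste(ids, collapse = ",")   # assumption: ids joined by commas
  paste(host, version, species, category, subcategory, id_part, resource,
        sep = "/")
}

url <- build_query_url(
  host = "https://ws.zettagenomics.com/cellbase/webservices/rest/",
  version = "v5", species = "hsapiens",
  category = "feature", subcategory = "gene",
  ids = c("TP73", "TET1"), resource = "info"
)
url
```

Under these assumptions the arguments of getCellBase map one-to-one onto path segments, which is why the method can reach categories and resources that have no dedicated wrapper.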
A function to get help about available cellbase resources
getCellBaseResourceHelp(object, subcategory)
object | an object of class CellBaseR |
subcategory | a character string giving the subcategory to be queried |
This function retrieves the available resources for each generic method, such as getGene, getRegion, getProtein, etc. It helps the user see all possible resources to use with the getGeneric methods.
a character vector of the resources available for that particular subcategory
cb <- CellBaseR()
# Get help about what resources are available to the getGene method
getCellBaseResourceHelp(cb, subcategory = "gene")
# Get help about what resources are available to the getRegion method
getCellBaseResourceHelp(cb, subcategory = "region")
# Get help about what resources are available to the getXref method
getCellBaseResourceHelp(cb, subcategory = "id")
A method to query sequence data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getChromosomeInfo(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of chromosome ids to be queried |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves information about chromosomes, including their size and detailed information about their cytobands.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getChromosomeInfo(object = cb, ids = "22", resource = "info")
A method to query clinical data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getClinical(object, param = NULL)
object | an object of class CellBaseR |
param | an object of class CellBaseParam specifying the parameters that limit the query |
This method retrieves clinically relevant variant annotations from multiple resources including ClinVar, COSMIC and the GWAS Catalog. Furthermore, the user can filter these data in many ways, including by trait, feature, rs id, etc.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
cbParam <- CellBaseParam(feature = c("TP73", "TET1"), limit = 100)
res <- getClinical(object = cb, param = cbParam)
A convenience method to fetch conservation data for specific region/s
getConservationByRegion(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of genomic regions, e.g. 17:1000000-1100000 |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getConservationByRegion(cb, "17:1000000-1189811")
A method to query gene data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getGene(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of gene ids to be queried |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves various gene annotations including transcripts and exons data as well as gene expression and clinical data
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getGene(object = cb, ids = c("TP73", "TET1"), resource = "info")
A convenience method to fetch gene annotations for specific gene/s
getGeneInfo(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of HUGO symbols (gene names) |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getGeneInfo(cb, "TET1")
A method for getting the available metadata from the cellbase web services
## S4 method for signature 'CellBaseR'
getMeta(object, resource)
object | an object of class CellBaseR |
resource | the resource whose metadata you want to query |
This method retrieves information about the available species, and the annotation and assembly available for each species, from the cellbase web services.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getMeta(object = cb, resource = "species")
A method to query protein data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getProtein(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of one or more UniProt ids to be queried, for example O15350 |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves various protein annotations including protein description, features, sequence, substitution scores, evidence, etc.
an object of class CellBaseResponse which holds a dataframe with the results of the query
cb <- CellBaseR()
res <- getProtein(object = cb, ids = "O15350", resource = "info")
A convenience method to fetch annotations for specific protein/s
getProteinInfo(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of UniProt ids |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getProteinInfo(cb, "O15350")
A method to query features within a genomic region from Cellbase web services.
## S4 method for signature 'CellBaseR'
getRegion(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of the regions to be queried, for example "1:1000000-1200000"; should always be in the form "chr:start-end" |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves various genomic features from a given region including genes, SNPs, clinically relevant variants, proteins, etc.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getRegion(object = cb, ids = "17:1000000-1200000", resource = "gene")
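Since getRegion expects ids in the form chr:start-end, a large interval can be cut into smaller region strings before querying. The sketch below uses only base R; tile_region is a hypothetical helper, not part of cellbaseR, and it assumes that both coordinates of a region are inclusive.

```r
# Hypothetical helper (not part of cellbaseR): tile an interval into region strings
tile_region <- function(chr, start, end, width) {
  stopifnot(start <= end, width >= 1)
  starts <- seq(start, end, by = width)
  # Assumption: coordinates are inclusive, so each tile ends at start + width - 1
  ends <- pmin(starts + width - 1, end)
  # format() avoids scientific notation such as "1e+06" in the output
  fmt <- function(x) format(x, scientific = FALSE, trim = TRUE)
  paste0(chr, ":", fmt(starts), "-", fmt(ends))
}

regions <- tile_region("17", 1000000, 1200000, width = 100000)
regions
# "17:1000000-1099999" "17:1100000-1199999" "17:1200000-1200000"
```

The explicit format() call matters because pasting a number such as 1000000 directly yields "1e+06", which does not match the chr:start-end form.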
A convenience method to fetch regulatory data for specific region/s
getRegulatoryByRegion(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of genomic regions, e.g. 17:1000000-1100000 |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getRegulatoryByRegion(cb, "17:1000000-1189811")
A method to query transcript data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getTranscript(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of the transcript ids to be queried; use Ensembl transcript IDs, e.g. ENST00000380152 |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves various genomic annotations for transcripts including exons, cDNA sequence, annotation flags, cross references, etc.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getTranscript(object = cb, ids = "ENST00000373644", resource = "info")
A convenience method to fetch transcripts for specific gene/s
getTranscriptByGene(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of HUGO symbols (gene names) |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getTranscriptByGene(cb, "TET1")
A method to query variant annotation data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getVariant(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of the ids to be queried; must be in the format 'chr:start:ref:alt', for example '1:128546:A:T' |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves extensive genomic annotations for variants, including consequence types, conservation data, population frequencies from the 1000 Genomes and ExAC projects, clinical data and various other annotations.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getVariant(object = cb, ids = "19:45411941:T:C", resource = "annotation")
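Variant ids must follow the 'chr:start:ref:alt' format, so when variants are held in a table it can be convenient to assemble the ids programmatically. This is a minimal base-R sketch; make_variant_ids and the column names of the example data frame are hypothetical and not part of cellbaseR.

```r
# Hypothetical helper (not part of cellbaseR): build 'chr:start:ref:alt' ids
make_variant_ids <- function(chr, pos, ref, alt) {
  stopifnot(length(chr) == length(pos),
            length(pos) == length(ref),
            length(ref) == length(alt))
  # format() keeps large positions out of scientific notation
  pos_chr <- format(pos, scientific = FALSE, trim = TRUE)
  paste(chr, pos_chr, ref, alt, sep = ":")
}

# Illustrative table; the two rows reuse the variants shown in this manual
variants <- data.frame(
  chr = c("19", "1"),
  pos = c(45411941, 128546),
  ref = c("T", "A"),
  alt = c("C", "T"),
  stringsAsFactors = FALSE
)
variant_ids <- make_variant_ids(variants$chr, variants$pos,
                                variants$ref, variants$alt)
variant_ids
# "19:45411941:T:C" "1:128546:A:T"
```

The resulting character vector has the shape expected by the ids argument of getVariant.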
A convenience method to fetch variant annotation for specific variant/s
getVariantAnnotation(object, id, param = NULL)
object | an object of class CellBaseR |
id | a character vector of length < 200 of genomic variants, e.g. 19:45411941:T:C |
param | an object of class CellBaseParam |
a dataframe of the query result
cb <- CellBaseR()
res <- getVariantAnnotation(cb, "19:45411941:T:C")
A method to query cross reference data from Cellbase web services.
## S4 method for signature 'CellBaseR'
getXref(object, ids, resource, param = NULL)
object | an object of class CellBaseR |
ids | a character vector of the ids to be queried; any cross-referenceable id is accepted: gene names, transcript ids, UniProt ids, etc. |
resource | a character vector to specify the resource to be queried |
param | an object of class CellBaseParam specifying additional parameters for the query |
This method retrieves cross references for genomic identifiers, e.g. ENSEMBL ids. It also provides a starts_with service that is useful for autocomplete services.
a dataframe with the results of the query
https://github.com/opencb/cellbase/wiki and the RESTful API documentation http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/
cb <- CellBaseR()
res <- getXref(object = cb, ids = "ENST00000373644", resource = "xref")