| Title: | Interface to BioMart databases (i.e. Ensembl) |
|---|---|
| Description: | In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<https://www.ensembl.org/info/data/biomart/index.html>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining. |
| Authors: | Steffen Durinck [aut], Wolfgang Huber [aut], Sean Davis [ctb], Francois Pepin [ctb], Vince S Buffalo [ctb], Mike Smith [ctb] (ORCID: <https://orcid.org/0000-0002-7800-3848>), Hugo Gruson [ctb, cre] (ORCID: <https://orcid.org/0000-0002-4094-1476>), German Network for Bioinformatics Infrastructure - de.NBI [fnd] |
| Maintainer: | Hugo Gruson <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 2.69.0 |
| Built: | 2026-05-29 08:39:02 UTC |
| Source: | https://github.com/bioc/biomaRt |

Attributes in BioMart databases are grouped together in attribute pages.
The attributePages() function gives a summary of the attribute categories and
groups present in the BioMart. These page names can be passed to the page
argument of listAttributes() to display only a subset of the available attributes.

    attributePages(mart)

| Argument | Description |
|---|---|
| mart | object of class Mart, created with the |

Steffen Durinck

    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    attributePages(mart)
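The page names returned by attributePages() can be passed straight to the page argument of listAttributes(). A minimal sketch, assuming network access to www.ensembl.org; it takes the first returned page rather than assuming any particular page name exists:

```r
library(biomaRt)

# Connect to the human genes dataset (requires network access)
mart <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# Names of the attribute pages in this dataset
pages <- attributePages(mart)
print(pages)

# List only the attributes that belong to the first page
first_page_attrs <- listAttributes(mart, page = pages[1])
head(first_page_attrs)
```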
The following functions have been removed from biomaRt and are defunct. Use the indicated replacement instead:

- filterOptions: listFilterOptions()
- listFilterValues: listFilterOptions()
- searchFilterValues: searchFilterOptions()

biomaRt uses a results cache to speed up queries that have been run before. These functions report the status of this cache and allow it to be deleted.

    biomartCacheClear()
    biomartCacheInfo()
These functions do not return anything and are called for their side
effects. biomartCacheInfo() prints the location of the cache, along
with the number of files and their total size on disk.
biomartCacheClear() will delete the current contents of the cache.
Mike Smith
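The cache functions can be combined with the useCache argument of getBM() to force a fresh request to the server. A minimal sketch, assuming network access to Ensembl. Note that biomartCacheClear() deletes the whole cache, so only run that line if you want to discard all cached results:

```r
library(biomaRt)

# Print the cache location, number of files and total size on disk
biomartCacheInfo()

mart <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# useCache = FALSE bypasses the cache and always queries the server
fresh <- getBM(
  attributes = c("ensembl_gene_id", "hgnc_symbol"),
  filters = "hgnc_symbol",
  values = "BRCA1",
  mart = mart,
  useCache = FALSE
)

# Delete the current contents of the cache
biomartCacheClear()
```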
Exports getSequence results to FASTA format

    exportFASTA(sequences, file)

| Argument | Description |
|---|---|
| sequences | A data.frame that was the output of the |
| file | File to which you want to write the data |

Steffen Durinck
Hugo Gruson

    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    seq <- getSequence(id = "BRCA1", type = "hgnc_symbol", seqType = "cdna", mart = mart)
    exportFASTA(seq, file = "test.fasta")
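To make the output format concrete, here is a minimal base-R sketch of writing FASTA records from a getSequence()-style data.frame. It assumes the two-column layout returned for identifier queries (sequence first, identifier second). write_fasta_sketch() is a hypothetical helper, not part of biomaRt, and exportFASTA() itself may differ in details such as blank lines between records. The input uses placeholder values so the sketch runs offline:

```r
# Hypothetical helper, not part of biomaRt: writes a data.frame whose
# first column holds sequences and second column holds identifiers.
write_fasta_sketch <- function(sequences, file) {
  stopifnot(is.data.frame(sequences), ncol(sequences) >= 2)
  # One header line (">id") followed by one sequence line per record
  records <- paste0(">", sequences[[2]], "\n", sequences[[1]])
  writeLines(records, con = file)
  invisible(file)
}

# Placeholder input shaped like getSequence() output (not real Ensembl data)
toy <- data.frame(
  cdna = c("ATGCGT", "TTGACC"),
  hgnc_symbol = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".fasta")
write_fasta_sketch(toy, out)
readLines(out)
```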
Displays the type of a filter, given its name.

    filterType(filter, mart)

| Argument | Description |
|---|---|
| filter | A valid filter name. Valid filters are given by the |
| mart | object of class Mart, created using the |

Steffen Durinck

    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    filterType("chromosome_name", mart)
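filterType() takes a single filter name, so checking several filters before building a query means applying it over a vector. A sketch, assuming network access; the filter names are the ones used in other examples on this page:

```r
library(biomaRt)

mart <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# sapply() passes each name as the 'filter' argument and forwards mart
filter_names <- c("chromosome_name", "hgnc_symbol", "affy_hg_u95av2")
types <- sapply(filter_names, filterType, mart = mart)
types
```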
This function is the main biomaRt query function. Given a set of filters and corresponding values, it retrieves the user-specified attributes from the BioMart database one is connected to.

    getBM(
      attributes,
      filters = "",
      values = "",
      mart,
      checkFilters = TRUE,
      verbose = FALSE,
      uniqueRows = TRUE,
      bmHeader = FALSE,
      quote = "\"",
      useCache = TRUE
    )

| Argument | Description |
|---|---|
| attributes | Attributes you want to retrieve. A possible list of attributes can be retrieved using the function |
| filters | Filters (one or more) that should be used in the query. A possible list of filters can be retrieved using the function |
| values | Values of the filter, e.g. a vector of Affymetrix probe set IDs. If multiple filters are specified, the argument should be a list of vectors, where the position of each vector corresponds to the position of its filter in the filters argument. |
| mart | object of class Mart, created with the |
| checkFilters | Sometimes attributes for which a value needs to be specified, for example upstream_flank with value 20 for obtaining upstream sequence flank regions of length 20 bp, are treated as filters in BioMarts. To enable such a query to work, one must specify the attribute as a filter and set |
| verbose | When using biomaRt in web service mode and setting verbose to TRUE, the XML query sent to the web service will be printed. |
| uniqueRows | If the result of a query contains multiple identical rows, setting this argument to |
| bmHeader | Boolean to indicate if the result retrieved from the BioMart server should include the data headers or not, defaults to |
| quote | Sometimes parsing of the results fails because an Ensembl data field contains a quote character; in such cases, changing the value of quote may allow the results to be parsed. |
| useCache | Boolean indicating whether the results cache should be used. Setting to |

A data.frame. There is no implicit mapping between its rows
and the function arguments (e.g. filters, values), so make sure
the relevant identifier(s) are returned by including them in
attributes. See Examples.
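Because rows do not come back in input order, and an input may match zero or several rows, a common pattern is to key the result on an identifier column that was requested as an attribute. The sketch below uses a small hand-made data.frame with placeholder values as a stand-in for a getBM() result so that it runs offline; the column names probe and symbol are illustrative only:

```r
# Stand-in for a getBM() result (placeholder values, not real Ensembl data).
# The queried identifier column 'probe' was included in the attributes.
query_ids <- c("probe_3", "probe_1", "probe_2")
res <- data.frame(
  probe  = c("probe_1", "probe_2", "probe_2"),
  symbol = c("SYM_A", "SYM_B", "SYM_C"),
  stringsAsFactors = FALSE
)

# One value per query, in query order: match() keeps only the first hit
# and gives NA for identifiers that returned no rows
first_hit <- res$symbol[match(query_ids, res$probe)]

# All hits per query, in query order: unmatched queries get character(0)
all_hits <- split(res$symbol, factor(res$probe, levels = query_ids))

first_hit
all_hits
```

Which form to use depends on whether one-to-many matches matter for the analysis: match() silently drops extra hits, while split() keeps them.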
Steffen Durinck

    mart <- useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    getBM(
      attributes = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band"),
      filters = "affy_hg_u95av2",
      values = c("1939_at", "1503_at", "1454_at"),
      mart = mart
    )
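When several filters are combined, values becomes a list whose elements line up with filters by position. A sketch, assuming network access and the filter names start and end in the human genes dataset (check them with listFilters()); the coordinates are an illustrative region on chromosome 17 chosen to include the BRCA1 locus on GRCh38:

```r
library(biomaRt)

mart <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# Three filters, so values is a list of three vectors in the same order:
# chromosome_name = "17", start = 43000000, end = 43200000
region_genes <- getBM(
  attributes = c("ensembl_gene_id", "hgnc_symbol",
                 "chromosome_name", "start_position", "end_position"),
  filters = c("chromosome_name", "start", "end"),
  values = list("17", 43000000, 43200000),
  mart = mart
)
head(region_genes)
```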
This function retrieves gene annotations from Ensembl given a vector of identifiers. The annotation includes chromosome name, band, start position, end position, gene description and gene symbol. A wide variety of identifiers is available in Ensembl; these can be found with the listFilters() function.

    getGene(id, type, mart)

| Argument | Description |
|---|---|
| id | vector of gene identifiers one wants to annotate |
| type | type of identifier; possible values can be obtained with the listFilters function. Examples are entrezgene_id, hgnc_symbol (for HUGO gene symbols), ensembl_gene_id, unigene, agilentprobe, affy_hg_u133_plus_2, refseq_dna, etc. |
| mart | object of class Mart, containing connections to the BioMart databases. You can create such an object using the function |

Steffen Durinck

    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    # example using affy id
    g <- getGene(id = "1939_at", type = "affy_hg_u95av2", mart = mart)
    show(g)
    # example using Entrez Gene id
    g <- getGene(id = "100", type = "entrezgene_id", mart = mart)
    show(g)
This function simplifies querying the Ensembl BioMart for the homologs of one or more gene IDs between two species.

    getHomologs(ensembl_gene_ids, species_from, species_to)

| Argument | Description |
|---|---|
| ensembl_gene_ids | Character vector of the Ensembl gene IDs for which you want to find homologs. |
| species_from, species_to | Character vectors of length 1. These arguments specify the species the input IDs belong to ( |

Mike Smith
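For comparison, a similar lookup can be written directly with getBM() using the homolog attributes of the source dataset. This sketch assumes network access and the attribute names mmusculus_homolog_ensembl_gene and mmusculus_homolog_associated_gene_name in the human genes dataset; it is not necessarily how getHomologs() is implemented. ENSG00000141510 is the Ensembl gene ID of human TP53:

```r
library(biomaRt)

human <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# Human gene IDs in, mouse homolog IDs and gene names out
mouse_homologs <- getBM(
  attributes = c("ensembl_gene_id",
                 "mmusculus_homolog_ensembl_gene",
                 "mmusculus_homolog_associated_gene_name"),
  filters = "ensembl_gene_id",
  values = "ENSG00000141510",  # human TP53
  mart = human
)
mouse_homologs
```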
This function is the main biomaRt query function that links two datasets and retrieves information from these linked BioMart datasets. In Ensembl this translates to homology mapping.

    getLDS(
      attributes,
      filters = "",
      values = "",
      mart,
      attributesL,
      filtersL = "",
      valuesL = "",
      martL,
      verbose = FALSE,
      uniqueRows = TRUE,
      bmHeader = TRUE
    )

| Argument | Description |
|---|---|
| attributes | Attributes you want to retrieve from the primary dataset. A possible list of attributes can be retrieved using the function |
| filters | Filters that should be used in the query. These filters will be applied to the primary dataset. A possible list of filters can be retrieved using the function |
| values | Values of the filter, e.g. a list of Affymetrix probe set IDs |
| mart | object of class Mart created with the |
| attributesL | Attributes of the linked dataset that need to be retrieved |
| filtersL | Filters to be applied to the linked dataset |
| valuesL | Values for the linked dataset filters |
| martL | Mart object representing the linked dataset |
| verbose | When using biomaRt in web service mode and setting verbose to |
| uniqueRows | Logical to indicate if the BioMart web service should return unique rows only or not. Has the value of either |
| bmHeader | Boolean to indicate if the result retrieved from the BioMart server should include the data headers or not, defaults to |

Steffen Durinck

    human <- useMart(
      "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl",
      host = "https://dec2021.archive.ensembl.org"
    )
    mouse <- useMart(
      "ENSEMBL_MART_ENSEMBL",
      dataset = "mmusculus_gene_ensembl",
      host = "https://dec2021.archive.ensembl.org"
    )
    getLDS(
      attributes = c("hgnc_symbol", "chromosome_name", "start_position"),
      filters = "hgnc_symbol",
      values = "TP53",
      mart = human,
      attributesL = c("chromosome_name", "start_position"),
      martL = mouse
    )
This function retrieves sequences given the chromosome, start and end position or a list of identifiers. Using getSequence in web service mode (default) generates 5' to 3' sequences of the requested type on the correct strand.

    getSequence(
      chromosome,
      start,
      end,
      id,
      type,
      seqType,
      upstream,
      downstream,
      mart,
      useCache = TRUE,
      verbose = FALSE
    )

| Argument | Description |
|---|---|
| chromosome | Chromosome name |
| start | Start position of the sequence on the chromosome |
| end | End position of the sequence on the chromosome |
| id | An identifier or vector of identifiers. |
| type | The type of identifier used. Supported types are hugo, ensembl, embl, entrezgene, refseq, ensemblTrans and unigene. Alternatively, one can also use a filter to specify the type. Possible filters are given by the |
| seqType | Type of sequence that you want to retrieve. Allowed seqTypes are given in the details section. |
| upstream | Number of base pairs of upstream sequence to add to the output. |
| downstream | Number of base pairs of downstream sequence to add to the output. |
| mart | object of class Mart created using the |
| useCache | If |
| verbose | If verbose = TRUE, the XML query that was sent to the web service will be displayed. |

The type of sequence returned is specified by the seqType argument, which takes the following values:

- 'cdna': for nucleotide sequences
- 'peptide': for protein sequences
- '3utr': for 3' UTR sequences
- '5utr': for 5' UTR sequences
- 'gene_exon': for exon sequences only
- 'transcript_exon_intron': gives the full unspliced transcript, that is exons + introns
- 'gene_exon_intron': gives the exons + introns of a gene
- 'coding': gives the coding sequence only
- 'coding_transcript_flank': gives the flanking region of the transcript including the UTRs; this must be accompanied by a value for the upstream or downstream argument
- 'coding_gene_flank': gives the flanking region of the gene including the UTRs; this must be accompanied by a value for the upstream or downstream argument
- 'transcript_flank': gives the flanking region of the transcript excluding the UTRs; this must be accompanied by a value for the upstream or downstream argument
- 'gene_flank': gives the flanking region of the gene excluding the UTRs; this must be accompanied by a value for the upstream or downstream argument

Steffen Durinck, Mike Smith

    mart <- useEnsembl("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
    seq <- getSequence(
      id = "BRCA1",
      type = "hgnc_symbol",
      seqType = "peptide",
      mart = mart
    )
    show(seq)
    seq <- getSequence(
      id = "1939_at",
      type = "affy_hg_u95av2",
      seqType = "gene_flank",
      upstream = 20,
      mart = mart
    )
    show(seq)
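Sequences retrieved this way can be written straight to disk with exportFASTA(). A sketch, assuming network access; it requests 3' UTR sequences for two gene symbols in a single query and writes them to a temporary file:

```r
library(biomaRt)

mart <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# Several identifiers in one query; one row per returned sequence
utrs <- getSequence(
  id = c("BRCA1", "BRCA2"),
  type = "hgnc_symbol",
  seqType = "3utr",
  mart = mart
)

# Write the result as FASTA to a temporary file and peek at it
fasta_file <- tempfile(fileext = ".fasta")
exportFASTA(utrs, file = fasta_file)
head(readLines(fasta_file))
```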
Attributes are the outputs of a biomaRt query; they are the information we
want to retrieve. For example, if we want to retrieve all EntrezGene
identifiers of genes located on chromosome X, entrezgene_id will be
the attribute we use in the query. The listAttributes function lists
the available attributes in the selected dataset, and searchAttributes
searches them with a regular expression.

    listAttributes(mart, page, what = c("name", "description", "page"))
    searchAttributes(mart, pattern = ".*")

| Argument | Description |
|---|---|
| mart | object of class Mart created using the |
| page | Show only the attributes that belong to the specified attribute page. |
| what | Vector of types of information about the attributes that need to be displayed. Can have values like name, description, fullDescription, page |
| pattern | Character vector defining the regular expression (regex) to be used for the search. If left blank the default is to use ".*" which will match everything. |

Steffen Durinck, Mike Smith

    ## list the available Ensembl marts and use Ensembl Genes
    listEnsembl()
    ensembl <- useEnsembl(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl"
    )
    ## list the available attributes in this dataset
    listAttributes(mart = ensembl)
    ## the list of attributes is very long and gets truncated by R
    ## we can search for a term of interest to filter this e.g. 'start'
    searchAttributes(mart = ensembl, pattern = "start")
    ## filter the attributes to give only entries containing 'entrez' or 'hgnc'
    searchAttributes(mart = ensembl, pattern = "entrez|hgnc")
List or search the datasets available in the selected BioMart database.

    listDatasets(mart, verbose = FALSE)
    searchDatasets(mart, pattern = ".*")

| Argument | Description |
|---|---|
| mart | object of class Mart created with the useMart function |
| verbose | Give detailed output of what the method is doing, for debugging purposes |
| pattern | Character vector defining the regular expression (regex) to be used for the search. If left blank the default is to use ".*" which will match everything and return the same as |

Steffen Durinck, Mike Smith

    ## list the available Ensembl marts and use Ensembl Genes
    listEnsembl()
    ensembl <- useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL")
    ## list the available datasets in this Mart
    listDatasets(mart = ensembl)
    ## the list of Ensembl datasets grows ever larger (101 as of Ensembl 93)
    ## we can search for a term of interest to reduce the length e.g. 'sapiens'
    searchDatasets(mart = ensembl, pattern = "sapiens")
    ## search for any dataset containing the word Rat or rat
    searchDatasets(mart = ensembl, pattern = "(R|r)at")
This function returns a list of BioMart databases hosted by Ensembl. To
establish a connection use the useEnsembl() function.

    listEnsembl(mart = NULL, version = NULL, GRCh = NULL, mirror = NULL, verbose = FALSE)
    listEnsemblGenomes(includeHosts = FALSE, host = NULL)

| Argument | Description |
|---|---|
| mart | mart object created with the useEnsembl function. This is optional, as you usually use |
| version | Ensembl version to connect to when wanting to connect to an archived Ensembl version |
| GRCh | GRCh version to connect to if not the current GRCh38; currently this can only be 37 |
| mirror | Specify an Ensembl mirror to connect to. The valid options here are 'www', 'useast', 'asia'. If no mirror is specified, the primary site at www.ensembl.org will be used. |
| verbose | Give detailed output of what the method is doing, for debugging purposes |
| includeHosts | If this option is set to |
| host | Host to connect to. Use this argument to specify an archive site for |

Steffen Durinck, Mike L. Smith

    listEnsembl()
    ## list the default Ensembl Genomes marts
    listEnsemblGenomes()
    ## list only the marts available in the Ensembl Plants 56 archive
    listEnsemblGenomes(host = "https://eg56-plants.ensembl.org/")
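The GRCh argument selects the GRCh37 Ensembl site instead of the current GRCh38 one, and useEnsembl() accepts the same argument. A sketch, assuming network access; it uses the 'genes' shorthand for the Ensembl genes mart that the useEnsembl() documentation mentions:

```r
library(biomaRt)

# BioMart databases available on the GRCh37 Ensembl site
grch37_marts <- listEnsembl(GRCh = 37)
grch37_marts

# Connect to the GRCh37 human genes dataset
grch37 <- useEnsembl(
  biomart = "genes",
  dataset = "hsapiens_gene_ensembl",
  GRCh = 37
)
grch37
```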
Returns a table containing the available archived versions of Ensembl, along with the dates they were created and the URL used to access them.

    listEnsemblArchives()
Mike Smith

    listEnsemblArchives()
Filters are what we use as inputs for a biomaRt query. For example, if we
want to retrieve all EntrezGene identifiers on chromosome X,
chromosome_name will be the filter, with corresponding value X.

    listFilters(mart, what = c("name", "description"))
    searchFilters(mart, pattern = ".*")

| Argument | Description |
|---|---|
| mart | object of class |
| what | character vector indicating what information to display about the available filters. Valid values are |
| pattern | Character vector defining the regular expression (regex) to be used for the search. If left blank the default is to use ".*" which will match everything. |

Steffen Durinck, Mike Smith

    ## list the available Ensembl marts and use Ensembl Genes
    listEnsembl()
    ensembl <- useEnsembl(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl"
    )
    ## list the available filters in this dataset
    listFilters(mart = ensembl)
    ## the list of filters is long and not easy to read
    ## we can search for a term of interest to reduce this e.g. 'gene'
    searchFilters(mart = ensembl, pattern = "gene")
    ## search the available filters to find entries containing 'entrez' or 'hgnc'
    searchFilters(mart = ensembl, pattern = "entrez|hgnc")
This function returns a list of BioMart databases to which biomaRt can
connect. By default the Ensembl BioMart databases are displayed. To
establish a connection use the useMart() function.

    listMarts(
      mart = NULL,
      host = "https://www.ensembl.org",
      path = "/biomart/martservice",
      port,
      includeHosts = FALSE,
      http_config = list(),
      verbose = FALSE
    )

| Argument | Description |
|---|---|
| mart | mart object created with the |
| host | Host to connect to. Defaults to |
| path | Path to the martservice, appended to the host to form the web service URL |
| port | Port to use in HTTP communication |
| includeHosts | Boolean indicating whether the function should also return the host of each BioMart database |
| http_config | Some hosts require specific HTTP settings to be used when connecting. This argument takes the output of |
| verbose | Give detailed output of what the method is doing, for debugging purposes. |

If you receive an error message saying 'Unexpected format to the list of
available marts', this is often because there is a problem with the BioMart
server you are trying to connect to, and something other than the list of
available marts is being returned, often something like a 'down for
maintenance' page. If you browse to the provided URL and find a page that
starts with '<MartRegistry>', this is the correct listing and you
should report the issue on the Bioconductor support site:
https://support.bioconductor.org
The previously available archive argument is defunct.
Instead, specify the URL of the archived BioMart you want to access
via the host argument. For Ensembl you can view the list of archives using
listEnsemblArchives().
Steffen Durinck, Mike Smith

    listMarts()
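An archived Ensembl release is reached by pointing host at its URL rather than through the defunct archive argument. A sketch, assuming network access and that the December 2021 archive (the host used in the getLDS() example) is still online; the archive table shows which URLs are currently available:

```r
library(biomaRt)

# Table of archived Ensembl releases and the URLs used to access them
archives <- listEnsemblArchives()
head(archives)

# List the marts on one archived release by passing its URL as host
archived_marts <- listMarts(host = "https://dec2021.archive.ensembl.org")
archived_marts
```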
This function opens an editor displaying the analysis code of the Nature Protocols 2009 paper.

    NP2009code()
The edit() function uses getOption("editor") to select
the editor. Use, for instance, options(editor="emacs") to set another
editor.
Steffen Durinck, Wolfgang Huber

    NP2009code()
Some filters have a predefined list of values that can be used to search them. These functions give access to this list of options for a named filter, so you can check which values are valid when your biomaRt query is not finding anything.

    searchFilterOptions(mart, filter, pattern = ".*")
    listFilterOptions(mart, filter)

| Argument | Description |
|---|---|
| mart | object of class |
| filter | The name of the filter whose options should be listed or searched. You can list available filters via |
| pattern | Character vector defining the regular expression (regex) to be used for the search. If left blank the default is to use ".*" which will match everything. |

Mike Smith

    ## Use the Ensembl human genes dataset
    ensembl <- useEnsembl(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl"
    )
    ## we can search for the name of a filter we're interested in e.g. 'phenotype'
    ## we need to use the name of the filter in the next function
    searchFilters(ensembl, pattern = "phenotype")
    ## list all the options available to the 'phenotype_source' filter
    listFilterOptions(mart = ensembl, filter = "phenotype_source")
    ## search the 'phenotype_description' filter for the term 'crohn'
    searchFilterOptions(
      mart = ensembl,
      filter = "phenotype_description",
      pattern = "crohn"
    )
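Once a matching option has been found, it can be used as a value for the same filter in getBM(). A sketch, assuming network access; taking the first match returned by searchFilterOptions() is an arbitrary choice for illustration:

```r
library(biomaRt)

ensembl <- useEnsembl(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl"
)

# Candidate values for the phenotype_description filter
crohn <- searchFilterOptions(
  mart = ensembl,
  filter = "phenotype_description",
  pattern = "crohn"
)

# Query genes annotated with the first matching option, if any was found
if (length(crohn) > 0) {
  crohn_genes <- getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol"),
    filters = "phenotype_description",
    values = crohn[1],
    mart = ensembl
  )
  head(crohn_genes)
}
```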
select, columns and keys are used together to extract
data from a Mart object. These functions work much the same as the
classic biomaRt functions such as getBM() and are provided here to
make biomaRt easier to use for people who are comfortable with these
methods from other annotation packages. Examples of objects in other
packages where you can use these methods include (but are not limited to):
ChipDb, OrgDb, GODb, InparanoidDb and ReactomeDb.

    ## S4 method for signature 'Mart'
    keys(x, keytype, ...)
    ## S4 method for signature 'Mart'
    keytypes(x)
    ## S4 method for signature 'Mart'
    columns(x)
    ## S4 method for signature 'Mart'
    select(x, keys, columns, keytype, ...)

| Argument | Description |
|---|---|
| x | the |
| keytype | the keytype that matches the keys used. For the |
| ... | other arguments. These include: |
| keys | the keys to select records for from the database. Keys for some keytypes can be extracted by using the |
| columns | the columns or kinds of things that can be retrieved from the database. As with |


- columns shows which kinds of data can be returned from the Mart object.
- keytypes allows the user to discover which keytypes can be passed in to select or keys as the keytype argument.
- keys returns keys from the Mart of the type specified by its keytype argument.
- select is meant to be used with these other methods and has arguments that take the kinds of values that these other methods return. It retrieves the results as a data.frame based on the keys, columns and keytype arguments.

keys, columns and keytypes each return a character vector of possible values. select returns a data.frame.
Marc Carlson

    ## 1st create a Mart object and specify the dataset
    mart <- useEnsembl(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl"
    )
    ## you can list the keytypes
    keytypes(mart)
    ## you can list the columns
    columns(mart)
    ## And you can extract keys when this is supported for your keytype of interest
    k <- keys(mart, keytype = "chromosome_name")
    head(k)
    ## You can even do some pattern matching on the keys
    k <- keys(mart, keytype = "chromosome_name", pattern = "LRG")
    head(k)
    ## Finally you can use select to extract records for things that you are
    ## interested in.
    affy <- c("202763_at", "209310_s_at", "207500_at")
    select(mart, keys = affy,
           columns = c("affy_hg_u133_plus_2", "entrezgene_id"),
           keytype = "affy_hg_u133_plus_2")
On some systems, specific SSL settings have to be applied to allow HTTPS connections to the Ensembl servers. This function allows these to be saved in the biomaRt cache, so they will be retrieved each time they are needed. biomaRt will try to determine them automatically, but this function can be used to set them manually if required.

    setEnsemblSSL(settings)

| Argument | Description |
|---|---|
| settings | A named list. Each entry should be a valid curl option, as found in |

Mike Smith

    ## Not run:
    ssl_settings <- list(
      "ssl_cipher_list" = "DEFAULT@SECLEVEL=1",
      "ssl_verifypeer" = FALSE
    )
    setEnsemblSSL(ssl_settings)
    ## End(Not run)
Represents a Mart class, containing connections to different BioMarts.

    ## S4 method for signature 'Mart'
    show(object)

| Argument | Description |
|---|---|
| object | An object of class |

Method show: prints a summary of the object.
Steffen Durinck
This function selects a dataset and updates the Mart object.

    useDataset(dataset, mart, verbose = FALSE)

| Argument | Description |
|---|---|
| dataset | Dataset you want to use. A list of possible datasets can be retrieved using the function |
| mart | Mart object created with the |
| verbose | Give detailed output of what the method is doing, for debugging |

Steffen Durinck

    mart <- useMart("ENSEMBL_MART_ENSEMBL")
    mart <- useDataset("hsapiens_gene_ensembl", mart = mart)
A first step in using the biomaRt package is to select a BioMart database
and dataset to use. The useEnsembl() function enables one to connect
to a specified BioMart database and dataset hosted by Ensembl without having
to specify the Ensembl URL. To know which BioMart databases are available
see the listEnsembl() and listEnsemblGenomes()
functions. To know which datasets are available within a BioMart database,
first select the BioMart database using useEnsembl() and then use the
listDatasets() function on the selected Mart object.

    useEnsembl(
      biomart,
      dataset,
      host,
      version = NULL,
      GRCh = NULL,
      mirror = NULL,
      verbose = FALSE
    )
    useEnsemblGenomes(biomart, dataset, host = NULL)

| Argument | Description |
|---|---|
| biomart | BioMart database name you want to connect to. Possible database names can be retrieved with the function |
| dataset | Dataset you want to use. To see the different datasets available within a BioMart database you can e.g. do: mart = useEnsembl('genes'), followed by listDatasets(mart). |
| host | Host to connect to. Only needs to be specified if different from www.ensembl.org. For |
| version | Ensembl version to connect to when wanting to connect to an archived Ensembl version |
| GRCh | GRCh version to connect to if not the current GRCh38; currently this can only be 37 |
| mirror | Specify an Ensembl mirror to connect to. The valid options here are 'www', 'useast', 'asia'. If no mirror is specified, the primary site at www.ensembl.org will be used. Mirrors are not available for the Ensembl Genomes databases. |
| verbose | Give detailed output of what the method is doing while in use, for debugging |

The mirror argument can be considered a "preferred choice" when
connecting to Ensembl. If the argument is provided, connectivity to
that mirror is tested. If it responds positively, the requested
mirror is used. If it fails, the remaining mirrors are tried in
random order until a working server is found. Once identified, that
Ensembl server is associated with the returned Mart object and used
for all queries.
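The fallback order described above can be sketched in a few lines of base R. This illustrates the logic only and is not biomaRt's implementation: choose_mirror() and its is_up connectivity check are hypothetical, and is_up is supplied by the caller so that the sketch runs offline:

```r
# Hypothetical sketch of the mirror fallback; not part of biomaRt.
# is_up: a function taking a mirror name and returning TRUE if it responds.
choose_mirror <- function(preferred,
                          mirrors = c("www", "useast", "asia"),
                          is_up) {
  # Test the preferred mirror first
  if (is_up(preferred)) {
    return(preferred)
  }
  # Otherwise try the remaining mirrors in random order
  for (m in sample(setdiff(mirrors, preferred))) {
    if (is_up(m)) {
      return(m)
    }
  }
  stop("no Ensembl mirror responded")
}

# Only "asia" responds, so it is chosen despite the preference for "useast"
choose_mirror("useast", is_up = function(m) m == "asia")
```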
Steffen Durinck & Mike Smith

    mart <- useEnsembl("ENSEMBL_MART_ENSEMBL")
    ## using the US East mirror
    us_mart <- useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL", mirror = "useast")
    ## using the Arabidopsis thaliana genes dataset in Ensembl Plants
    plants_mart <- useEnsemblGenomes(
      biomart = "plants_mart",
      dataset = "athaliana_eg_gene"
    )
    ## using the Cucumis melo genes dataset in the Ensembl Plants 56 archive
    plants_mart <- useEnsemblGenomes(
      biomart = "plants_mart",
      dataset = "cmelo_eg_gene",
      host = "https://feb2023-plants.ensembl.org/"
    )
A first step in using the biomaRt package is to select a BioMart database
and dataset to use. The useMart function enables one to connect to a
specified BioMart database and dataset within this database. To know which
BioMart databases are available, see the listMarts() function. To know which
datasets are available within a BioMart database, first select the BioMart
database using useMart() and then use the listDatasets() function on the
selected BioMart.

    useMart(
      biomart,
      dataset,
      host = "https://www.ensembl.org",
      path = "/biomart/martservice",
      port,
      version,
      verbose = FALSE
    )

| Argument | Description |
|---|---|
| biomart | BioMart database name you want to connect to. Possible database names can be retrieved with the function |
| dataset | Dataset you want to use. To see the different datasets available within a BioMart database you can e.g. do: mart = useMart('ENSEMBL_MART_ENSEMBL'), followed by listDatasets(mart). |
| host | Host to connect to. Defaults to |
| path | Path that is appended to the host to form the web service URL |
| port | Port to connect to; it is inserted between host and path |
| version | Use the version name instead of the biomart name to specify which BioMart you want to use |
| verbose | Give detailed output of what the method is doing while in use, for debugging |

The previously available archive argument is defunct.
Instead, specify the URL of the archived BioMart you want to access
via the host argument. For Ensembl you can view the list of archives using
listEnsemblArchives().
Steffen Durinck, Mike L. Smith

    mart <- useMart("ENSEMBL_MART_ENSEMBL")
    mart <- useMart(
      biomart = "ENSEMBL_MART_ENSEMBL",
      dataset = "hsapiens_gene_ensembl"
    )