Title: | Package for CTDbase data query, visualization and downstream analysis |
---|---|
Description: | Package to retrieve and visualize data from the Comparative Toxicogenomics Database (http://ctdbase.org/). The downloaded data is formatted as DataFrames for further downstream analyses. |
Authors: | Carles Hernandez-Ferrer [aut], Juan R. Gonzalez [aut], Xavier Escribà-Montagut [cre] |
Maintainer: | Xavier Escribà-Montagut <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.15.0 |
Built: | 2024-12-19 03:06:59 UTC |
Source: | https://github.com/bioc/CTDquerier |
Class resulting from query_ctd_gene, query_ctd_chem and query_ctd_dise. It is used to encapsulate all the information in CTDbase for a given set of genes, chemicals or diseases.
## S4 method for signature 'CTDdata'
enrich(x, y, universe, use = "curated", warnings = TRUE, ...)
## S4 method for signature 'CTDdata'
get_table(object, index_name, ...)
## S4 method for signature 'CTDdata'
get_terms(object)
## S4 method for signature 'CTDdata,ANY'
plot(x, y, index_name = "base", representation = "heatmap", ...)
x | Object of class |
y | NOT USED |
universe | String vector of genes used as universe. If not provided, all genes in CTDbase are used. |
use | Select whether all or only curated relations are used. |
warnings | Shows or hides warnings. |
... | NOT USED |
object | Object of class |
index_name | Name of the plot to be drawn. See |
representation | Can take values |
CTDdata objects provide a summarized representation of the downloaded data through the standard show method. For instance, a CTDdata object created using query_ctd_chem shows:
Object of class 'CTDdata'
-------------------------
. Type: CHEMICAL
. Creation (timestamp): 2018-03-13 13:11:50
. Length: 2
. Items: IRON, ..., AIR POLLUTANTS
. Diseases: 1755 ( 203 / 3322 )
. Chemical-gene interactions: 2070 ( 2799 )
. KEGG pathways: 637 ( 637 )
. GO terms: 3641 ( 3641 )
The information shown corresponds to:
Type: Indicates the source (chemical, gene or disease) used to create the object.
Creation: Shows the time-stamp from the creation time.
Length: Shows the number of terms used to create the object.
Items: Shows some of the terms used to create the object.
Diseases: Corresponds to the unique number of diseases obtained in the query. In parentheses: the number of curated chemical-disease associations and the total number of chemical-disease associations.
Chemical-gene interactions: Indicates the unique number of chemical-gene interactions. In parentheses: the total number of chemical-gene interactions.
KEGG pathways: Shows the unique number of KEGG pathway versus chemical associations. In parentheses: the total number of associations.
GO terms: Shows the unique number of GO terms versus chemical associations. In parentheses: the total number of associations.
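The difference between the unique count and the total count in parentheses can be illustrated with a small base R sketch. The table and its column names below are hypothetical and are not the columns returned by CTDquerier; the sketch only shows the counting idea.

```r
# Hypothetical association table: one row per chemical-pathway association.
# Column names and values are made up for illustration.
assoc <- data.frame(
  chemical = c("IRON", "IRON", "AIR POLLUTANTS", "AIR POLLUTANTS"),
  pathway  = c("Pathway A", "Pathway B", "Pathway A", "Pathway C"),
  stringsAsFactors = FALSE
)

n_unique <- length(unique(assoc$pathway))  # distinct pathways
n_total  <- nrow(assoc)                    # all associations (rows)

# Mimic the "unique ( total )" layout of the summary shown above.
cat(". KEGG pathways:", n_unique, "(", n_total, ")\n")
```

A pathway linked to several chemicals is counted once in the unique figure but once per row in the total.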
CTDdata objects allow many types of representation according to the different sources (chemical, gene or disease). The method used is plot, matching the argument x with a CTDdata object. The argument index_name indicates the type of plot to be drawn. The default value of index_name is "base".
"base": shows a bar plot indicating the number of lost and found terms for the given object.
For gene queries, index_name can take values:
"disease": (argument representation must be "heatmap") shows the inference score that associates the given genes with diseases according to CTDbase.
"chemical interactions": (argument representation must be "heatmap") shows the number of references that cite the association between the given genes and chemicals.
"gene-gene interaction": (argument representation can be "network" or "heatmap") in the network representation the original genes are dark-colored while the other genes are light-colored. Both plots allow exploring the gene-gene interactions.
"kegg pathways": (argument representation must be "network") shows the links between genes and KEGG pathways.
"go terms": (argument representation must be "network") shows the links between genes and GO terms.
For chemical queries, index_name can take values:
"gene interactions": (argument representation can be "network" or "heatmap") shows the gene-chemical interactions. The network representation includes the "mechanism" of the interactions.
"disease": (argument representation can be "network" or "heatmap") shows the inference score of the link between chemicals and diseases.
"kegg pathways": (argument representation must be "network") shows the P-value of the relation between KEGG pathways and chemicals.
"go terms": (argument representation must be "network") shows the P-value of the relation between GO terms and chemicals.
For disease queries, index_name can take values:
"gene": (argument representation must be "heatmap") shows the number of references linking a set of genes with a set of diseases.
"chemical": (argument representation must be "heatmap") shows the inference score linking diseases with chemicals.
"kegg pathways": (argument representation must be "network") shows the pathways linked to a set of diseases.
The other arguments of the plot function are as follows:
subset.chemical: filters the chemicals to be included in the plot.
subset.gene: filters the genes to be included in the plot.
subset.pathway: filters the KEGG pathways or GO terms included in the plot.
subset.source: filters the origin in the gene-gene interaction network.
subset.target: filters the end in the gene-gene interaction network.
field.score: can take values "Inference" or "Reference" depending on the source and representation used.
filter.score: filters the relations to be included in the plot based on the score selected with field.score.
max.length: indicates the maximum number of characters of the names of each "item" in the plot.
ontology: for the GO terms, filters the terms based on their ontology. By default: c("Biological Process", "Cellular Component", "Molecular Function").
main: title to be displayed in network representations. For heatmap representations use: + ggtitle("TITLE").
An object of class CTDdata
enrich(CTDdata): Method to perform enrichment analysis given two objects of class CTDdata.
get_table(CTDdata): Method to obtain a specific inner table from a CTDdata object.
get_terms(CTDdata): Returns a list with the terms found to create the object.
plot(x = CTDdata, y = ANY): Generates a basic plot showing the number of terms that can be used to query CTDbase.
timestamp: Character with the timestamp.
type: Character saving "GENE", "CHEMICAL" or "DISEASE" depending on whether it was created using query_ctd_gene, query_ctd_chem or query_ctd_dise.
terms: DataFrame with the genes, chemicals or diseases used to create the object.
losts: Character with the terms used to create the object but that were not present in CTDbase.
gene_interactions: (Only for chemicals) Table with a relation of the genes interacting with the given chemicals.
chemicals_interactions: (Only for genes) Table with a relation of the chemicals interacting with the given genes.
diseases: Table with a relation of the diseases associated with given genes or chemicals.
gene_gene_interactions: (Only for genes) Table with a relation of the genes interacting with the given genes.
kegg: Table with a relation of the KEGG pathways affected by the given chemicals or where the given genes play a role.
go: Table with a relation of the GO terms affected by the given chemicals or where the given genes play a role.
query_ctd_gene to create a CTDdata from a set of genes, query_ctd_chem to create a CTDdata from a set of chemicals, query_ctd_dise to create a CTDdata from a set of diseases, get_table to retrieve encapsulated data and plot to get nice plots from stored data.
It can retrieve information related to genes, chemicals and diseases.
CTDquerier offers two functions to query CTDbase (http://ctdbase.org): query_ctd_gene to query CTDbase given a set of genes; and query_ctd_chem to query CTDbase given a set of chemicals. Both functions return CTDdata objects. Raw downloaded information can be retrieved from CTDdata using the method get_table.
CTDdata objects offer basic visualization of the downloaded information using the standard plot method.
This function downloads the "Chemical vocabulary" file (CTD_chemicals.tsv.gz) from http://ctdbase.org/downloads.
download_ctd_chem(verbose = FALSE, ask = TRUE)
verbose | (default |
ask | (default |
The fields included in the file (CTD_chemicals.tsv.gz) are:
ChemicalName
ChemicalID (MeSH identifier)
CasRN (CAS Registry Number, if available)
Definition
ParentIDs (identifiers of the parent terms; '|'-delimited list)
TreeNumbers (identifiers of the chemical's nodes; '|'-delimited list)
ParentTreeNumbers (identifiers of the parent nodes; '|'-delimited list)
Synonyms ('|'-delimited list)
DrugBankIDs ('|'-delimited list)
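Several of these fields hold '|'-delimited lists. A minimal base R sketch of how such a field could be split into one vector per row is shown here; the data frame is a hypothetical two-row excerpt with made-up values, not real CTDbase content.

```r
# Hypothetical excerpt with made-up values; only the column names follow
# the field list above.
chem <- data.frame(
  ChemicalName = c("Chemical A", "Chemical B"),
  Synonyms     = c("syn1|syn2|syn3", "syn4"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a literal character instead of the
# regular-expression alternation operator.
syn_list <- strsplit(chem$Synonyms, "|", fixed = TRUE)
names(syn_list) <- chem$ChemicalName

# Number of synonyms per chemical.
lengths(syn_list)
```

Without fixed = TRUE the pattern "|" would be read as a regular expression and the string would be split between every character.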
The name passed into the filename argument if the file could be downloaded, 1 otherwise.
download_ctd_chem()
file.exists( "CTD_chemicals.tsv.gz" )
This function downloads the "Disease vocabulary" file (CTD_diseases.tsv.gz) from http://ctdbase.org/downloads.
download_ctd_dise(verbose = FALSE, ask = TRUE)
verbose | (default |
ask | (default |
The fields included in the file (CTD_diseases.tsv.gz) are:
DiseaseName
DiseaseID (MeSH or OMIM identifier)
Definition
AltDiseaseIDs (alternative identifiers; '|'-delimited list)
ParentIDs (identifiers of the parent terms; '|'-delimited list)
TreeNumbers (identifiers of the disease's nodes; '|'-delimited list)
ParentTreeNumbers (identifiers of the parent nodes; '|'-delimited list)
Synonyms ('|'-delimited list)
SlimMappings (MEDIC-Slim mappings; '|'-delimited list)
The name passed into the filename argument if the file could be downloaded, 1 otherwise.
download_ctd_dise()
file.exists( "CTD_diseases.tsv.gz" )
This function downloads the "Gene vocabulary" file (CTD_genes.tsv.gz) from http://ctdbase.org/downloads.
download_ctd_genes(verbose = FALSE, ask = TRUE)
verbose | (default |
ask | (default |
The fields included in the file (CTD_genes.tsv.gz) are:
GeneSymbol
GeneName
GeneID (NCBI Gene identifier)
AltGeneIDs (alternative NCBI Gene identifiers; '|'-delimited list)
Synonyms ('|'-delimited list)
BioGRIDIDs ('|'-delimited list)
PharmGKBIDs ('|'-delimited list)
UniprotIDs ('|'-delimited list)
The name passed into the filename argument if the file could be downloaded, 1 otherwise.
download_ctd_genes()
CTDdata objects
This method performs a Fisher test using the genes in two objects of class CTDdata. The object in 'x' is used as source while the object in 'y' is used as universe. When object 'x' corresponds to an object created with query_ctd_gene, the genes used are the terms found in CTDbase. In the other cases (chemical and disease CTDdata), the genes from the 'gene interactions' table are used. If universe is missing, all genes in CTDbase are used as universe.
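The idea of a Fisher test on two gene sets can be sketched in base R as follows. This is not the package's implementation: the gene identifiers, the set sizes and the way the 2x2 table is built are assumptions chosen only to illustrate how an overlap between two sets within a universe leads to an htest result.

```r
# Hypothetical gene sets; none of these identifiers come from CTDbase.
universe <- paste0("G", 1:100)
set_x    <- paste0("G", 1:20)    # stand-in for the genes of object x
set_y    <- paste0("G", 11:40)   # stand-in for the genes of object y

# Membership of every universe gene in each set.
in_x <- factor(universe %in% set_x, levels = c(TRUE, FALSE))
in_y <- factor(universe %in% set_y, levels = c(TRUE, FALSE))

# 2x2 contingency table: rows = in x or not, columns = in y or not.
tbl <- table(in_x = in_x, in_y = in_y)

# Fisher's exact test on the overlap; the result has class "htest".
res <- fisher.test(tbl)
class(res)
res$p.value
```

Fixing the factor levels keeps the table 2x2 even when one of the cells would otherwise be empty.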
enrich(x, y, universe, use = "curated", warnings = TRUE, ...)
x | Object of class |
y | Object of class |
universe | Vector of strings corresponding to the genes to be used as universe. |
use | (default: |
warnings | (default: |
... | NOT USED |
A list with class htest. Check fisher.test for more information.
# Example in a tryCatch, since we are performing a connection to a server we might
# get a refused connection due to a server rejection. Evaluate the received HTTP
# message to understand if the server is not available or if your IP address is temporarily restricted
tryCatch({
  data("gala")
  air <- query_ctd_chem( terms = "Air Pollutants" )
  hgnc_universe <- readRDS(paste0(path.package("CTDquerier"),"/extdata/universe.RDS"))
  enrich(gala, air, hgnc_universe)
}, error = function(w){NULL})
CTDdata for illustrative purposes
CTDdata with information of 258 genes downloaded from CTDbase. The object was created from the genes obtained from the scientific article entitled "Case-control admixture mapping in Latino populations enriches for known asthma-associated genes" (Table E1) by Torgerson et al. The genes were used to query CTDbase using the query_ctd_gene function.
data("gala")
An object of class CTDdata of length 1.
A CTDdata object.
data("gala")
gala
CTDdata object.
Obtain the raw data from a CTDdata object resulting from a query to CTDbase.
get_table(object, index_name, ...)
object | Object of class |
index_name | String indicating the type of data to obtain. |
... | NOT USED |
Available tables are (index_name):
"gene interactions": (Only for chemicals) Table with a relation of the genes interacting with the given chemicals.
"chemical interactions": (Only for genes) Table with a relation of the chemicals interacting with the given genes.
"diseases": Table with a relation of the diseases associated with given genes or chemicals.
"gene-gene interactions": (Only for genes) Table with a relation of the genes interacting with the given genes.
"kegg pathways": Table with a relation of the KEGG pathways affected by the given chemicals or where the given genes play a role.
"go terms": Table with a relation of the GO terms affected by the given chemicals or where the given genes play a role.
A DataFrame containing the raw result from CTDdata.
data("gala")
get_table(gala, "diseases")[1:3, ]
Getter to obtain the terms used to perform a query into CTDbase
get_terms(object)
object | Object of class |
A list with two accessors: "used" for the terms that exist in CTDbase, and "lost" for the terms that do not exist in CTDbase.
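The shape of this return value can be sketched in base R by splitting a query against a vocabulary. The vocabulary, the query terms and the exact-match rule below are assumptions for illustration; the matching rules actually applied by the package may differ.

```r
# Hypothetical vocabulary and query terms; not taken from CTDbase.
vocabulary <- c("APP", "HMOX1", "TP53")
query      <- c("APP", "HMOX1A", "TP53")

# Exact matching is an assumption made for this sketch.
found <- query %in% vocabulary

# Same two accessors as described above: "used" and "lost".
terms <- list(used = query[found], lost = query[!found])
terms[["lost"]]
```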
data("gala")
get_terms(gala)[["lost"]]
This function takes a data.frame and returns a gtable with three plots: the left leaves, the axis names and the right leaves.
leaf_plot(
  dta,
  label = "name",
  valueLeft = "var1",
  valueRight = "var2",
  titleLeft = NULL,
  titleRight = NULL,
  colorLeft = "#FF7F50",
  colorRight = "#20B2AA"
)
dta | |
label | (default |
valueLeft | (default |
valueRight | (default |
titleLeft | (default |
titleRight | (default |
colorLeft | (default |
colorRight | (default |
A ggplot2 object.
data <- data.frame(
  labels = LETTERS[1:15],
  right = runif(n = 15) * 11,
  left = runif(n = 15) * 9
)
leaf_plot( data, "labels", "left", "right", "runif09", "runif11")
.tsv.gz file for chemicals
Function to load the .tsv.gz file for chemicals
load_ctd_chem(verbose = FALSE)
verbose | (default |
The fields included in the file (CTD_chemicals.tsv.gz) are:
ChemicalName
ChemicalID (MeSH identifier)
CasRN (CAS Registry Number, if available)
Definition
ParentIDs (identifiers of the parent terms; '|'-delimited list)
TreeNumbers (identifiers of the chemical's nodes; '|'-delimited list)
ParentTreeNumbers (identifiers of the parent nodes; '|'-delimited list)
Synonyms ('|'-delimited list)
DrugBankIDs ('|'-delimited list)
A data.frame with the content of the file "CTD_chemicals.tsv.gz"
if(download_ctd_chem()){
  fdl <- load_ctd_chem()
  dim( fdl )
}
.tsv.gz file for disease
Function to load the .tsv.gz file for disease
load_ctd_dise(verbose = FALSE)
verbose | (default |
The fields included in the file (CTD_diseases.tsv.gz) are:
DiseaseName
DiseaseID (MeSH or OMIM identifier)
Definition
AltDiseaseIDs (alternative identifiers; '|'-delimited list)
ParentIDs (identifiers of the parent terms; '|'-delimited list)
TreeNumbers (identifiers of the disease's nodes; '|'-delimited list)
ParentTreeNumbers (identifiers of the parent nodes; '|'-delimited list)
Synonyms ('|'-delimited list)
SlimMappings (MEDIC-Slim mappings; '|'-delimited list)
A data.frame with the content of the file "CTD_diseases.tsv.gz"
if(download_ctd_dise()){
  fdl <- load_ctd_dise()
  dim( fdl )
}
.tsv.gz file for genes
This function works together with download_ctd_genes. It loads the downloaded "CTD_genes.tsv.gz" file into the R session.
load_ctd_gene(verbose = FALSE)
verbose | (default |
The fields included in the file (CTD_genes.tsv.gz) are:
GeneSymbol
GeneName
GeneID (NCBI Gene identifier)
AltGeneIDs (alternative NCBI Gene identifiers; '|'-delimited list)
Synonyms ('|'-delimited list)
BioGRIDIDs ('|'-delimited list)
PharmGKBIDs ('|'-delimited list)
UniprotIDs ('|'-delimited list)
A data.frame with the content of the file "CTD_genes.tsv.gz"
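For readers who want to see what loading a gzipped, tab-separated vocabulary file into a data.frame can look like, here is a base R sketch. It is not the code used by load_ctd_gene: the temporary file, its three columns, its values and the assumption that header comments start with '#' are all made up for illustration.

```r
# Write a tiny, made-up gzipped TSV to a temporary file.
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "w")
writeLines(c(
  "# Hypothetical header comment",
  "GENE1\tgene one\t1",
  "GENE2\tgene two\t2"
), con)
close(con)

# Read it back; lines starting with "#" are skipped as comments.
# Only three of the fields listed above are used to keep the sketch short.
genes <- read.delim(
  gzfile(tmp),
  header = FALSE,
  comment.char = "#",
  col.names = c("GeneSymbol", "GeneName", "GeneID"),
  stringsAsFactors = FALSE
)
dim(genes)

# Remove the temporary file.
unlink(tmp)
```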
if(download_ctd_genes()){
  fdl <- load_ctd_gene()
  dim( fdl )
}
This function checks the given terms against the CTDbase chemical vocabulary and queries CTDbase for each one, downloading chemical-gene interactions, associated diseases, associated KEGG pathways and associated GO terms.
query_ctd_chem(terms, max.distance = 10, ask = FALSE, verbose = FALSE)
terms | Character vector with the chemicals used in the query. |
max.distance | (default |
ask | (default |
verbose | (default |
An object of class CTDdata.
# Example in a tryCatch, since we are performing a connection to a server we might
# get a refused connection due to a server rejection. Evaluate the received HTTP
# message to understand if the server is not available or if your IP address is temporarily restricted
rst <- tryCatch({
  query_ctd_chem( terms = c( "Iron", "Air Pollutants" ), verbose = TRUE )
}, error = function(w){NULL})
This function checks the given terms against the CTDbase disease vocabulary and queries CTDbase for each one, downloading disease-gene interactions, chemical interactions, associated diseases, associated KEGG pathways and associated GO terms.
query_ctd_dise(terms, ask = TRUE, verbose = FALSE)
terms | Character vector with the diseases used in the query. |
ask | (default |
verbose | (default |
An object of class CTDdata.
# Example in a tryCatch, since we are performing a connection to a server we might
# get a refused connection due to a server rejection. Evaluate the received HTTP
# message to understand if the server is not available or if your IP address is temporarily restricted
rst <- tryCatch({
  query_ctd_dise( terms = "Asthma", verbose = TRUE )
}, error = function(w){NULL})
This function checks the given terms against the CTDbase gene vocabulary and queries CTDbase for each one, downloading gene-gene interactions, chemical interactions, associated diseases, associated KEGG pathways and associated GO terms.
query_ctd_gene(terms, ask = TRUE, verbose = FALSE)
terms | Character vector with the genes used in the query. |
ask | (default |
verbose | (default |
An object of class CTDdata.
# Example in a tryCatch, since we are performing a connection to a server we might
# get a refused connection due to a server rejection. Evaluate the received HTTP
# message to understand if the server is not available or if your IP address is temporarily restricted
rst <- tryCatch({
  query_ctd_gene( terms = c( "APP", "HMOX1A", "hmox1" ), verbose = TRUE )
}, error = function(w){NULL})
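Because the examples return NULL when the connection fails, code that uses the result should check for NULL before going further. The sketch below shows that pattern in base R; failing_query is a hypothetical stand-in that always fails and is not part of CTDquerier.

```r
# Hypothetical stand-in for a network query; it always fails to mimic
# a refused connection.
failing_query <- function(terms) {
  stop("connection refused")
}

rst <- tryCatch(
  failing_query("APP"),
  error = function(e) {
    # Report the reason instead of silently discarding it.
    message("Query failed: ", conditionMessage(e))
    NULL
  }
)

# Only continue with downstream steps when a result is available.
if (is.null(rst)) {
  status <- "no result; retry later"
} else {
  status <- "result available"
}
status
```

Printing the condition message keeps the information needed to tell an unavailable server from a temporary restriction.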