Title: | Drug-Target Interactions |
---|---|
Description: | Provides utilities for identifying drug-target interactions for sets of small molecule or gene/protein identifiers. The required drug-target interaction information is obtained from a local SQLite instance of the ChEMBL database. ChEMBL has been chosen for this purpose because it provides one of the most comprehensive and best annotated knowledge resources for drug-target information available in the public domain. |
Authors: | Thomas Girke [cre, aut] |
Maintainer: | Thomas Girke <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.0 |
Built: | 2024-10-31 15:36:21 UTC |
Source: | https://github.com/bioc/drugTargetInteractions |
The drugTargetInteractions package provides utilities for identifying drug-target interactions for sets of small molecule or gene/protein identifiers. The required drug-target interaction information is obtained from a local SQLite instance of the ChEMBL database.
The DESCRIPTION file:
Package: | drugTargetInteractions |
Type: | Package |
Title: | Drug-Target Interactions |
Version: | 1.15.0 |
Date: | 2023-10-24 |
Authors@R: | person("Thomas", "Girke", email="[email protected]", role=c("cre", "aut")) |
Description: | Provides utilities for identifying drug-target interactions for sets of small molecule or gene/protein identifiers. The required drug-target interaction information is obtained from a local SQLite instance of the ChEMBL database. ChEMBL has been chosen for this purpose because it provides one of the most comprehensive and best annotated knowledge resources for drug-target information available in the public domain. |
Depends: | methods, R (>= 4.1) |
Imports: | utils, RSQLite, UniProt.ws, biomaRt, ensembldb, BiocFileCache, dplyr, rappdirs, AnnotationFilter, S4Vectors |
Suggests: | RUnit, BiocStyle, knitr, rmarkdown, ggplot2, reshape2, DT, EnsDb.Hsapiens.v86 |
VignetteBuilder: | knitr |
License: | Artistic-2.0 |
NeedsCompilation: | no |
URL: | https://github.com/girke-lab/drugTargetInteractions |
biocViews: | Cheminformatics, BiomedicalInformatics, Pharmacogenetics, Pharmacogenomics, Proteomics, Metabolomics |
RoxygenNote: | 7.1.1 |
BugReports: | https://github.com/girke-lab/drugTargetInteractions |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/drugTargetInteractions |
RemoteRef: | HEAD |
RemoteSha: | 262cba990e3a34e7f0a5c0ddc4e8567950179116 |
Author: | Thomas Girke [cre, aut] |
Maintainer: | Thomas Girke <[email protected]> |
Index of help topics:
cmpIdMapping                    cmpIdMapping
downloadChemblDb                downloadChemblDb
downloadUniChem                 downloadUniChem
drugTargetAnnot                 drugTargetAnnot
drugTargetAnnotTable            drugTargetAnnotTable
drugTargetBioactivity           drugTargetBioactivity
drugTargetInteractions-package  Drug-Target Interactions
genConfig                       genConfig
getDrugTarget                   getDrugTarget
getParalogs                     getParalogs
getSymEnsUp                     Gene to Protein ID Mappings
getUniprotIDs                   Retrieve UniProt IDs via ID and Cluster Mappings
processDrugage                  processDrugage
runDrugTarget_Annot_Bioassay    runDrugTarget_Annot_Bioassay
transformTTD                    transformTTD
Thomas Girke [cre, aut]
Maintainer: Thomas Girke <[email protected]>
Function to generate compound ID mappings from UniChem.
This function requires the ID mapping files "src1src2.txt.gz", "src1src22.txt.gz", and "src1src7.txt.gz" to exist in a directory called "downloads" before being run. These can be generated with the downloadUniChem function.
It will do some processing on these files and output an RDS file at outfile. This file can then be used in other functions, such as drugTargetAnnot.
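The file prerequisite described above can be checked before calling cmpIdMapping. The following is a minimal sketch in base R; the helper name `missing_unichem_files` is hypothetical and not part of the package, and the directory name "downloads" is taken from the text above.

```r
# Hypothetical helper (not part of drugTargetInteractions): report which of
# the UniChem mapping files named in the documentation are missing from a
# download directory.
missing_unichem_files <- function(download_dir = "downloads") {
  required <- c("src1src2.txt.gz", "src1src22.txt.gz", "src1src7.txt.gz")
  required[!file.exists(file.path(download_dir, required))]
}

missing <- missing_unichem_files("downloads")
if (length(missing) > 0) {
  message("Run downloadUniChem() first; missing: ",
          paste(missing, collapse = ", "))
} else {
  message("All UniChem mapping files are present; cmpIdMapping() can be run.")
}
```

An empty result means all three files are in place.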
cmpIdMapping(outfile=file.path(config$resultsPath,"cmp_ids.rds"), rerun=TRUE,config=genConfig())
outfile |
Path to output file. |
rerun |
If true, runs processing, otherwise does nothing. |
config |
General configuration. See |
Generates an RDS file at outfile.
Thomas Girke
cmpIdMapping("cmp_ids.rds",rerun=FALSE)
Downloads the ChEMBL SQLite database for use by several other functions in the package.
downloadChemblDb(version,rerun=TRUE,config=genConfig())
version |
The ChEMBL version to download. |
rerun |
If TRUE, the file will be downloaded, otherwise do nothing. |
config |
The configuration object. This gives the location to put the downloaded chembl db. |
No return value.
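Once the database file exists, it can be inspected like any other SQLite file. The sketch below assumes the DBI and RSQLite packages are installed and that the file sits at "chembldb.db" (the default chemblDbPath of genConfig). The helper `list_sqlite_tables` is hypothetical and not part of this package.

```r
# Hypothetical helper: list the tables of a local SQLite file.
# Assumes the DBI and RSQLite packages are installed.
list_sqlite_tables <- function(db_path) {
  stopifnot(file.exists(db_path))
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection
  DBI::dbListTables(con)
}

db_path <- "chembldb.db"  # assumed location: default chemblDbPath of genConfig()
if (file.exists(db_path)) {
  print(head(list_sqlite_tables(db_path)))
}
```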
Kevin Horan
downloadChemblDb(27)
Downloads UniChem compound ID mappings from https://www.ebi.ac.uk/unichem/ucquery/listSources. Mappings are downloaded for DrugBank, PubChem, and ChEBI.
downloadUniChem(rerun=TRUE, config=genConfig())
rerun |
If true, downloads the files, else does nothing. |
config |
General configuration. See |
Generates the following output files: "src1src2.txt.gz", "src1src22.txt.gz", and "src1src7.txt.gz". These correspond to mappings from ChEMBL to DrugBank, PubChem, and ChEBI, respectively.
Thomas Girke
https://www.ebi.ac.uk/unichem/ucquery/listSources
downloadUniChem(rerun=TRUE)
Function to query known drug-target annotations.
drugTargetAnnot(queryBy=list(molType=NULL, idType=NULL, ids=NULL),
    cmpid_file=file.path(config$resultsPath,"cmp_ids.rds"),
    config=genConfig())
queryBy |
A list defining the query, as described in |
cmpid_file |
Path to a compound ID mapping file, generated by |
config |
General configuration. See |
Returns the query results as a data frame.
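A query is a plain list with the elements molType, idType and ids, as in the usage above. The sketch below builds such a list with basic input checks. The wrapper `make_query` is hypothetical and not provided by the package, and it does not validate which molType or idType values a given function accepts.

```r
# Hypothetical convenience wrapper; the package itself only expects a list
# with the elements molType, idType and ids.
make_query <- function(molType, idType, ids) {
  stopifnot(is.character(molType), length(molType) == 1,
            is.character(idType), length(idType) == 1,
            length(ids) > 0)
  list(molType = molType, idType = idType, ids = as.character(ids))
}

queryBy <- make_query("cmp", "chembl_id",
                      c("CHEMBL1233058", "CHEMBL1200916", "CHEMBL437765"))
str(queryBy)
```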
Thomas Girke
# These are just sample files included in the package.
# You should use your own data files.
config = genConfig(
    chemblDbPath = system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"),
    resultsPath = system.file("extdata", "results", package="drugTargetInteractions"))
queryBy <- list(molType="cmp", idType="chembl_id",
    ids=c("CHEMBL1233058", "CHEMBL1200916", "CHEMBL437765"))
qresult <- drugTargetAnnot(queryBy, config=config)
Generates a drug target annotation TSV file. This file includes target information from ChEMBL, DrugBank, PubChem, and ChEBI.
This function requires the ID mapping files "src1src2.txt.gz", "src1src22.txt.gz", and "src1src7.txt.gz" to exist in a directory called "downloads" before being run. These can be generated with the downloadUniChem function.
drugTargetAnnotTable(outfile, rerun=TRUE,config=genConfig())
outfile |
The name of the output file to write the results to. |
rerun |
If true, download and generate output file. Otherwise do nothing. |
config |
General configuration. See |
Writes output file to outfile.
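Because the output is tab-separated text, it can be read back with base R even when the file name ends in ".xls". The following is a minimal sketch. The helper `read_annot_tsv` is hypothetical, the presence of a header row is an assumption, and the demo file contains invented placeholder values rather than real annotation data.

```r
# Hypothetical helper: read a tab-separated annotation file.
# Assumes the first row holds the column names.
read_annot_tsv <- function(path) {
  read.delim(path, header = TRUE, sep = "\t", quote = "",
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Toy demonstration file with invented placeholder values.
toy_file <- tempfile(fileext = ".xls")
toy <- data.frame(chembl_id  = c("CHEMBL_A", "CHEMBL_B"),
                  UniProt_ID = c("UP_A", "UP_B"),
                  stringsAsFactors = FALSE)
write.table(toy, toy_file, sep = "\t", quote = FALSE, row.names = FALSE)

annot <- read_annot_tsv(toy_file)
str(annot)
```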
Thomas Girke
config = genConfig(chemblDbPath=
    system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"))
drugTargetAnnotTable(outfile="drugTargetAnnot.xls", config=config)
Function to query bioactivity data by target or compound IDs.
drugTargetBioactivity(queryBy=list(molType=NULL, idType=NULL, ids=NULL),
    cmpid_file=file.path(config$resultsPath,"cmp_ids.rds"),
    config=genConfig())
queryBy |
A list defining the query, as described in |
cmpid_file |
Path to a compound ID mapping file, generated by |
config |
General configuration. See |
Returns results as a data frame.
Thomas Girke
config = genConfig(
    chemblDbPath = system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"),
    resultsPath = system.file("extdata", "results", package="drugTargetInteractions"))

# Query by UniProt protein IDs
queryBy <- list(molType="protein", idType="uniprot",
    ids=c("P05979", "P35354", "P33033", "Q8VCT3", "P29475", "P51511"))
qresult <- drugTargetBioactivity(queryBy, config=config)

# Query by ChEMBL molregno
queryBy <- list(molType="cmp", idType="molregno", ids=c("101036", "101137", "1384464"))
qresult <- drugTargetBioactivity(queryBy, config=config)

# Query by DrugBank ID
queryBy <- list(molType="cmp", idType="DrugBank_ID", ids=c("DB00945", "DB00316", "DB01050"))
qresult <- drugTargetBioactivity(queryBy, config=config)

# Query by PubChem ID
queryBy <- list(molType="cmp", idType="PubChem_ID", ids=c("2244", "3672", "1983"))
qresult <- drugTargetBioactivity(queryBy, config=config)
Create a default configuration object.
genConfig(chemblDbPath = "chembldb.db", downloadPath = "downloads", resultsPath = "results")
chemblDbPath |
Path or filename of ChEMBL SQLite db file. |
downloadPath |
The name of a directory to put downloaded files in. |
resultsPath |
The name of a directory to put output files in. |
A config object that can be passed to other functions.
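A configuration that keeps all files under one project directory could look like the sketch below. The project location is hypothetical, and whether genConfig creates missing directories itself is not stated here, so the sketch creates them explicitly. The genConfig call only runs if the package is installed.

```r
# Hypothetical project layout under a temporary directory.
project_dir  <- file.path(tempdir(), "dti_project")
download_dir <- file.path(project_dir, "downloads")
results_dir  <- file.path(project_dir, "results")

# Create the directories up front (assumption: genConfig may not do this).
for (d in c(download_dir, results_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

if (requireNamespace("drugTargetInteractions", quietly = TRUE)) {
  config <- drugTargetInteractions::genConfig(
    chemblDbPath = file.path(project_dir, "chembldb.db"),
    downloadPath = download_dir,
    resultsPath  = results_dir)
}
```

The resulting object can then be passed as the config argument of the other functions.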
Kevin Horan
config = genConfig()
This function allows you to query a subset of the data fetched by drugTargetAnnotTable.
getDrugTarget(dt_file=file.path(config$resultsPath,"drugTargetAnnot.xls"),
    queryBy=list(molType=NULL, idType=NULL, ids=NULL),
    id_mapping=c(chembl="chembl_id", pubchem="PubChem_ID", uniprot="UniProt_ID"),
    columns, config=genConfig())
dt_file |
The drug target annotation file. This can be generated with |
queryBy |
A list defining the query, as described in |
id_mapping |
A list providing the ID columns for ChEMBL, PubChem, and UniProt. It should contain the fields "chembl", "pubchem", and "uniprot", each with the column name of the respective ID in the drug target annotation file. See the default value above for an example. |
columns |
A list of column indexes to select as a subset of the final result set. |
config |
General configuration. See |
Returns the query result as a data frame.
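The roles of id_mapping and columns can be illustrated on a toy table. This is only a sketch of the idea, not the package's implementation of getDrugTarget: the helper `subset_by_query` is hypothetical and the table values are invented placeholders.

```r
# Toy stand-in for the annotation table; all values are invented.
annot <- data.frame(
  chembl_id  = c("C1", "C2", "C3"),
  PubChem_ID = c("11", "22", "33"),
  UniProt_ID = c("U1", "U2", "U1"),
  stringsAsFactors = FALSE)

id_mapping <- c(chembl = "chembl_id", pubchem = "PubChem_ID",
                uniprot = "UniProt_ID")

# Illustration only: translate idType to a column name via id_mapping,
# keep matching rows, then select columns by index.
subset_by_query <- function(tbl, queryBy, id_mapping, columns) {
  id_col <- id_mapping[[queryBy$idType]]
  hits <- tbl[tbl[[id_col]] %in% queryBy$ids, , drop = FALSE]
  hits[, columns, drop = FALSE]
}

queryBy <- list(molType = "protein", idType = "uniprot", ids = "U1")
res <- subset_by_query(annot, queryBy, id_mapping, columns = c(1, 3))
print(res)
```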
Thomas Girke
config = genConfig(
    chemblDbPath = system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"),
    resultsPath = system.file("extdata", "results", package="drugTargetInteractions"))
id_mapping <- c(chembl="chembl_id", pubchem="PubChem_ID",
    uniprot="UniProt_ID", drugbank="DrugBank_ID")

# Query by ChEMBL ID
queryBy <- list(molType="cmp", idType="chembl", ids=c("CHEMBL25", "CHEMBL1742471"))
getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17), config=config)

# Query by PubChem ID
queryBy <- list(molType="cmp", idType="pubchem", ids=c("2244", "65869", "2244"))
getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17), config=config)

# Query by UniProt ID
queryBy <- list(molType="protein", idType="uniprot", ids=c("P43166", "P00915", "P43166"))
getDrugTarget(queryBy=queryBy, id_mapping=id_mapping, columns=c(1,5,8,16,17), config=config)
Using biomaRt, obtain for query genes the corresponding UniProt IDs as well as paralogs. Query genes can be Gene Names or ENSEMBL Gene IDs from H. sapiens. The result is similar to the IDMs and SSNNs from the getUniprotIDs function, but instead of UNIREF clusters, biomaRt's paralogs are used to obtain SSNNs.
getParalogs(queryBy)
queryBy |
A list defining the query, as described in |
Returns a list with the paralogs for the given genes.
Thomas Girke
queryBy <- list(molType="gene", idType="external_gene_name", ids=c("ZPBP", "MAPK1", "EGFR"))
# requires network connection and is slow
result <- getParalogs(queryBy)
The getSymEnsUp function returns for a query of gene or protein IDs a mapping table containing: ENSEMBL Gene IDs, Gene Names/Symbols, UniProt IDs and ENSEMBL Protein IDs. Subsequent slots contain the corresponding named character vectors. Internally, the function uses the ensembldb package.
getSymEnsUp(EnsDb = "EnsDb.Hsapiens.v86", ids, idtype)
EnsDb |
|
ids |
Character vector with IDs matching the type specified under |
idtype |
Character vector of length one containing one of: |
List object with following components:
idDF |
ID mapping |
ens_gene_id |
named character vector |
up_ens_id |
named character vector |
up_gene_id |
named character vector |
Thomas Girke
gene_name <- c("CA7", "CFTR")
getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=gene_name, idtype="GENE_NAME")
Returns for a set of query IDs (e.g. Ensembl gene IDs) the corresponding UniProt IDs based on two independent approaches: ID mappings (IDMs) and sequence similarity nearest neighbors (SSNNs) using UNIREF clusters. Note that the 'keys' or query IDs (e.g. ENSEMBL genes) can only be reliably maintained in the SSNN results when 'chunksize=1', since batch queries for protein clusters with 'UniProt.ws' will often drop the query IDs. To address this, the query result contains an extra 'QueryID' column when 'chunksize=1', but not when it is set to any other value.
The getParalogs function is similar, but it uses biomaRt's paralogs instead of UNIREF clusters.
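The effect of the chunksize setting on batching can be pictured with a small base R sketch. The helper `chunk_keys` is hypothetical and is not how the package necessarily splits its queries. Removing duplicate keys first is a choice made for this illustration only.

```r
# Hypothetical helper: split query keys into batches of at most `chunksize`.
chunk_keys <- function(keys, chunksize = 20) {
  keys <- unique(keys)  # illustration choice: drop repeated keys first
  split(keys, ceiling(seq_along(keys) / chunksize))
}

keys <- c("ENSG00000145700", "ENSG00000135441", "ENSG00000120071",
          "ENSG00000120088", "ENSG00000185829", "ENSG00000185829")

# Batches of two keys each; the last batch may be smaller.
batches <- chunk_keys(keys, chunksize = 2)
lengths(batches)

# With chunksize = 1 every batch holds exactly one key, so each result
# can be traced back to the key that produced it.
per_key <- chunk_keys(keys, chunksize = 1)
```

With one key per batch the query ID is always known, at the cost of more requests.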
getUniprotIDs(taxId = 9606, kt = "ENSEMBL", keys, seq_cluster = "UNIREF90", chunksize=20)
taxId |
An NCBI taxonomy ID |
kt |
Should be either "ENSEMBL" or "UNIPROTKB". |
keys |
Query IDs. |
seq_cluster |
Which cluster to use. Should be one of 'UNIREF100', 'UNIREF90', 'UNIREF50'. |
chunksize |
Queries are done in batches, this parameter sets the size of each batch. |
Returns a list of data.
Thomas Girke
keys <- c("ENSG00000145700", "ENSG00000135441", "ENSG00000120071", "ENSG00000120088",
    "ENSG00000185829", "ENSG00000185829", "ENSG00000185829", "ENSG00000238083",
    "ENSG00000012061", "ENSG00000104856", "ENSG00000104936", "ENSG00000117877",
    "ENSG00000130202", "ENSG00000130202", "ENSG00000142252", "ENSG00000189114",
    "ENSG00000234906")
res_list100 <- getUniprotIDs(taxId=9606, kt="ENSEMBL", keys=keys, seq_cluster="UNIREF100")
Downloads DrugAge data from genomics.senescence.info/drugs, processes the data, and writes it out as a TSV spreadsheet.
processDrugage(drugagefile=file.path(config$resultsPath,"drugage_id_mapping.xls"), redownloaddrugage=TRUE,config=genConfig())
drugagefile |
The name of the output file. |
redownloaddrugage |
If true, download the data file. Otherwise assume the file is already downloaded. |
config |
General configuration. See |
Output is written to drugagefile.
Thomas Girke
tryCatch({
    config = genConfig(chemblDbPath=
        system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"))
    processDrugage("drugage_id_mapping.xls", TRUE, config)
}, error=function(e){
    message("Failed to run processDrugage(), please try again later")
})
Meta-function to obtain in one step both drug-target annotation and bioassay data.
runDrugTarget_Annot_Bioassay(res_list, up_col_id="ID", ens_gene_id,
    cmpid_file=file.path(config$resultsPath,"cmp_ids.rds"),
    config=genConfig(), ...)
res_list |
Object obtained from |
up_col_id |
Column name in |
ens_gene_id |
Named character vector with ENSEMBL gene IDs in name slot and gene symbols or other ID type in value slot |
cmpid_file |
Path to CMP ID mapping file, often named |
config |
General configuration. See |
... |
Slot to pass on additional arguments. |
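The ens_gene_id argument described above is a named character vector with ENSEMBL gene IDs as names and gene symbols as values. The following sketch shows how such a vector can be built by hand. The ID-to-symbol pairing is an assumption made for illustration and should be checked against your annotation source; in practice the vector would typically come from the ens_gene_id slot returned by getSymEnsUp.

```r
# Assumed ID/symbol pairing, for illustration only.
ens_gene_id <- c(ENSG00000001626 = "CFTR", ENSG00000168748 = "CA7")

names(ens_gene_id)   # ENSEMBL gene IDs (name slot)
unname(ens_gene_id)  # gene symbols (value slot)

# The same structure built from a two-column mapping table
# (hypothetical column names).
map <- data.frame(ensembl = c("ENSG00000001626", "ENSG00000168748"),
                  symbol  = c("CFTR", "CA7"),
                  stringsAsFactors = FALSE)
ens_gene_id2 <- setNames(map$symbol, map$ensembl)
```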
List with two components, each containing a data.frame. The first one (Annotation) contains drug-target annotation data, and the second one (Bioassay) contains drug-target bioassay data.
Thomas Girke
References to be added...
See also: drugTargetAnnot and drugTargetBioactivity
config = genConfig(
    chemblDbPath = system.file("extdata", "chembl_sample.db", package="drugTargetInteractions"),
    resultsPath = system.file("extdata", "results", package="drugTargetInteractions"))

## (1) Translate gene symbols to ENSEMBL gene IDs
ensembl_gene_id <- c("ENSG00000001626", "ENSG00000168748")
idMap <- getSymEnsUp(EnsDb="EnsDb.Hsapiens.v86", ids=ensembl_gene_id,
    idtype="ENSEMBL_GENE_ID")
ens_gene_id <- idMap$ens_gene_id

## (2a) Retrieve UniProt IDs with both IDMs and SSNN paralogs
queryBy <- list(molType="gene", idType="ensembl_gene_id", ids=names(ens_gene_id))
# this function is slow and requires a network connection
res_list <- getParalogs(queryBy)

## (3) Obtain Drug-Target Annotation and Bioassay Data
drug_target_list <- runDrugTarget_Annot_Bioassay(res_list=res_list,
    up_col_id="ID_up_sp", ens_gene_id, config=config)
Integration with Therapeutic Target Database (TTD). This function downloads a data file from idrblab.org and returns it as a data frame.
transformTTD(ttdfile=file.path(config$downloadPath,"TTD_IDs.txt"), redownloadTTD=TRUE,config=genConfig())
ttdfile |
The name of the output file to write the downloaded file to. |
redownloadTTD |
If true, data file will be downloaded again. If false, we assume the file
already exists at |
config |
General configuration. See |
Returns a data frame with TTD data in it.
Thomas Girke
ttd = tryCatch(
    transformTTD(),
    error=function(e){
        message("Failed to download TTD file, please try again later")
    })