Title: | Cross-target analysis of small molecule bioactivity |
---|---|
Description: | bioassayR is a computational tool that enables simultaneous analysis of thousands of bioassay experiments performed over a diverse set of compounds and biological targets. Unique features include support for large-scale cross-target analyses of both public and custom bioassays, generation of high throughput screening fingerprints (HTSFPs), and an optional preloaded database that provides access to a substantial portion of publicly available bioactivity data. |
Authors: | Tyler Backman, Ronly Schlenk, Thomas Girke |
Maintainer: | Thomas Girke |
License: | Artistic-2.0 |
Version: | 1.45.0 |
Built: | 2025-01-20 06:13:23 UTC |
Source: | https://github.com/bioc/bioassayR |
Returns a data.frame of small molecule cids which show activity against a specified target. Each row name represents a cid which shows activity, and the total screens and the percent active are shown in their respective columns.
activeAgainst(database, target)
database |
A BioassayDB object to be queried. |
target |
A string or integer containing a target_id referring to a target of interest. |
A data.frame where the row names represent each compound showing activity against the specified target. The second column shows the number of distinct assays in which this cid was screened against the target, and the first column shows the percentage of these which exhibited activity.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get cids of compounds which show activity against target 166897622
myCids <- row.names(activeAgainst(sampleDB, "166897622"))
## disconnect from database
disconnectBioassayDB(sampleDB)
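The summary returned by activeAgainst can be illustrated outside the database. The sketch below is a minimal base-R approximation, assuming activity records are available as a long-format data.frame with cid, aid, target, and activity columns. The helper summarizeActives, the column names percent_active and total_screens, and the toy data are all hypothetical; the package computes its result with its own SQL queries.

```r
# Toy long-format activity records (hypothetical data; bioassayR keeps
# these in its SQLite database rather than in a data.frame).
records <- data.frame(
  cid      = c("2244", "2244", "2244", "3715", "3715", "237"),
  aid      = c("a1", "a2", "a3", "a1", "a2", "a1"),
  target   = c(rep("166897622", 5), "P1"),
  activity = c(1, 0, 1, 0, 0, 1),  # 1 = active, 0 = inactive
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the package implementation): for one target,
# report the percentage of distinct assays in which each compound was active
# and the number of distinct assays it was screened in, keeping only
# compounds that were active at least once.
summarizeActives <- function(records, target) {
  hits <- records[records$target == target & !is.na(records$activity), ]
  cids <- unique(hits$cid)
  total <- sapply(cids, function(cid)
    length(unique(hits$aid[hits$cid == cid])))
  active <- sapply(cids, function(cid)
    length(unique(hits$aid[hits$cid == cid & hits$activity == 1])))
  out <- data.frame(percent_active = 100 * active / total,
                    total_screens = total,
                    row.names = cids)
  out[out$percent_active > 0, , drop = FALSE]
}

summarizeActives(records, "166897622")
```

Compound 3715 is screened against the target but never active, so it is dropped, mirroring the documented behavior of listing only compounds that show activity.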
Returns a data.frame of the targets which a given small molecule (specified by cid) shows activity against. For each target, a single row shows the total number of distinct screens it participated in, and the fraction of those in which it exhibits activity.
activeTargets(database, cid)
database |
A BioassayDB object to be queried. |
cid |
A string or integer containing a cid referring to a small molecule. |
A data.frame where the row names represent each target the specified compound shows activity against, and the columns show the total screens and the fraction in which the compound was active.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get targets that compound 2244 shows activity against
myTargets <- row.names(activeTargets(sampleDB, "2244"))
## disconnect from database
disconnectBioassayDB(sampleDB)
Indexing a bioassayR database before performing queries will drastically improve query performance. However, it will also slow down loading large amounts of additional data. Therefore, we recommend loading the majority of your data, using this function to index, and then performing queries.
addBioassayIndex(database)
database |
A BioassayDB object to be indexed. |
Tyler Backman
## create test database
library(bioassayR)
filename <- tempfile()
mydb <- newBioassayDB(filename, indexed=FALSE)
## load any data at this point
## add database index
addBioassayIndex(mydb)
## perform queries here
## close and delete test database
disconnectBioassayDB(mydb)
unlink(filename)
This function adds a new data source (name/description and version) for tracking data within a bioassayR database. This can be used later to identify the source of any specific activity data within the database, or to limit analysis to data from specific source(s).
addDataSource(database, description, version)
database |
A BioassayDB object to which the data source will be added. |
description |
A string containing a name or description of the new data source. This exact value will be used as a key for querying and loading data from this source. |
version |
A string with the version and/or date of the data source. For a data source without versions, this can be used to record the date on which it was mirrored. |
Tyler Backman
## create a test database
library(bioassayR)
filename <- tempfile()
mydb <- newBioassayDB(filename, indexed=FALSE)
## add a new data source
addDataSource(mydb, description="bioassayR_sample", version="1.0")
## list data sources loaded
mydb
## close and delete database
disconnectBioassayDB(mydb)
unlink(filename)
BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object
Returns a vector of small molecule cids contained within a BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object. It can optionally return only cids labeled as active.
allCids(inputObject, activesOnly = FALSE)
inputObject |
A BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object. |
activesOnly |
logical. Should only active compounds be returned? Defaults to FALSE. |
A vector of distinct small molecule cids. No particular order is guaranteed.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get all compound cids
myCids <- allCids(sampleDB)
## get only active compound cids
activeCids <- allCids(sampleDB, activesOnly = TRUE)
## disconnect from database
disconnectBioassayDB(sampleDB)
BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object
Returns a vector of target ids contained within a BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object.
allTargets(inputObject)
inputObject |
A BioassayDB, bioassay, bioassaySet, or target matrix (dgCMatrix) object. |
A vector of distinct target ids. No particular order is guaranteed.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get all target ids
myTargets <- allTargets(sampleDB)
## disconnect from database
disconnectBioassayDB(sampleDB)
bioassaySet object
This takes a bioassaySet of multiple assays and returns a vector of the targets of each, with the assay identifiers themselves (aids) as names. If a single assay contains multiple targets, these will all be listed.
assaySetTargets(assays)
assays |
A bioassaySet object containing one or more assays. |
A character vector of the targets of each assay, with the assay identifiers themselves (aids) as names.
Tyler William H Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve three assays
assays <- getAssays(sampleDB, c("673509","103","105"))
assays
## get the targets for these assays
myTargets <- assaySetTargets(assays)
myTargets
## disconnect from sample database
disconnectBioassayDB(sampleDB)
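To illustrate how an assay with several targets contributes several entries, the sketch below derives a named character vector from a binary aids x targets matrix laid out like the targets slot of a bioassaySet. The helper targetsByAssay and the target labels T_A, T_B, T_C are hypothetical; this is not the package's implementation of assaySetTargets.

```r
# Toy aids x targets binary matrix laid out like the "targets" slot of a
# bioassaySet (hypothetical target labels T_A, T_B, T_C).
targetMatrix <- matrix(
  c(1, 0, 0,
    0, 1, 1,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("673509", "103", "105"), c("T_A", "T_B", "T_C"))
)

# Hypothetical helper: one element per (assay, target) pair, named by aid,
# so an assay with several targets contributes several elements.
targetsByAssay <- function(targetMatrix) {
  idx <- which(targetMatrix == 1, arr.ind = TRUE)  # row/col positions of 1s
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  setNames(colnames(targetMatrix)[idx[, "col"]],
           rownames(targetMatrix)[idx[, "row"]])
}

targetsByAssay(targetMatrix)
```

Assay "103" appears twice in the names of the result because it has two targets.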
FPset object that contains bioactivity results for a given set of compounds and targets.
Returns a custom binary descriptor fingerprint for a given set of query cids and targets, based on the activity data within a bioassaySet object.
bioactivityFingerprint(bioassaySet, targets = FALSE, summarizeReplicates = "activesFirst")
bioassaySet |
A bioassaySet object containing the activity data to be summarized. |
targets |
An optional list of target id(s) to consider when creating the binary fingerprint. If a listed target is not in the bioassaySet or has no active scores, it will still be accepted, but its position in the fingerprint will be zero for every compound. The binary order of this list is preserved, so that direct comparison and combination of resulting |
summarizeReplicates |
Optionally allows users to choose how replicates (multiple assays sharing common compounds and targets) are resolved if they disagree. If 'activesFirst', any active score takes precedence over an inactive one. If 'mode', the resulting score will be computed according to the statistical mode using |
The returned object is a standard ChemmineR FPset object, and can be used as described in the ChemmineR documentation. The order and number of binary bits for each compound can be set using the targets option, enabling the combination or comparison of multiple objects created with the same target list.
If a single compound has both active and inactive scores for the same target, it will be resolved according to the summarizeReplicates option.
Tyler William H Backman
Functions: getBioassaySetByCids, getAssays, perTargetMatrix
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve all targets in database
targetList <- allTargets(sampleDB)
## get an activity fingerprint object for selected CIDs
queryCids <- c("2244", "3715", "2662", "3033", "133021", "44563999", "44564000", "44564001", "44564002")
myAssaySet <- getBioassaySetByCids(sampleDB, queryCids)
myFp <- bioactivityFingerprint(bioassaySet=myAssaySet)
## disconnect from sample database
disconnectBioassayDB(sampleDB)
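The two summarizeReplicates strategies can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: the helpers resolveReplicates and fingerprintMatrix are hypothetical, inconclusive (NA) calls are ignored, and ties under "mode" are resolved toward active, which may differ from what bioactivityFingerprint does. Returning a plain integer matrix instead of a ChemmineR FPset keeps the example self-contained.

```r
# Hypothetical helper: collapse replicate calls (1 = active, 0 = inactive,
# NA = inconclusive) for one compound/target pair into a single bit.
resolveReplicates <- function(calls, method = c("activesFirst", "mode")) {
  method <- match.arg(method)
  calls <- calls[!is.na(calls)]          # ignore inconclusive calls
  if (length(calls) == 0) return(0L)     # no usable data: bit stays 0
  if (method == "activesFirst") {
    return(as.integer(any(calls == 1)))  # any active call wins
  }
  # "mode": the more frequent call; ties go to active (an assumption)
  as.integer(sum(calls == 1) >= sum(calls == 0))
}

# Hypothetical helper: compounds x targets bit matrix whose column order is
# fixed by `targets`, so matrices built with the same list line up.
fingerprintMatrix <- function(records, targets, method = "activesFirst") {
  cids <- unique(records$cid)
  fp <- matrix(0L, nrow = length(cids), ncol = length(targets),
               dimnames = list(cids, targets))
  for (cid in cids) {
    for (tg in targets) {
      calls <- records$activity[records$cid == cid & records$target == tg]
      fp[cid, tg] <- resolveReplicates(calls, method)
    }
  }
  fp
}

records <- data.frame(
  cid      = c("2244", "2244", "2244", "3715"),
  target   = c("T1", "T1", "T1", "T2"),
  activity = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)
fingerprintMatrix(records, c("T1", "T2", "T3"), method = "activesFirst")
fingerprintMatrix(records, c("T1", "T2", "T3"), method = "mode")
```

With one active and two inactive replicates, compound 2244 gets a 1 for T1 under "activesFirst" but a 0 under "mode"; target T3, which has no data, still occupies a column of zeros.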
Class "bioassay"
This class represents the data from a bioassay experiment, where a number of small molecules are screened against a defined target (such as a protein or living organism).
Objects can be created by calls of the form new("bioassay", ...).
aid: Object of class "character" containing the assay id. For assays sourced from NCBI PubChem, this should be a string containing the PubChem AID (assay identifier).
source_id: Object of class "character". This should match the description of a data source loaded via the addDataSource() function.
assay_type: Object of class "character". A string noting the type of bioactivity experiment, such as “confirmatory” to represent a confirmatory assay.
organism: Object of class "character". A string noting the scientific name of the assay's target organism.
scoring: Object of class "character". A string noting the scoring method used for the bioactivity experiment, for example IC50 or EC50.
targets: Object of class "character". A string or vector of strings containing the identifier(s) of the assay target(s). For protein-targeted assays sourced from NCBI PubChem, this should be a GenBank ID.
target_types: Object of class "character". A string or vector of strings giving, in the same order, the target type of each target, for example “protein” or “cell.”
scores: Object of class "data.frame" containing the bioactivity data to be loaded. This must be a 3-column data frame, with each row representing the bioactivity results of a single molecule. The first column is the compound id (cid), which must be a unique value for each structurally distinct molecule. The second column is a binary value representing activity (1=active, 0=inactive, NA=inconclusive or untested) for the given assay. The last column is a score, scored by the method specified with the addBioassay() function. Missing or non-applicable values in any column should be represented by NA.
Tyler Backman
Related classes: bioassaySet, BioassayDB.
library(bioassayR)
showClass("bioassay")
## create a new bioassay object from sample data
data(samplebioassay)
myassay <- new("bioassay", aid="1000", source_id="test", targets="116516899", target_types="protein", scores=samplebioassay)
myassay
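Before constructing a bioassay object, it can help to check that the scores data.frame matches the three-column layout described for the scores slot. The isValidScores helper below is hypothetical (not part of bioassayR), and the column names cid, activity, and score are illustrative assumptions; the slot description fixes only the column order and contents.

```r
# Hypothetical helper (not part of bioassayR): check a data.frame against the
# three-column layout described for the "scores" slot.
isValidScores <- function(scores) {
  is.data.frame(scores) &&
    ncol(scores) == 3 &&
    all(scores[[2]] %in% c(0, 1, NA)) &&                  # 1, 0, or NA only
    (is.numeric(scores[[3]]) || all(is.na(scores[[3]])))  # numeric score or NA
}

# Column names are illustrative; only the column order is specified.
scores <- data.frame(
  cid      = c(2244, 3715, 133021),
  activity = c(1, 0, NA),      # 1 = active, 0 = inactive, NA = inconclusive
  score    = c(12.5, NA, 3.1)  # NA where no score applies
)
isValidScores(scores)
```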
Class "BioassayDB"
This class holds a connection to a bioassayR SQLite database.
Objects can be created by calls of the form BioassayDB("databasePath").
database: Object of class "SQLiteConnection".
Tyler Backman
Related classes: bioassaySet, bioassay.
showClass("BioassayDB")
Class "bioassaySet"
This class stores a large number of bioactivity scores from multiple assays and experiments as a single sparse matrix.
Objects can be created with several functions, including getAssays and getBioassaySetByCids.
activity: Object of class "dgCMatrix". A sparse matrix of assays (rows) vs compounds (columns), where 0 represents untested, NA represents inconclusive, 1 represents inactive, and 2 represents active.
scores: Object of class "dgCMatrix". Numeric activity scores with the same dimensions as activity.
targets: Object of class "dgCMatrix". A binary matrix of the targets (columns) for each aid (rows) listed in the activity and scores matrices. A 1 indicates that the target was used in the given assay, and a 0 indicates that it was not.
sources: Object of class "data.frame". Data sources for each assay. There must be three columns titled 'source_id', 'description', and 'version'. Each row represents a data source for these data. The 'source_id' must be a numeric (integer) index that matches those in the source_id slot.
source_id: Object of class "integer". The source_id for each assay as an integer. The length should equal the number of rows in the activity matrix, with element names for each assay id (aid).
assay_type: Object of class "character". The experiment type for each assay. The length should equal the number of rows in the activity matrix, with element names for each assay id (aid).
organism: Object of class "character". The scientific name of each target species. The length should equal the number of rows in the activity matrix, with element names for each assay id (aid).
scoring: Object of class "character". The scoring method used in the scores matrix. The length should equal the number of rows in the activity matrix, with element names for each assay id (aid).
target_types: Object of class "character". The type of target for each target id, where the names and order match the columns in the targets matrix. The length should therefore equal the number of columns in the targets matrix.
Tyler William H Backman
Related classes: bioassay, BioassayDB.
showClass("bioassaySet")
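The four-state encoding of the activity slot is easy to misread, since 0 means untested rather than inactive. The sketch below applies it to a small dense matrix with hypothetical assay ids and counts, for each compound, the assays with a definite result and the assays in which it was active. A dense base-R matrix is used only to avoid a dependency; the real slot is a sparse dgCMatrix.

```r
# Toy assays x compounds matrix using the bioassaySet "activity" encoding:
# 0 = untested, NA = inconclusive, 1 = inactive, 2 = active.
# A dense base-R matrix keeps the sketch dependency-free; the real slot is a
# sparse dgCMatrix, where the untested zeros are simply not stored.
activity <- matrix(
  c(2, 1,  0,
    2, NA, 0,
    0, 2,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("aid1", "aid2", "aid3"), c("2244", "3715", "133021"))
)

tested <- colSums(activity >= 1, na.rm = TRUE)  # inactive or active calls
active <- colSums(activity == 2, na.rm = TRUE)  # active calls only
active / tested                                 # per-compound hit fraction
```

Because untested is 0 and inactive is 1, treating the matrix as a plain 0/1 activity indicator would silently count inactive results as missing.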
BioassayDB object connected to the specified database file
This function returns a BioassayDB object for working with a pre-existing bioassayR database, already located on the user's filesystem.
Users can download pre-built databases for use with this feature from http://chemmine.ucr.edu/bioassayr
connectBioassayDB(databasePath, writeable = FALSE)
databasePath |
Full path to the database file to be opened. |
writeable |
logical. Should the database allow data to be modified and written to? |
BioassayDB |
for details see ?"BioassayDB-class" |
Tyler Backman
## create a test database
library(bioassayR)
filename <- tempfile()
mydb <- newBioassayDB(filename, indexed=FALSE)
disconnectBioassayDB(mydb)
## connect to test database
mydb <- connectBioassayDB(filename)
## close and delete database
disconnectBioassayDB(mydb)
unlink(filename)
Queries a compound vs target sparse matrix as generated by the perTargetMatrix function, and computes, for each compound, the probability that theta exceeds a given threshold, where theta is the probability that the compound would be active in any given new assay against a novel untested target.
This code implements the Bayesian Modeling of Cross-Reactive Compounds method described by Dancik, V. et al. (see references). This method assumes that the number of observed active targets out of total tested targets follows a binomial distribution. A beta conjugate prior distribution is calculated based on the hit ratios (active/total tested) for a reference database.
crossReactivityProbability(inputMatrix, threshold=0.25, prior=list(hit_ratio_mean=0.0126, hit_ratio_sd=0.0375))
crossReactivityPrior(database, minTargets=20, category=FALSE, activesOnly=FALSE)
inputMatrix |
A compound vs target sparse matrix, as generated by the perTargetMatrix function. |
threshold |
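A numeric hit-ratio cutoff between 0 and 1; the probability that a compound's hit ratio exceeds this value is returned. |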
prior |
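A list with elements hit_ratio_mean and hit_ratio_sd, giving the mean and standard deviation of hit ratios in a reference database, in the format returned by crossReactivityPrior. |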
database |
A BioassayDB object used as the reference database for computing the prior. |
minTargets |
The minimum number of distinct screened targets for a compound to be included in the prior probability distribution. |
category |
Include only once in prior hit ratio counts any targets which share a common annotation of this category (as used by the |
activesOnly |
logical. Should only compounds with at least one active score be used in computing prior? Defaults to FALSE. |
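This function models the hit ratio theta (the fraction of distinct targets which are active) for a given compound with a standard beta-binomial Bayesian model. The number of actives n observed for a compound tested against N targets is assumed to follow a binomial distribution:

$$p(n \mid \theta) = \binom{N}{n} \theta^{n} (1 - \theta)^{N - n}$$

with a beta conjugate prior distribution

$$p(\theta) = \mathrm{Beta}(a, b),$$

where the parameters a and b (alpha and beta) are calculated from the prior mean $\mu$ and standard deviation $\sigma$ of hit ratios for a large number of highly screened compounds as follows:

$$a = \mu \left( \frac{\mu (1 - \mu)}{\sigma^{2}} - 1 \right)$$

and

$$b = (1 - \mu) \left( \frac{\mu (1 - \mu)}{\sigma^{2}} - 1 \right).$$

This function then computes and returns the posterior probability

$$P(\theta > \mathrm{threshold} \mid n) = 1 - I_{\mathrm{threshold}}(a + n,\; b + N - n),$$

where $I_x$ is the cumulative distribution function of the posterior Beta(a + n, b + N - n), evaluated using the beta distribution function pbeta.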
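The calculation above can be reproduced directly with pbeta. The following is a minimal sketch assuming that, for each compound, n (active targets) and N (tested targets) have already been counted, for example from a perTargetMatrix; the helper names betaPriorParams and posteriorExceeds are hypothetical, and the default prior mirrors the documented defaults of crossReactivityProbability.

```r
# Method-of-moments Beta(a, b) parameters from a prior hit-ratio mean and sd.
betaPriorParams <- function(mu, sigma) {
  k <- mu * (1 - mu) / sigma^2 - 1   # must be positive for a valid prior
  stopifnot(k > 0)
  c(a = mu * k, b = (1 - mu) * k)
}

# P(theta > threshold | n actives out of N tested targets), using the
# conjugate posterior Beta(a + n, b + N - n). Vectorized over n and N.
posteriorExceeds <- function(n, N, threshold = 0.25,
                             prior = list(hit_ratio_mean = 0.0126,
                                          hit_ratio_sd = 0.0375)) {
  ab <- betaPriorParams(prior$hit_ratio_mean, prior$hit_ratio_sd)
  pbeta(threshold, ab[["a"]] + n, ab[["b"]] + N - n, lower.tail = FALSE)
}

# Three hypothetical compounds, each screened against 20 targets.
posteriorExceeds(n = c(0, 3, 10), N = c(20, 20, 20))
```

Using lower.tail = FALSE returns the upper tail directly, which avoids the loss of precision of computing 1 - pbeta(...) when the probability is very close to 1.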
crossReactivityProbability returns a numeric vector containing, for each compound in the inputMatrix, the probability that the hit ratio (active targets / total targets) is greater than the value threshold.
crossReactivityPrior returns a list in the prior format described above.
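A rough analogue of crossReactivityPrior, assuming per-compound counts of active and tested targets are already available, is sketched below. The helper hitRatioPrior is hypothetical; it ignores the category argument and uses R's sample standard deviation (sd), either of which may differ from the package's implementation.

```r
# Hypothetical sketch: summarize hit ratios (active / tested targets) of
# highly screened reference compounds into the prior list format.
hitRatioPrior <- function(activeTargets, testedTargets, minTargets = 20,
                          activesOnly = FALSE) {
  keep <- testedTargets >= minTargets          # only well-screened compounds
  if (activesOnly) keep <- keep & activeTargets > 0
  ratios <- activeTargets[keep] / testedTargets[keep]
  list(hit_ratio_mean = mean(ratios), hit_ratio_sd = sd(ratios))
}

# Four hypothetical compounds; the last is dropped by minTargets = 20.
hitRatioPrior(activeTargets = c(0, 1, 4, 2), testedTargets = c(25, 40, 20, 10))
```

The returned list has the same element names as the prior argument of crossReactivityProbability, so it could be passed there directly.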
Tyler Backman
Dancik, V. et al. Connecting Small Molecules with Similar Assay Performance Profiles Leads to New Biological Hypotheses. J Biomol Screen 19, 771-781 (2014).
pbeta for the beta distribution function; perTargetMatrix; targetSelectivity
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve activity data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))
## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)
## compute P(theta > 0.25)
crossReactivityProbability(targetMatrix)
## disconnect from sample database
disconnectBioassayDB(sampleDB)
BioassayDB object
This function disconnects the underlying SQLite database from a BioassayDB object. This is a critical step for writeable databases, but can be omitted for read-only databases.
disconnectBioassayDB(database)
database |
A BioassayDB object to be disconnected. |
Tyler Backman
## create a test database
library(bioassayR)
filename <- tempfile()
mydb <- newBioassayDB(filename, indexed=FALSE)
## disconnect from database
disconnectBioassayDB(mydb)
## delete database file
unlink(filename)
Allows the user to delete all records from the database associated with a given assay identifier.
dropBioassay(database, aid)
database |
A BioassayDB object from which the assay will be deleted. |
aid |
The assay identifier string (aid), matching an aid for an assay loaded into the database. |
Tyler Backman
library(bioassayR)
## create sample database and load with data
myDatabaseFilename <- tempfile()
mydb <- newBioassayDB(myDatabaseFilename, indexed=FALSE)
extdata_dir <- system.file("extdata", package="bioassayR")
assayDescriptionFile <- file.path(extdata_dir, "exampleAssay.xml")
activityScoresFile <- file.path(extdata_dir, "exampleScores.csv")
myAssay <- parsePubChemBioassay("1000", activityScoresFile, assayDescriptionFile)
addDataSource(mydb, description="PubChem BioAssay", version="unknown")
loadBioassay(mydb, myAssay)
## delete the loaded assay
dropBioassay(mydb, "1000")
## disconnect from and delete sample database
disconnectBioassayDB(mydb)
unlink(myDatabaseFilename)
Indexing a bioassayR database before performing queries will drastically improve query performance. However, it will also slow down loading large amounts of additional data. Therefore, it may be necessary to use this function to remove the index from a database before adding large quantities of data. Afterwards, the index can be re-generated using the addBioassayIndex function.
dropBioassayIndex(database)
database |
A BioassayDB object whose index will be removed. |
Tyler Backman
## create test database
library(bioassayR)
filename <- tempfile()
mydb <- newBioassayDB(filename, indexed=TRUE)
## remove database index
dropBioassayIndex(mydb)
## load new data into database here
## reactivate index
addBioassayIndex(mydb)
## close and delete test database
disconnectBioassayDB(mydb)
unlink(filename)
Retrieves a bioassay as a bioassay object from a bioassayR database by identifier.
getAssay(database, aid)
database |
A BioassayDB object to be queried. |
aid |
The assay identifier string (aid), matching an aid for an assay loaded into the database. |
A bioassay object containing the requested assay.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve an assay
assay <- getAssay(sampleDB, "673509")
assay
## disconnect from sample database
disconnectBioassayDB(sampleDB)
Retrieves a list of aids as a single bioassaySet matrix object.
getAssays(database, aids)
database |
A BioassayDB object to be queried. |
aids |
One or more assay identifier strings (aid), matching aid(s) for assays loaded into the database. |
A bioassaySet object containing data from the specified assays.
Tyler William H Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve three assays
assays <- getAssays(sampleDB, c("673509","103","105"))
assays
## disconnect from sample database
disconnectBioassayDB(sampleDB)
bioassaySet sparse matrix object with activity data only for specified compounds
Takes a list of compounds, and creates a bioassaySet sparse matrix object with the activity data for these compounds only, not including activity data from other compounds in the same assays.
getBioassaySetByCids(database, cids)
database |
A BioassayDB object to be queried. |
cids |
One or more compound IDs (cids) of interest. |
A bioassaySet object containing data from the specified cids.
Tyler William H Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve activity data on 3 compounds
activitySet <- getBioassaySetByCids(sampleDB, c("2244","3715","237"))
activitySet
## disconnect from sample database
disconnectBioassayDB(sampleDB)
Returns a data.frame of all targets a single cid (compound) has been found inactive against, and the number of times it has been found inactive in distinct assay experiments. If a compound has been found both active and inactive against the same target in different assays, that target will still be listed among these results.
inactiveTargets(database, cid)
database |
A BioassayDB object to be queried. |
cid |
A string or integer containing a cid referring to a small molecule. |
A data.frame where the row names represent each target the specified compound shows inactivity against, and the column shows the number of assays in which it was found to be inactive.
Tyler Backman
library(bioassayR)
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get targets that compound 2244 shows inactivity against
myCidInactiveTargets <- row.names(inactiveTargets(sampleDB, "2244"))
## disconnect from database
disconnectBioassayDB(sampleDB)
Loads the results of a bioassay experiment (stored as a bioassay object) into the specified database. The data source specified in the bioassay object must be added to the database with addDataSource before loading. If the assay identifier (aid) is already present in the database, an error is returned and no additional data is loaded.
loadBioassay(database, bioassay)
database |
A |
bioassay |
A |
Tyler Backman
## create sample database
myDatabaseFilename <- tempfile()
mydb <- newBioassayDB(myDatabaseFilename, indexed=FALSE)
## parse example assay data
extdata_dir <- system.file("extdata", package="bioassayR")
assayDescriptionFile <- file.path(extdata_dir, "exampleAssay.xml")
activityScoresFile <- file.path(extdata_dir, "exampleScores.csv")
myAssay <- parsePubChemBioassay("1000", activityScoresFile, assayDescriptionFile)
## load bioassay into database
addDataSource(mydb, description="PubChem BioAssay", version="unknown")
loadBioassay(mydb, myAssay)
## disconnect from and delete sample database
disconnectBioassayDB(mydb)
unlink(myDatabaseFilename)
Loads an identifier mapping for a bioassay target (stored in the database as an NCBI GI number) to another protein target naming system. Common uses include UniProt identifiers, similarity clusters, and common names.
loadIdMapping(database, target, category, identifier)
database |
A writable |
target |
A single protein target NCBI GI number. |
category |
The specified identifier type of the data being loaded, such as 'UniProt'. |
identifier |
A |
Tyler Backman
http://www.ncbi.nlm.nih.gov/protein NCBI Protein Database
http://www.uniprot.org UniProt Protein Database
## create sample database
myDatabaseFilename <- tempfile()
mydb <- newBioassayDB(myDatabaseFilename, indexed=FALSE)
## load a sample translation from GI 6686268 to UniProt P11712
loadIdMapping(mydb, "6686268", "UniProt", "P11712")
## get UniProt identifier(s) for GI Number 6686268
UniProtIds <- translateTargetId(mydb, "6686268", "UniProt")
UniProtIds
## disconnect from and delete sample database
disconnectBioassayDB(mydb)
unlink(myDatabaseFilename)
This function creates a new bioassayR database at the specified filesystem location, and returns a BioassayDB object connected to the new database.
newBioassayDB(databasePath, writeable = TRUE, indexed = FALSE)
databasePath |
Full path to the database file to be created. |
writeable |
logical. Should the database allow data to be modified and written to? |
indexed |
logical. Should a performance-enhancing index be created? The default is FALSE because an index is typically added only after the initial data have been loaded; loading data into an already indexed database is much slower. |
Tyler Backman
library(bioassayR)
## get a temporary filename
filename <- tempfile()
## create a new bioassayR database
mydb <- newBioassayDB(filename, indexed=FALSE)
## close and delete database
disconnectBioassayDB(mydb)
unlink(filename)
Parses a PubChem Bioassay experimental result from two required files (a CSV file and an XML description) into a bioassay object.
parsePubChemBioassay(aid, csvFile, xmlFile, duplicates = "drop", missingCid = "drop", scoreRegex = "inhibition|ic50|ki|gi50|ec50|ed50|lc50")
aid |
The assay identifier (aid) for the assay to be parsed. |
csvFile |
A CSV file for a given assay, as downloaded from PubChem Bioassay. |
xmlFile |
An XML description file for a given assay, as downloaded from PubChem Bioassay. |
duplicates |
Specifies how duplicate CIDs in the same assay are treated. If 'drop' is specified, only the first occurrence of each duplicated cid is kept and a warning is issued. If FALSE, processing stops with an error if duplicates are present. If TRUE, duplicates are included without warning, which may cause erroneous results with other bioassayR functions that assume a unique cid list for each assay. |
missingCid |
Either 'drop' or the logical value FALSE. If FALSE, processing stops with an error for any input compound with an empty cid string. If 'drop' is specified, a warning is issued and these compounds are skipped. |
scoreRegex |
A regular expression (Perl-compatible, case-insensitive) matched against the column names in the CSV header to identify relevant score columns. If any columns match this regex, the first matching column is used in place of 'PUBCHEM_ACTIVITY_SCORE', and its column name is stored as the assay's scoring method. The default identifies most PubChem Bioassays that contain protein target inhibition data. If a matching column contains only empty or non-numeric values, the next matching column is used automatically. |
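The column selection performed by scoreRegex and the duplicates = "drop" behaviour can be sketched in base R as follows. This is an illustrative approximation on a hypothetical, already-read CSV, not the package's parser; pickScoreColumn and dropDuplicateCids are made-up helper names, and falling back to PUBCHEM_ACTIVITY_SCORE when no matching column has numeric values is an assumption.

```r
# Hypothetical CSV content, already read into a data frame
csv <- data.frame(
  PUBCHEM_CID            = c("2244", "3715", "2244", "237"),
  PUBCHEM_ACTIVITY_SCORE = c(10, 80, 12, 5),
  IC50_Comment           = c("", "", "", ""),           # matches, but no numbers
  Inhibition_Percent     = c("12.5", "91.0", "13.0", "4.2"),
  stringsAsFactors = FALSE
)
scoreRegex <- "inhibition|ic50|ki|gi50|ec50|ed50|lc50"

# Return the first regex-matching column that holds any numeric values
pickScoreColumn <- function(df, regex) {
  candidates <- grep(regex, colnames(df), ignore.case = TRUE,
                     perl = TRUE, value = TRUE)
  for (col in candidates) {
    values <- suppressWarnings(as.numeric(df[[col]]))
    if (any(!is.na(values))) return(col)
  }
  "PUBCHEM_ACTIVITY_SCORE"  # assumed fallback to the default score
}

# duplicates = "drop": keep the first occurrence of each cid, with a warning
dropDuplicateCids <- function(df) {
  dup <- duplicated(df$PUBCHEM_CID)
  if (any(dup)) warning("dropping ", sum(dup), " duplicated cid(s)")
  df[!dup, , drop = FALSE]
}

scoreColumn <- pickScoreColumn(csv, scoreRegex)
cleaned <- dropDuplicateCids(csv)
```

IC50_Comment matches the regex first but holds no numeric values, so the sketch moves on to Inhibition_Percent.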
A bioassay object containing the loaded data.
Tyler Backman
http://pubchem.ncbi.nlm.nih.gov NCBI PubChem
## get sample data locations
extdata_dir <- system.file("extdata", package="bioassayR")
assayDescriptionFile <- file.path(extdata_dir, "exampleAssay.xml")
activityScoresFile <- file.path(extdata_dir, "exampleScores.csv")
## parse files
myAssay <- parsePubChemBioassay("1000", activityScoresFile, assayDescriptionFile)
myAssay
bioassaySet object from multiple assays by combining assays with a common target
Creates a sparseMatrix object that has an activity value for each distinct target identifier rather than for each distinct assay. Users can optionally choose how replicates are resolved. By default, active scores always take precedence over inactives: if any assay for a given target vs. compound combination shows activity, this combination is marked active in the resulting object.
Either binary activity categories or scalar numeric scores can be used. When used with numeric data, this will create a Z-score compound vs. target matrix similar to High Throughput Screening Fingerprints (HTSFPs).
This function is not designed for single assays with multiple targets, and if they are present only one of the targets will be considered.
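With the coding used for the result (2 = active, 1 = inactive, 0 = untested), the default "activesFirst" rule amounts to taking the maximum over all replicate assays of a target for each compound. The sketch below shows this on a plain dense matrix with assays as rows and compounds as columns; the matrix, the assayTargets vector and collapseActivesFirst are hypothetical, and the real function works on a bioassaySet and returns a sparse matrix.

```r
# Toy per-assay activity matrix: rows = assays, columns = compounds
# coding: 2 = active, 1 = inactive, 0 = untested/inconclusive
perAssay <- matrix(c(2, 1, 0,
                     1, 1, 0,
                     0, 1, 2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("aid1", "aid2", "aid3"),
                                   c("2244", "3715", "133021")))

# Hypothetical mapping of each assay to its single target
assayTargets <- c(aid1 = "T1", aid2 = "T1", aid3 = "T2")

# "activesFirst": an active in any replicate wins, which with this coding
# is the column-wise maximum over each target's assays
collapseActivesFirst <- function(m, targets) {
  targets <- targets[rownames(m)]
  out <- sapply(split(rownames(m), targets), function(aids) {
    apply(m[aids, , drop = FALSE], 2, max)
  })
  t(out)  # rows = targets, columns = compounds
}

perTarget <- collapseActivesFirst(perAssay, assayTargets)
perTarget
```

Compound 2244 is active in aid1 and inactive in aid2, both against T1, so it ends up marked active (2) for T1.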
perTargetMatrix(assays, inactives = TRUE, assayTargets = FALSE, targetOrder = FALSE, summarizeReplicates = "activesFirst", useNumericScores = FALSE)
assays |
A |
inactives |
A logical value. Include both active and inactive scores. If FALSE only active scores are returned. This is only used if |
assayTargets |
Provide a custom merge table of target identifiers for each assay. For example, if you have clustered the targets of many assays into bins, you can merge by common clusters here instead of by distinct targets. This must be a vector of class |
targetOrder |
An optional |
summarizeReplicates |
Optionally allows users to choose how replicates (multiple assays sharing common compounds and targets) are resolved if they disagree. If 'activesFirst', any active score takes precedence over an inactive. If 'mode', the resulting score is computed according to the statistical mode using |
useNumericScores |
A logical value. Use numeric scores rather than binary data to create a scalar compound vs. target matrix. When used with the output of |
When used with useNumericScores = FALSE, a sparseMatrix is returned which contains a value of 2 for each target vs. compound combination that shows activity in at least one parent assay, a value of 1 for inactive combinations, and a value of 0 for untested or ambiguous combinations.
Note that this differs from older versions of bioassayR (1.6 and older), which returned a value of 1 for actives and had no option to process inactives.
When used with useNumericScores = TRUE, the raw numeric scores are returned, with replicates summarized as specified with the summarizeReplicates option.
Tyler William H Backman
Functions: scaleBioassaySet, getAssays, bioactivityFingerprint
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## option 1: retrieve all data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))
assays
## option 2: retrieve all data for three assays
assays <- getAssays(sampleDB, c("673509","103","105"))
assays
## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)
targetMatrix
## disconnect from sample database
disconnectBioassayDB(sampleDB)
Provides full query flexibility by allowing the user to run any SQLite query on a bioassayR database. This enables analyses beyond those provided by the built-in query functions.
queryBioassayDB(object, query)
object |
A |
query |
A string containing a valid SQLite query (see SQLite documentation for more details). |
A data.frame containing the results of the specified query.
Tyler Backman
http://www.sqlite.org provides a complete reference for SQLite syntax that can be used with this function
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## inspect the structure of the database before forming a query
queryBioassayDB(sampleDB, "SELECT * FROM sqlite_master WHERE type='table'")
## find all activity data for compound cid 2244
queryBioassayDB(sampleDB, "SELECT * FROM activity WHERE cid = '2244'")
## disconnect from database
disconnectBioassayDB(sampleDB)
This is sample bioactivity data, taken from assay identifier (aid) 1000 in the NCBI PubChem Bioassay database. These data are provided for testing the bioassayR library.
data(samplebioassay)
A data frame with activity scores for 4 distinct compounds.
cid
unique compound identifier
activity
1=active, 0=inactive, NA=other
score
activity scores
http://pubchem.ncbi.nlm.nih.gov NCBI PubChem
## create a new bioassay object from these sample data
data(samplebioassay)
myassay <- new("bioassay", aid="1000", source_id="PubChem BioAssay",
    targets="116516899", target_types="protein", scores=samplebioassay)
myassay
bioassaySet object (creates Z-scores)
Converts the numeric activity scores of a bioassaySet object into per-assay Z-scores. Untested (0) values are not considered in computing the value; only actives and inactives are used. In essence, this is a special version of the R base scale function that ignores missing entries in a sparse matrix instead of treating them as zeros. A primary purpose of this function is to pass scaled results to perTargetMatrix in order to compute a numeric Z-score compound vs. target matrix.
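The idea of scaling each assay while skipping untested entries can be sketched on a dense toy matrix in which 0 marks an untested compound. The matrix layout (assays as rows) and scaleTested are hypothetical; the sketch follows the center/scale semantics documented for base::scale (standard deviation when centering, root mean square otherwise) but is not the package's implementation.

```r
# Toy score matrix: rows = assays, columns = compounds; 0 = untested
scoresMat <- matrix(c(10, 20, 30,  0,
                       5,  0, 15, 25),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("aid1", "aid2"),
                                    c("c1", "c2", "c3", "c4")))

# Z-score each assay using only its tested (non-zero, non-NA) entries
scaleTested <- function(m, center = TRUE, scale = TRUE) {
  for (i in seq_len(nrow(m))) {
    tested <- m[i, ] != 0 & !is.na(m[i, ])
    x <- m[i, tested]
    if (center) x <- x - mean(x)
    if (scale) {
      # standard deviation if centered, root mean square otherwise
      s <- if (center) sd(x) else sqrt(sum(x^2) / max(1, length(x) - 1))
      x <- x / s
    }
    m[i, tested] <- x
  }
  m
}

z <- scaleTested(scoresMat)
z
```

Treating the untested zeros as real scores would shift each assay's mean and standard deviation, which is what this approach avoids.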
scaleBioassaySet(bioassaySet, center=TRUE, scale=TRUE)
bioassaySet |
A |
center |
A logical value. If center is TRUE then centering is done by subtracting the assay means (omitting inconclusive NAs) from their corresponding scores, and if center is FALSE, no centering is done. |
scale |
A logical value. Scaling is done by dividing the (centered) per-assay scores by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done. |
A bioassaySet object with standardized numeric scores, which can be accessed with scores(bioassaySet).
Tyler William H Backman
Functions: getAssays, perTargetMatrix
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve three assays
assays <- getAssays(sampleDB, c("347221","53211","624349"))
## disconnect from sample database
disconnectBioassayDB(sampleDB)
## compute and return standardized scores
scaledAssays <- scaleBioassaySet(assays)
## inspect scaled and unscaled scores
scores(assays)
scores(scaledAssays)
## NOTE: this example only returns non-NA Z-scores if tried with
## real data, not the test database used here
Returns all compound cids screened against at least 'minTargets' distinct target identifiers. For a very large database (such as PubChem Bioassay) this function may take a long time to run.
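The filter can be sketched on a hypothetical in-memory table by counting distinct targets per cid. Treating inconclusive results as NA activity values (as in the samplebioassay data) is an assumption here, as is the helper name screenedAtLeastSketch; the package runs the equivalent query against its database.

```r
# Hypothetical toy records: 1 = active, 0 = inactive, NA = inconclusive
records <- data.frame(
  cid      = c("2244", "2244", "2244", "3715", "3715", "237"),
  target   = c("T1",   "T2",   "T2",   "T1",   "T3",   "T1"),
  activity = c(1,      0,      0,      NA,     1,      0),
  stringsAsFactors = FALSE
)

screenedAtLeastSketch <- function(records, minTargets, inconclusives = TRUE) {
  # optionally drop inconclusive results before counting
  if (!inconclusives) records <- records[!is.na(records$activity), ]
  # number of distinct targets each cid was screened against
  nTargets <- tapply(records$target, records$cid, function(x) length(unique(x)))
  names(nTargets)[nTargets >= minTargets]
}

screenedAtLeastSketch(records, 2)
screenedAtLeastSketch(records, 2, inconclusives = FALSE)
```

Compound 3715 qualifies only while its inconclusive result against T1 is counted.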
screenedAtLeast(database, minTargets, inconclusives=TRUE)
database |
A |
minTargets |
The minimum number of distinct targets for each returned cid. |
inconclusives |
Logical. If |
Returns a character vector of all CIDs meeting the specified criteria.
Tyler Backman
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## get all CIDs screened against at least 2 distinct targets
highlyScreened <- screenedAtLeast(sampleDB, 2)
highlyScreened
## get all CIDs screened against at least 2 distinct targets with conclusive results
highlyScreened <- screenedAtLeast(sampleDB, 2, inconclusives=FALSE)
highlyScreened
## disconnect from database
disconnectBioassayDB(sampleDB)
Allows the user to find compounds in the database that have been screened against a large number of distinct targets, but show high binding selectivity for a specific target of interest.
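A rough base R sketch of this kind of selection on a hypothetical activity table is shown below. It keeps compounds active against the target of interest that were screened against at least minimumTargets distinct targets, and returns the two columns described under the return value. Ranking "most selective" as "fewest active targets" is an assumption made for illustration; the package's exact ordering may differ, and selectiveSketch is a made-up name.

```r
# Hypothetical toy records: 1 = active, 0 = inactive
records <- data.frame(
  cid      = c("a", "a", "a", "b", "b", "b", "c", "c"),
  target   = c("T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2"),
  activity = c(1,    0,    0,    1,    1,    1,    1,    0),
  stringsAsFactors = FALSE
)

selectiveSketch <- function(records, target, maxCompounds = 10, minimumTargets = 10) {
  # compounds active against the target of interest
  hits <- unique(records$cid[records$target == target & records$activity %in% 1])
  if (length(hits) == 0)
    return(data.frame(activeTargets = integer(), screenedTargets = integer()))
  stats <- do.call(rbind, lapply(hits, function(cid) {
    r <- records[records$cid == cid, ]
    data.frame(activeTargets   = length(unique(r$target[r$activity %in% 1])),
               screenedTargets = length(unique(r$target)),
               row.names = cid)
  }))
  stats <- stats[stats$screenedTargets >= minimumTargets, , drop = FALSE]
  # assumption: most selective = fewest distinct active targets
  stats <- stats[order(stats$activeTargets), , drop = FALSE]
  head(stats, maxCompounds)
}

selectiveSketch(records, "T1", maxCompounds = 2, minimumTargets = 3)
```

Compound c is excluded because it was screened against only two targets, and a ranks ahead of b because it is active against T1 alone.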
selectiveAgainst(database, target, maxCompounds = 10, minimumTargets = 10)
database |
A |
target |
A string or integer containing a target_id referring to a target of interest. |
maxCompounds |
An integer giving the maximum number of compounds to return. |
minimumTargets |
An integer representing the minimum number of distinct targets a compound must have been screened against to be included in the results. |
A data.frame where the row names represent each compound showing binding specificity against the specified target.
The first column shows the number of distinct targets each compound shows activity against, and the second column shows the total number of distinct targets it has been screened against.
Tyler Backman
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## find target selective compounds active against a protein of interest
selectiveAgainst(sampleDB, target="166897622", maxCompounds=10, minimumTargets=20)
## disconnect from database
disconnectBioassayDB(sampleDB)
Queries a BioassayDB database and returns the target selectivity of the specified cids.
targetSelectivity(database, cids, scoring = "total", category=FALSE, multiTarget="keepOne")
database |
A |
cids |
A string or integer vector containing query cids referring to small molecules. |
scoring |
One of two scoring methods, "total" or "fraction". "fraction" returns the target selectivity for each compound as the fraction of screened distinct targets that showed activity in at least one assay. "total" returns the total number of active distinct targets for each compound, and does not consider inactive targets in the calculation. If fractional activity is requested, active values take precedence over inactives: a target found both active and inactive in different assays is regarded as active. |
category |
Include only once in selectivity counts any targets which share a common annotation of this category
(as used by the |
multiTarget |
Decides how selectivity is counted with regard to multi-target assays. If |
Returns a numeric vector containing the target selectivity for each query compound. Returned entries are named by their corresponding cid.
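The two scoring methods can be illustrated on a toy per-target matrix that uses the perTargetMatrix coding (2 = active, 1 = inactive, 0 = untested), with targets as rows and compounds as columns. Because replicate conflicts are already resolved in favour of actives by that coding, "total" is the count of 2s per compound and "fraction" divides it by the count of screened (non-zero) targets. The matrix and selectivitySketch are hypothetical, and the category and multiTarget options are not modelled.

```r
# Toy per-target matrix: rows = targets, columns = compounds
m <- matrix(c(2, 1, 0,
              1, 1, 2,
              2, 0, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("T", 1:4), c("2244", "2662", "3033")))

selectivitySketch <- function(m, scoring = c("total", "fraction")) {
  scoring <- match.arg(scoring)
  active   <- colSums(m == 2)  # distinct targets active in at least one assay
  screened <- colSums(m > 0)   # distinct targets screened (active or inactive)
  if (scoring == "total") active else active / screened
}

selectivitySketch(m, "total")
selectivitySketch(m, "fraction")
```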
Tyler Backman
translateTargetId
loadIdMapping
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## make a vector with compounds of interest
compoundsOfInterest <- c(2244, 2662, 3033)
## get "total" active targets for each compound of interest
targetSelectivity(sampleDB, compoundsOfInterest, scoring="total")
## get fraction of active targets for each compound of interest
targetSelectivity(sampleDB, compoundsOfInterest, scoring="fraction")
## disconnect from database
disconnectBioassayDB(sampleDB)
Returns a character vector of the protein target identifiers using the specified category (classification system). This is most often used to translate NCBI Protein GI numbers (as provided with the pre-built PubChem Bioassay database) into UniProt identifiers.
translateTargetId(database, target, category, fromCategory = "GI")
database |
A |
target |
A protein target identifier to query (of the identifier type set by the fromCategory option, with a default of "GI"). |
category |
The specified identifier type to return, such as 'UniProt'. |
fromCategory |
The identifier type of the query (default "GI"). |
A character vector of the protein target identifiers of the specified category for the specified target. An NA is returned if no matching values exist in the database.
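The lookup can be pictured as a search in a mapping table with one row per loaded identifier, mirroring the target, category and identifier arguments of loadIdMapping. The in-memory table, the "Cluster" category and translateSketch are hypothetical, and only translations to or from GI numbers are modelled; the package stores its mappings in the database.

```r
# Hypothetical mapping table (one row per loadIdMapping call)
idMap <- data.frame(
  target     = c("6686268", "6686268"),
  category   = c("UniProt", "Cluster"),
  identifier = c("P11712",  "cluster_1"),
  stringsAsFactors = FALSE
)

translateSketch <- function(map, target, category, fromCategory = "GI") {
  if (fromCategory == "GI") {
    # GI number -> identifier of the requested category
    hits <- map$identifier[map$target == target & map$category == category]
  } else if (category == "GI") {
    # identifier of category fromCategory -> GI number
    hits <- map$target[map$identifier == target & map$category == fromCategory]
  } else {
    stop("this sketch only translates to or from GI numbers")
  }
  if (length(hits) == 0) NA_character_ else unique(hits)
}

translateSketch(idMap, "6686268", "UniProt")
translateSketch(idMap, "P11712", "GI", "UniProt")
translateSketch(idMap, "999", "UniProt")  # no mapping loaded: NA
```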
Tyler Backman
http://www.ncbi.nlm.nih.gov/protein NCBI Protein Database
http://www.uniprot.org UniProt Protein Database
## create sample database
myDatabaseFilename <- tempfile()
mydb <- newBioassayDB(myDatabaseFilename, indexed=FALSE)
## load a sample translation from GI 6686268 to UniProt P11712
loadIdMapping(mydb, "6686268", "UniProt", "P11712")
## get UniProt identifier(s) for GI Number 6686268
UniProtIds <- translateTargetId(mydb, "6686268", "UniProt")
UniProtIds
## get GI identifier(s) for UniProt ID P11712
GIs <- translateTargetId(mydb, "P11712", "GI", "UniProt")
GIs
## disconnect from and delete sample database
disconnectBioassayDB(mydb)
unlink(myDatabaseFilename)
This computes Tanimoto similarity coefficients between bioactivity profiles in a sparse-matrix-aware way, where only commonly tested targets are considered. The computation is trinary in that each compound is a column in a compound vs. target matrix with three possible values (2 = active, 1 = inactive, 0 = untested or inconclusive), as generated by the perTargetMatrix function.
A comparison returns NA unless at least one of two minimum thresholds is satisfied: a minimum number of shared screened targets, or a minimum number of shared active targets, as performed in Dancik, V. et al. (see references).
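A minimal sketch of this comparison on a dense toy matrix (targets as rows, compounds as columns, as in the package example that takes a column as the query) is shown below. It assumes that "shared screened targets" are targets with a non-zero value for both compounds, that "shared active targets" are targets active (2) for both, and that the Tanimoto coefficient is computed over actives among the shared screened targets as c / (a + b - c). trinarySketch is a hypothetical name, not the package function.

```r
trinarySketch <- function(query, targets,
                          minSharedScreenedTargets = 12, minSharedActiveTargets = 3) {
  # query: numeric vector over targets; targets: rows = targets, cols = compounds
  apply(targets, 2, function(profile) {
    shared <- query != 0 & profile != 0      # screened for both compounds
    q <- query[shared] == 2                  # query actives on shared targets
    p <- profile[shared] == 2                # target actives on shared targets
    bothActive <- sum(q & p)
    if (sum(shared) < minSharedScreenedTargets && bothActive < minSharedActiveTargets)
      return(NA_real_)                       # neither threshold satisfied
    denom <- sum(q) + sum(p) - bothActive    # size of the union of actives
    if (denom == 0) NA_real_ else bothActive / denom
  })
}

# Toy trinary matrix: 2 = active, 1 = inactive, 0 = untested
m <- matrix(c(2, 2, 1,
              2, 1, 2,
              1, 1, 1,
              0, 2, 2,
              1, 2, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("T", 1:5), c("c1", "c2", "c3")))

sims <- trinarySketch(m[, "c1"], m, minSharedScreenedTargets = 3,
                      minSharedActiveTargets = 1)
sims
```

With the default thresholds (12 shared screened or 3 shared active targets) every comparison in this small example returns NA, which is why lower thresholds are passed above.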
trinarySimilarity(queryMatrix, targetMatrix, minSharedScreenedTargets = 12, minSharedActiveTargets = 3)
queryMatrix |
This is a compound vs. target sparse matrix representing the bioactivity profile for a single
compound across one or more assays or targets. The format must be a
|
targetMatrix |
This is a compound vs. target sparse matrix representing the bioactivity profiles for one or more
compounds across one or more assays or targets. The format must be
|
minSharedScreenedTargets |
A |
minSharedActiveTargets |
A |
A numeric vector where each element represents the Tanimoto similarity between the queryMatrix and a given column in the targetMatrix, where only the shared set of commonly screened targets is considered. If both the minSharedScreenedTargets and minSharedActiveTargets thresholds are unsatisfied, NA is returned for the given similarity value.
NA is also returned if the Tanimoto coefficient is undefined due to a zero in the denominator, which occurs when neither compound was found active against any of the commonly screened targets.
Tyler Backman
Tanimoto similarity coefficient: Tanimoto TT (1957) IBM Internal Report, 17th Nov. See also Jaccard P (1901) Bulletin de la Societe Vaudoise des Sciences Naturelles 37, 241-272.
Dancik, V. et al. Connecting Small Molecules with Similar Assay Performance Profiles Leads to New Biological Hypotheses. J Biomol Screen 19, 771-781 (2014).
perTargetMatrix
getBioassaySetByCids
bioactivityFingerprint
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)
## retrieve activity data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))
## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)
## compute similarity between first column and all columns
queryMatrix <- targetMatrix[,1,drop=FALSE]
trinarySimilarity(queryMatrix, targetMatrix)
## disconnect from sample database
disconnectBioassayDB(sampleDB)