Package 'octad' reference manual

Title:	Open Cancer TherApeutic Discovery (OCTAD)
Description:	OCTAD provides a platform for virtually screening compounds targeting precise cancer patient groups. The essential idea is to identify drugs that reverse the gene expression signature of disease by tamping down over-expressed genes and stimulating weakly expressed ones. The package offers deep-learning based reference tissue selection, disease gene expression signature creation, pathway enrichment analysis, drug reversal potency scoring, cancer cell line selection, drug enrichment analysis and in silico hit validation. It currently covers ~20,000 patient tissue samples covering 50 cancer types, and expression profiles for ~12,000 distinct compounds.
Authors:	E. Chekalin [aut, cre], S. Paithankar [aut], B. Zeng [aut], B. Glicksberg [ctb], P. Newbury [ctb], J. Xing [ctb], K. Liu [ctb], A. Wen [ctb], D. Joseph [ctb], B. Chen [aut]
Maintainer:	E. Chekalin <[email protected]>
License:	Artistic-2.0
Version:	1.9.0
Built:	2025-03-15 06:02:30 UTC
Source:	https://github.com/bioc/octad

Compute Correlation between cell lines and vector of case ids.

Description

Select top CCLE cell lines sharing similar expression profiles with input case samples. Input case sample ids and output correlation scores for every cell line and/or output file. The results could be used for in-silico validation of predictions or used to weight cell lines in RGES computation. CellLineCorrelations.csv, correlation between CCLE cell lines and input disease samples.

Usage

 computeCellLine(case_id = case_id, expSet = NULL, LINCS_overlaps = TRUE,
                            source = c("octad.small", "octad.whole", "expSet"),
                            file = NULL, output = TRUE,
                            outputFolder = NULL)
computeCellLine(case_id = case_id, expSet = NULL, LINCS_overlaps = TRUE,
                            source = c("octad.small", "octad.whole", "expSet"),
                            file = NULL, output = TRUE,
                            outputFolder = NULL)

Arguments

`case_id`	vector of ids from octad database. Ids can be obtained from `phenoDF`.
`output`	by default `FALSE`, if `TRUE`, file CellLineCorrelations.csv with results are produced in working directory.
`outputFolder`	Folder to store results.
`LINCS_overlaps`	vector of cell line ids from octad database. If `TRUE`, overlap with LINCS cells database wll be performed
`source`	the file for the octad expression matrix. By default, set to `octad.small` to use only 978 landmark genes profiled in LINCS database. Use `octad.whole` option to compute DE on the whole transcriptome `octad.counts.and.tpm.h5` file. The file should be present in the working directory or the whole path should be included. If `source` is set to `'side'`, the expSet matrix is estimated.
`expSet`	input expression matrix. By default set to `NULL` since the expSet is created based on cases, controls and source file.
`file`	if `expSet='octad.whole'`, source path to `expSet='octad.counts.and.tpm.h5'` file is required if it is not in working directory. By default function seeks for the .h5 file in the working directory.

Value

topline

data.frame with row.names as cell line names and column medcor containing values for correlation between set of samples from case_id and cell lines.

Examples

#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
case_id=HCC_primary$sample.id #select cases
cell_line_computed=computeCellLine(case_id=case_id,source='octad.small')
#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
case_id=HCC_primary$sample.id #select cases
cell_line_computed=computeCellLine(case_id=case_id,source='octad.small')

Compute correlating reference control samples.

Description

Compute reference control samples from OCTAD database using precomputed EncoderDF models.

Usage

 computeRefTissue(case_id = NULL, adjacent = FALSE, source = "octad",
 n_varGenes = 500, method = c("varGenes",'random'), expSet = NULL,
control_size = length(case_id),
outputFolder = NULL, cor_cutoff = "0", output = TRUE)


computeRefTissue(case_id = NULL, adjacent = FALSE, source = "octad",
 n_varGenes = 500, method = c("varGenes",'random'), expSet = NULL,
control_size = length(case_id),
outputFolder = NULL, cor_cutoff = "0", output = TRUE)

Arguments

`case_id`	vector of cases used to compute references.
`source`	by default set `octad` to use autoencoder results for computation. Any other input like `'side'` is `expSet` defined by users.
`adjacent`	by default set to `FALSE`. If `TRUE`, only tissue with `sample.type` 'adjacent' from `phenoDF` would be used instead of 'normal'.
`expSet`	input for expression matrix. By default NULL, since autoencoder results are used.
`n_varGenes`	number of genes used to select control cases.
`method`	one of two options is avaliable. `random` will take a random number of samples from control subset and `varGenes` (default) will select control samples based on distance between cases and selected samples.
`control_size`	number of control samples to be selected.
`outputFolder`	path to output folder. By default, the function produces result files in working directory.
`cor_cutoff`	cut-off for correlation values, by default `cor_cutoff='0'`.
`output`	if `TRUE`, two output files are produced.

Value

Return

control_id

a vector of an appropriate set of control samples.

Besides, if output=TRUE, two files are created in the working directory:

`case_normal_corMatrix.csv`	contains pairwise correlation between case samples vs control samples.
`case_normal_median_cor.csv`	contains median correlation values with case samples for returned control samples.

Examples

#select data
#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='Liver Hepatocellular Carcinoma'&
sample.type == 'primary'&data.source == 'TCGA')
#select cases
case_id=HCC_primary$sample.id
#computing reference tissue, by default using small autoEncoder,
#but can use custom expression set,
#by default output=TRUE and outputFolder option is empty,
#which creates control corMatrix.csv to working directory
control_id=computeRefTissue(case_id,outputFolder='',output=TRUE,
expSet = "octad",control_size = 50)
#select data
#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='Liver Hepatocellular Carcinoma'&
sample.type == 'primary'&data.source == 'TCGA')
#select cases
case_id=HCC_primary$sample.id
#computing reference tissue, by default using small autoEncoder,
#but can use custom expression set,
#by default output=TRUE and outputFolder option is empty,
#which creates control corMatrix.csv to working directory
control_id=computeRefTissue(case_id,outputFolder='',output=TRUE,
expSet = "octad",control_size = 50)

Compute differential expression

Description

Compute differential expression for case vs control samples. Will produce the file computedEmpGenes.csv listing empiricaly differentially expressed genes used for RNA-Seq normalization.

Usage

  diffExp(case_id = NULL, control_id = NULL, source = "octad.small",
  file = "octad.counts.and.tpm.h5", normalize_samples = TRUE,
                    k = 1, expSet = NULL, n_topGenes = 500,
                    DE_method = c("edgeR",'DESeq2','wilcox','limma'),
                    output = FALSE, outputFolder = NULL, annotate = TRUE)
  
diffExp(case_id = NULL, control_id = NULL, source = "octad.small",
  file = "octad.counts.and.tpm.h5", normalize_samples = TRUE,
                    k = 1, expSet = NULL, n_topGenes = 500,
                    DE_method = c("edgeR",'DESeq2','wilcox','limma'),
                    output = FALSE, outputFolder = NULL, annotate = TRUE)

Arguments

`case_id`	vector of cases used for differential expression.
`control_id`	vector of controls used for differential expression.
`source`	the file for the octad expression matrix. By default, set to `octad.small` to use only 978 landmark genes profiled in LINCS database. Use `octad.whole` option to compute DE on the whole transcriptome `octad.counts.and.tpm.h5` file. The file should be present in the working directory or the whole path should be included. If `source` is set to `'side'`, the expSet matrix is estimated.
`expSet`	input expression matrix. By default set to `NULL` since the expSet is created based on cases, controls and source file.
`file`	if `expSet='octad.whole'`, source path to `expSet='octad.counts.and.tpm.h5'` file is required if it is not in working directory. By default function seeks for the .h5 file in the working directory.
`normalize_samples`	if TRUE, RUVSeq normalization is applied to either EdgeR or DESeq. No normalization needed for limma+voom.
`k`	eiter k=1 (by default), k=2 or k=3, number of factors used in model matrix construction in RUVSeq normalization if `normalize_samples=TRUE`.
`n_topGenes`	number of empiricaly differentially expressed genes estimated for RUVSeq normalization. Default is 5000.
`DE_method`	edgeR, DESeq2, limma or wilcox DE analysis.
`output`	if `TRUE`, output files is produced.
`outputFolder`	path to output folder. By default, the function produces result files in working directory.
`annotate`	if `TRUE`, annotation by ENSEMBL gene is performed. If `TRUE`, make sure row.names of the custom input contain ensembl gene ids.

Value

`res`	`data.frame` with list of differentially expressed genes.
`computedEmpGenes.csv`	`data.frame` listing empiricaly differentially expressed genes used for RNA-Seq normalization.

Examples

#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
case_id=HCC_primary$sample.id #select cases
HCC_adjacent=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'adjacent'&data.source == 'TCGA') #select data
control_id=HCC_adjacent$sample.id #select cases
res=diffExp(case_id,control_id,source='octad.small',output=FALSE)
#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
case_id=HCC_primary$sample.id #select cases
HCC_adjacent=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'adjacent'&data.source == 'TCGA') #select data
control_id=HCC_adjacent$sample.id #select cases
res=diffExp(case_id,control_id,source='octad.small',output=FALSE)

Load octad expression data

Description

Create TPM or count expression matrix for the selected samples from OCTAD.

Usage

  loadOctadCounts(sample_vector='',type='tpm',file='')
loadOctadCounts(sample_vector='',type='tpm',file='')

Arguments

`sample_vector`	vector of samples to be selected. Use `phenoDF` object for sample id selection.
`type`	either `tpm` (default) or `counts` to be returned.
`file`	full path to `octad.counts.and.tpm.h5` file.

Value

exprData

matrix with either log2 corrected counts or tmp matrix for selected samples.

Examples

#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
#load expression data for raw counts or tpm values.
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
#case_id=HCC_primary$sample.id #select cases
#expression_tmp=loadOctadCounts(case_id,type='tpm',
#file='octad.counts.and.tpm.h5')
#expression_log2=loadOctadCounts(case_id,type='counts',
#file='octad.counts.and.tpm.h5')
#load data.frame with samples included in the OCTAD database
phenoDF=get_ExperimentHub_data('EH7274') 
#load expression data for raw counts or tpm values.
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&
sample.type == 'primary') #select data
#case_id=HCC_primary$sample.id #select cases
#expression_tmp=loadOctadCounts(case_id,type='tpm',
#file='octad.counts.and.tpm.h5')
#expression_log2=loadOctadCounts(case_id,type='counts',
#file='octad.counts.and.tpm.h5')

Open Cancer TherApeutic Discovery (OCTAD) database package

Description

Open Cancer TherApeutic Discovery (OCTAD) package implies sRGES approach for the drug discovery. The essential idea is to identify drugs that reverse the gene expression signature of a disease by tamping down over-expressed genes and stimulating weakly expressed ones. The following package contains all required precomputed data for whole OCTAD pipeline computation.

Details

The main functions are:

computeRefTissue - Compute reference control samples from OCTAD database using precomputed EncoderDF models.
diffExp - Compute differential expression for case vs control samples. Will produce the file computedEmpGenes.csv listing empiricaly differentially expressed genes used for RNA-Seq normalization.
runsRGES - Compute sRGES, a score indicating the reveral potency of each drug. It first computes RGES (Reverse Gene Expression Score) for individual instances and then summarizes RGES of invididual drugs (one drug may have multiple instances under different treatment conditions).
computeCellLine - Compute Correlation between cell lines and vector of case ids.
topLineEval - Evaluate predictions using pharmacogenomics data. Given a cell line, the function computes the correlation between sRGES and drug sensitivity data taken from CTRP. A higher correlation means a better prediction. The cell line could be computed from computeCellLine.
octadDrugEnrichment - Perform enrichment analysis of drug hits based on chemical structures, drug-targets, and pharmacological classifications. An enrichment score calculated using ssGSEA and a p-value computed through a permutation test are provided.

For detailed information on usage, see the package vignette, by typing vignette('octad'), or the workflow linked to on the first page of the vignette.

The code can be viewed at the GitHub repository, which also lists the contributor code of conduct:

https://github.com/Bin-Chen-Lab/OCTAD

References

Zeng, B., Glicksberg, B.S., Newbury, P., Chekalin, E., Xing, J., Liu, K., Wen, A., Chow, C. and Chen, B., 2021. OCTAD: an open workspace for virtually screening therapeutics targeting precise cancer patient groups using gene expression features. Nature protocols, 16(2), pp.728-753. https://www.nature.com/articles/s41596-020-00430-z _PACKAGE package

Compute Drug enrichment

Description

Perform enrichment analysis of drug hits based on chemical structures, drug-targets, and pharmacological classifications. An enrichment score calculated using ssGSEA and a p-value computed through a permutation test are provided.

Usage

  octadDrugEnrichment(sRGES = NULL, target_type = "chembl_targets",
  enrichFolder = "enrichFolder", outputFolder = NULL, outputRank = FALSE)

octadDrugEnrichment(sRGES = NULL, target_type = "chembl_targets",
  enrichFolder = "enrichFolder", outputFolder = NULL, outputRank = FALSE)

Arguments

`sRGES`	sRGES data frame produced by `runsRGES`.
`target_type`	one or several of `'chembl_targets','mesh','ChemCluster'` databases selected. By deafult only 'chembl_targets' will be used.
`enrichFolder`	folder to store output.
`outputFolder`	path where to store enrichFolder, in case of `NULL` will be stored in work directory.
`outputRank`	output detailed rank if `TRUE`, write sRGES for selected target as vcf.

Value

Following files are created: enriched_*_targets.csv and top_enriched_*_*_targets.pdf. In the case of chemical structural analysis, additional files are created: *drugstructureClusters.csv and *misc.csv. The results provide useful information for following candidate selection and experimental design. For example, if two structurally similar drugs are both predicted as top hits, the chance of each drug as a true positive is high.

exprData

matrix with either log2 corrected counts or tmp matrix for selected samples.

Examples

data("sRGES_example",package='octad') #load example sRGES
#run drug enrichment 
octadDrugEnrichment(sRGES = sRGES_example, target_type = c('chembl_targets'))

data("sRGES_example",package='octad') #load example sRGES
#run drug enrichment 
octadDrugEnrichment(sRGES = sRGES_example, target_type = c('chembl_targets'))

Differential expression example for HCC vs adjacent liver tissue computed in diffExp() function

Description

Differential expression example for HCC vs adjacent liver tissue computed in diffExp() function

Usage

data(res_example)
data(res_example)

Format

A data.frame with 963 rows and 18 variables:

identifier: Ensg ID
log2FoldChange: Log2 fold-change
logCPM: log CPM value
LR: LR value
pvalue: p.value
padj: FDR
tax_id: taxon id
GeneID: Gene id
LocusTag: Locus tag
chromosome: Chromosome
map_location: Chromosome location
description: Full gene name
type: type of gene
Symbol_autho: HGNC symbol
other: Gene function

Details

To generate this dataset use the following code from the octad package #load data.frame with samples included in the OCTAD database.
phenoDF=.eh[['EH7274']]
#select data
HCC_primary=subset(phenoDF,cancer=='liver hepatocellular carcinoma'&sample.type == 'primary')
#select cases
case_id=HCC_primary$sample.id
control_id=subset(phenoDF,biopsy.site=='LIVER'&sample.type=='normal')$sample.id[1:50]
res=diffExp(case_id,control_id,source='octad.small',output=FALSE)

Compute sRGES

Description

Compute sRGES, a score indicating the reveral potency of each drug. It first computes RGES (Reverse Gene Expression Score) for individual instances and then summarizes RGES of invididual drugs (one drug may have multiple instances under different treatment conditions).

Usage

 runsRGES(dz_signature=NULL,choose_fda_drugs = FALSE,max_gene_size=500,
cells=NULL,output=FALSE,outputFolder='',weight_cell_line=NULL,permutations=10000)
runsRGES(dz_signature=NULL,choose_fda_drugs = FALSE,max_gene_size=500,
cells=NULL,output=FALSE,outputFolder='',weight_cell_line=NULL,permutations=10000)

Arguments

`dz_signature`	disease signature. Make sure input data frame has a gene `Symbol` column, otherwise an error is produced. It must be an UPPERCASE gene symbol.
`choose_fda_drugs`	if `TRUE`, only FDA approved drugs are used.
`max_gene_size`	maximum number of disease genes used for drug prediction. By default 50 for each side (up/down).
`cells`	cell ids in `lincs_sig_info` file used for prediction. By default, all cell lines are used.
`weight_cell_line`	by default `NULL`, if `!NULL`, an output object from `computeCellLine` is estimated (see example).
`permutations`	number of permutations, by default 10000.
`output`	if `TRUE`, output files is produced.
`outputFolder`	folder path to store drug results, by default write results to working directory.

Value

The function returns RGES data.frame

containing scores and p.values for every instance. data.frame contains drug id in pert_iname collumn, n contains the number of instances for this drug, mean, median and sd of sRGES RGES sores.

Besides, a number of additional files in the sourced directory:

`dz_sig_used.csv`	contains genes in the disease signature used for computing reverse gene expression scores.
`sRGES.csv`	contains the same data as returned data.frame.
`all__lincs_score.csv`	includes information of RGES.

Examples

#load differential expression example for HCC 
#vs adjacent liver tissue computed in diffExp() function
data("res_example",package='octad')
res_example=subset(res_example,abs(log2FoldChange)>1&padj<0.001)[1:10,]
#run sRGES computation
#sRGES=runsRGES(dz_signature=res_example,max_gene_size=100,permutations=1000,output=FALSE)
#load differential expression example for HCC 
#vs adjacent liver tissue computed in diffExp() function
data("res_example",package='octad')
res_example=subset(res_example,abs(log2FoldChange)>1&padj<0.001)[1:10,]
#run sRGES computation
#sRGES=runsRGES(dz_signature=res_example,max_gene_size=100,permutations=1000,output=FALSE)

Data of computed example sRGEs for HCC vs liver adjacent tissues on octad.small dataset

Description

Data of computed example sRGEs for HCC vs liver adjacent tissues on octad.small dataset

Usage

data(sRGES_example)
data(sRGES_example)

Format

A tibble with 12,442 rows and 6 variables:

pert_iname: dbl Year price was recorded
mean: mean sRGES for obtained drug if n>1
n: times this drug was obtained
median: median sRGES for drug if n>1
sd: standart deviation for obtained drug if n>1
sRGES: sRGES score of the drug

Details

To generate this dataset use the following code from the octad package load differential expression example for HCC vs adjacent liver tissue computed in diffExp() function from res_example.
data('res_example',package='octad.db')
res=subset(res_example,abs(log2FoldChange)>1&padj<0.001) #load example expression dataset
sRGES=runsRGES(res,max_gene_size=100,permutations=1000,output=FALSE)

Evaluate cell lines

Description

Evaluate predictions using pharmacogenomics data. Given a cell line, the function computes the correlation between sRGES and drug sensitivity data taken from CTRP. A higher correlation means a better prediction. The cell line could be computed from computeCellLine.

Usage

  topLineEval(topline=NULL,mysRGES=NULL,outputFolder="")

topLineEval(topline=NULL,mysRGES=NULL,outputFolder="")

Arguments

`topline`	list of cell lines to be used for prediction.
`mysRGES`	sRGES data.frame produced by `runsRGES`.
`outputFolder`	Path to store results.

Value

The function produces 3 feils in the output directory:

`CellLineEval*_drug_sensitivity_insilico_results.txt`	with drug sensitivity information.
`*_auc_insilico_validation.html`	correlation between drug AUC and sRGES in a related cell line.
`*_ic50_insilico_validation.html`	correlation between drug IC50 and sGRES in a related cell line.

Examples

#load example sRGES computed by runsRGES() function for HCC 
#vs liver adjacent tissues on octad.small dataset
data("sRGES_example",package='octad') #load example sRGES
#Pick up cell lines
topLineEval(topline = 'HEPG2',mysRGES = sRGES_example,outputFolder=tempdir())

#load example sRGES computed by runsRGES() function for HCC 
#vs liver adjacent tissues on octad.small dataset
data("sRGES_example",package='octad') #load example sRGES
#Pick up cell lines
topLineEval(topline = 'HEPG2',mysRGES = sRGES_example,outputFolder=tempdir())

Package 'octad'

Help Index

Compute Correlation between cell lines and vector of case ids.

Description

Usage

Arguments

Value

See Also

Examples

Compute correlating reference control samples.

Description

Usage

Arguments

Value

See Also

Examples

Compute differential expression

Description

Usage

Arguments

Value

See Also

Examples

Load octad expression data

Description

Usage

Arguments

Value

See Also

Examples

Open Cancer TherApeutic Discovery (OCTAD) database package

Description

Details

References

Compute Drug enrichment

Description

Usage

Arguments

Value

See Also

Examples

Differential expression example for HCC vs adjacent liver tissue computed in diffExp() function

Description

Usage

Format

Details

Compute sRGES

Description

Usage

Arguments

Value

See Also

Examples

Data of computed example sRGEs for HCC vs liver adjacent tissues on octad.small dataset

Description

Usage

Format

Details

Evaluate cell lines

Description

Usage

Arguments

Value

See Also

Examples