gDRcore

Overview

gDRcore is part of the gDR suite. The package provides a set of tools to process and analyze drug response data.

Introduction

Data model

The data model is built on the MultiAssayExperiment (MAE) structure. Within an MAE, each SummarizedExperiment (SE) contains a different unit type (e.g. single-agent or combination treatment). Columns of the MAE are defined by the cell lines and any modifications of them, and are shared across the SEs. Rows are defined by the treatments (e.g. drugs, perturbations) and are specific to each SE. Assays of the SE are the different levels of data processing (raw, control, normalized, averaged data, as well as metrics). Each nested element of the assays of the SEs holds the series themselves as a table (a data.table in practice). Although not all elements need to have a series or the same number of elements, the attributes (columns of the table) should be consistent across the SE.
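The nesting described above can be illustrated with a minimal base R sketch. It is not how gDRcore stores data internally; the treatment names, cell line names, and column names (`Concentration`, `ReadoutValue`) are hypothetical and chosen only to show a treatments-by-cell-lines grid in which each cell holds its own table.

```r
# Hypothetical treatments (rows) and cell lines (columns).
treatments <- c("drug_A", "drug_B")
cell_lines <- c("cellline_1", "cellline_2", "cellline_3")

# A list-matrix: every cell of the grid can hold its own table (or nothing).
nested <- matrix(
  vector("list", length(treatments) * length(cell_lines)),
  nrow = length(treatments),
  dimnames = list(treatments, cell_lines)
)

# Two cells receive dose series of different lengths but with the same columns.
nested[["drug_A", "cellline_1"]] <- data.frame(
  Concentration = c(0.01, 0.1, 1, 10),
  ReadoutValue = c(980, 850, 420, 110)
)
nested[["drug_B", "cellline_3"]] <- data.frame(
  Concentration = c(0.1, 1),
  ReadoutValue = c(910, 640)
)

# Flatten the grid into one long table, skipping the empty cells.
long <- do.call(rbind, lapply(treatments, function(trt) {
  do.call(rbind, lapply(cell_lines, function(cl) {
    series <- nested[[trt, cl]]
    if (is.null(series)) {
      return(NULL)
    }
    data.frame(Treatment = trt, CellLine = cl, series)
  }))
}))
```

Because every nested table has the same columns, the cells can be stacked into a single long table without any per-cell special cases, which is why consistent attributes across the SE matter.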

Drug processing

For drug response data, the input files need to be merged such that each measurement (data) is associated with the right metadata (cell line properties and treatment definition). Metadata can be added with the function cleanup_metadata if the right reference databases are in place.

When the data and metadata are merged into a long table, the wrapper function runDrugResponseProcessingPipeline can be used to generate an MAE with processed and analyzed data.

Figure 1. Overview of runDrugResponseProcessingPipeline.

In practice, runDrugResponseProcessingPipeline performs the following steps:

  • create_SE creates the structure of the MAE and the associated SEs by assigning metadata into the row and column attributes. The assignment is performed in the function split_SE_components (see details below for the assumptions made when building SE structures). create_SE also dispatches the raw data and controls into the right nested tables. Note that data may be duplicated between different SEs to make them self-contained.
  • normalize_SE normalizes the raw data based on the control. If no pre-treatment control is provided, the GR value is calculated from the cell line division time supplied by the reference database. If both are missing, GR values cannot be calculated. Additional normalizations can be added as new rows in the nested table.
  • average_SE averages technical replicates that are stored in the same nested table.
  • fit_SE fits the dose-response curves and calculates response metrics for each normalization type.
  • fit_SE.combinations calculates synergy scores for drug combination data and, if the data is appropriate, performs fits along each of the two drugs and calculates matrix-level metrics (e.g. isobolograms). This is also performed for each normalization type independently.
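The normalization step can be illustrated with a small base R sketch. It uses the commonly published definitions of relative viability and the GR value and is not taken from the gDRcore source. The function names and the example numbers are hypothetical.

```r
# Relative viability: treated readout divided by the untreated control.
relative_viability <- function(treated, untreated) {
  treated / untreated
}

# GR value when a pre-treatment (time-zero) control has been measured.
gr_from_day0 <- function(treated, untreated, day0) {
  2^(log2(treated / day0) / log2(untreated / day0)) - 1
}

# GR value when no time-zero control exists: the number of control
# doublings is inferred from the assay duration and the division time
# (both must be in the same unit, e.g. hours).
gr_from_division_time <- function(treated, untreated, duration, division_time) {
  doublings <- duration / division_time
  2^(1 + log2(treated / untreated) / doublings) - 1
}

# Hypothetical readouts: the control quadruples, the treated well doubles.
treated <- 200
untreated <- 400
day0 <- 100

rv <- relative_viability(treated, untreated)
gr_a <- gr_from_day0(treated, untreated, day0)
# Two doublings in 72 h correspond to a division time of 36 h.
gr_b <- gr_from_division_time(treated, untreated, duration = 72, division_time = 36)
```

In this example the two GR calculations agree, because a division time of 36 hours over a 72-hour assay implies the same fourfold control growth as the measured time-zero control. If neither the time-zero control nor the division time is available, only the relative viability can be computed.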

Figure 2. Detailed overview of the drug processing pipeline.

The functions that process the data have parameters for specifying the names of the variables and assays. Additional parameters are available to customize the processing steps, such as forcing the nesting (or not) of an attribute, or specifying which attributes should be treated as technical replicates.
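The effect of treating an attribute as a technical replicate can be sketched in base R. This only illustrates the idea and does not use the gDRcore interface. The column names (`Concentration`, `Barcode`, `RelativeViability`) and values are hypothetical.

```r
# Hypothetical normalized data: three plates measured at two concentrations.
normalized <- data.frame(
  Concentration = c(0.1, 0.1, 0.1, 1, 1, 1),
  Barcode = c("plate_1", "plate_2", "plate_3", "plate_1", "plate_2", "plate_3"),
  RelativeViability = c(0.92, 0.88, 0.90, 0.41, 0.47, 0.44)
)

# Barcode is left out of the grouping, so plates act as technical
# replicates and are collapsed into one mean per concentration.
averaged <- aggregate(
  RelativeViability ~ Concentration,
  data = normalized,
  FUN = mean
)

# Keeping Barcode in the grouping treats each plate as a distinct
# condition, so no rows are collapsed.
per_plate <- aggregate(
  RelativeViability ~ Concentration + Barcode,
  data = normalized,
  FUN = mean
)
```

Whether an attribute belongs in the grouping key is a design decision about the experiment: the same column can be a replicate in one study and a condition of interest in another.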

Use Cases

Data preprocessing

Please familiarize yourself with the gDRimport package, which contains a set of tools for preparing input data for gDRcore.

This example is based on the artificial dataset called data1, available within the gDRimport package. gDR requires three types of raw input data: Template, Manifest, and RawData. More information about these three types of data can be found in our general documentation.

td <- gDRimport::get_test_data()

The provided dataset needs to be merged into a single data.table object before the gDR pipeline can be run. This can be done using two functions: gDRimport::load_data() and gDRcore::merge_data().
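A minimal sketch of these two calls is given below. It assumes that the path accessors `manifest_path()`, `template_path()` and `result_path()` are exported by gDRimport, and that `load_data()` returns a list with `manifest`, `treatments` and `data` elements. These details should be checked against the installed package versions. The object name `loaded` is arbitrary.

```r
# Assumes gDRimport and gDRcore are installed; otherwise nothing is run.
if (requireNamespace("gDRimport", quietly = TRUE) &&
    requireNamespace("gDRcore", quietly = TRUE)) {
  # Locate the bundled test dataset.
  td <- gDRimport::get_test_data()

  # Read the manifest, template and raw result files (assumed accessors).
  loaded <- gDRimport::load_data(
    gDRimport::manifest_path(td),
    gDRimport::template_path(td),
    gDRimport::result_path(td)
  )

  # Merge measurements with their metadata into one long table.
  input_df <- gDRcore::merge_data(
    loaded$manifest,
    loaded$treatments,
    loaded$data
  )
}
```

The resulting `input_df` is the long table that is passed to the pipeline.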

Running gDR pipeline

We provide an all-in-one function that splits data into appropriate data types, creates the SummarizedExperiment object for each data type, splits data into treatment and control assays, normalizes, averages, calculates gDR metrics, and finally, creates the MultiAssayExperiment object. This function is called runDrugResponseProcessingPipeline.

mae <- runDrugResponseProcessingPipeline(input_df)
mae
#> A MultiAssayExperiment object of 2 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 2:
#>  [1] combination: SummarizedExperiment with 2 rows and 6 columns
#>  [2] single-agent: SummarizedExperiment with 3 rows and 6 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files

We can subset the MultiAssayExperiment to obtain the SummarizedExperiment specific to any data type, e.g.

mae[["single-agent"]]
#> class: SummarizedExperiment 
#> dim: 3 6 
#> metadata(5): identifiers experiment_metadata Keys fit_parameters
#>   .internal
#> assays(5): RawTreated Controls Normalized Averaged Metrics
#> rownames(3): G00002_drug_002_moa_A_168 G00004_drug_004_moa_A_168
#>   G00011_drug_011_moa_B_168
#> rowData names(4): Gnumber DrugName drug_moa Duration
#> colnames(6): CL00011_cellline_BA_breast_cellline_BA_unknown_26
#>   CL00012_cellline_CA_breast_cellline_CA_unknown_30 ...
#>   CL00015_cellline_FA_breast_cellline_FA_unknown_42
#>   CL00018_cellline_IB_breast_cellline_IB_unknown_54
#> colData names(6): clid CellLineName ... subtype ReferenceDivisionTime

Data extraction

Extraction of the data from either MultiAssayExperiment or SummarizedExperiment objects into more user-friendly structures, as well as other data transformations, can be done using gDRutils. We encourage you to read the gDRutils vignette to familiarize yourself with this functionality.

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] gDRcore_1.5.3     gDRtestData_1.4.0 BiocStyle_2.35.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] fastmap_1.2.0               BumpyMatrix_1.15.0         
#>  [3] TH.data_1.1-2               digest_0.6.37              
#>  [5] lifecycle_1.0.4             gDRutils_1.5.5             
#>  [7] survival_3.8-3              magrittr_2.0.3             
#>  [9] compiler_4.4.2              rlang_1.1.4                
#> [11] sass_0.4.9                  drc_3.0-1                  
#> [13] tools_4.4.2                 plotrix_3.8-4              
#> [15] yaml_2.3.10                 data.table_1.16.4          
#> [17] knitr_1.49                  lambda.r_1.2.4             
#> [19] S4Arrays_1.7.1              DelayedArray_0.33.3        
#> [21] abind_1.4-8                 multcomp_1.4-26            
#> [23] BiocParallel_1.41.0         purrr_1.0.2                
#> [25] BiocGenerics_0.53.3         sys_3.4.3                  
#> [27] grid_4.4.2                  stats4_4.4.2               
#> [29] colorspace_2.1-1            scales_1.3.0               
#> [31] gtools_3.9.5                MASS_7.3-61                
#> [33] MultiAssayExperiment_1.33.1 SummarizedExperiment_1.37.0
#> [35] cli_3.6.3                   mvtnorm_1.3-2              
#> [37] rmarkdown_2.29              crayon_1.5.3               
#> [39] generics_0.1.3              httr_1.4.7                 
#> [41] readxl_1.4.3                cachem_1.1.0               
#> [43] stringr_1.5.1               zlibbioc_1.52.0            
#> [45] splines_4.4.2               gDRimport_1.5.4            
#> [47] assertthat_0.2.1            parallel_4.4.2             
#> [49] formatR_1.14                BiocManager_1.30.25        
#> [51] cellranger_1.1.0            XVector_0.47.0             
#> [53] matrixStats_1.4.1           vctrs_0.6.5                
#> [55] Matrix_1.7-1                sandwich_3.1-1             
#> [57] jsonlite_1.8.9              carData_3.0-5              
#> [59] car_3.1-3                   IRanges_2.41.2             
#> [61] S4Vectors_0.45.2            Formula_1.2-5              
#> [63] maketools_1.3.1             testthat_3.2.2             
#> [65] jquerylib_0.1.4             rematch_2.0.0              
#> [67] glue_1.8.0                  codetools_0.2-20           
#> [69] stringi_1.8.4               futile.logger_1.4.3        
#> [71] GenomeInfoDb_1.43.2         GenomicRanges_1.59.1       
#> [73] UCSC.utils_1.3.0            munsell_0.5.1              
#> [75] tibble_3.2.1                pillar_1.10.0              
#> [77] htmltools_0.5.8.1           brio_1.1.5                 
#> [79] GenomeInfoDbData_1.2.13     R6_2.5.1                   
#> [81] evaluate_1.0.1              lattice_0.22-6             
#> [83] Biobase_2.67.0              futile.options_1.0.1       
#> [85] backports_1.5.0             bslib_0.8.0                
#> [87] SparseArray_1.7.2           checkmate_2.3.2            
#> [89] xfun_0.49                   MatrixGenerics_1.19.0      
#> [91] zoo_1.8-12                  buildtools_1.0.0           
#> [93] pkgconfig_2.0.3