gDRimport

Overview

The gDRimport package is part of the gDR suite. It prepares raw drug response data for downstream processing, mainly through helper functions for importing, loading, and validating dose-response data provided in different file formats.

Use Cases

Test Data

Four test datasets currently ship with the package. They illustrate the input data that gDRimport expects.

# primary test data
td1 <- get_test_data()
summary(td1)
##        Length         Class          Mode 
##             1 gdr_test_data            S4
td1
## class: gdr_test_data 
## slots: manifest_path result_path template_path ref_m_df ref_r1_r2 ref_r1 ref_t1_t2 ref_t1
# test data in Tecan format
td2 <- get_test_Tecan_data()
summary(td2)
##          Length Class  Mode     
## m_file   1      -none- character
## r_files  1      -none- character
## t_files  1      -none- character
## ref_m_df 1      -none- character
## ref_r_df 1      -none- character
## ref_t_df 1      -none- character
# test data in D300 format
td3 <- get_test_D300_data()
summary(td3)
##        Length Class  Mode
## f_96w  6      -none- list
## f_384w 6      -none- list
# test data obtained from EnVision
td4 <- get_test_EnVision_data()
summary(td4)
##            Length Class  Mode     
## m_file      1     -none- character
## r_files    28     -none- character
## t_files     2     -none- character
## ref_l_path  1     -none- character
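To see which files the primary test dataset actually points to, the accessors used later in this document (manifest_path, template_path, result_path) can be combined with base R file utilities. The following is a minimal sketch, assuming the accessors return character vectors of file paths, as their use with the loaders below suggests; the variable names are illustrative only.

```r
library(gDRimport)

# Primary test dataset shipped with the package
td <- get_test_data()

# Collect the paths exposed by the accessors (assumed to be character vectors)
paths <- list(
  manifest = manifest_path(td),
  template = template_path(td),
  result   = result_path(td)
)

# Confirm that every referenced file is present on disk
files_present <- vapply(paths, function(p) all(file.exists(p)), logical(1))
print(files_present)

# File extensions hint at the formats the loaders have to handle
print(lapply(paths, function(p) unique(tools::file_ext(p))))
```

Opening these files in a spreadsheet or text editor is a quick way to compare your own manifest, template, and result files against the expected layout.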

Load data

load_data is the key function. It wraps load_manifest, load_templates, and load_results, and supports different file formats.

ml <- load_manifest(manifest_path(td1))
summary(ml)
##         Length Class      Mode
## data     4     data.table list
## headers 27     -none-     list
t_df <- load_templates(template_path(td1))
summary(t_df)
##    WellRow           WellColumn          Gnumber          Concentration     
##  Length:768         Length:768         Length:768         Length:768        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##   Gnumber_2         Concentration_2      Template        
##  Length:768         Length:768         Length:768        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character
r_df <- suppressMessages(load_results(result_path(td1)))
summary(r_df)
##    Barcode            WellRow            WellColumn     ReadoutValue    
##  Length:4587        Length:4587        Min.   : 1.00   Min.   :  12627  
##  Class :character   Class :character   1st Qu.: 6.50   1st Qu.:  67905  
##  Mode  :character   Mode  :character   Median :12.00   Median : 140865  
##                                        Mean   :12.49   Mean   : 263996  
##                                        3rd Qu.:18.00   3rd Qu.: 324707  
##                                        Max.   :24.00   Max.   :2423054  
##  BackgroundValue
##  Min.   :332.0  
##  1st Qu.:351.0  
##  Median :374.0  
##  Mean   :453.2  
##  3rd Qu.:570.0  
##  Max.   :704.0
l_tbl <-
  suppressMessages(
    load_data(manifest_path(td1), template_path(td1), result_path(td1)))
summary(l_tbl)
##            Length Class      Mode
## manifest   4      data.table list
## treatments 7      data.table list
## data       5      data.table list
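The summary above shows that load_data returns a list with three tables: manifest, treatments, and data. A short sketch of how one might inspect them follows; it only uses functions shown in this document plus base R, and the variable names are illustrative.

```r
library(gDRimport)

td <- get_test_data()

# Load manifest, templates and results in a single call
l_tbl <- suppressMessages(
  load_data(manifest_path(td), template_path(td), result_path(td))
)

# Names of the returned components
print(names(l_tbl))

# Number of columns and rows per component
print(vapply(l_tbl, ncol, integer(1)))
print(vapply(l_tbl, nrow, integer(1)))

# Peek at the first rows of the raw readout table
print(head(l_tbl$data))
```

Checking the component sizes right after loading is a cheap way to notice a missing plate or template before the data goes into downstream processing.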

PRISM

PRISM, a multiplexed cancer cell line screening platform, enables rapid, high-throughput screening of a broad spectrum of drugs across more than 900 human cancer cell line models. Publicly available PRISM data can be downloaded from the DepMap website.

The gDRimport package provides support for processing PRISM data at two levels: LEVEL5 and LEVEL6.

  • LEVEL5 Data: This format encapsulates all information about drugs, cell lines, and viability within a single file. To process LEVEL5 PRISM data, you can use the convert_LEVEL5_prism_to_gDR_input() function. This function not only transforms and cleans the data but also executes the gDR pipeline for further analysis.

  • LEVEL6 Data: In LEVEL6, PRISM data is distributed across three separate files:

      • prism_data: collapsed log fold change data for viability assays.
      • cell_line_data: information about cell lines.
      • treatment_data: treatment data.

Processing LEVEL6 PRISM data can be accomplished using the convert_LEVEL6_prism_to_gDR_input() function, which requires paths to these three files as input arguments.

Processing LEVEL5 PRISM Data

To process LEVEL5 PRISM data, you can use the following function:

convert_LEVEL5_prism_to_gDR_input("path_to_file")

Replace “path_to_file” with the actual path to your LEVEL5 PRISM data file. This function will handle the transformation, cleaning, and execution of the gDR pipeline automatically.

Processing LEVEL6 PRISM Data

To process LEVEL6 PRISM data, you can use the following function:

convert_LEVEL6_prism_to_gDR_input("prism_data_path", "cell_line_data_path", "treatment_data_path")

Replace “prism_data_path”, “cell_line_data_path”, and “treatment_data_path” with the respective paths to your LEVEL6 PRISM data files.
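Because both PRISM converters take file paths, a mistyped path is an easy mistake to make. The sketch below wraps the two calls shown above in small helpers that check the paths first. The helpers check_paths, import_prism_level5, and import_prism_level6 are hypothetical and not part of gDRimport; the converters are called positionally, exactly as in the examples above.

```r
# Hypothetical helper (not part of gDRimport): stop early if a file is missing
check_paths <- function(...) {
  paths <- c(...)
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("File(s) not found: ", paste(not_found, collapse = ", "),
         call. = FALSE)
  }
  invisible(paths)
}

# Hypothetical wrapper around the LEVEL5 converter
import_prism_level5 <- function(path) {
  check_paths(path)
  gDRimport::convert_LEVEL5_prism_to_gDR_input(path)
}

# Hypothetical wrapper around the LEVEL6 converter
import_prism_level6 <- function(prism_data_path,
                                cell_line_data_path,
                                treatment_data_path) {
  check_paths(prism_data_path, cell_line_data_path, treatment_data_path)
  gDRimport::convert_LEVEL6_prism_to_gDR_input(
    prism_data_path, cell_line_data_path, treatment_data_path
  )
}

# Demonstration with placeholder paths that do not exist:
# the wrapper reports the problem before any conversion is attempted
res <- tryCatch(
  import_prism_level6(tempfile(), tempfile(), tempfile()),
  error = function(e) conditionMessage(e)
)
print(res)
```

With real DepMap downloads, the same wrappers simply pass the paths through to the converters.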

Package installation

The function installAllDeps assists in installing package dependencies.

SessionInfo

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BiocStyle_2.35.0            MultiAssayExperiment_1.33.0
##  [3] gDRimport_1.5.2             PharmacoGx_3.11.0          
##  [5] CoreGx_2.11.0               SummarizedExperiment_1.37.0
##  [7] Biobase_2.67.0              GenomicRanges_1.59.0       
##  [9] GenomeInfoDb_1.43.0         IRanges_2.41.0             
## [11] S4Vectors_0.45.0            MatrixGenerics_1.19.0      
## [13] matrixStats_1.4.1           BiocGenerics_0.53.1        
## [15] generics_0.1.3              rmarkdown_2.29             
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-9            formatR_1.14            readxl_1.4.3           
##   [4] testthat_3.2.1.1        rlang_1.1.4             magrittr_2.0.3         
##   [7] shinydashboard_0.7.2    compiler_4.4.2          vctrs_0.6.5            
##  [10] reshape2_1.4.4          relations_0.6-13        stringr_1.5.1          
##  [13] pkgconfig_2.0.3         crayon_1.5.3            fastmap_1.2.0          
##  [16] backports_1.5.0         XVector_0.47.0          caTools_1.18.3         
##  [19] utf8_1.2.4              promises_1.3.0          UCSC.utils_1.3.0       
##  [22] coop_0.6-3              xfun_0.49               zlibbioc_1.52.0        
##  [25] cachem_1.1.0            jsonlite_1.8.9          SnowballC_0.7.1        
##  [28] later_1.3.2             DelayedArray_0.33.1     BiocParallel_1.41.0    
##  [31] parallel_4.4.2          sets_1.0-25             cluster_2.1.6          
##  [34] R6_2.5.1                stringi_1.8.4           bslib_0.8.0            
##  [37] RColorBrewer_1.1-3      qs_0.27.2               limma_3.63.1           
##  [40] pkgload_1.4.0           boot_1.3-31             cellranger_1.1.0       
##  [43] brio_1.1.5              jquerylib_0.1.4         assertthat_0.2.1       
##  [46] Rcpp_1.0.13-1           knitr_1.48              downloader_0.4         
##  [49] httpuv_1.6.15           Matrix_1.7-1            igraph_2.1.1           
##  [52] tidyselect_1.2.1        abind_1.4-8             yaml_2.3.10            
##  [55] stringfish_0.16.0       gplots_3.2.0            codetools_0.2-20       
##  [58] plyr_1.8.9              lattice_0.22-6          tibble_3.2.1           
##  [61] shiny_1.9.1             BumpyMatrix_1.15.0      evaluate_1.0.1         
##  [64] lambda.r_1.2.4          desc_1.4.3              futile.logger_1.4.3    
##  [67] RcppParallel_5.1.9      bench_1.1.3             BiocManager_1.30.25    
##  [70] pillar_1.9.0            lsa_0.73.3              KernSmooth_2.23-24     
##  [73] checkmate_2.3.2         DT_0.33                 shinyjs_2.1.0          
##  [76] piano_2.23.0            rprojroot_2.0.4         ggplot2_3.5.1          
##  [79] munsell_0.5.1           scales_1.3.0            RApiSerialize_0.1.4    
##  [82] gtools_3.9.5            xtable_1.8-4            marray_1.85.0          
##  [85] glue_1.8.0              slam_0.1-54             maketools_1.3.1        
##  [88] tools_4.4.2             sys_3.4.3               data.table_1.16.2      
##  [91] gDRutils_1.5.1          fgsea_1.33.0            buildtools_1.0.0       
##  [94] visNetwork_2.1.2        fastmatch_1.1-4         cowplot_1.1.3          
##  [97] grid_4.4.2              colorspace_2.1-1        GenomeInfoDbData_1.2.13
## [100] cli_3.6.3               futile.options_1.0.1    fansi_1.0.6            
## [103] S4Arrays_1.7.1          rematch_2.0.0           dplyr_1.1.4            
## [106] gtable_0.3.6            sass_0.4.9              digest_0.6.37          
## [109] SparseArray_1.7.1       htmlwidgets_1.6.4       htmltools_0.5.8.1      
## [112] lifecycle_1.0.4         httr_1.4.7              statmod_1.5.0          
## [115] mime_0.12