library(gDRutils)
suppressPackageStartupMessages(library(MultiAssayExperiment))
gDRutils is part of the gDR suite. This package provides tools for,
among others, manipulating the output of the gDRcore package
(MultiAssayExperiment and SummarizedExperiment objects), extracting
data, managing the identifiers used in gDR experiments, and validating
data.
The basic output of the gDRcore package is a MultiAssayExperiment
object. The MAEpply function applies a function to each experiment in
this object and can be used in much the same way as the base function
lapply.
mae <- get_synthetic_data("finalMAE_combo_matrix_small")
MAEpply(mae, dim)
#> $combination
#> [1] 6 2
#>
#> $`single-agent`
#> [1] 5 2
MAEpply(mae, rowData)
#> $combination
#> DataFrame with 6 rows and 7 columns
#> Gnumber DrugName
#> <character> <character>
#> G00004_drug_004_moa_A_G00021_drug_021_moa_D_72 G00004 drug_004
#> G00004_drug_004_moa_A_G00026_drug_026_moa_E_72 G00004 drug_004
#> G00005_drug_005_moa_A_G00021_drug_021_moa_D_72 G00005 drug_005
#> G00005_drug_005_moa_A_G00026_drug_026_moa_E_72 G00005 drug_005
#> G00006_drug_006_moa_A_G00021_drug_021_moa_D_72 G00006 drug_006
#> G00006_drug_006_moa_A_G00026_drug_026_moa_E_72 G00006 drug_006
#> drug_moa Gnumber_2
#> <character> <character>
#> G00004_drug_004_moa_A_G00021_drug_021_moa_D_72 moa_A G00021
#> G00004_drug_004_moa_A_G00026_drug_026_moa_E_72 moa_A G00026
#> G00005_drug_005_moa_A_G00021_drug_021_moa_D_72 moa_A G00021
#> G00005_drug_005_moa_A_G00026_drug_026_moa_E_72 moa_A G00026
#> G00006_drug_006_moa_A_G00021_drug_021_moa_D_72 moa_A G00021
#> G00006_drug_006_moa_A_G00026_drug_026_moa_E_72 moa_A G00026
#> DrugName_2 drug_moa_2
#> <character> <character>
#> G00004_drug_004_moa_A_G00021_drug_021_moa_D_72 drug_021 moa_D
#> G00004_drug_004_moa_A_G00026_drug_026_moa_E_72 drug_026 moa_E
#> G00005_drug_005_moa_A_G00021_drug_021_moa_D_72 drug_021 moa_D
#> G00005_drug_005_moa_A_G00026_drug_026_moa_E_72 drug_026 moa_E
#> G00006_drug_006_moa_A_G00021_drug_021_moa_D_72 drug_021 moa_D
#> G00006_drug_006_moa_A_G00026_drug_026_moa_E_72 drug_026 moa_E
#> Duration
#> <numeric>
#> G00004_drug_004_moa_A_G00021_drug_021_moa_D_72 72
#> G00004_drug_004_moa_A_G00026_drug_026_moa_E_72 72
#> G00005_drug_005_moa_A_G00021_drug_021_moa_D_72 72
#> G00005_drug_005_moa_A_G00026_drug_026_moa_E_72 72
#> G00006_drug_006_moa_A_G00021_drug_021_moa_D_72 72
#> G00006_drug_006_moa_A_G00026_drug_026_moa_E_72 72
#>
#> $`single-agent`
#> DataFrame with 5 rows and 4 columns
#> Gnumber DrugName drug_moa Duration
#> <character> <character> <character> <numeric>
#> G00004_drug_004_moa_A_72 G00004 drug_004 moa_A 72
#> G00005_drug_005_moa_A_72 G00005 drug_005 moa_A 72
#> G00006_drug_006_moa_A_72 G00006 drug_006 moa_A 72
#> G00021_drug_021_moa_D_72 G00021 drug_021 moa_D 72
#> G00026_drug_026_moa_E_72 G00026 drug_026 moa_E 72
This function can also extract unified data across all the
SummarizedExperiment objects inside a MultiAssayExperiment, e.g.
MAEpply(mae, rowData, unify = TRUE)
#> Gnumber DrugName drug_moa Gnumber_2 DrugName_2 drug_moa_2 Duration
#> <char> <char> <char> <char> <char> <char> <num>
#> 1: G00004 drug_004 moa_A G00021 drug_021 moa_D 72
#> 2: G00004 drug_004 moa_A G00026 drug_026 moa_E 72
#> 3: G00005 drug_005 moa_A G00021 drug_021 moa_D 72
#> 4: G00005 drug_005 moa_A G00026 drug_026 moa_E 72
#> 5: G00006 drug_006 moa_A G00021 drug_021 moa_D 72
#> 6: G00006 drug_006 moa_A G00026 drug_026 moa_E 72
#> 7: G00004 drug_004 moa_A <NA> <NA> <NA> 72
#> 8: G00005 drug_005 moa_A <NA> <NA> <NA> 72
#> 9: G00006 drug_006 moa_A <NA> <NA> <NA> 72
#> 10: G00021 drug_021 moa_D <NA> <NA> <NA> 72
#> 11: G00026 drug_026 moa_E <NA> <NA> <NA> 72
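The NA entries in the unified table come from columns that exist only in the combination experiment. As a rough illustration of that padding behaviour (not a claim about how MAEpply is implemented), the same effect can be reproduced with data.table's rbindlist. The two toy tables below are made-up stand-ins for the rowData of the two experiments.

```r
library(data.table)

# Toy stand-ins for the rowData of two experiments (illustrative values only)
combination <- data.table(
  Gnumber = "G00004", DrugName = "drug_004",
  Gnumber_2 = "G00021", DrugName_2 = "drug_021"
)
single_agent <- data.table(Gnumber = "G00021", DrugName = "drug_021")

# fill = TRUE pads the columns that are missing from a table with NA
unified <- rbindlist(list(combination, single_agent), fill = TRUE)
print(unified)
```

The second row carries NA in Gnumber_2 and DrugName_2, mirroring the single-agent rows in the output above.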
All the metrics data are stored inside the assays of a
SummarizedExperiment. For downstream analyses we provide tools that
extract the data into a user-friendly data.table format.
There is a function that works on the MultiAssayExperiment object, as
well as a set of functions that work on the SummarizedExperiment
object:
mdt <- convert_mae_assay_to_dt(mae, "Metrics")
#> Loading required package: BumpyMatrix
head(mdt, 3)
#> rId
#> <char>
#> 1: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> 2: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> 3: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> cId x_mean x_AOC x_AOC_range xc50 x_max
#> <char> <num> <num> <num> <num> <num>
#> 1: CL00016_cellline_GB_tissue_y_46 -0.7046 1.7046 1.7046 -Inf -0.7046
#> 2: CL00016_cellline_GB_tissue_y_46 -0.7039 1.7039 1.7039 -Inf -0.7039
#> 3: CL00016_cellline_GB_tissue_y_46 -0.6920 1.6920 1.6920 -Inf -0.6920
#> ec50 x_inf x_0 h r2 p_value rss x_sd_avg
#> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: 0 -0.7046 -0.7046 1e-04 0 NA NA 0
#> 2: 0 -0.7039 -0.7039 1e-04 0 NA NA 0
#> 3: 0 -0.6920 -0.6920 1e-04 0 NA NA 0
#> fit_type maxlog10Concentration N_conc normalization_type
#> <char> <num> <int> <char>
#> 1: DRCConstantFitResult 0.4996871 8 GR
#> 2: DRCConstantFitResult 0.4996871 8 GR
#> 3: DRCConstantFitResult 0.4996871 8 GR
#> fit_source cotrt_value ratio source Gnumber DrugName drug_moa
#> <char> <num> <num> <char> <char> <char> <char>
#> 1: gDR 3.160 NA row_fittings G00004 drug_004 moa_A
#> 2: gDR 1.000 NA row_fittings G00004 drug_004 moa_A
#> 3: gDR 0.316 NA row_fittings G00004 drug_004 moa_A
#> Gnumber_2 DrugName_2 drug_moa_2 Duration clid CellLineName Tissue
#> <char> <char> <char> <num> <char> <char> <char>
#> 1: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> 2: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> 3: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> ReferenceDivisionTime
#> <num>
#> 1: 46
#> 2: 46
#> 3: 46
or, alternatively, for a SummarizedExperiment object:
se <- mae[[1]]
sdt <- convert_se_assay_to_dt(se, "Metrics")
head(sdt, 3)
#> rId
#> <char>
#> 1: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> 2: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> 3: G00004_drug_004_moa_A_G00021_drug_021_moa_D_72
#> cId x_mean x_AOC x_AOC_range xc50 x_max
#> <char> <num> <num> <num> <num> <num>
#> 1: CL00016_cellline_GB_tissue_y_46 -0.7046 1.7046 1.7046 -Inf -0.7046
#> 2: CL00016_cellline_GB_tissue_y_46 -0.7039 1.7039 1.7039 -Inf -0.7039
#> 3: CL00016_cellline_GB_tissue_y_46 -0.6920 1.6920 1.6920 -Inf -0.6920
#> ec50 x_inf x_0 h r2 p_value rss x_sd_avg
#> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: 0 -0.7046 -0.7046 1e-04 0 NA NA 0
#> 2: 0 -0.7039 -0.7039 1e-04 0 NA NA 0
#> 3: 0 -0.6920 -0.6920 1e-04 0 NA NA 0
#> fit_type maxlog10Concentration N_conc normalization_type
#> <char> <num> <int> <char>
#> 1: DRCConstantFitResult 0.4996871 8 GR
#> 2: DRCConstantFitResult 0.4996871 8 GR
#> 3: DRCConstantFitResult 0.4996871 8 GR
#> fit_source cotrt_value ratio source Gnumber DrugName drug_moa
#> <char> <num> <num> <char> <char> <char> <char>
#> 1: gDR 3.160 NA row_fittings G00004 drug_004 moa_A
#> 2: gDR 1.000 NA row_fittings G00004 drug_004 moa_A
#> 3: gDR 0.316 NA row_fittings G00004 drug_004 moa_A
#> Gnumber_2 DrugName_2 drug_moa_2 Duration clid CellLineName Tissue
#> <char> <char> <char> <num> <char> <char> <char>
#> 1: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> 2: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> 3: G00021 drug_021 moa_D 72 CL00016 cellline_GB tissue_y
#> ReferenceDivisionTime
#> <num>
#> 1: 46
#> 2: 46
#> 3: 46
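Once the metrics are in a data.table, ordinary data.table operations apply. The sketch below uses a small made-up table that borrows a few column names from the output above (DrugName, normalization_type, x_max). The values, including the second normalization type "RV", are assumptions for illustration and are not taken from the synthetic data.

```r
library(data.table)

# Toy table mimicking a few columns of the converted "Metrics" assay
metrics <- data.table(
  DrugName = c("drug_004", "drug_004", "drug_005", "drug_005"),
  normalization_type = c("GR", "RV", "GR", "RV"),
  x_max = c(-0.70, 0.10, -0.20, 0.30)
)

# Keep the GR rows only and summarise x_max per drug
gr_summary <- metrics[
  normalization_type == "GR",
  .(mean_x_max = mean(x_max)),
  by = DrugName
]
print(gr_summary)
```

The same pattern should carry over to the tables returned by convert_mae_assay_to_dt or convert_se_assay_to_dt, since both are data.table objects.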
In gDR we require standard identifiers to be present in the input
data, e.g. Gnumber, clid, Concentration. However, users can define
their own custom identifiers.
To display the default gDR identifiers, use the get_env_identifiers
function:
get_env_identifiers()
#> $duration
#> [1] "Duration"
#>
#> $cellline
#> [1] "clid"
#>
#> $cellline_name
#> [1] "CellLineName"
#>
#> $cellline_tissue
#> [1] "Tissue"
#>
#> $cellline_ref_div_time
#> [1] "ReferenceDivisionTime"
#>
#> $cellline_parental_identifier
#> [1] "parental_identifier"
#>
#> $cellline_subtype
#> [1] "subtype"
#>
#> $drug
#> [1] "Gnumber"
#>
#> $drug_name
#> [1] "DrugName"
#>
#> $drug_moa
#> [1] "drug_moa"
#>
#> $untreated_tag
#> [1] "vehicle" "untreated"
#>
#> $masked_tag
#> [1] "masked"
#>
#> $well_position
#> [1] "WellRow" "WellColumn"
#>
#> $concentration
#> [1] "Concentration"
#>
#> $template
#> [1] "Template" "Treatment"
#>
#> $barcode
#> [1] "Barcode" "Plate"
#>
#> $drug2
#> [1] "Gnumber_2"
#>
#> $drug_name2
#> [1] "DrugName_2"
#>
#> $drug_moa2
#> [1] "drug_moa_2"
#>
#> $concentration2
#> [1] "Concentration_2"
#>
#> $drug3
#> [1] "Gnumber_3"
#>
#> $drug_name3
#> [1] "DrugName_3"
#>
#> $drug_moa3
#> [1] "drug_moa_3"
#>
#> $concentration3
#> [1] "Concentration_3"
#>
#> $data_source
#> [1] "data_source"
#>
#> $replicate
#> [1] "Replicate"
#>
#> $normalization_type
#> [1] "normalization_type"
To change any of these identifiers, use set_env_identifier and confirm
the change by displaying the identifier with get_env_identifiers.
To restore the default identifiers, use reset_env_identifiers.
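A minimal sketch of that round trip is shown here. The replacement value "Dose" is an arbitrary choice made for illustration; any column name present in the input data could be used instead.

```r
library(gDRutils)

# Rename the concentration identifier ("Dose" is an arbitrary example value)
set_env_identifier("concentration", "Dose")
changed <- get_env_identifiers("concentration")
print(changed)

# Restore the package defaults and check the identifier again
reset_env_identifiers()
restored <- get_env_identifiers("concentration")
print(restored)
```

Resetting at the end keeps the custom value from leaking into later calls in the same session.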
The validate_identifiers
function checks if the
specified identifier values exist in the data and (if needed) tries to
modify them to pass validation.
# Example data.table
dt <- data.table::data.table(
  Barcode = c("A1", "A2", "A3"),
  Duration = c(24, 48, 72),
  Template = c("T1", "T2", "T3"),
  clid = c("C1", "C2", "C3")
)
# Validate identifiers
validated_identifiers <- validate_identifiers(
  dt,
  req_ids = c("barcode", "duration", "template", "cellline")
)
print(validated_identifiers)
#> $duration
#> [1] "Duration"
#>
#> $cellline
#> [1] "clid"
#>
#> $cellline_name
#> [1] "CellLineName"
#>
#> $cellline_tissue
#> [1] "Tissue"
#>
#> $cellline_ref_div_time
#> [1] "ReferenceDivisionTime"
#>
#> $cellline_parental_identifier
#> [1] "parental_identifier"
#>
#> $cellline_subtype
#> [1] "subtype"
#>
#> $drug
#> [1] "Gnumber"
#>
#> $drug_name
#> [1] "DrugName"
#>
#> $drug_moa
#> [1] "drug_moa"
#>
#> $untreated_tag
#> [1] "vehicle" "untreated"
#>
#> $masked_tag
#> [1] "masked"
#>
#> $well_position
#> [1] "WellRow" "WellColumn"
#>
#> $concentration
#> [1] "Concentration"
#>
#> $template
#> [1] "Template"
#>
#> $barcode
#> [1] "Barcode"
#>
#> $drug2
#> [1] "Gnumber_2"
#>
#> $drug_name2
#> [1] "DrugName_2"
#>
#> $drug_moa2
#> [1] "drug_moa_2"
#>
#> $concentration2
#> [1] "Concentration_2"
#>
#> $drug3
#> [1] "Gnumber_3"
#>
#> $drug_name3
#> [1] "DrugName_3"
#>
#> $drug_moa3
#> [1] "drug_moa_3"
#>
#> $concentration3
#> [1] "Concentration_3"
#>
#> $data_source
#> [1] "data_source"
#>
#> $replicate
#> [1] "Replicate"
#>
#> $normalization_type
#> [1] "normalization_type"
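Note that template and barcode each had two candidate values by default and were narrowed to the single value found in the data. The idea can be sketched in base R as follows. The helper resolve_ids is hypothetical and is not part of gDRutils; it only illustrates the principle and makes no claim about the package's internal logic.

```r
# Hypothetical helper, not part of gDRutils: for each identifier, keep only
# the candidate column names that are present in the data.
resolve_ids <- function(candidates, cols) {
  lapply(candidates, function(values) {
    present <- intersect(values, cols)
    # Leave the candidates untouched when none of them is found
    if (length(present) > 0) present else values
  })
}

candidates <- list(
  barcode = c("Barcode", "Plate"),
  template = c("Template", "Treatment"),
  duration = "Duration"
)
cols <- c("Barcode", "Duration", "Template", "clid")

resolved <- resolve_ids(candidates, cols)
str(resolved)

# Identifiers with no candidate in the data would fail a required-identifier check
missing_ids <- names(Filter(function(v) !any(v %in% cols), candidates))
print(missing_ids)
```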
In detail, validate_identifiers wraps the following steps, each handled
by an internal function:
.modify_polymapped_identifiers
.check_required_identifiers
.check_polymapped_identifiers
Prettifying identifiers means making them more user-friendly and
human-readable and is handled by the prettify_flat_metrics function.
Please see the relevant section for more details.
Custom changes applied to the gDR output can break the operation of
internal functions. Such changes can be validated using validate_MAE
or validate_SE.
assay(se, "Normalized") <- NULL
validate_SE(se)
#> Error in validate_SE(se): Assertion on 'exp_assay_names' failed: Must be a subset of {'RawTreated','Controls','Averaged','excess','all_iso_points','isobolograms','scores','Metrics'}, but has additional elements {'Normalized'}.
There is also a group of functions for validating data used in the gDR application.
Prettifying involves transforming data into a more descriptive and human-readable version. This is particularly useful for front-end applications where user-friendly names are preferred over technical or abbreviated terms.
In gdrplatform there are two entities that can be prettified: the columns of data.table(s) and the assay names.
The columns of data.table(s) can be prettified with a single function,
prettify_flat_metrics.
dt <- get_testdata()[["raw_data"]]
colnames(dt)
prettify_flat_metrics(colnames(dt), human_readable = TRUE)
The prettify_flat_metrics function is in fact a wrapper for the actions
performed by the following internal functions:
.convert_norm_specific_metrics
.prettify_GDS_columns
.prettify_metadata_columns
.prettify_metric_columns
.prettify_cotreatment_columns
For data.table(s) with combo excess and score assays, some of the
columns are prettified with dedicated helper functions instead of
prettify_flat_metrics.
These helpers depend on DATA_COMBO_INFO_TBL, an internal gDRutils data.table.
The get_assay_names function is the primary way to obtain prettified
versions of the assay names. It wraps the get_env_assay_names function,
which depends on ASSAY_INFO_TBL, an internal gDRutils data.table.
There are also some functions that wrap get_assay_names for combo data.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BumpyMatrix_1.15.0 MultiAssayExperiment_1.33.0
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
#> [7] IRanges_2.41.0 S4Vectors_0.45.0
#> [9] BiocGenerics_0.53.1 generics_0.1.3
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] gDRutils_1.5.2 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.9 SparseArray_1.7.1 stringi_1.8.4
#> [4] lattice_0.22-6 magrittr_2.0.3 digest_0.6.37
#> [7] evaluate_1.0.1 grid_4.4.2 fastmap_1.2.0
#> [10] jsonlite_1.8.9 Matrix_1.7-1 backports_1.5.0
#> [13] BiocManager_1.30.25 httr_1.4.7 UCSC.utils_1.3.0
#> [16] jquerylib_0.1.4 RApiSerialize_0.1.4 abind_1.4-8
#> [19] cli_3.6.3 rlang_1.1.4 crayon_1.5.3
#> [22] XVector_0.47.0 cachem_1.1.0 DelayedArray_0.33.1
#> [25] yaml_2.3.10 S4Arrays_1.7.1 tools_4.4.2
#> [28] qs_0.27.2 checkmate_2.3.2 GenomeInfoDbData_1.2.13
#> [31] buildtools_1.0.0 R6_2.5.1 lifecycle_1.0.4
#> [34] stringr_1.5.1 zlibbioc_1.52.0 stringfish_0.16.0
#> [37] RcppParallel_5.1.9 bslib_0.8.0 glue_1.8.0
#> [40] Rcpp_1.0.13-1 data.table_1.16.2 xfun_0.49
#> [43] sys_3.4.3 knitr_1.48 htmltools_0.5.8.1
#> [46] rmarkdown_2.29 maketools_1.3.1 compiler_4.4.2