Title: | A package with helper functions for processing drug response data |
---|---|
Description: | This package contains utility functions used throughout the gDR platform to fit data, manipulate data, and convert and validate data structures. This package also has the necessary default constants for gDR platform. Many of the functions are utilized by the gDRcore package. |
Authors: | Bartosz Czech [aut] , Arkadiusz Gladki [cre, aut] , Aleksander Chlebowski [aut], Marc Hafner [aut] , Pawel Piatkowski [aut], Dariusz Scigocki [aut], Janina Smola [aut], Sergiu Mocanu [aut], Allison Vuong [aut] |
Maintainer: | Arkadiusz Gladki <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.5.2 |
Built: | 2024-11-08 03:23:45 UTC |
Source: | https://github.com/bioc/gDRutils |
Set fit parameters for an invalid fit.
.set_invalid_fit_params(out, norm_values)
.set_invalid_fit_params(out, norm_values)
out |
Named list of fit parameters. |
norm_values |
Numeric vector used to estimate an |
Modified named list of fit parameters.
.set_invalid_fit_params(list(), norm_values = rep(0.3, 6))
.set_invalid_fit_params(list(), norm_values = rep(0.3, 6))
Modify and object's class
attribute.
addClass(x, newClass)
addClass(x, newClass)
x |
an object |
newClass |
character string; class to be added |
This is a simple convenience function that an item to the class
attribute of an object
so that it can be dispatched to a proper S3 method. This is purely for code clarity,
so that individual methods do not clutter the definitions of higher order functions.
The same object with an added S3 class.
addClass(data.table::data.table(), "someClass")
addClass(data.table::data.table(), "someClass")
BumpyMatrix
assay by a given aggregation function.Aggregation can only be performed on nested variables.
aggregate_assay(asy, by, FUN)
aggregate_assay(asy, by, FUN)
asy |
A |
by |
Character vector of the nested fields to aggregate by. |
FUN |
A function to use to aggregate the data. |
A BumpyMatrix
object aggregated by FUN
.
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] assay <- SummarizedExperiment::assay(se) aggregate_assay(assay, FUN = mean, by = c("Barcode"))
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] assay <- SummarizedExperiment::assay(se) aggregate_assay(assay, FUN = mean, by = c("Barcode"))
Apply a user-specified function to every element of a bumpy matrix.
apply_bumpy_function( se, FUN, req_assay_name, out_assay_name, parallelize = FALSE, ... )
apply_bumpy_function( se, FUN, req_assay_name, out_assay_name, parallelize = FALSE, ... )
se |
A |
FUN |
A function that will be applied to each element of the matrix in assay |
req_assay_name |
String of the assay name in the |
out_assay_name |
String of the assay name that will contain the results of the applied function. |
parallelize |
Logical indicating whether or not to parallelize the computation. |
... |
Additional args to be passed to teh |
The original se
object with a new assay, out_assay_name
.
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] FUN <- function(x) { data.table::data.table(Concentration = x$Concentration, CorrectedReadout = x$CorrectedReadout) } apply_bumpy_function( se, FUN = FUN, req_assay_name = "RawTreated", out_assay_name = "CorrectedReadout" )
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] FUN <- function(x) { data.table::data.table(Concentration = x$Concentration, CorrectedReadout = x$CorrectedReadout) } apply_bumpy_function( se, FUN = FUN, req_assay_name = "RawTreated", out_assay_name = "CorrectedReadout" )
assert choices
assert_choices(x, choices, ...)
assert_choices(x, choices, ...)
x |
charvec expected subset |
choices |
charvec reference set |
... |
Additional arguments to pass to |
NULL
assert_choices("x", c("x","y"))
assert_choices("x", c("x","y"))
Average biological replicates on the data table side.
average_biological_replicates_dt( dt, var, prettified = FALSE, fixed = TRUE, geometric_average_fields = get_header("metric_average_fields")$geometric_mean, fit_type_average_fields = get_header("metric_average_fields")$fit_type, blacklisted_fields = get_header("metric_average_fields")$blacklisted, add_sd = FALSE )
average_biological_replicates_dt( dt, var, prettified = FALSE, fixed = TRUE, geometric_average_fields = get_header("metric_average_fields")$geometric_mean, fit_type_average_fields = get_header("metric_average_fields")$fit_type, blacklisted_fields = get_header("metric_average_fields")$blacklisted, add_sd = FALSE )
dt |
data.table with Metric data |
var |
String representing additional metadata of replicates |
prettified |
Flag indicating if the provided identifiers in the dt are prettified |
fixed |
Flag indicating whether to add a fix for -Inf in the geometric mean. |
geometric_average_fields |
Character vector of column names in |
fit_type_average_fields |
Character vector of column names in |
blacklisted_fields |
Character vector of column names in |
add_sd |
Flag indicating whether to add standard deviation and count columns. |
data.table without replicates
dt <- data.table::data.table(a = c(1:10, 1), b = c(rep("drugA", 10), rep("drugB", 1))) average_biological_replicates_dt(dt, var = "a")
dt <- data.table::data.table(a = c(1:10, 1), b = c(rep("drugA", 10), rep("drugB", 1))) average_biological_replicates_dt(dt, var = "a")
This function calculates the standard deviation of a numeric vector. If the vector has a length of 1 and it is numeric, it returns 0.
calc_sd(x)
calc_sd(x)
x |
A numeric vector. |
The standard deviation of the vector if its length is greater than 1 or it is not numeric, otherwise 0.
calc_sd(c(1, 2, 3, 4, 5)) # Should return the standard deviation calc_sd(c(1)) # Should return 0 calc_sd(numeric(0)) # Should return NA calc_sd(c("a", "b", "c")) # Should return NA
calc_sd(c(1, 2, 3, 4, 5)) # Should return the standard deviation calc_sd(c(1)) # Should return 0 calc_sd(numeric(0)) # Should return NA calc_sd(c("a", "b", "c")) # Should return NA
Set IC50/GR50 value to Inf
or -Inf
based on upper and lower limits.
cap_xc50(xc50, max_conc, min_conc = NA, capping_fold = 5)
cap_xc50(xc50, max_conc, min_conc = NA, capping_fold = 5)
xc50 |
Numeric value of the IC50/GR50 to cap. |
max_conc |
Numeric value of the highest concentration in a dose series used to calculate the |
min_conc |
Numeric value of the lowest concentration in a dose series used to calculate the |
capping_fold |
Integer value of the fold number to use for capping. Defaults to |
Note: xc50
and max_conc
should share the same units.
Ideally, the lower_cap
should be based on the lowest tested concentration.
However, since we don't record that, it is set 5 orders of magnitude below the highest dose.
Capped IC50/GR50 value.
cap_xc50(xc50 = 1, max_conc = 2) cap_xc50(xc50 = 2, max_conc = 5, min_conc = 1) cap_xc50(xc50 = 26, max_conc = 5, capping_fold = 5)
cap_xc50(xc50 = 1, max_conc = 2) cap_xc50(xc50 = 2, max_conc = 5, min_conc = 1) cap_xc50(xc50 = 26, max_conc = 5, capping_fold = 5)
Convert colData to JSON format for elasticsearch indexing.
convert_colData_to_json( cdata, identifiers, req_cols = c("cellline", "cellline_name", "cellline_tissue", "cellline_ref_div_time") )
convert_colData_to_json( cdata, identifiers, req_cols = c("cellline", "cellline_name", "cellline_tissue", "cellline_ref_div_time") )
cdata |
data.table of |
identifiers |
charvec with identifiers |
req_cols |
charvec required columns |
Standardizes the cdata
to common schema fields
and tidies formatting to be condusive to joining
with other JSON responses.
JSON string capturing the cdata
.
cdata <- data.table::data.table( mycellline = letters, mycelllinename = letters, mycelllinetissue = letters, cellline_ref_div_time = "cellline_ref_div_time") identifiers <- list(cellline = "mycellline", cellline_name = "mycelllinename", cellline_ref_div_time = "cellline_ref_div_time", cellline_tissue = "mycelllinetissue") convert_colData_to_json(cdata, identifiers)
cdata <- data.table::data.table( mycellline = letters, mycelllinename = letters, mycelllinetissue = letters, cellline_ref_div_time = "cellline_ref_div_time") identifiers <- list(cellline = "mycellline", cellline_name = "mycelllinename", cellline_ref_div_time = "cellline_ref_div_time", cellline_tissue = "mycelllinetissue") convert_colData_to_json(cdata, identifiers)
convert combo assays from SummarizedExperiments to the list of data.tables
convert_combo_data_to_dt( se, c_assays = get_combo_assay_names(), normalization_type = c("RV", "GR"), prettify = TRUE )
convert_combo_data_to_dt( se, c_assays = get_combo_assay_names(), normalization_type = c("RV", "GR"), prettify = TRUE )
se |
|
c_assays |
charvec of combo assays to be used |
normalization_type |
charvec of normalization_types expected in the data |
prettify |
boolean flag indicating whether or not to prettify the colnames of the returned data |
list of data.table(s) with combo data
Arkadiusz GÅ‚adki [email protected]
mae <- get_synthetic_data("finalMAE_combo_matrix_small.qs") convert_combo_data_to_dt(mae[[1]])
mae <- get_synthetic_data("finalMAE_combo_matrix_small.qs") convert_combo_data_to_dt(mae[[1]])
get combo assay names based on the field name
convert_combo_field_to_assay(field)
convert_combo_field_to_assay(field)
field |
String containing name of the field for which the assay name should be returned |
charvec
convert_combo_field_to_assay("hsa_score")
convert_combo_field_to_assay("hsa_score")
Convert an assay within a SummarizedExperiment object in a MultiAssayExperiment to a long data.table.
convert_mae_assay_to_dt( mae, assay_name, experiment_name = NULL, include_metadata = TRUE, retain_nested_rownames = FALSE, wide_structure = FALSE )
convert_mae_assay_to_dt( mae, assay_name, experiment_name = NULL, include_metadata = TRUE, retain_nested_rownames = FALSE, wide_structure = FALSE )
mae |
A MultiAssayExperiment object holding experiments with raw and/or processed dose-response data in its assays. |
assay_name |
String of name of the assay to transform within an experiment of the |
experiment_name |
String of name of the experiment in |
include_metadata |
Boolean indicating whether or not to include |
retain_nested_rownames |
Boolean indicating whether or not to retain the rownames
nested within a |
wide_structure |
Boolean indicating whether or not to transform data.table into wide format.
|
NOTE: to extract information about 'Control' data, simply call the function with the name of the assay holding data on controls.
data.table representation of the data in assay_name
.
Bartosz Czech [email protected]
flatten convert_se_assay_to_dt
mae <- get_synthetic_data("finalMAE_small") convert_mae_assay_to_dt(mae, "Metrics")
mae <- get_synthetic_data("finalMAE_small") convert_mae_assay_to_dt(mae, "Metrics")
Convert a MultiAssayExperiment object to a JSON document.
convert_mae_to_json(mae, with_experiments = TRUE)
convert_mae_to_json(mae, with_experiments = TRUE)
mae |
SummarizedExperiment object. |
with_experiments |
logical convert experiment metadata as well? |
String representation of a JSON document.
mae <- get_synthetic_data("finalMAE_small") convert_mae_to_json(mae) convert_mae_to_json(mae, with_experiments = FALSE)
mae <- get_synthetic_data("finalMAE_small") convert_mae_to_json(mae) convert_mae_to_json(mae, with_experiments = FALSE)
Convert experiment metadata to JSON format for elasticsearch indexing.
convert_metadata_to_json(se)
convert_metadata_to_json(se)
se |
SummarizedExperiment object. |
JSON string capturing experiment metadata.
md <- list(title = "my awesome experiment", description = "description of experiment", sources = list(list(name = "GeneData_Screener", id = "QCS-12345"))) se <- SummarizedExperiment::SummarizedExperiment(metadata = md) convert_metadata_to_json(se)
md <- list(title = "my awesome experiment", description = "description of experiment", sources = list(list(name = "GeneData_Screener", id = "QCS-12345"))) se <- SummarizedExperiment::SummarizedExperiment(metadata = md) convert_metadata_to_json(se)
Convert rowData to JSON format for elasticsearch indexing.
convert_rowData_to_json( rdata, identifiers, req_cols = c("drug", "drug_name", "drug_moa", "duration") )
convert_rowData_to_json( rdata, identifiers, req_cols = c("drug", "drug_name", "drug_moa", "duration") )
rdata |
data.table of |
identifiers |
charvec with identifiers |
req_cols |
charvec required columns |
Standardizes the rdata
to common schema fields
and tidies formatting to be condusive to joining
with other JSON responses.
JSON string capturing the rdata
.
rdata <- data.table::data.table( mydrug = letters, mydrugname = letters, mydrugmoa = letters, Duration = 1) identifiers <- list(drug = "mydrug", drug_name = "mydrugname", drug_moa = "mydrugmoa", duration = "Duration") convert_rowData_to_json(rdata, identifiers)
rdata <- data.table::data.table( mydrug = letters, mydrugname = letters, mydrugmoa = letters, Duration = 1) identifiers <- list(drug = "mydrug", drug_name = "mydrugname", drug_moa = "mydrugmoa", duration = "Duration") convert_rowData_to_json(rdata, identifiers)
Convert an assay within a SummarizedExperiment object to a long data.table. Then condcut some post processing steps.
convert_se_assay_to_custom_dt(se, assay_name, output_table = NULL)
convert_se_assay_to_custom_dt(se, assay_name, output_table = NULL)
se |
A SummarizedExperiment object holding raw and/or processed dose-response data in its assays. |
assay_name |
String of name of the assay to transform within the |
output_table |
String of type name of the output data.table |
Current strategy is per-assay specific.
combo assays: conversion to data.table only (with wide_structure
= FALSE)
'Metrics' assay can be converted to three types of outputs:
Metrics_initial (conversion to data.table only, with wide_structure
= FALSE)
Metrics_raw: same as Metrics_initial followed by:
fix for 'EC50' and 'Metrics_rownames'
flatten
prettifying and dropping excess variables
Metrics (same as Metrics_raw + capVals)
'Normalization' and 'Averaged' assay:
conversion to data.table (with wide_structure
= TRUE)
prettifying and dropping excess variables
NOTE: to extract information about 'Control' data, simply call the
function with the name of the assay holding data on controls.
To extract the reference data in to same format as 'Averaged' use convert_se_ref_assay_to_dt
.
data.table representation of the data in assay_name
with added information from colData
.
convert_se_assay_to_dt
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] convert_se_assay_to_custom_dt(se, "Metrics") convert_se_assay_to_custom_dt(se, "Metrics", output_table = "Metrics_raw") convert_se_assay_to_custom_dt(se, "Metrics", output_table = "Metrics_initial") convert_se_assay_to_custom_dt(se, "Averaged")
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] convert_se_assay_to_custom_dt(se, "Metrics") convert_se_assay_to_custom_dt(se, "Metrics", output_table = "Metrics_raw") convert_se_assay_to_custom_dt(se, "Metrics", output_table = "Metrics_initial") convert_se_assay_to_custom_dt(se, "Averaged")
Convert an assay within a SummarizedExperiment object to a long data.table.
convert_se_assay_to_dt( se, assay_name, include_metadata = TRUE, retain_nested_rownames = FALSE, wide_structure = FALSE )
convert_se_assay_to_dt( se, assay_name, include_metadata = TRUE, retain_nested_rownames = FALSE, wide_structure = FALSE )
se |
A SummarizedExperiment object holding raw and/or processed dose-response data in its assays. |
assay_name |
String of name of the assay to transform within the |
include_metadata |
Boolean indicating whether or not to include |
retain_nested_rownames |
Boolean indicating whether or not to retain the rownames
nested within a |
wide_structure |
Boolean indicating whether or not to transform data.table into wide format.
|
NOTE: to extract information about 'Control' data, simply call the
function with the name of the assay holding data on controls.
To extract the reference data in to same format as 'Averaged' use convert_se_ref_assay_to_dt
.
data.table representation of the data in assay_name
.
flatten
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] convert_se_assay_to_dt(se, "Metrics")
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] convert_se_assay_to_dt(se, "Metrics")
Convert a SummarizedExperiment object to a JSON document.
convert_se_to_json(se)
convert_se_to_json(se)
se |
SummarizedExperiment object. |
String representation of a JSON document.
md <- list(title = "my awesome experiment", description = "description of experiment", source = list(name = "GeneData_Screener", id = "QCS-12345")) rdata <- data.table::data.table( mydrug = letters, mydrugname = letters, mydrugmoa = letters, Duration = 1) cdata <- data.table::data.table(mycellline = letters, mycelllinename = letters, mycelllinetissue = letters, cellline_ref_div_time = letters) identifiers <- list(cellline = "mycellline", cellline_name = "mycelllinename", cellline_tissue = "mycelllinetissue", cellline_ref_div_time = "cellline_ref_div_time", drug = "mydrug", drug_name = "mydrugname", drug_moa = "mydrugmoa", duration = "Duration") se <- SummarizedExperiment::SummarizedExperiment(rowData = rdata, colData = cdata) se <- set_SE_experiment_metadata(se, md) se <- set_SE_identifiers(se, identifiers) convert_se_to_json(se)
md <- list(title = "my awesome experiment", description = "description of experiment", source = list(name = "GeneData_Screener", id = "QCS-12345")) rdata <- data.table::data.table( mydrug = letters, mydrugname = letters, mydrugmoa = letters, Duration = 1) cdata <- data.table::data.table(mycellline = letters, mycelllinename = letters, mycelllinetissue = letters, cellline_ref_div_time = letters) identifiers <- list(cellline = "mycellline", cellline_name = "mycelllinename", cellline_tissue = "mycelllinetissue", cellline_ref_div_time = "cellline_ref_div_time", drug = "mydrug", drug_name = "mydrugname", drug_moa = "mydrugmoa", duration = "Duration") se <- SummarizedExperiment::SummarizedExperiment(rowData = rdata, colData = cdata) se <- set_SE_experiment_metadata(se, md) se <- set_SE_identifiers(se, identifiers) convert_se_to_json(se)
Define matrix grid positions
define_matrix_grid_positions(conc1, conc2)
define_matrix_grid_positions(conc1, conc2)
conc1 |
drug_1 concentration |
conc2 |
drug_2 concentration |
drug_1
is diluted along the rows as the y-axis and
drug_2
is diluted along the columns and will be the x-axis.
list with axis grid positions
cl_name <- "cellline_BC" drug1_name <- "drug_001" drug2_name <- "drug_026" se <- get_synthetic_data("combo_matrix_small")[["combination"]] dt_average <- convert_se_assay_to_dt(se, "Averaged")[normalization_type == "GR"] ls_axes <- define_matrix_grid_positions( dt_average[["Concentration"]], dt_average[["Concentration_2"]])
cl_name <- "cellline_BC" drug1_name <- "drug_001" drug2_name <- "drug_026" se <- get_synthetic_data("combo_matrix_small")[["combination"]] dt_average <- convert_se_assay_to_dt(se, "Averaged")[normalization_type == "GR"] ls_axes <- define_matrix_grid_positions( dt_average[["Concentration"]], dt_average[["Concentration_2"]])
rowData
or colData
of a SummarizedExperiment
object
to a nested field of a BumpyMatrix
assay.Demote a metadata field in the rowData
or colData
of a SummarizedExperiment
object
to a nested field of a BumpyMatrix
assay.
demote_fields(se, fields)
demote_fields(se, fields)
se |
A |
fields |
Character vector of metadata fields to demote as nested columns. |
Revert this operation using promote_fields
.
A SummarizedExperiment
object with new dimensions resulting from demoting given fields
to nested columns.
promote_fields
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] se <- promote_fields(se, "ReadoutValue", 2) demote_fields(se, "ReadoutValue")
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] se <- promote_fields(se, "ReadoutValue", 2) demote_fields(se, "ReadoutValue")
Convert data.table with dose-response data into a BumpyMatrix assay.
df_to_bm_assay(data, discard_keys = NULL)
df_to_bm_assay(data, discard_keys = NULL)
data |
data.table with drug-response data |
discard_keys |
a vector of keys that should be discarded |
The 'assay' is simply a BumpyMatrix object with rownames being the treatment ids, colnames being the ids of the cell lines and values with dose-response data for given cell lines under given conditions.
BumpyMatrix object
df_to_bm_assay(data.table::data.table(Gnumber = 2, clid = "A"))
df_to_bm_assay(data.table::data.table(Gnumber = 2, clid = "A"))
extend abbreviated normalization type
extend_normalization_type_name(x)
extend_normalization_type_name(x)
x |
string with normalization type |
string
extend_normalization_type_name("GR")
extend_normalization_type_name("GR")
Fit GR and RV curves from a data.table.
fit_curves( df_, series_identifiers, e_0 = 1, GR_0 = 1, n_point_cutoff = 4, range_conc = c(0.005, 5), force_fit = FALSE, pcutoff = 0.05, cap = 0.1, normalization_type = c("GR", "RV") )
fit_curves( df_, series_identifiers, e_0 = 1, GR_0 = 1, n_point_cutoff = 4, range_conc = c(0.005, 5), force_fit = FALSE, pcutoff = 0.05, cap = 0.1, normalization_type = c("GR", "RV") )
df_ |
data.table containing data to fit. See details. |
series_identifiers |
character vector of the column names in |
e_0 |
numeric value representing the |
GR_0 |
numeric value representing the |
n_point_cutoff |
integer of how many points should be considered the minimum required to try to fit a curve.
Defaults to |
range_conc |
numeric vector of length 2 indicating the lower and upper concentration ranges.
Defaults to |
force_fit |
boolean indicating whether or not to force a constant fit.
Defaults to |
pcutoff |
numeric of pvalue significance threshold above or equal to which to use a constant fit.
Defaults to |
cap |
numeric value capping |
normalization_type |
character vector of types of curves to fit.
Defaults to |
The df_
expects the following columns:
RelativeViability normalized relative viability values (if normalization_type
includes "RV"
)
GRvalue normalized GR values (if normalization_type
includes "GR"
)
The range_conc
is used to calculate the x_AOC_range
statistic.
The purpose of this statistic is to enable comparison across different experiments with slightly
different concentration ranges.
data.table of fit parameters as specified by the normalization_type
.
df_ <- data.table::data.table(Concentration = c(0.001, 0.00316227766016838, 0.01, 0.0316227766016838), x_std = c(0.1, 0.1, 0.1, 0.1), normalization_types = c("RV", "RV", "RV", "RV"), x = c(0.9999964000144, 0.999964001439942, 0.999640143942423, 0.996414342629482)) fit_curves(df_, "Concentration", normalization_type = "RV")
df_ <- data.table::data.table(Concentration = c(0.001, 0.00316227766016838, 0.01, 0.0316227766016838), x_std = c(0.1, 0.1, 0.1, 0.1), normalization_types = c("RV", "RV", "RV", "RV"), x = c(0.9999964000144, 0.999964001439942, 0.999640143942423, 0.996414342629482)) fit_curves(df_, "Concentration", normalization_type = "RV")
Flatten a stacked table into a wide format.
flatten(tbl, groups, wide_cols, sep = "_")
flatten(tbl, groups, wide_cols, sep = "_")
tbl |
table to flatten. |
groups |
character vector of column names representing uniquifying groups in expansion. |
wide_cols |
character vector of column names to flatten. |
sep |
string representing separator between |
flattened columns will be named with original column names prefixed by wide_cols
columns,
concatenated together and separated by sep
.
A common use case for this function is
when a flattened version of the "Metrics"
assay is desired.
table of flattened data as defined by wide_cols
.
convert_se_assay_to_dt
n <- 4 m <- 5 grid <- expand.grid(normalization_type = c("GR", "RV"), source = c("GDS", "GDR")) repgrid <- data.table::rbindlist(rep(list(grid), m)) repgrid$wide <- seq(m * n) repgrid$id <- rep(LETTERS[1:m], each = n) groups <- colnames(grid) wide_cols <- c("wide") flatten(repgrid, groups = groups, wide_cols = wide_cols)
n <- 4 m <- 5 grid <- expand.grid(normalization_type = c("GR", "RV"), source = c("GDS", "GDR")) repgrid <- data.table::rbindlist(rep(list(grid), m)) repgrid$wide <- seq(m * n) repgrid$id <- rep(LETTERS[1:m], each = n) groups <- colnames(grid) wide_cols <- c("wide") flatten(repgrid, groups = groups, wide_cols = wide_cols)
Function for generating local synthetic data used for unit tests in modules
gen_synthetic_data(m = 1, n = 5)
gen_synthetic_data(m = 1, n = 5)
m |
number of drugs |
n |
number of records |
list with drugs, cell_lines, raw_data and assay_data
gen_synthetic_data()
gen_synthetic_data()
Identify and return additional variables in list of dt
get_additional_variables(dt_list, unique = FALSE, prettified = FALSE)
get_additional_variables(dt_list, unique = FALSE, prettified = FALSE)
dt_list |
list of data.table or data.table containing additional variables |
unique |
logical flag indicating if all variables should be returned or only those containing more than one unique value |
prettified |
Flag indicating if the provided identifiers in the dt are prettified |
vector of variable names with additional variables
dt <- data.table::data.table( Gnumber = seq_len(10), Concentration = runif(10), Ligand = c(rep(0.5, 5), rep(0, 5)) ) get_additional_variables(dt)
dt <- data.table::data.table( Gnumber = seq_len(10), Concentration = runif(10), Ligand = c(rep(0.5, 5), rep(0, 5)) ) get_additional_variables(dt)
Helper function to find duplicated rows in assay data
get_assay_dt_duplicated_rows(dt, output = "index")
get_assay_dt_duplicated_rows(dt, output = "index")
dt |
data.table |
output |
string with the output format to be returned |
integer vector or data.table with duplicated rows
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") get_assay_dt_duplicated_rows(smetrics_data, output = "data") get_assay_dt_duplicated_rows(smetrics_data)
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") get_assay_dt_duplicated_rows(smetrics_data, output = "data") get_assay_dt_duplicated_rows(smetrics_data)
get_env_assay_names
otherwiseget assay names of the given se/dataset
fetch the data from the se if provided as metadata
use predefined values from get_env_assay_names
otherwise
get_assay_names(se = NULL, ...)
get_assay_names(se = NULL, ...)
se |
SummarizedExperiment or NULL |
... |
Additional arguments to pass to |
charvec
Arkadiusz GÅ‚adki [email protected]
get_assay_names()
get_assay_names()
get columns in the assay data required to have unique (non-duplicated) data
get_assay_req_uniq_cols(dt)
get_assay_req_uniq_cols(dt)
dt |
data.table with assay data |
charvec with columns required to have unique data
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") get_assay_req_uniq_cols(smetrics_data)
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") get_assay_req_uniq_cols(smetrics_data)
get names of combo assays
get_combo_assay_names(se = NULL, ...)
get_combo_assay_names(se = NULL, ...)
se |
SummarizedExperiment or NULL |
... |
Additional arguments to pass to |
charvec of combo assay names.
Arkadiusz GÅ‚adki [email protected]
get_combo_assay_names()
get_combo_assay_names()
get names of combo base assays
get_combo_base_assay_names(se = NULL, ...)
get_combo_base_assay_names(se = NULL, ...)
se |
SummarizedExperiment or NULL |
... |
Additional arguments to pass to |
charvec
Arkadiusz GÅ‚adki [email protected]
get_combo_base_assay_names()
get_combo_base_assay_names()
get names of combo excess fields
get_combo_excess_field_names()
get_combo_excess_field_names()
charvec
get_combo_excess_field_names()
get_combo_excess_field_names()
get names of combo score assays
get_combo_score_assay_names(se = NULL, ...)
get_combo_score_assay_names(se = NULL, ...)
se |
SummarizedExperiment or NULL |
... |
Additional arguments to pass to |
charvec
Arkadiusz GÅ‚adki [email protected]
get_combo_score_assay_names()
get_combo_score_assay_names()
get names of combo score fields
get_combo_score_field_names()
get_combo_score_field_names()
charvec
get_combo_score_assay_names()
get_combo_score_assay_names()
Get gDR default identifiers required for downstream analysis.
get_default_identifiers()
get_default_identifiers()
charvec
get_default_identifiers()
get_default_identifiers()
Helper function to find duplicated rows
get_duplicated_rows(x, col_names = NULL, output = "index")
get_duplicated_rows(x, col_names = NULL, output = "index")
x |
DataFrame or data.table |
col_names |
character vector, columns in which duplication are searched for |
output |
string with the output format to be returned - one of "index" (index of duplicates) or "data" (subset of input data with duplicates) |
integer vector or data.table with duplicated rows
dt <- data.table::data.table(a = c(1, 2, 3), b = c(3, 2, 2)) get_duplicated_rows(dt, "b") get_duplicated_rows(dt, "b", output = "data")
dt <- data.table::data.table(a = c(1, 2, 3), b = c(3, 2, 2)) get_duplicated_rows(dt, "b") get_duplicated_rows(dt, "b", output = "data")
get default assay names for the specified filters, i.e. set of assay types, assay groups and assay data types
get_env_assay_names( type = NULL, group = NULL, data_type = NULL, prettify = FALSE, simplify = TRUE )
get_env_assay_names( type = NULL, group = NULL, data_type = NULL, prettify = FALSE, simplify = TRUE )
type |
charvec of assay types |
group |
charvec of assay groups |
data_type |
charvec assay of data types |
prettify |
logical flag, prettify the assay name? |
simplify |
logical flag, simplify the output? will return single string instead of named vector with single element useful when function is expected to return single element/assay only |
charvec
Arkadiusz GÅ‚adki [email protected]
get_env_assay_names()
get_env_assay_names()
So far the helper is needed to handle env vars containing :
for which the backslash is automatically added in some contexts
and R could not get the original value for these env vars.
get_env_var(x, ...)
get_env_var(x, ...)
x |
string with the name of the environemntal variable |
... |
additional params for Sys.getenev |
sanitized value of the env variable
get_env_var("HOME")
get_env_var("HOME")
Get identifiers that expect only one value for each identifier.
get_expect_one_identifiers()
get_expect_one_identifiers()
charvec
get_expect_one_identifiers()
get_expect_one_identifiers()
get experiment groups
get_experiment_groups(type = NULL)
get_experiment_groups(type = NULL)
type |
String indicating the name of an assay group. Defaults to all experiment groups. |
list with experiment groups or string (if type not NULL)
Arkadiusz Gladki [email protected]
get_experiment_groups()
get_experiment_groups()
Get descriptions for identifiers
get_identifiers_dt(k = NULL, get_description = FALSE, get_example = FALSE)
get_identifiers_dt(k = NULL, get_description = FALSE, get_example = FALSE)
k |
identifier key, string |
get_description |
return descriptions only, boolean |
get_example |
return examples only, boolean |
named list
get_identifiers_dt()
get_identifiers_dt()
Get gDR synonyms for the identifiers
get_idfs_synonyms()
get_idfs_synonyms()
charvec
get_idfs_synonyms()
get_idfs_synonyms()
Get isobologram column names
get_isobologram_columns(k = NULL, prettify = TRUE)
get_isobologram_columns(k = NULL, prettify = TRUE)
k |
key |
prettify |
change to upper case and add underscore, iso_level –> Iso_Level |
character vector of isobologram column names for combination data
get_isobologram_columns() get_isobologram_columns("iso_level", prettify = TRUE)
get_isobologram_columns() get_isobologram_columns("iso_level", prettify = TRUE)
get the identifiers of all SE's in the MAE
get_MAE_identifiers(mae)
get_MAE_identifiers(mae)
mae |
MultiAssayExperiment |
named list with identifiers for each SE
mae <- get_synthetic_data("finalMAE_small.qs") get_MAE_identifiers(mae)
mae <- get_synthetic_data("finalMAE_small.qs") get_MAE_identifiers(mae)
get non empty assays
get_non_empty_assays(mae)
get_non_empty_assays(mae)
mae |
MultiAssayExperiment object |
charvec with non-empty experiments
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") get_non_empty_assays(mae)
mae <- get_synthetic_data("finalMAE_small.qs") get_non_empty_assays(mae)
get optional colData fields
get_optional_coldata_fields(se)
get_optional_coldata_fields(se)
se |
a SummarizedExperiment object with drug-response data generate by gDR pipeline |
a charvec containing the names of the optional identifiers in the SE colData
get optional rowData fields
get_optional_rowdata_fields(se)
get_optional_rowdata_fields(se)
se |
a SummarizedExperiment object with drug-response data generate by gDR pipeline |
a charvec containing the names of the optional identifiers in the SE rowData
Get identifiers required for downstream analysis.
get_required_identifiers()
get_required_identifiers()
charvec
get_required_identifiers()
get_required_identifiers()
Get settings from JSON file In most common scenario the settings are stored in JSON file to avoid hardcoding
get_settings_from_json( s = NULL, json_path = system.file(package = "gDRutils", "cache.json") )
get_settings_from_json( s = NULL, json_path = system.file(package = "gDRutils", "cache.json") )
s |
charvec with setting entry/entries |
json_path |
string with the path to the JSON file |
value/values for entry/entries from JSON file
if (!nchar(system.file(package="gDRutils"))) { get_settings_from_json() }
if (!nchar(system.file(package="gDRutils"))) { get_settings_from_json() }
get supported experiments
get_supported_experiments(type = NULL)
get_supported_experiments(type = NULL)
type |
String indicating the type of experiment |
charvec with supported experiment name(s)
Arkadiusz Gladki [email protected]
get_supported_experiments()
get_supported_experiments()
Get synthetic data from gDRtestData package
get_synthetic_data(qs)
get_synthetic_data(qs)
qs |
qs filename |
loaded data
get_synthetic_data("finalMAE_small.qs")
get_synthetic_data("finalMAE_small.qs")
Function to obtain data from gDRtestData and prepare for unit tests
get_testdata()
get_testdata()
list with drugs, cell_lines, raw_data and assay_data
get_testdata()
get_testdata()
Function to obtain data from gDRtestData and prepare for unit tests
get_testdata_codilution()
get_testdata_codilution()
list with drugs, cell_lines, raw_data and assay_data
get_testdata_codilution()
get_testdata_codilution()
Function to obtain data from gDRtestData and prepare for unit tests
get_testdata_combo()
get_testdata_combo()
list with drugs, cell_lines, raw_data and assay_data
get_testdata_combo()
get_testdata_combo()
An auxiliary function that checks for duplicates in the assay data
has_assay_dt_duplicated_rows(dt)
has_assay_dt_duplicated_rows(dt)
dt |
data.table with assay data |
logical flag indicating if a dt contains duplicated rows or not
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") has_assay_dt_duplicated_rows(smetrics_data)
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") has_assay_dt_duplicated_rows(smetrics_data)
An auxiliary function that checks for duplicates in the data.table (or its subset)
has_dt_duplicated_rows(dt, col_names = NULL)
has_dt_duplicated_rows(dt, col_names = NULL)
dt |
data.table |
col_names |
charvec with columns to be used for subsetting |
logical flag indicating if a dt contains duplicated rows or not
dt <- data.table::data.table(a = c(1, 2, 3), b = c(3, 2, 2)) has_dt_duplicated_rows(dt, "b")
dt <- data.table::data.table(a = c(1, 2, 3), b = c(3, 2, 2)) has_dt_duplicated_rows(dt, "b")
Has Single Codrug Data
has_single_codrug_data( cols, prettify_identifiers = TRUE, codrug_identifiers = c("drug_name2", "concentration2") )
has_single_codrug_data( cols, prettify_identifiers = TRUE, codrug_identifiers = c("drug_name2", "concentration2") )
cols |
character vector with the columns of the input data |
prettify_identifiers |
logical flag specifying if identifiers are expected to be prettified |
codrug_identifiers |
character vector with identifiers for the codrug columns |
logical flag
has_single_codrug_data("Drug Name") has_single_codrug_data(c("Drug Name", "Cell Lines")) has_single_codrug_data(c("Drug Name 2", "Concentration 2")) has_single_codrug_data( get_prettified_identifiers( c("concentration2", "drug_name2"), simplify = FALSE ) )
has_single_codrug_data("Drug Name") has_single_codrug_data(c("Drug Name", "Cell Lines")) has_single_codrug_data(c("Drug Name 2", "Concentration 2")) has_single_codrug_data( get_prettified_identifiers( c("concentration2", "drug_name2"), simplify = FALSE ) )
Has Valid Codrug Data
has_valid_codrug_data( data, prettify_identifiers = TRUE, codrug_name_identifier = "drug_name2", codrug_conc_identifier = "concentration2" )
has_valid_codrug_data( data, prettify_identifiers = TRUE, codrug_name_identifier = "drug_name2", codrug_conc_identifier = "concentration2" )
data |
data.table with input data |
prettify_identifiers |
logical flag specifying if identifiers are expected to be prettified |
codrug_name_identifier |
string with the identifiers for the codrug drug_name column |
codrug_conc_identifier |
string with the identifiers for the codrug concentration column |
logical flag
dt <- data.table::data.table( "Drug Name" = letters[seq_len(3)], "Concentration" = seq_len(3), "Drug Name 2" = letters[4:6], "Concentration 2" = 4:6 ) has_valid_codrug_data(dt) dt$`Concentration 2` <- NULL has_valid_codrug_data(dt)
dt <- data.table::data.table( "Drug Name" = letters[seq_len(3)], "Concentration" = seq_len(3), "Drug Name 2" = letters[4:6], "Concentration 2" = 4:6 ) has_valid_codrug_data(dt) dt$`Concentration 2` <- NULL has_valid_codrug_data(dt)
Get the expected header(s) for one field or reset all header fields
get_header(k = NULL)
get_header(k = NULL)
k |
string of field (data type) to return headers for |
If get_header
is called with no values, the entire available header list is returned.
For get_header
a character vector of headers for field k
.
get_header(k = NULL) get_header("manifest")
get_header(k = NULL) get_header("manifest")
Get, set, or reset the expected identifier(s) for one or all identifier field(s).
Identifiers are used by the gDR processing functions to identify which columns in a data.table
correspond to certain expected fields. Functions of the family *et_identifier
will look for
identifiers from the environment while functions of the family *et_SE_identifiers
will look for
identifiers in the metadata
slot of a SummarizedExperiment
object.
See details for expected identifiers and their definitions.
get_env_identifiers(k = NULL, simplify = TRUE) get_prettified_identifiers(k = NULL, simplify = TRUE) set_env_identifier(k, v) reset_env_identifiers()
get_env_identifiers(k = NULL, simplify = TRUE) get_prettified_identifiers(k = NULL, simplify = TRUE) set_env_identifier(k, v) reset_env_identifiers()
k |
String corresponding to identifier name. |
simplify |
Boolean indicating whether output should be simplified. |
v |
Character vector corresponding to the value for given identifier |
Identifiers supported by the gDR suite include:
"barcode": String of column name containing barcode metadata
"cellline": String of column name containing unique, machine-readable cell line identifiers
"cellline_name": String of column name containing human-friendly cell line names
"cellline_tissue": String of column name containing metadata on cell line tissue type
"cellline_ref_div_time": String of column name containing reference division time for cell lines
"cellline_parental_identifier": String of column name containing unique, machine-readable parental cell line identifiers. Used in the case of derived or engineered cell lines.
"drug": String of column name containing unique, machine-readable drug identifiers
"drug_name": String of column name containing human-friendly drug names
"drug_moa": String of column name containing metadata for drug mode of action
"duration": String of column name containing metadata on duration that cells were treated (in hours)
"template": String of column name containing template metadata
"untreated_tag": Character vector of entries that identify control, untreated wells
"well_position": Character vector of column names containing metadata on well positions on a plate
For any set
ting or reset
ting functionality, a NULL
invisibly.
For get_env_identifiers
a character vector of identifiers for field k
.
For functions called with no arguments, the entire available identifier list is returned.
list or charvec depends on unify param
list or charvec depends on unify param
NULL
NULL
get_env_identifiers("duration") # "Duration"
get_env_identifiers("duration") # "Duration"
SummarizedExperiment
sIdentify unique metadata fields from a list of SummarizedExperiment
s
identify_unique_se_metadata_fields(SElist)
identify_unique_se_metadata_fields(SElist)
SElist |
named list of |
character vector of unique names of metadata
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] SElist <- list( se, se ) identify_unique_se_metadata_fields(SElist)
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] SElist <- list( se, se ) identify_unique_se_metadata_fields(SElist)
check if any experiment is empty
is_any_exp_empty(mae)
is_any_exp_empty(mae)
mae |
MultiAssayExperiment object |
logical
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") is_any_exp_empty(mae)
mae <- get_synthetic_data("finalMAE_small.qs") is_any_exp_empty(mae)
se
is combo dataset.Checks if se
is combo dataset.
is_combo_data(se)
is_combo_data(se)
se |
SummarizedExperiment |
logical
se <- get_synthetic_data("combo_matrix")[[1]] is_combo_data(se) se <- get_synthetic_data("combo_matrix")[[2]] is_combo_data(se) se <- get_synthetic_data("small")[[1]] is_combo_data(se)
se <- get_synthetic_data("combo_matrix")[[1]] is_combo_data(se) se <- get_synthetic_data("combo_matrix")[[2]] is_combo_data(se) se <- get_synthetic_data("small")[[1]] is_combo_data(se)
check if experiment (SE) is empty
is_exp_empty(exp)
is_exp_empty(exp)
exp |
SummarizedExperiment object. |
logical
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] is_exp_empty(se)
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] is_exp_empty(se)
check if all mae experiments are empty
is_mae_empty(mae)
is_mae_empty(mae)
mae |
MultiAssayExperiment object |
logical
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") is_mae_empty(mae)
mae <- get_synthetic_data("finalMAE_small.qs") is_mae_empty(mae)
Fit a logistic curve to drug response data.
logisticFit( concs, norm_values, std_norm_values = NA, x_0 = 1, priors = NULL, lower = NULL, range_conc = c(0.005, 5), force_fit = FALSE, pcutoff = 0.05, cap = 0.1, n_point_cutoff = 4, capping_fold = 5 )
logisticFit( concs, norm_values, std_norm_values = NA, x_0 = 1, priors = NULL, lower = NULL, range_conc = c(0.005, 5), force_fit = FALSE, pcutoff = 0.05, cap = 0.1, n_point_cutoff = 4, capping_fold = 5 )
concs |
concentrations that have not been transformed into log space. |
norm_values |
normalized response values (Untreated = 1). |
std_norm_values |
std of values. |
x_0 |
upper limit.
Defaults to |
priors |
numeric vector containing starting values for all. mean parameters in the model. Overrules any self starter function. |
lower |
numeric vector of lower limits for all parameters in a 4-param model. |
range_conc |
range of concentration for calculating AOC_range. |
force_fit |
boolean indicating whether or not to force a parameter-based fit. |
pcutoff |
numeric of pvalue significance threshold above or equal to which to use a constant fit. |
cap |
numeric value capping |
n_point_cutoff |
integer indicating number of unique concentrations required to fit curve. |
capping_fold |
Integer value of the fold number to use for capping IC50/GR50. Default is |
Implementation of the genedata approach for curve fit: https://screener.genedata.com/documentation/display/DOC21/Business-Rules-for-Dose-Response-Curve-Fitting,-Model-Selection,-and-Fit-Validity.html #nolint
The output parameter names correspond to the following definitions:
The mean of a given dose-response metric
The range of the area over the curve
The area over the GR curve or, respectively, under the relative cell count curve, averaged over the range of concentration values
The concentration at which the effect reaches a value of 0.5 based on interpolation of the fitted curve
The maximum effect of the drug
The drug concentration at half-maximal effect
The asymptotic value of the sigmoidal fit to the dose-response data as concentration goes to infinity
The asymptotic metric value corresponding to a concentration of 0 for the primary drug
The hill coefficient of the fitted curve, which reflects how steep the dose-response curve is
The goodness of the fit
The standard deviation of GR/IC
This will be given by one of the following:
"DRC4pHillFitModel" Successfully fit with a 4-parameter model
"DRC3pHillFitModelFixS0" Successfully fit with a 3-parameter model
"DRCConstantFitResult" Successfully fit with a constant fit
"DRCTooFewPointsToFit" Not enough points to run a fit
"DRCInvalidFitResult" Fit was attempted but failed
The highest log10 concentration
Number of unique concentrations
data.table with metrics and fit parameters.
logisticFit( c(0.001, 0.00316227766016838, 0.01, 0.0316227766016838), c(0.9999964000144, 0.999964001439942, 0.999640143942423, 0.996414342629482), rep(0.1, 4), priors = c(2, 0.4, 1, 0.00658113883008419) )
logisticFit( c(0.001, 0.00316227766016838, 0.01, 0.0316227766016838), c(0.9999964000144, 0.999964001439942, 0.999640143942423, 0.996414342629482), rep(0.1, 4), priors = c(2, 0.4, 1, 0.00658113883008419) )
Lapply or bplapply.
loop(x, FUN, parallelize = TRUE, ...)
loop(x, FUN, parallelize = TRUE, ...)
x |
Vector (atomic or list) or an ‘expression’ object. Other objects (including classed objects) will be coerced by ‘base::as.list’. |
FUN |
A user-defined function. |
parallelize |
Logical indicating whether or not to parallelize the computation.
Defaults to |
... |
optional argument passed to
bplapply if |
List containing output of FUN
applied to every element in x
.
loop(list(c(1,2), c(2,3)), sum, parallelize = FALSE)
loop(list(c(1,2), c(2,3)), sum, parallelize = FALSE)
Lapply through all the experiments in MultiAssayExperiment object
MAEpply(mae, FUN, unify = FALSE, ...)
MAEpply(mae, FUN, unify = FALSE, ...)
mae |
MultiAssayExperiment object |
FUN |
function that should be applied on each experiment of MultiAssayExperiment object |
unify |
logical indicating if the output should be a unlisted object of unique values across all the experiments |
... |
Additional args to be passed to teh |
list or vector depends on unify param
Bartosz Czech [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") MAEpply(mae, SummarizedExperiment::assayNames)
mae <- get_synthetic_data("finalMAE_small.qs") MAEpply(mae, SummarizedExperiment::assayNames)
get colData of all experiments
mcolData(mae)
mcolData(mae)
mae |
MultiAssayExperiment object |
data.table with all-experiments colData
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") mcolData(mae)
mae <- get_synthetic_data("finalMAE_small.qs") mcolData(mae)
Merge assay data
merge_assay( SElist, assay_name, additional_col_name = "data_source", discard_keys = NULL )
merge_assay( SElist, assay_name, additional_col_name = "data_source", discard_keys = NULL )
SElist |
named list of Summarized Experiments |
assay_name |
name of the assay that should be extracted and merged |
additional_col_name |
string of column name that will be added to assay data for the distinction of possible duplicated metrics that can arise from multiple projects |
discard_keys |
character vector of string that will be discarded during creating BumpyMatrix object |
BumpyMatrix or list with data.table + BumpyMatrix
mae <- get_synthetic_data("finalMAE_combo_2dose_nonoise") listSE <- list( combo1 = mae[[1]], sa = mae[[2]] ) merge_assay(listSE, "Normalized")
mae <- get_synthetic_data("finalMAE_combo_2dose_nonoise") listSE <- list( combo1 = mae[[1]], sa = mae[[2]] ) merge_assay(listSE, "Normalized")
Merge metadata
merge_metadata(SElist, metadata_fields)
merge_metadata(SElist, metadata_fields)
SElist |
named list of |
metadata_fields |
vector of metadata names that will be merged |
list of merged metadata
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] listSE <- list( se, se ) metadata_fields <- identify_unique_se_metadata_fields(listSE) merge_metadata(listSE, metadata_fields)
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] listSE <- list( se, se ) metadata_fields <- identify_unique_se_metadata_fields(listSE) merge_metadata(listSE, metadata_fields)
Merge multiple Summarized Experiments
merge_SE( SElist, additional_col_name = "data_source", discard_keys = c("normalization_type", "fit_source", "record_id", "swap_sa", "control_type") )
merge_SE( SElist, additional_col_name = "data_source", discard_keys = c("normalization_type", "fit_source", "record_id", "swap_sa", "control_type") )
SElist |
named list of Summarized Experiments |
additional_col_name |
string with the name of the column that will be added to assay data for the distinction of possible duplicated metrics that can arise from multiple projects |
discard_keys |
character vector of string that will be discarded during creating BumpyMatrix object |
merged SummarizedExperiment object
se1 <- get_synthetic_data("finalMAE_small")[[1]] merge_SE(list(se1 = se1, se2 = se1))
se1 <- get_synthetic_data("finalMAE_small")[[1]] merge_SE(list(se1 = se1, se2 = se1))
Reduce dimensionality of an assay by dropping extra data or combining variables.
modifyData(x, ...) ## S3 method for class 'drug_name2' modifyData(x, option, keep, ...) ## S3 method for class 'data_source' modifyData(x, option, keep, ...) ## Default S3 method: modifyData(x, option, keep, ...)
modifyData(x, ...) ## S3 method for class 'drug_name2' modifyData(x, option, keep, ...) ## S3 method for class 'data_source' modifyData(x, option, keep, ...) ## Default S3 method: modifyData(x, option, keep, ...)
x |
a |
... |
additional arguments passed to methods |
option |
character string specifying the action to be taken, see |
keep |
character string specifying the value of the active variable that will be kept |
If an essay extracted from a SummarizedExperiment
contains additional information,
i.e. factors beyond DrugName
and CellLineName
, that information will be treated
in one of three ways, depending on the value of option
:
drop
: Some information will be discarded and only one value
of the additional variable (chosen by the user) will be kept.
toDrug
: The information is pasted together with the primary drug name.
All observations are kept.
toCellLine
: As above, but pasting is done with cell line name.
Depending on the type of additional information, the exact details will differ. This is handled in the app by adding special classes to the data tables and dispatching to S3 methods.
Following modification, the additional columns are discarded.
modified object
modifyData(drug_name2)
: includes the name and concentration of the second drug
modifyData(data_source)
: includes the data source
modifyData(default)
: includes the name of other additional variables
dt <- data.table::data.table(a = as.character(1:10), b = "data") dt <- addClass(dt, "a") modifyData(dt, "average", keep = "b")
dt <- data.table::data.table(a = as.character(1:10), b = "data") dt <- addClass(dt, "a") modifyData(dt, "average", keep = "b")
get rowData of all experiments
mrowData(mae)
mrowData(mae)
mae |
MultiAssayExperiment object |
data.table with all-experiments rowData
Arkadiusz Gladki [email protected]
mae <- get_synthetic_data("finalMAE_small.qs") mrowData(mae)
mae <- get_synthetic_data("finalMAE_small.qs") mrowData(mae)
Predict a concentration for a given efficacy with fit parameters.
predict_conc_from_efficacy(efficacy, x_inf, x_0, ec50, h)
predict_conc_from_efficacy(efficacy, x_inf, x_0, ec50, h)
efficacy |
Numeric vector representing efficacies to predict concentrations for. |
x_inf |
Numeric vector representing the asymptotic value of the sigmoidal fit to the dose-response data as concentration goes to infinity. |
x_0 |
Numeric vector representing the asymptotic metric value corresponding to a concentration of 0 for the primary drug. |
ec50 |
Numeric vector representing the drug concentration at half-maximal effect. |
h |
Numeric vector representing the hill coefficient of the fitted curve, which reflects how steep |
The inverse of this function is predict_efficacy_from_conc
.
Numeric vector representing predicted concentrations from given efficacies and fit parameters.
predict_efficacy_from_conc .calculate_x50
predict_conc_from_efficacy(efficacy = c(1, 1.5), x_inf = 0.1, x_0 = 1, ec50 = 0.5, h = 2)
predict_conc_from_efficacy(efficacy = c(1, 1.5), x_inf = 0.1, x_0 = 1, ec50 = 0.5, h = 2)
Predict efficacy values given fit parameters and a concentration.
predict_efficacy_from_conc(c, x_inf, x_0, ec50, h)
predict_efficacy_from_conc(c, x_inf, x_0, ec50, h)
c |
Numeric vector representing concentrations to predict efficacies for. |
x_inf |
Numeric vector representing the asymptotic value of the sigmoidal fit to the dose-response data as concentration goes to infinity. |
x_0 |
Numeric vector representing the asymptotic metric value corresponding to a concentration of 0 for the primary drug. |
ec50 |
Numeric vector representing the drug concentration at half-maximal effect. |
h |
Numeric vector representing the hill coefficient of the fitted curve, which reflects how steep the dose-response curve is. |
The inverse of this function is predict_conc_from_efficacy
.
Numeric vector representing predicted efficacies from given concentrations and fit parameters.
predict_conc_from_efficacy
predict_efficacy_from_conc(c = 1, x_inf = 0.1, x_0 = 1, ec50 = 0.5, h = 2)
predict_efficacy_from_conc(c = 1, x_inf = 0.1, x_0 = 1, ec50 = 0.5, h = 2)
Map existing column names of a flattened 'Metrics' assay to prettified names.
prettify_flat_metrics( x, human_readable = FALSE, normalization_type = c("GR", "RV") )
prettify_flat_metrics( x, human_readable = FALSE, normalization_type = c("GR", "RV") )
x |
character vector of names to prettify. |
human_readable |
boolean indicating whether or not to return column names in human readable format.
Defaults to |
normalization_type |
character vector with a specified normalization type.
Defaults to |
A common use case for this function is to prettify column names from a flattened version of
the "Metrics"
assay.
Mode "human_readable" = TRUE
is often used for prettification in the context
of front-end applications, whereas "human_readable" = FALSE
is often used for
prettification in the context of the command line.
character vector of prettified names.
x <- c("CellLineName", "Tissue", "Primary Tissue", "GR_gDR_x_mean", "GR_gDR_xc50", "RV_GDS_x_mean") prettify_flat_metrics(x, human_readable = FALSE)
x <- c("CellLineName", "Tissue", "Primary Tissue", "GR_gDR_x_mean", "GR_gDR_xc50", "RV_GDS_x_mean") prettify_flat_metrics(x, human_readable = FALSE)
SummarizedExperiment
as either the rowData
or colData
.Promote a nested field to be represented as a metadata field of the SummarizedExperiment
as either the rowData
or colData
.
promote_fields(se, fields, MARGIN = c(1, 2))
promote_fields(se, fields, MARGIN = c(1, 2))
se |
|
fields |
Character vector of nested fields in a |
MARGIN |
Numeric of values |
Revert this operation using demote_fields
.
A SummarizedExperiment
object with new dimensions resulting from promoting given fields
.
demote_fields
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] se <- promote_fields(se, "ReadoutValue", 2)
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] se <- promote_fields(se, "ReadoutValue", 2)
current improvements done on the colData as a standardization step:
set default value for optional colData fields
refine_coldata(cd, se, default_v = "Undefined")
refine_coldata(cd, se, default_v = "Undefined")
cd |
DataFrame with colData |
se |
a SummarizedExperiment object with drug-response data generate by gDR pipeline |
default_v |
string with default value for optional columns in colData |
refined colData
mae <- get_synthetic_data("finalMAE_small.qs") refine_coldata(SummarizedExperiment::colData(mae[[1]]), mae[[1]])
mae <- get_synthetic_data("finalMAE_small.qs") refine_coldata(SummarizedExperiment::colData(mae[[1]]), mae[[1]])
current improvements done on the rowData as a standardization step:
set default value for optional rowData fields
refine_rowdata(rd, se, default_v = "Undefined")
refine_rowdata(rd, se, default_v = "Undefined")
rd |
DataFrame with rowData |
se |
a SummarizedExperiment object with drug-response data generate by gDR pipeline |
default_v |
string with default value for optional columns in rowData |
refined rowData
mae <- get_synthetic_data("finalMAE_small.qs") refine_rowdata(SummarizedExperiment::colData(mae[[1]]), mae[[1]])
mae <- get_synthetic_data("finalMAE_small.qs") refine_rowdata(SummarizedExperiment::colData(mae[[1]]), mae[[1]])
Remove Codrug Data
remove_codrug_data( data, prettify_identifiers = TRUE, codrug_identifiers = c("drug_name2", "concentration2") )
remove_codrug_data( data, prettify_identifiers = TRUE, codrug_identifiers = c("drug_name2", "concentration2") )
data |
data.table with input data |
prettify_identifiers |
logical flag specifying if identifiers are expected to be prettified |
codrug_identifiers |
character vector with identifiers for the codrug columns |
data.table without combination columns
dt <- data.table::data.table( "Drug Name" = letters[seq_len(3)], "Concentration" = seq_len(3), "Drug Name 2" = letters[4:6], "Concentration 2" = 4:6 ) dt remove_codrug_data(dt)
dt <- data.table::data.table( "Drug Name" = letters[seq_len(3)], "Concentration" = seq_len(3), "Drug Name 2" = letters[4:6], "Concentration 2" = 4:6 ) dt remove_codrug_data(dt)
Rename BumpyMatrix
rename_bumpy(bumpy, mapping_vector)
rename_bumpy(bumpy, mapping_vector)
bumpy |
a BumpyMatrix object |
mapping_vector |
a named vector for mapping old and new values. The names of the character vector indicate the source names, and the corresponding values the destination names. |
a renamed BumpyMatrix object
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] assay <- SummarizedExperiment::assay(se) rename_bumpy(assay, c("Concentration" = "conc"))
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] assay <- SummarizedExperiment::assay(se) rename_bumpy(assay, c("Concentration" = "conc"))
Rename DFrame
rename_DFrame(df, mapping_vector)
rename_DFrame(df, mapping_vector)
df |
a DFrame object |
mapping_vector |
a named vector for mapping old and new values. The names of the character vector indicate the source names, and the corresponding values the destination names. |
a renamed DFrame object
mae <- get_synthetic_data("finalMAE_small.qs") rename_DFrame(SummarizedExperiment::rowData(mae[[1]]), c("Gnumber" = "Gnumber1"))
mae <- get_synthetic_data("finalMAE_small.qs") rename_DFrame(SummarizedExperiment::rowData(mae[[1]]), c("Gnumber" = "Gnumber1"))
Round concentration to ndigit significant digits
round_concentration(x, ndigit = 3)
round_concentration(x, ndigit = 3)
x |
value to be rounded. |
ndigit |
number of significant digits (default = 4). |
rounded x
round_concentration(x = c(0.00175,0.00324,0.0091), ndigit = 1)
round_concentration(x = c(0.00175,0.00324,0.0091), ndigit = 1)
Set metadata for the fitting parameters that define the Metrics assay in SummarizedExperiment object metadata.
set_SE_fit_parameters(se, value) set_SE_processing_metadata(se, value) set_SE_keys(se, value) set_SE_experiment_metadata(se, value) set_SE_experiment_raw_data(se, value) get_SE_fit_parameters(se) get_SE_processing_metadata(se) get_SE_experiment_raw_data(se) get_SE_experiment_metadata(se) get_SE_keys(se, key_type = NULL) get_SE_identifiers(se, id_type = NULL, simplify = TRUE) set_SE_identifiers(se, value)
set_SE_fit_parameters(se, value) set_SE_processing_metadata(se, value) set_SE_keys(se, value) set_SE_experiment_metadata(se, value) set_SE_experiment_raw_data(se, value) get_SE_fit_parameters(se) get_SE_processing_metadata(se) get_SE_experiment_raw_data(se) get_SE_experiment_metadata(se) get_SE_keys(se, key_type = NULL) get_SE_identifiers(se, id_type = NULL, simplify = TRUE) set_SE_identifiers(se, value)
se |
a SummarizedExperiment object for which to add fit parameter metadata. |
value |
named list of metadata for fit parameters. |
key_type |
string of a specific key type (i.e. 'nested_keys', etc.). |
id_type |
string of a specific id type (i.e. 'duration', 'cellline_name', etc.). |
simplify |
Boolean indicating whether output should be simplified. |
For *et_SE_processing_metadata
, get/set metadata for the processing info that defines
the date_processed and packages versions in SummarizedExperiment object metadata.
For *et_SE_fit_parameters
, get/set metadata for fit parameters
used to construct the Metrics assay in a SummarizedExperiment object.
se
with added metadata.
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_fit_parameters(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] meta <- get_SE_processing_metadata(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_experiment_raw_data(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_experiment_metadata(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_identifiers(se)
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_fit_parameters(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] meta <- get_SE_processing_metadata(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_experiment_raw_data(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_experiment_metadata(se) mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] get_SE_identifiers(se)
Replace values for flat fits: ec50 = 0, h = 0.0001 and xc50 = +/- Inf
set_constant_fit_params(out, mean_norm_value)
set_constant_fit_params(out, mean_norm_value)
out |
Named list of fit parameters. |
mean_norm_value |
Numeric value that be used to set all parameters that can be calculated from the mean. |
Modified named list of fit parameters.
na <- list(x_0 = NA) set_constant_fit_params(na, mean_norm_value = 0.6)
na <- list(x_0 = NA) set_constant_fit_params(na, mean_norm_value = 0.6)
This function sets the CellLineName
field in
colData
to be unique by appending the clid
in parentheses for duplicates.
set_unique_cl_names(se)
set_unique_cl_names(se)
se |
A SummarizedExperiment object. |
A SummarizedExperiment object with unique CellLineName
in colData
.
se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:4, ncol = 2)), colData = S4Vectors::DataFrame(CellLineName = c("ID1", "ID1"), clid = c("C1", "C2")) ) se <- set_unique_cl_names(se)
se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:4, ncol = 2)), colData = S4Vectors::DataFrame(CellLineName = c("ID1", "ID1"), clid = c("C1", "C2")) ) se <- set_unique_cl_names(se)
This function sets the CellLineName
field in
colData
to be unique by appending the clid
in parentheses for duplicates.
set_unique_cl_names_dt(col_data, sep = " ")
set_unique_cl_names_dt(col_data, sep = " ")
col_data |
data.table or DFrame with col data |
sep |
string with separator added before suffix |
fixed input table with unique CellLineName
in colData
.
col_data <- S4Vectors::DataFrame(CellLineName = c("ID1", "ID1"), clid = c("C1", "C2")) col_data <- set_unique_cl_names_dt(col_data)
col_data <- S4Vectors::DataFrame(CellLineName = c("ID1", "ID1"), clid = c("C1", "C2")) col_data <- set_unique_cl_names_dt(col_data)
This function sets the DrugName
, DrugName_2
, and DrugName_3
fields in rowData
to be unique by appending the corresponding Gnumber
, Gnumber_2
, and Gnumber_3
in parentheses for duplicates.
set_unique_drug_names(se)
set_unique_drug_names(se)
se |
A SummarizedExperiment object. |
A SummarizedExperiment object with unique DrugName
fields in rowData
.
se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:9, ncol = 3)), rowData = S4Vectors::DataFrame(DrugName = c("DrugA", "DrugA", "DrugB"), Gnumber = c("G1", "G2", "G5"), DrugName_2 = c("DrugC", "DrugC", "DrugD"), Gnumber_2 = c("G3", "G4", "G5") )) se <- set_unique_drug_names(se)
se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:9, ncol = 3)), rowData = S4Vectors::DataFrame(DrugName = c("DrugA", "DrugA", "DrugB"), Gnumber = c("G1", "G2", "G5"), DrugName_2 = c("DrugC", "DrugC", "DrugD"), Gnumber_2 = c("G3", "G4", "G5") )) se <- set_unique_drug_names(se)
This function sets the DrugName
, DrugName_2
, and DrugName_3
fields in
rowData
to be unique by appending the corresponding Gnumber
, Gnumber_2
,
and Gnumber_3
in parentheses for duplicates.
set_unique_drug_names_dt(row_data, sep = " ")
set_unique_drug_names_dt(row_data, sep = " ")
row_data |
data.table or DFrame with row data |
sep |
string with separator added before suffix |
fixed input table with unique DrugName
fields in rowData
.
row_data <- S4Vectors::DataFrame( DrugName = c("DrugA", "DrugA", "DrugB"), Gnumber = c("G1", "G2", "G5"), DrugName_2 = c("DrugC", "DrugC", "DrugD"), Gnumber_2 = c("G3", "G4", "G5") ) row_data <- set_unique_drug_names_dt(row_data)
row_data <- S4Vectors::DataFrame( DrugName = c("DrugA", "DrugA", "DrugB"), Gnumber = c("G1", "G2", "G5"), DrugName_2 = c("DrugC", "DrugC", "DrugD"), Gnumber_2 = c("G3", "G4", "G5") ) row_data <- set_unique_drug_names_dt(row_data)
This function sets the CellLineName
in colData
and DrugName
fields in rowData
to be unique for each SummarizedExperiment
in a MultiAssayExperiment
.
set_unique_identifiers(mae)
set_unique_identifiers(mae)
mae |
A MultiAssayExperiment object. |
A MultiAssayExperiment object with unique identifiers.
se1 <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:4, ncol = 2)), colData = S4Vectors::DataFrame(parental_identifier = c("ID1", "ID1"), clid = c("C1", "C2")), rowData = S4Vectors::DataFrame(DrugName = c("DrugA", "DrugA"), Gnumber = c("G1", "G2")) ) rownames(SummarizedExperiment::colData(se1)) <- c("cl1", "cl2") rownames(SummarizedExperiment::rowData(se1)) <- c("g1", "g") se2 <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(5:8, ncol = 2)), colData = S4Vectors::DataFrame(parental_identifier = c("ID2", "ID2"), clid = c("C3", "C4")), rowData = S4Vectors::DataFrame(DrugName = c("DrugB", "DrugB"), Gnumber = c("G3", "G4")) ) rownames(SummarizedExperiment::colData(se2)) <- c("cl3", "cl4") rownames(SummarizedExperiment::rowData(se2)) <- c("g3", "g4") mae <- MultiAssayExperiment::MultiAssayExperiment(experiments = list(se1 = se1, se2 = se2)) mae <- set_unique_identifiers(mae)
se1 <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(1:4, ncol = 2)), colData = S4Vectors::DataFrame(parental_identifier = c("ID1", "ID1"), clid = c("C1", "C2")), rowData = S4Vectors::DataFrame(DrugName = c("DrugA", "DrugA"), Gnumber = c("G1", "G2")) ) rownames(SummarizedExperiment::colData(se1)) <- c("cl1", "cl2") rownames(SummarizedExperiment::rowData(se1)) <- c("g1", "g") se2 <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = matrix(5:8, ncol = 2)), colData = S4Vectors::DataFrame(parental_identifier = c("ID2", "ID2"), clid = c("C3", "C4")), rowData = S4Vectors::DataFrame(DrugName = c("DrugB", "DrugB"), Gnumber = c("G3", "G4")) ) rownames(SummarizedExperiment::colData(se2)) <- c("cl3", "cl4") rownames(SummarizedExperiment::rowData(se2)) <- c("g3", "g4") mae <- MultiAssayExperiment::MultiAssayExperiment(experiments = list(se1 = se1, se2 = se2)) mae <- set_unique_identifiers(mae)
shorten normalization type
shorten_normalization_type_name(x)
shorten_normalization_type_name(x)
x |
string with normalization type |
shortened string representing the normalization type
shorten_normalization_type_name("GRvalue")
shorten_normalization_type_name("GRvalue")
Divide the columns of an input data.table into treatment metadata, condition metadata, experiment metadata, and assay data for further analysis. This will most commonly be used to identify the different components of a SummarizedExperiment object.
split_SE_components(df_, nested_keys = NULL, combine_on = 1L)
split_SE_components(df_, nested_keys = NULL, combine_on = 1L)
df_ |
data.table with drug-response data |
nested_keys |
character vector of keys to exclude from the row or column metadata, and to instead nest within an element of the matrix. See details. |
combine_on |
integer value of |
Named list containing the following elements:
treatment metadata
condition metadata
all data.table column names corresponding to fields nested within a BumpyMatrix cell
metadata that is constant for all entries of the data.table
key identifier mappings
The nested_keys
provides the user the opportunity to specify that they would not
like to use that metadata field as a differentiator of the treatments, and instead, incorporate it
into a nested DataFrame
in the BumpyMatrix matrix object.
In the event that if any of the nested_keys
are constant throughout the whole data.table,
they will still be included in the DataFrame of the BumpyMatrix and not in the experiment_metadata.
Columns within the df_
will be identified through the following logic:
First, the known data fields and any specified nested_keys
are extracted.
Following that, known cell and drug metadata fields are detected,
and any remaining columns that represent constant metadata fields across all rows are extracted.
Next, any cell line metadata will be heuristically extracted.
Finally, all remaining columns will be combined on either the rows or columns as specified by
combine_on
.
named list containing different elements of a SummarizedExperiment; see details.
split_SE_components(data.table::data.table(clid = "CL1", Gnumber = "DrugA"))
split_SE_components(data.table::data.table(clid = "CL1", Gnumber = "DrugA"))
Standardize MAE by switching from custom identifiers into gDR-default
standardize_mae(mae, use_default = TRUE)
standardize_mae(mae, use_default = TRUE)
mae |
a MultiAssayExperiment object with drug-response data generate by gDR pipeline |
use_default |
boolean indicating whether or not to use default identifiers for standardization |
mae a MultiAssayExperiment with default gDR identifiers
mae <- get_synthetic_data("finalMAE_small.qs") S4Vectors::metadata(mae[[1]])$identifiers$drug <- "druug" standardize_mae(mae)
mae <- get_synthetic_data("finalMAE_small.qs") S4Vectors::metadata(mae[[1]])$identifiers$drug <- "druug" standardize_mae(mae)
Standardize SE by switching from custom identifiers into gDR-default
standardize_se(se, use_default = TRUE)
standardize_se(se, use_default = TRUE)
se |
a SummarizedExperiment object with drug-response data generate by gDR pipeline |
use_default |
boolean indicating whether or not to use default identifiers for standardization |
se a SummarizedExperiment with default gDR identifiers
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] S4Vectors::metadata(se)$identifiers$drug <- "druug" standardize_se(se)
mae <- get_synthetic_data("finalMAE_small.qs") se <- mae[[1]] S4Vectors::metadata(se)$identifiers$drug <- "druug" standardize_se(se)
String first and last characters of a string.
strip_first_and_last_char(jstring)
strip_first_and_last_char(jstring)
jstring |
String of any number of characters greater than 1. |
This is most often used to remove the JSON brackets '{'
and '}'
.
String with first and last characters stripped.
An auxiliary function that checks for duplicated rows in assay data.table,
In case of duplicates it throws a message. The messsage function is by default stop()
The message function can be customized with msg_f
parameter
throw_msg_if_duplicates( dt, assay_name = "unknown", msg_f = stop, preview_max_numb = 4 )
throw_msg_if_duplicates( dt, assay_name = "unknown", msg_f = stop, preview_max_numb = 4 )
dt |
data.table with assay data |
assay_name |
string with the name of the assay |
msg_f |
function to be used to throw the message |
preview_max_numb |
number of rows to preview if duplicates found |
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") throw_msg_if_duplicates(smetrics_data, assay_name = "Metrics", msg_f = futile.logger::flog.info)
sdata <- get_synthetic_data("finalMAE_small") smetrics_data <- convert_se_assay_to_dt(sdata[[1]], "Metrics") throw_msg_if_duplicates(smetrics_data, assay_name = "Metrics", msg_f = futile.logger::flog.info)
Update environment identifiers from MAE object identifiers
update_env_idfs_from_mae(mae_idfs)
update_env_idfs_from_mae(mae_idfs)
mae_idfs |
A list containing MAE identifiers |
NULL
update_env_idfs_from_mae(list(get_env_identifiers()))
update_env_idfs_from_mae(list(get_env_identifiers()))
Update gDR synonyms for the identifier
update_idfs_synonyms(data, dict = get_idfs_synonyms())
update_idfs_synonyms(data, dict = get_idfs_synonyms())
data |
list of charvec with identifiers data |
dict |
list with dictionary |
list
mdict <- list(duration = "time") iv <- c("Time", "Duration", "time") update_idfs_synonyms(iv, dict = mdict)
mdict <- list(duration = "time") iv <- c("Time", "Duration", "time") update_idfs_synonyms(iv, dict = mdict)
Assure that dimnames of two objects are the same
validate_dimnames(obj, obj2, skip_empty = TRUE)
validate_dimnames(obj, obj2, skip_empty = TRUE)
obj |
first object with dimnames to compare |
obj2 |
second object with dimnames to compare |
skip_empty |
a logical indicating whether to skip comparison if a given dimname is empty in the case of both objects |
NULL
Check that specified identifier values exist in the data and error otherwise.
validate_identifiers( df, identifiers = NULL, req_ids = NULL, exp_one_ids = NULL )
validate_identifiers( df, identifiers = NULL, req_ids = NULL, exp_one_ids = NULL )
df |
data.table with |
identifiers |
Named list of identifiers where the |
req_ids |
Character vector of standardized identifier names required to pass identifier validation. |
exp_one_ids |
Character vector of standardized identifiers names
where only one identifier value is expected.
If not passed, defaults to |
Note that this does NOT set the identifiers anywhere (i.e. environment or SummarizedExperiment
object).
If identifiers do not validate, will throw error as side effect.
Named list of identifiers modified to pass validation against the input data. Errors with explanatory message if validation cannot pass with the given identifiers and data.
validate_identifiers( S4Vectors::DataFrame("Barcode" = NA, "Duration" = NA, "Template" = NA, "clid" = NA), req_ids = "barcode" )
validate_identifiers( S4Vectors::DataFrame("Barcode" = NA, "Duration" = NA, "Template" = NA, "clid" = NA), req_ids = "barcode" )
Validate JSON describing an object against a schema.
validate_json(json, schema_path)
validate_json(json, schema_path)
json |
String of JSON in memory. |
schema_path |
String of the schema to validate against. |
This is most often used to validate JSON before passing it in as a document to an ElasticSearch index.
Boolean of whether or not JSON successfully validated.
json <- '{}'
json <- '{}'
Function validates correctness of SE included in MAE by checking multiple cases:
detection of duplicated rowData/colData,
incompatibility of rownames/colnames,
occurrence of necessary assays,
detection of mismatch of CLIDs inside colData and colnames (different order),
correctness of metadata names.
validate_MAE(mae)
validate_MAE(mae)
mae |
MultiAssayExperiment object produced by the gDR pipeline |
NULL
invisibly if the MultiAssayExperiment is valid.
Throws an error if the MultiAssayExperiment is not valid.
Bartosz Czech [email protected]
mae <- get_synthetic_data("finalMAE_small") validate_MAE(mae)
mae <- get_synthetic_data("finalMAE_small") validate_MAE(mae)
Validate MAE object against a schema. Currently only SEs are validated TODO: add mae.json schema and validate full MAE object
validate_mae_with_schema( mae, schema_package = Sys.getenv("SCHEMA_PACKAGE", "gDRutils"), schema_dir_path = Sys.getenv("SCHEMA_DIR_PATH", "schemas"), schema = c(se = "se.json", mae = "mae.json") )
validate_mae_with_schema( mae, schema_package = Sys.getenv("SCHEMA_PACKAGE", "gDRutils"), schema_dir_path = Sys.getenv("SCHEMA_DIR_PATH", "schemas"), schema = c(se = "se.json", mae = "mae.json") )
mae |
MultiAssayExperiment object |
schema_package |
string name of the package with JSON schema files |
schema_dir_path |
path to the dir with JSON schema files |
schema |
named charvec with filenames of schemas to validate against. |
Boolean of whether or not mae is valid
mae <- get_synthetic_data("finalMAE_small") validate_mae_with_schema(mae)
mae <- get_synthetic_data("finalMAE_small") validate_mae_with_schema(mae)
Function validates correctness of SE by checking multiple cases:
detection of duplicated rowData/colData,
incompatibility of rownames/colnames,
occurrence of necessary assays,
detection of mismatch of CLIDs inside colData and colnames (different order),
correctness of metadata names.
validate_SE(se, expect_single_agent = FALSE)
validate_SE(se, expect_single_agent = FALSE)
se |
SummarizedExperiment object produced by the gDR pipeline |
expect_single_agent |
a logical indicating if the function should check whether the SummarizedExperiment is single-agent data |
NULL
invisibly if the SummarizedExperiment is valid.
Throws an error if the SummarizedExperiment is not valid.
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] validate_SE(se)
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] validate_SE(se)
Check for the presence of an assay in a SummarizedExperiment object.
validate_se_assay_name(se, name)
validate_se_assay_name(se, name)
se |
A SummarizedExperiment object. |
name |
String of name of the assay to validate. |
NULL
invisibly if the assay name is valid.
Throws an error if the assay is not valid.
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] validate_se_assay_name(se, "RawTreated")
mae <- get_synthetic_data("finalMAE_small") se <- mae[[1]] validate_se_assay_name(se, "RawTreated")