| Title: | Comprehensive Collection of Curated Benchmarking Datasets and their Evaluation |
|---|---|
| Description: | A trio is the combination of a data set, a metric, and supporting evidence that provides a best-case scenario, if not the ground truth itself. BenchHub has data downloaders for FigShare, GEO, and ExperimentHub. Caching avoids lengthy downloads after the first time a data set is accessed. Users may also specify their own data set and supporting evidence. The Benchmark Insights module provides functionality for comparing and contrasting the performance of alternative algorithms. |
| Authors: | Cabiria Liang [aut], Sanghyun Kim [aut], Nick Robertson [aut], Marni Torkel [aut], Yue Cao [aut] (ORCID: <https://orcid.org/0000-0002-2356-4031>), Dario Strbenac [aut], Jean Yang [aut], SOMS Maintainer [aut, cre] |
| Maintainer: | SOMS Maintainer <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.15 |
| Built: | 2026-06-21 00:45:35 UTC |
| Source: | https://github.com/bioc/BenchHub |
Computes the adjusted Rand index between two cluster labelings.
ARImetric(evidence, predicted)

| Argument | Description |
|---|---|
| evidence | The true labels. |
| predicted | The predicted labels. |

The adjusted Rand index.
    evidence <- factor(c("A", "A", "B", "B"))
    predicted <- factor(c("A", "A", "B", "B"))
    ARImetric(evidence, predicted)
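For readers unfamiliar with the statistic, the adjusted Rand index can be computed from the contingency table of the two labelings. The sketch below is a base-R illustration of that standard formula. `adjustedRandSketch` is a hypothetical helper written for this example and is not claimed to be how ARImetric() is implemented; returning 1 in the degenerate case is a convention assumed here.

```r
# Hypothetical helper: adjusted Rand index from a contingency table.
adjustedRandSketch <- function(evidence, predicted) {
  tab <- table(evidence, predicted)
  n <- sum(tab)
  # Number of sample pairs that fall in the same cell, row, or column.
  sumCells <- sum(choose(tab, 2))
  sumRows <- sum(choose(rowSums(tab), 2))
  sumCols <- sum(choose(colSums(tab), 2))
  expected <- sumRows * sumCols / choose(n, 2)
  maximum <- (sumRows + sumCols) / 2
  # Degenerate case (for example, a single cluster in both labelings):
  # the index is undefined (0/0); 1 is returned by convention (assumption).
  if (isTRUE(all.equal(maximum, expected))) {
    return(1)
  }
  (sumCells - expected) / (maximum - expected)
}

identicalLabels <- adjustedRandSketch(
  factor(c("A", "A", "B", "B")),
  factor(c("A", "A", "B", "B"))
)
crossedLabels <- adjustedRandSketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
identicalLabels  # 1
crossedLabels    # -0.5
```

Identical labelings score 1, while a labeling that agrees less than expected by chance scores below 0.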
Computes the balanced accuracy of the predictions.
balAccMetric(evidence, predicted)

| Argument | Description |
|---|---|
| evidence | The true labels. |
| predicted | The predicted labels. |

The balanced accuracy.
    evidence <- factor(c("A", "B", "A", "B"))
    predicted <- factor(c("A", "A", "A", "B"))
    balAccMetric(evidence, predicted)
Computes the balanced error of the predictions.
balErrMetric(evidence, predicted)

| Argument | Description |
|---|---|
| evidence | The true labels. |
| predicted | The predicted labels. |

The balanced error.
    evidence <- factor(c("A", "B", "A", "B"))
    predicted <- factor(c("A", "A", "A", "B"))
    balErrMetric(evidence, predicted)
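Balanced accuracy is usually defined as the mean of the per-class recalls, and balanced error as its complement. The base-R sketch below works through the example labels under that assumption. `balancedAccuracySketch` is a hypothetical helper, and it is not confirmed that balAccMetric() and balErrMetric() use exactly this definition.

```r
# Hypothetical helper: mean per-class recall.
balancedAccuracySketch <- function(evidence, predicted) {
  evidence <- factor(evidence)
  # Align predicted levels with the true levels so the table is square.
  predicted <- factor(predicted, levels = levels(evidence))
  tab <- table(evidence, predicted)
  recall <- diag(tab) / rowSums(tab)
  mean(recall)
}

evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))

balancedAccuracy <- balancedAccuracySketch(evidence, predicted)
balancedError <- 1 - balancedAccuracy  # assumed complement definition
balancedAccuracy  # 0.75
balancedError     # 0.25
```

Class A is always recovered (recall 1) and class B is recovered half of the time (recall 0.5), giving a mean of 0.75.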
Computes Begg's C-Index for survival analysis.
beggCIndexMetric(evidence, predicted)

| Argument | Description |
|---|---|
| evidence | The true survival times and event indicators. |
| predicted | The predicted survival times. |

Begg's C-Index.
    # More realistic training dataset (8 patients)
    evidence <- list(
      survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                     event = c(1, 1, 0, 1, 0, 1, 1, 0)), # Training
      survival::Surv(time = c(12, 18, 25, 32), event = c(1, 0, 1, 0)) # Testing
    )
    # Predicted risk scores
    predicted <- list(
      c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
        0.8360043, 0.7375956, 0.8110551, 0.3881083), # Training predictions
      c(0.685169729, 0.003948339, 0.832916080, 0.007334147) # Testing predictions
    )
    beggCIndexMetric(evidence, predicted)
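To give some intuition for what a concordance index measures, the sketch below computes the simpler Harrell-style concordance on the testing part of the example. This is a swapped-in technique: Begg's estimator is not reproduced here, and `harrellConcordanceSketch` is a hypothetical helper. It assumes that a higher score means higher risk, that is, a shorter expected survival.

```r
# Hypothetical helper: Harrell-style concordance (not Begg's estimator).
harrellConcordanceSketch <- function(time, event, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is comparable when subject i has an observed event
      # strictly before subject j's observed time.
      if (event[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1    # earlier event, higher risk
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # ties count as half
        }
      }
    }
  }
  concordant / comparable
}

time <- c(12, 18, 25, 32)
event <- c(1, 0, 1, 0)
risk <- c(0.685169729, 0.003948339, 0.832916080, 0.007334147)
concordance <- harrellConcordanceSketch(time, event, risk)
concordance  # 0.75
```

Four pairs are comparable here, and three of them rank the earlier event as higher risk.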
An object containing a benchmark result for evaluating analytical tasks.
A BenchmarkInsights object.
evalSummary: The evaluation summary, stored as a data frame in which each row corresponds to a method identifier and each column holds a metric used in the evaluation task or related information.
metadata: A data frame storing metadata for the benchmark.
new()
Create a BenchmarkInsights object
BenchmarkInsights$new(evalResult = NULL)
evalResult: A data frame containing initial evaluation results, with columns such as datasetID, evidence, metric, and result.
addevalSummary()
Append additional evaluation results to the existing evalSummary.
BenchmarkInsights$addevalSummary(additional_evalResult)
additional_evalResult: A data frame containing additional evaluation results to be appended.
addMetadata()
Add metadata to the BenchmarkInsights object
BenchmarkInsights$addMetadata(metadata)
metadata: A data frame containing metadata information.
getHeatmap()
Creates a heatmap from the evaluation summary by averaging results across datasets.
BenchmarkInsights$getHeatmap()
A heatmap object.
getLineplot()
Creates a line plot for the given x and y variables, with an optional grouping and fixed x order.
BenchmarkInsights$getLineplot(order = NULL, metricVariable)
order: An optional vector specifying the order of x-axis values.
metricVariable: The value in the metric column to subset to.
A ggplot2 line plot object.
getScatterplot()
Creates a scatter plot comparing two metrics for the same evidence.
BenchmarkInsights$getScatterplot(variables)
variables: A character vector of length two specifying the metric names to be used for the x and y axes.
A ggplot2 scatter plot object.
getBoxplot()
Creates boxplots of one metric across multiple evidence entries and methods.
BenchmarkInsights$getBoxplot(metricVariable, evidenceVariable)
metricVariable: The value in the metric column to subset to.
evidenceVariable: The value in the evidence column to subset to.
A ggplot2 boxplot object.
getCorplot()
Creates a correlation plot based on the provided evaluation summary and the specified input type (either "evidence", "metric", or "method"). The correlation plot shows the pairwise correlation between results for different categories (evidence, metric, or method).
BenchmarkInsights$getCorplot(input_type)
input_type: A string that specifies the input type for generating the correlation plot. It must be either "evidence", "metric", or "method".
A ggplot2 correlation plot object. The plot visualizes the correlation matrix using ggcorrplot with aesthetic enhancements like labeled values and angled axis text.
getForestplot()
Generates a forest plot from linear models that compare groups in the evaluation summary. The plot is created with the dotwhisker and broom packages, with custom grouping and labeling.
BenchmarkInsights$getForestplot(input_group, input_model)
input_group: A string specifying the grouping variable (only "datasetID", "method", or "evidence" allowed).
input_model: A string specifying the model variable (only "datasetID", "method", or "evidence" allowed).
A forest plot showing the comparison of models across groups.
clone()
The objects of this class are cloneable with this method.
BenchmarkInsights$clone(deep = FALSE)
deep: Whether to make a deep clone.
BenchmarkInsights$new()
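The summaries behind getHeatmap() and getCorplot() can be pictured with base R. The sketch below builds a toy evaluation table using the column names mentioned for evalResult plus a method column, averages results across datasets, and correlates methods. All values are made up for illustration, and the exact layout that BenchmarkInsights expects is an assumption here.

```r
# Toy evaluation results (illustrative values only, not real benchmarks).
evalSummary <- data.frame(
  datasetID = rep(c("dataset1", "dataset2"), each = 4),
  method = rep(rep(c("methodA", "methodB"), each = 2), times = 2),
  evidence = "class_labels",
  metric = rep(c("accuracy", "macroF1"), times = 4),
  result = c(0.80, 0.70, 0.90, 0.85, 0.60, 0.50, 0.70, 0.75),
  stringsAsFactors = FALSE
)

# Average each method/metric combination across datasets, which is the
# kind of matrix a heatmap of methods by metrics would display.
averaged <- with(evalSummary, tapply(result, list(method, metric), mean))
averaged

# Pairwise correlation between methods: one row per dataset/metric
# combination, one column per method.
wide <- with(
  evalSummary,
  tapply(result, list(paste(datasetID, metric), method), mean)
)
methodCorrelation <- cor(wide)
methodCorrelation
```

Working from a long table keeps the same data usable for every plot type, since each plot is a different grouping of the result column.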
This class manages a collection of benchmark trios and mapping functions. It allows adding new trios, mapping functions, and running mappings on data.
A BenchmarkStudy object.
name: A character string to name the study.
trios: A list to store benchmark trios.
mappingFunctions: A list to store mapping functions with metadata.
description: A character string describing the study.
version: An integer specifying the version of the study.
new()
BenchmarkStudy$new( name = NULL, trios = list(), fetchFromCtd = FALSE, version = NULL )
name: A character string to name the study. If fetchFromCtd is TRUE, this name will be used to fetch the study from the submission database.
trios: A list of Trio objects to initialize the study.
fetchFromCtd: Logical indicating whether to fetch study details from the submission database.
version: Optional integer specifying which version of the study to fetch (when fetchFromCtd is TRUE).
addTrio()
Add a new trio to the study
BenchmarkStudy$addTrio(trioObject)
trioObject: A Trio object to be added.
addMappingFunction()
Add a mapping function with metadata
BenchmarkStudy$addMappingFunction( name, func, inputDescription, outputDescription, exampleUsage = NULL )
name: A character string to name the mapping function.
func: A function that takes data as input and returns transformed data.
inputDescription: A character string describing the input data.
outputDescription: A character string describing the output data.
exampleUsage: An optional character string showing example usage of the function.
runMapping()
Apply a mapping function to data
BenchmarkStudy$runMapping(mappingName, data)
mappingName: A character string naming the mapping function to apply.
data: The data to which the mapping function will be applied.
The transformed data after applying the mapping function.
getMappingFunctionDocumentation()
Get the documentation for a mapping function.
BenchmarkStudy$getMappingFunctionDocumentation(mappingName)
mappingName: A character string naming the mapping function.
A list containing the input description, output description, and example usage.
printMappingFunctionDocumentation()
Print the documentation for a mapping function in a human-readable format.
BenchmarkStudy$printMappingFunctionDocumentation(mappingName)
mappingName: A character string naming the mapping function.
Prints the inputDescription, outputDescription, and exampleUsage of the mapping function.
listMappingFunctions()
List the available mapping functions.
BenchmarkStudy$listMappingFunctions()
A character vector of mapping function names.
generateVignetteTemplate()
Generate R Markdown vignette template
BenchmarkStudy$generateVignetteTemplate( outputPath = "benchmark_study_template.Rmd" )
outputPath: A character string specifying the path to save the vignette template.
evaluate()
Evaluate a trio with input data
BenchmarkStudy$evaluate(trioName, input)
trioName: A character string naming the trio to evaluate.
input: The input data to evaluate the trio against.
The evaluation result from the trio.
writeBenchmarkStudy()
Write the BenchmarkStudy metadata to the Curated Trio Datasets sheet.
BenchmarkStudy$writeBenchmarkStudy()
print()
Print method to display key information about the BenchmarkStudy object.
BenchmarkStudy$print()
clone()
The objects of this class are cloneable with this method.
BenchmarkStudy$clone(deep = FALSE)
deep: Whether to make a deep clone.
    study <- BenchmarkStudy$new(name = "example_study")
    study$description <- "A small example benchmark study."
    study$name
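The mapping-function methods described above amount to a named registry of functions stored together with their documentation. The following base-R sketch illustrates that idea with a plain list. `addMappingSketch` and `runMappingSketch` are hypothetical helpers written for this example and are not part of BenchHub; the real class stores its registry inside the BenchmarkStudy object.

```r
# Hypothetical helper: store a function and its documentation under a name.
addMappingSketch <- function(registry, name, func, inputDescription,
                             outputDescription, exampleUsage = NULL) {
  stopifnot(is.character(name), length(name) == 1, is.function(func))
  registry[[name]] <- list(
    func = func,
    inputDescription = inputDescription,
    outputDescription = outputDescription,
    exampleUsage = exampleUsage
  )
  registry
}

# Hypothetical helper: look a mapping up by name and apply it to data.
runMappingSketch <- function(registry, mappingName, data) {
  if (!mappingName %in% names(registry)) {
    stop("Unknown mapping function: ", mappingName)
  }
  registry[[mappingName]]$func(data)
}

mappingFunctions <- addMappingSketch(
  list(),
  name = "log1pCounts",
  func = function(data) log1p(data),
  inputDescription = "A numeric matrix of raw counts.",
  outputDescription = "A numeric matrix of log(1 + count) values."
)

counts <- matrix(c(0, 1, 3, 7), nrow = 2)
transformed <- runMappingSketch(mappingFunctions, "log1pCounts", counts)
names(mappingFunctions)  # "log1pCounts"
```

Keeping the descriptions next to the function is what lets a documentation getter return them without inspecting the function body.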
Computes the Brier score for survival analysis.
brierScoreMetric(evidence, predicted)

| Argument | Description |
|---|---|
| evidence | The true survival times and event indicators. |
| predicted | The predicted survival times. |

The Brier score.
    # More realistic training dataset (8 patients)
    evidence <- list(
      survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                     event = c(1, 1, 0, 1, 0, 1, 1, 0)), # Training
      survival::Surv(time = c(12, 18, 25, 32), event = c(1, 0, 1, 0)) # Testing
    )
    # Predicted risk scores
    predicted <- list(
      c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
        0.8360043, 0.7375956, 0.8110551, 0.3881083), # Training predictions
      c(0.685169729, 0.003948339, 0.832916080, 0.007334147) # Testing predictions
    )
    brierScoreMetric(evidence, predicted)
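As background, the plain Brier score is the mean squared difference between predicted probabilities and observed binary outcomes. The sketch below shows only that uncensored form with made-up numbers. A survival Brier score must additionally account for censoring, which this sketch deliberately omits, so it is not a stand-in for brierScoreMetric(). `brierSketch` is a hypothetical helper.

```r
# Hypothetical helper: uncensored Brier score for binary outcomes.
brierSketch <- function(observed, predictedProbability) {
  stopifnot(length(observed) == length(predictedProbability))
  mean((predictedProbability - observed)^2)
}

# Illustrative values: 1 = event occurred by the time horizon, 0 = it did not.
observed <- c(1, 0, 1, 0)
predictedProbability <- c(0.9, 0.2, 0.6, 0.4)
brier <- brierSketch(observed, predictedProbability)
brier  # 0.0925
```

Lower values indicate better calibrated and more accurate probabilities, with 0 being a perfect prediction.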
This helper creates one DatasetEvidence row per supporting evidence entry
in the Trio object. It does not upload anything and does not modify the
existing writeCTD() path.
    buildDatasetEvidenceSubmission(
      trio,
      datasetTaskID,
      evidenceName = names(trio$evidence),
      evidenceType = NA_character_
    )
trio |
A |
datasetTaskID |
Task identifier to link evidence rows to. Defaults to generated task IDs when not supplied. |
evidenceName |
Character vector of Trio evidence names to include.
Defaults to all evidence in |
evidenceType |
Either |
A data.frame matching the proposed DatasetEvidence table schema.
evidenceID is left as NA.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    buildDatasetEvidenceSubmission(
      trio,
      datasetTaskID = "task_001",
      evidenceType = "experimental_ground_truth"
    )
This helper is the first step in the new submission workflow. It does not upload anything and does not modify the existing writeCTD() path.
    buildDatasetSubmission(
      trio,
      name = private_submission_dataset_name(trio),
      dataType,
      dataModality,
      technology,
      description = trio$description,
      doi = NA_character_,
      organism = NA_character_,
      tissue,
      status
    )
trio |
A |
name |
Dataset name for submission. Defaults to the existing Trio name,
falling back to |
dataType |
One of |
dataModality |
One of |
technology |
Free-text technology description. |
description |
Dataset description. Defaults to |
doi |
DOI reference for the dataset. |
organism |
Free-text organism label. |
tissue |
Free-text tissue label. |
status |
One of |
A one-row data.frame matching the proposed Dataset table schema.
datasetID is left as NA because it is expected to be generated by the
submission backend.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    buildDatasetSubmission(
      trio,
      dataType = "omics",
      dataModality = "transcriptomics",
      technology = "RNA-seq",
      tissue = "blood",
      status = "healthy"
    )
This helper creates one DatasetTaskMetric row for each task-metric pairing.
It does not upload anything and does not modify the existing writeCTD() path.
    buildDatasetTaskMetricSubmission(
      trio,
      datasetTaskID,
      taskMetrics = NULL,
      evidenceTaskID = NULL,
      evidenceName = names(trio$evidence)
    )
trio |
A |
datasetTaskID |
Task identifier(s) to link metrics to. Defaults to |
taskMetrics |
A named list mapping each |
evidenceTaskID |
Character vector mapping each Trio evidence entry to a
|
evidenceName |
Character vector of Trio evidence names corresponding to
|
A data.frame matching the proposed DatasetTaskMetric table
schema. datasetTaskMetricID and metricID are left as NA.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    buildDatasetTaskMetricSubmission(
      trio,
      datasetTaskID = "task_001",
      evidenceTaskID = "task_001"
    )
This helper creates one DatasetTask row per supporting evidence name in the
Trio object. It does not upload anything and does not modify the existing
writeCTD() path.
    buildDatasetTaskSubmission(
      trio,
      datasetID = NA_character_,
      taskStage,
      taskType,
      taskName
    )
trio |
A |
datasetID |
Dataset identifier to link tasks to. Defaults to |
taskStage |
A character vector giving the task stage for each task row. |
taskType |
A character vector giving the task type for each task row. |
taskName |
A character vector giving the task name for each task row. |
A data.frame matching the proposed DatasetTask table schema.
datasetTaskID is left as NA.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    buildDatasetTaskSubmission(
      trio,
      taskStage = "prediction",
      taskType = "classification",
      taskName = "class_prediction"
    )
This helper creates one Metric row per metric in the Trio object. It does
not upload anything and does not modify the existing writeCTD() path.
buildMetricSubmission(trio)
trio |
A |
A data.frame matching the proposed Metric table schema.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    buildMetricSubmission(trio)
Build Study and StudyDataset submission tables
    buildStudySubmission(
      study,
      datasetIDs,
      existing_studies = NULL,
      ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
      version = NULL,
      type = NULL,
      protocolGist = "",
      mappingFunctions = ""
    )
study |
A |
datasetIDs |
Character vector of existing dataset IDs to link. |
existing_studies |
Optional data frame of current Study rows. |
ss |
Submission spreadsheet ID or URL. Used when |
version |
Optional version override. |
type |
Optional type override. Must be |
protocolGist |
Optional protocol gist URL. |
mappingFunctions |
Optional mapping functions gist URL. |
A named list containing Study and StudyDataset.
    study <- BenchmarkStudy$new(name = "example_study")
    study$description <- "A small example benchmark study."
    existing_studies <- data.frame(
      studyID = character(0),
      studyName = character(0),
      version = character(0),
      description = character(0),
      type = character(0),
      protocolGist = character(0),
      mappingFunctions = character(0),
      stringsAsFactors = FALSE
    )
    buildStudySubmission(
      study,
      datasetIDs = "dataset_001",
      existing_studies = existing_studies
    )
Convert a Study submission to payload structure
buildStudySubmissionPayload(submission)
submission |
A submission object returned by |
A named list with a top-level payload entry.
    study <- BenchmarkStudy$new(name = "example_study")
    study$description <- "A small example benchmark study."
    existing_studies <- data.frame(
      studyID = character(0),
      studyName = character(0),
      version = character(0),
      description = character(0),
      type = character(0),
      protocolGist = character(0),
      mappingFunctions = character(0),
      stringsAsFactors = FALSE
    )
    submission <- buildStudySubmission(
      study,
      datasetIDs = "dataset_001",
      existing_studies = existing_studies
    )
    payload <- buildStudySubmissionPayload(submission)
    names(payload)
This helper assembles the five Trio-focused submission tables into a single list so it can be inspected locally before any upload step is added.
buildTrioSubmission(trio, dataset_args, task_args, evidence_task_map)
trio |
A |
dataset_args |
Named list of arguments passed to
|
task_args |
Named list of arguments passed to
|
evidence_task_map |
Character vector mapping Trio evidence names to the
corresponding task names in the submission. Names must be evidence names
from |
A named list containing Dataset, DatasetTask,
DatasetEvidence, Metric, DatasetTaskMetric, and
submission_links.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    dataset_args <- list(
      dataType = "omics",
      dataModality = "transcriptomics",
      technology = "RNA-seq",
      tissue = "blood",
      status = "healthy"
    )
    task_args <- list(
      taskStage = "prediction",
      taskType = "classification",
      taskName = "class_prediction"
    )
    submission <- buildTrioSubmission(
      trio = trio,
      dataset_args = dataset_args,
      task_args = task_args,
      evidence_task_map = c(class_labels = "class_prediction")
    )
    names(submission)
This helper converts the output of buildTrioSubmission() into the nested
list structure used by the JSON payload in the prototype submission script.
buildTrioSubmissionPayload(submission)
submission |
A submission object returned by |
A named list with a top-level payload entry ready for
jsonlite::toJSON(..., auto_unbox = TRUE).
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    submission <- buildTrioSubmission(
      trio = trio,
      dataset_args = list(
        dataType = "omics",
        dataModality = "transcriptomics",
        technology = "RNA-seq",
        tissue = "blood",
        status = "healthy"
      ),
      task_args = list(
        taskStage = "prediction",
        taskType = "classification",
        taskName = "class_prediction"
      ),
      evidence_task_map = c(class_labels = "class_prediction")
    )
    payload <- buildTrioSubmissionPayload(submission)
    names(payload)
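The conversion from submission tables to a nested list can be sketched without BenchHub. The example below turns each data frame into a list of row records under a top-level payload entry and serializes it with jsonlite::toJSON(..., auto_unbox = TRUE). The table contents, the column names, and the row-record layout are assumptions made for illustration; the structure actually produced by buildTrioSubmissionPayload() may differ.

```r
# Stand-in for a submission: a named list of data frames (assumed columns).
submission <- list(
  Dataset = data.frame(
    datasetID = NA_character_,
    name = "example_dataset",
    stringsAsFactors = FALSE
  ),
  Metric = data.frame(
    metricName = "macroF1Metric",
    stringsAsFactors = FALSE
  )
)

# Convert every table to a list of per-row records.
payload <- list(payload = lapply(submission, function(tbl) {
  lapply(seq_len(nrow(tbl)), function(i) as.list(tbl[i, , drop = FALSE]))
}))

# Serialize only when jsonlite is installed; auto_unbox writes length-one
# vectors as JSON scalars instead of one-element arrays.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json <- jsonlite::toJSON(payload, auto_unbox = TRUE, pretty = TRUE, na = "null")
  cat(json, "\n")
}
```

Inspecting the list before serializing makes it easy to check that identifiers left as NA are the ones expected to be generated by the backend.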
This helper gathers the dataset-level fields needed for the Dataset table.
It reuses Trio metadata when already available and only prompts for the
remaining values in interactive sessions.
collectDatasetSubmissionInfo(trio, defaults = list())
trio |
A |
defaults |
Optional named list of pre-filled values. Any non- |
A named list ready to pass as dataset_args to
buildDatasetSubmission().
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    collectDatasetSubmissionInfo(
      trio,
      defaults = list(
        dataType = "omics",
        dataModality = "transcriptomics",
        technology = "RNA-seq",
        tissue = "blood",
        status = "healthy"
      )
    )
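The pattern of reusing known metadata, applying pre-filled defaults, and prompting only for what is still missing can be sketched in base R as follows. `collectInfoSketch` is a hypothetical helper, and stopping with an error in a non-interactive session is an assumption of this sketch rather than documented behaviour of collectDatasetSubmissionInfo().

```r
# Hypothetical helper: merge known values with defaults, then prompt for
# any required field that is still missing (interactive sessions only).
collectInfoSketch <- function(requiredFields, known = list(), defaults = list()) {
  # Defaults override values that are already known.
  info <- modifyList(known, defaults)
  for (field in requiredFields) {
    value <- info[[field]]
    missingValue <- is.null(value) || is.na(value) || !nzchar(value)
    if (missingValue) {
      if (interactive()) {
        info[[field]] <- readline(paste0("Enter ", field, ": "))
      } else {
        stop("Missing value for '", field, "' in a non-interactive session.")
      }
    }
  }
  info[requiredFields]
}

info <- collectInfoSketch(
  requiredFields = c("name", "dataType", "technology"),
  known = list(name = "example_dataset"),
  defaults = list(dataType = "omics", technology = "RNA-seq")
)
info$dataType  # "omics"
```

Because every required field is supplied here, no prompt is shown and the call behaves the same in scripts and in interactive use.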
This helper builds the DatasetTaskMetric linking table automatically from
the evidence-to-task assignments and the metric links already stored inside
the Trio object.
collectDatasetTaskMetricSubmission(trio, evidence_args, task_args)
trio |
A |
evidence_args |
Named list returned by
|
task_args |
Named list returned by |
A data.frame matching the DatasetTaskMetric schema.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    collectDatasetTaskMetricSubmission(
      trio,
      evidence_args = list(
        datasetTaskID = "task_001",
        evidenceName = "class_labels"
      ),
      task_args = list(
        taskStage = "prediction",
        taskType = "classification",
        taskName = "class_prediction"
      )
    )
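Deriving a task-to-metric linking table from evidence-to-task assignments and per-evidence metric links is essentially a small join. The base-R sketch below illustrates the idea on plain lists. The evidence names, task names, and output column names are made up for illustration and are not the DatasetTaskMetric schema.

```r
# Metrics linked to each evidence entry (illustrative names).
evidenceMetrics <- list(
  class_labels = c("macroF1Metric", "balAccMetric"),
  cluster_labels = "ARImetric"
)

# Task assigned to each evidence entry (illustrative names).
evidenceTaskMap <- c(
  class_labels = "class_prediction",
  cluster_labels = "clustering"
)

# One row per (task, metric) pairing implied by the two mappings.
rows <- lapply(names(evidenceMetrics), function(evidenceName) {
  data.frame(
    taskName = unname(evidenceTaskMap[evidenceName]),
    evidenceName = evidenceName,
    metricName = evidenceMetrics[[evidenceName]],
    stringsAsFactors = FALSE
  )
})
taskMetricTable <- unique(do.call(rbind, rows))
nrow(taskMetricTable)  # 3
```

Since the table is derived from the two mappings, it does not need to be maintained by hand when evidence or metrics change.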
This helper gathers the evidence-to-task assignment and evidence type for
each supporting evidence item in a Trio.
collectEvidenceSubmissionInfo(trio, task_args, defaults = list())
trio |
A |
task_args |
Named list returned by |
defaults |
Optional named list with entries |
A named list with datasetTaskID, evidenceName,
evidenceType, and evidence_task_map.
    data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
    labels <- factor(c("A", "B", "A"))
    names(labels) <- rownames(data)
    trio <- Trio$new(
      data = data,
      evidence = list(class_labels = list(
        evidence = labels,
        metrics = "macroF1Metric"
      )),
      metrics = list(macroF1Metric = macroF1Metric),
      name = "example_dataset",
      description = "A small example dataset."
    )
    task_args <- list(
      taskStage = "prediction",
      taskType = "classification",
      taskName = "class_prediction"
    )
    collectEvidenceSubmissionInfo(
      trio,
      task_args = task_args,
      defaults = list(
        taskName = "class_prediction",
        evidenceType = "experimental_ground_truth"
      )
    )
This helper gathers user-supplied metric classification for each metric in a
Trio. The metric names and wrappers still come from the Trio object and the
package helper logic.
collectMetricSubmissionInfo(trio, defaults = list())
trio |
A |
defaults |
Optional named list with entry |
A named list ready to merge into the Metric submission table.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
collectMetricSubmissionInfo(
  trio,
  defaults = list(metricType = "label_based")
)
Collect Study submission metadata
collectStudySubmissionInfo(
  study,
  datasetIDs = NULL,
  available_datasets = NULL,
  existing_studies = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  defaults = list()
)
study |
A |
datasetIDs |
Optional character vector of dataset IDs. When |
available_datasets |
Optional data frame of available Dataset rows. |
existing_studies |
Optional data frame of current Study rows. |
ss |
Submission spreadsheet ID or URL. |
defaults |
Optional named list with entries such as |
A named list ready to pass into buildStudySubmission().
study <- BenchmarkStudy$new(name = "example_study")
study$description <- "A small example benchmark study."
available_datasets <- data.frame(
  datasetID = "dataset_001",
  name = "example_dataset",
  stringsAsFactors = FALSE
)
existing_studies <- data.frame(
  studyID = character(0),
  studyName = character(0),
  version = character(0),
  description = character(0),
  type = character(0),
  protocolGist = character(0),
  mappingFunctions = character(0),
  stringsAsFactors = FALSE
)
collectStudySubmissionInfo(
  study,
  datasetIDs = "dataset_001",
  available_datasets = available_datasets,
  existing_studies = existing_studies
)
This helper gathers the task-level fields needed for the DatasetTask
table. It does not modify the Trio object.
collectTaskSubmissionInfo(trio, defaults = list(), n_tasks = NULL)
trio |
A |
defaults |
Optional named list of pre-filled task values. Supported
entries are |
n_tasks |
Number of tasks to define. If |
A named list ready to pass as task_args to
buildDatasetTaskSubmission().
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
collectTaskSubmissionInfo(
  trio,
  defaults = list(
    taskStage = "prediction",
    taskType = "classification",
    taskName = "class_prediction"
  )
)
Download a BenchmarkStudy from the submission database
downloadSubmissionStudy(
  studyID,
  name = NULL,
  version = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  cachePath = tempdir()
)
studyID |
Study identifier from the |
name |
Optional study name fallback for compatibility. Prefer
|
version |
Optional version string used only when loading by |
ss |
Google Sheets spreadsheet ID containing the submission tables. |
cachePath |
Directory for downloaded files. Defaults to |
A populated BenchmarkStudy object.
if (interactive()) {
  study <- downloadSubmissionStudy(
    studyID = "ST005",
    cachePath = tempdir()
  )
  study
}
Download a Trio from the five-table submission database
downloadSubmissionTrio(
  datasetID,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  cachePath = tempdir()
)
datasetID |
Dataset identifier from the |
ss |
Google Sheets spreadsheet ID containing the five database tables. |
cachePath |
Directory for downloaded files. Defaults to |
A populated Trio object.
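The cachePath argument suggests a fetch-once, reuse-afterwards pattern. The following is a minimal sketch of that idea in base R, not the package's implementation: cached_fetch() and fake_download() are hypothetical names, and the "download" is a stand-in that simply writes a small RDS file.

```r
# Hypothetical helper: fetch a file only if it is not already in the cache.
# `fetch_fun` stands in for whatever performs the real download.
cached_fetch <- function(file_name, cachePath = tempdir(), fetch_fun) {
  dest <- file.path(cachePath, file_name)
  if (!file.exists(dest)) {
    dir.create(cachePath, recursive = TRUE, showWarnings = FALSE)
    fetch_fun(dest)  # expected to write the file at `dest`
  }
  dest
}

# Stand-in "download" that counts how often it is called
n_calls <- 0
fake_download <- function(dest) {
  n_calls <<- n_calls + 1
  saveRDS(data.frame(x = 1:3), dest)
}

cache_dir <- file.path(tempdir(), "benchhub_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache
p1 <- cached_fetch("demo.rds", cache_dir, fake_download)
p2 <- cached_fetch("demo.rds", cache_dir, fake_download)
n_calls  # the second call finds the file and skips the download
```

Using tempdir() as the cache keeps files only for the current R session; a persistent directory would be needed to avoid repeated downloads across sessions.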
Get one existing Study row by studyID
getSubmissionStudy(
  studyID,
  studies = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg"
)
studyID |
Existing Study identifier. |
studies |
Optional data frame of Study rows. |
ss |
Submission spreadsheet ID or URL. |
A one-row data frame for the requested Study.
studies <- data.frame(
  studyID = "study_001",
  studyName = "example_study",
  version = "0.0.1",
  description = "A small example benchmark study.",
  type = "original",
  protocolGist = "",
  mappingFunctions = "",
  stringsAsFactors = FALSE
)
getSubmissionStudy(
  "study_001",
  studies = studies
)
Get linked StudyDataset rows by studyID
getSubmissionStudyDatasets(
  studyID,
  study_datasets = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg"
)
studyID |
Existing Study identifier. |
study_datasets |
Optional data frame of StudyDataset rows. |
ss |
Submission spreadsheet ID or URL. |
A data frame of StudyDataset rows linked to the supplied studyID.
study_datasets <- data.frame(
  studyDatasetID = "study_dataset_001",
  studyID = "study_001",
  datasetID = "dataset_001",
  stringsAsFactors = FALSE
)
getSubmissionStudyDatasets(
  "study_001",
  study_datasets = study_datasets
)
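To illustrate what "rows linked to the supplied studyID" means when a StudyDataset table is passed in directly, here is a base R sketch that filters a small made-up table. It is an assumption about the lookup, not the package's code, and the extra rows are invented for the illustration.

```r
# Made-up StudyDataset table with two studies
study_datasets <- data.frame(
  studyDatasetID = c("study_dataset_001", "study_dataset_002", "study_dataset_003"),
  studyID = c("study_001", "study_001", "study_002"),
  datasetID = c("dataset_001", "dataset_002", "dataset_001"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose studyID matches; drop = FALSE keeps a data frame
linked <- study_datasets[study_datasets$studyID == "study_001", , drop = FALSE]
linked$datasetID
```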
Computes the GH C-Index for survival analysis.
ghCIndexMetric(evidence, predicted)
evidence |
The true survival times and event indicators. |
predicted |
The predicted survival times. |
The GH C-Index.
# More realistic training dataset (8 patients)
evidence <- list(
  survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                 event = c(1, 1, 0, 1, 0, 1, 1, 0)),  # Training
  survival::Surv(time = c(12, 18, 25, 32),
                 event = c(1, 0, 1, 0))  # Testing
)
# Predicted risk scores
predicted <- list(
  c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
    0.8360043, 0.7375956, 0.8110551, 0.3881083),  # Training predictions
  c(0.685169729, 0.003948339, 0.832916080, 0.007334147)  # Testing predictions
)
ghCIndexMetric(evidence, predicted)
Computes Harrell's C-Index for survival analysis.
harrelCIndexMetric(evidence, predicted)
evidence |
The true survival times and event indicators. |
predicted |
The predicted survival times. |
Harrell's C-Index.
# More realistic training dataset (8 patients)
evidence <- list(
  survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                 event = c(1, 1, 0, 1, 0, 1, 1, 0)),  # Training
  survival::Surv(time = c(12, 18, 25, 32),
                 event = c(1, 0, 1, 0))  # Testing
)
# Predicted risk scores
predicted <- list(
  c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
    0.8360043, 0.7375956, 0.8110551, 0.3881083),  # Training predictions
  c(0.685169729, 0.003948339, 0.832916080, 0.007334147)  # Testing predictions
)
harrelCIndexMetric(evidence, predicted)
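For readers unfamiliar with the concordance index, the sketch below computes it from scratch on plain vectors. It is an illustration under stated assumptions, not the code behind harrelCIndexMetric(): harrell_c_sketch() is a hypothetical name, inputs are plain time, event and risk vectors rather than the list of Surv objects used above, and a higher risk score is assumed to mean an earlier expected event.

```r
# Concordance over comparable pairs; a sketch, not the package's implementation.
harrell_c_sketch <- function(time, event, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is comparable when subject i has an observed event
      # strictly before subject j's follow-up time.
      if (event[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1    # higher risk failed first
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied scores count half
        }
      }
    }
  }
  concordant / comparable
}

time  <- c(5, 10, 15, 20)
event <- c(1, 1, 0, 1)
c_ordered  <- harrell_c_sketch(time, event, risk = c(4, 3, 2, 1))  # risks match event order
c_reversed <- harrell_c_sketch(time, event, risk = c(1, 2, 3, 4))  # risks oppose event order
c(c_ordered, c_reversed)
```

The censored subject (event = 0) never starts a comparable pair but can still serve as the later member of one, which is how censoring enters the calculation.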
Interactively prepare a Study submission
interactivePrepareStudySubmission(
  study,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  githubPat = Sys.getenv("GITHUB_PAT"),
  gistPublic = TRUE,
  url = submission_webapp_url,
  review = TRUE,
  build_payload = TRUE,
  build_json = TRUE
)
study |
A |
ss |
Submission spreadsheet ID or URL. |
githubPat |
Optional GitHub personal access token. |
gistPublic |
Logical; whether uploaded gists should be public. |
url |
Google Apps Script endpoint URL. Defaults to the package submission web app. |
review |
Whether to print a short submission summary. Defaults to
|
build_payload |
Whether to include the nested payload. Defaults to
|
build_json |
Whether to include JSON output. Defaults to |
A named list containing the Study submission bundle.
Interactively prepare a Study update submission
interactivePrepareStudyUpdateSubmission(
  studyID = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  githubPat = Sys.getenv("GITHUB_PAT"),
  gistPublic = TRUE,
  url = submission_webapp_url,
  review = TRUE,
  build_payload = TRUE,
  build_json = TRUE
)
studyID |
Optional existing Study identifier to update. When omitted, the user is prompted to choose one. |
ss |
Submission spreadsheet ID or URL. |
githubPat |
Optional GitHub personal access token. |
gistPublic |
Logical; whether uploaded gists should be public. |
url |
Google Apps Script endpoint URL. Defaults to the package submission web app. |
review |
Whether to print a short submission summary. Defaults to
|
build_payload |
Whether to include the nested payload. Defaults to
|
build_json |
Whether to include JSON output. Defaults to |
A named list containing the updated Study submission bundle.
Computes the Jensen-Shannon divergence between two non-negative numeric vectors after normalizing them to probability distributions.
JSDmetric(evidence, predicted)
evidence |
The true values. |
predicted |
The predicted values. |
The Jensen-Shannon divergence.
evidence <- c(0.2, 0.3, 0.5)
predicted <- c(0.1, 0.4, 0.5)
JSDmetric(evidence, predicted)
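The description above (normalize, then compare) can be sketched in a few lines of base R. This is a hedged illustration rather than the body of JSDmetric(): jsd_sketch() is a hypothetical name, and the logarithm base is an assumption (natural log here), since the documentation does not state which base the package uses.

```r
jsd_sketch <- function(p, q, base = exp(1)) {
  # Normalize both vectors to probability distributions
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2  # mixture distribution
  # Kullback-Leibler divergence; terms with a == 0 contribute 0
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b, base = base), 0))
  (kl(p, m) + kl(q, m)) / 2
}

evidence <- c(0.2, 0.3, 0.5)
predicted <- c(0.1, 0.4, 0.5)
jsd_value <- jsd_sketch(evidence, predicted)
jsd_value
```

With the natural logarithm the divergence lies between 0 (identical distributions) and log(2) (distributions with no overlap); with base 2 the upper bound is 1.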
Computes the kernel density estimation test statistic.
kdeMetric(evidence, predicted)
evidence |
The true values. |
predicted |
The predicted values. |
The KDE test statistic.
evidence <- c(1, 2, 3, 4)
predicted <- c(1.1, 2.1, 2.9, 4.2)
kdeMetric(evidence, predicted)
List the curated Trio datasets
listCuratedTrioDatasets(name_filter = NULL, dataType_filter = NULL)
name_filter |
A string to filter datasets by name (case-insensitive partial match) |
dataType_filter |
A string or vector of strings to filter datasets by data type |
A data frame with the dataset names and IDs.
List the curated Trio studies
listCuratedTrioStudies(name_filter = NULL)
name_filter |
A string to filter studies by name (case-insensitive partial match) |
A data frame with the study names and IDs.
List existing Study rows
listSubmissionStudies(ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg")
ss |
Submission spreadsheet ID or URL. |
A data frame of existing Study rows.
List available datasets for Study submission
listSubmissionStudyDatasets(
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg"
)
ss |
Submission spreadsheet ID or URL. |
A data frame of existing Dataset rows.
The data set consists of a matrix of abundances of 1192 microbial taxa for 575 samples and a factor vector of Parkinson's disease classes for the same 575 patients.
x has a row for each sample and a column for each
taxon. lubomPD is a factor vector with values 0 representing Healthy Control (HC)
and 1 representing Parkinson's Disease (PD).
An R package PD16Sdata, BMC, 2023. Webpage: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-023-01475-4
Computes the macro F1 score of the predictions.
macroF1Metric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The macro F1 score.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
macroF1Metric(evidence, predicted)
Computes the macro precision of the predictions.
macroPrecMetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The macro precision.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
macroPrecMetric(evidence, predicted)
Computes the macro recall of the predictions.
macroRecMetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The macro recall.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
macroRecMetric(evidence, predicted)
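The macro-averaged scores documented above can be worked through by hand with a confusion table. The sketch below is an illustration, not the package's code: macro_scores_sketch() is a hypothetical name, and it assumes macro F1 is the mean of the per-class F1 scores (some definitions instead take the harmonic mean of macro precision and macro recall).

```r
macro_scores_sketch <- function(evidence, predicted) {
  classes <- union(levels(evidence), levels(predicted))
  # Rows are true classes, columns are predicted classes
  tab <- table(factor(evidence, levels = classes),
               factor(predicted, levels = classes))
  tp <- diag(tab)
  precision <- tp / colSums(tab)  # per class: TP / (TP + FP)
  recall    <- tp / rowSums(tab)  # per class: TP / (TP + FN)
  f1 <- 2 * precision * recall / (precision + recall)
  # Macro averaging: unweighted mean over classes
  c(macroPrecision = mean(precision),
    macroRecall = mean(recall),
    macroF1 = mean(f1))
}

evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
macro_res <- macro_scores_sketch(evidence, predicted)
macro_res
```

For this input, class A has precision 2/3 and recall 1, while class B has precision 1 and recall 1/2, so the macro precision is 5/6, the macro recall is 3/4, and the macro F1 is 11/15. A class that is never predicted gives a 0/0 precision, which this sketch does not handle.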
Computes the Matthews Correlation Coefficient (MCC) of the predictions.
MCCmetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The MCC.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
MCCmetric(evidence, predicted)
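As a worked illustration of the two-class case, the sketch below computes the MCC from the four confusion counts. It is an assumption-laden example, not the implementation of MCCmetric(): mcc_sketch() is a hypothetical name, only binary labels are handled, and the second factor level is treated as the positive class.

```r
mcc_sketch <- function(evidence, predicted, positive = levels(evidence)[2]) {
  # Counts as doubles to avoid integer overflow in the products below
  count <- function(x) as.numeric(sum(x))
  tp <- count(evidence == positive & predicted == positive)
  tn <- count(evidence != positive & predicted != positive)
  fp <- count(evidence != positive & predicted == positive)
  fn <- count(evidence == positive & predicted != positive)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  (tp * tn - fp * fn) / denom
}

evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
mcc_value <- mcc_sketch(evidence, predicted)
mcc_value
```

With "B" as the positive class the counts are TP = 1, TN = 2, FP = 0 and FN = 1, giving 2 / sqrt(12), roughly 0.577. Swapping which class counts as positive leaves the value unchanged.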
Computes the micro F1 score of the predictions.
microF1Metric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The micro F1 score.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
microF1Metric(evidence, predicted)
Computes the micro precision of the predictions.
microPrecMetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The micro precision.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
microPrecMetric(evidence, predicted)
Computes the micro recall of the predictions.
microRecMetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The micro recall.
evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
microRecMetric(evidence, predicted)
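Micro averaging pools the counts over all classes before forming the ratios, in contrast to macro averaging, which averages per-class scores. The base R sketch below shows this on the example data; micro_scores_sketch() is a hypothetical name and the code is an illustration, not the package's implementation.

```r
micro_scores_sketch <- function(evidence, predicted) {
  classes <- union(levels(evidence), levels(predicted))
  # Rows are true classes, columns are predicted classes
  tab <- table(factor(evidence, levels = classes),
               factor(predicted, levels = classes))
  tp <- sum(diag(tab))                  # pooled true positives
  fp <- sum(colSums(tab) - diag(tab))   # pooled false positives
  fn <- sum(rowSums(tab) - diag(tab))   # pooled false negatives
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(microPrecision = precision,
    microRecall = recall,
    microF1 = 2 * precision * recall / (precision + recall))
}

evidence <- factor(c("A", "B", "A", "B"))
predicted <- factor(c("A", "A", "A", "B"))
micro_res <- micro_scores_sketch(evidence, predicted)
micro_res
```

When every sample has exactly one true and one predicted label, each misclassification is counted once as a false positive and once as a false negative, so all three micro scores equal the plain accuracy (0.75 here).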
Computes the mean squared error of the predictions.
MSEmetric(evidence, predicted)
evidence |
The true values. |
predicted |
The predicted values. |
The mean squared error.
evidence <- c(1, 2, 3, 4)
predicted <- c(1.1, 2.1, 2.9, 4.2)
MSEmetric(evidence, predicted)
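The same quantity can be checked by hand in base R. This is a plain arithmetic sketch on the example vectors, independent of how MSEmetric() is implemented.

```r
evidence <- c(1, 2, 3, 4)
predicted <- c(1.1, 2.1, 2.9, 4.2)

squared_errors <- (evidence - predicted)^2  # 0.01, 0.01, 0.01, 0.04
mse <- mean(squared_errors)                 # 0.07 / 4
rmse <- sqrt(mse)                           # root mean squared error
c(mse = mse, rmse = rmse)
```

The mean squared error here is 0.0175, and taking its square root gives the root mean squared error on the original scale of the data.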
Computes the normalized mutual information between two cluster labelings.
NMImetric(evidence, predicted)
evidence |
The true labels. |
predicted |
The predicted labels. |
The normalized mutual information.
evidence <- factor(c("A", "A", "B", "B"))
predicted <- factor(c("A", "A", "B", "B"))
NMImetric(evidence, predicted)
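A from-scratch sketch of normalized mutual information follows. It is illustrative only: nmi_sketch() is a hypothetical name, and dividing by the arithmetic mean of the two entropies is an assumption, since several normalizations are in common use and the documentation does not say which one NMImetric() applies.

```r
nmi_sketch <- function(evidence, predicted) {
  # Joint distribution of the two labelings
  joint <- table(evidence, predicted) / length(evidence)
  px <- rowSums(joint)
  py <- colSums(joint)
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  expected <- outer(px, py)  # joint distribution under independence
  nz <- joint > 0            # skip empty cells, which contribute 0
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))
  # Assumed normalization: arithmetic mean of the marginal entropies
  mi / mean(c(entropy(px), entropy(py)))
}

evidence <- factor(c("A", "A", "B", "B"))
nmi_same <- nmi_sketch(evidence, factor(c("A", "A", "B", "B")))   # identical labelings
nmi_indep <- nmi_sketch(evidence, factor(c("A", "B", "A", "B")))  # unrelated labelings
c(nmi_same, nmi_indep)
```

Identical labelings give 1 and statistically independent labelings give 0. If both labelings put every sample in a single cluster, both entropies are zero and the ratio is undefined, a case this sketch does not handle.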
Prepare a Study submission bundle
prepareStudySubmission(
  study,
  datasetIDs = NULL,
  available_datasets = NULL,
  existing_studies = NULL,
  defaults = list(),
  protocolGist = NULL,
  mappingFunctions = NULL,
  protocolFile = NULL,
  mappingFunctionsFile = NULL,
  uploadProtocol = FALSE,
  uploadMappingFunctions = FALSE,
  githubPat = Sys.getenv("GITHUB_PAT"),
  gistPublic = TRUE,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  build_payload = TRUE,
  build_json = TRUE,
  review = TRUE,
  submit = FALSE,
  url = submission_webapp_url,
  submittedBy = NULL
)
study |
A |
datasetIDs |
Optional character vector of existing dataset IDs to link. |
available_datasets |
Optional data frame of Dataset rows for selection. |
existing_studies |
Optional data frame of Study rows. |
defaults |
Optional named list for |
protocolGist |
Optional protocol gist URL. |
mappingFunctions |
Optional mapping functions gist URL. |
protocolFile |
Optional local file path to upload as a protocol gist. |
mappingFunctionsFile |
Optional local file path to upload as a mapping functions gist. |
uploadProtocol |
Logical; whether to upload |
uploadMappingFunctions |
Logical; whether to upload
|
githubPat |
Optional GitHub personal access token. |
gistPublic |
Logical; whether uploaded gists should be public. |
ss |
Submission spreadsheet ID or URL. |
build_payload |
Whether to include the nested payload. Defaults to
|
build_json |
Whether to include JSON output. Defaults to |
review |
Whether to print a short submission summary. Defaults to
|
submit |
Whether to submit immediately. Defaults to |
url |
Google Apps Script endpoint URL. Defaults to the package submission web app. |
submittedBy |
Submitter email or identifier. |
A named list containing the Study submission bundle.
study <- BenchmarkStudy$new(name = "example_study")
study$description <- "A small example benchmark study."
available_datasets <- data.frame(
  datasetID = "dataset_001",
  name = "example_dataset",
  stringsAsFactors = FALSE
)
existing_studies <- data.frame(
  studyID = character(0),
  studyName = character(0),
  version = character(0),
  description = character(0),
  type = character(0),
  protocolGist = character(0),
  mappingFunctions = character(0),
  stringsAsFactors = FALSE
)
result <- prepareStudySubmission(
  study,
  datasetIDs = "dataset_001",
  available_datasets = available_datasets,
  existing_studies = existing_studies,
  build_json = FALSE,
  review = FALSE
)
names(result)
Prepare an update submission from an existing Study version
prepareStudyUpdateSubmission(
  studyID,
  study = NULL,
  datasetIDs = NULL,
  description = NULL,
  protocolGist = NULL,
  mappingFunctions = NULL,
  studies = NULL,
  study_datasets = NULL,
  ss = "1H8hOxL8D0XTquao8vGZ2cr9-XeaFC48SWAdFn0M3fkg",
  build_payload = TRUE,
  build_json = TRUE,
  review = TRUE,
  submit = FALSE,
  url = submission_webapp_url,
  submittedBy = NULL
)
studyID |
Existing Study identifier to use as the update baseline. |
study |
Optional |
datasetIDs |
Optional replacement dataset IDs. Defaults to the linked datasets from the baseline Study. |
description |
Optional replacement description. |
protocolGist |
Optional replacement protocol gist URL. |
mappingFunctions |
Optional replacement mapping functions gist URL. |
studies |
Optional data frame of Study rows. |
study_datasets |
Optional data frame of StudyDataset rows. |
ss |
Submission spreadsheet ID or URL. |
build_payload |
Whether to include the nested payload. Defaults to
|
build_json |
Whether to include JSON output. Defaults to |
review |
Whether to print a short submission summary. Defaults to
|
submit |
Whether to submit immediately. Defaults to |
url |
Google Apps Script endpoint URL. Defaults to the package submission web app. |
submittedBy |
Submitter email or identifier. |
A named list containing the updated Study submission bundle.
studies <- data.frame(
  studyID = "study_001",
  studyName = "example_study",
  version = "0.0.1",
  description = "A small example benchmark study.",
  type = "original",
  protocolGist = "",
  mappingFunctions = "",
  stringsAsFactors = FALSE
)
study_datasets <- data.frame(
  studyDatasetID = "study_dataset_001",
  studyID = "study_001",
  datasetID = "dataset_001",
  stringsAsFactors = FALSE
)
result <- prepareStudyUpdateSubmission(
  studyID = "study_001",
  studies = studies,
  study_datasets = study_datasets,
  build_json = FALSE,
  review = FALSE
)
names(result)
This is the higher-level R interface for the new submission workflow. It can
prepare dataset/evidence files, prepare metric metadata, build the five-table
submission object, and return the payload/JSON for inspection before calling
submitTrioSubmission().
prepareTrioSubmissionBundle(
  trio,
  dataset_args,
  task_args,
  evidence_task_map,
  prepare_files = FALSE,
  file_args = list(),
  prepare_metrics = FALSE,
  metric_args = list(),
  build_payload = TRUE,
  build_json = FALSE
)
trio |
A |
dataset_args |
Named list passed to |
task_args |
Named list passed to |
evidence_task_map |
Named character vector mapping Trio evidence names to submission task names. |
prepare_files |
Logical; if |
file_args |
Named list of additional arguments for
|
prepare_metrics |
Logical; if |
metric_args |
Named list of additional arguments for
|
build_payload |
Logical; if |
build_json |
Logical; if |
A named list containing the built submission, plus optional
files, metrics, payload, and json.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
dataset_args <- list(
  dataType = "omics",
  dataModality = "transcriptomics",
  technology = "RNA-seq",
  tissue = "blood",
  status = "healthy"
)
task_args <- list(
  taskStage = "prediction",
  taskType = "classification",
  taskName = "class_prediction"
)
bundle <- prepareTrioSubmissionBundle(
  trio = trio,
  dataset_args = dataset_args,
  task_args = task_args,
  evidence_task_map = c(class_labels = "class_prediction")
)
names(bundle)
This helper reuses parts of the older writeCTD() workflow for the new
submission path. It can either save dataset/evidence files locally so that
the user can upload them, or reuse an existing downloadable source already
attached to the Trio.
prepareTrioSubmissionFiles(
  trio,
  name = trio$name,
  outputDir = ".",
  saveData = NULL,
  saveEvidence = NULL,
  useExistingSource = NULL,
  figshareUrl = NULL,
  datasetFileName = NULL,
  evidenceFileName = NULL,
  verifyFigshare = FALSE,
  skipMd5Check = FALSE
)
trio |
A |
name |
Dataset name used for generated filenames. Defaults to
|
outputDir |
Directory for generated |
saveData |
Logical indicating whether to save |
saveEvidence |
Logical indicating whether to save all supporting
evidence to a single RDS file. If |
useExistingSource |
Logical indicating whether to keep using
|
figshareUrl |
Optional Figshare URL to verify against prepared files. |
datasetFileName |
Optional dataset filename to match inside the Figshare article. |
evidenceFileName |
Optional evidence filename to match inside the Figshare article. |
verifyFigshare |
Logical; if |
skipMd5Check |
Logical; if |
A named list describing the saved files and/or reusable download sources for dataset and evidence.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
files <- prepareTrioSubmissionFiles(
  trio,
  outputDir = tempdir(),
  saveData = TRUE,
  saveEvidence = TRUE,
  useExistingSource = FALSE
)
names(files)
This helper aligns the new submission workflow with the older writeCTD()
metric handling. Internal package metrics are recorded directly. Custom
metrics can optionally be uploaded to a GitHub gist and linked from the
resulting Metric table rows.
prepareTrioSubmissionMetrics(
  trio,
  name = trio$name,
  uploadCustom = FALSE,
  githubPat = NULL,
  gistPublic = TRUE
)
trio |
A |
name |
Dataset name used for the gist filename. Defaults to |
uploadCustom |
Logical; if |
githubPat |
Optional GitHub personal access token with gist scope. |
gistPublic |
Logical; whether the created gist should be public. |
A list with entries Metric, gist, and custom_metric_lines.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
prepareTrioSubmissionMetrics(trio)
Computes the root mean squared error of the predictions.
RMSEmetric(evidence, predicted)
evidence |
The true values. |
predicted |
The predicted values. |
The root mean squared error.
evidence <- c(1, 2, 3, 4)
predicted <- c(1.1, 2.1, 2.9, 4.2)
RMSEmetric(evidence, predicted)
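As a cross-check on the example above, the root mean squared error can be computed by hand in base R. This is a minimal sketch that assumes RMSEmetric() follows the standard definition (the square root of the mean squared difference); it does not call BenchHub, and the variable name rmse is introduced only for illustration.

```r
# Same inputs as the RMSEmetric() example.
evidence <- c(1, 2, 3, 4)
predicted <- c(1.1, 2.1, 2.9, 4.2)

# Standard RMSE definition (assumed, not taken from the package source):
# the squared errors are 0.01, 0.01, 0.01 and 0.04, so their mean is 0.0175.
rmse <- sqrt(mean((evidence - predicted)^2))
rmse  # approximately 0.132
```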
Convert a Study submission to JSON
studySubmissionToJSON(submission, pretty = TRUE)
submission |
A submission object returned by buildStudySubmission(). |
pretty |
Whether to pretty-print the JSON. Defaults to TRUE. |
A JSON string.
study <- BenchmarkStudy$new(name = "example_study")
study$description <- "A small example benchmark study."
existing_studies <- data.frame(
  studyID = character(0),
  studyName = character(0),
  version = character(0),
  description = character(0),
  type = character(0),
  protocolGist = character(0),
  mappingFunctions = character(0),
  stringsAsFactors = FALSE
)
submission <- buildStudySubmission(
  study,
  datasetIDs = "dataset_001",
  existing_studies = existing_studies
)
json <- studySubmissionToJSON(submission)
substr(json, 1, 20)
Submit a Study submission payload to Google Apps Script
submitStudySubmission(submission, url, submittedBy)
submission |
A submission object returned by buildStudySubmission(). |
url |
Google Apps Script endpoint URL. |
submittedBy |
Submitter email or identifier. |
A list containing request status information and response text.
study <- BenchmarkStudy$new(name = "example_study")
study$description <- "A small example benchmark study."
existing_studies <- data.frame(
  studyID = character(0),
  studyName = character(0),
  version = character(0),
  description = character(0),
  type = character(0),
  protocolGist = character(0),
  mappingFunctions = character(0),
  stringsAsFactors = FALSE
)
submission <- buildStudySubmission(
  study,
  datasetIDs = "dataset_001",
  existing_studies = existing_studies
)
if (interactive() && curl::has_internet()) {
  response <- submitStudySubmission(
    submission,
    url = "https://script.google.com/macros/s/example/exec",
    submittedBy = "[email protected]"
  )
  names(response)
}
This helper posts the Trio submission payload to a Google Apps Script endpoint using the same redirect-tolerant approach as the prototype script.
submitTrioSubmission(submission, url, submittedBy, submittedType = "Trio")
submission |
A submission object returned by buildTrioSubmission(). |
url |
Google Apps Script endpoint URL. |
submittedBy |
Submitter email or identifier. |
submittedType |
Submission type label. Defaults to "Trio". |
A list containing request status information and response text.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
submission <- buildTrioSubmission(
  trio = trio,
  dataset_args = list(
    dataType = "omics",
    dataModality = "transcriptomics",
    technology = "RNA-seq",
    tissue = "blood",
    status = "healthy"
  ),
  task_args = list(
    taskStage = "prediction",
    taskType = "classification",
    taskName = "class_prediction"
  ),
  evidence_task_map = c(class_labels = "class_prediction")
)
if (interactive() && curl::has_internet()) {
  response <- submitTrioSubmission(
    submission,
    url = "https://script.google.com/macros/s/example/exec",
    submittedBy = "[email protected]"
  )
  names(response)
}
Computes the time-dependent AUC for survival analysis.
timeDependentAUCMetric(evidence, predicted)
evidence |
The true survival times and event indicators. |
predicted |
The predicted survival times. |
The time-dependent AUC.
# More realistic training dataset (8 patients)
evidence <- list(
  survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                 event = c(1, 1, 0, 1, 0, 1, 1, 0)), # Training
  survival::Surv(time = c(12, 18, 25, 32),
                 event = c(1, 0, 1, 0)) # Testing
)
# Predicted risk scores
predicted <- list(
  c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
    0.8360043, 0.7375956, 0.8110551, 0.3881083), # Training predictions
  c(0.685169729, 0.003948339, 0.832916080, 0.007334147) # Testing predictions
)
timeDependentAUCMetric(evidence, predicted)
An object containing a dataset and methods for evaluating analytical tasks against ground truths for the dataset.
A Trio object
data |
The data. |
evidence |
The supporting evidence for the data. |
metrics |
The metric for evaluating tasks against the gold standards. |
cachePath |
The path to the data cache. |
dataSource |
The data repository that the data were retrieved from. |
dataSourceID |
The dataset ID for dataSource. |
evidenceSource |
The data repository that the supporting evidence was retrieved from. |
evidenceSourceID |
The dataset ID for evidenceSource. |
splitIndices |
Indices for cross-validation. |
splitSeed |
The seed used to generate the split indices. |
verbose |
Set the verbosity of Trio. Defaults to FALSE. |
description |
A description of the dataset. |
name |
The name of the Trio object, as defined in Curated Trio Datasets. |
new()
Create a Trio object
Trio$new( datasetID = NULL, data = NULL, dataLoader = NULL, evidenceID = NULL, evidence = NULL, evidenceColumns = NULL, evidenceLoader = NULL, task = NULL, metrics = NULL, cachePath = FALSE, verbose = FALSE, description = NULL, name = NULL )
datasetID |
A string specifying a dataset, either a name from curated-trio-data or a format string of the form source:source_id. |
data |
An object to use as the Trio dataset. |
dataLoader |
A custom loading function that takes the path of a downloaded file and returns a single dataset, ready to be used in evaluation tasks. |
evidenceID |
If datasetID is not an ID from the curated trio datasets spreadsheet, then a format string of the form source:source_id indicating the file to obtain the supporting evidence from. |
evidence |
A named list of lists. The top-level list is named by task type. Each lower-level list has length two, with elements named "evidence" and "metrics". The "evidence" element holds the supporting evidence and the "metrics" element holds a character vector of metric names (corresponding to the names of the list provided to the metrics parameter). |
evidenceColumns |
If specified, extract supporting evidence from columns in the loaded dataset. |
evidenceLoader |
Alternative to evidence and evidenceColumns. Extract the evidence in a flexible way. |
task |
If evidenceColumns or evidenceLoader is specified, a character vector of length 1 naming the task the evidence is for. |
metrics |
A named list of metric functions. |
cachePath |
The path to the data cache. |
verbose |
Set the verbosity of Trio. Defaults to FALSE. |
description |
A description of the dataset. |
name |
The name of the Trio object, as defined in Curated Trio Datasets. |
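The nested shape expected by the evidence argument can be easier to see written out. The sketch below builds such a list in base R and checks it with a hypothetical helper, checkEvidence(), which is not part of BenchHub and only mirrors the description above: each top-level entry must have exactly the elements "evidence" and "metrics", and every metric name must appear among the names of the metrics list.

```r
# Supporting evidence for one task: class labels named by sample.
labels <- factor(c("A", "B", "A"))
names(labels) <- paste0("sample", 1:3)

# Top level is named by task; each entry holds "evidence" and "metrics".
evidence <- list(
  class_labels = list(
    evidence = labels,
    metrics = c("macroF1Metric", "balAccMetric")
  )
)

# Hypothetical validator (illustration only, not a BenchHub function).
checkEvidence <- function(evidence, metricNames) {
  stopifnot(is.list(evidence), !is.null(names(evidence)))
  for (task in names(evidence)) {
    entry <- evidence[[task]]
    # Each entry must contain exactly the two documented elements.
    stopifnot(setequal(names(entry), c("evidence", "metrics")))
    # Every referenced metric must be available by name.
    stopifnot(all(entry$metrics %in% metricNames))
  }
  invisible(TRUE)
}

ok <- checkEvidence(evidence, metricNames = c("macroF1Metric", "balAccMetric"))
```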
addEvidence()
Add supporting evidence to the Trio.
Trio$addEvidence(name, evidence, metrics, args = NULL)
name |
A string specifying the name of the supporting evidence. |
evidence |
The supporting evidence. An object to be compared or a function to be run on the data. |
metrics |
A list of one or more metric names used to compare the supporting evidence with the input to evaluate. |
args |
A named list of parameters and values to be passed to the function. |
addMetric()
Add a metric to the Trio.
Trio$addMetric(name, metric, args = NULL)
name |
A string specifying the name of the metric. |
metric |
The metric. A function that compares the input being evaluated with the gold standard. It should have the form f(x, y, ...), where x is the "truth" and y is the output to be evaluated. If the desired metric does not have this form, supply a wrapper function around it. |
args |
A named list of parameters and values to be passed to the function. |
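To illustrate the f(x, y, ...) form described for the metric argument, here is a minimal base R sketch. All names in it (maeMetric, scoreFn, mapeMetric) are hypothetical and introduced only for this example. The first function already has the expected form; the second shows a wrapper around a function whose arguments arrive in the opposite order.

```r
# A metric already in the expected form: x is the truth, y the output.
maeMetric <- function(x, y, ...) {
  mean(abs(x - y))
}

# Hypothetical existing function with predictions first and truth second.
scoreFn <- function(predicted, truth, na.rm = FALSE) {
  mean(abs(predicted - truth) / abs(truth), na.rm = na.rm)
}

# Wrapper that reorders the arguments into the f(x, y, ...) form.
mapeMetric <- function(x, y, ...) {
  scoreFn(predicted = y, truth = x, ...)
}

truth <- c(1, 2, 4)
output <- c(1.5, 2, 3)
maeMetric(truth, output)   # (0.5 + 0 + 1) / 3 = 0.5
mapeMetric(truth, output)  # (0.5 / 1 + 0 / 2 + 1 / 4) / 3 = 0.25
```

A function written this way could then be registered with something like trio$addMetric("MAE", maeMetric), following the usage shown above.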
getMetrics()
Get metrics by supporting evidence name.
Trio$getMetrics(evidenceName)
evidenceName |
A string specifying the name of the supporting evidence. |
getEvidence()
Get supporting evidence by name.
Trio$getEvidence(name)
name |
A string specifying the name of the supporting evidence. |
evaluate()
Evaluate against gold standards
Trio$evaluate(input)
input |
A named list of objects to be evaluated against gold standards. |
split()
Create cross-validation indices.
Trio$split( y, n_fold = 5L, n_repeat = 1L, stratify = TRUE, seed = 23624482, overwrite = FALSE, ... )
y |
A variable to use for stratified sampling (e.g. supporting evidence). If stratify is FALSE, a vector with the same length as the data. |
n_fold |
Number of folds. Defaults to 5L. |
n_repeat |
Number of repeats. Defaults to 1L. |
stratify |
If TRUE, uses stratified sampling. Defaults to TRUE. |
seed |
An integer of length 1. Defaults to 23624482, which is the text "BenchHub" in vanity number form. |
overwrite |
If TRUE, overwrites the current split. Defaults to FALSE. |
... |
Additional arguments passed to splitTools::create_folds. |
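The idea behind stratified fold creation can be sketched in base R. This is not the package's implementation, which the arguments above indicate is delegated to splitTools::create_folds; the helper stratifiedFolds() is hypothetical and only shows how stratifying on y keeps class proportions similar across folds.

```r
# Hypothetical helper: assign each observation to one of k folds,
# dealing the members of every class evenly across the folds.
stratifiedFolds <- function(y, k = 5L, seed = 23624482) {
  set.seed(seed)
  fold <- integer(length(y))
  for (idx in split(seq_along(y), y)) {
    # Shuffle the indices of this class, then deal them out in turn.
    shuffled <- idx[sample.int(length(idx))]
    fold[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  fold
}

# Ten samples of class A and five of class B.
y <- factor(rep(c("A", "B"), times = c(10, 5)))
folds <- stratifiedFolds(y, k = 5L)

# Each of the five folds receives two A samples and one B sample.
table(y, folds)
```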
print()
Print method to display key information about the Trio object.
Trio$print()
writeCTD()
Write the Trio metadata to the Curated Trio Datasets sheet.
Trio$writeCTD( name, email = NULL, githubPat = NULL, description = NULL, figshareUrl = NULL, datasetFileName = NULL, evidenceFileName = NULL, dataType = NULL, skipMd5Check = FALSE )
name |
The name of the dataset to be added. |
email |
Required. Email address of the contributor for dataset update notifications. |
githubPat |
Optional GitHub Personal Access Token. If not provided and not set in environment, will prompt user. |
description |
Optional description of the dataset. If not provided and not set, will prompt user. |
figshareUrl |
Optional URL to the Figshare dataset. If not provided, will prompt user. |
datasetFileName |
Optional name of the dataset file in Figshare. If not provided, will prompt user for selection. |
evidenceFileName |
Optional name of the evidence file in Figshare. If not provided, will prompt user for selection. |
dataType |
Optional type of data. Must be one of: "omics", "clinical", "spatial", "other". If not provided, will prompt user. |
skipMd5Check |
Optional boolean to skip MD5 verification. Defaults to FALSE. |
clone()
The objects of this class are cloneable with this method.
Trio$clone(deep = FALSE)
deep |
Whether to make a deep clone. |
trio <- Trio$new("figshare:26054188/47112109", cachePath = tempdir())
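The identifier in the example above follows the source:source_id format described for datasetID. As a rough illustration of how such a string divides into its two parts, the base R sketch below splits on the first colon. The helper parseSourceID() is hypothetical and is not how BenchHub necessarily parses identifiers; a string with no colon is treated here as a curated dataset name and yields NULL.

```r
# Hypothetical helper: split "source:source_id" at the first colon.
parseSourceID <- function(id) {
  # regmatches(..., invert = TRUE) returns the text before and after the match.
  parts <- regmatches(id, regexpr(":", id, fixed = TRUE), invert = TRUE)[[1]]
  if (length(parts) < 2L) {
    return(NULL)  # no colon: assume a curated dataset name
  }
  list(source = parts[1], sourceID = parts[2])
}

parsed <- parseSourceID("figshare:26054188/47112109")
parsed$source    # "figshare"
parsed$sourceID  # "26054188/47112109"
```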
Convert a combined Trio submission to JSON
trioSubmissionToJSON(submission, pretty = TRUE)
submission |
A submission object returned by buildTrioSubmission(). |
pretty |
Whether to pretty-print the JSON. Defaults to TRUE. |
A JSON string.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
submission <- buildTrioSubmission(
  trio = trio,
  dataset_args = list(
    dataType = "omics",
    dataModality = "transcriptomics",
    technology = "RNA-seq",
    tissue = "blood",
    status = "healthy"
  ),
  task_args = list(
    taskStage = "prediction",
    taskType = "classification",
    taskName = "class_prediction"
  ),
  evidence_task_map = c(class_labels = "class_prediction")
)
json <- trioSubmissionToJSON(submission)
substr(json, 1, 20)
Computes Uno's C-Index for survival analysis.
unoCIndexMetric(evidence, predicted)
evidence |
The true survival times and event indicators. |
predicted |
The predicted survival times. |
Uno's C-Index.
# More realistic training dataset (8 patients)
evidence <- list(
  survival::Surv(time = c(5, 10, 15, 20, 25, 30, 35, 40),
                 event = c(1, 1, 0, 1, 0, 1, 1, 0)), # Training
  survival::Surv(time = c(12, 18, 25, 32),
                 event = c(1, 0, 1, 0)) # Testing
)
# Predicted risk scores
predicted <- list(
  c(0.5142118, 0.3902035, 0.9057381, 0.4469696,
    0.8360043, 0.7375956, 0.8110551, 0.3881083), # Training predictions
  c(0.685169729, 0.003948339, 0.832916080, 0.007334147) # Testing predictions
)
unoCIndexMetric(evidence, predicted)
This high-level helper provides a writeCTD()-style console workflow for
the new five-table submission path. It collects dataset, task, evidence, and
metric metadata; optionally prepares local files for upload and verifies a
Figshare article; optionally uploads custom metrics to a GitHub gist; and
returns all submission tables ready for review before calling
submitTrioSubmission().
writeSubmission(
  trio,
  n_tasks = NULL,
  dataset_defaults = list(),
  task_defaults = list(),
  evidence_defaults = list(),
  metric_defaults = list(),
  prepare_files = TRUE,
  file_args = list(),
  upload_custom_metrics = FALSE,
  githubPat = Sys.getenv("GITHUB_PAT"),
  gistPublic = TRUE,
  review = TRUE,
  submit = NULL,
  url = submission_webapp_url,
  submittedBy = NULL,
  build_payload = TRUE,
  build_json = FALSE
)
trio |
A Trio object. |
n_tasks |
Optional number of tasks to define. If |
dataset_defaults |
Optional named list passed into
|
task_defaults |
Optional named list passed into
|
evidence_defaults |
Optional named list passed into
|
metric_defaults |
Optional named list passed into
|
prepare_files |
Logical; if |
file_args |
Optional named list passed into
|
upload_custom_metrics |
Logical; if TRUE, custom metrics are uploaded to a GitHub gist. |
githubPat |
Optional GitHub personal access token. Defaults to the current GITHUB_PAT environment variable. |
gistPublic |
Logical; whether any created gist should be public. |
review |
Logical; if |
submit |
Logical; if |
url |
Optional Google Apps Script endpoint URL used when submitting. |
submittedBy |
Optional submitter email or identifier used when submitting. |
build_payload |
Logical; if |
build_json |
Logical; if |
A named list containing collected arguments, optional prepared file
and metric metadata, the final submission, and optional payload/json.
data <- data.frame(feature = c(1, 2, 3), row.names = paste0("sample", 1:3))
labels <- factor(c("A", "B", "A"))
names(labels) <- rownames(data)
trio <- Trio$new(
  data = data,
  evidence = list(class_labels = list(
    evidence = labels,
    metrics = "macroF1Metric"
  )),
  metrics = list(macroF1Metric = macroF1Metric),
  name = "example_dataset",
  description = "A small example dataset."
)
result <- writeSubmission(
  trio = trio,
  n_tasks = 1,
  dataset_defaults = list(
    dataType = "omics",
    dataModality = "transcriptomics",
    technology = "RNA-seq",
    tissue = "blood",
    status = "healthy"
  ),
  task_defaults = list(
    taskStage = "prediction",
    taskType = "classification",
    taskName = "class_prediction"
  ),
  evidence_defaults = list(
    taskName = "class_prediction",
    evidenceType = "experimental_ground_truth"
  ),
  metric_defaults = list(metricType = "label_based"),
  prepare_files = FALSE,
  build_json = FALSE,
  review = FALSE
)
names(result)