The BenchmarkStudy object is designed to encapsulate all
components needed for a benchmarking study, including the data and the
associated functions. It provides a unified structure for benchmark
developers to share their work and for method developers to interact
with an existing benchmark study.
This vignette provides a guide for both use cases under the current BenchHub submission workflow.
This section demonstrates how to create a BenchmarkStudy
object from a benchmarking study.
We begin by creating an empty BenchmarkStudy object.
A mapping function is a helper function that processes method output
into a format that can then be compared with the supporting evidence
stored in the reference Trio. There are three ways to
contribute the mapping function:
In this toy spatial transcriptomics example, the Trio
contains the following supporting evidence:
annotated_domain
celltype_proportions
We therefore define two mapping functions that extract those objects from a method result.
Example 1: extract predicted spatial domains.
# Define the mapping function
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}
# Add the mapping function
study$addMappingFunction(
  name = "annotated_domain",
  func = extract_domains,
  inputDescription = "Method output containing one predicted spatial domain label per spot.",
  outputDescription = "A vector of predicted spatial domain labels aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#result <- list(annotated_domain = c('domain_1', 'domain_1', 'domain_2', 'domain_2'))",
    "#res <- study$runMapping('annotated_domain', result)",
    "#head(res)",
    sep = "\n"
  )
)
Example 2: extract predicted cell type proportions.
# Define the mapping function
extract_celltype_props <- function(result) {
  if (is.data.frame(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.list(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  # Fallback: treat the result as raw per-spot scores and row-normalise them
  if (is.matrix(result) || is.data.frame(result)) {
    mat <- as.matrix(result)
    rs <- rowSums(mat)
    rs[rs == 0] <- 1  # avoid division by zero for all-zero rows
    return(mat / rs)
  }
  stop("Could not extract cell type proportions from the method output.")
}
# Add the mapping function; example usage is optional but recommended
study$addMappingFunction(
  name = "celltype_proportions",
  func = extract_celltype_props,
  inputDescription = "Method output containing cell type proportions per spot.",
  outputDescription = "A matrix or data frame of cell type proportions aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#props <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.2, 0.8, 0.1, 0.9), ncol = 2, byrow = TRUE)",
    "#study$runMapping('celltype_proportions', props)",
    sep = "\n"
  )
)
Like the mapping functions, the protocol function is contributed by the benchmark developer; it captures the full workflow of the benchmarking study. There are three ways to contribute the protocol function:
Once the BenchmarkStudy object includes:
the recommended next step is to run an interactive console workflow via
interactivePrepareStudySubmission(study).
# Set name and description manually
study <- BenchmarkStudy$new(name = "ST toy study")
study$description <- "Toy spatial transcriptomics study."
interactivePrepareStudySubmission(study)
In that interactive workflow, BenchHub will guide you through:
study object,
This section illustrates how a method developer can use a benchmark study object created by another user, apply their method, and evaluate its performance.
A BenchmarkStudy object can be downloaded from the
submission database through its studyID.
This returns a populated BenchmarkStudy object. For
example:
loaded_study
loaded_study$name
loaded_study$description
loaded_study$version
length(loaded_study$trios)
Inspect the list of available trios and the available mapping functions.
Each entry of loaded_study$trios is a Trio
object with supporting evidence that can be used for evaluation.
This study provides mapping functions to process method outputs into a format that can be used for evaluation.
Each mapping function has documentation.
This benchmark study aims to assess predicted spatial domains and cell type proportions.
Suppose the method developer has run a method and obtained predicted domain labels and cell type proportions for each spot.
method_output <- list(
  annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
  celltype_proportions = data.frame(
    celltype_A = c(0.9, 0.8, 0.2, 0.1),
    celltype_B = c(0.1, 0.2, 0.8, 0.9)
  )
)
The method developer can apply the mapping functions to the method output to generate the objects required for evaluation.
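As a rough sketch of this step, the snippet below applies two simplified stand-in extractors to the toy method output. The `mappers` list and its functions are hypothetical illustrations that mimic the mapping functions defined earlier; they are not part of the BenchHub API. It runs in base R without a loaded study.

```r
# Toy method output, repeated here so the snippet is self-contained
method_output <- list(
  annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
  celltype_proportions = data.frame(
    celltype_A = c(0.9, 0.8, 0.2, 0.1),
    celltype_B = c(0.1, 0.2, 0.8, 0.9)
  )
)

# Hypothetical stand-ins for the study's mapping functions (assumption:
# a real study would supply its own, registered via addMappingFunction())
mappers <- list(
  annotated_domain = function(result) result$annotated_domain,
  celltype_proportions = function(result) result$celltype_proportions
)

# Apply every mapping function to the same method output
mapped <- lapply(mappers, function(f) f(method_output))

# Inspect the structure of the mapped objects
str(mapped)
```

With a loaded study, the analogous call would presumably follow the example usage registered with each mapping function, for instance `study$runMapping('annotated_domain', method_output)`.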
Now we can compare the mapped method outputs against the supporting
evidence in the reference Trio using the evaluate function.
A call to the evaluate function takes the form
study$evaluate(trio_name, list(supporting evidence = output to compare with)).
The names in the list correspond to the supporting
evidence stored in the reference Trio.
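The shape of that second argument can be sketched as follows. The evidence names are those of the toy Trio described earlier. The trio name "toy_trio" is a hypothetical placeholder, and the evaluate() call is left commented out because it needs a study loaded from the submission database.

```r
# Mapped outputs for the toy example (same values as method_output above)
predicted_domains <- c("domain_1", "domain_1", "domain_2", "domain_2")
predicted_props <- data.frame(
  celltype_A = c(0.9, 0.8, 0.2, 0.1),
  celltype_B = c(0.1, 0.2, 0.8, 0.9)
)

# List names must match the supporting evidence stored in the reference Trio
to_evaluate <- list(
  annotated_domain = predicted_domains,
  celltype_proportions = predicted_props
)

# Guard against misspelled evidence names before evaluating
expected_evidence <- c("annotated_domain", "celltype_proportions")
stopifnot(all(names(to_evaluate) %in% expected_evidence))

# Hypothetical call; "toy_trio" is a placeholder trio name
# loaded_study$evaluate("toy_trio", to_evaluate)
```

Checking the names up front is a small design choice: a misspelled evidence name is caught locally, before any evaluation is attempted.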
This vignette demonstrated two ways that users can interact with the
BenchmarkStudy framework:
Benchmark developers: create or update a
BenchmarkStudy by adding Trio objects and
optional mapping functions with clear documentation, then prepare and
submit the study through the current Study submission workflow.
Method developers: load an existing BenchmarkStudy
from the submission database, use the Trio objects to
execute benchmarking methods of interest, use the mapping functions to
convert method outputs where needed, and evaluate those outputs against
the study’s supporting evidence using the evaluate()
function.
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scuttle_1.23.1 SingleCellExperiment_1.35.1
## [3] SummarizedExperiment_1.43.0 Biobase_2.73.1
## [5] GenomicRanges_1.65.0 Seqinfo_1.3.0
## [7] IRanges_2.47.2 S4Vectors_0.51.3
## [9] BiocGenerics_0.59.7 generics_0.1.4
## [11] MatrixGenerics_1.25.0 matrixStats_1.5.0
## [13] R6_2.6.1 glmnet_5.0
## [15] Matrix_1.7-5 lubridate_1.9.5
## [17] forcats_1.0.1 stringr_1.6.0
## [19] dplyr_1.2.1 purrr_1.2.2
## [21] readr_2.2.0 tidyr_1.3.2
## [23] tibble_3.3.1 ggplot2_4.0.3
## [25] tidyverse_2.0.0 BenchHub_0.99.15
## [27] BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3 rstudioapi_0.19.0
## [4] jsonlite_2.0.0 shape_1.4.6.1 datawizard_1.3.1
## [7] magrittr_2.0.5 ggstance_0.3.7 farver_2.1.2
## [10] rmarkdown_2.31 fs_2.1.0 vctrs_0.7.3
## [13] base64enc_0.1-6 htmltools_0.5.9 S4Arrays_1.13.0
## [16] curl_7.1.0 broom_1.0.13 cellranger_1.1.0
## [19] SparseArray_1.13.2 Formula_1.2-5 funkyheatmap_0.5.2
## [22] googlesheets4_1.1.2 sass_0.4.10 bslib_0.11.0
## [25] htmlwidgets_1.6.4 plyr_1.8.9 httr2_1.2.2
## [28] cachem_1.1.0 buildtools_1.0.0 lifecycle_1.0.5
## [31] iterators_1.0.14 pkgconfig_2.0.3 fastmap_1.2.0
## [34] rbibutils_2.4.1 digest_0.6.39 colorspace_2.1-2
## [37] patchwork_1.3.2 Hmisc_5.2-6 beachmat_2.29.0
## [40] labeling_0.4.3 timechange_0.4.0 abind_1.4-8
## [43] httr_1.4.8 polyclip_1.10-7 compiler_4.6.0
## [46] gargle_1.6.1 bit64_4.8.2 withr_3.0.3
## [49] htmlTable_2.5.0 S7_0.2.2 backports_1.5.1
## [52] BiocParallel_1.47.0 ggcorrplot_0.1.4.1 performance_0.17.0
## [55] ggforce_0.5.0 MASS_7.3-65 DelayedArray_0.39.3
## [58] rappdirs_0.3.4 ggsci_5.0.0 tools_4.6.0
## [61] foreign_0.8-91 otel_0.2.0 googledrive_2.1.2
## [64] nnet_7.3-20 glue_1.8.1 grid_4.6.0
## [67] checkmate_2.3.4 cluster_2.1.8.2 reshape2_1.4.5
## [70] gtable_0.3.6 tzdb_0.5.0 data.table_1.18.4
## [73] hms_1.1.4 XVector_0.53.0 utf8_1.2.6
## [76] ggrepel_0.9.8 foreach_1.5.2 pillar_1.11.1
## [79] vroom_1.7.1 splines_4.6.0 tweenr_2.0.3
## [82] splitTools_1.0.1 lattice_0.22-9 survival_3.8-6
## [85] bit_4.6.0 dotwhisker_0.8.4 tidyselect_1.2.1
## [88] maketools_1.3.2 knitr_1.51 gridExtra_2.3
## [91] xfun_0.59 stringi_1.8.7 yaml_2.3.12
## [94] evaluate_1.0.5 codetools_0.2-20 BiocManager_1.30.27
## [97] cli_3.6.6 rpart_4.1.27 parameters_0.29.1
## [100] Rdpack_2.6.6 jquerylib_0.1.4 Rcpp_1.1.1-1.1
## [103] survAUC_1.4-0 parallel_4.6.0 assertthat_0.2.1
## [106] bayestestR_0.18.1 marginaleffects_0.32.0 scales_1.4.0
## [109] insight_1.5.1 crayon_1.5.3 rlang_1.2.0
## [112] cowplot_1.2.0