library(BenchHub)

Overview

The BenchmarkStudy object encapsulates all the components of a benchmarking study, including the data and the functions associated with it. It provides a unified structure for benchmark developers to share their work and for method developers to interact with an existing benchmark study.

Benchmark developers can store Trio objects (containing the input data, metrics, and supporting evidence) together with any mapping functions, and then distribute a ready-to-use study object.
Method developers can apply their methods to the provided data and evaluate their outputs using the built-in metrics.

This vignette provides a guide for both use cases under the current BenchHub submission workflow.

For Benchmark Developer

This section demonstrates how to create a BenchmarkStudy object from a benchmarking study.

Initialising the Study

We begin by creating an empty BenchmarkStudy object and downloading an existing Trio from the submission database.

study <- BenchmarkStudy$new()

# Download an existing Trio from the submission database
example_trio <- downloadSubmissionTrio("D001", cachePath = tempdir())

example_trio

Define mapping function and protocol function

A mapping function is a helper function that converts method output into a format that can be compared with the supporting evidence stored in the reference Trio. The options for contributing a mapping function are:

Leave blank: if you don’t want to contribute now
Use existing GitHub repository: if your mapping function has been uploaded to the GitHub repository in the published paper
Upload mapping functions stored in the study object
Upload local mapping function to gist: define the mapping function locally

In this toy spatial transcriptomics example, the Trio contains the following supporting evidence:

annotated_domain
celltype_proportions

We therefore define two mapping functions that extract those objects from a method result.

Example 1: extract predicted spatial domains.

# Define the mapping function 
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}

# Add the mapping function
study$addMappingFunction(
  name = "annotated_domain",
  func = extract_domains,
  inputDescription = "Method output containing one predicted spatial domain label per spot.",
  outputDescription = "A vector of predicted spatial domain labels aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#result <- list(annotated_domain = c('domain_1', 'domain_1', 'domain_2', 'domain_2'))",
    "#res <- study$runMapping('annotated_domain', result)",
    "#head(res)",
    sep = "\n"
  )
)
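As a quick check of the behaviour described above, the following sketch calls the mapping function directly in base R, without going through the study object. The function body is repeated so the block runs on its own; the toy inputs (`as_list`, `as_df`, and the `clusters` field) are hypothetical and only illustrate the accepted input shapes and the error path.

```r
# Same logic as extract_domains above, repeated so this block runs on its own.
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}

# Hypothetical toy outputs: one label per spot, stored in two different containers.
labels <- c("domain_1", "domain_1", "domain_2", "domain_2")
as_list <- list(annotated_domain = labels)
as_df <- data.frame(
  spot = paste0("spot_", 1:4),
  annotated_domain = labels,
  stringsAsFactors = FALSE
)

from_list <- extract_domains(as_list)
from_df <- extract_domains(as_df)
identical(from_list, from_df)

# An output without the expected field triggers the informative error;
# tryCatch() captures the message instead of halting the session.
err <- tryCatch(
  extract_domains(list(clusters = labels)),
  error = function(e) conditionMessage(e)
)
err
```

Testing a mapping function on small inputs like this before registering it makes it easier to write an accurate inputDescription and exampleUsage.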

Example 2: extract predicted cell type proportions.

# Define the mapping function 
extract_celltype_props <- function(result) {
  if (is.data.frame(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.list(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.matrix(result) || is.data.frame(result)) {
    mat <- as.matrix(result)
    rs <- rowSums(mat)
    rs[rs == 0] <- 1
    return(mat / rs)
  }
  stop("Could not extract cell type proportions from the method output.")
}

# Add the mapping function. Example usage is optional but recommended.
study$addMappingFunction(
  name = "celltype_proportions",
  func = extract_celltype_props,
  inputDescription = "Method output containing cell type proportions per spot.",
  outputDescription = "A matrix or data frame of cell type proportions aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#props <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.2, 0.8, 0.1, 0.9), ncol = 2, byrow = TRUE)",
    "#study$runMapping('celltype_proportions', props)",
    sep = "\n"
  )
)
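The last branch of extract_celltype_props handles outputs that are plain matrices or data frames of unnormalised scores by dividing each row by its sum. The sketch below isolates that step in base R. The `scores` matrix is hypothetical toy data, chosen to include an all-zero row so the guard against division by zero is visible.

```r
# Hypothetical raw scores for three spots and two cell types.
scores <- matrix(
  c(2, 2,
    0, 0,
    1, 3),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("spot_", 1:3), c("celltype_A", "celltype_B"))
)

rs <- rowSums(scores)
rs[rs == 0] <- 1      # replace zero row sums so an all-zero row is not divided by zero
props <- scores / rs  # rs is recycled down each column, so row i is divided by rs[i]

props
rowSums(props)
```

With this guard, a spot whose scores are all zero keeps zero proportions rather than becoming NaN, so its row sum is 0 instead of 1. Whether that is the desired treatment for such spots is a design choice for the benchmark developer.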

Like mapping functions, the protocol function is optional. It captures the full workflow of the benchmarking study. There are three ways to contribute the protocol function:

Leave blank: if you don’t want to contribute now
Use existing protocol gist URL: if your protocol function has been uploaded to the GitHub repository in the published paper
Upload local protocol file to gist

Upload Study

Once the BenchmarkStudy object includes:

A study name and description
One or more Trio objects already represented in the submission database
Mapping functions [optional]
Protocol functions [optional]

the recommended next step is to run the interactive console workflow via interactivePrepareStudySubmission(study).

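# Set the name and description manually.
# Note: BenchmarkStudy$new() creates a fresh object, so mapping functions
# added to an earlier `study` object need to be added again.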
study <- BenchmarkStudy$new(name = "ST toy study")
study$description <- "Toy spatial transcriptomics study."

interactivePrepareStudySubmission(study)

In that interactive workflow, BenchHub will guide you through:

selecting or confirming dataset IDs to link to the study,
entering the study description,
optionally providing a protocol gist,
optionally providing a mapping-functions gist or uploading mapping functions already stored in the study object,
reviewing the submission bundle, and
optionally submitting the Study immediately.

For Method Developer

This section illustrates how a method developer can use the benchmark study object created by another user, apply their method, and evaluate its performance.

Loading the Study

A BenchmarkStudy object can be downloaded from the submission database through its studyID.

loaded_study <- downloadSubmissionStudy(studyID = "ST005", cachePath = tempdir())

This returns a populated BenchmarkStudy object. For example:

loaded_study
loaded_study$name
loaded_study$description
loaded_study$version
length(loaded_study$trios)

Inspect the available trios and mapping functions

Each entry of loaded_study$trios is a Trio object with supporting evidence that can be used for evaluation.

length(loaded_study$trios)

loaded_study$trios[[1]]

This study provides mapping functions to process method outputs into a format that can be used for evaluation.

Each mapping function has documentation.

# list the names of the mapping functions
loaded_study$listMappingFunctions()

# choose one to print the documentation 
loaded_study$printMappingFunctionDocumentation("annotated_domain")

Preparing for evaluation

This benchmark study aims to assess predicted spatial domains and cell type proportions.

Suppose the method developer has run a method and obtained predicted domain labels and cell type proportions for each spot.

method_output <- list(
  annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
  celltype_proportions = data.frame(
    celltype_A = c(0.9, 0.8, 0.2, 0.1),
    celltype_B = c(0.1, 0.2, 0.8, 0.9)
  )
)

The method developer can apply the mapping functions to the method output to generate the objects required for evaluation.

domain_pred <- loaded_study$runMapping("annotated_domain", method_output)
prop_pred <- loaded_study$runMapping("celltype_proportions", method_output)
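Before evaluating, it can help to confirm that the mapped objects are aligned to the same spots. The following is a minimal base R sketch, not part of BenchHub: `domain_pred` and `prop_pred` are assigned directly from the toy `method_output` as stand-ins for the mapped objects, and the three checks are illustrative choices rather than requirements of the package.

```r
# Toy method output, as in the text above.
method_output <- list(
  annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
  celltype_proportions = data.frame(
    celltype_A = c(0.9, 0.8, 0.2, 0.1),
    celltype_B = c(0.1, 0.2, 0.8, 0.9)
  )
)

# Stand-ins for the mapped objects (assumption: the mapping simply extracts these fields).
domain_pred <- method_output$annotated_domain
prop_pred <- method_output$celltype_proportions

n_spots <- length(domain_pred)
checks <- c(
  same_length     = nrow(prop_pred) == n_spots,                   # one row of proportions per label
  no_missing      = !anyNA(domain_pred) && !anyNA(prop_pred),     # no NA in either object
  rows_sum_to_one = all(abs(rowSums(prop_pred) - 1) < 1e-8)       # proportions are normalised
)
checks
```

If any entry of `checks` is FALSE, the method output or the mapping function is worth inspecting before the predictions are passed on for evaluation.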

Evaluate

Now we can compare the method's predictions against the supporting evidence stored in the reference Trio using the evaluate function.

The call takes the form study$evaluate(trio_name, list(evidence_name = predicted_output)).

In the call below, the names in the list correspond to the supporting evidence stored in the reference Trio.

result <- loaded_study$evaluate(loaded_study$trios[[1]]$name,  # name of the Trio to compare with
  list(
    "annotated_domain" = domain_pred,
    "celltype_proportions" = prop_pred
  ))

result
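Because the list names must correspond to the supporting evidence in the Trio, a misspelled name is an easy mistake to make. The sketch below shows one way to catch it in base R before calling evaluate. It makes two assumptions: the evidence names are hard-coded in `evidence_names` (this vignette does not show how to read them from a Trio), and `to_evaluate` is hypothetical toy input with a deliberate typo.

```r
# Names of the supporting evidence, hard-coded here as an assumption.
evidence_names <- c("annotated_domain", "celltype_proportions")

# Hypothetical mapped outputs; the second name is misspelled on purpose.
to_evaluate <- list(
  annotated_domain = c("domain_1", "domain_2"),
  celltype_proportion = matrix(c(0.9, 0.1, 0.2, 0.8), ncol = 2, byrow = TRUE)
)

unknown <- setdiff(names(to_evaluate), evidence_names)       # supplied names with no matching evidence
not_supplied <- setdiff(evidence_names, names(to_evaluate))  # evidence with no prediction supplied

unknown
not_supplied
```

Both vectors are empty when every supplied name matches a piece of supporting evidence and nothing is left out.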

Summary

This vignette demonstrated two ways that users can interact with the BenchmarkStudy framework:

Benchmark developers: create or update a BenchmarkStudy by adding Trio objects and optional mapping functions with clear documentation, then prepare and submit the study through the current Study submission workflow.
Method developers: load an existing BenchmarkStudy from the submission database, use the Trio objects to execute benchmarking methods of interest, use the mapping functions to convert method outputs where needed, and evaluate those outputs against the study’s supporting evidence using the evaluate() function.

Session Info

sessionInfo()

## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scuttle_1.23.1              SingleCellExperiment_1.35.1
##  [3] SummarizedExperiment_1.43.0 Biobase_2.73.1             
##  [5] GenomicRanges_1.65.0        Seqinfo_1.3.0              
##  [7] IRanges_2.47.2              S4Vectors_0.51.3           
##  [9] BiocGenerics_0.59.7         generics_0.1.4             
## [11] MatrixGenerics_1.25.0       matrixStats_1.5.0          
## [13] R6_2.6.1                    glmnet_5.0                 
## [15] Matrix_1.7-5                lubridate_1.9.5            
## [17] forcats_1.0.1               stringr_1.6.0              
## [19] dplyr_1.2.1                 purrr_1.2.2                
## [21] readr_2.2.0                 tidyr_1.3.2                
## [23] tibble_3.3.1                ggplot2_4.0.3              
## [25] tidyverse_2.0.0             BenchHub_0.99.15           
## [27] BiocStyle_2.41.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     sys_3.4.3              rstudioapi_0.19.0     
##   [4] jsonlite_2.0.0         shape_1.4.6.1          datawizard_1.3.1      
##   [7] magrittr_2.0.5         ggstance_0.3.7         farver_2.1.2          
##  [10] rmarkdown_2.31         fs_2.1.0               vctrs_0.7.3           
##  [13] base64enc_0.1-6        htmltools_0.5.9        S4Arrays_1.13.0       
##  [16] curl_7.1.0             broom_1.0.13           cellranger_1.1.0      
##  [19] SparseArray_1.13.2     Formula_1.2-5          funkyheatmap_0.5.2    
##  [22] googlesheets4_1.1.2    sass_0.4.10            bslib_0.11.0          
##  [25] htmlwidgets_1.6.4      plyr_1.8.9             httr2_1.2.2           
##  [28] cachem_1.1.0           buildtools_1.0.0       lifecycle_1.0.5       
##  [31] iterators_1.0.14       pkgconfig_2.0.3        fastmap_1.2.0         
##  [34] rbibutils_2.4.1        digest_0.6.39          colorspace_2.1-2      
##  [37] patchwork_1.3.2        Hmisc_5.2-6            beachmat_2.29.0       
##  [40] labeling_0.4.3         timechange_0.4.0       abind_1.4-8           
##  [43] httr_1.4.8             polyclip_1.10-7        compiler_4.6.0        
##  [46] gargle_1.6.1           bit64_4.8.2            withr_3.0.3           
##  [49] htmlTable_2.5.0        S7_0.2.2               backports_1.5.1       
##  [52] BiocParallel_1.47.0    ggcorrplot_0.1.4.1     performance_0.17.0    
##  [55] ggforce_0.5.0          MASS_7.3-65            DelayedArray_0.39.3   
##  [58] rappdirs_0.3.4         ggsci_5.0.0            tools_4.6.0           
##  [61] foreign_0.8-91         otel_0.2.0             googledrive_2.1.2     
##  [64] nnet_7.3-20            glue_1.8.1             grid_4.6.0            
##  [67] checkmate_2.3.4        cluster_2.1.8.2        reshape2_1.4.5        
##  [70] gtable_0.3.6           tzdb_0.5.0             data.table_1.18.4     
##  [73] hms_1.1.4              XVector_0.53.0         utf8_1.2.6            
##  [76] ggrepel_0.9.8          foreach_1.5.2          pillar_1.11.1         
##  [79] vroom_1.7.1            splines_4.6.0          tweenr_2.0.3          
##  [82] splitTools_1.0.1       lattice_0.22-9         survival_3.8-6        
##  [85] bit_4.6.0              dotwhisker_0.8.4       tidyselect_1.2.1      
##  [88] maketools_1.3.2        knitr_1.51             gridExtra_2.3         
##  [91] xfun_0.59              stringi_1.8.7          yaml_2.3.12           
##  [94] evaluate_1.0.5         codetools_0.2-20       BiocManager_1.30.27   
##  [97] cli_3.6.6              rpart_4.1.27           parameters_0.29.1     
## [100] Rdpack_2.6.6           jquerylib_0.1.4        Rcpp_1.1.1-1.1        
## [103] survAUC_1.4-0          parallel_4.6.0         assertthat_0.2.1      
## [106] bayestestR_0.18.1      marginaleffects_0.32.0 scales_1.4.0          
## [109] insight_1.5.1          crayon_1.5.3           rlang_1.2.0           
## [112] cowplot_1.2.0
