This guide describes lute’s generics, methods, and classes for algorithms, including deconvolution and marker selection algorithms. This software and the method to rescale on cell type-specific sizes are detailed in the manuscript Maden et al. (2024). This guide may be useful to algorithm developers and researchers interested in conducting systematic algorithm benchmarks.
The class structure used by lute is based on the bluster R/Bioconductor package. It expands on that class structure by defining a hierarchy of parent classes and subclasses for algorithm params.
Many algorithms are maintained and versioned in GitHub or Zenodo rather than a routinely versioned repository such as Bioconductor or CRAN. This can prove an obstacle when tracing package development and attempting comprehensive benchmarks, as software that is not actively maintained can become deprecated over time, and not all software will use compatible dependency versions (Maden et al. (2023)).
Shared param classes can help to (1) encourage use of common Bioconductor object classes (e.g. SummarizedExperiment, DelayedArray, etc.) and (2) use more standard inputs and outputs to encourage code reuse, discourage duplicated efforts, and enable more rapid and exhaustive benchmarks.
In a general sense, the class hierarchy is a wrapper allowing access to many algorithms using a single function and shared methods. The hierarchy can also share data reformatting and preprocessing tasks, which makes it function more like a workflow.
The typemarkersParam class is the topmost parameter class for cell type gene markers. It is used to manage the marker IDs.
The deconvolutionParam class is the parent class for all deconvolution algorithm param objects. It is minimal, and simply defines slots for bulkExpression, a matrix of bulk expression data, and returnInfo, a logical value indicating whether the default algorithm output will be stored and returned with standard output from running the deconvolution() method on a valid algorithm param object.
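The parent class and single entry point described above can be sketched with base R's S4 system. This is only an illustration under stated assumptions, not lute's source code: the names demoDeconvolutionParam, demoMeanParam, and demoDeconvolution() are hypothetical, and the toy method returns column means simply to show where a returnInfo flag could change what is returned.

```r
library(methods)

# Hypothetical parent class mirroring the two slots described in the text.
setClass("demoDeconvolutionParam",
         slots = c(bulkExpression = "matrix", returnInfo = "logical"))

# Hypothetical subclass standing in for one specific algorithm.
setClass("demoMeanParam", contains = "demoDeconvolutionParam")

# One generic gives a single entry point for every algorithm subclass.
setGeneric("demoDeconvolution",
           function(object) standardGeneric("demoDeconvolution"))

# Toy method: summarizes each bulk sample; a real method would call an algorithm.
setMethod("demoDeconvolution", "demoMeanParam", function(object) {
  result <- colMeans(object@bulkExpression)
  standard <- list(predictions = result)
  # When returnInfo is TRUE, also keep the unformatted algorithm output.
  if (isTRUE(object@returnInfo)) standard$info <- list(raw = result)
  standard
})

# Made-up bulk data: rows are genes, columns are samples.
bulk <- matrix(c(1, 3, 2, 4), nrow = 2,
               dimnames = list(c("gene1", "gene2"), c("sample1", "sample2")))
param <- new("demoMeanParam", bulkExpression = bulk, returnInfo = TRUE)
output <- demoDeconvolution(param)
```

Because the method is registered on the subclass while the slots live on the parent, adding another algorithm in this sketch only requires a new subclass and a new method for the same generic.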
As shown in the class hierarchy diagram (above), referencebasedParam is a subclass inheriting attributes from deconvolutionParam, and is itself a parent class. It is meant to contain and manage all tasks shared by reference-based deconvolution algorithms, or algorithms that utilize a cell type summary dataset. This is to be distinguished from reference-free algorithms.
This param class adds slots for referenceExpression, the cell type reference data, and cellScaleFactors, an optional vector of cell type size factors used to transform the reference.
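One plausible way a vector of cell type size factors could transform a reference is to multiply each cell type column by its factor. The sketch below assumes that form of the transformation; the data, the factor values, and the variable names other than cellScaleFactors are made up for illustration, and this is not presented as lute's implementation.

```r
# Hypothetical reference: rows are marker genes, columns are cell types.
reference <- matrix(c(10, 2, 1, 8), nrow = 2,
                    dimnames = list(c("gene1", "gene2"), c("neuron", "glial")))

# Hypothetical size factors, named by cell type (values are made up).
cellScaleFactors <- c(glial = 3, neuron = 10)

# Match factors to the reference columns by name, then scale each column.
factors <- cellScaleFactors[colnames(reference)]
scaledReference <- sweep(reference, 2, factors, "*")
```

Matching the factors by name before scaling guards against the vector and the reference columns being in different orders.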
The independentbulkParam class is a subclass of referencebasedParam that specifies explicit samples used separately, such as for discrete training and test stages. This param class adds a slot for a dataset of bulk samples independent from the samples specified in bulkExpression.
lute provides a number of helper functions used to make the algorithm classes work. These include the parent classes and subclasses, and several functions to convert between object classes. These helper functions may be useful to developers. The following table lists the functions with a short summary of what each does.
function_name | description |
referenceFromSingleCellExperiment() | Makes the Z cell atlas reference from a SingleCellExperiment. |
eset_to_sce() | Convert ExpressionSet to SingleCellExperiment. |
sce_to_eset() | Convert SingleCellExperiment to ExpressionSet. |
se_to_eset() | Convert SummarizedExperiment to ExpressionSet. |
get_eset_from_matrix() | Makes an ExpressionSet from a matrix. |
parseDeconvolutionPredictionsResults() | Gets formatted predicted cell type proportions table from deconvolution results list. |
show() | Method to inspect and summarize param object contents. |
deconvolution() | Method to perform deconvolution with a param object. |
typemarkers() | Method to get cell type markers with a param object. |
deconvolutionParam() | Defines the principal parent class for all deconvolution method parameters. |
referencebasedParam() | Class and methods for managing reference-based deconvolution methods. |
independentbulkParam() | Class and methods for managing methods requiring independent bulk samples. |
typemarkersParam() | Main constructor for class to manage mappings to the typemarkers() generic. |
The param class findmarkersParam is defined for the function findMarkers() from the scran R/Bioconductor package (see ?findmarkersParam). This is a function to identify cell type marker genes from a single-cell or single-nucleus expression dataset.
The findmarkersParam class is organized under its parent classes as typemarkersParam->findmarkersParam. It includes the typemarkers() method for the identification of marker genes, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the generic defined for the findmarkersParam class.
The param class nnlsParam is defined for the function nnls() from the nnls R/CRAN package (see ?nnlsParam). Non-negative least squares (NNLS) is commonly used for deconvolution.
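To make the NNLS idea concrete, the sketch below calls nnls() from the nnls package directly on a small simulated mixture, then rescales the non-negative coefficients to proportions. The reference values, cell type names, and true proportions are made up for illustration, and the final rescaling step is an assumption of this sketch rather than a description of what nnlsParam does internally.

```r
library(nnls)

# Hypothetical reference (genes x cell types) with made-up values.
reference <- matrix(c(10, 1, 2,
                      1, 8, 3,
                      2, 2, 9,
                      5, 1, 1), nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4),
                                    c("typeA", "typeB", "typeC")))
trueProportions <- c(typeA = 0.6, typeB = 0.3, typeC = 0.1)

# Simulate one noiseless bulk sample as a mixture of the reference profiles.
bulk <- as.vector(reference %*% trueProportions)

# Solve min ||reference %*% x - bulk|| subject to x >= 0.
fit <- nnls(reference, bulk)

# Rescale the non-negative coefficients so they sum to 1.
proportions <- setNames(fit$x / sum(fit$x), colnames(reference))
```

With noiseless data and a full-rank reference, the estimated proportions should match the simulated ones closely; real bulk data would add noise and platform effects that this sketch ignores.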
The nnlsParam class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the generic defined for the nnlsParam class.
The param class bisqueParam is defined for the deconvolution function from the BisqueRNA R/CRAN package (see ?bisqueParam). The Bisque algorithm adjusts for assay-specific biases arising between the bulk and single-cell or single-nucleus platforms used to generate expression datasets for deconvolution.
The bisqueParam class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the generic defined for the bisqueParam class.
We demonstrated the extensibility and flexibility of lute’s generic, method, and class system by extending support to additional algorithms beyond the 3 described above. These algorithms can be used by sourcing the provided R/GitHub packages, which pair the classes and functions with YML files for easier dependency management.
The param class meanratiosParam is defined for the function get_mean_ratios2() from the DeconvoBuddies R/GitHub package at LieberInstitute/DeconvoBuddies. This function uses the mean of cell type summary ratios to rank and select top marker genes.
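The ranking idea can be illustrated with a simplified stand-in: for each gene, divide its mean expression in a target cell type by its highest mean expression among the other types, then sort genes by that ratio. This is a sketch under that assumption, not the get_mean_ratios2() implementation, and the data and the meanRatio() helper are hypothetical.

```r
# Hypothetical expression matrix: rows are genes, columns are cells.
expression <- matrix(c(9, 8, 1, 2, 1, 1,
                       1, 2, 7, 9, 1, 2,
                       4, 4, 4, 5, 4, 3),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("gene1", "gene2", "gene3"), NULL))
cellTypes <- c("typeA", "typeA", "typeB", "typeB", "typeC", "typeC")

# Mean expression of each gene within each cell type (genes x types).
typeMeans <- sapply(split(seq_along(cellTypes), cellTypes), function(cells) {
  rowMeans(expression[, cells, drop = FALSE])
})

# Hypothetical helper: target mean divided by the highest mean among other types.
meanRatio <- function(typeMeans, target) {
  others <- typeMeans[, colnames(typeMeans) != target, drop = FALSE]
  typeMeans[, target] / apply(others, 1, max)
}

# Rank genes for typeA; larger ratios suggest more specific markers.
ratiosA <- sort(meanRatio(typeMeans, "typeA"), decreasing = TRUE)
topMarkerA <- names(ratiosA)[1]
```

Dividing by the highest non-target mean, rather than the average of the others, penalizes genes that are also strongly expressed in any single competing cell type.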
The meanratiosParam class is organized under its parent classes as typemarkersParam->meanratiosParam. It includes the typemarkers() method for the identification of marker genes, and the show() method for inspecting the param contents.
The meanratiosParam class is available from GitHub at metamaden/meanratiosParam.
The param class deconrnaseqParam is defined for the function DeconRNASeq() from the DeconRNASeq R/Bioconductor package (see ?deconrnaseqParam). The DeconRNASeq algorithm uses weighted average expression across cell types to predict cell type amounts more accurately in heterogeneous tissues (Gong and Szustakowski).
The deconrnaseqParam class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The deconrnaseqParam
class is available from GitHub at
The param class epicParam is defined for the deconvolution function from the EPIC R/GitHub package (see ?epicParam). The EPIC algorithm was developed in blood samples and incorporates cell mRNA abundance (i.e. cell size) and variance normalizations (Racle and Gfeller).
The epicParam class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The epicParam class is available from GitHub at metamaden/epicParam.
The param class musicParam is defined for the deconvolution function from the MuSiC R/GitHub package (see ?musicParam). The MuSiC algorithm adjusts for between-source variances in reference data from multiple sources (Wang et al. (2019)).
The musicParam class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The musicParam class is available from GitHub at metamaden/musicParam.
The param class music2Param is defined for 2 implementations of the MuSiC2 algorithm from the MuSiC and MuSiC2 R/GitHub packages, respectively (see ?music2Param). The MuSiC2 algorithm pairs the features of the MuSiC algorithm with an additional filter for marker genes differentially expressed between cases and controls in the bulk and expression datasets (Fan et al.).
The music2Param class is organized under its parent classes, starting from deconvolutionParam. It includes the deconvolution() method for cell type deconvolution, and the show() method for inspecting the param contents.
The music2Param class is available from GitHub at metamaden/music2Param.
This vignette showed how lute’s classes and methods are extensible and modular, and can encourage further development with standard algorithm I/O and object class management. First, we described lute’s algorithm class hierarchy, including how its parent classes and subclasses manage common tasks shared among algorithms, and provided a table of functions developers may find useful. Further, we showed the annotated class and generic functions for algorithms supported by lute out of the box. Finally, we detailed additional algorithms supported by R/GitHub packages that may be individually installed.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/; LAPACK version 3.12.0
## locale:
## time zone: Etc/UTC
## tzcode source: system (glibc)
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
## other attached packages:
## [1] lute_1.3.0 SingleCellExperiment_1.29.1
## [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [5] GenomicRanges_1.59.1 GenomeInfoDb_1.43.3
## [7] IRanges_2.41.2 S4Vectors_0.45.2
## [9] BiocGenerics_0.53.5 generics_0.1.3
## [11] MatrixGenerics_1.19.1 matrixStats_1.5.0
## [13] BiocStyle_2.35.0
## loaded via a namespace (and not attached):
## [1] xfun_0.50 bslib_0.8.0 lattice_0.22-6
## [4] vctrs_0.6.5 tools_4.4.2 parallel_4.4.2
## [7] tibble_3.2.1 cluster_2.1.8 pkgconfig_2.0.3
## [10] BiocNeighbors_2.1.2 Matrix_1.7-1 dqrng_0.4.1
## [13] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 compiler_4.4.2
## [16] statmod_1.5.0 bluster_1.17.0 codetools_0.2-20
## [19] htmltools_0.5.8.1 sys_3.4.3 buildtools_1.0.0
## [22] sass_0.4.9 yaml_2.3.10 pillar_1.10.1
## [25] crayon_1.5.3 jquerylib_0.1.4 BiocParallel_1.41.0
## [28] limma_3.63.3 DelayedArray_0.33.4 cachem_1.1.0
## [31] abind_1.4-8 metapod_1.15.0 locfit_1.5-9.10
## [34] tidyselect_1.2.1 rsvd_1.0.5 digest_0.6.37
## [37] BiocSingular_1.23.0 dplyr_1.1.4 maketools_1.3.1
## [40] fastmap_1.2.0 grid_4.4.2 cli_3.6.3
## [43] SparseArray_1.7.4 magrittr_2.0.3 S4Arrays_1.7.1
## [46] edgeR_4.5.1 UCSC.utils_1.3.1 rmarkdown_2.29
## [49] XVector_0.47.2 httr_1.4.7 igraph_2.1.3
## [52] scran_1.35.0 ScaledMatrix_1.15.0 beachmat_2.23.6
## [55] evaluate_1.0.3 knitr_1.49 irlba_2.3.5.1
## [58] rlang_1.1.5 Rcpp_1.0.14 scuttle_1.17.0
## [61] glue_1.8.0 BiocManager_1.30.25 jsonlite_1.8.9
## [64] R6_2.5.1