pipeComp is a simple framework to facilitate the comparison of pipelines involving various steps and parameters. It was initially developed to benchmark single-cell RNA sequencing (scRNAseq) pipelines:

Pierre-Luc Germain, Anthony Sonrel & Mark D. Robinson: pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single-cell RNA-seq preprocessing tools. Genome Biology 2020, doi: 10.1186/s13059-020-02136-7.

However, the framework can be applied to any other context. This vignette introduces the package and framework; for information specifically about the scRNAseq pipeline and evaluation metrics, see the pipeComp_scRNA vignette. For a completely different example, with a walkthrough of the creation of a new PipelineDefinition, see the pipeComp_dea vignette.
Install using:
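# standard Bioconductor installation (assuming the release version):
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("pipeComp")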
Because pipeComp was meant as a general pipeline benchmarking framework, we have tried to restrict the package’s dependencies to a minimum. Using the scRNA-seq pipeline and wrappers, however, requires further packages to be installed. To check whether these dependencies are met for a given PipelineDefinition and set of alternatives, see ?checkPipelinePackages.
pipeComp is built around the S4 class PipelineDefinition, which defines a basic abstract workflow with different steps (each of which can have any number of parameters, including subroutines), as depicted in Figure 1A. When a pipeline is executed, each step is consecutively applied to a starting object. Importantly, the step functions are not tied to specific methods, but should be generic; the methods to be compared are instead passed as parameters (e.g. as the names of wrapper functions defined separately from the pipeline). Often, a step function will simply apply the wrapper to the object and do no more. In addition, each step can optionally have evaluation functions, which take the output of the step as input and output evaluation metrics. Finally, functions can be specified to aggregate these metrics across multiple datasets (in case the evaluation output isn’t a data.frame or list of data.frames).
To run a benchmark, one additionally needs a set of alternative values for the parameters. For example, alternative values for the parameters depicted in Figure 1 could be defined as follows:
library(pipeComp)
alternatives <- list(
  par1=c("function_A", "function_B"),
  par2=1:3,
  par3=TRUE,
  # ...
  parN=c(10,25,50)
)
Each parameter (slot of the list) can take any number of alternative scalar values (e.g. character, numeric, logical). The names of functions, like function_A and function_B (which should be loaded in the environment), can be passed if the PipelineDefinition allows it. This means that wrappers should be created for the distinct methods, and the names of the wrapper functions passed as alternative parameter values, as in the sketch below.
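For instance, a minimal sketch of such wrappers (the bodies are hypothetical; real wrappers would call the actual methods being compared):

# wrappers must be loaded in the environment under the names used in `alternatives`
function_A <- function(x) log1p(x)
function_B <- function(x) sqrt(x)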
Given a PipelineDefinition, a list of alternative parameter values and a list of benchmark datasets, the runPipeline function (Figure 1B) proceeds through all combinations of arguments, avoiding recomputing the same step (with the same parameters) twice and compiling evaluations on the fly to avoid storing potentially large intermediate data:
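A sketch of such a call (assuming pipDef is the PipelineDefinition and datasets a named list of benchmark datasets; nthreads is optional):

res <- runPipeline(datasets, alternatives, pipDef, nthreads=2)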
Aggregated evaluation metrics for each combination of parameters, along with computing times, are returned as data.frames with the following structure:
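For instance (a sketch; the exact slots and columns depend on the pipeline):

names(res)          # e.g. "evaluation" and "elapsed" (see below)
names(res$elapsed)  # total and step-wise computing times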
In addition to the (aggregated) output returned by the function, runPipeline will produce at least one RDS file (saved according to the output.prefix argument) per dataset: the *.evaluation.rds files contain the (non-aggregated) evaluation results at each step, while the *.endOutputs.rds files (assuming saveEndResults=TRUE) contain the final output of each combination (i.e. the output of the final step).
The pipeComp package includes a PipelineDefinition for single-cell RNA sequencing (scRNAseq) data. For more information about this application and examples of real outputs, see the pipeComp_scRNA vignette.
Rather than running all possible combinations of parameters, one can run only a subset of them through the comb parameter of runPipeline. The parameter accepts either a matrix (of argument indices) or a data.frame (of factors), which can be built manually, but the simplest way is to first create all combinations and then get rid of the undesired ones:
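For example, a sketch using buildCombMatrix to create the full combination matrix from the alternatives:

comb <- buildCombMatrix(alternatives)
head(comb)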
## par1 par2 par3 parN
## 1 function_A 1 TRUE 10
## 2 function_A 1 TRUE 25
## 3 function_A 1 TRUE 50
## 4 function_A 2 TRUE 10
## 5 function_A 2 TRUE 25
## 6 function_A 2 TRUE 50
And then we could remove some combinations before passing the argument to runPipeline:
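A sketch (the filtering condition is arbitrary, for illustration only):

comb <- comb[comb$par2 != 2 | comb$parN == 10, ]
res <- runPipeline(datasets, alternatives, pipDef, comb=comb)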
The PipelineDefinition object is, minimally, a set of functions, each consecutively executed on the output of the previous one, and optionally accompanied by evaluation and aggregation functions. A simple pipeline can be constructed as follows:
my_pip <- PipelineDefinition( list(
  step1=function(x, param1){
    # do something with x and param1
    x
  },
  step2=function(x, method1, param2){
    get(method1)(x, param2) # apply method1 to x, with param2
  },
  step3=function(x, param3){
    x <- some_fancy_function(x, param3)
    # the functions can also output evaluation through the `intermediate_return` slot:
    e <- my_evaluation_function(x)
    list( x=x, intermediate_return=e)
  }
))
my_pip
## A PipelineDefinition object with the following steps:
## - step1(x, param1)
## - step2(x, method1, param2)
## - step3(x, param3)
The PipelineDefinition can also include descriptions of each step, as well as evaluation and aggregation functions. For example:
my_pip <- PipelineDefinition(
  list( step1=function(x, meth1){ get(meth1)(x) },
        step2=function(x, meth2){ get(meth2)(x) } ),
  evaluation=list( step2=function(x) c(mean=mean(x), max=max(x)) ),
  description=list( step1="This step applies meth1 to x.",
                    step2="This step applies meth2 to x.")
)
my_pip
## A PipelineDefinition object with the following steps:
## - step1(x, meth1)
## This step applies meth1 to x.
## - step2(x, meth2) *
## This step applies meth2 to x.
Running it with dummy data and functions:
datasets <- list( ds1=1:3, ds2=c(5,10,15))
alternatives <- list(meth1=c("log","sqrt"), meth2="cumsum")
tmpdir1 <- paste0(tempdir(),"/")
res <- runPipeline(datasets, alternatives, my_pip, output.prefix=tmpdir1)
## Running 2 pipeline settings on 2 datasets using 1 threads
##
## Finished running on all datasets
##
## Aggregating results...
## Aggregating evaluation results for: step2
## dataset meth1 meth2 mean max
## 1 ds1 log cumsum 0.8283022 1.791759
## 2 ds1 sqrt cumsum 2.5201593 4.146264
## 3 ds2 log cumsum 4.0471780 6.620073
## 4 ds2 sqrt cumsum 5.6352475 9.271329
Computing times can be accessed through res$elapsed; they can be accessed either as the pipeline total for each combination or in a step-wise fashion, and can be plotted using plotElapsed (see below). Evaluation results can be accessed through res$evaluation and plotted with evalHeatmap (see below).
A number of generic methods are implemented for PipelineDefinition objects, including show, names, length, [, and as.list. This means that, for instance, a step can be removed from a pipeline in the following way:
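# drop the first step using bracket subsetting:
my_pip[-1]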
## A PipelineDefinition object with the following steps:
## - step2(x, meth2) *
## This step applies meth2 to x.
Steps can be added using the addPipelineStep function:
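A sketch (assuming name and after arguments giving the new step’s name and position):

my_pip2 <- addPipelineStep(my_pip, name="newstep", after="step1")
my_pip2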
## A PipelineDefinition object with the following steps:
## - step1(x, meth1)
## This step applies meth1 to x.
## - newstep(x)
## - step2(x, meth2) *
## This step applies meth2 to x.
Functions for the new step can be specified through the slots argument of addPipelineStep, or afterwards through stepFn:
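A sketch (the step and type arguments are assumptions based on the replacement form of stepFn; see its help):

stepFn(my_pip2, step="newstep", type="functions") <- function(x, newparam){
  # do something with x and newparam
  x
}
my_pip2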
## A PipelineDefinition object with the following steps:
## - step1(x, meth1)
## This step applies meth1 to x.
## - newstep(x, newparam)
## - step2(x, meth2) *
## This step applies meth2 to x.
Finally, the arguments() method can be used to extract the arguments for each step, and the defaultArguments method can be used to get or set the default arguments.
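For instance (a sketch):

arguments(my_pip)        # arguments of each step
defaultArguments(my_pip) # get (or, with `<-`, set) the default arguments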
The previous call to runPipeline produced one evaluation file for each dataset:
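For instance (a sketch; the exact file names depend on output.prefix):

list.files(tmpdir1, pattern="evaluation\\.rds")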
## [1] "res.ds1.evaluation.rds" "res.ds2.evaluation.rds"
Their aggregation could be repeated manually by reading those files and calling aggregatePipelineResults:
ds <- list.files(tmpdir1, pattern="evaluation\\.rds", full.names = TRUE)
names(ds) <- basename(ds)
res1 <- readPipelineResults(resfiles=ds)
res <- aggregatePipelineResults(res1)
## Aggregating evaluation results for: step2
In addition, the results of different calls to runPipeline (with partially different datasets and/or parameter combinations) can be merged together, provided that the PipelineDefinition is the same. To illustrate this, we first make another runPipeline call using slightly different alternative parameter values, and save its output to a separate location:
alternatives <- list(meth1=c("log2","sqrt"), meth2="cumsum")
# use a distinct output location, so as not to overwrite the first run:
tmpdir2 <- paste0(tempdir(),"/run2/")
dir.create(tmpdir2, showWarnings=FALSE)
res <- runPipeline(datasets, alternatives, my_pip, output.prefix=tmpdir2)
## Running 2 pipeline settings on 2 datasets using 1 threads
##
## Finished running on all datasets
##
## Aggregating results...
## Aggregating evaluation results for: step2
We then load the (non-aggregated) results of each run from the files, and merge them using mergePipelineResults:
res1 <- readPipelineResults(tmpdir1)
res2 <- readPipelineResults(tmpdir2)
res <- mergePipelineResults(res1,res2)
## $res1
## ds1 ds2
## 2 2
##
## $res2
## ds1 ds2
## 2 2
##
## $merged
## ds1 ds2
## 3 3
We can then aggregate the results as we do for a single run:
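# aggregate the merged (non-aggregated) results, as for a single run:
res <- aggregatePipelineResults(res)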
## Aggregating evaluation results for: step2
## dataset meth1 meth2 mean max
## 1 ds1 log cumsum 0.8283022 1.791759
## 2 ds1 log2 cumsum 1.194988 2.584963
## 3 ds1 sqrt cumsum 2.520159 4.146264
## 4 ds2 log cumsum 4.047178 6.620073
## 5 ds2 log2 cumsum 5.838844 9.550747
## 6 ds2 sqrt cumsum 5.635248 9.271329
Errors typically come from the method wrappers, and debugging them from within runPipeline can be complicated by multithreading. For this reason, runPipeline wraps additional information around error messages. Here is an example of such an error report:
Error in dataset `mydataset` with parameters:
filt=lenient, norm=norm.test, k=20
in step `normalization`, evaluating command:
`pipDef[["normalization"]](x = x, norm = "norm.test")`
Error:
Error in get(norm)(x): not enough cells!
The offending dataset, pipeline step and exact set of parameters are reported, enabling you to reproduce the error outside runPipeline, where it is easier to debug.
A useful behavior is for runPipeline to continue when encountering errors, so that successful results are not lost. This behavior can be enabled with the skipErrors=TRUE argument. We can illustrate this with our dummy pipeline above, by adding an alternative that is bound to give rise to an error:
fragile.fun <- function(x){
if(any(x>10)) stop("Too big!")
2^x
}
alternatives$meth1 <- c(alternatives$meth1, "fragile.fun")
res <- runPipeline(datasets, alternatives, my_pip, output.prefix=tmpdir1, skipErrors=TRUE)
## Running 3 pipeline settings on 2 datasets using 1 threads
## Error in dataset `ds2` with parameters:
## meth1=fragile.fun, meth2=cumsum
## in step `step1`, evaluating command:
## `pipDef[["step1"]](x = x, meth1 = "fragile.fun")`
## Error:
## Error in get(meth1)(x): Too big!
##
## Finished running on all datasets
##
## Some errors were encountered during the run:
## dataset step
## 1 ds2 meth1=fragile.fun
## All results were saved, but only those of analyses successful across all datasets will be retained for aggregation.
##
## Aggregating results...
## Aggregating evaluation results for: step2
As indicated in the error message, only the analyses successful across all datasets were aggregated:
## dataset meth1 meth2 mean max
## 1 ds1 log2 cumsum 1.194988 2.584963
## 2 ds1 sqrt cumsum 2.520159 4.146264
## 3 ds2 log2 cumsum 5.838844 9.550747
## 4 ds2 sqrt cumsum 5.635248 9.271329
However, all successful results (e.g. including ‘fragile.fun’ on ‘ds1’) were saved in the non-aggregated files. This means that one can then correct the problematic function, re-run only the analyses that failed (i.e. only on ‘ds2’ in this case), and merge the results with the first batch (see ?mergePipelineResults).
pipeComp has two general plotting functions which can be used with any PipelineDefinition: evalHeatmap and plotElapsed. In addition, the package includes pipeline-specific plotting functions (see the pipeComp_scRNA and pipeComp_dea vignettes).
Running times can be plotted with the plotElapsed function:
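A sketch (agg.by=FALSE shows each tested combination separately, as discussed below):

plotElapsed(res, agg.by=FALSE)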
In this case we show all tested combinations, but in most cases the agg.by argument will be used to aggregate the results by parameters of interest.
Evaluation results can be plotted through the evalHeatmap function:
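A sketch (the what argument, selecting the metrics to display, is an assumption; refer to the function’s help):

evalHeatmap(res, what=c("mean","max"))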
By default, the heatmap prints the actual metric values, while the colors are mapped to scaled data. Specifically, pipeComp maps colors to the signed square-root of the number of (matrix-wise) Median Absolute Deviations from the (column-wise) median. We found this mapping to be particularly useful because differences in color have a comparable meaning across datasets, and the color scale does not primarily capture outliers or baseline differences between datasets. This behavior can be changed with the scale parameter of the function. For more complex examples of using evalHeatmap, see its usage in the pipeComp_scRNA vignette, or refer to the function’s help.
For a more complex, ‘real-life’ example of the creation of a PipelineDefinition, see the pipeComp_dea vignette. For a complex example of evaluation outputs, see the pipeComp_scRNA vignette.