Package 'pipeComp' reference manual

Title:	pipeComp pipeline benchmarking framework
Description:	A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
Authors:	Pierre-Luc Germain [cre, aut] , Anthony Sonrel [aut] , Mark D. Robinson [aut, fnd]
Maintainer:	Pierre-Luc Germain <[email protected]>
License:	GPL
Version:	1.17.0
Built:	2025-04-02 06:11:15 UTC
Source:	https://github.com/bioc/pipeComp

pipeComp - a framework for pipeline benchmarking

Description

pipeComp is a simple framework to facilitate the comparison of pipelines involving various steps and parameters. It was initially developed to benchmark single-cell RNA sequencing pipelines, and contains pre-defined PipelineDefinitions and functions to that effect, but could be applied to any context. See 'vignette("pipeComp")' for an introduction.

Author(s)

Pierre-Luc Germain [email protected]

Anthony Sonrel [email protected]

Mark D. Robinson [email protected]

addPipelineStep

Description

Add a step to an existing PipelineDefinition

Usage

addPipelineStep(object, name, after = NULL, slots = list())
addPipelineStep(object, name, after = NULL, slots = list())

Arguments

`object`	A `PipelineDefinition`
`name`	The name of the step to add
`after`	The name of the step after which to add the new step. If NULL, will add the step at the beginning of the pipeline.
`slots`	A optional named list with slots to fill for that step (i.e. 'functions', 'evaluation', 'aggregation', 'descriptions' - will be parsed)

Value

A PipelineDefinition

Examples

pd <- mockPipeline()
pd
pd <- addPipelineStep(pd, name="newstep", after="step1", 
                      slots=list(description="Step that does nothing..."))
pd
pd <- mockPipeline()
pd
pd <- addPipelineStep(pd, name="newstep", after="step1", 
                      slots=list(description="Step that does nothing..."))
pd

aggregatePipelineResults

Description

Aggregates the evaluation and running times of 'runPipeline' results. Results should be indicated either as a 'path“ prefix or as a vector of paths to 'evaluation\.rds' files ('resfiles').

Usage

aggregatePipelineResults(res, pipDef = NULL)
aggregatePipelineResults(res, pipDef = NULL)

Arguments

`res`	A (named) list of results (per dataset), as produced by `readPipelineResults` (or 'mergePipelineResults').
`pipDef`	An optional `PipelineDefinition` containing the aggregation methods. If omitted, that from the results will be used.

Value

A list with a slot for each step for which there is an aggregation method, or (if no aggregation method available) a list of the 'stepIntermediateReturnObjects' of 'runPipeline'

Examples

# we produce mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# we read the evaluation files:
res <- readPipelineResults(tmpdir1)
# we aggregate the results (equivalent to the output of `runPipeline`):
res <- aggregatePipelineResults(res)
# we produce mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# we read the evaluation files:
res <- readPipelineResults(tmpdir1)
# we aggregate the results (equivalent to the output of `runPipeline`):
res <- aggregatePipelineResults(res)

buildCombMatrix

Description

Builds a matrix of parameter combinations from a list of alternative values.

Usage

buildCombMatrix(alt, returnIndexMatrix = FALSE)
buildCombMatrix(alt, returnIndexMatrix = FALSE)

Arguments

`alt`	A named list of alternative parameter values
`returnIndexMatrix`	Logical; whether to return a matrix of indices, rather than a data.frame of factors.

Value

a matrix or data.frame

Examples

buildCombMatrix(list(param1=LETTERS[1:3], param2=1:2))
buildCombMatrix(list(param1=LETTERS[1:3], param2=1:2))

checkPipelinePackages

Description

Checks whether the packages required by a pipeline and its alternative methods are available.

Usage

checkPipelinePackages(alternatives, pipDef = NULL)
checkPipelinePackages(alternatives, pipDef = NULL)

Arguments

`alternatives`	A named list of alternative parameter values
`pipDef`	An object of class 'PipelineDefinition'.

Value

Logical.

Examples

checkPipelinePackages(list(argument1="mean"), scrna_pipeline())
checkPipelinePackages(list(argument1="mean"), scrna_pipeline())

Correlations across clustering evaluation metrics

Description

A list of two matrices containing, respectively, the Pearson and Spearman pairwise correlations between various clustering evalution metrics, computed across a wide range of scRNAseq clustering analyses (see reference).

Value

a list.

References

See https://doi.org/10.1101/2020.02.02.930578

colCenterScale

Description

Matrix scaling by centering columns separately and then performing variance scaling on the whole matrix, in a NA-robust fashion. With the default arguments, the output will be the number of (matrix-)median absolute deviations from the column-median.

Usage

colCenterScale(
  x,
  centerFn = median,
  scaleFn = function(x, na.rm) median(abs(x), na.rm = na.rm)
)
colCenterScale(
  x,
  centerFn = median,
  scaleFn = function(x, na.rm) median(abs(x), na.rm = na.rm)
)

Arguments

`x`	A numeric matrix.
`centerFn`	The function for calculating centers. Should accept the 'na.rm' argument. E.g. 'centerFn=mean' or 'centerFn=median'.
`scaleFn`	The function for calculating the (matrix-wise) scaling factor. Should accept the 'na.rm' argument. Default 'median(abs(x))'.

Value

A scaled matrix of the same dimensions as 'x'.

Examples

# random data with column mean differences
d <- cbind(A=rnorm(5, 10, 2), B=rnorm(5, 20, 2), C=rnorm(5,30, 2))
colCenterScale(d)
# random data with column mean differences
d <- cbind(A=rnorm(5, 10, 2), B=rnorm(5, 20, 2), C=rnorm(5,30, 2))
colCenterScale(d)

Lists of control genes

Description

Lists of mouse and human control genes (mitochondrial, ribosomal, protein-coding), as ensembl gene ids or official symbols, for computing cell QC.

Value

a list.

dea_evalPlot_curve

Description

dea_evalPlot_curve

Usage

dea_evalPlot_curve(
  res,
  scales = "free",
  agg.by = NULL,
  agg.fn = mean,
  xlim = c(NA, NA),
  colourBy = "method",
  shapeBy = NULL,
  pointsize = 4
)
dea_evalPlot_curve(
  res,
  scales = "free",
  agg.by = NULL,
  agg.fn = mean,
  xlim = c(NA, NA),
  colourBy = "method",
  shapeBy = NULL,
  pointsize = 4
)

Arguments

`res`	Aggregated results of the DEA pipeline
`scales`	Passed to 'facet_grid'
`agg.by`	Aggregate results by these columns (default no aggregation)
`agg.fn`	Function for aggregation (default mean)
`xlim`	Optional vector of x limits
`colourBy`	Name of column by which to colour
`shapeBy`	Name of column determining the shape of the points. If omitted, the shape will indicate whether the nominal FDR is below or equal the real FDR.
`pointsize`	Size of the points

Value

A ggplot.

Examples

data("exampleDEAresults", package="pipeComp")
dea_evalPlot_curve(exampleDEAresults, agg.by=c("sva.method"))
data("exampleDEAresults", package="pipeComp")
dea_evalPlot_curve(exampleDEAresults, agg.by=c("sva.method"))

dea_pipeline

Description

The 'PipelineDefinition' for bulk RNAseq differential expression analysis (DEA).

Usage

dea_pipeline()
dea_pipeline()

Value

A 'PipelineDefinition' object to be used with 'runPipeline'.

Examples

pip <- dea_pipeline()
pip
pip <- dea_pipeline()
pip

defaultStepAggregation

Description

defaultStepAggregation

Usage

defaultStepAggregation(x)
defaultStepAggregation(x)

Arguments

`x`	A list of results per dataset, each containing a list (1 element per combination of parameters) of evaluation metrics (coercible to vectors or matrix).

Value

A data.frame.

evalHeatmap

Description

General heatmap representation of aggregated evaluation results. By default, the actual metric values are printed in the cells, and while the coloring is determined by colCenterScale (number of matrix-median absolute deviations from the column means). Unless the total number of analyses is small, it is strongly recommended to use the 'agg.by' argument to limit the size and improve the readability of the heatmap.

Usage

evalHeatmap(
  res,
  step = NULL,
  what,
  what2 = NULL,
  agg.by = NULL,
  agg.fn = mean,
  filterExpr = NULL,
  scale = "colCenterScale",
  value_format = "%.2f",
  reorder_rows = FALSE,
  show_heatmap_legend = FALSE,
  show_column_names = FALSE,
  col = NULL,
  font_factor = 0.9,
  row_split = NULL,
  shortNames = TRUE,
  value_cols = c("black", "white"),
  title = NULL,
  name = NULL,
  anno_legend = TRUE,
  ...
)
evalHeatmap(
  res,
  step = NULL,
  what,
  what2 = NULL,
  agg.by = NULL,
  agg.fn = mean,
  filterExpr = NULL,
  scale = "colCenterScale",
  value_format = "%.2f",
  reorder_rows = FALSE,
  show_heatmap_legend = FALSE,
  show_column_names = FALSE,
  col = NULL,
  font_factor = 0.9,
  row_split = NULL,
  shortNames = TRUE,
  value_cols = c("black", "white"),
  title = NULL,
  name = NULL,
  anno_legend = TRUE,
  ...
)

Arguments

`res`	Aggregated pipeline results (i.e. the output of 'runPipeline' or 'aggregateResults')
`step`	Name of the step for which to plot the evaluation results. If unspecified, will use the latest step that has evaluation results.
`what`	What metric to plot.
`what2`	If the step has more than one benchmark data.frame, which one to use. The function will attempt to guess that automatically based on 'what', and will notify in case of ambiguity.
`agg.by`	Aggregate results by these columns (default no aggregation)
`agg.fn`	Function for aggregation (default mean)
`filterExpr`	An optional filtering expression based on the columns of the target dataframe, (e.g. 'filterExpr=param1=="value1"').
`scale`	Controls the scaling of the columns for the color mapping. Can either be a logical (TRUE will use NA-safe column z-scores, FALSE will not scale) or a function performing the scaling. The default uses the 'colCenterScale' function (per-column centering, but per-matrix variance scaling).
`value_format`	Format for displaying cells' values (use 'value_format=""' to disable)
`reorder_rows`	Logical; whether to sort rows (default FALSE). The row names themselves can also be passed to specify an order, or a 'ComplexHeatmap'.
`show_heatmap_legend`	Passed to 'Heatmap' (default FALSE)
`show_column_names`	Passed to 'Heatmap' (default FALSE)
`col`	Colors for the heatmap, or a color-mapping function as produced by 'colorRamp2'. If passing a vector of colors and the data is scaled, there should be an odd number of colors. By default, will apply linear mapping (if the data is not scaled) or signed sqrt mapping (if scaled) on the 'viridisLite::inferno' palette.
`font_factor`	A scaling factor applied to fontsizes (default 1)
`row_split`	Optional column (included in 'agg.by') by which to split the rows. Alternatively, an expression using the columns (retained after aggregation) can be passed.
`shortNames`	Logical; whether to use short row names (with only the parameter values instead of the parameter name and value pairs), default TRUE.
`value_cols`	A vector of length 2 indicating the colors of the values (above and below the mean), if printed
`title`	Plot title
`name`	Heatmap name (e.g. used for the legend)
`anno_legend`	Logical; whether to plot the legend for the datasets
`...`	Passed to 'Heatmap'

Value

A Heatmap

Examples

data("exampleResults", package="pipeComp")
evalHeatmap( exampleResults, step="clustering", what=c("ARI","MI","min_pr"), 
             agg.by=c("filt", "norm"), row_split = "norm" ) +
evalHeatmap( exampleResults, step="clustering", what="ARI", 
             agg.by=c("filt", "norm"), filterExpr=n_clus==true.nbClusts, 
             name="ARI at true k", title="ARI at
true K" )
data("exampleResults", package="pipeComp")
evalHeatmap( exampleResults, step="clustering", what=c("ARI","MI","min_pr"), 
             agg.by=c("filt", "norm"), row_split = "norm" ) +
evalHeatmap( exampleResults, step="clustering", what="ARI", 
             agg.by=c("filt", "norm"), filterExpr=n_clus==true.nbClusts, 
             name="ARI at true k", title="ARI at
true K" )

evaluateClustering

Description

Evaluates a clustering using 'true' labels. Entries with missing true labels (i.e. NA) are excluded from calculations. If using 'evaluteClustering' in a custom pipeline, you might want to use the corresponding 'pipeComp:::.aggregateClusterEvaluation' aggregation function.

Usage

evaluateClustering(x, tl = NULL)
evaluateClustering(x, tl = NULL)

Arguments

`x`	The clustering labels
`tl`	The true labels

Value

A numeric vector of metrics (see the 'pipeComp_scRNA' vignette for details)

Examples

# random data
dat <- data.frame( 
 cluster=rep(LETTERS[1:3], each=10),
 x=c(rnorm(20, 0), rnorm(10, 1)),
 y=c(rnorm(10, 1), rnorm(20, 0))
)
# clustering
dat$predicted <- kmeans(dist(dat[,-1]),3)$cluster
# evaluation
evaluateClustering(dat$predicted, dat$cluster)
# random data
dat <- data.frame( 
 cluster=rep(LETTERS[1:3], each=10),
 x=c(rnorm(20, 0), rnorm(10, 1)),
 y=c(rnorm(10, 1), rnorm(20, 0))
)
# clustering
dat$predicted <- kmeans(dist(dat[,-1]),3)$cluster
# evaluation
evaluateClustering(dat$predicted, dat$cluster)

evaluateDEA

Description

Evaluates a differential expression analysis (DEA).

Usage

evaluateDEA(dea, truth = NULL, th = c(0.01, 0.05, 0.1))
evaluateDEA(dea, truth = NULL, th = c(0.01, 0.05, 0.1))

Arguments

`dea`	Expects a data.frame with logFC and FDR, as produced by 'edgeR::topTags', 'limma::topTable' or 'DESeq2::results'.
`truth`	A data.frame containing the columns 'expected.beta' (real logFC) and 'isDE' (logical indicating whether there is a difference or not; accepts NA values)
`th`	The significance thresholds for which to compute the metrics.

Value

A list with two slots: 'logFC' (vector of metrics on logFC) and 'significance' table of significance-related statistics.

Examples

# fake DEA results
dea <- data.frame( row.names=paste0("gene",1:10), logFC=rnorm(10) )
dea$PValue <- dea$FDR <- c(2:8/100, 0.2, 0.5, 1)
truth <- data.frame( row.names=paste0("gene",1:10), expected.beta=rnorm(10),
                     isDE=rep(c(TRUE,FALSE,TRUE,FALSE), c(3,1,2,4)) )
evaluateDEA(dea, truth)
# fake DEA results
dea <- data.frame( row.names=paste0("gene",1:10), logFC=rnorm(10) )
dea$PValue <- dea$FDR <- c(2:8/100, 0.2, 0.5, 1)
truth <- data.frame( row.names=paste0("gene",1:10), expected.beta=rnorm(10),
                     isDE=rep(c(TRUE,FALSE,TRUE,FALSE), c(3,1,2,4)) )
evaluateDEA(dea, truth)

evaluateDimRed

Description

Gathers evaluation statistics on a reduced space using known cell labels. If using 'evaluteDimRed' in a custom pipeline, you will probably want to use 'pipeComp:::.aggregateDR' as the corresponding aggregation function.

Usage

evaluateDimRed(x, clusters = NULL, n = c(10, 20, 50), covars)
evaluateDimRed(x, clusters = NULL, n = c(10, 20, 50), covars)

Arguments

`x`	The matrix of the reduced space, with cells as rows and components as columns
`clusters`	The vector indicating each cell's cluster.
`n`	A numeric vector indiciating the number of top dimensions at which to gather statistics (default 'c(10,20,50)'). Will use all available dimensions if a higher number is given.
`covars`	A character vectors containing any additional covariates (column names of 'colData') to track during evalutation. If missing, will attempt to use default covariates. To disable, set 'covars=c()'.

Value

A list with the following components: * silhouettes: a matrix of the silhouette for each cell-cluster pair at each value of 'n' * clust.avg.silwidth: a matrix of the cluster average width at each value of 'n' * R2: the proportion of variance in each component (up to 'max(n)') that is explained by the clusters (i.e. R-squared of a linear model).

Examples

# random data
library(scater)
sce <- runPCA(logNormCounts(mockSCE(ngenes = 500)))
sce <- addPerCellQC(sce)
# random population labels
sce$cluster <- sample(LETTERS[1:3], ncol(sce), replace=TRUE)
res <- evaluateDimRed(sce, sce$cluster, covars=c("sum","detected"))
# average silhouette widths:
res$clust.avg.silwidth
# adjusted R2 of covariates:
res$covar.adjR2
# random data
library(scater)
sce <- runPCA(logNormCounts(mockSCE(ngenes = 500)))
sce <- addPerCellQC(sce)
# random population labels
sce$cluster <- sample(LETTERS[1:3], ncol(sce), replace=TRUE)
res <- evaluateDimRed(sce, sce$cluster, covars=c("sum","detected"))
# average silhouette widths:
res$clust.avg.silwidth
# adjusted R2 of covariates:
res$covar.adjR2

evaluateNorm

Description

evaluateNorm

Usage

evaluateNorm(x, clusters = NULL, covars)
evaluateNorm(x, clusters = NULL, covars)

Arguments

`x`	An object of class 'Seurat' or 'SingleCellExperiment' with normalized data
`clusters`	A vector of true cluster identities. If missing, will attempt to fetch it from the 'phenoid' colData.
`covars`	Covariates to include, either as data.frame or as colData columns of 'x'

Value

a data.frame.

Examples

# random data
library(scater)
sce <- logNormCounts(mockSCE(ngenes = 500))
sce <- addPerCellQC(sce)
# random population labels
sce$cluster <- sample(LETTERS[1:3], ncol(sce), replace=TRUE)
evaluateNorm(sce, sce$cluster, covars="detected")
# random data
library(scater)
sce <- logNormCounts(mockSCE(ngenes = 500))
sce <- addPerCellQC(sce)
# random population labels
sce$cluster <- sample(LETTERS[1:3], ncol(sce), replace=TRUE)
evaluateNorm(sce, sce$cluster, covars="detected")

Example results from the DEA pipeline

Description

Example benchmarking results from a DEA pipeline (see vignette 'pipeComp_dea').

Value

a list.

Example pipeline results

Description

Example benchmarking results from a scRNAseq pipeline (see vignette 'pipeComp_scRNA').

Value

a list.

farthestPoint

Description

Identifies the point farthest from a line passing through by the first and last points. Used for automatization of the elbow method.

Usage

farthestPoint(y, x = NULL)
farthestPoint(y, x = NULL)

Arguments

`y`	Monotonically inscreasing or decreasing values
`x`	Optional x coordinates corresponding to 'y' (defaults to seq)

Value

The value of 'x' farthest from the diagonal.

Examples

y <- 2^(10:1)
plot(y)
x <- farthestPoint(y)
points(x,y[x],pch=16)
y <- 2^(10:1)
plot(y)
x <- farthestPoint(y)
points(x,y[x],pch=16)

getDimensionality

Description

Returns the estimated intrinsic dimensionality of a dataset.

Usage

getDimensionality(dat, method, maxDims = NULL)
getDimensionality(dat, method, maxDims = NULL)

Arguments

`dat`	A Seurat or SCE object with a pca embedding.
`method`	The dimensionality method to use.
`maxDims`	Deprecated and ignored.

Value

An integer.

getQualitativePalette

Description

Returns a qualitative color palette of the given size. If less than 23 colors are required, the colors are based on Paul Tol's palettes. If more, the 'randomcoloR' package is used.

Usage

getQualitativePalette(nbcolors)
getQualitativePalette(nbcolors)

Arguments

nbcolors

A positive integer indicating the number of colors

Value

A vector of colors

Examples

getQualitativePalette(5)
getQualitativePalette(5)

match_evaluate_multiple

Description

Function to match cluster labels with 'true' clusters using the Hungarian algorithm, and return precision, recall, and F1 score. Written by Lukas Weber in August 2016 as part of his cytometry clustering comparison, with just slight modifications on initial handling of input arguments.

Usage

match_evaluate_multiple(clus_algorithm, clus_truth = NULL)
match_evaluate_multiple(clus_algorithm, clus_truth = NULL)

Arguments

`clus_algorithm`	cluster labels from algorithm
`clus_truth`	true cluster labels. If NULL, will attempt to read them from the names of 'clus_algorithm' (expecting the format 'clusterName.cellName')

Value

A list.

Examples

# random data
dat <- data.frame( 
 cluster=rep(LETTERS[1:3], each=10),
 x=c(rnorm(20, 0), rnorm(10, 1)),
 y=c(rnorm(10, 1), rnorm(20, 0))
)
# clustering
dat$predicted <- kmeans(dist(dat[,-1]),3)$cluster
# evaluation
match_evaluate_multiple(dat$predicted, dat$cluster)
# random data
dat <- data.frame( 
 cluster=rep(LETTERS[1:3], each=10),
 x=c(rnorm(20, 0), rnorm(10, 1)),
 y=c(rnorm(10, 1), rnorm(20, 0))
)
# clustering
dat$predicted <- kmeans(dist(dat[,-1]),3)$cluster
# evaluation
match_evaluate_multiple(dat$predicted, dat$cluster)

mergePipelineResults

Description

Merges the (non-aggregated) results of any number of runs of 'runPipeline' using the same PipelineDefinition (but on different datasets and/or using different parameters). First read the different sets of results using readPipelineResults, and pass them to this function.

Usage

mergePipelineResults(..., rr = NULL, verbose = TRUE)
mergePipelineResults(..., rr = NULL, verbose = TRUE)

Arguments

`...`	Any number of lists of pipeline results, each as produced by `readPipelineResults`
`rr`	Alternatively, the pipeline results can be passed as a list (in which case '...' is ignored)
`verbose`	Whether to print processing information

Value

A list of merged pipeline results.

Examples

# we produce 2 mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
alternatives <- list(meth1=c('log2','sqrt'), meth2='cumsum')
tmpdir2 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, alternatives, pip, output.prefix=tmpdir2)
# we read the evaluation files:
res1 <- readPipelineResults(tmpdir1)
res2 <- readPipelineResults(tmpdir2)
# we merge them:
res <- mergePipelineResults(res1,res2)
# and we aggregate:
res <- aggregatePipelineResults(res)
# we produce 2 mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
alternatives <- list(meth1=c('log2','sqrt'), meth2='cumsum')
tmpdir2 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, alternatives, pip, output.prefix=tmpdir2)
# we read the evaluation files:
res1 <- readPipelineResults(tmpdir1)
res2 <- readPipelineResults(tmpdir2)
# we merge them:
res <- mergePipelineResults(res1,res2)
# and we aggregate:
res <- aggregatePipelineResults(res)

mockPipeline

Description

A mock 'PipelineDefinition' for use in examples.

Usage

mockPipeline()
mockPipeline()

Value

a 'PipelineDefinition'

Examples

mockPipeline()
mockPipeline()

parsePipNames

Description

Parses the names of analyses performed through 'runPipeline' to extract a data.frame of parameter values (with decent classes).

Usage

parsePipNames(x, setRowNames = FALSE, addcolumns = NULL)
parsePipNames(x, setRowNames = FALSE, addcolumns = NULL)

Arguments

`x`	The names to parse, or a data.frame with the names to parse as row.names. All names are expected to contain the same parameters.
`setRowNames`	Logical; whether to set original names as row.names of the output data.frame (default FALSE)
`addcolumns`	An optional data.frame of 'length(x)' rows to cbind to the output.

Value

A data.frame

Examples

my_names <- c("param1=A;param2=5","param1=B;param2=0")
parsePipNames(my_names)
my_names <- c("param1=A;param2=5","param1=B;param2=0")
parsePipNames(my_names)

PipelineDefinition

Description

Creates on object of class 'PipelineDefinition' containing step functions, as well as optionally step evaluation and aggregation functions.

Usage

PipelineDefinition(
  functions,
  descriptions = NULL,
  evaluation = NULL,
  aggregation = NULL,
  initiation = identity,
  defaultArguments = list(),
  misc = list(),
  verbose = TRUE
)
PipelineDefinition(
  functions,
  descriptions = NULL,
  evaluation = NULL,
  aggregation = NULL,
  initiation = identity,
  defaultArguments = list(),
  misc = list(),
  verbose = TRUE
)

Arguments

`functions`	A list of functions for each step
`descriptions`	A list of descriptions for each step
`evaluation`	A list of optional evaluation functions for each step
`aggregation`	A list of optional aggregation functions for each step
`initiation`	A function ran when initiating a dataset
`defaultArguments`	A lsit of optional default arguments
`misc`	A list of whatever.
`verbose`	Whether to output additional warnings (default TRUE).

Value

An object of class 'PipelineDefinition', with the slots functions, descriptions, evaluation, aggregation, defaultArguments, and misc.

Examples

PipelineDefinition(
  list( step1=function(x, meth1){ get(meth1)(x) },
        step2=function(x, meth2){ get(meth2)(x) } )
)
PipelineDefinition(
  list( step1=function(x, meth1){ get(meth1)(x) },
        step2=function(x, meth2){ get(meth2)(x) } )
)

Methods for `PipelineDefinition` class

Description

Methods for PipelineDefinition class

get names of PipelineDefinition steps

set names of PipelineDefinition steps

Usage

## S4 method for signature 'PipelineDefinition'
show(object)

## S4 method for signature 'PipelineDefinition'
names(x)

## S4 replacement method for signature 'PipelineDefinition'
names(x) <- value

## S4 method for signature 'PipelineDefinition'
x$name

## S4 method for signature 'PipelineDefinition'
length(x)

## S4 method for signature 'PipelineDefinition,ANY,ANY,ANY'
x[i]

## S4 method for signature 'PipelineDefinition'
as.list(x)

arguments(object)

## S4 method for signature 'PipelineDefinition'
arguments(object)

defaultArguments(object)

defaultArguments(object) <- value

## S4 method for signature 'PipelineDefinition'
defaultArguments(object)

## S4 replacement method for signature 'PipelineDefinition'
defaultArguments(object) <- value

stepFn(object, step = NULL, type)

## S4 method for signature 'PipelineDefinition'
stepFn(object, step = NULL, type)

stepFn(object, step, type) <- value

## S4 replacement method for signature 'PipelineDefinition'
stepFn(object, step, type) <- value
## S4 method for signature 'PipelineDefinition'
show(object)

## S4 method for signature 'PipelineDefinition'
names(x)

## S4 replacement method for signature 'PipelineDefinition'
names(x) <- value

## S4 method for signature 'PipelineDefinition'
x$name

## S4 method for signature 'PipelineDefinition'
length(x)

## S4 method for signature 'PipelineDefinition,ANY,ANY,ANY'
x[i]

## S4 method for signature 'PipelineDefinition'
as.list(x)

arguments(object)

## S4 method for signature 'PipelineDefinition'
arguments(object)

defaultArguments(object)

defaultArguments(object) <- value

## S4 method for signature 'PipelineDefinition'
defaultArguments(object)

## S4 replacement method for signature 'PipelineDefinition'
defaultArguments(object) <- value

stepFn(object, step = NULL, type)

## S4 method for signature 'PipelineDefinition'
stepFn(object, step = NULL, type)

stepFn(object, step, type) <- value

## S4 replacement method for signature 'PipelineDefinition'
stepFn(object, step, type) <- value

Arguments

`object`	An object of class `PipelineDefinition`
`x`	An object of class `PipelineDefinition`
`value`	Replacement values
`name`	The step name
`i`	The index(es) of the steps
`step`	The name of the step for which to set or get the function
`type`	The type of function to set/get, either 'functions', 'evaluation', 'aggregation', 'descriptions', or 'initiation' (will parse partial matches)

Value

Depends on the method.

Examples

pd <- mockPipeline()
length(pd)
names(pd)
pd$step1
pd[2:1]
pd <- mockPipeline()
length(pd)
names(pd)
pd$step1
pd[2:1]

plotElapsed

Description

Plot total elapsed time per run, split per step.

Usage

plotElapsed(
  res,
  steps = names(res$elapsed$stepwise),
  agg.by,
  agg.fn = mean,
  width = 0.9,
  split.datasets = TRUE,
  return.df = FALSE
)
plotElapsed(
  res,
  steps = names(res$elapsed$stepwise),
  agg.by,
  agg.fn = mean,
  width = 0.9,
  split.datasets = TRUE,
  return.df = FALSE
)

Arguments

`res`	Aggregated pipeline results
`steps`	The step(s) to plot (default all)
`agg.by`	The parameters by which to aggregate (set to FALSE to disable aggregation)
`agg.fn`	Aggregation function
`width`	Width of the bar; default 0.9, use 1 to remove the gaps
`split.datasets`	Logical; whether to split the datasets into facets
`return.df`	Logical; whether to return the data.frame instead of plot

Value

A ggplot, or a data.frame if 'return.df=TRUE'

Examples

data("exampleResults", package="pipeComp")
plotElapsed(exampleResults, agg.by = "norm")

data("exampleResults", package="pipeComp")
plotElapsed(exampleResults, agg.by = "norm")

readPipelineResults

Description

readPipelineResults

Usage

readPipelineResults(path = NULL, resfiles = NULL)
readPipelineResults(path = NULL, resfiles = NULL)

Arguments

`path`	The path (e.g. folder or prefix) to the results. Either 'path' or 'resfiles' should be given.
`resfiles`	A vector of paths to '*.evaluation.rds' files. Either 'path' or 'resfiles' should be given.

Value

A list of results.

Examples

# we produce mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# we read the evaluation files:
res <- readPipelineResults(tmpdir1)
# we produce mock pipeline results:
pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),'/')
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# we read the evaluation files:
res <- readPipelineResults(tmpdir1)

runPipeline

Description

This function runs a pipeline with combinations of parameter variations on nested steps. The pipeline has to be defined as a list of functions applied consecutively on their respective outputs. See 'examples' for more details.

Usage

runPipeline(
  datasets,
  alternatives,
  pipelineDef,
  comb = NULL,
  output.prefix = "",
  nthreads = 1,
  saveEndResults = TRUE,
  debug = FALSE,
  skipErrors = TRUE,
  ...
)
runPipeline(
  datasets,
  alternatives,
  pipelineDef,
  comb = NULL,
  output.prefix = "",
  nthreads = 1,
  saveEndResults = TRUE,
  debug = FALSE,
  skipErrors = TRUE,
  ...
)

Arguments

`datasets`	A named vector of initial objects or paths to rds files.
`alternatives`	The (named) list of alternative values for each parameter.
`pipelineDef`	An object of class `PipelineDefinition`.
`comb`	An optional matrix of indexes indicating the combination to run. Each column should correspond to an element of 'alternatives', and contain indexes relative to this element. If omitted, all combinations will be performed.
`output.prefix`	An optional prefix for the output files.
`nthreads`	Number of threads, default 1. If the memory requirements are very high or the first steps very long to compute, consider setting this as the number of datasets or below.
`saveEndResults`	Logical; whether to save the output of the last step.
`debug`	Logical (default FALSE). When enabled, disables multithreading and prints extra information.
`skipErrors`	Logical. When enabled, 'runPipeline' will continue even when an error has been encountered, and report the list of steps/datasets in which errors were encountered.
`...`	passed to MulticoreParam. Can for instance be used to set 'makeCluster' arguments, or set 'threshold="TRACE"' when debugging in a multithreaded context.

Value

A SimpleList with elapsed time and the results of the evaluation functions defined by the given 'pipelineDef'.

The results are also stored in the output folder with:

The clustering results for each dataset ('endOutputs.rds' files),
A SimpletList of elapsed time and evaluations for each dataset ('evaluation.rds' files),
A list of the 'pipelineDef', 'alternatives', 'sessionInfo()' and function call used to produce the results ('runPipelineInfo.rds' file),
A copy of the SimpleList returned by the function ('aggregated.rds'file).

Examples

pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),"/")
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# See the `pipeComp_scRNA` vignette for a more complex example

pip <- mockPipeline()
datasets <- list( ds1=1:3, ds2=c(5,10,15) )
tmpdir1 <- paste0(tempdir(),"/")
res <- runPipeline(datasets, pipelineDef=pip, output.prefix=tmpdir1,
                   alternatives=list() )
# See the `pipeComp_scRNA` vignette for a more complex example

scrna_describeDatasets

Description

Plots descriptive information about the datasets

Usage

scrna_describeDatasets(sces, pt.size = 0.3, ...)
scrna_describeDatasets(sces, pt.size = 0.3, ...)

Arguments

`sces`	A character vector of paths to SCE rds files, or a list of SCEs
`pt.size`	Point size (for reduced dims)
`...`	Passed to geom_point()

Value

A plot_grid output

scrna_evalPlot_filtering

Description

scrna_evalPlot_filtering

Usage

scrna_evalPlot_filtering(
  res,
  steps = c("doublet", "filtering"),
  clustMetric = "mean_F1",
  filterExpr = TRUE,
  atNearestK = FALSE,
  returnTable = FALSE,
  point.size = 2.2,
  ...
)
scrna_evalPlot_filtering(
  res,
  steps = c("doublet", "filtering"),
  clustMetric = "mean_F1",
  filterExpr = TRUE,
  atNearestK = FALSE,
  returnTable = FALSE,
  point.size = 2.2,
  ...
)

Arguments

`res`	Aggregated pipeline results (i.e. the output of 'runPipeline' or 'aggregateResults')
`steps`	Steps to include (default 'doublet' and 'filtering'); other steps will be averaged.
`clustMetric`	Clustering accuracy metric to use (default 'mean_F1“)
`filterExpr`	An optional filtering expression based on the columns of the clustering evaluation (e.g. 'filterExpr=param1=="value1"' or 'filterExpr=n_clus==true.nbClusts').
`atNearestK`	Logical; whether to restrict analyses to those giving the smallest deviation from the real number of clusters (default FALSE).
`returnTable`	Logical; whether to return the data rather than plot.
`point.size`	Size of the points
`...`	passed to 'geom_point'

Value

A ggplot, or a data.frame if 'returnTable=TRUE'

Examples

data("exampleResults", package="pipeComp")
scrna_evalPlot_filtering(exampleResults)
data("exampleResults", package="pipeComp")
scrna_evalPlot_filtering(exampleResults)

scrna_evalPlot_overall

Description

Plots a multi-level summary heatmap of many analyses of the 'scrna_pipeline'.

Usage

scrna_evalPlot_overall(
  res,
  agg.by = NULL,
  width = NULL,
  datasets_as_columnNames = TRUE,
  rowAnnoColors = NULL,
  column_names_gp = gpar(fontsize = 10),
  column_title_gp = gpar(fontsize = 12),
  heatmap_legend_param = list(by_row = TRUE, direction = "horizontal", nrow = 1),
  ...
)
scrna_evalPlot_overall(
  res,
  agg.by = NULL,
  width = NULL,
  datasets_as_columnNames = TRUE,
  rowAnnoColors = NULL,
  column_names_gp = gpar(fontsize = 10),
  column_title_gp = gpar(fontsize = 12),
  heatmap_legend_param = list(by_row = TRUE, direction = "horizontal", nrow = 1),
  ...
)

Arguments

`res`	Aggregated pipeline results (i.e. the output of 'runPipeline' or 'aggregateResults')
`agg.by`	The paramters by which to aggregate.
`width`	The width of individual heatmap bodies.
`datasets_as_columnNames`	Logical; whether dataset names should be printed below the columns (except for silhouette) rather than using a legend.
`rowAnnoColors`	Optional list of colors for the row annotation variables (passed to 'HeatmapAnnotation(col=...)')
`column_names_gp`	Passed to each calls to 'Heatmap'
`column_title_gp`	Passed to each calls to 'Heatmap'
`heatmap_legend_param`	Passed to each calls to 'Heatmap'
`...`	Passed to each calls to 'Heatmap'

Value

A HeatmapList

Examples

library(ComplexHeatmap)
data("exampleResults")
h <- scrna_evalPlot_overall(exampleResults)
draw(h, heatmap_legend_side="bottom")
library(ComplexHeatmap)
data("exampleResults")
h <- scrna_evalPlot_overall(exampleResults)
draw(h, heatmap_legend_side="bottom")

scrna_evalPlot_silh

Description

Plot a min/max/mean/median silhouette width heatmap from aggregated evaluation results of the 'scrna_pipeline'.

Usage

scrna_evalPlot_silh(
  res,
  what = c("minSilWidth", "meanSilWidth"),
  step = "dimreduction",
  dims = 1,
  agg.by = NULL,
  agg.fn = mean,
  filterExpr = NULL,
  value_format = "",
  reorder_rows = FALSE,
  reorder_columns = TRUE,
  show_heatmap_legend = TRUE,
  show_column_names = FALSE,
  col = rev(RColorBrewer::brewer.pal(n = 11, "RdBu")),
  font_factor = 0.9,
  row_split = NULL,
  shortNames = TRUE,
  value_cols = c("white", "black"),
  title = NULL,
  anno_legend = TRUE,
  ...
)
scrna_evalPlot_silh(
  res,
  what = c("minSilWidth", "meanSilWidth"),
  step = "dimreduction",
  dims = 1,
  agg.by = NULL,
  agg.fn = mean,
  filterExpr = NULL,
  value_format = "",
  reorder_rows = FALSE,
  reorder_columns = TRUE,
  show_heatmap_legend = TRUE,
  show_column_names = FALSE,
  col = rev(RColorBrewer::brewer.pal(n = 11, "RdBu")),
  font_factor = 0.9,
  row_split = NULL,
  shortNames = TRUE,
  value_cols = c("white", "black"),
  title = NULL,
  anno_legend = TRUE,
  ...
)

Arguments

`res`	Aggregated pipeline results (i.e. the output of 'runPipeline' or 'aggregateResults')
`what`	What metric to plot, possible values are “minSilWidth”, “meanSilWidth” (default), “medianSilWidth”, or “maxSilWidth”.
`step`	Name of the step for which to plot the evaluation results. Defaults to "dimreduction".
`dims`	If multiple sets of dimensions are available, which one to use (defaults to the first).
`agg.by`	Aggregate results by these columns (default no aggregation)
`agg.fn`	Function for aggregation (default mean)
`filterExpr`	An optional filtering expression based on the columns of the target dataframe, (e.g. 'filterExpr=param1=="value1"').
`value_format`	Format for displaying cells' values (no label by default)
`reorder_rows`	Whether to sort rows (default FALSE). The row names themselves can also be passed to specify an order, or a 'ComplexHeatmap'.
`reorder_columns`	Whether to sort columns (default TRUE).
`show_heatmap_legend`	Passed to 'Heatmap' (default FALSE)
`show_column_names`	Passed to 'Heatmap' (default FALSE)
`col`	Colors for the heatmap
`font_factor`	A scaling factor applied to fontsizes (default 1)
`row_split`	Optional column (included in 'agg.by') by which to split the rows. Alternatively, an expression using the columns (retained after aggregation) can be passed.
`shortNames`	Logical; whether to use short row names (with only the parameter values instead of the parameter name and value pairs), default TRUE.
`value_cols`	A vector of length 2 indicating the colors of the values (above and below the mean), if printed
`title`	Plot title
`anno_legend`	Logical; whether to plot the legend for the datasets
`...`	Passed to 'Heatmap'

Value

A Heatmap

Examples

data("exampleResults", package="pipeComp")
scrna_evalPlot_silh( exampleResults, agg.by=c("filt","norm"), 
                     row_split="norm" )
data("exampleResults", package="pipeComp")
scrna_evalPlot_silh( exampleResults, agg.by=c("filt","norm"), 
                     row_split="norm" )

scrna_pipeline

Description

The 'PipelineDefinition' for the default scRNAseq clustering pipeline, with steps for doublet identification, filtering, normalization, feature selection, dimensionality reduction, and clustering. Alternative arguments should be character, numeric or logical vectors of length 1 (e.g. the function name for a method, the number of dimensions, etc). The default pipeline has the following steps and arguments:

doublet: 'doubletmethod' (name of the doublet removal function)
filtering: 'filt' (name of the filtering function, or filter string)
normalization: 'norm' (name of the normalization function)
selection: 'sel' (name of the selection function, or variable of rowData on which to select) and 'selnb' (number of features to select)
dimreduction: 'dr' (name of the dimensionality reduction function) and 'maxdim' (maximum number of components to compute)
clustering: 'clustmethod' (name of the clustering function), 'dims' (number of dimensions to use), 'k' (number of nearest neighbors to use, if applicable), 'steps' (number of steps in the random walk, if applicable), 'resolution' (resolution, if applicable), 'min.size' (minimum cluster size, if applicable). If using the 'scrna_alternatives.R' wrappers, the dimensionality can be automatically estimated by specifying 'dims = "method_name"'.

Usage

scrna_pipeline(saveDimRed = FALSE, pipeClass = c("seurat", "sce"))
scrna_pipeline(saveDimRed = FALSE, pipeClass = c("seurat", "sce"))

Arguments

`saveDimRed`	Logical; whether to save the dimensionality reduction for each analysis (default FALSE)
`pipeClass`	'sce' or 'seurat'; which object class to use throughout the pipeline. Note that the 'alternatives' functions have to be built around the chosen class. Given that, if running the 'scrna_alternatives', the class of whole pipeline is determined by the output of the filtering, only this step is affected by this option.

Value

A 'PipelineDefinition' object to be used with 'runPipeline'.

Examples

pip <- scrna_pipeline()
pip
pip <- scrna_pipeline()
pip

Package 'pipeComp'

Help Index

pipeComp - a framework for pipeline benchmarking

Description

Author(s)

addPipelineStep

Description

Usage

Arguments

Value

See Also

Examples

aggregatePipelineResults

Description

Usage

Arguments

Value

Examples

buildCombMatrix

Description

Usage

Arguments

Value

Examples

checkPipelinePackages

Description

Usage

Arguments

Value

Examples

Correlations across clustering evaluation metrics

Description

Value

References

colCenterScale

Description

Usage

Arguments

Value

Examples

Lists of control genes

Description

Value

dea_evalPlot_curve

Description

Usage

Arguments

Value

Examples

dea_pipeline

Description

Usage

Value

Examples

defaultStepAggregation

Description

Usage

Arguments

Value

evalHeatmap

Description

Usage

Arguments

Value

Examples

evaluateClustering

Description

Usage

Arguments

Value

Examples

evaluateDEA

Description

Usage

Arguments

Value

Examples

evaluateDimRed

Description

Usage

Methods for `PipelineDefinition` class