Package 'scater' reference manual

Title:	Single-Cell Analysis Toolkit for Gene Expression Data in R
Description:	A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.
Authors:	Davis McCarthy [aut], Kieran Campbell [aut], Aaron Lun [aut, ctb], Quin Wills [aut], Vladimir Kiselev [ctb], Felix G.M. Ernst [ctb], Alan O'Callaghan [ctb, cre], Yun Peng [ctb], Leo Lahti [ctb], Tuomas Borman [ctb]
Maintainer:	Alan O'Callaghan <[email protected]>
License:	GPL-3
Version:	1.35.4
Built:	2025-03-09 18:24:02 UTC
Source:	https://github.com/bioc/scater

Get feature annotation information from Biomart

Description

Use the biomaRt package to add feature annotation information to a SingleCellExperiment.

Usage

annotateBMFeatures(
  ids,
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "mmusculus_gene_ensembl",
  id.type = "ensembl_gene_id",
  symbol.type,
  attributes = c(id.type, symbol.type, "chromosome_name", "gene_biotype",
    "start_position", "end_position"),
  filters = id.type,
  ...
)

getBMFeatureAnnos(x, ids = rownames(x), ...)

Arguments

`ids`	A character vector containing feature identifiers.
`biomart`	String defining the biomaRt to be used, to be passed to `useMart`.
`dataset`	String defining the dataset to use, to be passed to `useMart`.
`id.type`	String specifying the type of identifier in `ids`.
`symbol.type`	String specifying the type of symbol to retrieve. If missing, this is set to `"mgi_symbol"` if `dataset="mmusculus_gene_ensembl"`, or to `"hgnc_symbol"` if `dataset="hsapiens_gene_ensembl"`.
`attributes`	Character vector defining the attributes to pass to `getBM`.
`filters`	String defining the type of identifier in `ids`, to be used as a filter in `getBM`.
`...`	For `annotateBMFeatures`, further named arguments to pass to `biomaRt::useMart`. For `getBMFeatureAnnos`, further arguments to pass to `annotateBMFeatures`.
`x`	A SingleCellExperiment object.
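The default for `symbol.type` depends on `dataset`. The base R sketch below encodes only the two cases stated above; `default_symbol_type` is a hypothetical helper, not part of scater, and raising an error for any other dataset is an assumption of this sketch (the manual does not say what happens in that case). For other species, passing `symbol.type` explicitly to `annotateBMFeatures` avoids relying on a default.

```r
# Hypothetical helper mirroring the documented default for `symbol.type`.
# Only the two datasets named in the manual are mapped; erroring for any other
# dataset is an assumption of this sketch, not documented scater behaviour.
default_symbol_type <- function(dataset) {
  switch(dataset,
    mmusculus_gene_ensembl = "mgi_symbol",
    hsapiens_gene_ensembl  = "hgnc_symbol",
    stop("no default symbol.type for dataset '", dataset,
         "'; supply symbol.type explicitly")
  )
}

default_symbol_type("hsapiens_gene_ensembl")  # "hgnc_symbol"
```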

Details

These functions provide convenient wrappers around biomaRt to quickly obtain annotation in the required format.

Value

For annotateBMFeatures, a DataFrame containing feature annotation, with one row per value in ids.

For getBMFeatureAnnos, x is returned containing the output of annotateBMFeatures appended to its rowData.

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

## Not run: 
# Making up Ensembl IDs for demonstration purposes.
mock_id <- paste0("ENSMUSG", sprintf("%011d", seq_len(1000)))
anno <- annotateBMFeatures(ids=mock_id)

## End(Not run)

Accessor and replacement for bootstrap results in a `SingleCellExperiment` object

Description

SingleCellExperiment objects can contain bootstrap expression values (for example, as generated by the kallisto software for quantifying feature abundance). These functions access the 'bootstrap' elements in the assays slot, or replace them with the supplied value, which must be an array of the correct size, i.e., with the same numbers of rows and columns as the SingleCellExperiment object as a whole.
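The size requirement can be sketched in base R. `bootstrap_dims_ok` is a hypothetical helper, not part of scater, and storing the bootstrap replicates along a third array dimension is an assumption of this sketch. With scater loaded, an array satisfying this check is the kind of value one would assign with `bootstraps(sce) <- boot`.

```r
# Hypothetical check mirroring the documented size constraint for bootstraps<-:
# the first two dimensions must match the features (rows) and cells (columns)
# of the SingleCellExperiment. Replicates in a third dimension is an assumption.
bootstrap_dims_ok <- function(value, n_features, n_cells) {
  is.array(value) && nrow(value) == n_features && ncol(value) == n_cells
}

n_features <- 200; n_cells <- 50; n_boot <- 10
boot <- array(rexp(n_features * n_cells * n_boot),
              dim = c(n_features, n_cells, n_boot))

bootstrap_dims_ok(boot, n_features, n_cells)              # TRUE
bootstrap_dims_ok(boot[1:100, , ], n_features, n_cells)   # FALSE: too few rows
```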

Usage

bootstraps(object)

bootstraps(object) <- value

## S4 method for signature 'SingleCellExperiment'
bootstraps(object)

## S4 replacement method for signature 'SingleCellExperiment,array'
bootstraps(object) <- value

Arguments

`object`	a `SingleCellExperiment` object.
`value`	an array of class `"numeric"` containing bootstrap expression values.

Value

When accessing the bootstraps of a SingleCellExperiment, an array containing the bootstrap values; when replacing them, a SingleCellExperiment object containing the new bootstrap values.

Author(s)

Davis McCarthy

Examples

example_sce <- mockSCE()
bootstraps(example_sce)

Perform MDS on cell-level data

Description

Perform multi-dimensional scaling (MDS) on cells, based on the data in a SingleCellExperiment object.

Usage

calculateMDS(x, ...)

## S4 method for signature 'ANY'
calculateMDS(
  x,
  FUN = dist,
  ncomponents = 2,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  keep_dist = FALSE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
calculateMDS(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculateMDS(
  x,
  ...,
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

runMDS(x, ..., altexp = NULL, name = "MDS")

Arguments

`x`	For `calculateMDS`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runMDS`, a SingleCellExperiment object.
`...`	For the `calculateMDS` generic, additional arguments to pass to specific methods. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runMDS`, additional arguments to pass to `calculateMDS`.
`FUN`	A function that accepts a numeric matrix as its first argument, where rows are samples and columns are features; and returns a distance structure such as that returned by `dist` or a full symmetric matrix containing the dissimilarities.
`ncomponents`	Numeric scalar indicating the number of MDS dimensions to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`keep_dist`	Logical scalar indicating whether the `dist` object calculated by `FUN` should be stored as the ‘dist’ attribute of the matrix returned by `calculateMDS` or stored by `runMDS`.
`exprs_values`	Alias to `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

The function cmdscale is used internally to compute the MDS components with eig = TRUE. The eig and GOF fields of the object returned by cmdscale are stored as attributes “eig” and “GOF” of the MDS matrix calculated.
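The core of this computation can be sketched with base R's `stats::cmdscale`. `mds_sketch` is a hypothetical helper, not scater's internal code: it follows the documented contract (`FUN` receives a matrix with cells in rows, `cmdscale` is called with `eig = TRUE`, and "eig", "GOF" and optionally "dist" are attached as attributes) but omits feature selection and the other options.

```r
# Sketch of the MDS core described above, using only base R (stats).
# `mds_sketch` is a hypothetical helper, not scater's implementation.
mds_sketch <- function(x, FUN = dist, ncomponents = 2, keep_dist = FALSE) {
  d <- FUN(t(x))                       # FUN sees cells in rows, features in columns
  fit <- cmdscale(d, k = ncomponents, eig = TRUE)
  out <- fit$points                    # cells x MDS dimensions
  attr(out, "eig") <- fit$eig
  attr(out, "GOF") <- fit$GOF
  if (keep_dist) attr(out, "dist") <- d
  out
}

set.seed(1)
logexprs <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)  # 100 features x 20 cells

mds_euc <- mds_sketch(logexprs)
# FUN may instead return a full symmetric dissimilarity matrix, e.g. 1 - correlation:
mds_cor <- mds_sketch(logexprs, FUN = function(m) 1 - cor(t(m)))
```

The correlation-based `FUN` shows the second accepted return type: a full symmetric matrix rather than a `dist` object.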

Value

For calculateMDS, a matrix is returned containing the MDS coordinates for each cell (row) and dimension (column).

For runMDS, a modified x is returned that contains the MDS coordinates in reducedDim(x, name).

Feature selection

This section is relevant if x is a numeric matrix of (log-)expression values with features in rows and cells in columns; or if x is a SingleCellExperiment and dimred=NULL. In the latter, the expression values are obtained from the assay specified by assay.type.

The subset_row argument specifies the features to use for dimensionality reduction. The aim is to allow users to specify highly variable features to improve the signal/noise ratio, or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.

If subset_row=NULL, the ntop features with the largest variances are used instead. We literally compute the variances from the expression values without considering any mean-variance trend, so often a more considered choice of genes is possible, e.g., with scran functions. Note that the value of ntop is ignored if subset_row is specified.

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.
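The feature selection and standardization rules above can be sketched in base R. `select_features` is a hypothetical helper written only to illustrate the documented behaviour (raw variances without a mean-variance trend, `ntop` ignored when `subset_row` is given, removal of features with standard deviation below 1e-8); it is not scater's code and may differ in details such as how ties are ordered.

```r
# Hypothetical helper sketching the feature selection described above.
select_features <- function(x, ntop = 500, subset_row = NULL, scale = FALSE) {
  if (is.null(subset_row)) {
    rv <- apply(x, 1, var)             # raw per-feature variances, no trend fitting
    subset_row <- order(rv, decreasing = TRUE)[seq_len(min(ntop, nrow(x)))]
  }                                    # ntop is ignored when subset_row is supplied
  x <- x[subset_row, , drop = FALSE]
  if (scale) {
    sds <- apply(x, 1, sd)
    keep <- sds >= 1e-8                # drop (near-)constant features
    x <- x[keep, , drop = FALSE] / sds[keep]  # each row divided by its own SD
  }
  t(x)                                 # cells in rows, as used for the reduction
}

set.seed(2)
x <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)  # 50 features x 10 cells
x[1, ] <- 5                                        # a constant feature
dim(select_features(x, ntop = 20))                 # 10 x 20
z <- select_features(x, ntop = 50, scale = TRUE)   # constant feature removed
```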

Using reduced dimensions

If x is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x by setting the dimred argument. This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA). We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.

The matrix of existing reduced dimensions is taken from reducedDim(x, dimred). By default, all dimensions are used to compute the second set of reduced dimensions. If n_dimred is also specified, only the first n_dimred columns are used. Alternatively, n_dimred can be an integer vector specifying the column indices of the dimensions to use.
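The `n_dimred` rules can be illustrated with a plain matrix standing in for `reducedDim(x, dimred)`. `use_dims` is a hypothetical helper; treating a single number as "the first n columns" follows the description above.

```r
# Hypothetical helper mirroring the documented n_dimred rules.
use_dims <- function(red, n_dimred = NULL) {
  if (is.null(n_dimred)) return(red)                        # all dimensions
  if (length(n_dimred) == 1L) n_dimred <- seq_len(n_dimred)  # first n_dimred columns
  red[, n_dimred, drop = FALSE]                              # explicit column indices
}

pcs <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)  # stand-in for reducedDim(x, "PCA")
dim(use_dims(pcs))              # 100 x 50
dim(use_dims(pcs, 10))          # 100 x 10
dim(use_dims(pcs, c(1, 3, 5)))  # 100 x 3
```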

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

If x is a numeric matrix, setting transposed=TRUE will treat the rows as cells and the columns as the variables/dimensions. This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment. As such, no feature selection or standardization is performed, i.e., ntop, subset_row and scale are ignored.

Using alternative Experiments

This section is relevant if x is a SingleCellExperiment and altexp is not NULL. In such cases, the method is run on data from an alternative SummarizedExperiment nested within x. This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp), e.g., antibody tags.

Setting altexp with assay.type will use the specified assay from the alternative SummarizedExperiment. If the alternative is a SingleCellExperiment, setting dimred will use the specified dimensionality reduction results from the alternative. This option will also interact as expected with n_dimred.

Note that the output is still stored in the reducedDims of the output SingleCellExperiment. It is advisable to use a different name to distinguish this output from the results generated from the main experiment's assay values.

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runMDS(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Perform NMF on cell-level data

Description

Perform non-negative matrix factorization (NMF) for the cells, based on the data in a SingleCellExperiment object.

Usage

calculateNMF(x, ...)

## S4 method for signature 'ANY'
calculateNMF(
  x,
  ncomponents = 2,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
calculateNMF(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculateNMF(
  x,
  ...,
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

runNMF(x, ..., altexp = NULL, name = "NMF")

Arguments

`x`	For `calculateNMF`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runNMF`, a SingleCellExperiment object.
`...`	For the `calculateNMF` generic, additional arguments to pass to specific methods. For the ANY method, additional arguments to pass to `nmf`. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runNMF`, additional arguments to pass to `calculateNMF`.
`ncomponents`	Numeric scalar indicating the number of NMF dimensions to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`exprs_values`	Alias to `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

The function nmf is used internally to compute the NMF. Note that the algorithm is not deterministic, so different runs of the function will produce differing results. Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.

Value

For calculateNMF, a numeric matrix is returned containing the NMF coordinates for each cell (row) and dimension (column).

For runNMF, a modified x is returned that contains the NMF coordinates in reducedDim(x, name).

In both cases, the matrix will have the attribute "basis" containing the gene-by-factor basis matrix.
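To illustrate the seed dependence and the output layout (cells-by-factors coordinates plus a gene-by-factor "basis" attribute), the sketch below uses the textbook Lee and Seung multiplicative-update NMF. This is a swapped-in algorithm chosen for brevity, not the nmf backend used by scater, and `simple_nmf` is a hypothetical helper.

```r
# Illustrative NMF via Lee & Seung multiplicative updates (Frobenius loss).
# Not scater's backend; it only demonstrates seed dependence and output layout.
simple_nmf <- function(V, k, iters = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # random start: results depend on the seed
  H <- matrix(runif(k * ncol(V)), k, ncol(V))
  for (i in seq_len(iters)) {
    H <- H * crossprod(W, V) / (crossprod(W, W %*% H) + eps)
    W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
  }
  coords <- t(H)                # cells x factors, like calculateNMF's return value
  attr(coords, "basis") <- W    # genes x factors
  coords
}

set.seed(42)
V <- matrix(rpois(30 * 12, lambda = 3), nrow = 30, ncol = 12)  # non-negative genes x cells

set.seed(1); run1 <- simple_nmf(V, k = 2)
set.seed(1); run2 <- simple_nmf(V, k = 2)  # same seed: identical result
set.seed(2); run3 <- simple_nmf(V, k = 2)  # different seed: generally a different result
```

Because NMF factors are only determined up to scaling and local optima, comparing several seeds before fixing one with set.seed, as advised above, is a reasonable habit.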

Feature selection

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.

Using reduced dimensions

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

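Using alternative Experiments

If altexp is supplied to runNMF, the computation uses the data in altExp(x, altexp), as described for calculateMDS. The result is still stored in the reducedDims of the main SingleCellExperiment, so a distinct name is advisable.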

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runNMF(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Perform PCA on expression data

Description

Perform a principal components analysis (PCA) on cells, based on the expression data in a SingleCellExperiment object.

Usage

calculatePCA(x, ...)

## S4 method for signature 'ANY'
calculatePCA(
  x,
  ncomponents = 50,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  BSPARAM = bsparam(),
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
calculatePCA(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculatePCA(
  x,
  ...,
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

## S4 method for signature 'SingleCellExperiment'
runPCA(x, ..., altexp = NULL, name = "PCA")

Arguments

`x`	For `calculatePCA`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runPCA`, a SingleCellExperiment object containing such a matrix.
`...`	For the `calculatePCA` generic, additional arguments to pass to specific methods. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runPCA`, additional arguments to pass to `calculatePCA`.
`ncomponents`	Numeric scalar indicating the number of principal components to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`BSPARAM`	A BiocSingularParam object specifying which algorithm should be used to perform the PCA.
`BPPARAM`	A BiocParallelParam object specifying whether the PCA should be parallelized.
`exprs_values`	Alias to `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

Fast approximate SVD algorithms like BSPARAM=IrlbaParam() or RandomParam() use a random initialization, after which they converge towards the exact PCs. This means that the result will change slightly across different runs. For full reproducibility, users should call set.seed prior to running runPCA with such algorithms. (Note that this includes BSPARAM=bsparam(), which uses approximate algorithms by default.)

Value

For calculatePCA, a numeric matrix of coordinates for each cell (row) in each of ncomponents PCs (column).

For runPCA, a SingleCellExperiment object is returned containing this matrix in reducedDim(x, name).

In both cases, the attributes of the PC coordinate matrix contain the following elements:

"percentVar", the percentage of variance explained by each PC. This may not sum to 100 if not all PCs are reported.
"varExplained", the actual variance explained by each PC.
"rotation", the rotation matrix containing loadings for all genes used in the analysis and for each PC.
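The three attributes can be illustrated with base R's exact `prcomp()`. This is a sketch under stated assumptions, not scater's code: scater uses the BiocSingular algorithm chosen by BSPARAM, whereas `pca_sketch` (a hypothetical helper) uses an exact decomposition, so no seed is needed here. It shows why "percentVar" sums to less than 100 when only some PCs are kept: each PC's variance is divided by the total variance of all features used.

```r
# Sketch of the PCA attributes described above using base R's exact prcomp().
pca_sketch <- function(x, ncomponents = 5, ntop = 500) {
  rv <- apply(x, 1, var)
  top <- order(rv, decreasing = TRUE)[seq_len(min(ntop, nrow(x)))]
  y <- t(x[top, , drop = FALSE])                  # cells x selected features
  pc <- prcomp(y, center = TRUE, scale. = FALSE, rank. = ncomponents)
  var_explained <- pc$sdev[seq_len(ncomponents)]^2
  total_var <- sum(apply(y, 2, var))              # total variance over all features used
  coords <- pc$x                                  # cells x PCs
  attr(coords, "varExplained") <- var_explained
  attr(coords, "percentVar") <- 100 * var_explained / total_var
  attr(coords, "rotation") <- pc$rotation         # features x PCs (loadings)
  coords
}

set.seed(3)
x <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)  # 200 features x 30 cells
p <- pca_sketch(x, ncomponents = 5, ntop = 100)
sum(attr(p, "percentVar"))                           # below 100: only 5 PCs reported
```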

Feature selection

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.

Using reduced dimensions

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

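Using alternative Experiments

If altexp is supplied to runPCA, the computation uses the data in altExp(x, altexp), as described for calculateMDS. The result is still stored in the reducedDims of the main SingleCellExperiment, so a distinct name is advisable.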

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Perform t-SNE on cell-level data

Description

Perform t-stochastic neighbour embedding (t-SNE) for the cells, based on the data in a SingleCellExperiment object.

Usage

calculateTSNE(x, ...)

## S4 method for signature 'ANY'
calculateTSNE(
  x,
  ncomponents = 2,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  perplexity = NULL,
  normalize = TRUE,
  theta = 0.5,
  num_threads = NULL,
  ...,
  external_neighbors = FALSE,
  BNPARAM = KmknnParam(),
  BPPARAM = SerialParam(),
  use_fitsne = FALSE,
  use_densvis = FALSE,
  dens_frac = 0.3,
  dens_lambda = 0.1
)

## S4 method for signature 'SummarizedExperiment'
calculateTSNE(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculateTSNE(
  x,
  ...,
  pca = is.null(dimred),
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

runTSNE(x, ..., altexp = NULL, name = "TSNE")

Arguments

`x`	For `calculateTSNE`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runTSNE`, a SingleCellExperiment object.
`...`	For the `calculateTSNE` generic, additional arguments to pass to specific methods. For the ANY method, additional arguments to pass to `Rtsne`. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runTSNE`, additional arguments to pass to `calculateTSNE`.
`ncomponents`	Numeric scalar indicating the number of t-SNE dimensions to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`perplexity`	Numeric scalar defining the perplexity parameter, see `?Rtsne` for more details.
`normalize`	Logical scalar indicating if input values should be scaled for numerical precision, see `normalize_input`.
`theta`	Numeric scalar specifying the approximation accuracy of the Barnes-Hut algorithm, see `Rtsne` for details.
`num_threads`	Integer scalar specifying the number of threads to use in `Rtsne`. If `NULL` and `BPPARAM` is a MulticoreParam, it is set to the number of workers in `BPPARAM`; otherwise, the `Rtsne` defaults are used.
`external_neighbors`	Logical scalar indicating whether a nearest neighbors search should be computed externally with `findKNN`.
`BNPARAM`	A BiocNeighborParam object specifying the neighbor search algorithm to use when `external_neighbors=TRUE`.
`BPPARAM`	A BiocParallelParam object specifying how the neighbor search should be parallelized when `external_neighbors=TRUE`.
`use_fitsne`	Logical scalar indicating whether `fitsne` should be used to perform t-SNE.
`use_densvis`	Logical scalar indicating whether `densne` should be used to perform density-preserving t-SNE.
`dens_frac`, `dens_lambda`	See `densne`.
`exprs_values`	Alias to `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`pca`	Logical scalar indicating whether a PCA step should be performed inside `Rtsne`.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

The function Rtsne is used internally to compute the t-SNE. Note that the algorithm is not deterministic, so different runs of the function will produce differing results. Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.

The value of the perplexity parameter can have a large effect on the results. By default, the function will set a “reasonable” perplexity that scales with the number of cells in x. (Specifically, it is the number of cells divided by 5, capped at a maximum of 50.) However, it is often worthwhile to manually try multiple values to ensure that the conclusions are robust.
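The default perplexity rule stated above can be written as a one-line helper, along with a small grid of values for checking robustness. Both `default_perplexity` and `perplexity_grid` are hypothetical helpers; since the text does not say whether scater rounds the value, no rounding is applied. Each grid value could then be passed to `runTSNE(sce, perplexity = p)` and the resulting embeddings compared.

```r
# Hypothetical helper encoding the documented default: cells / 5, capped at 50.
default_perplexity <- function(ncells) min(50, ncells / 5)

# Hypothetical grid around the default for a robustness check.
perplexity_grid <- function(ncells, factors = c(0.5, 1, 2)) {
  factors * default_perplexity(ncells)
}

default_perplexity(100)   # 20
default_perplexity(1000)  # 50 (capped)
perplexity_grid(100)      # 10 20 40
```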

If external_neighbors=TRUE, the nearest neighbor search step will use a different algorithm to that in the Rtsne function. This can be parallelized or approximate to achieve greater speed for large data sets. The neighbor search results are then used for t-SNE via the Rtsne_neighbors function.
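To make the "neighbor search results" concrete, the sketch below computes an exact brute-force k-nearest-neighbour search in base R and returns index and distance matrices (cells by k, nearest first). `brute_force_knn` is hypothetical and is not what scater does: scater delegates this step to findKNN with the chosen BNPARAM, which may be approximate or parallelized. The exact input conventions expected by Rtsne_neighbors are not reproduced here and should be checked in its own documentation.

```r
# Hypothetical exact (brute-force) k-nearest-neighbour search with base R,
# illustrating the index/distance matrices produced by an external search.
brute_force_knn <- function(x, k) {             # x: cells in rows
  d <- as.matrix(dist(x))
  diag(d) <- Inf                                # a cell is not its own neighbour
  n <- nrow(d)
  index <- t(apply(d, 1, order))[, seq_len(k), drop = FALSE]
  # Look up each (cell, neighbour) distance; column-major order matches `index`.
  distance <- matrix(d[cbind(rep(seq_len(n), k), as.vector(index))], nrow = n)
  list(index = index, distance = distance)      # both n x k, nearest first
}

set.seed(4)
pts <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)  # 20 cells x 3 dimensions
nn <- brute_force_knn(pts, k = 5)
```

Brute force scales quadratically with the number of cells, which is why the text points to faster (possibly approximate) algorithms for large data sets.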

If dimred is specified, the PCA step of the Rtsne function is automatically turned off by default. This presumes that the existing dimensionality reduction is sufficient such that an additional PCA is not required.

Value

For calculateTSNE, a numeric matrix is returned containing the t-SNE coordinates for each cell (row) and dimension (column).

For runTSNE, a modified x is returned that contains the t-SNE coordinates in reducedDim(x, name).

Feature selection

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.

Using reduced dimensions

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

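Using alternative Experiments

If altexp is supplied to runTSNE, the computation uses the data in altExp(x, altexp), as described for calculateMDS. The result is still stored in the reducedDims of the main SingleCellExperiment, so a distinct name is advisable.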

Author(s)

Aaron Lun, based on code by Davis McCarthy

References

van der Maaten LJP, Hinton GE (2008). Visualizing High-Dimensional Data Using t-SNE. J. Mach. Learn. Res. 9, 2579-2605.

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runTSNE(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Perform UMAP on cell-level data

Description

Perform uniform manifold approximation and projection (UMAP) for the cells, based on the data in a SingleCellExperiment object.

Usage

calculateUMAP(x, ...)

## S4 method for signature 'ANY'
calculateUMAP(
  x,
  ncomponents = 2,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  pca = if (transposed) NULL else 50,
  n_neighbors = 15,
  n_threads = bpnworkers(BPPARAM),
  ...,
  external_neighbors = FALSE,
  BNPARAM = KmknnParam(),
  BPPARAM = SerialParam(),
  use_densvis = FALSE,
  dens_frac = 0.3,
  dens_lambda = 0.1
)

## S4 method for signature 'SummarizedExperiment'
calculateUMAP(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculateUMAP(
  x,
  ...,
  pca = if (!is.null(dimred)) NULL else 50,
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

runUMAP(x, ..., altexp = NULL, name = "UMAP")

Arguments

`x`	For `calculateUMAP`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runUMAP`, a SingleCellExperiment object containing such a matrix.
`...`	For the `calculateUMAP` generic, additional arguments to pass to specific methods. For the ANY method, additional arguments to pass to `umap`. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runUMAP`, additional arguments to pass to `calculateUMAP`.
`ncomponents`	Numeric scalar indicating the number of UMAP dimensions to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`pca`	Integer scalar specifying how many PCs should be used as input into the UMAP algorithm. By default, no PCA is performed if the input is a dimensionality reduction result.
`n_neighbors`	Integer scalar, number of nearest neighbors to identify when constructing the initial graph.
`n_threads`	Integer scalar specifying the number of threads to use in `umap`. If `NULL` and `BPPARAM` is a MulticoreParam, it is set to the number of workers in `BPPARAM`; otherwise, the `umap` defaults are used.
`external_neighbors`	Logical scalar indicating whether a nearest neighbors search should be computed externally with `findKNN`.
`BNPARAM`	A BiocNeighborParam object specifying the neighbor search algorithm to use when `external_neighbors=TRUE`.
`BPPARAM`	A BiocParallelParam object specifying whether the PCA should be parallelized.
`use_densvis`	Logical scalar indicating whether `densmap` should be used to perform density-preserving UMAP.
`dens_frac`, `dens_lambda`	See `densmap`.
`exprs_values`	Alias to `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

The function umap is used internally to compute the UMAP. Note that the algorithm is not deterministic, so different runs of the function will produce differing results. Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.

If external_neighbors=TRUE, the nearest neighbor search is conducted using a different algorithm from that in the umap function. This search can be parallelized or approximate to achieve greater speed for large data sets. The neighbor search results are then used directly to create the UMAP embedding.
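The advice above (fix the seed, optionally delegate the neighbor search) can be sketched as follows, assuming scater and BiocNeighbors are installed. The seed value and the reducedDim names "UMAP_seed1000" and "UMAP_annoy" are arbitrary illustrative choices, and AnnoyParam() is just one BiocNeighborParam option for an approximate search.

```r
library(scater)          # also attaches SingleCellExperiment
library(BiocNeighbors)   # provides AnnoyParam() for approximate neighbor search

sce <- mockSCE()
sce <- logNormCounts(sce)
sce <- runPCA(sce)       # stored as reducedDim(sce, "PCA")

# Fix the seed immediately before the call so the embedding can be regenerated.
set.seed(1000)
sce <- runUMAP(sce, dimred = "PCA", name = "UMAP_seed1000")

# Same input, but the nearest neighbors are found externally with findKNN()
# using an approximate Annoy index, then passed on to build the embedding.
set.seed(1000)
sce <- runUMAP(sce, dimred = "PCA", external_neighbors = TRUE,
    BNPARAM = AnnoyParam(), name = "UMAP_annoy")

reducedDimNames(sce)
head(reducedDim(sce, "UMAP_annoy"))
```

Because dimred is supplied, ntop, subset_row and scale play no role in either call.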

Value

For calculateUMAP, a matrix is returned containing the UMAP coordinates for each cell (row) and dimension (column).

For runUMAP, a modified x is returned that contains the UMAP coordinates in reducedDim(x, name).

Feature selection

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.
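The selection and scaling steps can be pictured with the base-R sketch below. It is a simplified illustration, not scater's internal code: select_and_scale() is a hypothetical helper, it assumes a dense log-expression matrix with features in rows, and it assumes that an explicit subset_row takes precedence over ntop.

```r
# Hypothetical helper illustrating the documented feature selection;
# assumes a dense log-expression matrix with features in rows, cells in columns.
select_and_scale <- function(logexprs, ntop = 500, subset_row = NULL, scale = FALSE) {
    if (is.null(subset_row)) {
        # Assumption: ntop is only used when no explicit subset_row is given.
        vars <- apply(logexprs, 1, var)
        subset_row <- order(vars, decreasing = TRUE)[seq_len(min(ntop, length(vars)))]
    }
    # Transpose so that cells are rows, as expected by the embedding step.
    mat <- t(logexprs[subset_row, , drop = FALSE])
    if (scale) {
        sds <- apply(mat, 2, sd)
        keep <- sds >= 1e-8                      # drop (near-)constant features
        # Divide by the SD so each retained feature has unit variance
        # (no centering is applied in this sketch).
        mat <- sweep(mat[, keep, drop = FALSE], 2, sds[keep], "/")
    }
    mat
}

set.seed(42)
logexprs <- matrix(rnorm(100 * 20), nrow = 100)   # 100 features, 20 cells
logexprs[1, ] <- 3                                # a constant feature
m <- select_and_scale(logexprs, ntop = 50, scale = TRUE)
dim(m)
```

With ntop = 100 every feature is selected, and the constant first feature is then removed by the standard deviation filter.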

Using reduced dimensions

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

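Using alternative Experiments

When altexp is specified, the input data are taken from the alternative Experiment altExp(x, altexp) rather than from the main assays of x. The result is still stored in reducedDim(x, name) of the main object, so it is advisable to use a distinct name to avoid overwriting results computed from the main Experiment.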

Author(s)

Aaron Lun

References

McInnes L, Healy J, Melville J (2018). UMAP: uniform manifold approximation and projection for dimension reduction. arXiv.

Examples

example_sce <- mockSCE() 
example_sce <- logNormCounts(example_sce)

example_sce <- runUMAP(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Defunct functions

Description

Functions that have passed on to the function afterlife. Their successors are also listed.

Usage

calculateQCMetrics(...)

## S4 method for signature 'SingleCellExperiment'
normalize(object, ...)

centreSizeFactors(...)

calculateDiffusionMap(x, ...)

## S4 method for signature 'ANY'
calculateDiffusionMap(x, ...)

runDiffusionMap(...)

multiplot(...)

Arguments

object, x, ...

Ignored arguments.

Details

calculateQCMetrics is succeeded by perCellQCMetrics and perFeatureQCMetrics.

normalize is succeeded by logNormCounts.
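The replacements for calculateQCMetrics and normalize can be combined as in the sketch below. It assumes scater is installed and that loading it also attaches scuttle, where perCellQCMetrics, perFeatureQCMetrics and logNormCounts are defined; the cbind pattern mirrors the plotColData examples.

```r
library(scater)

sce <- mockSCE()

# Formerly calculateQCMetrics(sce): per-cell and per-feature metrics are now
# computed separately and can be appended to the colData and rowData.
colData(sce) <- cbind(colData(sce), perCellQCMetrics(sce))
rowData(sce) <- cbind(rowData(sce), perFeatureQCMetrics(sce))

# Formerly normalize(sce): adds a "logcounts" assay.
sce <- logNormCounts(sce)

head(colData(sce))
```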

centreSizeFactors has no replacement; the SingleCellExperiment package is removing support for multiple size factors, so this function is now trivial.

runDiffusionMap and calculateDiffusionMap have no replacement. destiny is no longer on Bioconductor. You can calculate a diffusion map yourself, and add it to a reducedDim field, if you so wish.
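One way to compute a diffusion map without destiny is sketched below in base R. It is a minimal textbook construction (Gaussian kernel with a single global bandwidth, symmetric normalization, eigendecomposition), not destiny's algorithm; simple_diffusion_map() and its sigma and ndim arguments are hypothetical, and the dense distance matrix limits it to modest numbers of cells.

```r
# Hypothetical minimal diffusion map (base R only); not destiny's algorithm.
# x: numeric matrix with cells in rows and features (e.g. PCs) in columns.
simple_diffusion_map <- function(x, sigma = 1, ndim = 2) {
    d2 <- as.matrix(dist(x))^2
    K <- exp(-d2 / (2 * sigma^2))      # Gaussian affinity between cells
    D <- rowSums(K)                    # degree of each cell
    S <- K / sqrt(outer(D, D))         # symmetric form D^-1/2 K D^-1/2
    eig <- eigen(S, symmetric = TRUE)  # eigenvalues in decreasing order
    psi <- eig$vectors / sqrt(D)       # right eigenvectors of P = D^-1 K
    keep <- seq_len(ndim) + 1          # skip the trivial first eigenvector
    coords <- sweep(psi[, keep, drop = FALSE], 2, eig$values[keep], `*`)
    colnames(coords) <- paste0("DC", seq_len(ndim))
    rownames(coords) <- rownames(x)
    list(coords = coords, values = eig$values)
}

set.seed(1)
pcs <- matrix(rnorm(50 * 5), nrow = 50)   # stand-in for 50 cells x 5 PCs
dm <- simple_diffusion_map(pcs, sigma = 2, ndim = 2)
head(dm$coords)
```

With scater loaded, the coordinates could then be stored with reducedDim(sce, "DiffusionMap") <- simple_diffusion_map(reducedDim(sce, "PCA"))$coords and used like any other reducedDim entry. The bandwidth sigma strongly affects the result and would need tuning on real data.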

Value

All functions error out with a defunct message pointing towards their successor (if available).

Author(s)

Aaron Lun

Examples

try(calculateQCMetrics())

Per-PC variance explained by a variable

Description

Compute, for each principal component, the percentage of variance that is explained by one or more variables of interest.

Usage

getExplanatoryPCs(x, dimred = "PCA", n_dimred = 10, ...)

Arguments

`x`	A SingleCellExperiment object containing dimensionality reduction results.
`dimred`	String or integer scalar specifying the field in `reducedDims(x)` that contains the PCA results.
`n_dimred`	Integer scalar specifying the number of the top principal components to use.
`...`	Additional arguments passed to `getVarianceExplained`.

Details

This function computes the percentage of variance in PC scores that is explained by variables in the sample-level metadata. It allows identification of important PCs that are driven by known experimental conditions, e.g., treatment, disease. PCs correlated with technical factors (e.g., batch effects, library size) can also be detected and removed prior to further analysis.

The function uses pre-computed PCA results in x, taking the top n_dimred PCs from the reducedDims entry specified by dimred. The PCA is not recomputed, so users should call runPCA (or an equivalent) beforehand.
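A typical call, with the PCA computed explicitly beforehand, might look like the sketch below. It assumes scater is installed and that the colData of mockSCE() contains Treatment and Cell_Cycle (as used elsewhere in this manual); variables is forwarded to getVarianceExplained through `...`.

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)
sce <- runPCA(sce)   # computed explicitly, stored as reducedDim(sce, "PCA")

# Restrict the analysis to the first 5 PCs and two colData variables.
pc_r2 <- getExplanatoryPCs(sce, dimred = "PCA", n_dimred = 5,
    variables = c("Treatment", "Cell_Cycle"))
pc_r2   # one row per PC, one column per variable (percentages)
```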

Value

A matrix containing the percentage of variance explained by each factor (column) and for each PC (row).

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
example_sce <- runPCA(example_sce)

r2mat <- getExplanatoryPCs(example_sce)

Per-gene variance explained by a variable

Description

Compute, for each gene, the percentage of variance that is explained by one or more variables of interest.

Usage

getVarianceExplained(x, ...)

## S4 method for signature 'ANY'
getVarianceExplained(x, variables, subset_row = NULL, BPPARAM = SerialParam())

## S4 method for signature 'SummarizedExperiment'
getVarianceExplained(
  x,
  variables = NULL,
  ...,
  exprs_values = "logcounts",
  assay.type = exprs_values
)

Arguments

`x`	A numeric matrix of expression values, usually log-transformed and normalized. Alternatively, a SummarizedExperiment containing such a matrix.
`...`	For the generic, arguments to be passed to specific methods. For the SummarizedExperiment method, arguments to be passed to the ANY method.
`variables`	A DataFrame or data.frame containing one or more variables of interest. This should have number of rows equal to the number of columns in `x`. For the SummarizedExperiment method, this can also be a character vector specifying column names of `colData(x)` to use; or `NULL`, in which case all columns in `colData(x)` are used.
`subset_row`	A vector specifying the subset of rows of `x` for which to return a result.
`BPPARAM`	A BiocParallelParam object specifying whether the calculations should be parallelized.
`exprs_values`	Alias for `assay.type`.
`assay.type`	String or integer scalar specifying the expression values for which to compute the variance (the alias `exprs_values` is also accepted).

Details

This function computes the percentage of variance in gene expression that is explained by variables in the sample-level metadata. It allows problematic factors to be quickly identified, as well as the genes that are most affected.
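The per-gene calculation can be sketched in base R as one linear model per gene and per variable, reporting 100 times the R-squared. This illustrates the idea of a marginal R-squared for each variable on its own; it is not scater's implementation, and variance_explained() is a hypothetical helper.

```r
# Hypothetical helper: marginal R-squared (as a percentage) of each variable,
# fitted separately for each gene. Not scater's implementation.
variance_explained <- function(logexprs, variables) {
    # logexprs: genes in rows, cells in columns
    # variables: data.frame with one row per cell
    stopifnot(nrow(variables) == ncol(logexprs))
    sapply(names(variables), function(v) {
        apply(logexprs, 1, function(y) {
            fit <- lm(y ~ variables[[v]])
            100 * summary(fit)$r.squared
        })
    })
}

set.seed(1)
n <- 40
vars <- data.frame(
    group = factor(rep(c("A", "B"), each = n / 2)),
    depth = rnorm(n)
)
logexprs <- matrix(rnorm(5 * n), nrow = 5,
                   dimnames = list(paste0("Gene", 1:5), NULL))
# Make Gene1 depend strongly on 'group'.
logexprs["Gene1", ] <- logexprs["Gene1", ] + 3 * (vars$group == "B")
r2 <- variance_explained(logexprs, vars)
round(r2, 1)
```

The result has the same layout as the documented return value: genes in rows, variables in columns.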

Value

A numeric matrix containing the percentage of variance explained by each factor (column) and for each gene (row).

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

r2mat <- getVarianceExplained(example_sce)

Create a ggplot from a SingleCellExperiment

Description

Create a base ggplot object from a SingleCellExperiment, the contents of which can be directly referenced in subsequent layers without prior specification.

Usage

ggcells(
  x,
  mapping = aes(),
  features = NULL,
  exprs_values = "logcounts",
  use_dimred = TRUE,
  use_altexps = FALSE,
  prefix_altexps = FALSE,
  check_names = TRUE,
  extract_mapping = TRUE,
  assay.type = exprs_values,
  ...
)

ggfeatures(
  x,
  mapping = aes(),
  cells = NULL,
  exprs_values = "logcounts",
  check_names = TRUE,
  extract_mapping = TRUE,
  assay.type = exprs_values,
  ...
)

Arguments

`x`	A SingleCellExperiment object. This is expected to have row names for `ggcells` and column names for `ggfeatures`.
`mapping`	A list containing aesthetic mappings, usually the output of `aes` or related functions.
`features`	Character vector specifying the features for which to extract expression profiles across cells. May also include features in alternative Experiments if permitted by `use.altexps`.
`exprs_values`, `use_dimred`, `use_altexps`, `prefix_altexps`, `check_names`	Soft-deprecated equivalents of the arguments described above.
`extract_mapping`	Logical scalar indicating whether `features` or `cells` should be automatically expanded to include variables referenced in `mapping`.
`assay.type`	String or integer scalar specifying the assay from which to obtain expression values (the alias `exprs_values` is also accepted).
`...`	Further arguments to pass to ggplot.
`cells`	Character vector specifying the cells for which to extract expression profiles across features.

Details

These functions generate a data.frame from the contents of a SingleCellExperiment and pass it to ggplot. Rows, columns or metadata fields in x can then be referenced in subsequent ggplot2 commands.

ggcells treats cells as the data values so users can reference row names of x (if provided in features), column metadata variables and dimensionality reduction results. They can also reference row names and metadata variables for alternative Experiments.

ggfeatures treats features as the data values so users can reference column names of x (if provided in cells) and row metadata variables.

If mapping is supplied, the function will automatically expand features or cells to include any features or cells requested in the mapping. This is convenient as features/cells do not have to be specified twice (once in data.frame construction and again in later geom or stat layers). Developers may wish to turn this off with extract_mapping=FALSE for greater control.
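The effect of extract_mapping can be checked on the data.frame stored inside the returned ggplot object. This sketch assumes scater is installed; reading the data through p$data relies on a ggplot2 detail rather than on the scater interface.

```r
library(scater)   # attaches ggplot2, so aes() and geom_*() are available

sce <- mockSCE()
sce <- logNormCounts(sce)

# Gene_0002 is only mentioned in the mapping; with the default
# extract_mapping=TRUE its expression values are added as a column.
p_auto <- ggcells(sce, aes(x = Treatment, y = Gene_0002))

# With extract_mapping=FALSE, the gene must be requested via 'features'.
p_manual <- ggcells(sce, aes(x = Treatment, y = Gene_0002),
    features = "Gene_0002", extract_mapping = FALSE)

# Without either, the data.frame has no Gene_0002 column to draw from.
p_none <- ggcells(sce, aes(x = Treatment, y = Gene_0002),
    extract_mapping = FALSE)

p_auto + geom_boxplot()
```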

Value

A ggplot object containing the specified contents of x.

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
example_sce <- runPCA(example_sce)

ggcells(example_sce, aes(x=PCA.1, y=PCA.2, colour=Gene_0001)) +
    geom_point()

ggcells(example_sce, aes(x=Mutation_Status, y=Gene_0001)) +
    geom_violin() +
    facet_wrap(~Cell_Cycle)

rowData(example_sce)$GC <- runif(nrow(example_sce))
ggfeatures(example_sce, aes(x=GC, y=Cell_001)) +
    geom_point() +
    stat_smooth()

Count the number of non-zero counts per cell or feature

Description

Count the number of non-zero counts in each row (per feature) or column (per cell).

Usage

nexprs(x, ...)

## S4 method for signature 'ANY'
nexprs(
  x,
  byrow = FALSE,
  detection_limit = 0,
  subset_row = NULL,
  subset_col = NULL,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
nexprs(x, ..., exprs_values = "counts", assay.type = exprs_values)

Arguments

`x`	A numeric matrix of counts where features are rows and cells are columns. Alternatively, a SummarizedExperiment containing such counts.
`...`	For the generic, further arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method.
`byrow`	Logical scalar indicating whether to count the number of detected cells per feature. If `FALSE`, the function will count the number of detected features per cell.
`detection_limit`	Numeric scalar providing the value above which observations are deemed to be expressed.
`subset_row`	Logical, integer or character vector indicating which rows (i.e. features) to use.
`subset_col`	Logical, integer or character vector indicating which columns (i.e., cells) to use.
`BPPARAM`	A BiocParallelParam object specifying whether the calculations should be parallelized. Only relevant when `x` is a DelayedMatrix.
`exprs_values`	Alias for `assay.type`.
`assay.type`	String or integer specifying the assay of `x` to obtain the count matrix from (also the alias `exprs_values` is accepted for this argument).

Value

An integer vector containing counts per gene or cell, depending on the provided arguments.
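For a dense matrix, the counting can be pictured with the base-R sketch below. count_detected() is a hypothetical helper that only mirrors the byrow and detection_limit arguments; it does not handle subsetting, DelayedMatrix input or parallelization.

```r
# Hypothetical helper mirroring nexprs() for a dense matrix:
# count the entries strictly above detection_limit per column or per row.
count_detected <- function(x, byrow = FALSE, detection_limit = 0) {
    detected <- x > detection_limit
    out <- if (byrow) rowSums(detected) else colSums(detected)
    storage.mode(out) <- "integer"   # keep names, return integers
    out
}

counts <- matrix(c(0, 2, 5,
                   0, 0, 1,
                   3, 0, 0,
                   1, 1, 1), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3)))

count_detected(counts)                       # per cell: 2, 2, 3
count_detected(counts, byrow = TRUE)         # per gene: 2, 1, 1, 3
count_detected(counts, detection_limit = 1)  # only values above 1: 1, 1, 1
```

For the same dense matrix, nexprs(counts) with matching arguments should agree with this sketch.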

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()

nexprs(example_sce)[1:10]
nexprs(example_sce, byrow = TRUE)[1:10]

Additional accessors for the typical elements of a SingleCellExperiment object.

Description

Convenience functions to access commonly-used assays of the SingleCellExperiment object.

Usage

norm_exprs(object)

norm_exprs(object) <- value

stand_exprs(object)

stand_exprs(object) <- value

fpkm(object)

fpkm(object) <- value

Arguments

`object`	A `SingleCellExperiment` object from which to access, or to which to assign, assay values, namely "exprs", "norm_exprs", "stand_exprs" and "fpkm". The following are imported from `SingleCellExperiment`: "counts", "normcounts", "logcounts", "cpm", "tpm".
`value`	a numeric matrix (e.g. for `exprs`)

Value

a matrix of normalised expression data

a matrix of standardised expression data

a matrix of FPKM values

A matrix of numeric, integer or logical values.

Author(s)

Davis McCarthy

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
head(logcounts(example_sce)[,1:10])
head(exprs(example_sce)[,1:10]) # identical to logcounts()

norm_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1)

stand_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1)

tpm(example_sce) <- calculateTPM(example_sce, lengths = 5e4)

cpm(example_sce) <- calculateCPM(example_sce)

fpkm(example_sce)

Plot column metadata

Description

Plot column-level (i.e., cell) metadata in a SingleCellExperiment object.

Usage

plotColData(
  object,
  y,
  x = NULL,
  colour_by = color_by,
  shape_by = NULL,
  size_by = NULL,
  order_by = NULL,
  by_exprs_values = "logcounts",
  other_fields = list(),
  swap_rownames = NULL,
  color_by = NULL,
  point_fun = NULL,
  scattermore = FALSE,
  bins = NULL,
  summary_fun = "sum",
  hex = FALSE,
  by.assay.type = by_exprs_values,
  ...
)

Arguments

`object`	A SingleCellExperiment object containing expression values and experimental information.
`y`	String specifying the column-level metadata field to show on the y-axis. Alternatively, an AsIs vector or data.frame, see `?retrieveCellInfo`.
`x`	String specifying the column-level metadata to show on the x-axis. Alternatively, an AsIs vector or data.frame, see `?retrieveCellInfo`. If `NULL`, nothing is shown on the x-axis.
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values.
`shape_by`	Specification of a column metadata field or a feature to shape by, see the `by` argument in `?retrieveCellInfo` for possible values.
`size_by`	Specification of a column metadata field or a feature to size by, see the `by` argument in `?retrieveCellInfo` for possible values.
`order_by`	Specification of a column metadata field or a feature to order points by, see the `by` argument in `?retrieveCellInfo` for possible values.
`by_exprs_values`	Alias for `by.assay.type`.
`other_fields`	Additional cell-based fields to include in the data.frame, see `?"scater-plot-args"` for details.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`color_by`	Alias to `colour_by`.
`point_fun`	Function used to create a geom that shows individual cells. Should take `...` args and return a ggplot2 geom. For example, `point_fun=function(...) geom_quasirandom(...)`.
`scattermore`	Logical, whether to use the `scattermore` package to greatly speed up plotting a large number of cells. Use `point_size = 0` for the greatest performance gain.
`bins`	Number of bins used to bin and summarize the points and their values, to avoid overplotting; the number can differ between x and y. If `NULL` (default), then the points are plotted without binning. Only used when both x and y are numeric.
`summary_fun`	Function to summarize the feature value of each point (e.g. gene expression of each cell) when the points are binned; defaults to `sum`. Can be either the name of the function or the function itself.
`hex`	Logical, whether to use `geom_hex`.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see `?retrieveCellInfo` for details (also alias `by_exprs_values` is accepted for this argument).
`...`	Additional arguments for visualization, see `?"scater-plot-args"` for details.

Details

If y is continuous and x=NULL, a violin plot is generated. If x is categorical, a grouped violin plot will be generated, with one violin for each level of x. If x is continuous, a scatter plot will be generated.

If y is categorical and x is continuous, horizontal violin plots will be generated. If x is missing or categorical, rectangle plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.

Value

A ggplot object.

Note

Arguments shape_by and size_by are ignored when scattermore = TRUE. Using scattermore is only recommended for very large datasets to speed up plotting. Small point size is also recommended. For larger point size, the point shape may be distorted. Also, when scattermore = TRUE, the point_size argument works differently.

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
colData(example_sce) <- cbind(colData(example_sce),
    perCellQCMetrics(example_sce))

plotColData(example_sce, y = "detected", x = "sum",
   colour_by = "Mutation_Status") + scale_x_log10()

plotColData(example_sce, y = "detected", x = "sum",
   colour_by = "Mutation_Status", size_by = "Gene_0001",
   shape_by = "Treatment") + scale_x_log10()

plotColData(example_sce, y = "Treatment", x = "sum",
   colour_by = "Mutation_Status") + scale_y_log10() # flipped violin.

plotColData(example_sce, y = "detected",
   x = "Cell_Cycle", colour_by = "Mutation_Status")
# With scattermore
plotColData(example_sce, x = "sum", y = "detected", scattermore = TRUE,
   point_size = 2)
# Bin to show point density
plotColData(example_sce, x = "sum", y = "detected", bins = 10)
# Bin to summarize value (default is sum)
plotColData(example_sce, x = "sum", y = "detected", bins = 10, colour_by = "total")

Create a dot plot of expression values

Description

Create a dot plot of expression values for a grouping of cells, where the size and colour of each dot represents the proportion of detected expression values and the average expression, respectively, for each feature in each group of cells.

Usage

plotDots(
  object,
  features,
  group = NULL,
  block = NULL,
  exprs_values = "logcounts",
  detection_limit = 0,
  zlim = NULL,
  colour = color,
  color = NULL,
  max_detected = NULL,
  other_fields = list(),
  by_exprs_values = exprs_values,
  swap_rownames = NULL,
  center = FALSE,
  scale = FALSE,
  assay.type = exprs_values,
  by.assay.type = by_exprs_values
)

Arguments

`object`	A SingleCellExperiment object.
`features`	A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of `object` to visualize. When using character or integer vectors, the ordering specified by the user is retained. When using factor vectors, ordering is controlled by the factor levels.
`group`	String specifying the field of `colData(object)` containing the grouping factor, e.g., cell types or clusters. Alternatively, any value that can be used in the `by` argument to `retrieveCellInfo`.
`block`	String specifying the field of `colData(object)` containing a blocking factor (e.g., batch of origin). Alternatively, any value that can be used in the `by` argument to `retrieveCellInfo`.
`exprs_values`	Alias to `assay.type`.
`detection_limit`	Numeric scalar providing the value above which observations are deemed to be expressed.
`zlim`	A numeric vector of length 2, specifying the upper and lower bounds for colour mapping of expression values. Values outside this range are set to the most extreme colour. If `NULL`, it defaults to the range of the expression matrix. If `center=TRUE`, this defaults to the range of the centered expression matrix, made symmetric around zero.
`colour`	A vector of colours specifying the palette to use for increasing expression. This defaults to viridis if `center=FALSE`, and the `"RdYlBu"` colour palette from `brewer.pal` otherwise.
`color`	Alias to `colour`.
`max_detected`	Numeric value specifying the cap on the proportion of detected expression values.
`other_fields`	Additional feature-based fields to include in the data.frame, see `?"scater-plot-args"` for details. Note that any AsIs vectors or data.frames must be of length equal to `nrow(object)`, not `features`.
`by_exprs_values`	Alias for `by.assay.type`.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`center`	A logical scalar indicating whether each feature should have its mean expression (specifically, the mean of averages across all groups) centered at zero prior to plotting.
`scale`	A logical scalar specifying whether each row should have its average expression values scaled to unit variance prior to plotting.
`assay.type`	A string or integer scalar indicating which assay of `object` should be used as expression values.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for entries of `other_fields`. Also alias `by_exprs_values` is accepted as argument name.

Details

This implements a Seurat-style “dot plot” that creates a dot for each feature (row) in each group of cells (column). The proportion of detected expression values and the average expression for each feature in each group of cells is visualized efficiently using the size and colour, respectively, of each dot. If block is specified, batch-corrected averages and proportions for each group are computed with correctGroupSummary.
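The two quantities encoded by each dot can be sketched in base R as below. dot_summaries() is a hypothetical helper for a dense matrix; it reproduces the per-group averages, the proportions above detection_limit and the center option, but not the blocking via correctGroupSummary or the scale option.

```r
# Hypothetical helper computing what each dot encodes,
# for a dense matrix with features in rows and cells in columns.
dot_summaries <- function(x, group, detection_limit = 0, center = FALSE) {
    idx <- split(seq_len(ncol(x)), factor(group))
    # Average expression of each feature in each group (dot colour).
    avg <- vapply(idx, function(i) rowMeans(x[, i, drop = FALSE]),
                  numeric(nrow(x)))
    # Proportion of cells in each group above the limit (dot size).
    prop <- vapply(idx, function(i) rowMeans(x[, i, drop = FALSE] > detection_limit),
                   numeric(nrow(x)))
    if (center) {
        # Center each feature on the mean of its group averages.
        avg <- avg - rowMeans(avg)
    }
    list(average = avg, detected = prop)
}

x <- rbind(GeneA = c(0, 1, 2, 4, 4, 0),
           GeneB = c(1, 1, 1, 0, 0, 0))
grp <- c("g1", "g1", "g1", "g2", "g2", "g2")
s <- dot_summaries(x, grp)
s_centered <- dot_summaries(x, grp, center = TRUE)
s
```

Note that GeneA has the same proportion detected in both groups but a higher average in g2, so its two dots would differ only in colour.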

Some caution is required during interpretation due to the difficulty of simultaneously interpreting both size and colour. For example, if we coloured by z-score on a conventional blue-white-red colour axis, a gene that is downregulated in a group of cells would show up as a small blue dot. If the background colour was also white, this could be easily mistaken for a gene that is not downregulated at all. We suggest choosing a colour scale that remains distinguishable from the background colour at all points. Admittedly, that is easier said than done as many colour scales will approach a lighter colour at some stage, so some magnifying glasses may be required.

We can also cap the colour and size scales using zlim and max_detected, respectively. This aims to preserve resolution for low-abundance genes by preventing domination of the scales by high-abundance features.

Value

A ggplot object containing a dot plot.

Author(s)

Aaron Lun

Examples

sce <- mockSCE()
sce <- logNormCounts(sce)

plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle")
plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE)
plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", scale=TRUE)
plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE, scale=TRUE)

plotDots(sce, features=rownames(sce)[1:10], group="Treatment", block="Cell_Cycle")

Plot the explanatory PCs for each variable

Description

Plot the explanatory PCs for each variable

Usage

plotExplanatoryPCs(
  object,
  nvars_to_plot = 10,
  npcs_to_plot = 50,
  theme_size = 10,
  ...
)

Arguments

`object`	A SingleCellExperiment object containing expression values and experimental information. Alternatively, a matrix containing the output of `getExplanatoryPCs`.
`nvars_to_plot`	Integer scalar specifying the number of variables with the greatest explanatory power to plot. This can be set to `Inf` to show all variables.
`npcs_to_plot`	Integer scalar specifying the number of PCs to plot.
`theme_size`	Numeric scalar providing the base font size for the ggplot theme.
`...`	Parameters to be passed to `getExplanatoryPCs`.

Details

The R-squared for each variable is plotted against each successive PC (up to npcs_to_plot PCs). Only the nvars_to_plot variables with the largest maximum R-squared across PCs are shown.

If object is a SingleCellExperiment object, getExplanatoryPCs will be called to compute the percentage of variance in each PC that is explained by each variable. Users may prefer to run getExplanatoryPCs manually and pass the resulting matrix as object, in which case the R-squared values are used directly.
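Passing a precomputed matrix avoids repeating the calculation when the plot is tweaked. A minimal sketch, assuming scater is installed; the choice of variables is illustrative.

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)
sce <- runPCA(sce)

# Compute the R-squared matrix once (first 10 PCs by default)...
pc_r2 <- getExplanatoryPCs(sce,
    variables = c("Treatment", "Cell_Cycle", "Mutation_Status"))

# ...and reuse it for plotting; the values are used directly.
p <- plotExplanatoryPCs(pc_r2, npcs_to_plot = 10)
p
```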

Value

A ggplot object.

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
example_sce <- runPCA(example_sce)

plotExplanatoryPCs(example_sce)

Plot explanatory variables ordered by percentage of variance explained

Description

Plot explanatory variables ordered by percentage of variance explained

Usage

plotExplanatoryVariables(
  object,
  nvars_to_plot = 10,
  min_marginal_r2 = 0,
  theme_size = 10,
  ...
)

Arguments

`object`	A SingleCellExperiment object containing expression values and experimental information. Alternatively, a matrix containing the output of `getVarianceExplained`.
`nvars_to_plot`	Integer scalar specifying the number of variables with the greatest explanatory power to plot. This can be set to `Inf` to show all variables.
`min_marginal_r2`	Numeric scalar specifying the minimal value required for median marginal R-squared for a variable to be plotted. Only variables with a median marginal R-squared strictly larger than this value will be plotted.
`theme_size`	Numeric scalar specifying the font size to use for the plotting theme.
`...`	Parameters to be passed to `getVarianceExplained`.

Details

A density plot is created for each variable, showing the distribution of R-squared across all genes. Only the nvars_to_plot variables with the largest median R-squared across genes are shown. Variables are also only shown if they have median R-squared values above min_marginal_r2.
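The selection rule can be expressed in a few lines of base R. select_explanatory() is a hypothetical helper that applies the two documented filters (median strictly above min_marginal_r2, then the top nvars_to_plot by median) to a matrix shaped like the output of getVarianceExplained; it does not draw anything.

```r
# Hypothetical helper mirroring the selection rule described above.
# r2: matrix of R-squared percentages, genes in rows, variables in columns.
select_explanatory <- function(r2, nvars_to_plot = 10, min_marginal_r2 = 0) {
    med <- apply(r2, 2, median, na.rm = TRUE)
    med <- med[med > min_marginal_r2]      # strictly larger than the threshold
    med <- sort(med, decreasing = TRUE)
    names(med)[seq_len(min(nvars_to_plot, length(med)))]
}

r2 <- cbind(Treatment  = c(1, 2, 3),
            Cell_Cycle = c(10, 20, 30),
            Batch      = c(0, 0, 0.5))
select_explanatory(r2)                       # Batch has median 0, so it is dropped
select_explanatory(r2, min_marginal_r2 = 5)
select_explanatory(r2, nvars_to_plot = 1)
```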

If object is a SingleCellExperiment object, getVarianceExplained will be called to compute the variance in expression explained by each variable in each gene. Users may prefer to run getVarianceExplained manually and pass the resulting matrix as object, in which case the R-squared values are used directly.

Value

A ggplot object.

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
plotExplanatoryVariables(example_sce)

Plot expression values for all cells

Description

Plot expression values for a set of features (e.g. genes or transcripts) in a SingleCellExperiment object, against a continuous or categorical covariate for all cells.

Usage

plotExpression(
  object,
  features,
  x = NULL,
  exprs_values = "logcounts",
  log2_values = FALSE,
  colour_by = color_by,
  shape_by = NULL,
  size_by = NULL,
  order_by = NULL,
  by_exprs_values = exprs_values,
  xlab = NULL,
  feature_colours = feature_colors,
  one_facet = TRUE,
  ncol = 2,
  scales = "fixed",
  other_fields = list(),
  swap_rownames = NULL,
  color_by = NULL,
  feature_colors = TRUE,
  point_fun = NULL,
  assay.type = exprs_values,
  scattermore = FALSE,
  bins = NULL,
  summary_fun = "sum",
  hex = FALSE,
  by.assay.type = by_exprs_values,
  ...
)

Arguments

`object`	A SingleCellExperiment object containing expression values and other metadata.
`features`	A character vector or a list specifying the features to plot. If a list is supplied, each entry of the list can be a string, an AsIs-wrapped vector or a data.frame - see `?retrieveCellInfo`.
`x`	Specification of a column metadata field or a feature to show on the x-axis, see the `by` argument in `?retrieveCellInfo` for possible values.
`exprs_values`	Alias to `assay.type`.
`log2_values`	Logical scalar specifying whether the expression values should be transformed to the log2-scale for plotting (with an offset of 1 to avoid logging zeroes).
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values.
`shape_by`	Specification of a column metadata field or a feature to shape by, see the `by` argument in `?retrieveCellInfo` for possible values.
`size_by`	Specification of a column metadata field or a feature to size by, see the `by` argument in `?retrieveCellInfo` for possible values.
`order_by`	Specification of a column metadata field or a feature to order points by, see the `by` argument in `?retrieveCellInfo` for possible values.
`by_exprs_values`	Alias to `by.assay.type`.
`xlab`	String specifying the label for x-axis. If `NULL` (default), `x` will be used as the x-axis label.
`feature_colours`	Logical scalar indicating whether violins should be coloured by feature when `x` and `colour_by` are not specified and `one_facet=TRUE`.
`one_facet`	Logical scalar indicating whether grouped violin plots for multiple features should be put onto one facet. Only relevant when `x=NULL`.
`ncol`	Integer scalar, specifying the number of columns to be used for the panels of a multi-facet plot.
`scales`	String indicating whether multi-facet scales should be fixed (`"fixed"`), free (`"free"`), or free in one dimension (`"free_x"`, `"free_y"`). Passed to the `scales` argument of `facet_wrap` when multiple facets are generated.
`other_fields`	Additional cell-based fields to include in the data.frame, see `?"scater-plot-args"` for details.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`color_by`	Alias to `colour_by`.
`feature_colors`	Alias to `feature_colours`.
`point_fun`	Function used to create a geom that shows individual cells. Should take `...` args and return a ggplot2 geom. For example, `point_fun=function(...) geom_quasirandom(...)`.
`assay.type`	A string or integer scalar specifying which assay in `assays(object)` to obtain expression values from. The alias `exprs_values` is also accepted.
`scattermore`	Logical, whether to use the `scattermore` package to greatly speed up plotting a large number of cells. Use `point_size = 0` for the most performance gain.
`bins`	Number of bins (which can differ between x and y) used to bin the points and summarize their values, to avoid overplotting. If `NULL` (default), the points are plotted without binning. Only used when both x and y are numeric.
`summary_fun`	Function used to summarize the feature value of each point (e.g. gene expression of each cell) when the points are binned; defaults to `sum`. Can be either the name of the function or the function itself.
`hex`	Logical, whether to use `geom_hex`.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see the `assay.type` argument in `?retrieveCellInfo`. The alias `by_exprs_values` is also accepted.
`...`	Additional arguments for visualization, see `?"scater-plot-args"` for details.
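
The idea behind `bins` and `summary_fun` (group points into a grid, then summarise the colour value within each occupied bin) can be illustrated in base R. This is a sketch of the concept only, not scater's implementation; the helper `bin_and_summarise` and the toy coordinates are hypothetical.

```r
# Hypothetical helper: bin x and y into `bins` intervals each and summarise
# `value` within every occupied bin using `summary_fun` (a name or a function).
bin_and_summarise <- function(x, y, value, bins = 10, summary_fun = "sum") {
    fun <- match.fun(summary_fun)
    x_bin <- cut(x, breaks = bins, include.lowest = TRUE)
    y_bin <- cut(y, breaks = bins, include.lowest = TRUE)
    # aggregate() drops bins that contain no points.
    out <- aggregate(value, by = list(x_bin = x_bin, y_bin = y_bin), FUN = fun)
    names(out)[3] <- "value"
    out
}

x <- c(0, 0.1, 0.9, 1)
y <- c(0, 0.2, 0.8, 1)
value <- c(1, 2, 3, 4)
bin_and_summarise(x, y, value, bins = 2)                        # sums: 3 and 7
bin_and_summarise(x, y, value, bins = 2, summary_fun = "mean")  # means: 1.5 and 3.5
```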

Details

This function plots expression values for one or more features. If x is not specified, a violin plot will be generated of expression values. If x is categorical, a grouped violin plot will be generated, with one violin for each level of x. If x is continuous, a scatter plot will be generated.

If multiple features are requested and x is not specified and one_facet=TRUE, a grouped violin plot will be generated with one violin per feature. This will be coloured by feature if colour_by=NULL and feature_colours=TRUE, to yield a more aesthetically pleasing plot. Otherwise, if x is specified or one_facet=FALSE, a multi-panel plot will be generated where each panel corresponds to a feature. Each panel will be a scatter plot or (grouped) violin plot, depending on the nature of x.
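
The layout rules in the two paragraphs above can be restated as a small decision function. This is a hypothetical helper written only to summarise the documented behaviour; here `x` stands for the already-resolved covariate values, not the field name that `plotExpression` itself accepts.

```r
# Hypothetical helper restating the documented layout rules of plotExpression().
describe_expression_plot <- function(x = NULL, n_features = 1L, one_facet = TRUE) {
    # Several features are drawn in separate panels unless x is missing
    # and one_facet = TRUE, in which case they share one grouped violin plot.
    multi_panel <- n_features > 1L && !(is.null(x) && one_facet)
    # The geometry of each panel depends on the nature of x.
    geom <- if (is.null(x)) {
        "violin"
    } else if (is.numeric(x)) {
        "scatter"
    } else {
        "grouped violin"
    }
    list(multi_panel = multi_panel, geom = geom)
}

describe_expression_plot(NULL, n_features = 3)           # one facet of violins
describe_expression_plot(c("WT", "KO"), n_features = 2)  # panels of grouped violins
describe_expression_plot(c(0.5, 1.2), n_features = 1)    # a single scatter plot
```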

Note that this assumes that the expression values are numeric. If they are not and x is continuous, horizontal violin plots will be generated. If x is missing or categorical, rectangle plots will be generated, where the area of each rectangle is proportional to the number of points for a combination of factors.

Value

A ggplot object.

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

## default plot
plotExpression(example_sce, rownames(example_sce)[1:15])

## plot expression against an x-axis value
plotExpression(example_sce, c("Gene_0001", "Gene_0004"),
    x="Mutation_Status")
plotExpression(example_sce, c("Gene_0001", "Gene_0004"),
    x="Gene_0002")

## add visual options
plotExpression(example_sce, rownames(example_sce)[1:6],
    colour_by = "Mutation_Status")
plotExpression(example_sce, rownames(example_sce)[1:6],
    colour_by = "Mutation_Status", shape_by = "Treatment",
    size_by = "Gene_0010")

## use a boxplot instead of a violin plot
plotExpression(example_sce, rownames(example_sce)[1:6],
    show_boxplot = TRUE, show_violin = FALSE)

## plot expression against expression values for Gene_0004
plotExpression(example_sce, rownames(example_sce)[1:4],
    "Gene_0004", show_smooth = TRUE)

# Use scattermore
plotExpression(example_sce, "Gene_0001", x = "Gene_0100", scattermore = TRUE,
    point_size = 2)
# Bin to show point density
plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10)
# Bin to summarize values (default is sum but can be changed with summary_fun)
plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10,
    colour_by = "Gene_0002", summary_fun = "mean")

Plot heatmap of group-level expression averages

Description

Create a heatmap of average expression values for each group of cells and specified features in a SingleCellExperiment object.

Usage

plotGroupedHeatmap(
  object,
  features,
  group,
  block = NULL,
  columns = NULL,
  exprs_values = "logcounts",
  center = FALSE,
  scale = FALSE,
  zlim = NULL,
  colour = color,
  swap_rownames = NULL,
  color = NULL,
  assay.type = exprs_values,
  ...
)

Arguments

`object`	A SingleCellExperiment object.
`features`	A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of `object` to visualize. When using character or integer vectors, the ordering specified by the user is retained. When using factor vectors, ordering is controlled by the factor levels.
`group`	String specifying the field of `colData(object)` containing the grouping factor, e.g., cell types or clusters. Alternatively, any value that can be used in the `by` argument to `retrieveCellInfo`.
`block`	String specifying the field of `colData(object)` containing a blocking factor (e.g., batch of origin). Alternatively, any value that can be used in the `by` argument to `retrieveCellInfo`.
`columns`	A vector specifying the subset of columns in `object` to use when computing averages.
`exprs_values`	Alias to `assay.type`.
`center`	A logical scalar indicating whether each feature should have its mean expression (specifically, the mean of averages across all groups) centered at zero prior to plotting.
`scale`	A logical scalar specifying whether each row should have its average expression values scaled to unit variance prior to plotting.
`zlim`	A numeric vector of length 2, specifying the lower and upper bounds for colour mapping of expression values. Values outside this range are set to the most extreme colour. If `NULL`, it defaults to the range of the expression matrix. If `center=TRUE`, this defaults to the range of the centered expression matrix, made symmetric around zero.
`colour`	A vector of colours specifying the palette to use for increasing expression. This defaults to viridis if `center=FALSE`, and the `"RdYlBu"` colour palette from `brewer.pal` otherwise.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`color`	Alias to `colour`.
`assay.type`	A string or integer scalar indicating which assay of `object` should be used as expression values.
`...`	Additional arguments to pass to `pheatmap`.

Details

This function shows the average expression values for each group of cells on a heatmap, as defined using the group factor. A per-group visualization can be preferable to a per-cell visualization when dealing with a large number of cells or with groups of different sizes. If block is also specified, the block effect is regressed out of the averages with correctGroupSummary prior to visualization.

Setting center=TRUE is useful for examining log-fold changes of each group's expression profile from the average across all groups. This avoids issues with the entire row appearing a certain colour because the gene is highly/lowly expressed across all cells.
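
A minimal base-R sketch of the per-group averaging and of what `center=TRUE` does to it follows. It ignores `block` and `scale`, assumes `mat` holds log-expression values (so that differences are log-fold changes), and uses a hypothetical helper `group_averages`; scater's own averaging may differ in detail.

```r
# Hypothetical helper: average each feature (row) within each group of cells (columns).
group_averages <- function(mat, group) {
    sapply(split(seq_len(ncol(mat)), group),
           function(i) rowMeans(mat[, i, drop = FALSE]))
}

mat <- rbind(
    g1 = c(1, 3, 5, 7),
    g2 = c(2, 2, 4, 4)
)
group <- c("A", "A", "B", "B")

avg <- group_averages(mat, group)   # features x groups: g1 = 2, 6; g2 = 2, 4
# center = TRUE: subtract each feature's mean of the group averages, so every
# row shows deviations from the across-group average.
centered <- avg - rowMeans(avg)     # g1 = -2, 2; g2 = -1, 1
```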

Setting zlim preserves the dynamic range of colours in the presence of outliers. Otherwise, the plot may be dominated by a few genes, which will “flatten” the observed colours for the rest of the heatmap.
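
The default limits and the clamping of out-of-range values described for the `zlim` argument can be sketched as below. The helpers `default_zlim` and `clamp_to_zlim` are hypothetical and only restate the documented behaviour.

```r
# Default limits: the data range, or a range symmetric around zero when centered.
default_zlim <- function(mat, center = FALSE) {
    if (center) {
        extreme <- max(abs(mat))
        c(-extreme, extreme)
    } else {
        range(mat)
    }
}

# Values outside zlim are set to the nearest limit, so they take the most
# extreme colour instead of stretching the colour scale.
clamp_to_zlim <- function(mat, zlim) {
    pmin(pmax(mat, zlim[1]), zlim[2])
}

centered <- rbind(g1 = c(-0.5, 0.2, 8), g2 = c(0.1, -0.3, 0.4))
default_zlim(centered, center = TRUE)      # -8 8, dominated by one outlier
clamp_to_zlim(centered, zlim = c(-1, 1))   # the outlier 8 becomes 1
```

With the default limits, every value except the outlier would fall in a narrow band near the middle of the palette; clamping restores the contrast among the remaining values.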

Value

A heatmap is produced on the current graphics device. The output of pheatmap is invisibly returned.

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
example_sce$Group <- paste0(example_sce$Treatment, "+", example_sce$Mutation_Status)

plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10],
    group="Group")

plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10],
    group="Group", center=TRUE)

plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10],
    group="Group", block="Cell_Cycle", center=TRUE)

Plot heatmap of gene expression values

Description

Create a heatmap of expression values for each cell and specified features in a SingleCellExperiment object.

Usage

plotHeatmap(
  object,
  features,
  columns = NULL,
  exprs_values = "logcounts",
  center = FALSE,
  scale = FALSE,
  zlim = NULL,
  colour = color,
  color = NULL,
  colour_columns_by = color_columns_by,
  color_columns_by = NULL,
  column_annotation_colours = column_annotation_colors,
  column_annotation_colors = list(),
  row_annotation_colours = row_annotation_colors,
  row_annotation_colors = list(),
  colour_rows_by = color_rows_by,
  color_rows_by = NULL,
  order_columns_by = NULL,
  by_exprs_values = exprs_values,
  show_colnames = FALSE,
  cluster_cols = is.null(order_columns_by),
  swap_rownames = NULL,
  assay.type = exprs_values,
  by.assay.type = by_exprs_values,
  ...
)

Arguments

`object`	A SingleCellExperiment object.
`features`	A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of `object` to visualize. When using character or integer vectors, the ordering specified by the user is retained. When using factor vectors, ordering is controlled by the factor levels.
`columns`	A vector specifying the subset of columns in `object` to show as columns in the heatmap. Also specifies the column order if `cluster_cols=FALSE` and `order_columns_by=NULL`. By default, all columns are used.
`exprs_values`	Alias to `assay.type`.
`center`	A logical scalar indicating whether each feature should have its mean expression centered at zero prior to plotting.
`scale`	A logical scalar specifying whether each feature should have its expression values scaled to have unit variance prior to plotting.
`zlim`	A numeric vector of length 2, specifying the lower and upper bounds for colour mapping of expression values. Values outside this range are set to the most extreme colour. If `NULL`, it defaults to the range of the expression matrix. If `center=TRUE`, this defaults to the range of the centered expression matrix, made symmetric around zero.
`colour`	A vector of colours specifying the palette to use for increasing expression. This defaults to viridis if `center=FALSE`, and the `"RdYlBu"` colour palette from `brewer.pal` otherwise.
`color`, `color_columns_by`, `column_annotation_colors`, `color_rows_by`, `row_annotation_colors`	Aliases to `colour`, `colour_columns_by`, `column_annotation_colours`, `colour_rows_by` and `row_annotation_colours`, respectively.
`colour_columns_by`	A list of values specifying how the columns should be annotated with colours. Each entry of the list can be any acceptable input to the `by` argument in `?retrieveCellInfo`. A character vector can also be supplied and will be treated as a list of strings.
`column_annotation_colours`	A named list of colour scales to be used for the column annotations specified in `colour_columns_by`. Names should be character values present in `colour_columns_by`. If a colour scale is not specified for a particular annotation, a default colour scale is chosen. The full list of colour maps is passed to `pheatmap` as the `annotation_colors` argument.
`row_annotation_colours`	Similar to `column_annotation_colours` but relating to row annotation rather than column annotation.
`colour_rows_by`	Similar to `colour_columns_by` but for rows rather than columns. Each entry of the list can be any acceptable input to the `by` argument in `?retrieveFeatureInfo`.
`order_columns_by`	A list of values specifying how the columns should be ordered. Each entry of the list can be any acceptable input to the `by` argument in `?retrieveCellInfo`. A character vector can also be supplied and will be treated as a list of strings. This argument is automatically appended to `colour_columns_by`.
`by_exprs_values`	Alias to `by.assay.type`.
`show_colnames`, `cluster_cols`, `...`	Additional arguments to pass to `pheatmap`.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`assay.type`	A string or integer scalar indicating which assay of `object` should be used as expression values.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for colouring of column-level data - see the `assay.type` argument in `?retrieveCellInfo`.

Details

Setting center=TRUE is useful for examining log-fold changes of each cell's expression profile from the average across all cells. This avoids issues with the entire row appearing a certain colour because the gene is highly/lowly expressed across all cells.
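
For the per-cell heatmap, centering and scaling act on each feature (row) across the displayed cells. One way to express this in base R is `scale()` applied to the transposed matrix; this is an illustrative sketch, not necessarily how `plotHeatmap` performs the operation internally.

```r
mat <- rbind(
    gene_a = c(1, 2, 3),
    gene_b = c(10, 20, 30)
)

# scale() works column-wise, so transpose to treat each feature as a column.
centered        <- t(scale(t(mat), center = TRUE, scale = FALSE))  # center = TRUE
centered_scaled <- t(scale(t(mat), center = TRUE, scale = TRUE))   # center = TRUE, scale = TRUE

centered         # gene_a: -1 0 1; gene_b: -10 0 10
centered_scaled  # both rows become -1 0 1 (unit variance per feature)
```

Centering alone still lets `gene_b` dominate the colour scale; adding scaling puts both features on a comparable footing.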

Setting order_columns_by is useful for automatically ordering the heatmap by one or more factors of interest, e.g., cluster identity. This avoids the need to set colour_columns_by, cluster_cols and columns to achieve the same effect.
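
Ordering by several factors amounts to a lexicographic sort of the columns, which the base-R sketch below shows with `order()`. The `cluster` and `batch` vectors are hypothetical column annotations, and scater's handling of ties or factor levels may differ.

```r
cluster <- c(2, 1, 2, 1, 3)
batch   <- c("b", "a", "a", "b", "a")

# Sort by cluster first, breaking ties by batch.
column_order <- order(cluster, batch)
column_order   # 2 4 3 1 5
```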

Value

A heatmap is produced on the current graphics device. The output of pheatmap is invisibly returned.

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

plotHeatmap(example_sce, features=rownames(example_sce)[1:10])

plotHeatmap(example_sce, features=rownames(example_sce)[1:10],
    center=TRUE)

plotHeatmap(example_sce, features=rownames(example_sce)[1:10],
    colour_columns_by=c("Mutation_Status", "Cell_Cycle"))

Plot the highest expressing features

Description

Plot the features with the highest average expression across all cells, along with their expression in each individual cell.

Usage

plotHighestExprs(
  object,
  n = 50,
  colour_cells_by = color_cells_by,
  drop_features = NULL,
  exprs_values = "counts",
  by_exprs_values = exprs_values,
  feature_names_to_plot = NULL,
  as_percentage = TRUE,
  swap_rownames = NULL,
  color_cells_by = NULL,
  assay.type = exprs_values,
  by.assay.type = by_exprs_values
)

Arguments

`object`	A SingleCellExperiment object.
`n`	A numeric scalar specifying the number of the most expressed features to show.
`colour_cells_by`	Specification of a column metadata field or a feature to colour by, see `?retrieveCellInfo` for possible values.
`drop_features`	A character, logical or numeric vector indicating which features (e.g. genes, transcripts) to drop when producing the plot. For example, spike-in transcripts might be dropped to examine the contribution from endogenous genes.
`exprs_values`	Alias to `assay.type`.
`by_exprs_values`	Alias to `by.assay.type`.
`feature_names_to_plot`	String specifying which row-level metadata column contains the feature names. Alternatively, an AsIs-wrapped vector or a data.frame, see `?retrieveFeatureInfo` for possible values. Default is `NULL`, in which case `rownames(object)` are used.
`as_percentage`	Logical scalar indicating whether percentages should be plotted. If `FALSE`, the raw values from the `assay.type` assay are shown instead.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`color_cells_by`	Alias to `colour_cells_by`.
`assay.type`	An integer scalar or string specifying the assay to obtain expression values from.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in colouring - see `?retrieveCellInfo` for details.

Details

This function will plot the percentage of counts accounted for by the top n most highly expressed features across the dataset. Each row on the plot corresponds to a feature and is sorted by average expression (denoted by the point). The distribution of expression across all cells is shown as tick marks for each feature. These ticks can be coloured according to cell-level metadata, as specified by colour_cells_by.
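
The percentage calculation can be sketched in base R as follows. Ranking by the mean per-cell percentage is an assumption made for illustration; scater may rank features on the raw totals instead (both give the same order for this toy matrix).

```r
counts <- rbind(
    GeneA = c(10, 0, 5),
    GeneB = c(80, 90, 85),
    GeneC = c(10, 10, 10)
)  # features x cells; each cell has 100 counts in total here

# Percentage of each cell's total counts contributed by each feature.
pct <- sweep(counts, 2, colSums(counts), "/") * 100

# Rank features by their average percentage and keep the top n.
n <- 2
top <- head(sort(rowMeans(pct), decreasing = TRUE), n)
top   # GeneB 85, GeneC 10
```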

Value

A ggplot object.

Examples

example_sce <- mockSCE()
colData(example_sce) <- cbind(colData(example_sce),
     perCellQCMetrics(example_sce))

plotHighestExprs(example_sce, colour_cells_by="detected")
plotHighestExprs(example_sce, colour_cells_by="Mutation_Status")

Plot cells in plate positions

Description

Plots cells in their position on a plate, coloured by metadata variables or feature expression values from a SingleCellExperiment object.

Usage

plotPlatePosition(
  object,
  plate_position = NULL,
  colour_by = color_by,
  size_by = NULL,
  shape_by = NULL,
  order_by = NULL,
  by_exprs_values = "logcounts",
  add_legend = TRUE,
  theme_size = 24,
  point_alpha = 0.6,
  point_size = 24,
  point_shape = 19,
  other_fields = list(),
  swap_rownames = NULL,
  color_by = NULL,
  by.assay.type = by_exprs_values
)

Arguments

`object`	A SingleCellExperiment object.
`plate_position`	A character vector specifying the plate position for each cell (e.g., A01, B12, and so on, where letter indicates row and number indicates column). If `NULL`, the function will attempt to extract this from `object$plate_position`. Alternatively, a list of two factors (`"row"` and `"column"`) can be supplied, specifying the row (capital letters) and column (integer) for each cell in `object`.
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values.
`size_by`	Specification of a column metadata field or a feature to size by, see the `by` argument in `?retrieveCellInfo` for possible values.
`shape_by`	Specification of a column metadata field or a feature to shape by, see the `by` argument in `?retrieveCellInfo` for possible values.
`order_by`	Specification of a column metadata field or a feature to order points by, see the `by` argument in `?retrieveCellInfo` for possible values.
`by_exprs_values`	Alias for `by.assay.type`.
`add_legend`	Logical scalar specifying whether a legend should be shown.
`theme_size`	Numeric scalar, see `?"scater-plot-args"` for details.
`point_alpha`	Numeric scalar specifying the transparency of the points, see `?"scater-plot-args"` for details.
`point_size`	Numeric scalar specifying the size of the points, see `?"scater-plot-args"` for details.
`point_shape`	An integer, or a string specifying the shape of the points. See `?"scater-plot-args"` for details.
`other_fields`	Additional cell-based fields to include in the data.frame, see `?"scater-plot-args"` for details.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`color_by`	Alias to `colour_by`.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see the `assay.type` argument in `?retrieveCellInfo`.

Details

This function expects plate positions to be given in a character format, where a letter indicates the row on the plate and a number indicates the column. Each cell has a plate position such as "A01", "B12", "K24" and so on. From these plate positions, the row is extracted as the letter and the column as the numeric part. Alternatively, the row and column identities can be supplied directly by setting plate_position to a list of two factors.
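
The parsing step described above can be sketched with regular expressions in base R. The helper `parse_plate_position` is hypothetical, and accepting one or more leading letters is an assumption; the result has the list-of-two-factors shape that `plate_position` also accepts.

```r
# Hypothetical helper: split positions such as "A01" into a row letter and a column number.
parse_plate_position <- function(pos) {
    row    <- sub("^([A-Z]+)([0-9]+)$", "\\1", pos)
    column <- as.integer(sub("^([A-Z]+)([0-9]+)$", "\\2", pos))
    list(row = factor(row), column = factor(column))
}

wells <- parse_plate_position(c("A01", "B12", "K24"))
wells$row      # factor with values A, B, K
wells$column   # factor with values 1, 12, 24
```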

Value

A ggplot object.

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

## define plate positions
example_sce$plate_position <- paste0(
    rep(LETTERS[1:5], each = 8), 
    rep(formatC(1:8, width = 2, flag = "0"), 5)
)

## plot plate positions
plotPlatePosition(example_sce, colour_by = "Mutation_Status")

plotPlatePosition(example_sce, shape_by = "Treatment", 
    colour_by = "Gene_0004")

plotPlatePosition(example_sce, shape_by = "Treatment", size_by = "Gene_0001",
    colour_by = "Cell_Cycle")

Plot reduced dimensions

Description

Plot cell-level reduced dimension results stored in a SingleCellExperiment object.

Usage

plotReducedDim(
  object,
  dimred,
  ncomponents = 2,
  percentVar = NULL,
  colour_by = color_by,
  shape_by = NULL,
  size_by = NULL,
  order_by = NULL,
  by_exprs_values = "logcounts",
  text_by = NULL,
  text_size = 5,
  text_colour = text_color,
  label_format = c("%s %i", " (%i%%)"),
  other_fields = list(),
  text_color = "black",
  color_by = NULL,
  swap_rownames = NULL,
  point.padding = NA,
  force = 1,
  rasterise = FALSE,
  scattermore = FALSE,
  bins = NULL,
  summary_fun = "sum",
  hex = FALSE,
  by.assay.type = by_exprs_values,
  min.value = NULL,
  max.value = NULL,
  ...
)

Arguments

`object`	A SingleCellExperiment object.
`dimred`	A string or integer scalar indicating the reduced dimension result in `reducedDims(object)` to plot.
`ncomponents`	A numeric scalar indicating the number of dimensions to plot, starting from the first dimension. Alternatively, a numeric vector specifying the dimensions to be plotted.
`percentVar`	A numeric vector giving the proportion of variance in expression explained by each reduced dimension. Only expected to be used in PCA settings, e.g., in the `plotPCA` function.
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values.
`shape_by`	Specification of a column metadata field or a feature to shape by, see the `by` argument in `?retrieveCellInfo` for possible values.
`size_by`	Specification of a column metadata field or a feature to size by, see the `by` argument in `?retrieveCellInfo` for possible values.
`order_by`	Specification of a column metadata field or a feature to order points by, see the `by` argument in `?retrieveCellInfo` for possible values.
`by_exprs_values`	Alias for `by.assay.type`.
`text_by`	String specifying the column metadata field with which to add text labels on the plot. This must refer to a categorical field, i.e., coercible into a factor. Alternatively, an AsIs vector or data.frame, see `?retrieveCellInfo`.
`text_size`	Numeric scalar specifying the size of added text.
`text_colour`	String specifying the colour of the added text.
`label_format`	Character vector of length 2 containing format strings to use for the axis labels. The first string expects a string containing the result type (e.g., `"PCA"`) and an integer containing the component number, while the second string shows the rounded percentage of variance explained and is only relevant when this information is provided in `object`.
`other_fields`	Additional cell-based fields to include in the data.frame, see `?"scater-plot-args"` for details.
`text_color`	Alias to `text_colour`.
`color_by`	Alias to `colour_by`.
`swap_rownames`	Column name of `rowData(object)` to be used to identify features instead of `rownames(object)` when labelling plot elements.
`point.padding`, `force`	See `?ggrepel::geom_text_repel`.
`rasterise`	Logical scalar specifying whether to rasterise the points in the plot with `rasterise` from the ggrastr package. To control the resolution, set `options(ggrastr.default.dpi)`, for example `options(ggrastr.default.dpi=300)`.
`scattermore`	Logical, whether to use the `scattermore` package to greatly speed up plotting a large number of cells. Use `point_size = 0` for the most performance gain.
`bins`	Number of bins (which can differ between x and y) used to bin the points and summarize their values, to avoid overplotting. If `NULL` (default), the points are plotted without binning. Only used when both x and y are numeric.
`summary_fun`	Function used to summarize the feature value of each point (e.g. gene expression of each cell) when the points are binned; defaults to `sum`. Can be either the name of the function or the function itself.
`hex`	Logical, whether to use `geom_hex`.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see the `assay.type` argument in `?retrieveCellInfo`.
`min.value`, `max.value`	Minimum and maximum values, beyond which `colour_by` values (if numeric) are truncated. Can be set to a numeric value to prevent outlying values from skewing the colour scale, or set to quantiles of the `colour_by` variable by using a string such as `"q10"` for the 10th percentile.
`...`	Additional arguments for visualization, see `?"scater-plot-args"` for details.
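The `bins`/`summary_fun` and `min.value`/`max.value` behaviours described above can be illustrated in base R. This is a minimal sketch on made-up data; `truncate_values()` is a hypothetical helper written for illustration, and the exact binning that scater performs through ggplot2 may differ.

```r
set.seed(42)
# Made-up coordinates and a per-point value (e.g., expression of one gene).
x <- runif(1000)
y <- runif(1000)
value <- rpois(1000, lambda = 2)

# Bin x into 10 intervals and y into 5 (the number of bins may differ per axis).
bins <- c(10, 5)
bin_x <- cut(x, breaks = bins[1], labels = FALSE)
bin_y <- cut(y, breaks = bins[2], labels = FALSE)

# summary_fun may be a function name or a function; match.fun() accepts both.
summary_fun <- match.fun("sum")
binned <- aggregate(value, by = list(bin_x = bin_x, bin_y = bin_y),
                    FUN = summary_fun)
# Point density per bin is the same aggregation with FUN = length.

# Hypothetical helper: truncate numeric colour values at fixed limits or at
# quantiles given as strings such as "q10" (the 10th percentile).
truncate_values <- function(v, min.value = NULL, max.value = NULL) {
    resolve <- function(lim) {
        if (is.character(lim)) {
            quantile(v, as.numeric(sub("^q", "", lim)) / 100, names = FALSE)
        } else {
            lim
        }
    }
    # Resolve both limits on the original values before truncating.
    lo <- if (is.null(min.value)) -Inf else resolve(min.value)
    hi <- if (is.null(max.value)) Inf else resolve(max.value)
    pmin(pmax(v, lo), hi)
}

clipped <- truncate_values(value, min.value = "q10", max.value = "q90")
```

Both limits are resolved before any truncation so that the upper quantile is not computed on already-clipped values.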

Details

If ncomponents is a scalar equal to 2, a scatterplot of the first two dimensions is produced. If ncomponents is greater than 2, a pairs plot of the first ncomponents dimensions is produced.

Alternatively, if ncomponents is a vector of length 2, a scatterplot of the two specified dimensions is produced. If it is of length greater than 2, a pairs plot is produced containing all pairwise plots between the specified dimensions.
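The rules above for choosing what to draw can be sketched as a small base R function. `resolve_components()` is a hypothetical helper for illustration only, not part of scater, and it encodes only the cases described in the two paragraphs above (for instance, it makes no claim about a scalar `ncomponents` of 1).

```r
# Hypothetical helper (not exported by scater): given 'ncomponents', return
# the dimensions to show and whether a scatterplot or a pairs plot is drawn.
resolve_components <- function(ncomponents) {
    if (length(ncomponents) == 1L) {
        # A scalar means "the first ncomponents dimensions".
        dims <- seq_len(ncomponents)
    } else {
        # A vector lists the dimensions explicitly.
        dims <- as.integer(ncomponents)
    }
    # Exactly two dimensions give a scatterplot; more give a pairs plot.
    type <- if (length(dims) == 2L) "scatterplot" else "pairs"
    list(dims = dims, type = type)
}

resolve_components(2)          # dims 1, 2: scatterplot
resolve_components(5)          # dims 1 to 5: pairs plot
resolve_components(c(1, 3))    # dims 1, 3: scatterplot
resolve_components(c(2, 4, 5)) # dims 2, 4, 5: pairs plot
```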

The text_by option will add factor levels as labels onto the plot, placed at the median coordinate across all points in that level. This is useful for annotating position-related metadata (e.g., clusters) when there are too many levels to distinguish by colour. It is only available for scatterplots.
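The label placement described above amounts to one median per level and per axis. A minimal base R sketch, assuming made-up coordinates stored in hypothetical columns `X`, `Y` and `cluster`:

```r
# Made-up 2-D coordinates for two clusters of 50 cells each.
set.seed(1)
coords <- data.frame(
    X = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
    Y = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
    cluster = rep(c("A", "B"), each = 50)
)

# One label position per level: the median X and median Y of its points.
label_pos <- aggregate(cbind(X, Y) ~ cluster, data = coords, FUN = median)
label_pos
```

With ggplot2, such positions could then be drawn with `geom_text()` on top of the scatterplot.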

Value

A ggplot object

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runPCA(example_sce, ncomponents=5)
plotReducedDim(example_sce, "PCA")
plotReducedDim(example_sce, "PCA", colour_by="Cell_Cycle")
plotReducedDim(example_sce, "PCA", colour_by="Gene_0001")

plotReducedDim(example_sce, "PCA", ncomponents=5)
plotReducedDim(example_sce, "PCA", ncomponents=5, colour_by="Cell_Cycle",
    shape_by="Treatment")

# Use scattermore
plotPCA(example_sce, ncomponents = 4, scattermore = TRUE, point_size = 3)

# Bin to show point density
plotPCA(example_sce, bins = 10)
# Bin to summarize values (default is sum)
plotPCA(example_sce, bins = 10, colour_by = "Gene_0001")

Plot relative log expression

Description

Produce a relative log expression (RLE) plot of one or more transformations of cell expression values.

Usage

plotRLE(
  object,
  exprs_values = "logcounts",
  exprs_logged = TRUE,
  style = "minimal",
  legend = TRUE,
  ordering = NULL,
  colour_by = color_by,
  by_exprs_values = exprs_values,
  BPPARAM = BiocParallel::bpparam(),
  color_by = NULL,
  assay.type = exprs_values,
  by.assay.type = by_exprs_values,
  assay_logged = exprs_logged,
  ...
)

Arguments

`object`	A SingleCellExperiment object.
`exprs_values`	Alias to `assay.type`.
`exprs_logged`	A logical scalar indicating whether the expression matrix is already log-transformed. If `FALSE`, values are log2-transformed after adding a pseudo-count of 1 before plotting.
`style`	String defining the boxplot style to use, either `"minimal"` (default) or `"full"`; see Details.
`legend`	Logical scalar specifying whether a legend should be shown.
`ordering`	A vector specifying the ordering of cells in the RLE plot. This can be useful for arranging cells by experimental conditions or batches.
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values.
`by_exprs_values`	Alias to `by.assay.type`.
`BPPARAM`	A BiocParallelParam object to be used to parallelise operations using `DelayedArray`.
`color_by`	Alias to `colour_by`.
`assay.type`	A string or integer scalar specifying the expression matrix in `object` to use.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see the `assay.type` argument in `?retrieveCellInfo`.
`assay_logged`	Alias to `exprs_logged`.
`...`	Further arguments passed to `geom_boxplot` when `style="full"`.

Details

Relative log expression (RLE) plots are a powerful tool for visualising unwanted variation in high dimensional data. These plots were originally devised for gene expression data from microarrays but can also be used on single-cell expression data. RLE plots are particularly useful for assessing whether a procedure aimed at removing unwanted variation (e.g., scaling normalisation) has been successful.

If style is “full”, the usual ggplot2 boxplot is created for each cell. Here, the box shows the inter-quartile range and whiskers extend no more than 1.5 times the IQR from the hinge (the 25th or 75th percentile). Data beyond the whiskers are called outliers and are plotted individually. The median (50th percentile) is shown with a white bar. This approach is detailed and flexible, but can take a long time to plot for large datasets.

If style is “minimal”, a Tufte-style boxplot is created for each cell. Here, the median is shown with a circle, the IQR in a grey line, and “whiskers” (as defined above) for the plots are shown with coloured lines. No outliers are shown for this plot style. This approach is more succinct and faster for large numbers of cells.
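The quantities behind an RLE plot can be computed directly. The sketch below, on a made-up count matrix, subtracts each feature's median log-expression across cells and then summarises each cell's distribution as in the "minimal" style. It uses base R's `boxplot.stats()`, which relies on Tukey's hinges and may differ slightly from the quantiles ggplot2 uses; it is an illustration, not scater's code.

```r
set.seed(123)
# Made-up counts: 100 features (rows) x 20 cells (columns).
counts <- matrix(rpois(2000, lambda = 10), nrow = 100, ncol = 20)

# The exprs_logged = FALSE case: log2-transform with a pseudo-count of 1.
logexprs <- log2(counts + 1)

# Relative log expression: subtract each feature's median across cells.
# The vector of row medians is recycled down each column.
rle_mat <- logexprs - apply(logexprs, 1, median)

# Per-cell summaries for a "minimal"-style plot: whisker ends (at most
# 1.5 x IQR beyond the hinges), hinges and median.
cell_stats <- apply(rle_mat, 2, function(v) boxplot.stats(v)$stats)
rownames(cell_stats) <- c("lower_whisker", "lower_hinge", "median",
                          "upper_hinge", "upper_whisker")
cell_stats[, 1:3]
```

After successful normalisation, the per-cell medians should sit close to zero; cells whose boxes are shifted away from zero point to remaining unwanted variation.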

Value

A ggplot object

Author(s)

Davis McCarthy, with modifications by Aaron Lun

References

Gandolfo LC, Speed TP (2017). RLE plots: visualising unwanted variation in high dimensional data. arXiv.

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

plotRLE(example_sce, colour_by = "Mutation_Status", style = "minimal")

plotRLE(example_sce, colour_by = "Mutation_Status", style = "full",
       outlier.alpha = 0.1, outlier.shape = 3, outlier.size = 0)

Plot row metadata

Description

Plot row-level (i.e., gene) metadata from a SingleCellExperiment object.

Usage

plotRowData(
  object,
  y,
  x = NULL,
  colour_by = color_by,
  shape_by = NULL,
  size_by = NULL,
  by_exprs_values = "logcounts",
  other_fields = list(),
  color_by = NULL,
  by.assay.type = by_exprs_values,
  ...
)

Arguments

`object`	A SingleCellExperiment object containing expression values and experimental information.
`y`	String specifying the row-level metadata field to show on the y-axis. Alternatively, an AsIs vector or data.frame, see `?retrieveFeatureInfo`.
`x`	String specifying the row-level metadata field to show on the x-axis. Alternatively, an AsIs vector or data.frame, see `?retrieveFeatureInfo`. If `NULL`, nothing is shown on the x-axis.
`colour_by`	Specification of a row metadata field or a cell to colour by, see `?retrieveFeatureInfo` for possible values.
`shape_by`	Specification of a row metadata field or a cell to shape by, see `?retrieveFeatureInfo` for possible values.
`size_by`	Specification of a row metadata field or a cell to size by, see `?retrieveFeatureInfo` for possible values.
`by_exprs_values`	Alias to `by.assay.type`.
`other_fields`	Additional feature-based fields to include in the data.frame, see `?"scater-plot-args"` for details.
`color_by`	Alias to `colour_by`.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see `?retrieveFeatureInfo` for details.
`...`	Additional arguments for visualization, see `?"scater-plot-args"` for details.

Value

A ggplot object.

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
rowData(example_sce) <- cbind(rowData(example_sce), 
    perFeatureQCMetrics(example_sce))

plotRowData(example_sce, y="detected", x="mean") +
    scale_x_log10()

Plot an overview of expression for each cell

Description

Plot the relative proportion of the library size that is accounted for by the most highly expressed features for each cell in a SingleCellExperiment object.

Usage

plotScater(
  x,
  nfeatures = 500,
  exprs_values = "counts",
  colour_by = color_by,
  by_exprs_values = exprs_values,
  block1 = NULL,
  block2 = NULL,
  ncol = 3,
  line_width = 1.5,
  theme_size = 10,
  color_by = NULL,
  assay.type = exprs_values,
  by.assay.type = by_exprs_values
)

Arguments

`x`	A SingleCellExperiment object.
`nfeatures`	Numeric scalar indicating the number of top-expressed features to show in the plot.
`exprs_values`	Alias to `assay.type`.
`colour_by`	Specification of a column metadata field or a feature to colour by, see the `by` argument in `?retrieveCellInfo` for possible values. The curve for each cell will be coloured according to this specification.
`by_exprs_values`	Alias to `by.assay.type`.
`block1`	String specifying the column-level metadata field by which to separate the cells into separate panels in the plot. Alternatively, an AsIs vector or data.frame, see `?retrieveCellInfo`. Default is `NULL`, in which case there is no blocking.
`block2`	Same as `block1`, providing another level of blocking.
`ncol`	Number of columns to use for `facet_wrap` if only one block is defined.
`line_width`	Numeric scalar specifying the line width.
`theme_size`	Numeric scalar specifying the font size to use for the plotting theme.
`color_by`	Alias to `colour_by`.
`assay.type`	String or integer scalar indicating which assay of `x` should be used to obtain the expression values for this plot.
`by.assay.type`	A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see the `assay.type` argument in `?retrieveCellInfo`.

Details

For each cell, the features are ordered from most-expressed to least-expressed. The cumulative proportion of the total expression for the cell is computed across the top nfeatures features. These plots can flag cells with a very high proportion of the library coming from a small number of features; such cells are likely to be problematic for downstream analyses.
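The per-cell curve described above can be computed in a few lines of base R. A minimal sketch on a made-up count matrix, intended to show the calculation rather than reproduce scater's plotting code:

```r
set.seed(7)
# Made-up counts: 1000 features (rows) x 10 cells (columns).
counts <- matrix(rpois(1000 * 10, lambda = 5), nrow = 1000, ncol = 10)
nfeatures <- 500

# For each cell: sort features from most to least expressed, then compute
# the cumulative proportion of the cell's total counts over the top features.
top_cumprop <- apply(counts, 2, function(cell) {
    sorted <- sort(cell, decreasing = TRUE)
    cumsum(sorted)[seq_len(min(nfeatures, length(cell)))] / sum(cell)
})
# Column j is the curve for cell j: x = feature rank, y = cumulative proportion.
dim(top_cumprop)
```

A cell whose curve rises steeply over the first few ranks has most of its library in a handful of features, which is the pattern the plot is meant to flag.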

Using the colour and blocking arguments can flag overall differences in cells under different experimental conditions or affected by different batch and other variables. If only one of block1 and block2 is specified, each panel corresponds to a separate level of the specified blocking factor. If both are specified, each panel corresponds to a combination of levels.

Value

A ggplot object.

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
plotScater(example_sce)
plotScater(example_sce, assay.type = "counts", colour_by = "Cell_Cycle")
plotScater(example_sce, block1 = "Treatment", colour_by = "Cell_Cycle")

Project cells into an arbitrary dimensionality reduction space.

Description

Projects observations into an arbitrary dimensionality reduction space (e.g., t-SNE, UMAP) using a tricube-weighted average of the k nearest neighbours.
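The projection described above can be sketched in base R. `project_tricube()` is a hypothetical helper, not scater's implementation: it uses brute-force Euclidean distances, and it assumes that the tricube bandwidth is the distance to the k-th nearest neighbour (so that neighbour receives zero weight). scater's exact choice of bandwidth and neighbour search may differ.

```r
# Hypothetical helper, not scater's implementation. 'old.points' and
# 'new.points' hold coordinates in the neighbour-search space (e.g., PCs);
# 'old.embedding' holds the existing embedding (e.g., UMAP) of 'old.points'.
project_tricube <- function(old.points, new.points, old.embedding, k = 5) {
    stopifnot(ncol(old.points) == ncol(new.points),
              nrow(old.points) == nrow(old.embedding))
    k <- min(k, nrow(old.points))
    out <- matrix(NA_real_, nrow = nrow(new.points), ncol = ncol(old.embedding))
    for (i in seq_len(nrow(new.points))) {
        # Euclidean distances from new point i to every old point.
        d <- sqrt(colSums((t(old.points) - new.points[i, ])^2))
        nn <- order(d)[seq_len(k)]
        dk <- d[nn]
        # Assumed bandwidth: distance to the k-th nearest neighbour.
        bandwidth <- max(dk)
        w <- if (bandwidth > 0) (1 - (dk / bandwidth)^3)^3 else rep(1, k)
        if (sum(w) == 0) w <- rep(1, k)  # e.g., k = 1 at a non-zero distance
        # Tricube-weighted average of the neighbours' embedding coordinates.
        out[i, ] <- colSums(old.embedding[nn, , drop = FALSE] * w) / sum(w)
    }
    out
}

set.seed(1)
old_pcs <- matrix(rnorm(200), ncol = 2)                 # 100 reference cells
old_umap <- cbind(old_pcs[, 1] * 10, old_pcs[, 2] - 3)  # toy "embedding"
new_pcs <- old_pcs[1:5, ] + 0.01                        # 5 new nearby cells
projected <- project_tricube(old_pcs, new_pcs, old_umap, k = 5)
```

Because each projected point is a weighted average of its neighbours' embedding coordinates, it always lies within the region spanned by those neighbours.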

Usage

projectReducedDim(x, ...)

## S4 method for signature 'matrix'
projectReducedDim(x, old.embedding, ...)

## S4 method for signature 'SummarizedExperiment'
projectReducedDim(
  x,
  old.sce,
  dimred.embed = "TSNE",
  dimred.knn = "PCA",
  dimred.name = dimred.embed,
  k = 5
)

Arguments

`x`	A numeric matrix of a dimensionality reduction containing the cells that should be projected into the existing embedding defined in either `old.embedding` or `old.sce`. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix.
`...`	Passed to methods.
`old.embedding`	For the matrix method, the existing dimensionality reduction embedding (e.g., t-SNE or UMAP coordinates) into which points should be projected.
`old.sce`	For the SummarizedExperiment method, the object containing the original cells, whose `reducedDims` must include both `dimred.embed` and `dimred.knn`. For the matrix method, the original points are instead supplied directly as matrices (see Examples).
`dimred.embed`	The name of the target dimensionality reduction in `old.sce` that points should be embedded into.
`dimred.knn`	The name of the dimensionality reduction used to identify the k nearest neighbours of each cell in `x` among the cells of `old.sce`; a result with this name must be present in both objects.
`dimred.name`	The name of the dimensionality reduction that the projected embedding will be saved as, for the SummarizedExperiment method.
`k`	The number of nearest neighbours to use to project points into the embedding.

Value

When x is a matrix, a matrix is returned. When x is a SummarizedExperiment (or SingleCellExperiment), the return value is of the same class as the input, but the projected dimensionality reduction is added as a reducedDim field.

Examples

example_sce <- mockSCE() 
example_sce <- logNormCounts(example_sce)
example_sce <- runUMAP(example_sce)
example_sce <- runPCA(example_sce)

example_sce_new <- mockSCE() 
example_sce_new <- logNormCounts(example_sce_new)
example_sce_new <- runPCA(example_sce_new)

## sce method
projectReducedDim(
    example_sce_new,
    old.sce = example_sce,
    dimred.embed="UMAP",
    dimred.knn="PCA"
)

## matrix method
projectReducedDim(
    reducedDim(example_sce, "PCA"),
    new.points = reducedDim(example_sce_new, "PCA"),
    old.embedding = reducedDim(example_sce, "UMAP")
)

Plot specific reduced dimensions

Description

Wrapper functions to create plots for specific types of reduced dimension results in a SingleCellExperiment object.

Usage

plotPCASCE(object, ..., ncomponents = 2, dimred = "PCA")

plotTSNE(object, ..., ncomponents = 2, dimred = "TSNE")

plotUMAP(object, ..., ncomponents = 2, dimred = "UMAP")

plotDiffusionMap(object, ..., ncomponents = 2, dimred = "DiffusionMap")

plotMDS(object, ..., ncomponents = 2, dimred = "MDS")

plotNMF(object, ..., ncomponents = 2, dimred = "NMF")

## S4 method for signature 'SingleCellExperiment'
plotPCA(object, ..., ncomponents = 2, dimred = "PCA")

Arguments

`object`	A SingleCellExperiment object.
`...`	Additional arguments to pass to `plotReducedDim`.
`ncomponents`	Numeric scalar indicating the number of dimensions to plot. This can also be a numeric vector, see `?plotReducedDim` for details.
`dimred`	A string or integer scalar indicating the reduced dimension result in `reducedDims(object)` to plot.

Details

Each function is a convenient wrapper around plotReducedDim that searches the reducedDims slot for an appropriately named dimensionality reduction result:

"PCA" for plotPCA (and plotPCASCE)
"TSNE" for plotTSNE
"UMAP" for plotUMAP
"DiffusionMap" for plotDiffusionMap
"MDS" for plotMDS
"NMF" for plotNMF

Their only purpose is to streamline workflows by removing the need to specify the dimred argument.

Value

A ggplot object.

Author(s)

Davis McCarthy, with modifications by Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
example_sce <- runPCA(example_sce)

## Examples plotting PC1 and PC2
plotPCA(example_sce)
plotPCA(example_sce, colour_by = "Cell_Cycle")
plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment")

## Examples plotting more than 2 PCs
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
    shape_by = "Mutation_Status")

## Same for TSNE:
example_sce <- runTSNE(example_sce)
plotTSNE(example_sce, colour_by="Mutation_Status")

## Not run: 
## Same for DiffusionMaps:
example_sce <- runDiffusionMap(example_sce)
plotDiffusionMap(example_sce)

## End(Not run)

## Same for MDS plots:
example_sce <- runMDS(example_sce)
plotMDS(example_sce)

Cell-based data retrieval

Description

Retrieves a per-cell (meta)data field from a SingleCellExperiment based on a single keyword, typically for use in visualization functions.

Usage

retrieveCellInfo(
  x,
  by,
  search = c("colData", "assays", "altExps"),
  exprs_values = "logcounts",
  swap_rownames = NULL,
  assay.type = exprs_values
)

Arguments

`x`	A SingleCellExperiment object.
`by`	A string specifying the field to extract (see Details). Alternatively, a data.frame, DataFrame or an AsIs vector.
`search`	Character vector specifying the types of data or metadata to use.
`exprs_values`	Alias to `assay.type`.
`swap_rownames`	Column name of `rowData(x)` to be used to identify features instead of `rownames(x)`.
`assay.type`	String or integer scalar specifying the assay from which expression values should be extracted.

Details

Given an AsIs-wrapped vector in by, this function will directly return the vector values as value, while name is set to an empty string. For data.frame or DataFrame instances with a single column, this function will return the vector from that column as value and the column name as name. This allows downstream visualization functions to accommodate arbitrary inputs for adjusting aesthetics.

Given a character string in by, this function will:

Search colData for a column named by, and return the corresponding field as the output value. We do not consider nested elements within the colData.
Search assay(x, assay.type) for a row named by, and return the expression vector for this feature as the output value.
Search each alternative experiment in altExps(x) for a row named by, and return the expression vector for this feature at assay.type as the output value.

Any match will cause the function to return without considering later possibilities. The search can be modified by changing the presence and ordering of elements in search.

If there is a name clash that results in retrieval of an unintended field, users should explicitly set by to a data.frame, DataFrame or AsIs-wrapped vector containing the desired values. Developers can also consider setting search to control the fields that are returned.
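The first-match search order, and the way an AsIs or data.frame input sidesteps it, can be modelled in base R. `lookup_cell_field()` is a hypothetical, simplified stand-in that works on a plain named list rather than a SingleCellExperiment; it is not scater's code and ignores details such as swap_rownames.

```r
# Hypothetical, simplified model of the keyword search (not scater's code).
# 'sources' is a named list whose elements are either data.frames (fields in
# columns, one row per cell) or matrices (features in rows, cells in columns).
lookup_cell_field <- function(by, sources, search = names(sources)) {
    # AsIs input is returned directly, with an empty name.
    if (inherits(by, "AsIs")) return(list(name = "", value = unclass(by)))
    # A single-column data.frame supplies both the name and the values.
    if (is.data.frame(by)) return(list(name = colnames(by)[1], value = by[[1]]))
    # Otherwise search the sources in order; the first match wins.
    for (src in search) {
        s <- sources[[src]]
        if (is.data.frame(s) && by %in% colnames(s)) {
            return(list(name = by, value = s[[by]]))
        }
        if (is.matrix(s) && by %in% rownames(s)) {
            return(list(name = by, value = s[by, ]))
        }
    }
    stop("cannot find '", by, "' in any of the searched sources")
}

# A name clash: "Gene_1" is both a colData column and an assay row.
coldata <- data.frame(Cluster = c("a", "b", "a"), Gene_1 = c(0.1, 0.2, 0.3))
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("Gene_1", "Gene_2"), c("c1", "c2", "c3")))
sources <- list(colData = coldata, assays = expr)

lookup_cell_field("Gene_1", sources)$value                 # 0.1 0.2 0.3
lookup_cell_field("Gene_1", sources,
                  search = c("assays", "colData"))$value   # c1=1 c2=3 c3=5
lookup_cell_field(I(c(7, 8, 9)), sources)$value            # 7 8 9
```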

Value

A list containing name, a string with the name of the extracted field (usually identical to by); and value, a vector of length equal to ncol(x) containing per-cell (meta)data values. If by=NULL, both name and value are set to NULL.

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

retrieveCellInfo(example_sce, "Cell_Cycle")
retrieveCellInfo(example_sce, "Gene_0001")

arbitrary.field <- rnorm(ncol(example_sce))
retrieveCellInfo(example_sce, I(arbitrary.field))
retrieveCellInfo(example_sce, data.frame(stuff=arbitrary.field))

Feature-based data retrieval

Description

Retrieves a per-feature (meta)data field from a SingleCellExperiment based on a single keyword, typically for use in visualization functions.

Usage

retrieveFeatureInfo(
  x,
  by,
  search = c("rowData", "assays"),
  exprs_values = "logcounts",
  assay.type = exprs_values
)

Arguments

`x`	A SingleCellExperiment object.
`by`	A string specifying the field to extract (see Details). Alternatively, a data.frame, DataFrame or an AsIs vector.
`search`	Character vector specifying the types of data or metadata to use.
`exprs_values`	Alias to `assay.type`.
`assay.type`	String or integer scalar specifying the assay from which expression values should be extracted.

Details

Given an AsIs-wrapped vector in by, this function will directly return the vector values as value, while name is set to an empty string. For data.frame or DataFrame instances with a single column, this function will return the vector from that column as value and the column name as name. This allows downstream visualization functions to accommodate arbitrary inputs for adjusting aesthetics.

Given a character string in by, this function will:

Search rowData for a column named by, and return the corresponding field as the output value. We do not consider nested elements within the rowData.
Search assay(x, assay.type) for a column named by, and return the expression vector for this cell (i.e., across all features) as the output value.

Any match will cause the function to return without considering later possibilities. The search can be modified by changing the presence and ordering of elements in search.

Value

A list containing name, a string with the name of the extracted field (usually identical to by); and value, a vector of length equal to nrow(x) containing per-feature (meta)data values. If by=NULL, both name and value are set to NULL.

Author(s)

Aaron Lun

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
rowData(example_sce)$blah <- sample(LETTERS,
    nrow(example_sce), replace=TRUE)

str(retrieveFeatureInfo(example_sce, "blah"))
str(retrieveFeatureInfo(example_sce, "Cell_001"))

arbitrary.field <- rnorm(nrow(example_sce))
str(retrieveFeatureInfo(example_sce, I(arbitrary.field)))
str(retrieveFeatureInfo(example_sce, data.frame(stuff=arbitrary.field)))

Perform PCA on column metadata

Description

Perform a principal components analysis (PCA) on cells, based on the column metadata in a SingleCellExperiment object.

Usage

runColDataPCA(
  x,
  ncomponents = 2,
  variables = NULL,
  scale = TRUE,
  outliers = FALSE,
  BSPARAM = ExactParam(),
  BPPARAM = SerialParam(),
  name = "PCA_coldata"
)

Arguments

`x`	A SingleCellExperiment object.
`ncomponents`	Numeric scalar indicating the number of principal components to obtain.
`variables`	List of strings or a character vector indicating which variables in `colData(x)` to use. If a list, each entry can also be an AsIs vector or a data.frame, as described in `?retrieveCellInfo`.
`scale`	Logical scalar, should the variables be standardised so that each has unit variance? This will also remove variables with standard deviations below 1e-8.
`outliers`	Logical indicating whether outliers should be detected based on PCA coordinates.
`BSPARAM`	A BiocSingularParam object specifying which algorithm should be used to perform the PCA.
`BPPARAM`	A BiocParallelParam object specifying whether the PCA should be parallelized.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

This function performs PCA on variables from the column-level metadata instead of the gene expression matrix. This can occasionally be useful when other forms of experimental data are stored in the colData, e.g., protein intensities from FACS or other cell-specific phenotypic information.

This function is particularly useful for identifying low-quality cells based on QC metrics with outliers=TRUE. This uses an “outlyingness” measure computed by adjOutlyingness in the robustbase package. Outliers are defined as those cells with outlyingness values more than 5 MADs above the median, using isOutlier.
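The two steps described above, PCA on standardised per-cell variables and a 5-MAD threshold on an outlyingness score, can be sketched in base R with prcomp(). To keep the sketch dependency-free, it swaps robustbase::adjOutlyingness() for a much simpler score (each cell's distance from the centre in standardised space); the data are made up, and this is an illustration rather than runColDataPCA itself.

```r
set.seed(99)
# Made-up per-cell QC metrics for 100 cells.
qc <- data.frame(
    sum = rlnorm(100, meanlog = 8),
    detected = rlnorm(100, meanlog = 7),
    mito_percent = runif(100, min = 0, max = 10)
)

# scale = TRUE: drop near-constant variables, then standardise the rest.
qc <- qc[, vapply(qc, sd, numeric(1)) > 1e-8, drop = FALSE]
pc <- prcomp(qc, center = TRUE, scale. = TRUE)
coords <- pc$x[, 1:2]                           # ncomponents = 2
percentVar <- pc$sdev^2 / sum(pc$sdev^2) * 100  # % of total variance per PC

# The 5-MAD rule on a per-cell outlyingness score. scater computes the
# score with robustbase::adjOutlyingness(); purely for illustration we
# substitute the distance from the centre in standardised space.
score <- sqrt(rowSums(scale(qc)^2))
is_outlier <- score > median(score) + 5 * mad(score)
sum(is_outlier)
```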

Value

A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell. By default, these are stored in the "PCA_coldata" entry of the reducedDims slot. The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute.

If outliers=TRUE, the output colData will also contain a logical outlier field. This specifies the cells that correspond to the identified outliers.

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

example_sce <- mockSCE()
qc.df <- perCellQCMetrics(example_sce, subset=list(Mito=1:10))
colData(example_sce) <- cbind(colData(example_sce), qc.df)

# Can supply names of colData variables to 'variables',
# as well as AsIs-wrapped vectors of interest.
example_sce <- runColDataPCA(example_sce, variables=list(
    "sum", "detected", "subsets_Mito_percent", "altexps_Spikes_percent" 
))
reducedDimNames(example_sce)
head(reducedDim(example_sce))

Multi-modal UMAP

Description

Perform UMAP with multiple input matrices by intersecting their simplicial sets. Typically used to combine results from multiple data modalities into a single embedding.

Usage

calculateMultiUMAP(x, ...)

## S4 method for signature 'ANY'
calculateMultiUMAP(x, ..., metric = "euclidean")

## S4 method for signature 'SummarizedExperiment'
calculateMultiUMAP(
  x,
  exprs_values,
  metric = "euclidean",
  assay.type = exprs_values,
  ...
)

## S4 method for signature 'SingleCellExperiment'
calculateMultiUMAP(
  x,
  exprs_values,
  dimred,
  altexp,
  altexp_exprs_values = "logcounts",
  assay.type = exprs_values,
  altexp.assay.type = altexp_exprs_values,
  ...
)

runMultiUMAP(x, ..., name = "MultiUMAP")

Arguments

`x`	For `calculateMultiUMAP`, a list of numeric matrices where each row is a cell and each column is some dimension/variable. For gene expression data, this is usually the matrix of PC coordinates. Alternatively, a SummarizedExperiment containing relevant matrices in its assays. Alternatively, a SingleCellExperiment containing relevant matrices in its assays, `reducedDims` or `altExps`. This is also the only permissible argument for `runMultiUMAP`.
`...`	For the generic, further arguments to pass to specific methods. For the ANY method, further arguments to pass to `umap`. For the SummarizedExperiment and SingleCellExperiment methods, and for `runMultiUMAP`, further arguments to pass to the ANY method.
`metric`	Character vector specifying the type of distance to use for each matrix in `x`. This is recycled to the same number of matrices supplied in `x`.
`exprs_values`	Alias to `assay.type`.
`assay.type`	A character or integer vector of assays to extract and transpose for use in the UMAP. For the SingleCellExperiment, this argument can be missing, in which case no assays are used.
`dimred`	A character or integer vector of `reducedDims` to extract for use in the UMAP. This argument can be missing, in which case no reduced dimensions are used.
`altexp`	A character or integer vector of `altExps` to extract and transpose for use in the UMAP. This argument can be missing, in which case no alternative experiments are used.
`altexp_exprs_values`	Alias to `altexp.assay.type`.
`altexp.assay.type`	A character or integer vector specifying the assay to extract from alternative experiments, when `altexp` is specified. This is recycled to the same length as `altexp`.
`name`	String specifying the name of the `reducedDims` in which to store the UMAP.

Details

These functions serve as convenience wrappers around umap for multi-modal analysis. The idea is that each input matrix in x corresponds to data for a different mode. A typical example would consist of the PC coordinates generated from gene expression counts, plus the log-abundance matrix for ADT counts from CITE-seq experiments; one might also include matrices of transformed intensities from indexed FACS, to name some more possibilities.

Roughly speaking, the idea is to identify nearest neighbors within each mode to construct the simplicial sets. Integration of multiple modes is performed by intersecting the sets to obtain a single graph, which is used in the rest of the UMAP algorithm. By performing an intersection, we focus on relationships between cells that are consistently neighboring across all the modes, thus providing greater resolution of differences at any mode. The neighbor search within each mode also avoids difficulties with quantitative comparisons of distances between modes.

The most obvious use of this function is to generate a low-dimensional embedding for visualization. However, users can also set n_components to a higher value (e.g., 10-20) to retain more information for downstream steps like clustering. Do, however, remember to set the seed appropriately, as UMAP is stochastic.
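The following sketch computes a 10-dimensional embedding for clustering. It assumes that n_components is forwarded through `...` to the underlying umap call; the reducedDims name "MultiUMAP10" is hypothetical, and kmeans (from the stats package) is only a stand-in for whatever clustering method is preferred.

```r
library(scater)

set.seed(100)
exprs_sce <- runPCA(logNormCounts(mockSCE()))
altExp(exprs_sce, "ADT") <- logNormCounts(mockSCE(ngenes=20))

# UMAP is stochastic, so fix the seed before computing the embedding.
set.seed(42)
exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT",
    n_components=10, name="MultiUMAP10")

# Cluster on the 10-dimensional embedding instead of plotting it.
emb <- reducedDim(exprs_sce, "MultiUMAP10")
clusters <- kmeans(emb, centers=5)$cluster
table(clusters)
```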

By default, all modes use the distance metric specified in metric to construct the simplicial sets within each mode. However, it is possible to vary this by supplying a vector of metrics, e.g., "euclidean" for the first matrix, "manhattan" for the second. For the SingleCellExperiment method, matrices are extracted in the order of assays, reduced dimensions and alternative experiments, so any variation in metrics is also assumed to follow this order.
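A sketch of per-mode metrics, assuming metric is passed through runMultiUMAP to calculateMultiUMAP. Because only dimred and altexp are supplied here, the PCA matrix is extracted first and the ADT matrix second.

```r
library(scater)

set.seed(100)
exprs_sce <- runPCA(logNormCounts(mockSCE()))
altExp(exprs_sce, "ADT") <- logNormCounts(mockSCE(ngenes=20))

# Matrices are extracted as assays, then reducedDims, then altExps, so
# "euclidean" applies to the PCs and "manhattan" to the ADT log-counts.
exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT",
    metric=c("euclidean", "manhattan"))
head(reducedDim(exprs_sce, "MultiUMAP"))
```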

Value

For calculateMultiUMAP, a numeric matrix containing the low-dimensional UMAP embedding.

For runMultiUMAP, x is returned with the embedding stored in its reducedDims under name (by default, "MultiUMAP").
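To contrast the two return values, the sketch below calls both functions on the same mocked object. The reducedDims name "CITE_UMAP" is a hypothetical choice.

```r
library(scater)

set.seed(100)
exprs_sce <- runPCA(logNormCounts(mockSCE()))
altExp(exprs_sce, "ADT") <- logNormCounts(mockSCE(ngenes=20))

# calculateMultiUMAP() returns the embedding as a plain matrix...
emb <- calculateMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT")

# ...while runMultiUMAP() stores it in reducedDims under 'name'.
exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT",
    name="CITE_UMAP")
reducedDimNames(exprs_sce)
```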

Author(s)

Aaron Lun

Examples

# Mocking up a gene expression + ADT dataset:
exprs_sce <- mockSCE()
exprs_sce <- logNormCounts(exprs_sce)
exprs_sce <- runPCA(exprs_sce)

adt_sce <- mockSCE(ngenes=20) 
adt_sce <- logNormCounts(adt_sce)
altExp(exprs_sce, "ADT") <- adt_sce

# Running a multimodal analysis using PCs for expression
# and log-counts for the ADTs:
exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT")
plotReducedDim(exprs_sce, "MultiUMAP")

The scater package

Description

Provides functions for convenient visualization of single-cell data, mostly via ggplot2. It also used to provide utilities for data transformation and quality control, but these have largely been moved to the scuttle package.

Author(s)

Davis McCarthy, Aaron Lun

General visualization parameters

Description

scater functions that plot points share a number of visualization parameters, which are described on this page.

Aesthetic parameters

add_legend:: Logical scalar, specifying whether a legend should be shown. Defaults to TRUE.
theme_size:: Integer scalar, specifying the font size. Defaults to 10.
point_alpha:: Numeric scalar in [0, 1], specifying the transparency. Defaults to 0.6.
point_size:: Numeric scalar, specifying the size of the points. Defaults to NULL.
point_shape:: An integer, or a string specifying the shape of the points. For details, see vignette("ggplot2-specs"). Defaults to 19.
jitter_type:: String specifying how points are jittered in a violin plot: either with random jitter on the x-axis ("jitter") or in a “beeswarm” style ("swarm", the default). The latter usually looks more attractive, but the jitter option may work better for datasets with a large number of cells or for dense plots.
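A sketch of the aesthetic parameters, assuming they are accepted through the `...` of plotReducedDim and plotColData. addPerCellQCMetrics (from scuttle) is used only to create a numeric "sum" column to plot.

```r
library(scater)

set.seed(100)
sce <- runPCA(logNormCounts(mockSCE()))

# Smaller, more transparent triangles with a larger font and no legend.
p1 <- plotReducedDim(sce, "PCA", colour_by="Treatment",
    point_size=1, point_alpha=0.3, point_shape=17,
    theme_size=14, add_legend=FALSE)

# Random x-axis jitter instead of the default beeswarm layout.
sce <- addPerCellQCMetrics(sce)
p2 <- plotColData(sce, y="sum", x="Treatment", jitter_type="jitter")
```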

Distributional calculations

show_median:: Logical, should the median of the distribution be shown for violin plots? Defaults to FALSE.
show_violin:: Logical, should the outline of a violin plot be shown? Defaults to TRUE.
show_smooth:: Logical, should a smoother be fitted to a scatter plot? Defaults to FALSE.
show_se:: Logical, should standard errors for the fitted line be shown on a scatter plot when show_smooth=TRUE? Defaults to TRUE.
show_boxplot:: Logical, should a box plot be shown? Defaults to FALSE.
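A sketch of the distributional options on violin and scatter plots, assuming they are passed through the `...` of plotColData. The "detected" and "sum" columns are produced by addPerCellQCMetrics.

```r
library(scater)

set.seed(100)
sce <- addPerCellQCMetrics(logNormCounts(mockSCE()))

# Violin plot with the median and a box plot overlaid.
p1 <- plotColData(sce, y="detected", x="Treatment",
    show_median=TRUE, show_boxplot=TRUE)

# Scatter plot with a fitted smoother but no standard-error band.
p2 <- plotColData(sce, y="detected", x="sum",
    show_smooth=TRUE, show_se=FALSE)
```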

Miscellaneous fields

Additional fields can be added to the data.frame passed to ggplot by setting the other_fields argument. This allows users to easily incorporate additional metadata for use in further ggplot operations.

The other_fields argument should be a character vector where each string is passed to retrieveCellInfo (for cell-based plots) or retrieveFeatureInfo (for feature-based plots). Alternatively, other_fields can be a named list where each element is of any type accepted by retrieveCellInfo or retrieveFeatureInfo. This includes AsIs-wrapped vectors, data.frames or DataFrames.

Each additional column of the output data.frame will be named according to the name returned by retrieveCellInfo or retrieveFeatureInfo. If these clash with inbuilt names (e.g., X, Y, colour_by), a warning will be raised and the additional column will not be added to avoid overwriting an existing column.
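For example, a column of colData can be carried into the plotting data.frame and then used for faceting. This sketch uses the "Treatment" and "Cell_Cycle" columns that mockSCE generates.

```r
library(scater)
library(ggplot2)

set.seed(100)
sce <- runPCA(logNormCounts(mockSCE()))

# 'Treatment' is added as an extra column of the plotting data.frame,
# so it can be used for faceting after the plot is created.
p <- plotReducedDim(sce, "PCA", colour_by="Cell_Cycle",
    other_fields="Treatment")
p <- p + facet_wrap(~Treatment)
```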

The "Single Cell Expression Set" (SCESet) class

Description

An S4 class, and the main class used by scater to hold single-cell expression data. SCESet extends the basic Bioconductor ExpressionSet class.

Details

This class is initialized from a matrix of expression values.

Methods that operate on SCESet objects constitute the basic scater workflow.

Slots

logExprsOffset:: Scalar of class "numeric", providing an offset applied to expression data in the 'exprs' slot when undergoing log2-transformation to avoid trying to take logs of zero.
lowerDetectionLimit:: Scalar of class "numeric", giving the lower limit for an expression value to be classified as "expressed".
cellPairwiseDistances:: Matrix of class "numeric", containing pairwise distances between cells.
featurePairwiseDistances:: Matrix of class "numeric", containing pairwise distances between features.
reducedDimension:: Matrix of class "numeric", containing reduced-dimension coordinates for cells (generated, for example, by PCA).
bootstraps:: Array of class "numeric" that can contain bootstrap estimates of the expression or count values.
sc3:: List containing results from consensus clustering from the SC3 package.
featureControlInfo:: Data frame of class "AnnotatedDataFrame" that can contain information/metadata about sets of control features defined for the SCESet object.

References

Thanks to the Monocle package (github.com/cole-trapnell-lab/monocle-release/) for their CellDataSet class, which provided the inspiration and template for SCESet.

Convert an SCESet object to a SingleCellExperiment object

Description

Convert an SCESet object produced with an older version of the package to a SingleCellExperiment object compatible with the current version.

Usage

updateSCESet(object)

toSingleCellExperiment(object)

Arguments

`object`	An SCESet object to be updated.

Value

a SingleCellExperiment object

Examples

## Not run: 
updateSCESet(example_sceset)

## End(Not run)
## Not run: 
toSingleCellExperiment(example_sceset)

## End(Not run)

Defunct functions

Accessor and replacement for bootstrap results in a `SingleCellExperiment` object