Title: | Single-Cell Analysis Toolkit for Gene Expression Data in R |
---|---|
Description: | A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization. |
Authors: | Davis McCarthy [aut], Kieran Campbell [aut], Aaron Lun [aut, ctb], Quin Wills [aut], Vladimir Kiselev [ctb], Felix G.M. Ernst [ctb], Alan O'Callaghan [ctb, cre], Yun Peng [ctb], Leo Lahti [ctb] , Tuomas Borman [ctb] |
Maintainer: | Alan O'Callaghan <[email protected]> |
License: | GPL-3 |
Version: | 1.35.0 |
Built: | 2024-12-30 04:24:00 UTC |
Source: | https://github.com/bioc/scater |
Use the biomaRt package to add feature annotation information to an SingleCellExperiment
.
annotateBMFeatures( ids, biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl", id.type = "ensembl_gene_id", symbol.type, attributes = c(id.type, symbol.type, "chromosome_name", "gene_biotype", "start_position", "end_position"), filters = id.type, ... ) getBMFeatureAnnos(x, ids = rownames(x), ...)
annotateBMFeatures( ids, biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl", id.type = "ensembl_gene_id", symbol.type, attributes = c(id.type, symbol.type, "chromosome_name", "gene_biotype", "start_position", "end_position"), filters = id.type, ... ) getBMFeatureAnnos(x, ids = rownames(x), ...)
ids |
A character vector containing feature identifiers. |
biomart |
String defining the biomaRt to be used, to be passed to |
dataset |
String defining the dataset to use, to be passed to |
id.type |
String specifying the type of identifier in |
symbol.type |
String specifying the type of symbol to retrieve.
If missing, this is set to |
attributes |
Character vector defining the attributes to pass to |
filters |
String defining the type of identifier in |
... |
For For |
x |
A SingleCellExperiment object. |
These functions provide convenient wrappers around biomaRt to quickly obtain annotation in the required format.
For annotateBMFeatures
, a DataFrame containing feature annotation, with one row per value in ids
.
For getBMFeatureAnnos
, x
is returned containing the output of annotateBMFeatures
appended to its rowData
.
Aaron Lun, based on code by Davis McCarthy
## Not run: # Making up Ensembl IDs for demonstration purposes. mock_id <- paste0("ENSMUSG", sprintf("%011d", seq_len(1000))) anno <- annotateBMFeatures(ids=mock_id) ## End(Not run)
## Not run: # Making up Ensembl IDs for demonstration purposes. mock_id <- paste0("ENSMUSG", sprintf("%011d", seq_len(1000))) anno <- annotateBMFeatures(ids=mock_id) ## End(Not run)
SingleCellExperiment
objectSingleCellExperiment
objects can contain bootstrap
expression values (for example, as generated by the kallisto software for
quantifying feature abundance). These functions conveniently access and
replace the 'bootstrap' elements in the assays
slot with the value
supplied, which must be an matrix of the correct size, namely the same
number of rows and columns as the SingleCellExperiment
object as a
whole.
bootstraps(object) bootstraps(object) <- value ## S4 method for signature 'SingleCellExperiment' bootstraps(object) ## S4 replacement method for signature 'SingleCellExperiment,array' bootstraps(object) <- value
bootstraps(object) bootstraps(object) <- value ## S4 method for signature 'SingleCellExperiment' bootstraps(object) ## S4 replacement method for signature 'SingleCellExperiment,array' bootstraps(object) <- value
object |
a |
value |
an array of class |
If accessing bootstraps slot of an SingleCellExperiment
, then
an array with the bootstrap values, otherwise an SingleCellExperiment
object containing new bootstrap values.
Davis McCarthy
example_sce <- mockSCE() bootstraps(example_sce)
example_sce <- mockSCE() bootstraps(example_sce)
Perform multi-dimensional scaling (MDS) on cells, based on the data in a SingleCellExperiment object.
calculateMDS(x, ...) ## S4 method for signature 'ANY' calculateMDS( x, FUN = dist, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, keep_dist = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' calculateMDS(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateMDS( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runMDS(x, ..., altexp = NULL, name = "MDS")
calculateMDS(x, ...) ## S4 method for signature 'ANY' calculateMDS( x, FUN = dist, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, keep_dist = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' calculateMDS(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateMDS( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runMDS(x, ..., altexp = NULL, name = "MDS")
x |
For For |
... |
For the For |
FUN |
A function that accepts a numeric matrix as its first argument,
where rows are samples and columns are features; and returns a distance
structure such as that returned by |
ncomponents |
Numeric scalar indicating the number of MDS?g dimensions to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is |
keep_dist |
Logical scalar indicating whether the |
exprs_values |
Alias to |
assay.type |
Integer scalar or string indicating which assay of |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the |
The function cmdscale
is used internally to compute the MDS
components with eig = TRUE
.
The eig
and GOF
fields of the object returned by
cmdscale
are stored as attributes “eig” and “GOF”
of the MDS matrix calculated.
For calculateMDS
, a matrix is returned containing the MDS coordinates
for each cell (row) and dimension (column).
For runMDS
, a modified x
is returned that contains the MDS
coordinates in reducedDim(x, name)
.
This section is relevant if x
is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x
is a SingleCellExperiment and dimred=NULL
.
In the latter, the expression values are obtained from the assay specified by assay.type
.
The subset_row
argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL
, the ntop
features with the largest variances are used instead.
We literally compute the variances from the expression values without considering any mean-variance trend,
so often a more considered choice of genes is possible, e.g., with scran functions.
Note that the value of ntop
is ignored if subset_row
is specified.
If scale=TRUE
, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
If x
is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x
by setting the dimred
argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred)
.
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred
is also specified, only the first n_dimred
columns are used.
Alternatively, n_dimred
can be an integer vector specifying the column indices of the dimensions to use.
When dimred
is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop
, subset_row
and scale
are ignored.
If x
is a numeric matrix, setting transposed=TRUE
will treat the rows as cells and the columns as the variables/diemnsions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop
, subset_row
and scale
are ignored.
This section is relevant if x
is a SingleCellExperiment and altexp
is not NULL
.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x
.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp)
, e.g., antibody tags.
Setting altexp
with assay.type
will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred
will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred
.
Note that the output is still stored in the reducedDims
of the output SingleCellExperiment.
It is advisable to use a different name
to distinguish this output from the results generated from the main experiment's assay values.
Aaron Lun, based on code by Davis McCarthy
cmdscale
, to perform the underlying calculations.
dist
for the function used as default to calculate the
dist
object.
plotMDS
, to quickly visualize the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runMDS(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runMDS(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
Perform non-negative matrix factorization (NMF) for the cells, based on the data in a SingleCellExperiment object.
calculateNMF(x, ...) ## S4 method for signature 'ANY' calculateNMF( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, seed = 1, ... ) ## S4 method for signature 'SummarizedExperiment' calculateNMF(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateNMF( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runNMF(x, ..., altexp = NULL, name = "NMF")
calculateNMF(x, ...) ## S4 method for signature 'ANY' calculateNMF( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, seed = 1, ... ) ## S4 method for signature 'SummarizedExperiment' calculateNMF(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateNMF( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runNMF(x, ..., altexp = NULL, name = "NMF")
x |
For For |
... |
For the For |
ncomponents |
Numeric scalar indicating the number of NMF dimensions to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is |
seed |
Random number generation seed to be passed to |
exprs_values |
Alias to |
assay.type |
Integer scalar or string indicating which assay of |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the |
The function nmf
is used internally to compute the NMF.
Note that the algorithm is not deterministic, so different runs of the function will produce differing results.
Users are advised to test multiple random seeds, and then use set.seed
to set a random seed for replicable results.
For calculateNMF
, a numeric matrix is returned containing the NMF coordinates for each cell (row) and dimension (column).
For runNMF
, a modified x
is returned that contains the NMF coordinates in reducedDim(x, name)
.
In both cases, the matrix will have the attribute "basis"
containing the gene-by-factor basis matrix.
This section is relevant if x
is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x
is a SingleCellExperiment and dimred=NULL
.
In the latter, the expression values are obtained from the assay specified by assay.type
.
The subset_row
argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL
, the ntop
features with the largest variances are used instead.
We literally compute the variances from the expression values without considering any mean-variance trend,
so often a more considered choice of genes is possible, e.g., with scran functions.
Note that the value of ntop
is ignored if subset_row
is specified.
If scale=TRUE
, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
If x
is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x
by setting the dimred
argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred)
.
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred
is also specified, only the first n_dimred
columns are used.
Alternatively, n_dimred
can be an integer vector specifying the column indices of the dimensions to use.
When dimred
is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop
, subset_row
and scale
are ignored.
If x
is a numeric matrix, setting transposed=TRUE
will treat the rows as cells and the columns as the variables/diemnsions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop
, subset_row
and scale
are ignored.
This section is relevant if x
is a SingleCellExperiment and altexp
is not NULL
.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x
.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp)
, e.g., antibody tags.
Setting altexp
with assay.type
will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred
will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred
.
Note that the output is still stored in the reducedDims
of the output SingleCellExperiment.
It is advisable to use a different name
to distinguish this output from the results generated from the main experiment's assay values.
Aaron Lun
nmf
, for the underlying calculations.
plotNMF
, to quickly visualize the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runNMF(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runNMF(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
Perform a principal components analysis (PCA) on cells, based on the expression data in a SingleCellExperiment object.
calculatePCA(x, ...) ## S4 method for signature 'ANY' calculatePCA( x, ncomponents = 50, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, BSPARAM = bsparam(), BPPARAM = SerialParam() ) ## S4 method for signature 'SummarizedExperiment' calculatePCA(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculatePCA( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) ## S4 method for signature 'SingleCellExperiment' runPCA(x, ..., altexp = NULL, name = "PCA")
calculatePCA(x, ...) ## S4 method for signature 'ANY' calculatePCA( x, ncomponents = 50, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, BSPARAM = bsparam(), BPPARAM = SerialParam() ) ## S4 method for signature 'SummarizedExperiment' calculatePCA(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculatePCA( x, ..., exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) ## S4 method for signature 'SingleCellExperiment' runPCA(x, ..., altexp = NULL, name = "PCA")
x |
For For |
... |
For the For |
ncomponents |
Numeric scalar indicating the number of principal components to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is |
BSPARAM |
A BiocSingularParam object specifying which algorithm should be used to perform the PCA. |
BPPARAM |
A BiocParallelParam object specifying whether the PCA should be parallelized. |
exprs_values |
Alias to |
assay.type |
Integer scalar or string indicating which assay of |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the |
Fast approximate SVD algorithms like BSPARAM=IrlbaParam()
or RandomParam()
use a random initialization, after which they converge towards the exact PCs.
This means that the result will change slightly across different runs.
For full reproducibility, users should call set.seed
prior to running runPCA
with such algorithms.
(Note that this includes BSPARAM=bsparam()
, which uses approximate algorithms by default.)
For calculatePCA
, a numeric matrix of coordinates for each cell (row) in each of ncomponents
PCs (column).
For runPCA
, a SingleCellExperiment object is returned containing this matrix in reducedDims(..., name)
.
In both cases, the attributes of the PC coordinate matrix contain the following elements:
"percentVar"
, the percentage of variance explained by each PC.
This may not sum to 100 if not all PCs are reported.
"varExplained"
, the actual variance explained by each PC.
"rotation"
, the rotation matrix containing loadings for all genes used in the analysis and for each PC.
This section is relevant if x
is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x
is a SingleCellExperiment and dimred=NULL
.
In the latter, the expression values are obtained from the assay specified by assay.type
.
The subset_row
argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL
, the ntop
features with the largest variances are used instead.
We literally compute the variances from the expression values without considering any mean-variance trend,
so often a more considered choice of genes is possible, e.g., with scran functions.
Note that the value of ntop
is ignored if subset_row
is specified.
If scale=TRUE
, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
If x
is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x
by setting the dimred
argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred)
.
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred
is also specified, only the first n_dimred
columns are used.
Alternatively, n_dimred
can be an integer vector specifying the column indices of the dimensions to use.
When dimred
is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop
, subset_row
and scale
are ignored.
If x
is a numeric matrix, setting transposed=TRUE
will treat the rows as cells and the columns as the variables/diemnsions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop
, subset_row
and scale
are ignored.
This section is relevant if x
is a SingleCellExperiment and altexp
is not NULL
.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x
.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp)
, e.g., antibody tags.
Setting altexp
with assay.type
will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred
will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred
.
Note that the output is still stored in the reducedDims
of the output SingleCellExperiment.
It is advisable to use a different name
to distinguish this output from the results generated from the main experiment's assay values.
Aaron Lun, based on code by Davis McCarthy
runPCA
, for the underlying calculations.
plotPCA
, to conveniently visualize the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
Perform t-stochastic neighbour embedding (t-SNE) for the cells, based on the data in a SingleCellExperiment object.
calculateTSNE(x, ...) ## S4 method for signature 'ANY' calculateTSNE( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, perplexity = NULL, normalize = TRUE, theta = 0.5, num_threads = NULL, ..., external_neighbors = FALSE, BNPARAM = KmknnParam(), BPPARAM = SerialParam(), use_fitsne = FALSE, use_densvis = FALSE, dens_frac = 0.3, dens_lambda = 0.1 ) ## S4 method for signature 'SummarizedExperiment' calculateTSNE(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateTSNE( x, ..., pca = is.null(dimred), exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runTSNE(x, ..., altexp = NULL, name = "TSNE")
calculateTSNE(x, ...) ## S4 method for signature 'ANY' calculateTSNE( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, perplexity = NULL, normalize = TRUE, theta = 0.5, num_threads = NULL, ..., external_neighbors = FALSE, BNPARAM = KmknnParam(), BPPARAM = SerialParam(), use_fitsne = FALSE, use_densvis = FALSE, dens_frac = 0.3, dens_lambda = 0.1 ) ## S4 method for signature 'SummarizedExperiment' calculateTSNE(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateTSNE( x, ..., pca = is.null(dimred), exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runTSNE(x, ..., altexp = NULL, name = "TSNE")
x |
For For |
... |
For the For |
ncomponents |
Numeric scalar indicating the number of t-SNE dimensions to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is |
perplexity |
Numeric scalar defining the perplexity parameter, see |
normalize |
Logical scalar indicating if input values should be scaled for numerical precision, see |
theta |
Numeric scalar specifying the approximation accuracy of the Barnes-Hut algorithm, see |
num_threads |
Integer scalar specifying the number of threads to use in |
external_neighbors |
Logical scalar indicating whether a nearest neighbors search should be computed externally with |
BNPARAM |
A BiocNeighborParam object specifying the neighbor search algorithm to use when |
BPPARAM |
A BiocParallelParam object specifying how the neighbor search should be parallelized when |
use_fitsne |
Logical scalar indicating whether |
use_densvis |
Logical scalar indicating whether |
dens_frac , dens_lambda
|
See |
exprs_values |
Alias to |
assay.type |
Integer scalar or string indicating which assay of |
pca |
Logical scalar indicating whether a PCA step should be performed inside |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the |
The function Rtsne
is used internally to compute the t-SNE.
Note that the algorithm is not deterministic, so different runs of the function will produce differing results.
Users are advised to test multiple random seeds, and then use set.seed
to set a random seed for replicable results.
The value of the perplexity
parameter can have a large effect on the results.
By default, the function will set a “reasonable” perplexity that scales with the number of cells in x
.
(Specifically, it is the number of cells divided by 5, capped at a maximum of 50.)
However, it is often worthwhile to manually try multiple values to ensure that the conclusions are robust.
If external_neighbors=TRUE
, the nearest neighbor search step will use a different algorithm to that in the Rtsne
function.
This can be parallelized or approximate to achieve greater speed for large data sets.
The neighbor search results are then used for t-SNE via the Rtsne_neighbors
function.
If dimred
is specified, the PCA step of the Rtsne
function is automatically turned off by default.
This presumes that the existing dimensionality reduction is sufficient such that an additional PCA is not required.
For calculateTSNE
, a numeric matrix is returned containing the t-SNE coordinates for each cell (row) and dimension (column).
For runTSNE
, a modified x
is returned that contains the t-SNE coordinates in reducedDim(x, name)
.
This section is relevant if x
is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x
is a SingleCellExperiment and dimred=NULL
.
In the latter, the expression values are obtained from the assay specified by assay.type
.
The subset_row
argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL
, the ntop
features with the largest variances are used instead.
We literally compute the variances from the expression values without considering any mean-variance trend,
so often a more considered choice of genes is possible, e.g., with scran functions.
Note that the value of ntop
is ignored if subset_row
is specified.
If scale=TRUE
, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
If x
is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x
by setting the dimred
argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred)
.
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred
is also specified, only the first n_dimred
columns are used.
Alternatively, n_dimred
can be an integer vector specifying the column indices of the dimensions to use.
When dimred
is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop
, subset_row
and scale
are ignored.
If x
is a numeric matrix, setting transposed=TRUE
will treat the rows as cells and the columns as the variables/diemnsions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop
, subset_row
and scale
are ignored.
This section is relevant if x
is a SingleCellExperiment and altexp
is not NULL
.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x
.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp)
, e.g., antibody tags.
Setting altexp
with assay.type
will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred
will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred
.
Note that the output is still stored in the reducedDims
of the output SingleCellExperiment.
It is advisable to use a different name
to distinguish this output from the results generated from the main experiment's assay values.
Aaron Lun, based on code by Davis McCarthy
van der Maaten LJP, Hinton GE (2008). Visualizing High-Dimensional Data Using t-SNE. J. Mach. Learn. Res. 9, 2579-2605.
Rtsne
, for the underlying calculations.
plotTSNE
, to quickly visualize the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runTSNE(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runTSNE(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
Perform uniform manifold approximation and projection (UMAP) for the cells, based on the data in a SingleCellExperiment object.
calculateUMAP(x, ...) ## S4 method for signature 'ANY' calculateUMAP( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, pca = if (transposed) NULL else 50, n_neighbors = 15, n_threads = NULL, ..., external_neighbors = FALSE, BNPARAM = KmknnParam(), BPPARAM = SerialParam(), use_densvis = FALSE, dens_frac = 0.3, dens_lambda = 0.1 ) ## S4 method for signature 'SummarizedExperiment' calculateUMAP(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateUMAP( x, ..., pca = if (!is.null(dimred)) NULL else 50, exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runUMAP(x, ..., altexp = NULL, name = "UMAP")
calculateUMAP(x, ...) ## S4 method for signature 'ANY' calculateUMAP( x, ncomponents = 2, ntop = 500, subset_row = NULL, scale = FALSE, transposed = FALSE, pca = if (transposed) NULL else 50, n_neighbors = 15, n_threads = NULL, ..., external_neighbors = FALSE, BNPARAM = KmknnParam(), BPPARAM = SerialParam(), use_densvis = FALSE, dens_frac = 0.3, dens_lambda = 0.1 ) ## S4 method for signature 'SummarizedExperiment' calculateUMAP(x, ..., exprs_values = "logcounts", assay.type = exprs_values) ## S4 method for signature 'SingleCellExperiment' calculateUMAP( x, ..., pca = if (!is.null(dimred)) NULL else 50, exprs_values = "logcounts", dimred = NULL, n_dimred = NULL, assay.type = exprs_values ) runUMAP(x, ..., altexp = NULL, name = "UMAP")
x |
For For |
... |
For the For |
ncomponents |
Numeric scalar indicating the number of UMAP dimensions to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is |
pca |
Integer scalar specifying how many PCs should be used as input into the UMAP algorithm. By default, no PCA is performed if the input is a dimensionality reduction result. |
n_neighbors |
Integer scalar, number of nearest neighbors to identify when constructing the initial graph. |
n_threads |
Integer scalar specifying the number of threads to use in |
external_neighbors |
Logical scalar indicating whether a nearest neighbors search should be computed externally with |
BNPARAM |
A BiocNeighborParam object specifying the neighbor search algorithm to use when |
BPPARAM |
A BiocParallelParam object specifying whether the PCA should be parallelized. |
use_densvis |
Logical scalar indicating whether |
dens_frac , dens_lambda
|
See |
exprs_values |
Alias to |
assay.type |
Integer scalar or string indicating which assay of |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the |
The function umap
is used internally to compute the UMAP.
Note that the algorithm is not deterministic, so different runs of the function will produce differing results.
Users are advised to test multiple random seeds, and then use set.seed
to set a random seed for replicable results.
If external_neighbors=TRUE
, the nearest neighbor search is conducted using a different algorithm to that in the umap
function.
This can be parallelized or approximate to achieve greater speed for large data sets.
The neighbor search results are then used directly to create the UMAP embedding.
For calculateUMAP
, a matrix is returned containing the UMAP coordinates for each cell (row) and dimension (column).
For runUMAP
, a modified x
is returned that contains the UMAP coordinates in reducedDim(x, name)
.
This section is relevant if x
is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x
is a SingleCellExperiment and dimred=NULL
.
In the latter, the expression values are obtained from the assay specified by assay.type
.
The subset_row
argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL
, the ntop
features with the largest variances are used instead.
We literally compute the variances from the expression values without considering any mean-variance trend,
so often a more considered choice of genes is possible, e.g., with scran functions.
Note that the value of ntop
is ignored if subset_row
is specified.
If scale=TRUE
, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
If x
is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x
by setting the dimred
argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred)
.
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred
is also specified, only the first n_dimred
columns are used.
Alternatively, n_dimred
can be an integer vector specifying the column indices of the dimensions to use.
When dimred
is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop
, subset_row
and scale
are ignored.
If x
is a numeric matrix, setting transposed=TRUE
will treat the rows as cells and the columns as the variables/diemnsions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop
, subset_row
and scale
are ignored.
This section is relevant if x
is a SingleCellExperiment and altexp
is not NULL
.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x
.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp)
, e.g., antibody tags.
Setting altexp
with assay.type
will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred
will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred
.
Note that the output is still stored in the reducedDims
of the output SingleCellExperiment.
It is advisable to use a different name
to distinguish this output from the results generated from the main experiment's assay values.
Aaron Lun
McInnes L, Healy J, Melville J (2018). UMAP: uniform manifold approximation and projection for dimension reduction. arXiv.
umap
, for the underlying calculations.
plotUMAP
, to quickly visualize the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runUMAP(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runUMAP(example_sce) reducedDimNames(example_sce) head(reducedDim(example_sce))
Functions that have passed on to the function afterlife. Their successors are also listed.
calculateQCMetrics(...) ## S4 method for signature 'SingleCellExperiment' normalize(object, ...) centreSizeFactors(...) calculateDiffusionMap(x, ...) ## S4 method for signature 'ANY' calculateDiffusionMap(x, ...) runDiffusionMap(...) multiplot(...)
calculateQCMetrics(...) ## S4 method for signature 'SingleCellExperiment' normalize(object, ...) centreSizeFactors(...) calculateDiffusionMap(x, ...) ## S4 method for signature 'ANY' calculateDiffusionMap(x, ...) runDiffusionMap(...) multiplot(...)
object , x , ...
|
Ignored arguments. |
calculateQCMetrics
is succeeded by perCellQCMetrics
and perFeatureQCMetrics
.
normalize
is succeeded by logNormCounts
.
centreSizeFactors
has no replacement - the SingleCellExperiment is removing support for multiple size factors, so this function is now trivial.
runDiffusionMap
and calculateDiffusionMap
have no replacement.
destiny is no longer on Bioconductor. You can calculate a diffusion map
yourself, and add it to a reducedDim
field, if you so wish.
All functions error out with a defunct message pointing towards its descendent (if available).
Aaron Lun
try(calculateQCMetrics())
try(calculateQCMetrics())
Compute, for each principal component, the percentage of variance that is explained by one or more variables of interest.
getExplanatoryPCs(x, dimred = "PCA", n_dimred = 10, ...)
getExplanatoryPCs(x, dimred = "PCA", n_dimred = 10, ...)
x |
A SingleCellExperiment object containing dimensionality reduction results. |
dimred |
String or integer scalar specifying the field in |
n_dimred |
Integer scalar specifying the number of the top principal components to use. |
... |
Additional arguments passed to |
This function computes the percentage of variance in PC scores that is explained by variables in the sample-level metadata. It allows identification of important PCs that are driven by known experimental conditions, e.g., treatment, disease. PCs correlated with technical factors (e.g., batch effects, library size) can also be detected and removed prior to further analysis.
By default, the function will attempt to use pre-computed PCA results in object
.
This is done by taking the top n_dimred
PCs from the matrix specified by dimred
.
If these are not available or if rerun=TRUE
, the function will rerun the PCA using runPCA
;
however, this mode is deprecated and users are advised to explicitly call runPCA
themselves.
A matrix containing the percentage of variance explained by each factor (column) and for each PC (row).
Aaron Lun
plotExplanatoryPCs
, to plot the results.
getVarianceExplained
, to compute the variance explained.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) r2mat <- getExplanatoryPCs(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) r2mat <- getExplanatoryPCs(example_sce)
Compute, for each gene, the percentage of variance that is explained by one or more variables of interest.
getVarianceExplained(x, ...) ## S4 method for signature 'ANY' getVarianceExplained(x, variables, subset_row = NULL, BPPARAM = SerialParam()) ## S4 method for signature 'SummarizedExperiment' getVarianceExplained( x, variables = NULL, ..., exprs_values = "logcounts", assay.type = exprs_values )
getVarianceExplained(x, ...) ## S4 method for signature 'ANY' getVarianceExplained(x, variables, subset_row = NULL, BPPARAM = SerialParam()) ## S4 method for signature 'SummarizedExperiment' getVarianceExplained( x, variables = NULL, ..., exprs_values = "logcounts", assay.type = exprs_values )
x |
A numeric matrix of expression values, usually log-transformed and normalized. Alternatively, a SummarizedExperiment containing such a matrix. |
... |
For the generic, arguments to be passed to specific methods. For the SummarizedExperiment method, arguments to be passed to the ANY method. |
variables |
A DataFrame or data.frame containing one or more variables of interest.
This should have number of rows equal to the number of columns in For the SummarizedExperiment method, this can also be a character vector specifying column names of |
subset_row |
A vector specifying the subset of rows of |
BPPARAM |
A BiocParallelParam object specifying whether the calculations should be parallelized. |
exprs_values |
Alias for |
assay.type |
String or integer scalar specifying the expression values for which to compute the variance (also an alias |
This function computes the percentage of variance in gene expression that is explained by variables in the sample-level metadata. It allows problematic factors to be quickly identified, as well as the genes that are most affected.
A numeric matrix containing the percentage of variance explained by each factor (column) and for each gene (row).
Aaron Lun
getExplanatoryPCs
, which calls this function.
plotExplanatoryVariables
, to plot the results.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) r2mat <- getVarianceExplained(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) r2mat <- getVarianceExplained(example_sce)
Create a base ggplot object from a SingleCellExperiment, the contents of which can be directly referenced in subsequent layers without prior specification.
ggcells( x, mapping = aes(), features = NULL, exprs_values = "logcounts", use_dimred = TRUE, use_altexps = FALSE, prefix_altexps = FALSE, check_names = TRUE, extract_mapping = TRUE, assay.type = exprs_values, ... ) ggfeatures( x, mapping = aes(), cells = NULL, exprs_values = "logcounts", check_names = TRUE, extract_mapping = TRUE, assay.type = exprs_values, ... )
ggcells( x, mapping = aes(), features = NULL, exprs_values = "logcounts", use_dimred = TRUE, use_altexps = FALSE, prefix_altexps = FALSE, check_names = TRUE, extract_mapping = TRUE, assay.type = exprs_values, ... ) ggfeatures( x, mapping = aes(), cells = NULL, exprs_values = "logcounts", check_names = TRUE, extract_mapping = TRUE, assay.type = exprs_values, ... )
x |
A SingleCellExperiment object.
This is expected to have row names for |
mapping |
A list containing aesthetic mappings, usually the output of |
features |
Character vector specifying the features for which to extract expression profiles across cells.
May also include features in alternative Experiments if permitted by |
exprs_values , use_dimred , use_altexps , prefix_altexps , check_names
|
Soft-deprecated equivalents of the arguments described above. |
extract_mapping |
Logical scalar indicating whether |
assay.type |
String or integer scalar specifying the expression values for which to compute the variance (also an alias |
... |
Further arguments to pass to ggplot. |
cells |
Character vector specifying the features for which to extract expression profiles across cells. |
These functions generate a data.frame from the contents of a SingleCellExperiment and pass it to ggplot
.
Rows, columns or metadata fields in the x
can then be referenced in subsequent ggplot2 commands.
ggcells
treats cells as the data values so users can reference row names of x
(if provided in features
), column metadata variables and dimensionality reduction results.
They can also reference row names and metadata variables for alternative Experiments.
ggfeatures
treats features as the data values so users can reference column names of x
(if provided in cells
) and row metadata variables.
If mapping
is supplied, the function will automatically expand features
or cells
for any features or cells requested in the mapping.
This is convenient as features/cells do not have to specified twice (once in data.frame construction and again in later geom
or stat
layers).
Developers may wish to turn this off with extract_mapping=FALSE
for greater control.
A ggplot object containing the specified contents of x
.
Aaron Lun
makePerCellDF
and makePerFeatureDF
, for the construction of the data.frame.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) ggcells(example_sce, aes(x=PCA.1, y=PCA.2, colour=Gene_0001)) + geom_point() ggcells(example_sce, aes(x=Mutation_Status, y=Gene_0001)) + geom_violin() + facet_wrap(~Cell_Cycle) rowData(example_sce)$GC <- runif(nrow(example_sce)) ggfeatures(example_sce, aes(x=GC, y=Cell_001)) + geom_point() + stat_smooth()
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) ggcells(example_sce, aes(x=PCA.1, y=PCA.2, colour=Gene_0001)) + geom_point() ggcells(example_sce, aes(x=Mutation_Status, y=Gene_0001)) + geom_violin() + facet_wrap(~Cell_Cycle) rowData(example_sce)$GC <- runif(nrow(example_sce)) ggfeatures(example_sce, aes(x=GC, y=Cell_001)) + geom_point() + stat_smooth()
Counting the number of non-zero counts in each row (per feature) or column (per cell).
nexprs(x, ...) ## S4 method for signature 'ANY' nexprs( x, byrow = FALSE, detection_limit = 0, subset_row = NULL, subset_col = NULL, BPPARAM = SerialParam() ) ## S4 method for signature 'SummarizedExperiment' nexprs(x, ..., exprs_values = "counts", assay.type = exprs_values)
nexprs(x, ...) ## S4 method for signature 'ANY' nexprs( x, byrow = FALSE, detection_limit = 0, subset_row = NULL, subset_col = NULL, BPPARAM = SerialParam() ) ## S4 method for signature 'SummarizedExperiment' nexprs(x, ..., exprs_values = "counts", assay.type = exprs_values)
x |
A numeric matrix of counts where features are rows and cells are columns. Alternatively, a SummarizedExperiment containing such counts. |
... |
For the generic, further arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. |
byrow |
Logical scalar indicating whether to count the number of detected cells per feature.
If |
detection_limit |
Numeric scalar providing the value above which observations are deemed to be expressed. |
subset_row |
Logical, integer or character vector indicating which rows (i.e. features) to use. |
subset_col |
Logical, integer or character vector indicating which columns (i.e., cells) to use. |
BPPARAM |
A BiocParallelParam object specifying whether the calculations should be parallelized.
Only relevant when |
exprs_values |
Alias for |
assay.type |
String or integer specifying the assay of |
An integer vector containing counts per gene or cell, depending on the provided arguments.
Aaron Lun
numDetectedAcrossFeatures
and numDetectedAcrossCells
,
to do this calculation for each group of features or cells, respectively.
example_sce <- mockSCE() nexprs(example_sce)[1:10] nexprs(example_sce, byrow = TRUE)[1:10]
example_sce <- mockSCE() nexprs(example_sce)[1:10] nexprs(example_sce, byrow = TRUE)[1:10]
Convenience functions to access commonly-used assays of the
SingleCellExperiment
object.
norm_exprs(object) norm_exprs(object) <- value stand_exprs(object) stand_exprs(object) <- value fpkm(object) fpkm(object) <- value
norm_exprs(object) norm_exprs(object) <- value stand_exprs(object) stand_exprs(object) <- value fpkm(object) fpkm(object) <- value
object |
|
value |
a numeric matrix (e.g. for |
a matrix of normalised expression data
a matrix of standardised expressiond data
a matrix of FPKM values
A matrix of numeric, integer or logical values.
Davis McCarthy
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) head(logcounts(example_sce)[,1:10]) head(exprs(example_sce)[,1:10]) # identical to logcounts() norm_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1) stand_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1) tpm(example_sce) <- calculateTPM(example_sce, lengths = 5e4) cpm(example_sce) <- calculateCPM(example_sce) fpkm(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) head(logcounts(example_sce)[,1:10]) head(exprs(example_sce)[,1:10]) # identical to logcounts() norm_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1) stand_exprs(example_sce) <- log2(calculateCPM(example_sce) + 1) tpm(example_sce) <- calculateTPM(example_sce, lengths = 5e4) cpm(example_sce) <- calculateCPM(example_sce) fpkm(example_sce)
Plot column-level (i.e., cell) metadata in an SingleCellExperiment object.
plotColData( object, y, x = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = "logcounts", other_fields = list(), swap_rownames = NULL, color_by = NULL, point_fun = NULL, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
plotColData( object, y, x = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = "logcounts", other_fields = list(), swap_rownames = NULL, color_by = NULL, point_fun = NULL, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
object |
A SingleCellExperiment object containing expression values and experimental information. |
y |
String specifying the column-level metadata field to show on the
y-axis. Alternatively, an AsIs vector or data.frame, see
|
x |
String specifying the column-level metadata to show on the x-axis.
Alternatively, an AsIs vector or data.frame, see |
colour_by |
Specification of a column metadata field or a feature to colour by, see the |
shape_by |
Specification of a column metadata field or a feature to shape by, see the |
size_by |
Specification of a column metadata field or a feature to size by, see the |
order_by |
Specification of a column metadata field or a feature to order points by, see the |
by_exprs_values |
Alias for |
other_fields |
Additional cell-based fields to include in the data.frame, see |
swap_rownames |
Column name of |
color_by |
Alias to |
point_fun |
Function used to create a geom that shows individual cells.
Should take |
scattermore |
Logical, whether to use the |
bins |
Number of bins, can be different in x and y, to bin and summarize
the points and their values, to avoid overplotting. If |
summary_fun |
Function to summarize the feature value of each point
(e.g. gene expression of each cell) when the points binned, defaults to
|
hex |
Logical, whether to use |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see |
... |
Additional arguments for visualization, see
|
If y
is continuous and x=NULL
, a violin plot is
generated. If x
is categorical, a grouped violin plot will be
generated, with one violin for each level of x
. If x
is
continuous, a scatter plot will be generated.
If y
is categorical and x
is continuous, horizontal violin
plots will be generated. If x
is missing or categorical, rectangule
plots will be generated where the area of a rectangle is proportional to
the number of points for a combination of factors.
A ggplot object.
Arguments shape_by
and size_by
are ignored when
scattermore = TRUE
. Using scattermore
is only recommended for
very large datasets to speed up plotting. Small point size is also
recommended. For larger point size, the point shape may be distorted. Also,
when scattermore = TRUE
, the point_size
argument works
differently.
Davis McCarthy, with modifications by Aaron Lun
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce)) plotColData(example_sce, y = "detected", x = "sum", colour_by = "Mutation_Status") + scale_x_log10() plotColData(example_sce, y = "detected", x = "sum", colour_by = "Mutation_Status", size_by = "Gene_0001", shape_by = "Treatment") + scale_x_log10() plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status") + scale_y_log10() # flipped violin. plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status") # With scattermore plotColData(example_sce, x = "sum", y = "detected", scattermore = TRUE, point_size = 2) # Bin to show point density plotColData(example_sce, x = "sum", y = "detected", bins = 10) # Bin to summarize value (default is sum) plotColData(example_sce, x = "sum", y = "detected", bins = 10, colour_by = "total")
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce)) plotColData(example_sce, y = "detected", x = "sum", colour_by = "Mutation_Status") + scale_x_log10() plotColData(example_sce, y = "detected", x = "sum", colour_by = "Mutation_Status", size_by = "Gene_0001", shape_by = "Treatment") + scale_x_log10() plotColData(example_sce, y = "Treatment", x = "sum", colour_by = "Mutation_Status") + scale_y_log10() # flipped violin. plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Mutation_Status") # With scattermore plotColData(example_sce, x = "sum", y = "detected", scattermore = TRUE, point_size = 2) # Bin to show point density plotColData(example_sce, x = "sum", y = "detected", bins = 10) # Bin to summarize value (default is sum) plotColData(example_sce, x = "sum", y = "detected", bins = 10, colour_by = "total")
Create a dot plot of expression values for a grouping of cells, where the size and colour of each dot represents the proportion of detected expression values and the average expression, respectively, for each feature in each group of cells.
plotDots( object, features, group = NULL, block = NULL, exprs_values = "logcounts", detection_limit = 0, zlim = NULL, colour = color, color = NULL, max_detected = NULL, other_fields = list(), by_exprs_values = exprs_values, swap_rownames = NULL, center = FALSE, scale = FALSE, assay.type = exprs_values, by.assay.type = by_exprs_values )
plotDots( object, features, group = NULL, block = NULL, exprs_values = "logcounts", detection_limit = 0, zlim = NULL, colour = color, color = NULL, max_detected = NULL, other_fields = list(), by_exprs_values = exprs_values, swap_rownames = NULL, center = FALSE, scale = FALSE, assay.type = exprs_values, by.assay.type = by_exprs_values )
object |
A SingleCellExperiment object. |
features |
A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of |
group |
String specifying the field of |
block |
String specifying the field of |
exprs_values |
Alias to |
detection_limit |
Numeric scalar providing the value above which observations are deemed to be expressed. |
zlim |
A numeric vector of length 2, specifying the upper and lower bounds for colour mapping of expression values.
Values outside this range are set to the most extreme colour.
If |
colour |
A vector of colours specifying the palette to use for increasing expression.
This defaults to viridis if |
color |
Alias to |
max_detected |
Numeric value specifying the cap on the proportion of detected expression values. |
other_fields |
Additional feature-based fields to include in the data.frame, see |
by_exprs_values |
Alias for |
swap_rownames |
Column name of |
center |
A logical scalar indicating whether each feature should have its mean expression (specifically, the mean of averages across all groups) centered at zero prior to plotting. |
scale |
A logical scalar specifying whether each row should have its average expression values scaled to unit variance prior to plotting. |
assay.type |
A string or integer scalar indicating which assay of |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from, for entries of |
This implements a Seurat-style “dot plot” that creates a dot for each feature (row) in each group of cells (column).
The proportion of detected expression values and the average expression for each feature in each group of cells is visualized efficiently using the size and colour, respectively, of each dot.
If block
is specified, batch-corrected averages and proportions for each group are computed with correctGroupSummary
.
Some caution is required during interpretation due to the difficulty of simultaneously interpreting both size and colour. For example, if we coloured by z-score on a conventional blue-white-red colour axis, a gene that is downregulated in a group of cells would show up as a small blue dot. If the background colour was also white, this could be easily mistaken for a gene that is not downregulated at all. We suggest choosing a colour scale that remains distinguishable from the background colour at all points. Admittedly, that is easier said than done as many colour scales will approach a lighter colour at some stage, so some magnifying glasses may be required.
We can also cap the colour and size scales using zlim
and max_detected
, respectively.
This aims to preserve resolution for low-abundance genes by preventing domination of the scales by high-abundance features.
A ggplot object containing a dot plot.
Aaron Lun
plotExpression
and plotHeatmap
,
for alternatives to visualizing group-level expression values.
sce <- mockSCE() sce <- logNormCounts(sce) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle") plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", scale=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE, scale=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Treatment", block="Cell_Cycle")
sce <- mockSCE() sce <- logNormCounts(sce) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle") plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", scale=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Cell_Cycle", center=TRUE, scale=TRUE) plotDots(sce, features=rownames(sce)[1:10], group="Treatment", block="Cell_Cycle")
Plot the explanatory PCs for each variable
plotExplanatoryPCs( object, nvars_to_plot = 10, npcs_to_plot = 50, theme_size = 10, ... )
plotExplanatoryPCs( object, nvars_to_plot = 10, npcs_to_plot = 50, theme_size = 10, ... )
object |
A SingleCellExperiment object containing expression values and experimental information.
Alternatively, a matrix containing the output of |
nvars_to_plot |
Integer scalar specifying the number of variables with the greatest explanatory power to plot.
This can be set to |
npcs_to_plot |
Integer scalar specifying the number of PCs to plot. |
theme_size |
numeric scalar providing base font size for ggplot theme. |
... |
Parameters to be passed to |
A density plot is created for each variable, showing the R-squared for each successive PC (up to npcs_to_plot
PCs).
Only the nvars_to_plot
variables with the largest maximum R-squared across PCs are shown.
If object
is a SingleCellExperiment object, getExplanatoryPCs
will be called to compute the variance in expression explained by each variable in each gene.
Users may prefer to run getExplanatoryPCs
manually and pass the resulting matrix as object
, in which case the R-squared values are used directly.
A ggplot object.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) plotExplanatoryPCs(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) plotExplanatoryPCs(example_sce)
Plot explanatory variables ordered by percentage of variance explained
plotExplanatoryVariables( object, nvars_to_plot = 10, min_marginal_r2 = 0, theme_size = 10, ... )
plotExplanatoryVariables( object, nvars_to_plot = 10, min_marginal_r2 = 0, theme_size = 10, ... )
object |
A SingleCellExperiment object containing expression values and experimental information.
Alternatively, a matrix containing the output of |
nvars_to_plot |
Integer scalar specifying the number of variables with the greatest explanatory power to plot.
This can be set to |
min_marginal_r2 |
Numeric scalar specifying the minimal value required for median marginal R-squared for a variable to be plotted. Only variables with a median marginal R-squared strictly larger than this value will be plotted. |
theme_size |
Numeric scalar specifying the font size to use for the plotting theme |
... |
Parameters to be passed to |
A density plot is created for each variable, showing the distribution of R-squared across all genes.
Only the nvars_to_plot
variables with the largest median R-squared across genes are shown.
Variables are also only shown if they have median R-squared values above min_marginal_r2
.
If object
is a SingleCellExperiment object, getVarianceExplained
will be called to compute the variance in expression explained by each variable in each gene.
Users may prefer to run getVarianceExplained
manually and pass the resulting matrix as object
, in which case the R-squared values are used directly.
A ggplot object.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotExplanatoryVariables(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotExplanatoryVariables(example_sce)
Plot expression values for a set of features (e.g. genes or transcripts) in a SingleExperiment object, against a continuous or categorical covariate for all cells.
plotExpression( object, features, x = NULL, exprs_values = "logcounts", log2_values = FALSE, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = exprs_values, xlab = NULL, feature_colours = feature_colors, one_facet = TRUE, ncol = 2, scales = "fixed", other_fields = list(), swap_rownames = NULL, color_by = NULL, feature_colors = TRUE, point_fun = NULL, assay.type = exprs_values, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
plotExpression( object, features, x = NULL, exprs_values = "logcounts", log2_values = FALSE, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = exprs_values, xlab = NULL, feature_colours = feature_colors, one_facet = TRUE, ncol = 2, scales = "fixed", other_fields = list(), swap_rownames = NULL, color_by = NULL, feature_colors = TRUE, point_fun = NULL, assay.type = exprs_values, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
object |
A SingleCellExperiment object containing expression values and other metadata. |
features |
A character vector or a list specifying the features to plot.
If a list is supplied, each entry of the list can be a string, an AsIs-wrapped vector or a data.frame - see |
x |
Specification of a column metadata field or a feature to show on the x-axis, see the |
exprs_values |
Alias to |
log2_values |
Logical scalar, specifying whether the expression values be transformed to the log2-scale for plotting (with an offset of 1 to avoid logging zeroes). |
colour_by |
Specification of a column metadata field or a feature to colour by, see the |
shape_by |
Specification of a column metadata field or a feature to shape by, see the |
size_by |
Specification of a column metadata field or a feature to size by, see the |
order_by |
Specification of a column metadata field or a feature to order points by, see the |
by_exprs_values |
Alias to |
xlab |
String specifying the label for x-axis.
If |
feature_colours |
Logical scalar indicating whether violins should be coloured by feature when |
one_facet |
Logical scalar indicating whether grouped violin plots for multiple features should be put onto one facet.
Only relevant when |
ncol |
Integer scalar, specifying the number of columns to be used for the panels of a multi-facet plot. |
scales |
String indicating whether should multi-facet scales be fixed ( |
other_fields |
Additional cell-based fields to include in the data.frame, see |
swap_rownames |
Column name of |
color_by |
Alias to |
feature_colors |
Alias to |
point_fun |
Function used to create a geom that shows individual cells. Should take |
assay.type |
A string or integer scalar specifying which assay in |
scattermore |
Logical, whether to use the |
bins |
Number of bins, can be different in x and y, to bin and summarize
the points and their values, to avoid overplotting. If |
summary_fun |
Function to summarize the feature value of each point
(e.g. gene expression of each cell) when the points binned, defaults to
|
hex |
Logical, whether to use |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see the |
... |
Additional arguments for visualization, see |
This function plots expression values for one or more features.
If x
is not specified, a violin plot will be generated of expression values.
If x
is categorical, a grouped violin plot will be generated, with one violin for each level of x
.
If x
is continuous, a scatter plot will be generated.
If multiple features are requested and x
is not specified and one_facet=TRUE
, a grouped violin plot will be generated with one violin per feature.
This will be coloured by feature if colour_by=NULL
and feature_colours=TRUE
, to yield a more aesthetically pleasing plot.
Otherwise, if x
is specified or one_facet=FALSE
, a multi-panel plot will be generated where each panel corresponds to a feature.
Each panel will be a scatter plot or (grouped) violin plot, depending on the nature of x
.
Note that this assumes that the expression values are numeric.
If not, and x
is continuous, horizontal violin plots will be generated.
If x
is missing or categorical, rectangule plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.
A ggplot object.
Arguments shape_by
and size_by
are ignored when
scattermore = TRUE
. Using scattermore
is only recommended for
very large datasets to speed up plotting. Small point size is also
recommended. For larger point size, the point shape may be distorted. Also,
when scattermore = TRUE
, the point_size
argument works
differently.
Davis McCarthy, with modifications by Aaron Lun
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) ## default plot plotExpression(example_sce, rownames(example_sce)[1:15]) ## plot expression against an x-axis value plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status") plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Gene_0002") ## add visual options plotExpression(example_sce, rownames(example_sce)[1:6], colour_by = "Mutation_Status") plotExpression(example_sce, rownames(example_sce)[1:6], colour_by = "Mutation_Status", shape_by = "Treatment", size_by = "Gene_0010") ## use boxplot as well as violin plot plotExpression(example_sce, rownames(example_sce)[1:6], show_boxplot = TRUE, show_violin = FALSE) ## plot expression against expression values for Gene_0004 plotExpression(example_sce, rownames(example_sce)[1:4], "Gene_0004", show_smooth = TRUE) # Use scattermore plotExpression(example_sce, "Gene_0001", x = "Gene_0100", scattermore = TRUE, point_size = 2) # Bin to show point density plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10) # Bin to summarize values (default is sum but can be changed with summary_fun) plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10, colour_by = "Gene_0002", summary_fun = "mean")
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) ## default plot plotExpression(example_sce, rownames(example_sce)[1:15]) ## plot expression against an x-axis value plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status") plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Gene_0002") ## add visual options plotExpression(example_sce, rownames(example_sce)[1:6], colour_by = "Mutation_Status") plotExpression(example_sce, rownames(example_sce)[1:6], colour_by = "Mutation_Status", shape_by = "Treatment", size_by = "Gene_0010") ## use boxplot as well as violin plot plotExpression(example_sce, rownames(example_sce)[1:6], show_boxplot = TRUE, show_violin = FALSE) ## plot expression against expression values for Gene_0004 plotExpression(example_sce, rownames(example_sce)[1:4], "Gene_0004", show_smooth = TRUE) # Use scattermore plotExpression(example_sce, "Gene_0001", x = "Gene_0100", scattermore = TRUE, point_size = 2) # Bin to show point density plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10) # Bin to summarize values (default is sum but can be changed with summary_fun) plotExpression(example_sce, "Gene_0001", x = "Gene_0100", bins = 10, colour_by = "Gene_0002", summary_fun = "mean")
Create a heatmap of average expression values for each group of cells and specified features in a SingleCellExperiment object.
plotGroupedHeatmap( object, features, group, block = NULL, columns = NULL, exprs_values = "logcounts", center = FALSE, scale = FALSE, zlim = NULL, colour = color, swap_rownames = NULL, color = NULL, assay.type = exprs_values, ... )
plotGroupedHeatmap( object, features, group, block = NULL, columns = NULL, exprs_values = "logcounts", center = FALSE, scale = FALSE, zlim = NULL, colour = color, swap_rownames = NULL, color = NULL, assay.type = exprs_values, ... )
object |
A SingleCellExperiment object. |
features |
A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of |
group |
String specifying the field of |
block |
String specifying the field of |
columns |
A vector specifying the subset of columns in |
exprs_values |
Alias to |
center |
A logical scalar indicating whether each feature should have its mean expression (specifically, the mean of averages across all groups) centered at zero prior to plotting. |
scale |
A logical scalar specifying whether each row should have its average expression values scaled to unit variance prior to plotting. |
zlim |
A numeric vector of length 2, specifying the upper and lower bounds for colour mapping of expression values.
Values outside this range are set to the most extreme colour.
If |
colour |
A vector of colours specifying the palette to use for increasing expression.
This defaults to viridis if |
swap_rownames |
Column name of |
color |
Alias to |
assay.type |
A string or integer scalar indicating which assay of |
... |
Additional arguments to pass to |
This function shows the average expression values for each group of cells on a heatmap, as defined using the group
factor.
A per-group visualization can be preferable to a per-cell visualization when dealing with large number of cells or groups with different size.
If block
is also specified, the block effect is regressed out of the averages with correctGroupSummary
prior to visualization.
Setting center=TRUE
is useful for examining log-fold changes of each group's expression profile from the average across all groups.
This avoids issues with the entire row appearing a certain colour because the gene is highly/lowly expressed across all cells.
Setting zlim
preserves the dynamic range of colours in the presence of outliers.
Otherwise, the plot may be dominated by a few genes, which will “flatten” the observed colours for the rest of the heatmap.
A heatmap is produced on the current graphics device.
The output of pheatmap
is invisibly returned.
Aaron Lun
pheatmap
, for the underlying function.
plotHeatmap
, for a per-cell heatmap.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce$Group <- paste0(example_sce$Treatment, "+", example_sce$Mutation_Status) plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group") plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group", center=TRUE) plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group", block="Cell_Cycle", center=TRUE)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce$Group <- paste0(example_sce$Treatment, "+", example_sce$Mutation_Status) plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group") plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group", center=TRUE) plotGroupedHeatmap(example_sce, features=rownames(example_sce)[1:10], group="Group", block="Cell_Cycle", center=TRUE)
Create a heatmap of expression values for each cell and specified features in a SingleCellExperiment object.
plotHeatmap( object, features, columns = NULL, exprs_values = "logcounts", center = FALSE, scale = FALSE, zlim = NULL, colour = color, color = NULL, colour_columns_by = color_columns_by, color_columns_by = NULL, column_annotation_colours = column_annotation_colors, column_annotation_colors = list(), row_annotation_colours = row_annotation_colors, row_annotation_colors = list(), colour_rows_by = color_rows_by, color_rows_by = NULL, order_columns_by = NULL, by_exprs_values = exprs_values, show_colnames = FALSE, cluster_cols = is.null(order_columns_by), swap_rownames = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values, ... )
plotHeatmap( object, features, columns = NULL, exprs_values = "logcounts", center = FALSE, scale = FALSE, zlim = NULL, colour = color, color = NULL, colour_columns_by = color_columns_by, color_columns_by = NULL, column_annotation_colours = column_annotation_colors, column_annotation_colors = list(), row_annotation_colours = row_annotation_colors, row_annotation_colors = list(), colour_rows_by = color_rows_by, color_rows_by = NULL, order_columns_by = NULL, by_exprs_values = exprs_values, show_colnames = FALSE, cluster_cols = is.null(order_columns_by), swap_rownames = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values, ... )
object |
A SingleCellExperiment object. |
features |
A character (or factor) vector of row names, a logical vector, or integer vector of indices specifying rows of |
columns |
A vector specifying the subset of columns in |
exprs_values |
Alias to |
center |
A logical scalar indicating whether each feature should have its mean expression centered at zero prior to plotting. |
scale |
A logical scalar specifying whether each feature should have its expression values scaled to have unit variance prior to plotting. |
zlim |
A numeric vector of length 2, specifying the upper and lower bounds for colour mapping of expression values.
Values outside this range are set to the most extreme colour.
If |
colour |
A vector of colours specifying the palette to use for increasing expression.
This defaults to viridis if |
color , color_columns_by , column_annotation_colors , color_rows_by , row_annotation_colors
|
Aliases to |
colour_columns_by |
A list of values specifying how the columns should be annotated with colours.
Each entry of the list can be any acceptable input to the |
column_annotation_colours |
A named list of colour scales to be used for
the column annotations specified in |
row_annotation_colours |
Similar to |
colour_rows_by |
Similar to |
order_columns_by |
A list of values specifying how the columns should be ordered.
Each entry of the list can be any acceptable input to the |
by_exprs_values |
Alias to |
show_colnames , cluster_cols , ...
|
Additional arguments to pass to |
swap_rownames |
Column name of |
assay.type |
A string or integer scalar indicating which assay of |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for colouring of column-level data - see the |
Setting center=TRUE
is useful for examining log-fold changes of each cell's expression profile from the average across all cells.
This avoids issues with the entire row appearing a certain colour because the gene is highly/lowly expressed across all cells.
Setting zlim
preserves the dynamic range of colours in the presence of outliers.
Otherwise, the plot may be dominated by a few genes, which will “flatten” the observed colours for the rest of the heatmap.
Setting order_columns_by
is useful for automatically ordering the heatmap by one or more factors of interest, e.g., cluster identity.
This avoids the need to set colour_columns_by
, cluster_cols
and columns
to achieve the same effect.
A heatmap is produced on the current graphics device.
The output of pheatmap
is invisibly returned.
Aaron Lun
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotHeatmap(example_sce, features=rownames(example_sce)[1:10]) plotHeatmap(example_sce, features=rownames(example_sce)[1:10], center=TRUE) plotHeatmap(example_sce, features=rownames(example_sce)[1:10], colour_columns_by=c("Mutation_Status", "Cell_Cycle"))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotHeatmap(example_sce, features=rownames(example_sce)[1:10]) plotHeatmap(example_sce, features=rownames(example_sce)[1:10], center=TRUE) plotHeatmap(example_sce, features=rownames(example_sce)[1:10], colour_columns_by=c("Mutation_Status", "Cell_Cycle"))
Plot the features with the highest average expression across all cells, along with their expression in each individual cell.
plotHighestExprs( object, n = 50, colour_cells_by = color_cells_by, drop_features = NULL, exprs_values = "counts", by_exprs_values = exprs_values, feature_names_to_plot = NULL, as_percentage = TRUE, swap_rownames = NULL, color_cells_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values )
plotHighestExprs( object, n = 50, colour_cells_by = color_cells_by, drop_features = NULL, exprs_values = "counts", by_exprs_values = exprs_values, feature_names_to_plot = NULL, as_percentage = TRUE, swap_rownames = NULL, color_cells_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values )
object |
A SingleCellExperiment object. |
n |
A numeric scalar specifying the number of the most expressed features to show. |
colour_cells_by |
Specification of a column metadata field or a feature to colour by, see |
drop_features |
A character, logical or numeric vector indicating which features (e.g. genes, transcripts) to drop when producing the plot. For example, spike-in transcripts might be dropped to examine the contribution from endogenous genes. |
exprs_values |
Alias to |
by_exprs_values |
Alias to |
feature_names_to_plot |
String specifying which row-level metadata column contains the feature names.
Alternatively, an AsIs-wrapped vector or a data.frame, see |
as_percentage |
logical scalar indicating whether percentages should be plotted.
If |
swap_rownames |
Column name of |
color_cells_by |
Alias to |
assay.type |
A integer scalar or string specifying the assay to obtain expression values from. |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in colouring - see |
This function will plot the percentage of counts accounted for by the top n
most highly expressed features across the dataset.
Each row on the plot corresponds to a feature and is sorted by average expression (denoted by the point).
The distribution of expression across all cells is shown as tick marks for each feature.
These ticks can be coloured according to cell-level metadata, as specified by colour_cells_by
.
A ggplot object.
example_sce <- mockSCE() colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce)) plotHighestExprs(example_sce, colour_cells_by="detected") plotHighestExprs(example_sce, colour_cells_by="Mutation_Status")
example_sce <- mockSCE() colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce)) plotHighestExprs(example_sce, colour_cells_by="detected") plotHighestExprs(example_sce, colour_cells_by="Mutation_Status")
Plots cells in their position on a plate, coloured by metadata variables or feature expression values from a SingleCellExperiment object.
plotPlatePosition( object, plate_position = NULL, colour_by = color_by, size_by = NULL, shape_by = NULL, order_by = NULL, by_exprs_values = "logcounts", add_legend = TRUE, theme_size = 24, point_alpha = 0.6, point_size = 24, point_shape = 19, other_fields = list(), swap_rownames = NULL, color_by = NULL, by.assay.type = by_exprs_values )
plotPlatePosition( object, plate_position = NULL, colour_by = color_by, size_by = NULL, shape_by = NULL, order_by = NULL, by_exprs_values = "logcounts", add_legend = TRUE, theme_size = 24, point_alpha = 0.6, point_size = 24, point_shape = 19, other_fields = list(), swap_rownames = NULL, color_by = NULL, by.assay.type = by_exprs_values )
object |
A SingleCellExperiment object. |
plate_position |
A character vector specifying the plate position for each cell (e.g., A01, B12, and so on, where letter indicates row and number indicates column).
If |
colour_by |
Specification of a column metadata field or a feature to colour by, see the |
size_by |
Specification of a column metadata field or a feature to size by, see the |
shape_by |
Specification of a column metadata field or a feature to shape by, see the |
order_by |
Specification of a column metadata field or a feature to order points by, see the |
by_exprs_values |
Alias for |
add_legend |
Logical scalar specifying whether a legend should be shown. |
theme_size |
Numeric scalar, see |
point_alpha |
Numeric scalar specifying the transparency of the points, see |
point_size |
Numeric scalar specifying the size of the points, see |
point_shape |
An integer, or a string specifying the shape
of the points. See |
other_fields |
Additional cell-based fields to include in the data.frame, see |
swap_rownames |
Column name of |
color_by |
Alias to |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see the |
This function expects plate positions to be given in a charcter format where a letter indicates the row on the plate and a numeric value indicates the column.
Each cell has a plate position such as "A01", "B12", "K24" and so on.
From these plate positions, the row is extracted as the letter, and the column as the numeric part.
Alternatively, the row and column identities can be directly supplied by setting plate_position
as a list of two factors.
A ggplot object.
Davis McCarthy, with modifications by Aaron Lun
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) ## define plate positions example_sce$plate_position <- paste0( rep(LETTERS[1:5], each = 8), rep(formatC(1:8, width = 2, flag = "0"), 5) ) ## plot plate positions plotPlatePosition(example_sce, colour_by = "Mutation_Status") plotPlatePosition(example_sce, shape_by = "Treatment", colour_by = "Gene_0004") plotPlatePosition(example_sce, shape_by = "Treatment", size_by = "Gene_0001", colour_by = "Cell_Cycle")
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) ## define plate positions example_sce$plate_position <- paste0( rep(LETTERS[1:5], each = 8), rep(formatC(1:8, width = 2, flag = "0"), 5) ) ## plot plate positions plotPlatePosition(example_sce, colour_by = "Mutation_Status") plotPlatePosition(example_sce, shape_by = "Treatment", colour_by = "Gene_0004") plotPlatePosition(example_sce, shape_by = "Treatment", size_by = "Gene_0001", colour_by = "Cell_Cycle")
Plot cell-level reduced dimension results stored in a SingleCellExperiment object.
plotReducedDim( object, dimred, ncomponents = 2, percentVar = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = "logcounts", text_by = NULL, text_size = 5, text_colour = text_color, label_format = c("%s %i", " (%i%%)"), other_fields = list(), text_color = "black", color_by = NULL, swap_rownames = NULL, point.padding = NA, force = 1, rasterise = FALSE, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
plotReducedDim( object, dimred, ncomponents = 2, percentVar = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, order_by = NULL, by_exprs_values = "logcounts", text_by = NULL, text_size = 5, text_colour = text_color, label_format = c("%s %i", " (%i%%)"), other_fields = list(), text_color = "black", color_by = NULL, swap_rownames = NULL, point.padding = NA, force = 1, rasterise = FALSE, scattermore = FALSE, bins = NULL, summary_fun = "sum", hex = FALSE, by.assay.type = by_exprs_values, ... )
object |
A SingleCellExperiment object. |
dimred |
A string or integer scalar indicating the reduced dimension
result in |
ncomponents |
A numeric scalar indicating the number of dimensions to plot, starting from the first dimension. Alternatively, a numeric vector specifying the dimensions to be plotted. |
percentVar |
A numeric vector giving the proportion of variance in
expression explained by each reduced dimension.
Only expected to be used in PCA settings, e.g., in the
|
colour_by |
Specification of a column metadata field or a feature to
colour by, see the |
shape_by |
Specification of a column metadata field or a feature to
shape by, see the |
size_by |
Specification of a column metadata field or a feature to
size by, see the |
order_by |
Specification of a column metadata field or a feature to
order points by, see the |
by_exprs_values |
Alias for |
text_by |
String specifying the column metadata field with which to add
text labels on the plot.
This must refer to a categorical field, i.e., coercible into a factor.
Alternatively, an AsIs vector or data.frame, see
|
text_size |
Numeric scalar specifying the size of added text. |
text_colour |
String specifying the colour of the added text. |
label_format |
Character vector of length 2 containing format strings
to use for the axis labels.
The first string expects a string containing the result type (e.g.,
|
other_fields |
Additional cell-based fields to include in the
data.frame, see |
text_color |
Alias to |
color_by |
Alias to |
swap_rownames |
Column name of |
point.padding , force
|
See |
rasterise |
Whether to rasterise the points in the plot with
|
scattermore |
Logical, whether to use the |
bins |
Number of bins, can be different in x and y, to bin and summarize
the points and their values, to avoid overplotting. If |
summary_fun |
Function to summarize the feature value of each point
(e.g. gene expression of each cell) when the points binned, defaults to
|
hex |
Logical, whether to use |
by.assay.type |
A string or integer scalar specifying which assay to
obtain expression values from,
for use in point aesthetics - see the |
... |
Additional arguments for visualization, see
|
If ncomponents
is a scalar equal to 2, a scatterplot of the first two
dimensions is produced.
If ncomponents
is greater than 2, a pairs plots for the top
dimensions is produced.
Alternatively, if ncomponents
is a vector of length 2, a scatterplot
of the two specified dimensions is produced.
If it is of length greater than 2, a pairs plot is produced containing all
pairwise plots between the specified dimensions.
The text_by
option will add factor levels as labels onto the plot,
placed at the median coordinate across all points in that level.
This is useful for annotating position-related metadata (e.g., clusters)
when there are too many levels to distinguish by colour.
It is only available for scatterplots.
A ggplot object
Arguments shape_by
and size_by
are ignored when
scattermore = TRUE
. Using scattermore
is only recommended for
very large datasets to speed up plotting. Small point size is also
recommended. For larger point size, the point shape may be distorted. Also,
when scattermore = TRUE
, the point_size
argument works
differently.
Davis McCarthy, with modifications by Aaron Lun
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce, ncomponents=5) plotReducedDim(example_sce, "PCA") plotReducedDim(example_sce, "PCA", colour_by="Cell_Cycle") plotReducedDim(example_sce, "PCA", colour_by="Gene_0001") plotReducedDim(example_sce, "PCA", ncomponents=5) plotReducedDim(example_sce, "PCA", ncomponents=5, colour_by="Cell_Cycle", shape_by="Treatment") # Use scattermore plotPCA(example_sce, ncomponents = 4, scattermore = TRUE, point_size = 3) # Bin to show point density plotPCA(example_sce, bins = 10) # Bin to summarize values (default is sum) plotPCA(example_sce, bins = 10, colour_by = "Gene_0001")
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce, ncomponents=5) plotReducedDim(example_sce, "PCA") plotReducedDim(example_sce, "PCA", colour_by="Cell_Cycle") plotReducedDim(example_sce, "PCA", colour_by="Gene_0001") plotReducedDim(example_sce, "PCA", ncomponents=5) plotReducedDim(example_sce, "PCA", ncomponents=5, colour_by="Cell_Cycle", shape_by="Treatment") # Use scattermore plotPCA(example_sce, ncomponents = 4, scattermore = TRUE, point_size = 3) # Bin to show point density plotPCA(example_sce, bins = 10) # Bin to summarize values (default is sum) plotPCA(example_sce, bins = 10, colour_by = "Gene_0001")
Produce a relative log expression (RLE) plot of one or more transformations of cell expression values.
plotRLE( object, exprs_values = "logcounts", exprs_logged = TRUE, style = "minimal", legend = TRUE, ordering = NULL, colour_by = color_by, by_exprs_values = exprs_values, BPPARAM = BiocParallel::bpparam(), color_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values, assay_logged = exprs_logged, ... )
plotRLE( object, exprs_values = "logcounts", exprs_logged = TRUE, style = "minimal", legend = TRUE, ordering = NULL, colour_by = color_by, by_exprs_values = exprs_values, BPPARAM = BiocParallel::bpparam(), color_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values, assay_logged = exprs_logged, ... )
object |
A SingleCellExperiment object. |
exprs_values |
Alias to |
exprs_logged |
A logical scalar indicating whether the expression matrix is already log-transformed. If not, a log2-transformation (+1) will be performed prior to plotting. |
style |
String defining the boxplot style to use, either |
legend |
Logical scalar specifying whether a legend should be shown. |
ordering |
A vector specifying the ordering of cells in the RLE plot. This can be useful for arranging cells by experimental conditions or batches. |
colour_by |
Specification of a column metadata field or a feature to colour by, see the |
by_exprs_values |
Alias to |
BPPARAM |
A BiocParallelParam object to be used to parallelise operations using |
color_by |
Alias to |
assay.type |
A string or integer scalar specifying the expression matrix in |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see the |
assay_logged |
Alias to |
... |
further arguments passed to |
Relative log expression (RLE) plots are a powerful tool for visualising unwanted variation in high dimensional data. These plots were originally devised for gene expression data from microarrays but can also be used on single-cell expression data. RLE plots are particularly useful for assessing whether a procedure aimed at removing unwanted variation (e.g., scaling normalisation) has been successful.
If style is “full”, the usual ggplot2 boxplot is created for each cell. Here, the box shows the inter-quartile range and whiskers extend no more than 1.5 times the IQR from the hinge (the 25th or 75th percentile). Data beyond the whiskers are called outliers and are plotted individually. The median (50th percentile) is shown with a white bar. This approach is detailed and flexible, but can take a long time to plot for large datasets.
If style is “minimal”, a Tufte-style boxplot is created for each cell. Here, the median is shown with a circle, the IQR in a grey line, and “whiskers” (as defined above) for the plots are shown with coloured lines. No outliers are shown for this plot style. This approach is more succinct and faster for large numbers of cells.
A ggplot object
Davis McCarthy, with modifications by Aaron Lun
Gandolfo LC, Speed TP (2017). RLE plots: visualising unwanted variation in high dimensional data. arXiv.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotRLE(example_sce, colour_by = "Mutation_Status", style = "minimal") plotRLE(example_sce, colour_by = "Mutation_Status", style = "full", outlier.alpha = 0.1, outlier.shape = 3, outlier.size = 0)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) plotRLE(example_sce, colour_by = "Mutation_Status", style = "minimal") plotRLE(example_sce, colour_by = "Mutation_Status", style = "full", outlier.alpha = 0.1, outlier.shape = 3, outlier.size = 0)
Plot row-level (i.e., gene) metadata from a SingleCellExperiment object.
plotRowData( object, y, x = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, by_exprs_values = "logcounts", other_fields = list(), color_by = NULL, by.assay.type = by_exprs_values, ... )
plotRowData( object, y, x = NULL, colour_by = color_by, shape_by = NULL, size_by = NULL, by_exprs_values = "logcounts", other_fields = list(), color_by = NULL, by.assay.type = by_exprs_values, ... )
object |
A SingleCellExperiment object containing expression values and experimental information. |
y |
String specifying the column-level metadata field to show on the y-axis.
Alternatively, an AsIs vector or data.frame, see |
x |
String specifying the column-level metadata to show on the x-axis.
Alternatively, an AsIs vector or data.frame, see |
colour_by |
Specification of a row metadata field or a cell to colour by, see |
shape_by |
Specification of a row metadata field or a cell to shape by, see |
size_by |
Specification of a row metadata field or a cell to size by, see |
by_exprs_values |
Alias to |
other_fields |
Additional feature-based fields to include in the data.frame, see |
color_by |
Alias to |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see |
... |
Additional arguments for visualization, see |
If y
is continuous and x=NULL
, a violin plot is generated.
If x
is categorical, a grouped violin plot will be generated, with one violin for each level of x
.
If x
is continuous, a scatter plot will be generated.
If y
is categorical and x
is continuous, horizontal violin plots will be generated.
If x
is missing or categorical, rectangule plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.
A ggplot object.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) rowData(example_sce) <- cbind(rowData(example_sce), perFeatureQCMetrics(example_sce)) plotRowData(example_sce, y="detected", x="mean") + scale_x_log10()
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) rowData(example_sce) <- cbind(rowData(example_sce), perFeatureQCMetrics(example_sce)) plotRowData(example_sce, y="detected", x="mean") + scale_x_log10()
Plot the relative proportion of the library size that is accounted for by the most highly expressed features for each cell in a SingleCellExperiment object.
plotScater( x, nfeatures = 500, exprs_values = "counts", colour_by = color_by, by_exprs_values = exprs_values, block1 = NULL, block2 = NULL, ncol = 3, line_width = 1.5, theme_size = 10, color_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values )
plotScater( x, nfeatures = 500, exprs_values = "counts", colour_by = color_by, by_exprs_values = exprs_values, block1 = NULL, block2 = NULL, ncol = 3, line_width = 1.5, theme_size = 10, color_by = NULL, assay.type = exprs_values, by.assay.type = by_exprs_values )
x |
A SingleCellExperiment object. |
nfeatures |
Numeric scalar indicating the number of top-expressed features to show n the plot. |
exprs_values |
Alias to |
colour_by |
Specification of a column metadata field or a feature to colour by, see the |
by_exprs_values |
Alias to |
block1 |
String specifying the column-level metadata field by which to separate the cells into separate panels in the plot.
Alternatively, an AsIs vector or data.frame, see |
block2 |
Same as |
ncol |
Number of columns to use for |
line_width |
Numeric scalar specifying the line width. |
theme_size |
Numeric scalar specifying the font size to use for the plotting theme. |
color_by |
Alias to |
assay.type |
String or integer scalar indicating which assay of |
by.assay.type |
A string or integer scalar specifying which assay to obtain expression values from,
for use in point aesthetics - see the |
For each cell, the features are ordered from most-expressed to least-expressed.
The cumulative proportion of the total expression for the cell is computed across the top nfeatures
features.
These plots can flag cells with a very high proportion of the library coming from a small number of features; such cells are likely to be problematic for downstream analyses.
Using the colour and blocking arguments can flag overall differences in cells under different experimental conditions or affected by different batch and other variables.
If only one of block1
and block2
are specified, each panel corresponds to a separate level of the specified blocking factor.
If both are specified, each panel corresponds to a combination of levels.
A ggplot object.
Davis McCarthy, with modifications by Aaron Lun
example_sce <- mockSCE() plotScater(example_sce) plotScater(example_sce, assay.type = "counts", colour_by = "Cell_Cycle") plotScater(example_sce, block1 = "Treatment", colour_by = "Cell_Cycle")
example_sce <- mockSCE() plotScater(example_sce) plotScater(example_sce, assay.type = "counts", colour_by = "Cell_Cycle") plotScater(example_sce, block1 = "Treatment", colour_by = "Cell_Cycle")
Projects observations into arbitrary dimensionality reduction space (e.g., t-SNE, UMAP) using a tricube weighted average of the k nearest neighbours.
projectReducedDim(x, ...) ## S4 method for signature 'matrix' projectReducedDim(x, old.embedding, ...) ## S4 method for signature 'SummarizedExperiment' projectReducedDim( x, old.sce, dimred.embed = "TSNE", dimred.knn = "PCA", dimred.name = dimred.embed, k = 5 )
projectReducedDim(x, ...) ## S4 method for signature 'matrix' projectReducedDim(x, old.embedding, ...) ## S4 method for signature 'SummarizedExperiment' projectReducedDim( x, old.sce, dimred.embed = "TSNE", dimred.knn = "PCA", dimred.name = dimred.embed, k = 5 )
x |
A numeric matrix of a dimensionality reduction containing the cells that should be projected into the existing embedding defined in either |
... |
Passed to methods. |
old.embedding |
If |
old.sce |
The object containing the original dimensionality points. If |
dimred.embed |
The name of the target dimensionality reduction that points should be embedded into, if . |
dimred.knn |
The name of the dimensionality reduction to use to identify the K-nearest neighbours from |
dimred.name |
The name of the dimensionality reduction that the projected embedding will be saved as, for the SummarizedExperiment method. |
k |
The number of nearest neighours to use to project points into the embedding. |
When x
is a matrix, a matrix is returned. When x
is a
SummarizedExperiment
(or SingleCellExperiment
), the return value is of
the same class as the input, but the projected dimensionality reduction
is added as a reducedDim
field.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runUMAP(example_sce) example_sce <- runPCA(example_sce) example_sce_new <- mockSCE() example_sce_new <- logNormCounts(example_sce_new) example_sce_new <- runPCA(example_sce_new) ## sce method projectReducedDim( example_sce_new, old.sce = example_sce, dimred.embed="UMAP", dimred.knn="PCA" ) ## matrix method projectReducedDim( reducedDim(example_sce, "PCA"), new.points = reducedDim(example_sce_new, "PCA"), old.embedding = reducedDim(example_sce, "UMAP") )
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runUMAP(example_sce) example_sce <- runPCA(example_sce) example_sce_new <- mockSCE() example_sce_new <- logNormCounts(example_sce_new) example_sce_new <- runPCA(example_sce_new) ## sce method projectReducedDim( example_sce_new, old.sce = example_sce, dimred.embed="UMAP", dimred.knn="PCA" ) ## matrix method projectReducedDim( reducedDim(example_sce, "PCA"), new.points = reducedDim(example_sce_new, "PCA"), old.embedding = reducedDim(example_sce, "UMAP") )
Wrapper functions to create plots for specific types of reduced dimension results in a SingleCellExperiment object.
plotPCASCE(object, ..., ncomponents = 2, dimred = "PCA") plotTSNE(object, ..., ncomponents = 2, dimred = "TSNE") plotUMAP(object, ..., ncomponents = 2, dimred = "UMAP") plotDiffusionMap(object, ..., ncomponents = 2, dimred = "DiffusionMap") plotMDS(object, ..., ncomponents = 2, dimred = "MDS") plotNMF(object, ..., ncomponents = 2, dimred = "NMF") ## S4 method for signature 'SingleCellExperiment' plotPCA(object, ..., ncomponents = 2, dimred = "PCA")
plotPCASCE(object, ..., ncomponents = 2, dimred = "PCA") plotTSNE(object, ..., ncomponents = 2, dimred = "TSNE") plotUMAP(object, ..., ncomponents = 2, dimred = "UMAP") plotDiffusionMap(object, ..., ncomponents = 2, dimred = "DiffusionMap") plotMDS(object, ..., ncomponents = 2, dimred = "MDS") plotNMF(object, ..., ncomponents = 2, dimred = "NMF") ## S4 method for signature 'SingleCellExperiment' plotPCA(object, ..., ncomponents = 2, dimred = "PCA")
object |
A SingleCellExperiment object. |
... |
Additional arguments to pass to |
ncomponents |
Numeric scalar indicating the number of dimensions components to (calculate and) plot.
This can also be a numeric vector, see |
dimred |
A string or integer scalar indicating the reduced dimension
result in |
Each function is a convenient wrapper around plotReducedDim
that searches the reducedDims
slot for an appropriately named dimensionality reduction result:
"PCA"
for plotPCA
"TSNE"
for plotTSNE
"DiffusionMap"
for plotDiffusionMap
"MDS"
for "plotMDS"
"NMF"
for "plotNMF"
"UMAP"
for "plotUMAP"
Its only purpose is to streamline workflows to avoid the need to specify the dimred
argument.
A ggplot object.
Davis McCarthy, with modifications by Aaron Lun
runPCA
,
runDiffusionMap
,
runTSNE
,
runMDS
,
runNMF
,
and runUMAP
,
for the functions that actually perform the calculations.
plotReducedDim
, for the underlying plotting function.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) ## Examples plotting PC1 and PC2 plotPCA(example_sce) plotPCA(example_sce, colour_by = "Cell_Cycle") plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment") ## Examples plotting more than 2 PCs plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment", shape_by = "Mutation_Status") ## Same for TSNE: example_sce <- runTSNE(example_sce) plotTSNE(example_sce, colour_by="Mutation_Status") ## Not run: ## Same for DiffusionMaps: example_sce <- runDiffusionMap(example_sce) plotDiffusionMap(example_sce) ## End(Not run) ## Same for MDS plots: example_sce <- runMDS(example_sce) plotMDS(example_sce)
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) example_sce <- runPCA(example_sce) ## Examples plotting PC1 and PC2 plotPCA(example_sce) plotPCA(example_sce, colour_by = "Cell_Cycle") plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment") ## Examples plotting more than 2 PCs plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment", shape_by = "Mutation_Status") ## Same for TSNE: example_sce <- runTSNE(example_sce) plotTSNE(example_sce, colour_by="Mutation_Status") ## Not run: ## Same for DiffusionMaps: example_sce <- runDiffusionMap(example_sce) plotDiffusionMap(example_sce) ## End(Not run) ## Same for MDS plots: example_sce <- runMDS(example_sce) plotMDS(example_sce)
Retrieves a per-cell (meta)data field from a SingleCellExperiment based on a single keyword, typically for use in visualization functions.
retrieveCellInfo( x, by, search = c("colData", "assays", "altExps"), exprs_values = "logcounts", swap_rownames = NULL, assay.type = exprs_values )
retrieveCellInfo( x, by, search = c("colData", "assays", "altExps"), exprs_values = "logcounts", swap_rownames = NULL, assay.type = exprs_values )
x |
A SingleCellExperiment object. |
by |
A string specifying the field to extract (see Details). Alternatively, a data.frame, DataFrame or an AsIs vector. |
search |
Character vector specifying the types of data or metadata to use. |
exprs_values |
Alias to |
swap_rownames |
Column name of |
assay.type |
String or integer scalar specifying the assay from which expression values should be extracted. |
Given an AsIs-wrapped vector in by
, this function will directly return the vector values as value
,
while name
is set to an empty string.
For data.frame or DataFrame instances with a single column,
this function will return the vector from that column as value
and the column name as name
.
This allows downstream visualization functions to accommodate arbitrary inputs for adjusting aesthetics.
Given a character string in by
, this function will:
Search colData
for a column named by
,
and return the corresponding field as the output value
.
We do not consider nested elements within the colData
.
Search assay(x, assay.type)
for a row named by
,
and return the expression vector for this feature as the output value
.
Search each alternative experiment in altExps(x)
for a row names by
,
and return the expression vector for this feature at assay.type
as the output value
.
Any match will cause the function to return without considering later possibilities.
The search can be modified by changing the presence and ordering of elements in search
.
If there is a name clash that results in retrieval of an unintended field,
users should explicitly set by
to a data.frame, DataFrame or AsIs-wrapped vector containing the desired values.
Developers can also consider setting search
to control the fields that are returned.
A list containing name
, a string with the name of the extracted field (usually identically to by
);
and value
, a vector of length equal to ncol(x)
containing per-cell (meta)data values.
If by=NULL
, both name
and value
are set to NULL
.
Aaron Lun
makePerCellDF
, which provides a more user-friendly interface to this function.
plotColData
,
plotReducedDim
,
plotExpression
,
plotPlatePosition
,
and most other cell-based plotting functions.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) retrieveCellInfo(example_sce, "Cell_Cycle") retrieveCellInfo(example_sce, "Gene_0001") arbitrary.field <- rnorm(ncol(example_sce)) retrieveCellInfo(example_sce, I(arbitrary.field)) retrieveCellInfo(example_sce, data.frame(stuff=arbitrary.field))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) retrieveCellInfo(example_sce, "Cell_Cycle") retrieveCellInfo(example_sce, "Gene_0001") arbitrary.field <- rnorm(ncol(example_sce)) retrieveCellInfo(example_sce, I(arbitrary.field)) retrieveCellInfo(example_sce, data.frame(stuff=arbitrary.field))
Retrieves a per-feature (meta)data field from a SingleCellExperiment based on a single keyword, typically for use in visualization functions.
retrieveFeatureInfo( x, by, search = c("rowData", "assays"), exprs_values = "logcounts", assay.type = exprs_values )
retrieveFeatureInfo( x, by, search = c("rowData", "assays"), exprs_values = "logcounts", assay.type = exprs_values )
x |
A SingleCellExperiment object. |
by |
A string specifying the field to extract (see Details). Alternatively, a data.frame, DataFrame or an AsIs vector. |
search |
Character vector specifying the types of data or metadata to use. |
exprs_values |
Alias to |
assay.type |
String or integer scalar specifying the assay from which expression values should be extracted. |
Given a AsIs-wrapped vector in by
, this function will directly return the vector values as value
,
while name
is set to an empty string.
For data.frame or DataFrame instances with a single column,
this function will return the vector from that column as value
and the column name as name
.
This allows downstream visualization functions to accommodate arbitrary inputs for adjusting aesthetics.
Given a character string in by
, this function will:
Search rowData
for a column named by
,
and return the corresponding field as the output value
.
We do not consider nested elements within the rowData
.
Search assay(x, assay.type)
for a column named by
,
and return the expression vector for this feature as the output value
.
Any match will cause the function to return without considering later possibilities.
The search can be modified by changing the presence and ordering of elements in search
.
If there is a name clash that results in retrieval of an unintended field,
users should explicitly set by
to a data.frame, DataFrame or AsIs-wrapped vector containing the desired values.
Developers can also consider setting search
to control the fields that are returned.
A list containing name
, a string with the name of the extracted field (usually identically to by
);
and value
, a vector of length equal to ncol(x)
containing per-feature (meta)data values.
If by=NULL
, both name
and value
are set to NULL
.
Aaron Lun
makePerFeatureDF
, which provides a more user-friendly interface to this function.
plotRowData
and other feature-based plotting functions.
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) rowData(example_sce)$blah <- sample(LETTERS, nrow(example_sce), replace=TRUE) str(retrieveFeatureInfo(example_sce, "blah")) str(retrieveFeatureInfo(example_sce, "Cell_001")) arbitrary.field <- rnorm(nrow(example_sce)) str(retrieveFeatureInfo(example_sce, I(arbitrary.field))) str(retrieveFeatureInfo(example_sce, data.frame(stuff=arbitrary.field)))
example_sce <- mockSCE() example_sce <- logNormCounts(example_sce) rowData(example_sce)$blah <- sample(LETTERS, nrow(example_sce), replace=TRUE) str(retrieveFeatureInfo(example_sce, "blah")) str(retrieveFeatureInfo(example_sce, "Cell_001")) arbitrary.field <- rnorm(nrow(example_sce)) str(retrieveFeatureInfo(example_sce, I(arbitrary.field))) str(retrieveFeatureInfo(example_sce, data.frame(stuff=arbitrary.field)))
Perform a principal components analysis (PCA) on cells, based on the column metadata in a SingleCellExperiment object.
runColDataPCA( x, ncomponents = 2, variables = NULL, scale = TRUE, outliers = FALSE, BSPARAM = ExactParam(), BPPARAM = SerialParam(), name = "PCA_coldata" )
runColDataPCA( x, ncomponents = 2, variables = NULL, scale = TRUE, outliers = FALSE, BSPARAM = ExactParam(), BPPARAM = SerialParam(), name = "PCA_coldata" )
x |
A SingleCellExperiment object. |
ncomponents |
Numeric scalar indicating the number of principal components to obtain. |
variables |
List of strings or a character vector indicating which variables in |
scale |
Logical scalar, should the expression values be standardised so that each feature has unit variance? This will also remove features with standard deviations below 1e-8. |
outliers |
Logical indicating whether outliers should be detected based on PCA coordinates. |
BSPARAM |
A BiocSingularParam object specifying which algorithm should be used to perform the PCA. |
BPPARAM |
A BiocParallelParam object specifying whether the PCA should be parallelized. |
name |
String specifying the name to be used to store the result in the |
This function performs PCA on variables from the column-level metadata instead of the gene expression matrix.
Doing so can be occasionally useful when other forms of experimental data are stored in the colData
,
e.g., protein intensities from FACs or other cell-specific phenotypic information.
This function is particularly useful for identifying low-quality cells based on QC metrics with outliers=TRUE
.
This uses an “outlyingness” measure computed by adjOutlyingness
in the robustbase package.
Outliers are defined those cells with outlyingness values more than 5 MADs above the median, using isOutlier
.
A SingleCellExperiment object containing the first ncomponent
principal coordinates for each cell.
By default, these are stored in the "PCA_coldata"
entry of the reducedDims
slot.
The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar"
attribute.
If outliers=TRUE
, the output colData
will also contain a logical outlier
field.
This specifies the cells that correspond to the identified outliers.
Aaron Lun, based on code by Davis McCarthy
runPCA
, for the corresponding method operating on expression data.
example_sce <- mockSCE() qc.df <- perCellQCMetrics(example_sce, subset=list(Mito=1:10)) colData(example_sce) <- cbind(colData(example_sce), qc.df) # Can supply names of colData variables to 'variables', # as well as AsIs-wrapped vectors of interest. example_sce <- runColDataPCA(example_sce, variables=list( "sum", "detected", "subsets_Mito_percent", "altexps_Spikes_percent" )) reducedDimNames(example_sce) head(reducedDim(example_sce))
example_sce <- mockSCE() qc.df <- perCellQCMetrics(example_sce, subset=list(Mito=1:10)) colData(example_sce) <- cbind(colData(example_sce), qc.df) # Can supply names of colData variables to 'variables', # as well as AsIs-wrapped vectors of interest. example_sce <- runColDataPCA(example_sce, variables=list( "sum", "detected", "subsets_Mito_percent", "altexps_Spikes_percent" )) reducedDimNames(example_sce) head(reducedDim(example_sce))
Perform UMAP with multiple input matrices by intersecting their simplicial sets. Typically used to combine results from multiple data modalities into a single embedding.
calculateMultiUMAP(x, ...) ## S4 method for signature 'ANY' calculateMultiUMAP(x, ..., metric = "euclidean") ## S4 method for signature 'SummarizedExperiment' calculateMultiUMAP( x, exprs_values, metric = "euclidean", assay.type = exprs_values, ... ) ## S4 method for signature 'SingleCellExperiment' calculateMultiUMAP( x, exprs_values, dimred, altexp, altexp_exprs_values = "logcounts", assay.type = exprs_values, altexp.assay.type = altexp_exprs_values, ... ) runMultiUMAP(x, ..., name = "MultiUMAP")
calculateMultiUMAP(x, ...) ## S4 method for signature 'ANY' calculateMultiUMAP(x, ..., metric = "euclidean") ## S4 method for signature 'SummarizedExperiment' calculateMultiUMAP( x, exprs_values, metric = "euclidean", assay.type = exprs_values, ... ) ## S4 method for signature 'SingleCellExperiment' calculateMultiUMAP( x, exprs_values, dimred, altexp, altexp_exprs_values = "logcounts", assay.type = exprs_values, altexp.assay.type = altexp_exprs_values, ... ) runMultiUMAP(x, ..., name = "MultiUMAP")
x |
For Alternatively, a SummarizedExperiment containing relevant matrices in its assays. Alternatively, a SingleCellExperiment containing relevant matrices in its assays, |
... |
For the generic, further arguments to pass to specific methods. For the ANY method, further arguments to pass to For the SummarizedExperiment and SingleCellExperiment methods, and for |
metric |
Character vector specifying the type of distance to use for each matrix in |
exprs_values |
Alias to |
assay.type |
A character or integer vector of assays to extract and transpose for use in the UMAP. For the SingleCellExperiment, this argument can be missing, in which case no assays are used. |
dimred |
A character or integer vector of |
altexp |
A character or integer vector of |
altexp_exprs_values |
Alias to |
altexp.assay.type |
A character or integer vector specifying the assay to extract from alternative experiments, when |
name |
String specifying the name of the |
These functions serve as convenience wrappers around umap
for multi-modal analysis.
The idea is that each input matrix in x
corresponds to data for a different mode.
A typical example would consist of the PC coordinates generated from gene expression counts,
plus the log-abundance matrix for ADT counts from CITE-seq experiments;
one might also include matrices of transformed intensities from indexed FACS, to name some more possibilities.
Roughly speaking, the idea is to identify nearest neighbors within each mode to construct the simplicial sets. Integration of multiple modes is performed by intersecting the sets to obtain a single graph, which is used in the rest of the UMAP algorithm. By performing an intersection, we focus on relationships between cells that are consistently neighboring across all the modes, thus providing greater resolution of differences at any mode. The neighbor search within each mode also avoids difficulties with quantitative comparisons of distances between modes.
The most obvious use of this function is to generate a low-dimensional embedding for visualization.
However, users can also set n_components
to a higher value (e.g., 10-20) to retain more information for downstream steps like clustering.
This
Do, however, remember to set the seed appropriately.
By default, all modes use the distance metric of metric
to construct the simplicial sets within each mode.
However, it is possible to vary this by supplying a vector of metrics, e.g., "euclidean"
for the first matrix, "manhattan"
for the second.
For the SingleCellExperiment method, matrices are extracted in the order of assays, reduced dimensions and alternative experiments,
so any variation in metrics
is also assumed to follow this order.
For calculateMultiUMAP
, a numeric matrix containing the low-dimensional UMAP embedding.
For runMultiUMAP
, x
is returned with a MultiUMAP
field in its reducedDims
.
Aaron Lun
runUMAP
, for the more straightforward application of UMAP.
# Mocking up a gene expression + ADT dataset: exprs_sce <- mockSCE() exprs_sce <- logNormCounts(exprs_sce) exprs_sce <- runPCA(exprs_sce) adt_sce <- mockSCE(ngenes=20) adt_sce <- logNormCounts(adt_sce) altExp(exprs_sce, "ADT") <- adt_sce # Running a multimodal analysis using PCs for expression # and log-counts for the ADTs: exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT") plotReducedDim(exprs_sce, "MultiUMAP")
# Mocking up a gene expression + ADT dataset: exprs_sce <- mockSCE() exprs_sce <- logNormCounts(exprs_sce) exprs_sce <- runPCA(exprs_sce) adt_sce <- mockSCE(ngenes=20) adt_sce <- logNormCounts(adt_sce) altExp(exprs_sce, "ADT") <- adt_sce # Running a multimodal analysis using PCs for expression # and log-counts for the ADTs: exprs_sce <- runMultiUMAP(exprs_sce, dimred="PCA", altexp="ADT") plotReducedDim(exprs_sce, "MultiUMAP")
Provides functions for convenient visualization of single-cell data, mostly via ggplot2. It also used to provide utilities for data transformation and quality control, but these have largely been moved to the scuttle package.
Davis McCarthy, Aaron Lun
scater functions that plot points share a number of visualization parameters, which are described on this page.
add_legend
:Logical scalar, specifying whether a legend should be shown. Defaults to TRUE.
theme_size
:Integer scalar, specifying the font size. Defaults to 10.
point_alpha
:Numeric scalar in [0, 1], specifying the transparency. Defaults to 0.6.
point_size
:Numeric scalar, specifying the size of the points.
Defaults to NULL
.
point_shape
:An integer, or a string specifying the shape
of the points. Details see vignette("ggplot2-specs")
. Defaults to
19
.
jitter_type
:String to define how points are to be jittered in a violin plot.
This is either with random jitter on the x-axis ("jitter"
) or in a “beeswarm” style (if "swarm"
, default).
The latter usually looks more attractive, but for datasets with a large number of cells, or for dense plots, the jitter option may work better.
show_median
:Logical, should the median of the distribution be shown for violin plots?
Defaults to FALSE
.
show_violin
:Logical, should the outline of a violin plot be shown?
Defaults to TRUE
.
show_smooth
:Logical, should a smoother be fitted to a scatter plot?
Defaults to FALSE
.
show_se
:Logical, should standard errors for the fitted line be shown on a scatter plot when show_smooth=TRUE
?
Defaults to TRUE
.
show_boxplot
:Logical, should a box plot be shown? Defaults to FALSE
.
Addititional fields can be added to the
data.frame passed to ggplot by setting the other_fields
argument. This allows users to easily incorporate additional metadata for
use in further ggplot operations.
The other_fields
argument should be character vector where each
string is passed to retrieveCellInfo
(for cell-based plots)
or retrieveFeatureInfo
(for feature-based plots).
Alternatively, other_fields
can be a named list where each element
is of any type accepted by retrieveCellInfo
or
retrieveFeatureInfo
. This includes AsIs-wrapped
vectors, data.frames or DataFrames.
Each additional column of the output data.frame will be named according to
the name
returned by retrieveCellInfo
or
retrieveFeatureInfo
. If these clash with inbuilt names (e.g.,
X
, Y
, colour_by
), a warning will be raised and the
additional column will not be added to avoid overwriting an existing
column.
plotColData
, plotRowData
,
plotReducedDim
, plotExpression
,
plotPlatePosition
, and most other plotting functions.
S4 class and the main class used by scater to hold single cell expression data. SCESet extends the basic Bioconductor ExpressionSet class.
This class is initialized from a matrix of expression values.
Methods that operate on SCESet objects constitute the basic scater workflow.
logExprsOffset
:Scalar of class "numeric"
, providing an offset
applied to expression data in the 'exprs' slot when undergoing log2-transformation
to avoid trying to take logs of zero.
lowerDetectionLimit
:Scalar of class "numeric"
,
giving the lower limit for an expression value to be classified as
"expressed".
cellPairwiseDistances
:Matrix of class "numeric"
,
containing pairwise distances between cells.
featurePairwiseDistances
:Matrix of class "numeric"
,
containing pairwise distances between features.
reducedDimension
:Matrix of class "numeric"
, containing
reduced-dimension coordinates for cells (generated, for example, by PCA).
bootstraps
:Array of class "numeric"
that can contain
bootstrap estimates of the expression or count values.
sc3
:List containing results from consensus clustering from the SC3 package.
featureControlInfo
:Data frame of class
"AnnotatedDataFrame"
that can contain information/metadata about
sets of control features defined for the SCESet
object.
bootstrap estimates of the expression or count values.
Thanks to the Monocle package (github.com/cole-trapnell-lab/monocle-release/) for their CellDataSet class, which provided the inspiration and template for SCESet.
Convert an SCESet object produced with an older version of the package to a SingleCellExperiment object compatible with the current version.
updateSCESet(object) toSingleCellExperiment(object)
updateSCESet(object) toSingleCellExperiment(object)
object |
an |
a SingleCellExperiment
object
## Not run: updateSCESet(example_sceset) ## End(Not run) ## Not run: toSingleCellExperiment(example_sceset) ## End(Not run)
## Not run: updateSCESet(example_sceset) ## End(Not run) ## Not run: toSingleCellExperiment(example_sceset) ## End(Not run)