Title: | Correspondence Analysis for Single Cell Data |
---|---|
Description: | Correspondence analysis (CA) is a matrix factorization method, and is similar to principal components analysis (PCA). Whereas PCA is designed for application to continuous, approximately normally distributed data, CA is appropriate for non-negative, count-based data that are in the same additive scale. The corral package implements CA for dimensionality reduction of a single matrix of single-cell data, as well as a multi-table adaptation of CA that leverages data-optimized scaling to align data generated from different sequencing platforms by projecting into a shared latent space. corral utilizes sparse matrices and a fast implementation of SVD, and can be called directly on Bioconductor objects (e.g., SingleCellExperiment) for easy pipeline integration. The package also includes additional options, including variations of CA to address overdispersion in count data (e.g., Freeman-Tukey chi-squared residual), as well as the option to apply CA-style processing to continuous data (e.g., proteomic TOF intensities) with the Hellinger distance adaptation of CA. |
Authors: | Lauren Hsu [aut, cre] , Aedin Culhane [aut] |
Maintainer: | Lauren Hsu <[email protected]> |
License: | GPL-2 |
Version: | 1.17.0 |
Built: | 2024-10-30 05:40:46 UTC |
Source: | https://github.com/bioc/corral |
Add embeddings to list of SCEs
add_embeddings2scelist(scelist, embeddings, slotname = "corralm")
add_embeddings2scelist(scelist, embeddings, slotname = "corralm")
scelist |
list of SingleCellExperiments; to which the corresponding embeddings should be added |
embeddings |
matrix; the embeddings outputted from a dimension reduction, e.g. |
slotname |
character; name of the slot for the reduced dim embedding; defaults to |
list of SingleCellExperiments with respective embeddings stored in them
library(DuoClustering2018) sce <- sce_full_Zhengmix4eq() scelist <- list(sce,sce) embeddings <- matrix(sample(seq(0,20,1),dim(sce)[2]*6,replace = TRUE),nrow = dim(sce)[2]*2) scelist <- add_embeddings2scelist(scelist, embeddings)
library(DuoClustering2018) sce <- sce_full_Zhengmix4eq() scelist <- list(sce,sce) embeddings <- matrix(sample(seq(0,20,1),dim(sce)[2]*6,replace = TRUE),nrow = dim(sce)[2]*2) scelist <- add_embeddings2scelist(scelist, embeddings)
Checks if all elements of a list or List are of a (single) particular type typechar
all_are(inplist, typechar)
all_are(inplist, typechar)
inplist |
list or List to be checked |
typechar |
char of the type to check for |
boolean, for whether the elements of inplist
are all typechar
x <- list(1,2) all_are(x,'numeric') all_are(x,'char') y <- list(1,2,'c') all_are(y,'numeric') all_are(y,'char')
x <- list(1,2) all_are(x,'numeric') all_are(x,'char') y <- list(1,2,'c') all_are(y,'numeric') all_are(y,'char')
Generate biplot for corral object
biplot_corral( corral_obj, color_vec, text_vec, feat_name = "(genes)", nfeat = 20, xpc = 1, plot_title = "Biplot", text_size = 2, xjitter = 0.005, yjitter = 0.005, coords = c("svd", "PC", "SC") )
biplot_corral( corral_obj, color_vec, text_vec, feat_name = "(genes)", nfeat = 20, xpc = 1, plot_title = "Biplot", text_size = 2, xjitter = 0.005, yjitter = 0.005, coords = c("svd", "PC", "SC") )
corral_obj |
list outputted by the |
color_vec |
vector; length should correspond to the number of rows in v of |
text_vec |
vector; length should correspond to the number of rows in u of |
feat_name |
char; the label will in the legend. Defaults to |
nfeat |
int; the number of features to include. The function will first order them by distance from origin in the selected dimensions, then select the top n to be displayed. |
xpc |
int; which PC to put on the x-axis (defaults to 1) |
plot_title |
char; title of plot (defaults to *Biplot*) |
text_size |
numeric; size of the feature labels given in |
xjitter |
numeric; the amount of jitter for the text labels in x direction (defaults to .005; for |
yjitter |
numeric; the amount of jitter for the text labels in y direction (defaults to .005; for |
coords |
char; indicator for sets of coordinates to use. |
ggplot2 object of the biplot
library(DuoClustering2018) library(SingleCellExperiment) zm4eq.sce <- sce_full_Zhengmix4eq() zm4eq.countmat <- counts(zm4eq.sce) zm4eq.corral_obj <- corral(zm4eq.countmat) gene_names <- rowData(zm4eq.sce)$symbol ctvec <- zm4eq.sce$phenoid biplot_corral(corral_obj = zm4eq.corral_obj, color_vec = ctvec, text_vec = gene_names)
library(DuoClustering2018) library(SingleCellExperiment) zm4eq.sce <- sce_full_Zhengmix4eq() zm4eq.countmat <- counts(zm4eq.sce) zm4eq.corral_obj <- corral(zm4eq.countmat) gene_names <- rowData(zm4eq.sce)$symbol ctvec <- zm4eq.sce$phenoid biplot_corral(corral_obj = zm4eq.corral_obj, color_vec = ctvec, text_vec = gene_names)
Computes SVD.
compsvd(mat, method = c("irl", "svd"), ncomp = 30, ...)
compsvd(mat, method = c("irl", "svd"), ncomp = 30, ...)
mat |
matrix, pre-processed input; can be sparse or full (pre-processing can be performed using |
method |
character, the algorithm to be used for svd. Default is irl. Currently supports 'irl' for irlba::irlba or 'svd' for stats::svd |
ncomp |
numeric, number of components; Default is 30 |
... |
(additional arguments for methods) |
SVD result - a list with the following elements:
d
a vector of the diagonal singular values of the input mat
. Note that using svd
will result in the full set of singular values, while irlba
will only compute the first ncomp
singular values.
u
a matrix of with the left singular vectors of mat
in the columns
v
a matrix of with the right singular vectors of mat
in the columns
eigsum
sum of the eigenvalues, for calculating percent variance explained
mat <- matrix(sample(0:10, 2500, replace=TRUE), ncol=50) compsvd(mat, method = 'irl', ncomp = 5)
mat <- matrix(sample(0:10, 2500, replace=TRUE), ncol=50) compsvd(mat, method = 'irl', ncomp = 5)
corral can be used for dimension reduction to find a set of low-dimensional embeddings for a count matrix.
corral
is a wrapper for corral_mat
and corral_sce
, and can be called on any of the acceptable input types.
corral_mat( inp, method = c("irl", "svd"), ncomp = 30, row.w = NULL, col.w = NULL, rtype = c("standardized", "indexed", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), ... ) corral_sce( inp, method = c("irl", "svd"), ncomp = 30, whichmat = "counts", fullout = FALSE, subset_row = NULL, ... ) corral(inp, ...) ## S3 method for class 'corral' print(x, ...)
corral_mat( inp, method = c("irl", "svd"), ncomp = 30, row.w = NULL, col.w = NULL, rtype = c("standardized", "indexed", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), ... ) corral_sce( inp, method = c("irl", "svd"), ncomp = 30, whichmat = "counts", fullout = FALSE, subset_row = NULL, ... ) corral(inp, ...) ## S3 method for class 'corral' print(x, ...)
inp |
matrix (any type), |
method |
character, the algorithm to be used for svd. Default is irl. Currently supports 'irl' for irlba::irlba or 'svd' for stats::svd |
ncomp |
numeric, number of components; Default is 30 |
row.w |
numeric vector; the row weights to use in chi-squared scaling. Defaults to 'NULL', in which case row weights are computed from the input matrix. |
col.w |
numeric vector; the column weights to use in chi-squared scaling. For instance, size factors could be given here. Defaults to 'NULL', in which case column weights are computed from the input matrix. |
rtype |
character indicating what type of residual should be computed; options are '"indexed"', '"standardized"' (or '"pearson"' is equivalent), '"freemantukey"', and '"hellinger"'; defaults to '"standardized"' for |
vst_mth |
character indicating whether a variance-stabilizing transform should be applied prior to calculating chi-squared residuals; defaults to '"none"' |
... |
(additional arguments for methods) |
whichmat |
character; defaults to |
fullout |
boolean; whether the function will return the full |
subset_row |
numeric, character, or boolean vector; the rows to include in corral, as indices (numeric), rownames (character), or with booleans (same length as the number of rows in the matrix). If this parameter is |
x |
(print method) corral object; the list output from |
When run on a matrix, a list with the correspondence analysis matrix decomposition result:
d
a vector of the diagonal singular values of the input mat
(from SVD output)
u
a matrix of with the left singular vectors of mat
in the columns (from SVD output)
v
a matrix of with the right singular vectors of mat
in the columns. When cells are in the columns, these are the cell embeddings. (from SVD output)
eigsum
sum of the eigenvalues for calculating percent variance explained
SCu and SCv
standard coordinates, left and right, respectively
PCu and PCv
principal coordinates, left and right, respectively
When run on a SingleCellExperiment
, returns a SCE with the embeddings (PCv from the full corral output) in the reducedDim
slot corral
(default). Also can return the same output as corral_mat
when fullout
is set to TRUE
.
For matrix and SummarizedExperiment
input, returns list with the correspondence analysis matrix decomposition result (u,v,d are the raw svd output; SCu and SCv are the standard coordinates; PCu and PCv are the principal coordinates)
For SummarizedExperiment
input, returns the same as for a matrix.
.
mat <- matrix(sample(0:10, 5000, replace=TRUE), ncol=50) result <- corral_mat(mat) result <- corral_mat(mat, method = 'irl', ncomp = 5) library(DuoClustering2018) sce <- sce_full_Zhengmix4eq()[1:100,1:100] result_1 <- corral_sce(sce) result_2 <- corral_sce(sce, method = 'svd') result_3 <- corral_sce(sce, method = 'irl', ncomp = 30, whichmat = 'logcounts') library(DuoClustering2018) sce <- sce_full_Zhengmix4eq()[1:100,1:100] corral_sce <- corral(sce,whichmat = 'counts') mat <- matrix(sample(0:10, 500, replace=TRUE), ncol=25) corral_mat <- corral(mat, ncomp=5) mat <- matrix(sample(1:100, 10000, replace = TRUE), ncol = 100) corral(mat)
mat <- matrix(sample(0:10, 5000, replace=TRUE), ncol=50) result <- corral_mat(mat) result <- corral_mat(mat, method = 'irl', ncomp = 5) library(DuoClustering2018) sce <- sce_full_Zhengmix4eq()[1:100,1:100] result_1 <- corral_sce(sce) result_2 <- corral_sce(sce, method = 'svd') result_3 <- corral_sce(sce, method = 'irl', ncomp = 30, whichmat = 'logcounts') library(DuoClustering2018) sce <- sce_full_Zhengmix4eq()[1:100,1:100] corral_sce <- corral(sce,whichmat = 'counts') mat <- matrix(sample(0:10, 500, replace=TRUE), ncol=25) corral_mat <- corral(mat, ncomp=5) mat <- matrix(sample(1:100, 10000, replace = TRUE), ncol = 100) corral(mat)
This function performs the row and column scaling pre-processing operations, prior to SVD, for the corral methods. See corral
for single matrix correspondence analysis and corralm
for multi-matrix correspondence analysis.
corral_preproc( inp, rtype = c("standardized", "indexed", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), powdef_alpha = NULL, row.w = NULL, col.w = NULL, smooth = FALSE, ... )
corral_preproc( inp, rtype = c("standardized", "indexed", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), powdef_alpha = NULL, row.w = NULL, col.w = NULL, smooth = FALSE, ... )
inp |
matrix, numeric, counts or logcounts; can be sparse Matrix or matrix |
rtype |
character indicating what type of residual should be computed; options are '"indexed"', '"standardized"' (or '"pearson"' is equivalent), '"freemantukey"', and '"hellinger"'; defaults to '"standardized"' for |
vst_mth |
character indicating whether a variance-stabilizing transform should be applied prior to calculating chi-squared residuals; defaults to '"none"' |
powdef_alpha |
numeric for the power that should be applied if using power deflation. Must be in (0,1), and if provided a number outside this range, will be ignored. Defaults to 'NULL' which does not perform this step. |
row.w |
numeric vector; Default is |
col.w |
numeric vector; Default is |
smooth |
logical; Whether or not to perform the additional smoothing step with 'trim_matdist'. Default is |
... |
(additional arguments for methods) |
matrix, processed for input to compsvd
to finish CA routine
mat <- matrix(sample(0:10, 500, replace=TRUE), ncol=25) mat_corral <- corral_preproc(mat) corral_output <- compsvd(mat_corral, ncomp = 5)
mat <- matrix(sample(0:10, 500, replace=TRUE), ncol=25) mat_corral <- corral_preproc(mat) corral_output <- compsvd(mat_corral, ncomp = 5)
This multi-table adaptation of correpondence analysis applies the same scaling technique and enables data alignment by finding a set of embeddings for each dataset within shared latent space.
corralm_matlist( matlist, method = c("irl", "svd"), ncomp = 30, rtype = c("indexed", "standardized", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), rw_contrib = NULL, ... ) corralm_sce( sce, splitby, method = c("irl", "svd"), ncomp = 30, whichmat = "counts", fullout = FALSE, rw_contrib = NULL, ... ) corralm(inp, whichmat = "counts", fullout = FALSE, ...) ## S3 method for class 'corralm' print(x, ...)
corralm_matlist( matlist, method = c("irl", "svd"), ncomp = 30, rtype = c("indexed", "standardized", "hellinger", "freemantukey", "pearson"), vst_mth = c("none", "sqrt", "freemantukey", "anscombe"), rw_contrib = NULL, ... ) corralm_sce( sce, splitby, method = c("irl", "svd"), ncomp = 30, whichmat = "counts", fullout = FALSE, rw_contrib = NULL, ... ) corralm(inp, whichmat = "counts", fullout = FALSE, ...) ## S3 method for class 'corralm' print(x, ...)
matlist |
(for |
method |
character, the algorithm to be used for svd. Default is irl. Currently supports 'irl' for irlba::irlba or 'svd' for stats::svd |
ncomp |
numeric, number of components; Default is 30 |
rtype |
character indicating what type of residual should be computed; options are '"indexed"', '"standardized"' (or '"pearson"' is equivalent), '"freemantukey"', and '"hellinger"'; defaults to '"standardized"' for |
vst_mth |
character indicating whether a variance-stabilizing transform should be applied prior to calculating chi-squared residuals; defaults to '"none"' |
rw_contrib |
numeric vector, same length as the matlist. Indicates the weight that each dataset should contribute to the row weights. When set to NULL the row weights are *not* combined and each matrix is scaled independently (i.e., using their observed row weights, respectively). When set to a vector of all the same values, this is equivalent to taking the mean. Another option is to the number of observations per matrix to create a weighted mean. Regardless of input scale, row weights for each table must sum to 1 and thus are scaled. When this option is specified (i.e., not 'NULL'), the 'rtype' argument will automatically be set to 'standardized', and whatever argument is given will be ignored. |
... |
(additional arguments for methods) |
sce |
(for |
splitby |
character; name of the attribute from |
whichmat |
char, when using SingleCellExperiment or other SummarizedExperiment, can be specified. default is 'counts'. |
fullout |
boolean; whether the function will return the full |
inp |
list of matrices (any type), a |
x |
(print method) corralm object; the list output from |
corralm
is a wrapper for corralm_matlist
and corralm_sce
, and can be called on any of the acceptable input types (see inp
below).
When run on a list of matrices, a list with the correspondence analysis matrix decomposition result, with indices corresponding to the concatenated matrices (in order of the list):
d
a vector of the diagonal singular values of the input mat
(from SVD output)
u
a matrix of with the left singular vectors of mat
in the columns (from SVD output)
v
a matrix of with the right singular vectors of mat
in the columns. When cells are in the columns, these are the cell embeddings. (from SVD output)
eigsum
sum of the eigenvalues for calculating percent variance explained
For SingleCellExperiment input, returns the SCE with embeddings in the reducedDim slot 'corralm'
For a list of SingleCellExperiment
s, returns a list of the SCEs with the embeddings in the respective reducedDim
slot 'corralm'
.
listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 25), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 25)) result <- corralm_matlist(listofmats) library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[1:100,sample(1:3500,100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) result <- corralm_sce(sce, splitby = 'Method') listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20)) corralm(listofmats) library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[seq(1,100,1),sample(seq(1,3500,1),100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) result <- corralm(sce, splitby = 'Method') # default print method for corralm objects
listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 25), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 25)) result <- corralm_matlist(listofmats) library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[1:100,sample(1:3500,100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) result <- corralm_sce(sce, splitby = 'Method') listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20)) corralm(listofmats) library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[seq(1,100,1),sample(seq(1,3500,1),100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) result <- corralm(sce, splitby = 'Method') # default print method for corralm objects
i.e., wasserstein distance with L1 (p_param = 1); can also use other penalties > 1 (Not technically earthmover distance if using other p_param values)
earthmover_dist(batch1, batch2, whichdim = 1, numbins = 100, p_param = 1)
earthmover_dist(batch1, batch2, whichdim = 1, numbins = 100, p_param = 1)
batch1 |
matrix; subset of observations from an embedding correponding to some attribute (e.g., batch or phenotype) |
batch2 |
matrix; subset of observations from an embedding correponding to some attribute (e.g., batch or phenotype) |
whichdim |
int; which dimension (i.e., column) from the embeddings is used. defaults on first |
numbins |
int; number of bins for the probability discretization (defaults to 100) |
p_param |
int; penalty parameter for general Wasserstein distance. Defaults to 1, which corresonds to earthmover. |
num; the distance
# To compare distributions of reduced dimension values to assess similarity, # e.g. as a metric for batch integration embedding <- matrix(sample(x = seq(0,10,.1),1000, replace = TRUE),ncol = 5) batch <- matrix(sample(c(1,2),200, replace = TRUE)) earthmover_dist(embedding[which(batch == 1),],embedding[which(batch == 2),])
# To compare distributions of reduced dimension values to assess similarity, # e.g. as a metric for batch integration embedding <- matrix(sample(x = seq(0,10,.1),1000, replace = TRUE),ncol = 5) batch <- matrix(sample(c(1,2),200, replace = TRUE)) earthmover_dist(embedding[which(batch == 1),],embedding[which(batch == 2),])
Compute percent of variance explained
get_pct_var_exp_svd(thissvd, preproc_mat = thissvd$d)
get_pct_var_exp_svd(thissvd, preproc_mat = thissvd$d)
thissvd |
list outputted from an svd function (svd, irlba; can also take output from |
preproc_mat |
matrix of pre-processed values (optional) - important to include if the svd is only partial as this is used to compute the sum of eigenvalues |
vector of percent variance explained values, indexed by PC
mat <- matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10) my_svd <- svd(mat) get_pct_var_exp_svd(my_svd) # this works if my_svd is a full svd my_irl <- irlba::irlba(mat,nv = 2) get_pct_var_exp_svd(my_irl, preproc_mat = mat) # ... otherwise use this
mat <- matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10) my_svd <- svd(mat) get_pct_var_exp_svd(my_svd) # this works if my_svd is a full svd my_irl <- irlba::irlba(mat,nv = 2) get_pct_var_exp_svd(my_irl, preproc_mat = mat) # ... otherwise use this
Computes row weights and column weights
get_weights(inp_mat)
get_weights(inp_mat)
inp_mat |
matrix for which weights should be calculated (sparse or full) |
list of 2 elements: 'row.w' and 'col.w' contain the row and column weights respectively
mat <- matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10) ws <- get_weights(mat)
mat <- matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10) ws <- get_weights(mat)
List to Matrix
list2mat(matlist, direction = c("c", "r")[1])
list2mat(matlist, direction = c("c", "r")[1])
matlist |
list of matrices to concatenate |
direction |
character, r or c, to indicate whether should be row-wise (i.e., rbind to match on columns) or column-wise (i.e., cbind to match on rows). Defaults to columnwise (matching on rows) to match convention of SingleCellExperiments |
matrix
listofmats <- list(matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 10)) newmat <- list2mat(listofmats) # to "cbind" them listofmats_t <- lapply(listofmats,t) newmat_t <- list2mat(listofmats_t, 'r') # to "rbind" them
listofmats <- list(matrix(sample(seq(0,20,1),100,replace = TRUE),nrow = 10), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 10)) newmat <- list2mat(listofmats) # to "cbind" them listofmats_t <- lapply(listofmats,t) newmat_t <- list2mat(listofmats_t, 'r') # to "rbind" them
Set na to 0
na2zero(x)
na2zero(x)
x |
matrix of values for which na values should be changed to 0 |
matrix, where na values are set to 0
x <- matrix(sample(0:10, 5000, replace = TRUE), ncol = 25) x[sample(1:5000, 10)] <- NA na2zero(x)
x <- matrix(sample(0:10, 5000, replace = TRUE), ncol = 25) x[sample(1:5000, 10)] <- NA na2zero(x)
Pairwise rv coefficient
pairwise_rv(matlist)
pairwise_rv(matlist)
matlist |
list of matrices (or matrix-like; see |
matrix of the pairwise coefficients
a <- matrix(sample(1:10,100,TRUE), nrow = 10) b <- matrix(sample(1:10,50,TRUE), nrow = 5) c <- matrix(sample(1:10,20,TRUE), nrow = 2) matlist <- list(a,b,c) pairwise_rv(matlist) pairwise_rv(lapply(matlist, t))
a <- matrix(sample(1:10,100,TRUE), nrow = 10) b <- matrix(sample(1:10,50,TRUE), nrow = 5) c <- matrix(sample(1:10,20,TRUE), nrow = 2) matlist <- list(a,b,c) pairwise_rv(matlist) pairwise_rv(lapply(matlist, t))
Plot selected PCs from an embedding
plot_embedding( embedding, xpc = 1, ypc = xpc + 1, plot_title = paste0("Dim", xpc, " by Dim", ypc), color_vec = NULL, color_title = NULL, ellipse_vec = NULL, facet_vec = NULL, ptsize = 0.8, saveplot = FALSE, plotfn = paste(plot_title, xpc, sep = "_"), showplot = TRUE, returngg = FALSE, color_pal_vec = NULL, dimname = "Dim" )
plot_embedding( embedding, xpc = 1, ypc = xpc + 1, plot_title = paste0("Dim", xpc, " by Dim", ypc), color_vec = NULL, color_title = NULL, ellipse_vec = NULL, facet_vec = NULL, ptsize = 0.8, saveplot = FALSE, plotfn = paste(plot_title, xpc, sep = "_"), showplot = TRUE, returngg = FALSE, color_pal_vec = NULL, dimname = "Dim" )
embedding |
matrix or other tabular format where columns correspond to PCs and rows correspond to cells (entries). |
xpc |
int; which PC to put on the x-axis (defaults to 1) |
ypc |
int; which PC to put on the y-axis (defaults to the one after |
plot_title |
char; title of plot (defaults to titling based on |
color_vec |
vector; length should correspond to the number of rows in embedding, and each element of the vector classifies that cell (entry) in the embedding to that particular class, which will be colored the same. (e.g., this could be indicating which batch each cell is from) |
color_title |
char; what attribute the colors represent |
ellipse_vec |
vector; length should correspond to the number of rows in embedding, and each element of the vector classifies that cell (entry) in the embedding to that particular class, and elements of the same class will be circled in an ellipse. (e.g., this could be indicating the cell type or cell line; works best for attributes intended to be compact) |
facet_vec |
vector; length should correspond to the number of rows in embedding, and each element of the vector classifies that cell (entry) in the embedding to that particular class. Plot will be faceted by this attribute. |
ptsize |
numeric; the size of the points as passed to |
saveplot |
boolean; whether or not to save the plot, defaults |
plotfn |
char; what the filename is to be called. (defaults to making a name based on |
showplot |
boolean; whether or not to show the plot, defaults |
returngg |
boolean; whether or not to return a |
color_pal_vec |
char; hex codes for the color palette to be used. Default is to use the ggthemes few for plots with less than 9 colors, and to use/"stretch" pals polychrome if more colors are needed. |
dimname |
char; the name of the dimensions. defaults to "Dim" |
default none; options to display plot (showplot
), save plot (saveplot
), and/or return ggplot2
object (returngg
)
listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20)) corralm_obj <- corralm(listofmats, ncomp = 5) embed_mat <- corralm_obj$v cell_type_vec <- sample(c('type1','type2','type3'),100,replace = TRUE) plot_embedding(embedding = embed_mat, xpc = 1, plot_title = 'corralm plot', color_vec = cell_type_vec, color_title = 'cell type', saveplot = FALSE) # or, call directly on the corralm object plot_embedding(corralm_obj)
listofmats <- list(matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20), matrix(sample(seq(0,20,1),1000,replace = TRUE),nrow = 20)) corralm_obj <- corralm(listofmats, ncomp = 5) embed_mat <- corralm_obj$v cell_type_vec <- sample(c('type1','type2','type3'),100,replace = TRUE) plot_embedding(embedding = embed_mat, xpc = 1, plot_title = 'corralm plot', color_vec = cell_type_vec, color_title = 'cell type', saveplot = FALSE) # or, call directly on the corralm object plot_embedding(corralm_obj)
Plot selected PCs from an embedding saved in a SingleCellExperiment object
plot_embedding_sce( sce, which_embedding, color_attr = NULL, color_title = color_attr, ellipse_attr = NULL, facet_attr = NULL, ... )
plot_embedding_sce( sce, which_embedding, color_attr = NULL, color_title = color_attr, ellipse_attr = NULL, facet_attr = NULL, ... )
sce |
|
which_embedding |
character; for the embedding to plot |
color_attr |
character; name of the attribute within |
color_title |
character; title to use for colors legend, defaults to the same as |
ellipse_attr |
character; name of the attribute within |
facet_attr |
character; name of the attribute within |
... |
additional optional arguments - see |
default none; options to display plot (showplot
), save plot (saveplot
), and/or return ggplot2
object (returngg
)
library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[1:100,sample(1:3500,100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) sce <- corralm(sce, splitby = 'Method') # to plot and show only plot_embedding_sce(sce = sce, which_embedding = 'corralm', xpc = 1, plot_title = 'corralm: PC1 by PC2', color_attr = "Method", ellipse_attr = 'phenoid', saveplot = FALSE) # to return ggplot2 object and display, but not save corralm_ggplot <- plot_embedding_sce(sce = sce, which_embedding = 'corralm', xpc = 1, plot_title = 'corralm: PC1 by PC2', color_attr = 'Method', ellipse_attr = 'phenoid', returngg = TRUE, saveplot = FALSE)
library(DuoClustering2018) library(SingleCellExperiment) sce <- sce_full_Zhengmix4eq()[1:100,sample(1:3500,100,replace = FALSE)] colData(sce)$Method <- matrix(sample(c('Method1','Method2'),100,replace = TRUE)) sce <- corralm(sce, splitby = 'Method') # to plot and show only plot_embedding_sce(sce = sce, which_embedding = 'corralm', xpc = 1, plot_title = 'corralm: PC1 by PC2', color_attr = "Method", ellipse_attr = 'phenoid', saveplot = FALSE) # to return ggplot2 object and display, but not save corralm_ggplot <- plot_embedding_sce(sce = sce, which_embedding = 'corralm', xpc = 1, plot_title = 'corralm: PC1 by PC2', color_attr = 'Method', ellipse_attr = 'phenoid', returngg = TRUE, saveplot = FALSE)
rv coefficient
rv(mat1, mat2)
rv(mat1, mat2)
mat1 |
matrix (or matrix-like, e.g., df); either columns or rows should be matched with |
mat2 |
matrix (or matrix-like, e.g., df); either columns or rows should be matched with |
numeric; RV coefficient between the matched matrices
a <- matrix(sample(1:10,100, TRUE), nrow = 10) b <- matrix(sample(1:10,50, TRUE), nrow = 5) rv(a, b) # matched by columns rv(t(a), t(b)) # matched by rows
a <- matrix(sample(1:10,100, TRUE), nrow = 10) b <- matrix(sample(1:10,50, TRUE), nrow = 5) rv(a, b) # matched by columns rv(t(a), t(b)) # matched by rows
Generate a scaled variance plot for an integrative embedding
scal_var( inp, batchvec = NULL, pcs = seq(3), returngg = FALSE, showplot = TRUE, plot_subtitle = NULL )
scal_var( inp, batchvec = NULL, pcs = seq(3), returngg = FALSE, showplot = TRUE, plot_subtitle = NULL )
inp |
corralm object or matrix; embedding to compute scaled variances |
batchvec |
vector; batch labels (can be numeric or char). Defaults to 'NULL', which is appropriate for using a corralm object. If using an embedding matrix for inp, then this argument must be given and length must correspond to number of rows in 'inp'. |
pcs |
numeric; vector of which PCs should be shown. Defaults to 1:3 |
returngg |
boolean; whether or not to return a |
showplot |
boolean; whether or not to show the plot, defaults |
plot_subtitle |
string; the text that should show in the subtitle for the plot. defaults to NULL |
N/A or a ggplot object
dat <- matrix(rnorm(10000), ncol = 50) bv <- rep(seq(4),c(10,30,60,100)) scal_var(dat,bv, pcs = seq(4))
dat <- matrix(rnorm(10000), ncol = 50) bv <- rep(seq(4),c(10,30,60,100)) scal_var(dat,bv, pcs = seq(4))
Generate a matrix of the scaled variance values
scal_var_mat(inp, batchvec = NULL)
scal_var_mat(inp, batchvec = NULL)
inp |
corralm object or matrix; embedding to compute scaled variances |
batchvec |
vector; batch labels (can be numeric or char). Defaults to 'NULL', which is appropriate for using a corralm object. If using an embedding matrix for inp, then this argument must be given and length must correspond to number of rows in 'inp'. |
matrix of the scaled variance values by PC (batches in rows; PCs in columns)
dat <- matrix(rnorm(5000), ncol = 50) bv <- rep(seq(3),c(10,30,60)) scal_var_mat(dat, bv)
dat <- matrix(rnorm(5000), ncol = 50) bv <- rep(seq(3),c(10,30,60)) scal_var_mat(dat, bv)
SingleCellExperiment to list of matrices
sce2matlist(sce, splitby, to_include = NULL, whichmat = "counts")
sce2matlist(sce, splitby, to_include = NULL, whichmat = "counts")
sce |
SingleCellExperiment that is to be separated into list of count matrices |
splitby |
character; name of the attribute from colData that should be used to separate the SCE |
to_include |
(optional) character vector; determines which values from the "splitby" column will be included in the outputted matlist. NULL is the default, and will result in selecting all elements |
whichmat |
character; defaults to |
list of matrices
library(DuoClustering2018) sce <- sce_full_Zhengmix4eq() matlist <- sce2matlist(sce = sce, splitby = 'phenoid', whichmat = 'logcounts')
library(DuoClustering2018) sce <- sce_full_Zhengmix4eq() matlist <- sce2matlist(sce = sce, splitby = 'phenoid', whichmat = 'logcounts')
Smooths the extreme values in a chi-square-transformed matrix to lessen the influence of "rare objects."
trim_matdist(mat, pct_trim = 0.01)
trim_matdist(mat, pct_trim = 0.01)
mat |
matrix; should be pre-processed/normalized to some sort of approximately normally distributed statistic (e.g., chi-squared transformation with 'corral_preproc' or Z-score normalization) |
pct_trim |
numeric; the percent of observations to smooth. Defaults to 'pct_trim' = .01, which corresponds to smoothing all observations to be between the .5 percentile and 99.5 percentile range of the input matrix |
(Usually not called directly; can be included by using the 'smooth' argument in the 'corral', 'corralm', and 'corral_preproc' functions)
smoothed matrix
count_mat <- matrix(rpois(10000, 300)*rbinom(10000,1,.1), ncol = 100) smoothed_preproc_mat <- corral_preproc(count_mat, smooth = TRUE)
count_mat <- matrix(rpois(10000, 300)*rbinom(10000,1,.1), ncol = 100) smoothed_preproc_mat <- corral_preproc(count_mat, smooth = TRUE)
Prior to running CA, there is an option to apply a variance stabilizing transformation. This function can be called explicitly or used with the 'vst_mth' argument in corral
and corral_preproc
.
var_stabilize(inp, transform = c("sqrt", "freemantukey", "anscombe"))
var_stabilize(inp, transform = c("sqrt", "freemantukey", "anscombe"))
inp |
matrix, numeric, counts or logcounts; can be sparse Matrix or matrix |
transform |
character indicating which method should be applied. Defaults to the square root transform ('"sqrt"'). Other options include '"freemantukey"' and '"anscombe"'. |
variance-stabilized matrix; sparse if possible
x <- as.matrix(rpois(100, lambda = 50), ncol = 10) vst_x <- var_stabilize(x)
x <- as.matrix(rpois(100, lambda = 50), ncol = 10) vst_x <- var_stabilize(x)