Title: | Association Plots |
---|---|
Description: | APL is a package developed for computation of Association Plots (AP), a method for visualization and analysis of single cell transcriptomics data. The main focus of APL is the identification of genes characteristic for individual clusters of cells from input data. The package performs correspondence analysis (CA) and allows to identify cluster-specific genes using Association Plots. Additionally, APL computes the cluster-specificity scores for all genes which allows to rank the genes by their specificity for a selected cell cluster of interest. |
Authors: | Clemens Kohl [cre, aut], Elzbieta Gralinska [aut], Martin Vingron [aut] |
Maintainer: | Clemens Kohl <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.11.2 |
Built: | 2024-12-03 06:18:27 UTC |
Source: | https://github.com/bioc/APL |
Plot an Association Plot for the chosen columns.
apl( caobj, type = "ggplot", rows_idx = NULL, cols_idx = caobj@group, row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow" )
apl( caobj, type = "ggplot", rows_idx = NULL, cols_idx = caobj@group, row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow" )
caobj |
An object of class "cacomp" and "APL" with apl coordinates calculated. |
type |
"ggplot"/"plotly". For a static plot a string "ggplot", for an interactive plot "plotly". Default "ggplot". |
rows_idx |
numeric/character vector. Indices or names of the rows that should be labelled. Default NULL. |
cols_idx |
numeric/character vector. Indices or names of the columns that should be labelled. Default is only to label columns making up the centroid: caobj@group. |
row_labs |
Logical. Whether labels for rows indicated by rows_idx should be labeled with text. Default TRUE. |
col_labs |
Logical. Whether labels for columns indicated by cols_idx shouls be labeled with text. Default FALSE. |
show_score |
Logical. Whether the S-alpha score should be shown in the plot. |
show_cols |
Logical. Whether column points should be plotted. |
show_rows |
Logical. Whether row points should be plotted. |
score_cutoff |
Numeric. Rows (genes) with a score >= score_cutoff will be colored according to their score if show_score = TRUE. |
score_color |
Either "rainbow" or "viridis". |
For an interactive plot type="plotly" can be chosen, otherwise a static plot will be returned. The row and column coordinates have to be already calculated by 'apl_coords()'.
Either a ggplot or plotly object.
Association Plots: Visualizing associations in high-dimensional
correspondence analysis biplots
Elzbieta Gralinska, Martin Vingron
bioRxiv 2020.10.23.352096; doi: https://doi.org/10.1101/2020.10.23.352096
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) # Calculate APL coordinates for arbitrary group ca <- apl_coords(ca, group = 1:10) # plot results # Note: # Due to random gene expression & group, no highly # associated genes are visible. apl(ca, type = "ggplot")
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) # Calculate APL coordinates for arbitrary group ca <- apl_coords(ca, group = 1:10) # plot results # Note: # Due to random gene expression & group, no highly # associated genes are visible. apl(ca, type = "ggplot")
Calculates the Association Plot coordinates for either the rows, columns or both (default).
apl_coords(caobj, group, calc_rows = TRUE, calc_cols = TRUE)
apl_coords(caobj, group, calc_rows = TRUE, calc_cols = TRUE)
caobj |
A "cacomp" object with principal row coordinates and standardized column coordinates calculated. |
group |
Numeric/Character. Vector of indices or column names of the columns to calculate centroid/x-axis direction. |
calc_rows |
TRUE/FALSE. Whether apl row coordinates should be calculated. Default TRUE. |
calc_cols |
TRUE/FALSE. Whether apl column coordinates should be calculated. Default TRUE. |
Coordinates (x,y) of row vector are defined as
The x-direction is determined by calculating the centroid of the columns selected with the indices in "group".
Returns input "cacomp" object and adds components "apl_rows" and/or "apl_cols" for row and column coordinates. In "group" the indices of the columns used to calculate the centroid are saved.
Association Plots: Visualizing associations in high-dimensional correspondence analysis biplots Elzbieta Gralinska, Martin Vingron bioRxiv 2020.10.23.352096; doi: https://doi.org/10.1101/2020.10.23.352096
set.seed(1234) # Simulate scRNAseq data cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20), cell_4 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3, dims = 3) # Calculate APL coordinates ca <- apl_coords(ca, group = 3:4)
set.seed(1234) # Simulate scRNAseq data cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20), cell_4 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3, dims = 3) # Calculate APL coordinates ca <- apl_coords(ca, group = 3:4)
Uses ggplot to plot an Association Plot
apl_ggplot( rows, rows_group = NULL, cols, cols_group = NULL, rows_scored = NULL, rows_color = "#0066FF", rows_high_color = "#FF0000", cols_color = "#601A4A", cols_high_color = "#EE442F", score_color = "rainbow", row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE )
apl_ggplot( rows, rows_group = NULL, cols, cols_group = NULL, rows_scored = NULL, rows_color = "#0066FF", rows_high_color = "#FF0000", cols_color = "#601A4A", cols_high_color = "#EE442F", score_color = "rainbow", row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE )
rows |
Row APL-coordinates |
rows_group |
Row AP-coordinates to highlight |
cols |
Column AP-coordinates |
cols_group |
Column AP-coordinates for the group to be highlighted. |
rows_scored |
Row AP-coordinates of rows above a score cutoff. |
rows_color |
Color for rows |
rows_high_color |
Color for rows to be highlighted. |
cols_color |
Column points color. |
cols_high_color |
Color for column points to be highlighted.. |
score_color |
Color scheme for row points with a score. |
row_labs |
Logical. Whether labels for rows indicated by rows_idx should be labeled with text. Default TRUE. |
col_labs |
Logical. Whether labels for columns indicated by cols_idx shouls be labeled with text. Default FALSE. |
show_score |
Logical. Whether the S-alpha score should be shown in the plot. |
show_cols |
Logical. Whether column points should be plotted. |
show_rows |
Logical. Whether row points should be plotted. |
ggplot Association Plot
Uses plotly to generate an interactive Association Plot
apl_plotly( rows, rows_group = NULL, cols, cols_group, rows_scored = NULL, rows_color = "#0066FF", rows_high_color = "#FF0000", cols_color = "#601A4A", cols_high_color = "#EE442F", score_color = "rainbow", row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE )
apl_plotly( rows, rows_group = NULL, cols, cols_group, rows_scored = NULL, rows_color = "#0066FF", rows_high_color = "#FF0000", cols_color = "#601A4A", cols_high_color = "#EE442F", score_color = "rainbow", row_labs = FALSE, col_labs = FALSE, show_score = FALSE, show_cols = FALSE, show_rows = TRUE )
rows |
Row APL-coordinates |
rows_group |
Row AP-coordinates to highlight |
cols |
Column AP-coordinates |
cols_group |
Column AP-coordinates for the group to be highlighted. |
rows_scored |
Row AP-coordinates of rows above a score cutoff. |
rows_color |
Color for rows |
rows_high_color |
Color for rows to be highlighted. |
cols_color |
Column points color. |
cols_high_color |
Color for column points to be highlighted. |
score_color |
Color scheme for row points with a score. |
row_labs |
Logical. Whether labels for rows indicated by rows_idx should be labeled with text. Default TRUE. |
col_labs |
Logical. Whether labels for columns indicated by cols_idx shouls be labeled with text. Default FALSE. |
show_score |
Logical. Whether the S-alpha score should be shown in the plot. |
show_cols |
Logical. Whether column points should be plotted. |
show_rows |
Logical. Whether row points should be plotted. |
Interactive plotly Association Plot
Ranks rows by a calculated score which balances the association of the row with the condition and how associated it is with other conditions.
apl_score( caobj, mat = NULL, dims = caobj@dims, group = caobj@group, reps = 10, quant = 0.99, python = FALSE, store_perm = TRUE, method = "permutation" )
apl_score( caobj, mat = NULL, dims = caobj@dims, group = caobj@group, reps = 10, quant = 0.99, python = FALSE, store_perm = TRUE, method = "permutation" )
caobj |
A "cacomp" object with principal row coordinates and standardized column coordinates calculated. |
mat |
A numeric matrix. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
dims |
Integer. Number of CA dimensions to retain. Needs to be the same as in caobj! |
group |
Vector of indices of the columns to calculate centroid/x-axis direction. |
reps |
Integer. Number of permutations to perform. |
quant |
Numeric. Single number between 0 and 1 indicating the quantile used to calculate the cutoff. Default 0.99. |
python |
DEPRACTED. A logical value indicating whether to use singular-value decomposition from the python package torch. |
store_perm |
Logical. Whether permuted data should be stored in the CA object. This implementation dramatically speeds up computation compared to 'svd()' in R. |
method |
Method to calculate the cutoff. Either "random" for random direction method or "permutation" for the permutation method. |
The score is calculated by permuting the values of each row to determine the cutoff angle of the 99
By default the permutation is repeated 10 times (for random direction min. 300 repetition is recommended!), but for very large matrices this can be reduced. The method "permutation" permutes the columns in each row and calculates AP-coordinates for each such permutation. The cutoff is then taken by the quantile specified by "quan". The "random" method in contrast calculates AP-coordinates for the original data, but by looking into random directions.
If store_perm is TRUE the permuted data is stored in the cacomp object and can be used for future scoring.
Returns the input "cacomp" object with "APL_score" component added. APL_score contains a data frame with ranked rows, their score and their original row number.
Association Plots: Visualizing associations in high-dimensional
correspondence analysis biplots
Elzbieta Gralinska, Martin Vingron
bioRxiv 2020.10.23.352096; doi: https://doi.org/10.1101/2020.10.23.352096
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3) # Calculate APL coordinates: ca <- apl_coords(ca, group = 1:10) # Rank genes by S-alpha score ca <- apl_score(ca, mat = cnts)
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3) # Calculate APL coordinates: ca <- apl_coords(ca, group = 1:10) # Rank genes by S-alpha score ca <- apl_score(ca, mat = cnts)
This function uses the Kolmogorov-Smirnov test as implemented by the package topGO to test for overrepresentation in Gene Ontology gene sets.
apl_topGO( caobj, ontology, organism = "hs", ngenes = 1000, score_cutoff = 0, use_coords = FALSE, return_plot = FALSE, top_res = 15 )
apl_topGO( caobj, ontology, organism = "hs", ngenes = 1000, score_cutoff = 0, use_coords = FALSE, return_plot = FALSE, top_res = 15 )
caobj |
A "cacomp" object with principal row coordinates and standardized column coordinates calculated. |
ontology |
Character string. Chooses GO sets for 'BP' (biological processes), 'CC' (cell compartment) or 'MF' (molecular function). |
organism |
Character string. Either 'hs' (homo sapiens), 'mm' (mus musculus) or the name of the organism package such as 'org.*.eg.db'. |
ngenes |
Numeric. Number of top ranked genes to test for overrepresentation. |
score_cutoff |
numeric. S-alpha score cutoff. Only genes with a score larger will be tested. |
use_coords |
Logical. Whether the x-coordinates of the row APL coordinates should be used for ranking. Only recommended when no S-alpha score (see apl_score()) can be calculated. |
return_plot |
Logical. Whether a plot of significant gene sets should be additionally returned. |
top_res |
Numeric. Number of top scoring genes to plot. |
For a chosen group of cells/samples, the top 'ngenes' group specific genes are used for gene overrepresentation analysis. The genes are ranked either by the precomputed APL score, or, if not available by their APL x-coordinates.
A data.frame containing the gene sets with the highest overrepresentation.
Adrian Alexa and Jorg Rahnenfuhrer
topGO: Enrichment Analysis for Gene Ontology.
R package version 2.42.0.
library(SeuratObject) set.seed(1234) cnts <- SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "counts") cnts <- as.matrix(cnts) # Run CA on example from Seurat ca <- cacomp(pbmc_small, princ_coords = 3, return_input = FALSE, assay = "RNA", slot = "counts") grp <- which(Idents(pbmc_small) == 2) ca <- apl_coords(ca, group = grp) ca <- apl_score(ca, mat = cnts) enr <- apl_topGO(ca, ontology = "BP", organism = "hs") plot_enrichment(enr)
library(SeuratObject) set.seed(1234) cnts <- SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "counts") cnts <- as.matrix(cnts) # Run CA on example from Seurat ca <- cacomp(pbmc_small, princ_coords = 3, return_input = FALSE, assay = "RNA", slot = "counts") grp <- which(Idents(pbmc_small) == 2) ca <- apl_coords(ca, group = grp) ca <- apl_score(ca, mat = cnts) enr <- apl_topGO(ca, ontology = "BP", organism = "hs") plot_enrichment(enr)
Converts the values stored in the Seurat/SingleCellExperiment dimensional reduction slot "CA" to a cacomp object. If recompute = TRUE additional parameters are recomputed from the saved values without rerunning SVD (need to specify assay to work).
as.cacomp.cacomp returns input without any calculations.
Recomputes missing values and returns cacomp object from a list. If you have a *complete* cacomp object in list form, use do.call(new_cacomp, obj).
as.cacomp.Seurat: Converts the values stored in the Seurat DimReduc slot "CA" to an cacomp object.
as.cacomp.SingleCellExperiment: Converts the values stored in the SingleCellExperiment reducedDim slot "CA" to a cacomp object.
as.cacomp(obj, ...) ## S4 method for signature 'cacomp' as.cacomp(obj, ...) ## S4 method for signature 'list' as.cacomp(obj, ..., mat = NULL) ## S4 method for signature 'Seurat' as.cacomp(obj, ..., assay = "RNA", slot = "counts") ## S4 method for signature 'SingleCellExperiment' as.cacomp(obj, ..., assay = "counts")
as.cacomp(obj, ...) ## S4 method for signature 'cacomp' as.cacomp(obj, ...) ## S4 method for signature 'list' as.cacomp(obj, ..., mat = NULL) ## S4 method for signature 'Seurat' as.cacomp(obj, ..., assay = "RNA", slot = "counts") ## S4 method for signature 'SingleCellExperiment' as.cacomp(obj, ..., assay = "counts")
obj |
An object of class "Seurat" or "SingleCellExperiment" with a dim. reduction named "CA" saved. For obj "cacomp" input is returned. |
... |
Further arguments. |
mat |
Original input matrix. |
assay |
Character. The assay from which extract the count matrix, e.g. "RNA" for Seurat objects or "counts"/"logcounts" for SingleCellExperiments. |
slot |
character. Slot of the Seurat assay to use. Default "counts". |
By default extracts std_coords_cols, D, prin_coords_rows, top_rows and dims from obj and outputs a cacomp object. If recompute = TRUE the following are additionally recalculated (doesn't run SVD): U, V, std_coords_rows, row_masses, col_masses.
A cacomp object.
######### # lists # ######### # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_list <- as.list(ca) # Only keep subset of elements for demonstration ca_list <- ca_list[c("U", "std_coords_rows", "std_coords_cols", "params")] # convert (incomplete) list to cacomp object. ca <- as.cacomp(ca_list, mat = cnts) ########## # Seurat # ########## library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) seu <- CreateSeuratObject(counts = cnts) seu <- cacomp(seu, return_input = TRUE) ca <- as.cacomp(seu, assay = "RNA", slot = "counts") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) sce <- SingleCellExperiment(assays=list(counts=cnts)) sce <- cacomp(sce, return_input = TRUE) ca <- as.cacomp(sce, assay = "counts")
######### # lists # ######### # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_list <- as.list(ca) # Only keep subset of elements for demonstration ca_list <- ca_list[c("U", "std_coords_rows", "std_coords_cols", "params")] # convert (incomplete) list to cacomp object. ca <- as.cacomp(ca_list, mat = cnts) ########## # Seurat # ########## library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) seu <- CreateSeuratObject(counts = cnts) seu <- cacomp(seu, return_input = TRUE) ca <- as.cacomp(seu, assay = "RNA", slot = "counts") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) sce <- SingleCellExperiment(assays=list(counts=cnts)) sce <- cacomp(sce, return_input = TRUE) ca <- as.cacomp(sce, assay = "counts")
Convert cacomp object to list.
## S4 method for signature 'cacomp' as.list(x)
## S4 method for signature 'cacomp' as.list(x)
x |
A cacomp object. |
A cacomp object.
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_list <- as.list(ca)
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_list <- as.list(ca)
Plots the first 3 dimensions of the rows and columns in the same plot.
ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ... ) ## S4 method for signature 'cacomp' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ... ) ## S4 method for signature 'Seurat' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ..., assay = "counts" )
ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ... ) ## S4 method for signature 'cacomp' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ... ) ## S4 method for signature 'Seurat' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' ca_3Dplot( obj, xdim = 1, ydim = 2, zdim = 3, princ_coords = 1, row_labels = NULL, col_labels = NULL, ..., assay = "counts" )
obj |
An object of class "cacomp", or alternatively an object of class "Seurat" or "SingleCellExperiment" with a dim. reduction named "CA" saved. |
xdim |
Integer. The dimension for the x-axis. Default 1. |
ydim |
Integer. The dimension for the y-axis. Default 2. |
zdim |
Integer. The dimension for the z-axis. Default 3. |
princ_coords |
Integer. If 1 then principal coordinates are used for the rows, if 2 for the columns. Default 1 (rows). |
row_labels |
Numeric vector. Indices for the rows for which a label should be added (label should be stored in rownames). Default NULL. |
col_labels |
Numeric vector. Indices for the columns for which a label should be added (label should be stored in colnames). Default NULL (no columns). |
... |
Further arguments. |
assay |
SingleCellExperiment assay to obtain counts from. |
slot |
Seurat slot from assay to get count matrix from. |
Depending on whether 'princ_coords' is set to 1 or 2 either the principal coordinates of either the rows (1) or the columns (2) are chosen. For the other the standardized coordinates are plotted (assymetric biplot). Labels for rows and columns should be stored in the row- and column names respectively.
Plot of class "plotly".
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_3Dplot(ca)
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_3Dplot(ca)
Plots the first 2 dimensions of the rows and columns in the same plot.
ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ... ) ## S4 method for signature 'cacomp' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ... ) ## S4 method for signature 'Seurat' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ..., assay = "counts" )
ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ... ) ## S4 method for signature 'cacomp' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ... ) ## S4 method for signature 'Seurat' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' ca_biplot( obj, xdim = 1, ydim = 2, princ_coords = 1, row_labels = NULL, col_labels = NULL, type = "ggplot", col_metadata = NULL, row_metadata = NULL, show_all = TRUE, ..., assay = "counts" )
obj |
An object of class "cacomp" with the relevant standardized and principal coordinates calculated, or alternatively an object of class "Seurat" or "SingleCellExperiment" with a dim. reduction named "CA" saved. |
xdim |
Integer. The dimension for the x-axis. Default 1. |
ydim |
Integer. The dimension for the y-axis. Default 2. |
princ_coords |
Integer. If 1 then principal coordinates are used for the rows, if 2 for the columns. Default 1 (rows). |
row_labels |
Numeric vector. Indices for the rows for which a label should be added (label should be stored in rownames). Default NULL. |
col_labels |
Numeric vector. Indices for the columns for which a label should be added (label should be stored in colnames). Default NULL (no columns). |
type |
String. Type of plot to draw. Either "ggplot" or "plotly". Default "ggplot". |
col_metadata |
named vector of additional metadata to color points. The names of the elements in col_metadata should correspond to the column names in 'obj'. If NULL columns will be in a single color. Can also specify a metadata column for Seurat/SingleCellExperiment objects. |
row_metadata |
named vector of additional metadata to color points. The names of the elements in row_metadata should correspond to the row names in 'obj'. If NULL rows will be in a single color. Can also specify a metadata column for Seurat/SingleCellExperiment objects. |
show_all |
logical. If FALSE cells/genes that are not in col_metadata/ row_metadata are not plotted. If *_metadata is NULL, the cell or genes respectively will still be plotted. |
... |
Further arguments. |
assay |
SingleCellExperiment assay for recomputation |
slot |
Seurat assay slot from which to get matrix. |
Choosing type "plotly" will generate an interactive html plot with the package plotly. Type "ggplot" generates a static plot. Depending on whether 'princ_coords' is set to 1 or 2 either the principal coordinates of either the rows (1) or the columns (2) are chosen. For the other the standard coordinates are plotted (assymetric biplot). Labels for rows and columns should be stored in the row and column names respectively.
Plot of class "plotly" or "ggplot".
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_biplot(ca)
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis ca <- cacomp(obj = cnts, princ_coords = 3) ca_biplot(ca)
'ca_coords' calculates the standardized and principal coordinates of the rows and columns in CA space.
ca_coords(caobj, dims = NULL, princ_coords = 3, princ_only = FALSE)
ca_coords(caobj, dims = NULL, princ_coords = 3, princ_only = FALSE)
caobj |
A "cacomp" object as outputted from 'cacomp()'. |
dims |
Integer indicating the number of dimensions to use for the calculation of coordinates. All elements of caobj (where applicable) will be reduced to the given number of dimensions. Default NULL (keeps all dimensions). |
princ_coords |
Integer. Number indicating whether principal coordinates should be calculated for the rows (=1), columns (=2), both (=3) or none (=0). Default 3. |
princ_only |
Logical, whether only principal coordinates should be calculated. Or, in other words, whether the standardized coordinates are already calculated and stored in 'caobj'. Default 'FALSE'. |
Takes a "cacomp" object and calculates standardized and principal coordinates for the visualization of CA results in a biplot or to subsequently calculate coordinates in an Association Plot.
Returns input object with coordinates added. std_coords_rows/std_coords_cols: Standardized coordinates of rows/columns. prin_coords_rows/prin_coords_cols: Principal coordinates of rows/columns.
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 1) ca <- ca_coords(ca, princ_coords = 3)
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 1) ca <- ca_coords(ca, princ_coords = 3)
'cacomp' performs correspondence analysis on a matrix or Seurat/SingleCellExperiment object and returns the transformed data.
'cacomp.seurat' performs correspondence analysis on an assay from a Seurat container and stores the standardized coordinates of the columns (= cells) and the principal coordinates of the rows (= genes) as a DimReduc Object in the Seurat container.
'cacomp.SingleCellExperiment' performs correspondence analysis on an assay from a SingleCellExperiment and stores the standardized coordinates of the columns (= cells) and the principal coordinates of the rows (= genes) as a matrix in the SingleCellExperiment container.
cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'matrix' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'dgCMatrix' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'Seurat' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts", return_input = FALSE ) ## S4 method for signature 'SingleCellExperiment' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ..., assay = "counts", return_input = FALSE )
cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'matrix' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'dgCMatrix' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... ) ## S4 method for signature 'Seurat' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts", return_input = FALSE ) ## S4 method for signature 'SingleCellExperiment' cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = NULL, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ..., assay = "counts", return_input = FALSE )
obj |
A numeric matrix or Seurat/SingleCellExperiment object. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
coords |
Logical. Indicates whether CA standard coordinates should be calculated. |
princ_coords |
Integer. Number indicating whether principal coordinates should be calculated for the rows (=1), columns (=2), both (=3) or none (=0). |
python |
DEPRACTED. A logical value indicating whether to use singular-value decomposition from the python package torch. This implementation dramatically speeds up computation compared to 'svd()' in R when calculating the full SVD. This parameter only works when dims==NULL or dims==rank(mat), where caculating a full SVD is demanded. |
dims |
Integer. Number of CA dimensions to retain. If NULL: (0.2 * min(nrow(A), ncol(A)) - 1 ). |
top |
Integer. Number of most variable rows to retain. Set NULL to keep all. |
inertia |
Logical. Whether total, row and column inertias should be calculated and returned. |
rm_zeros |
Logical. Whether rows & cols containing only 0s should be removed. Keeping zero only rows/cols might lead to unexpected results. |
residuals |
character string. Specifies which kind of residuals should be calculated. Can be "pearson" (default), "freemantukey" or "NB" for negative-binomial. |
cutoff |
numeric. Residuals that are larger than cutoff or lower than -cutoff are clipped to cutoff. |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
... |
Other parameters |
assay |
Character. The assay from which extract the count matrix for SVD, e.g. "RNA" for Seurat objects or "counts"/"logcounts" for SingleCellExperiments. |
slot |
character. The slot of the Seurat assay. Default "counts". |
return_input |
Logical. If TRUE returns the input (SingleCellExperiment/Seurat object) with the CA results saved in the reducedDim/DimReduc slot "CA". Otherwise returns a "cacomp". Default FALSE. |
The calculation is performed according to the work of Michael Greenacre. Singular value decomposition can be performed either with the base R function 'svd' or preferably by the faster pytorch implementation (python = TRUE). When working with large matrices, CA coordinates and principal coordinates should only be computed when needed to save computational time.
Returns a named list of class "cacomp" with components U, V and D: The results from the SVD. row_masses and col_masses: Row and columns masses. top_rows: How many of the most variable rows were retained for the analysis. tot_inertia, row_inertia and col_inertia: Only if inertia = TRUE. Total, row and column inertia respectively.
If return_imput = TRUE with Seurat container: Returns input obj of class "Seurat" with a new Dimensional Reduction Object named "CA". Standard coordinates of the cells are saved as embeddings, the principal coordinates of the genes as loadings and the singular values (= square root of principal intertias/eigenvalues) are stored as stdev. To recompute a regular "cacomp" object without rerunning cacomp use 'as.cacomp()'.
If return_input =TRUE for SingleCellExperiment input returns a SingleCellExperiment object with a matrix of standardized coordinates of the columns in reducedDim(obj, "CA"). Additionally, the matrix contains the following attributes: "prin_coords_rows": Principal coordinates of the rows. "singval": Singular values. For the explained inertia of each principal axis calculate singval^2. "percInertia": Percent explained inertia of each principal axis. To recompute a regular "cacomp" object from a SingleCellExperiment without rerunning cacomp use 'as.cacomp()'.
Greenacre, M. Correspondence Analysis in Practice, Third Edition, 2017.
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) ########### # Seurat # ########### library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create Seurat object seu <- CreateSeuratObject(counts = cnts) # Run CA and save in dim. reduction slot seu <- cacomp(seu, return_input = TRUE, assay = "RNA", slot = "counts") # Run CA and return cacomp object ca <- cacomp(seu, return_input = FALSE, assay = "RNA", slot = "counts") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) logcnts <- log2(cnts + 1) # Create SingleCellExperiment object sce <- SingleCellExperiment(assays=list(counts=cnts, logcounts=logcnts)) # run CA and save in dim. reduction slot. sce <- cacomp(sce, return_input = TRUE, assay = "counts") # on counts sce <- cacomp(sce, return_input = TRUE, assay = "logcounts") # on logcounts # run CA and return cacomp object. ca <- cacomp(sce, return_input = FALSE, assay = "counts")
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) ########### # Seurat # ########### library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create Seurat object seu <- CreateSeuratObject(counts = cnts) # Run CA and save in dim. reduction slot seu <- cacomp(seu, return_input = TRUE, assay = "RNA", slot = "counts") # Run CA and return cacomp object ca <- cacomp(seu, return_input = FALSE, assay = "RNA", slot = "counts") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) logcnts <- log2(cnts + 1) # Create SingleCellExperiment object sce <- SingleCellExperiment(assays=list(counts=cnts, logcounts=logcnts)) # run CA and save in dim. reduction slot. sce <- cacomp(sce, return_input = TRUE, assay = "counts") # on counts sce <- cacomp(sce, return_input = TRUE, assay = "logcounts") # on logcounts # run CA and return cacomp object. ca <- cacomp(sce, return_input = FALSE, assay = "counts")
Prints slot names of cacomp object
cacomp_names(caobj)
cacomp_names(caobj)
caobj |
a cacomp object |
Prints slot names of cacomp object
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) # show slot names: cacomp_names(ca)
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) # show slot names: cacomp_names(ca)
Access slots in a cacomp object
cacomp_slot(caobj, slot)
cacomp_slot(caobj, slot)
caobj |
a cacomp object |
slot |
slot to return |
Chosen slot of the cacomp object
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) # access left singular vectors cacomp_slot(ca, "U")
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) # access left singular vectors cacomp_slot(ca, "U")
This class contains elements necessary to computer CA coordinates or Association Plot coordinates, as well as other informative data such as row/column inertia, gene-wise APL-scores, etc. ...
Creates new cacomp object.
new_cacomp(...)
new_cacomp(...)
... |
slot names and objects for new cacomp object. |
cacomp object
U
class "matrix". Left singular vectors of the original input matrix.
V
class "matrix". Right singular vectors of the original input matrix.
D
class "numeric". Singular values of the original inpt matrix.
std_coords_rows
class "matrix". Standardized CA coordinates of the rows.
std_coords_cols
class "matrix". Standardized CA coordinates of the columns.
prin_coords_rows
class "matrix". Principal CA coordinates of the rows.
prin_coords_cols
class "matrix". Principal CA coordinates of the columns.
apl_rows
class "matrix". Association Plot coordinates of the rows for the direction defined in slot "group"
apl_cols
class "matrix". Association Plot coordinates of the columns for the direction defined in slot "group"
APL_score
class "data.frame". Contains rows sorted by the APL score. Columns: Rowname (gene name in the case of gene expression data), APL score calculated for the direction defined in slot "group", the original row number and the rank of the row as determined by the score.
dims
class "numeric". Number of dimensions in CA space.
group
class "numeric". Indices of the chosen columns for APL calculations.
row_masses
class "numeric". Row masses of the frequency table.
col_masses
class "numeric". Column masses of the frequency table.
top_rows
class "numeric". Number of most variable rows chosen.
tot_inertia
class "numeric". Total inertia in CA space.
row_inertia
class "numeric". Row-wise inertia in CA space.
col_inertia
class "numeric". Column-wise inertia in CA space.
permuted_data
class "list". Storage slot for permuted data.
params
class "list". List of parameters.
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) res <- APL:::comp_std_residuals(mat=cnts) SVD <- svd(res$S) names(SVD) <- c("D", "U", "V") SVD <- SVD[c(2, 1, 3)] ca <- new_cacomp(U = SVD$U, V = SVD$V, D = SVD$D, row_masses = res$rowm, col_masses = res$colm)
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) res <- APL:::comp_std_residuals(mat=cnts) SVD <- svd(res$S) names(SVD) <- c("D", "U", "V") SVD <- SVD[c(2, 1, 3)] ca <- new_cacomp(U = SVD$U, V = SVD$V, D = SVD$D, row_masses = res$rowm, col_masses = res$colm)
calc_residuals
provides optional residuals as the basis for Correspondence
Analysis
calc_residuals(mat, residuals = "pearson", clip = FALSE, cutoff = NULL)
calc_residuals(mat, residuals = "pearson", clip = FALSE, cutoff = NULL)
mat |
A numerical matrix or coercible to one by 'as.matrix()'. Should have row and column names. |
residuals |
character string. Specifies which kind of residuals should be calculated. Can be "pearson" (default), "freemantukey" or "NB" for negative-binomial. |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
cutoff |
numeric. Residuals that are larger than cutoff or lower than -cutoff are clipped to cutoff. |
A named list. The elements are:
"S": standardized residual matrix.
"tot": grand total of the original matrix.
"rowm": row masses.
"colm": column masses.
Checks if the slots in a cacomp object are of the correct size and whether they are coherent.
check_cacomp(object)
check_cacomp(object)
object |
A cacomp object. |
TRUE if it is a valid cacomp object. FALSE otherwise.
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) check_cacomp(ca)
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) check_cacomp(ca)
Clips Pearson or negative-binomial residuals above or below a determined value. For Pearson (Poisson) residuals it is set by default for 1, for NB at sqrt(n).
clip_residuals(S, cutoff = sqrt(ncol(S)))
clip_residuals(S, cutoff = sqrt(ncol(S)))
S |
Matrix of residuals. |
cutoff |
Value above/below which clipping should happen. |
Matrix of clipped residuals.
Lause, J., Berens, P. & Kobak, D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome Biol 22, 258 (2021). https://doi.org/10.1186/s13059-021-02451-7
Computes Freeman-Tukey residuals
comp_ft_residuals(mat)
comp_ft_residuals(mat)
mat |
A numerical matrix or coercible to one by 'as.matrix()'. Should have row and column names. |
A named list. The elements are:
"S": standardized residual matrix.
"tot": grand total of the original matrix.
"rowm": row masses.
"colm": column masses.
Computes the residuals based on the negative binomial model. By default a theta of 100 is used to capture technical variation.
comp_NB_residuals(mat, theta = 100, clip = FALSE, cutoff = NULL, freq = TRUE)
comp_NB_residuals(mat, theta = 100, clip = FALSE, cutoff = NULL, freq = TRUE)
mat |
A numerical matrix or coercible to one by 'as.matrix()'. Should have row and column names. |
theta |
Overdispersion parameter. By default set to 100 as described in Lause and Berens, 2021 (see references). |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
cutoff |
numeric. Residuals that are larger than cutoff or lower than -cutoff are clipped to cutoff. |
freq |
logical. Whether a table of frequencies (as used in CA) should be used. |
A named list. The elements are:
"S": standardized residual matrix.
"tot": grand total of the original matrix.
"rowm": row masses.
"colm": column masses.
Lause, J., Berens, P. & Kobak, D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome Biol 22, 258 (2021). https://doi.org/10.1186/s13059-021-02451-7
'comp_std_residuals' computes the standardized residual matrix S based on the Poisson model, which is the basis for correspondence analysis and serves as input for singular value decomposition (SVD).
comp_std_residuals(mat, clip = FALSE, cutoff = NULL)
comp_std_residuals(mat, clip = FALSE, cutoff = NULL)
mat |
A numerical matrix or coercible to one by 'as.matrix()'. Should have row and column names. |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
cutoff |
numeric. Residuals that are larger than cutoff or lower than -cutoff are clipped to cutoff. |
Calculates standardized residual matrix S from the proportion matrix P and
the expected values E according to .
A named list. The elements are:
"S": standardized residual matrix.
"tot": grand total of the original matrix.
"rowm": row masses.
"colm": column masses.
Helper function for pick_dims() to run the elbow method.
elbow_method(obj, mat, reps, python = FALSE, return_plot = FALSE)
elbow_method(obj, mat, reps, python = FALSE, return_plot = FALSE)
obj |
A "cacomp" object as outputted from 'cacomp()' |
mat |
A numeric matrix. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
reps |
Integer. Number of permutations to perform when choosing "elbow_rule". |
python |
A logical value indicating whether to use singular value decomposition from the python package torch. This implementation dramatically speeds up computation compared to 'svd()' in R. |
return_plot |
TRUE/FALSE. Whether a plot should be returned when choosing "elbow_rule". |
'elbow_method' (for 'return_plot=TRUE') returns a list with two elements: "dims" contains the number of dimensions and "plot" a ggplot. if 'return_plot=TRUE' it just returns the number of picked dimensions.
Ciampi, Antonio, González Marcos, Ana and Castejón Limas, Manuel.
Correspondence analysis and 2-way clustering. (2005), SORT 29(1).
# Get example data from Seurat library(SeuratObject) set.seed(2358) cnts <- as.matrix(SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "data")) # Run correspondence analysis. ca <- cacomp(obj = cnts) # pick dimensions with the elbow rule. Returns list. pd <- pick_dims(obj = ca, mat = cnts, method = "elbow_rule", return_plot = TRUE, reps = 10) pd$plot ca_sub <- subset_dims(ca, dims = pd$dims)
# Get example data from Seurat library(SeuratObject) set.seed(2358) cnts <- as.matrix(SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "data")) # Run correspondence analysis. ca <- cacomp(obj = cnts) # pick dimensions with the elbow rule. Returns list. pd <- pick_dims(obj = ca, mat = cnts, method = "elbow_rule", return_plot = TRUE, reps = 10) pd$plot ca_sub <- subset_dims(ca, dims = pd$dims)
Calculates the contributing inertia of each row which is defined as sum of squares of pearson residuals and selects the rows with the largested inertias, e.g. 5,000.
inertia_rows(mat, top = 5000, ...)
inertia_rows(mat, top = 5000, ...)
mat |
A matrix with genes in rows and cells in columns. |
top |
Number of genes to select. |
... |
Further arguments for 'comp_std_residuals' |
Returns a matrix, which consists of the top variable rows of mat.
Helper function to check if object is empty.
is.empty(x)
is.empty(x)
x |
object |
TRUE if x has length 0 and is not NULL. FALSE otherwise
Calculates matrix of apl coordinates when permuting the original data.
permutation_cutoff( caobj, mat, group = caobj@group, dims = caobj@dims, reps = 10, store_perm = FALSE, python = TRUE )
permutation_cutoff( caobj, mat, group = caobj@group, dims = caobj@dims, reps = 10, store_perm = FALSE, python = TRUE )
caobj |
A "cacomp" object with principal row coordinates and standardized column coordinates calculated. |
mat |
A numeric matrix. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
group |
Vector of indices of the columns to calculate centroid/x-axis direction. |
dims |
Integer. Number of CA dimensions to retain. Needs to be the same as in caobj! |
reps |
Integer. Number of permutations to perform. |
store_perm |
Logical. Whether permuted data should be stored in the CA object. This implementation dramatically speeds up computation compared to 'svd()' in R. |
python |
DEPRACTED. A logical value indicating whether to use singular-value decomposition from the python package torch. |
List with permuted apl coordinates ("apl_perm") and, a list of saved ca components ("saved_ca") that allow for quick recomputation of the CA results. For random_direction_cutoff this saved_ca is empty.
Allow the user to choose from 4 different methods ("avg_inertia", "maj_inertia", "scree_plot" and "elbow_rule") to estimate the number of dimensions that best represent the data.
pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ... ) ## S4 method for signature 'cacomp' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ... ) ## S4 method for signature 'Seurat' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ..., assay = "counts" )
pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ... ) ## S4 method for signature 'cacomp' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ... ) ## S4 method for signature 'Seurat' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'SingleCellExperiment' pick_dims( obj, mat = NULL, method = "scree_plot", reps = 3, python = FALSE, return_plot = FALSE, ..., assay = "counts" )
obj |
A "cacomp" object as outputted from |
mat |
A numeric matrix. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
method |
String. Either "scree_plot", "avg_inertia", "maj_inertia" or "elbow_rule" (see Details section). Default "scree_plot". |
reps |
Integer. Number of permutations to perform when choosing "elbow_rule". Default 3. |
python |
DEPRACTED. A logical value indicating whether to use singular value
decomposition from the python package torch.
This implementation dramatically speeds up computation compared to |
return_plot |
TRUE/FALSE. Whether a plot should be returned when choosing "elbow_rule". Default FALSE. |
... |
Arguments forwarded to methods. |
assay |
Character. The assay from which to extract the count matrix for SVD, e.g. "RNA" for Seurat objects or "counts"/"logcounts" for SingleCellExperiments. |
slot |
Character. Data slot of the Seurat assay. E.g. "data" or "counts". Default "counts". |
"avg_inertia" calculates the number of dimensions in which the inertia is above the average inertia.
"maj_inertia" calculates the number of dimensions in which cumulatively explain up to 80% of the total inertia.
"scree_plot" plots a scree plot.
"elbow_rule" formalization of the commonly used elbow rule. Permutes the
rows for each column and reruns cacomp()
for a total of reps
times.
The number of relevant dimensions is obtained from the point where the
line for the explained inertia of the permuted data intersects with the
actual data.
For avg_inertia
, maj_inertia
and elbow_rule
(when return_plot=FALSE
)
returns an integer, indicating the suggested number of dimensions to use.
scree_plot
returns a ggplot object.
elbow_rule
(for return_plot=TRUE
) returns a list with two elements:
"dims" contains the number of dimensions and "plot" a ggplot.
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis. ca <- cacomp(obj = cnts) # pick dimensions with the elbow rule. Returns list. set.seed(2358) pd <- pick_dims(obj = ca, mat = cnts, method = "elbow_rule", return_plot = TRUE, reps = 10) pd$plot ca_sub <- subset_dims(ca, dims = pd$dims) # pick dimensions which explain cumulatively >80% of total inertia. # Returns vector. pd <- pick_dims(obj = ca, method = "maj_inertia") ca_sub <- subset_dims(ca, dims = pd) ################################ # pick_dims for Seurat objects # ################################ library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create Seurat object seu <- CreateSeuratObject(counts = cnts) # run CA and save in dim. reduction slot. seu <- cacomp(seu, return_input = TRUE, assay = "RNA", slot = "counts") # pick dimensions pd <- pick_dims(obj = seu, method = "maj_inertia", assay = "RNA", slot = "counts") ############################################## # pick_dims for SingleCellExperiment objects # ############################################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create SingleCellExperiment object sce <- SingleCellExperiment(assays=list(counts=cnts)) # run CA and save in dim. reduction slot. sce <- cacomp(sce, return_input = TRUE, assay = "counts") # pick dimensions pd <- pick_dims(obj = sce, method = "maj_inertia", assay = "counts")
# Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Run correspondence analysis. ca <- cacomp(obj = cnts) # pick dimensions with the elbow rule. Returns list. set.seed(2358) pd <- pick_dims(obj = ca, mat = cnts, method = "elbow_rule", return_plot = TRUE, reps = 10) pd$plot ca_sub <- subset_dims(ca, dims = pd$dims) # pick dimensions which explain cumulatively >80% of total inertia. # Returns vector. pd <- pick_dims(obj = ca, method = "maj_inertia") ca_sub <- subset_dims(ca, dims = pd) ################################ # pick_dims for Seurat objects # ################################ library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create Seurat object seu <- CreateSeuratObject(counts = cnts) # run CA and save in dim. reduction slot. seu <- cacomp(seu, return_input = TRUE, assay = "RNA", slot = "counts") # pick dimensions pd <- pick_dims(obj = seu, method = "maj_inertia", assay = "RNA", slot = "counts") ############################################## # pick_dims for SingleCellExperiment objects # ############################################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Create SingleCellExperiment object sce <- SingleCellExperiment(assays=list(counts=cnts)) # run CA and save in dim. reduction slot. sce <- cacomp(sce, return_input = TRUE, assay = "counts") # pick dimensions pd <- pick_dims(obj = sce, method = "maj_inertia", assay = "counts")
Plots the results from the data frame generated via apl_topGO.
plot_enrichment(genenr, ntop = 10)
plot_enrichment(genenr, ntop = 10)
genenr |
data.frame. gene enrichment results table. |
ntop |
numeric. Number of elements to plot. |
Returns a ggplot plot.
library(SeuratObject) set.seed(1234) cnts <- SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "counts") cnts <- as.matrix(cnts) # Run CA on example from Seurat ca <- cacomp(pbmc_small, princ_coords = 3, return_input = FALSE, assay = "RNA", slot = "counts") grp <- which(Idents(pbmc_small) == 2) ca <- apl_coords(ca, group = grp) ca <- apl_score(ca, mat = cnts) enr <- apl_topGO(ca, ontology = "BP", organism = "hs") plot_enrichment(enr)
library(SeuratObject) set.seed(1234) cnts <- SeuratObject::LayerData(pbmc_small, assay = "RNA", layer = "counts") cnts <- as.matrix(cnts) # Run CA on example from Seurat ca <- cacomp(pbmc_small, princ_coords = 3, return_input = FALSE, assay = "RNA", slot = "counts") grp <- which(Idents(pbmc_small) == 2) ca <- apl_coords(ca, group = grp) ca <- apl_score(ca, mat = cnts) enr <- apl_topGO(ca, ontology = "BP", organism = "hs") plot_enrichment(enr)
Calculates matrix of apl coordinates for random directions
random_direction_cutoff(caobj, dims = caobj@dims, reps = 100)
random_direction_cutoff(caobj, dims = caobj@dims, reps = 100)
caobj |
A "cacomp" object with principal row coordinates and standardized column coordinates calculated. |
dims |
Integer. Number of CA dimensions to retain. Needs to be the same as in caobj! |
reps |
Integer. Number of permutations to perform. |
List with permuted apl coordinates ("apl_perm") and, a list of saved ca components ("saved_ca") that allow for quick recomputation of the CA results. For random_direction_cutoff this saved_ca is empty.
The caobj needs to have the std_coords_cols, the prin_coords_rows and D calculated. From this the remainder will be calculated. Future updates might extend this functionality.
recompute(calist, mat, ...)
recompute(calist, mat, ...)
calist |
A list with std_coords_cols, the prin_coords_rows and D. |
mat |
A matrix from which the cacomp object is derived from. |
... |
Further arguments forwarded to cacomp. |
A cacomp object with additional calculated row_masses, col_masses, std_coords_rows, U and V.
removes 0-only rows and columns in a matrix.
rm_zeros(obj)
rm_zeros(obj)
obj |
A matrix. |
Input matrix with rows & columns consisting of only 0 removed.
Computes singular value decomposition and coordinates for the Association Plot.
runAPL.SingleCellExperiment: Computes singular value decomposition and coordinates for the Association Plot from SingleCellExperiment objects with reducedDim(obj, "CA") slot (optional).
runAPL.Seurat: Computes singular value decomposition and coordinates for the Association Plot from Seurat objects, optionally with a DimReduc Object in the "CA" slot.
run_APL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE ) runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = caobj@group, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... ) ## S4 method for signature 'matrix' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... ) ## S4 method for signature 'SingleCellExperiment' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ..., assay = "counts" ) ## S4 method for signature 'Seurat' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'dgCMatrix' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... )
run_APL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE ) runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = caobj@group, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... ) ## S4 method for signature 'matrix' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... ) ## S4 method for signature 'SingleCellExperiment' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ..., assay = "counts" ) ## S4 method for signature 'Seurat' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ..., assay = SeuratObject::DefaultAssay(obj), slot = "counts" ) ## S4 method for signature 'dgCMatrix' runAPL( obj, group, caobj = NULL, dims = NULL, nrow = 10, top = 5000, clip = FALSE, score = TRUE, score_method = "permutation", mark_rows = NULL, mark_cols = NULL, reps = 3, python = FALSE, row_labs = TRUE, col_labs = TRUE, type = "plotly", show_cols = FALSE, show_rows = TRUE, score_cutoff = 0, score_color = "rainbow", pd_method = "elbow_rule", pd_reps = 1, pd_use = TRUE, ... )
obj |
A numeric matrix. For sequencing usually a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
group |
Numeric/Character. Vector of indices or column names of the columns to calculate centroid/x-axis direction. |
caobj |
A "cacomp" object as outputted from 'cacomp()'. If not supplied will be calculated. Default NULL. |
dims |
Integer. Number of CA dimensions to retain. If NULL: (0.2 * min(nrow(A), ncol(A)) - 1 ). |
nrow |
Integer. The top nrow scored row labels will be added to the plot if score = TRUE. Default 10. |
top |
Integer. Number of most variable rows to retain. Set NULL to keep all. |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
score |
Logical. Whether rows should be scored and ranked. Ignored when a vector is supplied to mark_rows. Default TRUE. |
score_method |
Method to calculate the cutoff. Either "random" for random direction method or "permutation" for the permutation method. |
mark_rows |
Character vector. Names of rows that should be highlighted in the plot. If not NULL, score is ignored. Default NULL. |
mark_cols |
Character vector. Names of cols that should be highlighted in the plot. |
reps |
Integer. Number of permutations during scoring. Default 3. |
python |
DEPRACTED. A logical value indicating whether to use singular-value decomposition from the python package torch. This implementation dramatically speeds up computation compared to 'svd()' in R when calculating the full SVD. This parameter only works when dims==NULL or dims==rank(mat), where caculating a full SVD is demanded. |
row_labs |
Logical. Whether labels for rows indicated by rows_idx should be labeled with text. Default TRUE. |
col_labs |
Logical. Whether labels for columns indicated by cols_idx shouls be labeled with text. Default FALSE. |
type |
"ggplot"/"plotly". For a static plot a string "ggplot", for an interactive plot "plotly". Default "ggplot". |
show_cols |
Logical. Whether column points should be plotted. |
show_rows |
Logical. Whether row points should be plotted. |
score_cutoff |
Numeric. Rows (genes) with a score >= score_cutoff will be colored according to their score if show_score = TRUE. |
score_color |
Either "rainbow" or "viridis". |
pd_method |
Which method to use for pick_dims (pick_dims). |
pd_reps |
Number of repetitions performed when using "elbow_rule" in 'pick_dims'. (pick_dims) |
pd_use |
Whether to use 'pick_dims' (pick_dims) to determine the number of dimensions. Ignored when 'dims' is set by the user. |
... |
Arguments forwarded to methods. |
assay |
Character. The assay from which extract the count matrix for SVD, e.g. "RNA" for Seurat objects or "counts"/"logcounts" for SingleCellExperiments. |
slot |
character. The Seurat assay slot from which to extract the count matrix. |
The function is a wrapper that calls 'cacomp()', 'apl_coords()', 'apl_score()' and finally 'apl()' for ease of use. The chosen defaults are most useful for genomics experiments, but for more fine grained control the functions can be also run individually for the same results. If score = FALSE, nrow and reps are ignored. If mark_rows is not NULL score is treated as if FALSE.
Association Plot (plotly object).
Association Plots: Visualizing associations in high-dimensional
correspondence analysis biplots
Elzbieta Gralinska, Martin Vingron
bioRxiv 2020.10.23.352096; doi: https://doi.org/10.1101/2020.10.23.352096
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # (nonsensical) APL APL:::run_APL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot") set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # (nonsensical) APL runAPL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) sce <- SingleCellExperiment(assays=list(counts=cnts)) # (nonsensical) APL runAPL(obj = sce, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot", assay = "counts") ########### # Seurat # ########### library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) seu <- CreateSeuratObject(counts = cnts) # (nonsensical) APL runAPL(obj = seu, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot", assay = "RNA", slot = "counts") set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(seq(0.01,0.1,by=0.01), 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) cnts <- Matrix::Matrix(cnts) # (nonsensical) APL runAPL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot")
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # (nonsensical) APL APL:::run_APL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot") set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # (nonsensical) APL runAPL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot") ######################## # SingleCellExperiment # ######################## library(SingleCellExperiment) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) sce <- SingleCellExperiment(assays=list(counts=cnts)) # (nonsensical) APL runAPL(obj = sce, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot", assay = "counts") ########### # Seurat # ########### library(SeuratObject) set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:100, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) seu <- CreateSeuratObject(counts = cnts) # (nonsensical) APL runAPL(obj = seu, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot", assay = "RNA", slot = "counts") set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(seq(0.01,0.1,by=0.01), 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) cnts <- Matrix::Matrix(cnts) # (nonsensical) APL runAPL(obj = cnts, group = 1:10, dims = 10, top = 500, score = TRUE, show_cols = TRUE, type = "ggplot")
'run_cacomp' performs correspondence analysis on a matrix and returns the transformed data.
run_cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = 100, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... )
run_cacomp( obj, coords = TRUE, princ_coords = 3, python = FALSE, dims = 100, top = 5000, inertia = TRUE, rm_zeros = TRUE, residuals = "pearson", cutoff = NULL, clip = FALSE, ... )
obj |
A numeric matrix or Seurat/SingleCellExperiment object. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
coords |
Logical. Indicates whether CA standard coordinates should be calculated. |
princ_coords |
Integer. Number indicating whether principal coordinates should be calculated for the rows (=1), columns (=2), both (=3) or none (=0). |
python |
DEPRACTED. A logical value indicating whether to use singular-value decomposition from the python package torch. This implementation dramatically speeds up computation compared to 'svd()' in R when calculating the full SVD. This parameter only works when dims==NULL or dims==rank(mat), where caculating a full SVD is demanded. |
dims |
Integer. Number of CA dimensions to retain. If NULL: (0.2 * min(nrow(A), ncol(A)) - 1 ). |
top |
Integer. Number of most variable rows to retain. Set NULL to keep all. |
inertia |
Logical. Whether total, row and column inertias should be calculated and returned. |
rm_zeros |
Logical. Whether rows & cols containing only 0s should be removed. Keeping zero only rows/cols might lead to unexpected results. |
residuals |
character string. Specifies which kind of residuals should be calculated. Can be "pearson" (default), "freemantukey" or "NB" for negative-binomial. |
cutoff |
numeric. Residuals that are larger than cutoff or lower than -cutoff are clipped to cutoff. |
clip |
logical. Whether residuals should be clipped if they are higher/lower than a specified cutoff |
... |
Arguments forwarded to methods. |
The calculation is performed according to the work of Michael Greenacre. When working with large matrices, CA coordinates and principal coordinates should only be computed when needed to save computational time.
Returns a named list of class "cacomp" with components U, V and D: The results from the SVD. row_masses and col_masses: Row and columns masses. top_rows: How many of the most variable rows/genes were retained for the analysis. tot_inertia, row_inertia and col_inertia: Only if inertia = TRUE. Total, row and column inertia respectively.
Greenacre, M. Correspondence Analysis in Practice, Third Edition, 2017.
Plots a scree plot.
scree_plot(df)
scree_plot(df)
df |
A data frame with columns "dims" and "inertia". |
Returns a ggplot object.
Provides more user friendly printing of cacomp objects.
show.cacomp(object) ## S4 method for signature 'cacomp' show(object)
show.cacomp(object) ## S4 method for signature 'cacomp' show(object)
object |
cacomp object to print |
prints summary information about cacomp object.
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) ca
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(obj = cnts, princ_coords = 3, top = 5) ca
Subsets the dimensions according to user input.
subset_dims(caobj, dims)
subset_dims(caobj, dims)
caobj |
A caobj. |
dims |
Integer. Number of dimensions. |
Returns caobj.
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(cnts) ca <- subset_dims(ca, 2)
# Simulate scRNAseq data. cnts <- data.frame(cell_1 = rpois(10, 5), cell_2 = rpois(10, 10), cell_3 = rpois(10, 20)) rownames(cnts) <- paste0("gene_", 1:10) cnts <- as.matrix(cnts) # Run correspondence analysis. ca <- cacomp(cnts) ca <- subset_dims(ca, 2)
Calculates the variance of the chi-square component matrix and selects the rows with the highest variance, e.g. 5,000.
var_rows(mat, residuals = "pearson", top = 5000, ...)
var_rows(mat, residuals = "pearson", top = 5000, ...)
mat |
A numeric matrix. For sequencing a count matrix, gene expression values with genes in rows and samples/cells in columns. Should contain row and column names. |
residuals |
character string. Specifies which kind of residuals should be calculated. Can be "pearson" (default), "freemantukey" or "NB" for negative-binomial. |
top |
Integer. Number of most variable rows to retain. Default 5000. |
... |
Further arguments for 'calc_residuals'. |
Returns a matrix, which consists of the top variable rows of mat.
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Choose top 5000 most variable genes cnts <- var_rows(mat = cnts, top = 5000)
set.seed(1234) # Simulate counts cnts <- mapply(function(x){rpois(n = 500, lambda = x)}, x = sample(1:20, 50, replace = TRUE)) rownames(cnts) <- paste0("gene_", 1:nrow(cnts)) colnames(cnts) <- paste0("cell_", 1:ncol(cnts)) # Choose top 5000 most variable genes cnts <- var_rows(mat = cnts, top = 5000)