| Title: | Calculate per-cell gene signature scores in scRNA-seq data using cell set overlaps |
|---|---|
| Description: | Cell Set Overlap Analysis (CSOA) is a tool for calculating per-cell gene signature scores in an scRNA-seq dataset. CSOA constructs a set for each gene in the signature, consisting of the cells that highly express the gene. Next, all overlaps of pairs of cell sets are computed, ranked, filtered and scored. The CSOA per-cell score is calculated by summing up all products of the overlap scores and the min-max-normalized expression of the two involved genes. CSOA can run on a Seurat object, a SingleCellExperiment object, a matrix and a dgCMatrix. |
| Authors: | Andrei-Florian Stoica [aut, cre] (ORCID: <https://orcid.org/0000-0002-5253-0826>) |
| Maintainer: | Andrei-Florian Stoica <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.1 |
| Built: | 2026-05-06 11:09:39 UTC |
| Source: | https://github.com/bioc/CSOA |
This function attaches the data frame of CSOA scores to the input object.
    ## Default S3 method:
    attachCellScores(scObj, scoreDF, ...)
    ## S3 method for class 'Seurat'
    attachCellScores(scObj, scoreDF, ...)
    ## S3 method for class 'SingleCellExperiment'
    attachCellScores(scObj, scoreDF, ...)
    ## S3 method for class 'matrix'
    attachCellScores(scObj, scoreDF, ...)
    ## S3 method for class 'dgCMatrix'
    attachCellScores(scObj, scoreDF, ...)
    attachCellScores(scObj, ...)
scObj |
A Seurat object, SingleCellExperiment object, or expression matrix. |
scoreDF |
Data frame of CSOA scores. |
... |
Additional arguments. |
If the input object is of the Seurat or SingleCellExperiment class, it is returned with the CSOA scores added. Otherwise, a list containing the expression matrix and the CSOA scores data frame is returned.
A Seurat object with CSOA scores added to metadata.
A SingleCellExperiment object with CSOA scores added to
colData.
A list containing the expression matrix and the CSOA scores data frame.
A list containing the expression matrix and the CSOA scores data frame.
    library(Seurat)
    mat <- matrix(0, 500, 300)
    rownames(mat) <- paste0('G', seq(500))
    colnames(mat) <- paste0('C', seq(300))
    mat[sample(8000)] <- sample(20, 8000, TRUE)
    seuratObj <- CreateSeuratObject(mat)
    seuratObj <- NormalizeData(seuratObj)
    scores <- data.frame(CSOA = runif(300))
    seuratObj <- attachCellScores(seuratObj, scores)
    head(seuratObj$CSOA)
This function plots a simple heatmap, with clustering but no dendrograms.
    basicHeatmap(
      mat,
      aesNames = c("x", "y", "Score"),
      title = "Heatmap",
      axisTextSize = 7,
      palette = paletteer_c("grDevices::Plasma", 30),
      ...
    )
mat |
A matrix. |
aesNames |
A character vector of length 3 representing the y, x and fill aes elements. |
title |
Plot title. |
axisTextSize |
Axis text size. |
palette |
Color palette. |
... |
Other arguments passed to |
A ggplot object.
    mat <- matrix(0, 10, 20)
    mat[sample(length(mat), 50)] <- runif(50, max = 2.5)
    basicHeatmap(mat)
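"Clustering but no dendrograms" can be pictured as reordering rows and columns by hierarchical clustering and then discarding the trees. The base R sketch below assumes `hclust` on Euclidean distances; the clustering method that `basicHeatmap` actually uses is not stated here, so that choice is an assumption made for illustration.

```r
set.seed(1)
mat <- matrix(runif(40), nrow = 5,
              dimnames = list(paste0("r", 1:5), paste0("c", 1:8)))

# Order rows and columns by hierarchical clustering (assumed method);
# only the leaf order is kept, the trees themselves are dropped.
rowOrder <- hclust(dist(mat))$order
colOrder <- hclust(dist(t(mat)))$order

# The reordered matrix is what a dendrogram-free heatmap would display.
clustered <- mat[rowOrder, colOrder]
```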
This function iteratively removes all overlap pairs with neighbor Jaccard score below a fixed cutoff until no overlap pairs can be removed. Subsequently, overlap ranks are recalculated.
    breakWeakTies(
      overlapDF,
      cutoff = 1/3,
      doConnComp = FALSE,
      mtMethod = c("BY", "BH")
    )
overlapDF |
An overlap data frame. |
cutoff |
A cutoff used in the filtering of edges with low Jaccard scores. |
doConnComp |
Whether to calculate the connected components. |
mtMethod |
Multiple testing correction method. Choose between Benjamini-Yekutieli ('BY') and Benjamini-Hochberg('BH'). Default is 'BY'. |
The function removes overlaps for which the two involved genes record too few shared neighbors, that is, genes whose cell sets significantly overlap with those of both overlap genes.
An overlap data frame in which edges with low Jaccard scores have been removed.
    overlapDF <- data.frame(gene1=paste0('G', c(1, 3, 7, 6, 8, 2, 4, 3, 4, 5)),
                            gene2=paste0('G', c(2, 7, 2, 5, 4, 5, 1, 2, 2, 8)),
                            ratio=runif(10, 2, 10),
                            pval=runif(10, 0, 1e-10))
    breakWeakTies(overlapDF, cutoff=0.1)
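The iterative pruning can be sketched in base R as follows. The sketch assumes that the neighbor Jaccard score of an edge is the number of shared neighbors of its two genes divided by the size of the union of their neighbor sets, with the two genes themselves excluded. The helper names `neighborJaccard` and `pruneWeakEdges` are hypothetical, and the package's exact definition may differ.

```r
# Neighbor Jaccard score for every edge (assumed definition, see lead-in).
neighborJaccard <- function(edges) {
  neighbors <- function(g)
    union(edges$gene2[edges$gene1 == g], edges$gene1[edges$gene2 == g])
  mapply(function(a, b) {
    na <- setdiff(neighbors(a), b)
    nb <- setdiff(neighbors(b), a)
    u <- union(na, nb)
    if (length(u) == 0) 0 else length(intersect(na, nb)) / length(u)
  }, edges$gene1, edges$gene2, USE.NAMES = FALSE)
}

# Remove edges below the cutoff, recompute, and repeat until stable.
pruneWeakEdges <- function(edges, cutoff = 1/3) {
  repeat {
    if (nrow(edges) == 0) break
    keep <- neighborJaccard(edges) >= cutoff
    if (all(keep)) break
    edges <- edges[keep, , drop = FALSE]
  }
  edges
}

# A triangle (G1, G2, G3) with a pendant edge G3-G4.
edges <- data.frame(gene1 = c("G1", "G2", "G1", "G3"),
                    gene2 = c("G2", "G3", "G3", "G4"),
                    stringsAsFactors = FALSE)
pruned <- pruneWeakEdges(edges)
```

In this toy graph the pendant edge has no shared neighbors and is removed in the first pass, after which every remaining edge of the triangle passes the cutoff.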
This function returns a logical matrix that shows the representation of cell sets among all cells.
    cellDistribution(cellSets, allCells)
cellSets |
A list of character vectors. |
allCells |
Names of all cells in the dataset. |
A logical matrix with genes as rows and cells as columns.
    cellSets <- list(c('A', 'H', 'J'),
                     c('B', 'D', 'E', 'F', 'J'),
                     c('C', 'I', 'L'))
    allCells <- LETTERS[seq(15)]
    cellDistribution(cellSets, allCells)
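A membership matrix of this shape can be built in base R with `%in%`. This is a minimal sketch of the idea, not the package's implementation; the gene names `G1` and `G2` are made up for illustration.

```r
cellSets <- list(G1 = c("A", "H", "J"), G2 = c("B", "D", "J"))
allCells <- LETTERS[1:10]

# One row per cell set, one column per cell; TRUE marks membership.
membership <- t(vapply(cellSets,
                       function(s) allCells %in% s,
                       logical(length(allCells))))
colnames(membership) <- allCells
```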
This function computes the statistical significance of overlaps of pairs of cell sets.
    cellSetsOverlaps(cellSets, nCells, pairs = NULL, overlapFileName = NULL)
cellSets |
A list of character arrays. |
nCells |
The total number of cells in the Seurat object. |
pairs |
Pairs of cell sets to be assessed. If |
overlapFileName |
The name of the file where the overlap data frame
will be saved. This option can be used to save time when performing
exploratory analyses such as trying different |
A data frame listing statistics for all cell set overlaps: cell set sizes, recorded and expected shared cells, the recorded-over-expected ratio and the hypergeometric p-value.
    cellSets <- list(G1 = c('A', 'H', 'J'),
                     G2 = c('B', 'D', 'E', 'F', 'J'),
                     G3 = c('C', 'I', 'L'))
    cellSetsOverlaps(cellSets, 40)
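The statistics listed above can be reproduced for a single pair with base R. The sketch assumes an upper-tail hypergeometric test (the probability of observing at least the recorded number of shared cells); the helper `overlapStats` and its column names are hypothetical and need not match the columns returned by `cellSetsOverlaps`.

```r
overlapStats <- function(setA, setB, nCells) {
  nA <- length(setA)
  nB <- length(setB)
  shared <- length(intersect(setA, setB))
  # Expected overlap if both sets were drawn independently at random.
  expected <- nA * nB / nCells
  # P(X >= shared) under the hypergeometric distribution (assumed tail).
  pval <- phyper(shared - 1, nA, nCells - nA, nB, lower.tail = FALSE)
  data.frame(sizeA = nA, sizeB = nB, shared = shared,
             expected = expected, ratio = shared / expected, pval = pval)
}

res <- overlapStats(c("A", "H", "J"), c("B", "D", "E", "F", "J"), 40)
```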
This function extracts the expression matrix from the input object as a dense (non-sparse) matrix. The genes to retain can be specified as input.
    expMat(scObj, ...)
    ## Default S3 method:
    expMat(scObj, genes = NULL, ...)
    ## S3 method for class 'Seurat'
    expMat(scObj, ...)
    ## S3 method for class 'SingleCellExperiment'
    expMat(scObj, ...)
    ## S3 method for class 'dgCMatrix'
    expMat(scObj, ...)
    ## S3 method for class 'matrix'
    expMat(scObj, ...)
scObj |
A Seurat object, SingleCellExperiment object, or expression matrix. |
... |
Additional arguments. |
genes |
Genes retained in the expression matrix. If NULL, all genes will be retained. |
An expression matrix.
    library(Seurat)
    mat <- matrix(0, 6, 4)
    mat[sample(length(mat), 7)] <- sample(3, 7, TRUE)
    seuratObj <- CreateSeuratObject(counts = mat)
    seuratObj <- NormalizeData(seuratObj)
    expMat(seuratObj)
This function customizes the appearance of Seurat::FeaturePlot for
improved distinctiveness and aesthetics.
    featureWes(
      seuratObj,
      feature,
      title = feature,
      idClass = NULL,
      labelSize = 3.5,
      titleSize = 12,
      palette = paletteer_d("wesanderson::Royal1")[c(3, 2)],
      ...
    )
seuratObj |
A Seurat object. |
feature |
Seurat feature. |
title |
Plot title. |
idClass |
Column to be used for labelling. If NULL, no column-based labels will be generated. |
labelSize |
Size of labels. Ignored if idClass is NULL. |
titleSize |
Title size. |
palette |
Color palette. |
... |
Additional arguments passed to |
A ggplot object.
    library(Seurat)
    mat <- matrix(0, 3000, 800)
    mat[sample(length(mat), 90000)] <- sample(8, 90000, TRUE)
    seuratObj <- CreateSeuratObject(counts = mat)
    seuratObj <- FindVariableFeatures(seuratObj, nfeatures=200)
    seuratObj <- NormalizeData(seuratObj)
    seuratObj <- ScaleData(seuratObj)
    seuratObj <- RunPCA(seuratObj, verbose=FALSE)
    seuratObj <- RunUMAP(seuratObj, dims=1:20, verbose=FALSE)
    featureWes(seuratObj, 'Feature3')
This function draws a radial plot for an overlap data frame to illustrate gene participation in top overlaps.
    geneRadialPlot(
      overlapObj,
      title = "Top overlap genes plot",
      degreeLegendTitle = "Number of top overlaps",
      groupLegendTitle = "Group",
      extraCircles = 2,
      groupNames = NULL,
      cutoff = NULL,
      ...
    )
overlapObj |
An overlap data frame or list of overlap data frames. |
title |
Plot title. |
degreeLegendTitle |
The title of the degree legend. |
groupLegendTitle |
The title of the group legend. If |
extraCircles |
Number of extra circles to be displayed on the plot. |
groupNames |
Names of groups. If provided, must be a vector of the same length as the list of overlap data frames. |
cutoff |
Number of retained edges from each overlap data frame after
refiltering. If |
... |
Additional parameters passed to |
The function can separate genes by group. The groups can be, for
instance, different gene sets or different connected components of the same
overlap data frame. A wrapper around henna::radialPlot.
A ggplot object.
    edgesDF <- data.frame(gene1 = paste0('G', c(1, 2, 3, 4, 7, 8, 10, 11, 11, 10, 10, 10)),
                          gene2 = paste0('G', c(2, 5, 1, 8, 4, 9, 12, 13, 14, 13, 16, 14)))
    edgesDF <- henna::connectedComponents(edgesDF, 'group')
    geneRadialPlot(edgesDF, groupLegendTitle='Component', extraCircles=1)
This function constructs, for each gene in the expression matrix, a set of cells expressing the gene at or above the input percentile. Subsequently, overlaps of pairs of the constructed cell sets are assessed for statistical significance.
    generateOverlaps(
      geneSetExp,
      percentile = 90,
      pairs = NULL,
      overlapFileName = NULL
    )
geneSetExp |
A gene expression non-sparse matrix with the rows restricted to the genes for which cell sets will be computed. |
percentile |
A positive number under 100. |
pairs |
Pairs of cell sets to be assessed. If |
overlapFileName |
The name of the file where the overlap data frame
will be saved. This option can be used to save time when performing
exploratory analyses such as trying different |
Wrapper around percentileSets and cellSetsOverlaps.
A data frame listing statistics for all cell set overlaps.
    mat <- matrix(0, 2000, 500)
    rownames(mat) <- paste0('G', seq(2000))
    colnames(mat) <- paste0('C', seq(500))
    mat[sample(length(mat), 270000)] <- sample(50, 270000, TRUE)
    mat <- mat[paste0('G', sample(2000, 5)), ]
    generateOverlaps(mat)
This function returns all unordered pairs of two elements from a vector.
    getPairs(v)
v |
A vector. |
A list of vectors of length 2.
    v <- c('ASD', 'VBN', 'HJKL')
    getPairs(v)
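For comparison, base R can enumerate the same unordered pairs with `combn`. This is an equivalent sketch and makes no claim about how `getPairs` is implemented.

```r
v <- c("ASD", "VBN", "HJKL")

# simplify = FALSE returns a list with one length-2 vector per pair.
pairs <- combn(v, 2, simplify = FALSE)
```

A vector of n elements yields n(n - 1)/2 pairs, so the number of overlaps to assess grows quadratically with the size of the gene set.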
This function illustrates the process of selecting the overlap rank cutoff by plotting rank frequencies against ranks and showcasing the convex hull of the rank-frequency points.
    overlapCutoffPlot(
      overlapDF,
      title = "Overlap cutoff plot",
      palette = c("purple", "yellow"),
      hullWidth = 0.8,
      xLab = "Overlap rank",
      yLab = "Frequency",
      legendLabs = c("Accepted overlaps", "Discarded overlaps"),
      pointShape = 24,
      ...
    )
overlapDF |
Processed overlap data frame created
with |
title |
Plot title. |
palette |
Color palette. Must have two colors, the first one representing accepted overlaps and the other representing discarded overlaps. |
hullWidth |
Width of the convex hull. |
xLab |
x axis label. |
yLab |
y axis label. |
legendLabs |
Legend labels. |
pointShape |
Point shape. |
... |
Additional arguments passed to |
A wrapper around henna::hullPlot.
A ggplot object.
    overlapDF <- data.frame(gene1=paste0('G', c(1, 3, 7, 6, 8, 2, 4, 3, 4, 5)),
                            gene2=paste0('G', c(2, 7, 2, 5, 4, 5, 1, 2, 2, 8)),
                            rank=c(1, 2, 3, 4, 4, 6, 7, 7, 7, 10))
    overlapCutoffPlot(overlapDF)
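The two ingredients of the plot, rank frequencies and their convex hull, can be computed in base R as sketched below, using the ranks from the example. How CSOA derives the rank cutoff from the hull is not described here, so the sketch stops at the hull itself; the data frame name `freqDF` is hypothetical.

```r
ranks <- c(1, 2, 3, 4, 4, 6, 7, 7, 7, 10)

# Frequency of each distinct rank.
freqDF <- as.data.frame(table(rank = ranks),
                        responseName = "frequency",
                        stringsAsFactors = FALSE)
freqDF$rank <- as.numeric(freqDF$rank)

# Row indices of the points lying on the convex hull of (rank, frequency).
hullIdx <- grDevices::chull(freqDF$rank, freqDF$frequency)
hullPoints <- freqDF[hullIdx, ]
```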
This function gets all genes from an overlap data frame.
    overlapGenes(overlapDF, components = NULL)
overlapDF |
Overlap data frame. |
components |
A numeric vector representing the connected components of the overlap data frame graph. |
A character vector of genes.
    overlapDF <- data.frame(gene1 = paste0('G', c(1, 2, 3)),
                            gene2 = paste0('G', c(2, 7, 8)))
    overlapGenes(overlapDF)
This function plots the graph of the overlap data frame, with genes as vertices and overlaps as edges.
    overlapNetworkPlot(
      overlapDF,
      title = "Top overlaps network plot",
      nodeColor = "orange",
      edgeColor = "green4",
      ...
    )
overlapDF |
Overlap data frame. |
title |
Plot title. |
nodeColor |
The color of nodes. If |
edgeColor |
The color of edges. |
... |
Additional parameters passed to |
A thin wrapper around henna::networkPlot.
An overlap network plot.
    overlapDF <- data.frame(gene1 = paste0('G', c(1, 2, 5, 6, 7, 17)),
                            gene2 = paste0('G', c(2, 5, 8, 11, 11, 11)),
                            rank = c(1, 1, 3, 3, 3, 3))
    overlapNetworkPlot(overlapDF)
This function extracts the gene pairs from an overlap data frame.
    overlapPairs(overlapDF)
overlapDF |
Overlap data frame. |
A list of gene pairs.
    overlapDF <- data.frame(gene1 = paste0('G', c(1, 2, 3)),
                            gene2 = paste0('G', c(2, 7, 8)))
    overlapPairs(overlapDF)
This function constructs, for each gene in the expression matrix, a set of cells expressing the gene at or above the input percentile.
    percentileSets(geneSetExp, percentile = 90)
geneSetExp |
A gene expression non-sparse matrix with the rows restricted to the genes for which cell sets will be computed. |
percentile |
A positive number under 100. |
A named list of character vectors of length equaling the number of input genes. Each vector stores the cells expressing the gene at or above the input percentile.
    mat <- matrix(0, 1000, 500)
    rownames(mat) <- paste0('G', seq(1000))
    colnames(mat) <- paste0('C', seq(500))
    mat[sample(length(mat), 70000)] <- sample(50, 70000, TRUE)
    mat <- mat[paste0('G', sample(1000, 3)), ]
    percentileSets(mat)
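The construction of percentile-based cell sets can be sketched in base R. The sketch assumes the threshold is the default `quantile` estimate computed over all cells, including those with zero expression; `percentileSetsSketch` is a hypothetical name, and the package may treat zeros or ties differently.

```r
percentileSetsSketch <- function(geneSetExp, percentile = 90) {
  sets <- lapply(rownames(geneSetExp), function(g) {
    x <- geneSetExp[g, ]
    # Expression value at the requested percentile for this gene.
    threshold <- quantile(x, percentile / 100, names = FALSE)
    # Cells expressing the gene at or above the threshold.
    names(x)[x >= threshold]
  })
  setNames(sets, rownames(geneSetExp))
}

mat <- rbind(G1 = 1:10, G2 = c(rep(0, 8), 5, 5))
colnames(mat) <- paste0("C", 1:10)
sets <- percentileSetsSketch(mat)
```

Under this assumption, a gene expressed in fewer than 10% of cells would have a threshold of zero at the default percentile and would therefore select every cell, which is one reason an actual implementation may add further handling.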
This function filters, ranks and scores previously generated overlaps of cell sets.
    processOverlaps(
      overlapDF,
      mtMethod = c("BY", "BH"),
      jaccardCutoff = NULL,
      osMethod = c("log", "minmax"),
      ...
    )
overlapDF |
Overlap data frame. |
mtMethod |
Multiple testing correction method. Choose between Benjamini-Yekutieli ('BY') and Benjamini-Hochberg('BH'). Default is 'BY'. |
jaccardCutoff |
A cutoff used in the filtering of edges with low
Jaccard scores. If |
osMethod |
Method used to compute overlap scores. Options are "log" and "minmax". |
... |
Additional arguments passed to |
Wrapper around byCorrectDF, rankOverlaps,
prepareFiltering, filterOverlaps and scoreOverlaps.
If jaccardCutoff is not NULL, it also calls
breakWeakTies between filterOverlaps and scoreOverlaps.
A data frame consisting of filtered, ranked and scored cell set overlaps.
    overlapDF <- data.frame(gene1=paste0('G', c(1, 3, 7, 6, 8, 2, 4, 3, 4, 5)),
                            gene2=paste0('G', c(2, 7, 2, 5, 4, 5, 1, 2, 2, 8)),
                            ratio=runif(10, 2, 10),
                            pval=runif(10, 0, 1e-10))
    processOverlaps(overlapDF)
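The difference between the two `mtMethod` options can be seen with base R's `p.adjust`, which implements both corrections. Whether CSOA calls `p.adjust` internally is an assumption; the p-values below are made up for illustration.

```r
pvals <- c(1e-6, 1e-4, 0.01, 0.04, 0.2)

# Benjamini-Yekutieli: valid under arbitrary dependence, more conservative.
adjBY <- p.adjust(pvals, method = "BY")

# Benjamini-Hochberg: assumes independence or positive dependence.
adjBH <- p.adjust(pvals, method = "BH")

comparison <- data.frame(pval = pvals, BY = adjBY, BH = adjBH)
```

Because cell sets built from the same cells are not independent, the more conservative 'BY' correction is a natural default, at the cost of retaining fewer overlaps than 'BH'.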
This function reads a .qs2 file, deletes it, and returns its content.
    qGrab(qs2File)
qs2File |
Name of .qs2 file with path. |
The content of the .qs2 file.
    library(qs2)
    qs_save(c(1, 2, 3), 'temp.qs2')
    qGrab('temp.qs2')
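The read-delete-return pattern is shown below with base R's RDS format standing in for qs2, so that the sketch runs without extra packages. `grabRDS` is a hypothetical helper and not part of CSOA.

```r
grabRDS <- function(path) {
  content <- readRDS(path)   # read the stored object
  file.remove(path)          # delete the file once it has been read
  content                    # return what was stored
}

tmp <- tempfile(fileext = ".rds")
saveRDS(c(1, 2, 3), tmp)
x <- grabRDS(tmp)
```

Since the file is gone after the call, such a function suits temporary intermediate files rather than results that need to be kept.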
This function generates cell set overlaps for input gene sets based on percentiles of gene expression, computes the significance of these overlaps, ranks, filters and scores the overlaps, and builds a per-cell score by summing the products of overlap scores and the min-max-normalized expression of the corresponding pairs of genes.
    runCSOA(
      scObj,
      geneSets,
      percentile = 90,
      mtMethod = c("BY", "BH"),
      jaccardCutoff = NULL,
      osMethod = c("log", "minmax"),
      overlapFileName = NULL,
      pairFileTemplate = NULL,
      keepOverlapOrder = FALSE,
      ...
    )
scObj |
A Seurat object, SingleCellExperiment object, or expression matrix. |
geneSets |
Named list of character vectors of which each must contain at least two genes. |
percentile |
A positive number under 100. |
mtMethod |
Multiple testing correction method. Choose between Benjamini-Yekutieli ('BY') and Benjamini-Hochberg('BH'). Default is 'BY'. |
jaccardCutoff |
A cutoff used in the filtering of edges with low
Jaccard scores. If |
osMethod |
Method used to compute overlap scores. Options are "log" and "minmax". |
overlapFileName |
The name of the file where the overlap data frame
will be saved. This option can be used to save time when performing
exploratory analyses such as trying different |
pairFileTemplate |
Character object used in the naming of the files
where the pair data frames will be saved. Default is |
keepOverlapOrder |
Keep the rank-based order of overlaps in the
pair score file, as opposed to changing it to a pair score-based order.
Ignored if |
... |
Additional arguments. |
Wrapper around expMat, generateOverlaps,
scoreCells and attachCellScores.
An object of the same class as scObj with per-gene-set CSOA scores assigned for each cell.
    mat <- matrix(0, 500, 300)
    rownames(mat) <- paste0('G', seq(500))
    colnames(mat) <- paste0('C', seq(300))
    mat[sample(8000)] <- runif(8000, max=13)
    genes <- paste0('G', seq(200))
    mat[genes, 20:50] <- matrix(runif(200 * 31, min = 14, max = 15),
                                nrow = 200, ncol = 31)
    geneSet1 <- paste0('G', seq(1, 150))
    geneSet2 <- paste0('G', seq(50, 200))
    df <- runCSOA(mat, list(a = geneSet1, b = geneSet2))
    head(df)
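The final scoring step described for `runCSOA` can be sketched in base R. The sketch reads "summing the products of overlap scores and the min-max-normalized expression of the corresponding pairs of genes" as a three-way product per overlap, summed over overlaps; that reading, the `score` column name and the helpers `minMax` and `cellScoresSketch` are assumptions, and any further normalization done by the package is omitted.

```r
# Rescale a vector to [0, 1]; constant vectors map to 0.
minMax <- function(x) {
  r <- range(x)
  if (r[1] == r[2]) rep(0, length(x)) else (x - r[1]) / (r[2] - r[1])
}

cellScoresSketch <- function(geneSetExp, overlapDF) {
  normExp <- t(apply(geneSetExp, 1, minMax))
  dimnames(normExp) <- dimnames(geneSetExp)
  scores <- numeric(ncol(geneSetExp))
  for (i in seq_len(nrow(overlapDF))) {
    # Overlap score times the normalized expression of both genes.
    scores <- scores + overlapDF$score[i] *
      normExp[overlapDF$gene1[i], ] * normExp[overlapDF$gene2[i], ]
  }
  setNames(scores, colnames(geneSetExp))
}

mat <- rbind(G1 = c(0, 1, 2, 4), G2 = c(0, 0, 3, 3), G3 = c(2, 2, 2, 6))
colnames(mat) <- paste0("C", 1:4)
overlapDF <- data.frame(gene1 = c("G1", "G2"), gene2 = c("G2", "G3"),
                        score = c(1, 0.5), stringsAsFactors = FALSE)
scores <- cellScoresSketch(mat, overlapDF)
```

A cell scores highly only when both genes of a high-scoring overlap are strongly expressed in it, which is why the product rather than the sum of the two expression values is used in this reading.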
This function scores an overlap data frame using its associated list of pairs. The overlap data frame is split based on the overlaps corresponding to each gene set and scored, and the output is rejoined as a data frame.
    scoreCells(
      geneSetExp,
      overlapDF,
      setPairs,
      geneSetNames,
      mtMethod = c("BY", "BH"),
      jaccardCutoff = NULL,
      osMethod = c("log", "minmax"),
      pairFileTemplate = NULL,
      keepOverlapOrder = FALSE,
      ...
    )
geneSetExp |
A gene expression non-sparse matrix with the rows restricted to the genes for which cell sets will be computed. |
overlapDF |
Overlap data frame. |
setPairs |
A list of overlaps corresponding to each input gene set. |
geneSetNames |
Character vector of names of gene sets. |
mtMethod |
Multiple testing correction method. Choose between Benjamini-Yekutieli ('BY') and Benjamini-Hochberg('BH'). Default is 'BY'. |
jaccardCutoff |
A cutoff used in the filtering of edges with low
Jaccard scores. If |
osMethod |
Method used to compute overlap scores. Options are "log" and "minmax". |
pairFileTemplate |
Character object used in the naming of the files
where the pair data frames will be saved. Default is |
keepOverlapOrder |
Keep the rank-based order of overlaps in the
pair score file, as opposed to changing it to a pair score-based order.
Ignored if pairFileTemplate is |
... |
Additional arguments passed to |
This function calls scoreCells to score each gene set
data frame split from the full overlap data frame.
A data frame whose columns correspond to the CSOA scores of the input gene sets.
    mat <- matrix(0, 500, 300)
    rownames(mat) <- paste0('G', seq(500))
    colnames(mat) <- paste0('C', seq(300))
    mat[sample(8000)] <- runif(8000, max=13)
    genes <- paste0('G', seq(200))
    mat[genes, 20:50] <- matrix(runif(200 * 31, min=14, max=15),
                                nrow=200, ncol=31)
    geneSet1 <- paste0('G', seq(1, 150))
    geneSet2 <- paste0('G', seq(50, 200))
    geneSets <- list(geneSet1, geneSet2)
    geneSets <- lapply(geneSets, sort)
    setPairs <- lapply(geneSets, getPairs)
    pairs <- Reduce(union, setPairs)
    genes <- union(geneSet1, geneSet2)
    mat <- mat[genes, ]
    overlapDF <- generateOverlaps(mat, pairs=pairs)
    scoreDF <- scoreCells(mat, overlapDF, setPairs, c('set1', 'set2'))
    head(scoreDF)
This function runs CSOA on the connected components of the graph having the filtered overlaps as edges.
    scoreModules(
      scObj,
      networkDF,
      components,
      colStrTemplate = "CSOA_component",
      ...
    )
scObj |
A Seurat object, SingleCellExperiment object, or expression matrix. |
networkDF |
A data frame with |
components |
A numeric vector representing the connected components of the overlap data frame graph. |
colStrTemplate |
Character used in the naming of the component gene sets. |
... |
Additional parameters passed to |
An object of the same class as scObj in which each cell is assigned one CSOA score per connected component, computed from the genes that define that component.
    mat <- matrix(0, 500, 300)
    rownames(mat) <- paste0('G', seq(500))
    colnames(mat) <- paste0('C', seq(300))
    mat[sample(8000)] <- runif(8000, max=13)
    genes1 <- paste0('G', seq(100))
    mat[genes1, 20:50] <- matrix(runif(100 * 31, min = 14, max = 15),
                                 nrow = 100, ncol = 31)
    genes2 <- paste0('G', seq(101, 200))
    mat[genes2, 70:100] <- matrix(runif(100 * 31, min = 14, max = 15),
                                  nrow = 100, ncol = 31)
    genes <- union(genes1, genes2)
    mat <- mat[genes, ]
    overlapDF <- generateOverlaps(mat)
    overlapDF <- processOverlaps(overlapDF)
    overlapDF <- henna::connectedComponents(overlapDF)
    df <- scoreModules(mat, overlapDF, unique(overlapDF$component))[[2]]
    head(df)
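The way connected components translate into gene sets can be sketched without extra packages. The label-propagation routine below is a simple base R stand-in for `henna::connectedComponents`, not its implementation; `componentsSketch` is a hypothetical name and the `component` column name follows the example above.

```r
componentsSketch <- function(edges) {
  genes <- union(edges$gene1, edges$gene2)
  label <- setNames(seq_along(genes), genes)
  # Propagate the smallest label along edges until nothing changes.
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(edges))) {
      a <- edges$gene1[i]
      b <- edges$gene2[i]
      m <- min(label[[a]], label[[b]])
      if (label[[a]] != m || label[[b]] != m) {
        label[c(a, b)] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # Renumber the components as 1, 2, ... in order of appearance.
  edges$component <- match(label[edges$gene1], unique(label))
  edges
}

edges <- data.frame(gene1 = c("G1", "G2", "G4"),
                    gene2 = c("G2", "G3", "G5"),
                    stringsAsFactors = FALSE)
comp <- componentsSketch(edges)

# One gene set per component, as would then be scored separately.
geneSets <- lapply(split(comp, comp$component),
                   function(df) union(df$gene1, df$gene2))
```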