Title: | AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data (e.g. identify cells with specific gene signatures) |
---|---|
Description: | AUCell allows to identify cells with active gene sets (e.g. signatures, gene modules...) in single-cell RNA-seq data. AUCell uses the "Area Under the Curve" (AUC) to calculate whether a critical subset of the input gene set is enriched within the expressed genes for each cell. The distribution of AUC scores across all the cells allows exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed. |
Authors: | Sara Aibar, Stein Aerts. Laboratory of Computational Biology. VIB-KU Leuven Center for Brain & Disease Research. Leuven, Belgium. |
Maintainer: | Gert Hulselmans <[email protected]> |
License: | GPL-3 |
Version: | 1.29.0 |
Built: | 2024-10-30 03:34:25 UTC |
Source: | https://github.com/bioc/AUCell |
Assigns whether the gene sets are considered "active" on each cell based on the given thresholds
AUCell_assignCells(cellsAUC, thresholds, nCores = 1)
AUCell_assignCells(cellsAUC, thresholds, nCores = 1)
cellsAUC |
AUC object returned by |
thresholds |
Thresholds selected for each geneset (named vector). |
nCores |
Number of cores to use for computation. |
List with the following elements for each gene-set:
'aucThr' threshold value, in the same format as AUCell_exploreThresholds()
'assignment' List of cells that pass the selected AUC threshold
Previous step in the workflow: AUCell_calcAUC
and optionally AUCell_exploreThresholds
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ######### Previous steps in the workflow ######### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene sets: random genes) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## ############## Step 3: Assign cells ############## # 1. Plot histograms and obtain some pre-computed thresholds # (this example is only meant to show the interface/arguments of the function, # see the vignette for meaningful examples) set.seed(123) par(mfrow=c(1,2)) # Plot is divided into one row and two columns thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=TRUE) thresholds$geneSet1$aucThr # 2. Obtain cells over a given threshold: names(which(getAUC(cells_AUC)["geneSet1",] > 0.19)) # Alternative 1: assign cells according to the 'automatic' threshold cells_assignment <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE, assignCells=TRUE) # Cells assigned: getAssignments(cells_assignment) # Threshold applied: getThresholdSelected(cells_assignment) # Alternative 2: choose a threshold manually and assign cells newThresholds = getThresholdSelected(cells_assignment) newThresholds['geneSet1'] = 0.8 newAssignments = AUCell_assignCells(cells_AUC, newThresholds) getAssignments(newAssignments)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ######### Previous steps in the workflow ######### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene sets: random genes) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## ############## Step 3: Assign cells ############## # 1. Plot histograms and obtain some pre-computed thresholds # (this example is only meant to show the interface/arguments of the function, # see the vignette for meaningful examples) set.seed(123) par(mfrow=c(1,2)) # Plot is divided into one row and two columns thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=TRUE) thresholds$geneSet1$aucThr # 2. Obtain cells over a given threshold: names(which(getAUC(cells_AUC)["geneSet1",] > 0.19)) # Alternative 1: assign cells according to the 'automatic' threshold cells_assignment <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE, assignCells=TRUE) # Cells assigned: getAssignments(cells_assignment) # Threshold applied: getThresholdSelected(cells_assignment) # Alternative 2: choose a threshold manually and assign cells newThresholds = getThresholdSelected(cells_assignment) newThresholds['geneSet1'] = 0.8 newAssignments = AUCell_assignCells(cells_AUC, newThresholds) getAssignments(newAssignments)
Builds the "rankings" for each cell: expression-based ranking for all the genes in each cell.
The genes with same expression value are shuffled. Therefore, genes with expression '0' are randomly sorted at the end of the ranking.
These "rankings" can be seen as a new representation of the original dataset. Once they are calculated, they can be saved for future analyses.
AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL, ... ) ## S4 method for signature 'dgCMatrix' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = TRUE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'matrix' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'ExpressionSet' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'SummarizedExperiment' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, assayName = NULL, nCores = NULL, mctype = NULL )
AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL, ... ) ## S4 method for signature 'dgCMatrix' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = TRUE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'matrix' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'ExpressionSet' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, nCores = NULL, mctype = NULL ) ## S4 method for signature 'SummarizedExperiment' AUCell_buildRankings( exprMat, featureType = "genes", plotStats = TRUE, splitByBlocks = FALSE, BPPARAM = NULL, keepZeroesAsNA = FALSE, verbose = TRUE, assayName = NULL, nCores = NULL, mctype = NULL )
exprMat |
Expression matrix (genes as rows, cells as columns) The expression matrix can also be provided as one of the R/Bioconductor classes:
|
featureType |
Name for the rows (e.g. "genes"). Only for naming the rankings, not used internally. |
plotStats |
Should the function plot the expression boxplots/histograms?
(TRUE / FALSE). These plots can also be produced
with the function |
splitByBlocks |
Whether to split the matrix by blocks in the ranking calculation. Allows using multiple cores. FALSE by default. If using sparse matrices it is automatically set to TRUE. |
BPPARAM |
Set to use multiple cores. Only used if 'splitByBlocks=TRUE' |
keepZeroesAsNA |
Convert zeroes to NA instead of locating randomly at the end of the ranking. |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
nCores |
Deprecated |
mctype |
Deprecated |
... |
Other arguments |
assayName |
Name of the assay containing the expression matrix (e.g. in SingleCellExperiment objects) |
It is important to check that most cells have at least the number of expressed/detected genes that are going to be used to calculate the AUC ('aucMaxRank' in 'calcAUC()'). The histogram provided by 'AUCell_buildRankings()' allows to quickly check this distribution. 'plotGeneCount(exprMatrix)' allows to obtain only the plot before building the rankings.
Ranking of the feature within the cell (features as rows, cells as columns)
Next step in the workflow: AUCell_calcAUC
.
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) ################################################## cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=TRUE) cells_rankings
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) ################################################## cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=TRUE) cells_rankings
Calculates the 'AUC' for each gene-set in each cell.
AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'list' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'character' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'GeneSet' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'GeneSetCollection' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE )
AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'list' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'character' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'GeneSet' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE ) ## S4 method for signature 'GeneSetCollection' AUCell_calcAUC( geneSets, rankings, nCores = 1, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(rankings)), verbose = TRUE )
geneSets |
List of gene-sets (or signatures) to test in the cells.
The gene-sets should be provided as |
rankings |
'Rankings' created for this dataset with
|
nCores |
Number of cores to use for computation. |
normAUC |
Wether to normalize the maximum possible AUC to 1 (Default: TRUE). |
aucMaxRank |
Threshold to calculate the AUC (see 'details' section) |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
In a simplified way, the AUC value represents the fraction of genes, within the top X genes in the ranking, that are included in the signature. The parameter 'aucMaxRank' allows to modify the number of genes (maximum ranking) that is used to perform this computation. By default, it is set to 5% of the total number of genes in the rankings. Common values may range from 1 to 20%.
Matrix with the AUC values (gene-sets as rows, cells as columns).
Previous step in the workflow: AUCell_buildRankings
.
Next step in the workflow: AUCell_exploreThresholds
.
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) ################################################## ######### Previous step in the workflow ########## # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix) ################################################## ############## Step 2: Calculate AUC ############# # In this example we use two gene sets: 10 and 5 random genes # (see other formatting examples at the end) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets # Calculate AUC with the rankings from Step 1. # To be able to run this fake example (which contain only 20 genes), # we use aucMaxRank=5 (top 25% of the genes in the ranking) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) # Format of the output: cells_AUC # To subset & access the AUC slot (as matrix): cells_AUC[1:2,] cells_AUC[,3:4] getAUC(cells_AUC)[,1:5] # These methods are also available: dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC)[1:4] rownames(cells_AUC) ######################################################### # Alternatives for the input of gene sets: # a) Character vector (i.e. only one gene-set) # It will take the default name 'geneSet' fewGenes test <- AUCell_calcAUC(fewGenes, cells_rankings, aucMaxRank=5) # b) List geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets test <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) # c) GeneSet object (from GSEABase) library(GSEABase) geneSetOne <- GeneSet(fewGenes, setName="geneSetOne") geneSetOne test <- AUCell_calcAUC(geneSetOne, cells_rankings, aucMaxRank=5) # d) GeneSetCollection object (from GSEABase) geneSetTwo <- GeneSet(otherGenes, setName="geneSetTwo") geneSets <- GeneSetCollection(geneSetOne, geneSetTwo) geneSets test <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) ################################################## ######### Previous step in the workflow ########## # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix) ################################################## ############## Step 2: Calculate AUC ############# # In this example we use two gene sets: 10 and 5 random genes # (see other formatting examples at the end) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets # Calculate AUC with the rankings from Step 1. # To be able to run this fake example (which contain only 20 genes), # we use aucMaxRank=5 (top 25% of the genes in the ranking) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) # Format of the output: cells_AUC # To subset & access the AUC slot (as matrix): cells_AUC[1:2,] cells_AUC[,3:4] getAUC(cells_AUC)[,1:5] # These methods are also available: dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC)[1:4] rownames(cells_AUC) ######################################################### # Alternatives for the input of gene sets: # a) Character vector (i.e. only one gene-set) # It will take the default name 'geneSet' fewGenes test <- AUCell_calcAUC(fewGenes, cells_rankings, aucMaxRank=5) # b) List geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets test <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) # c) GeneSet object (from GSEABase) library(GSEABase) geneSetOne <- GeneSet(fewGenes, setName="geneSetOne") geneSetOne test <- AUCell_calcAUC(geneSetOne, cells_rankings, aucMaxRank=5) # d) GeneSetCollection object (from GSEABase) geneSetTwo <- GeneSet(otherGenes, setName="geneSetTwo") geneSets <- GeneSetCollection(geneSetOne, geneSetTwo) geneSets test <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5)
Plots all the AUC histograms (per gene-set) and calculates several likely thresholds for each gene-set
AUCell_exploreThresholds( cellsAUC, thrP = 0.01, nCores = 1, smallestPopPercent = 0.25, plotHist = TRUE, densAdjust = 2, assignCells = FALSE, nBreaks = 100, verbose = TRUE ) getThresholdSelected(aucellThresholds) getAssignments(aucellThresholds)
AUCell_exploreThresholds( cellsAUC, thrP = 0.01, nCores = 1, smallestPopPercent = 0.25, plotHist = TRUE, densAdjust = 2, assignCells = FALSE, nBreaks = 100, verbose = TRUE ) getThresholdSelected(aucellThresholds) getAssignments(aucellThresholds)
cellsAUC |
AUC object returned by |
thrP |
Probability to determine outliers in some of the distributions (see 'details' section). By default it is set to 1% (thrP): if there are 3000 cells in the dataset, it is expected that approximately 30 cells are over this threshold if the AUC is normally distributed. |
nCores |
Number of cores to use for computation. |
smallestPopPercent |
Size (percentage) of the smallest population of cells expected. Used to calculate some of the thresholds. |
plotHist |
Whether to plot the AUC histograms. (TRUE / FALSE) |
densAdjust |
Parameter for the density curve. (See |
assignCells |
Return the list of cells that pass the automatically selected threshold? (TRUE/FALSE) |
nBreaks |
Number of bars to plot in the histograms. |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
aucellThresholds |
For aux functions: Output from AUCell_exploreThresholds |
To ease the selection of an assignment theshold, this function adjusts the AUCs of each gene-set to several distributions and calculates possible thresholds:
minimumDens
(plot in Blue):
Inflection point of the density curve.
This is usually a good option for the ideal situation
with bimodal distributions.
To avoid false positives, by default this threshold will not be chosen if the second distribution is higher (i.e. the majority of cells have the gene-set "active").
L_k2
(plot in Red): Left distribution,
after adjusting the AUC to a mixture of two distributions.
The threshold is set to the right (prob: 1-(thrP/nCells)
).
Only available if 'mixtools' package is installed.
R_k3
(plot in Pink): Right distribution,
after adjusting the AUC to a mixture of three distributions.
The threshold is set to the left (prob: thrP
).
Only available if 'mixtools' package is installed.
Global_k1
(plot in Grey): "global" distribution
(i.e. mean and standard deviations of all cells).
The threshold is set to the right (prob: 1-(thrP/nCells)
).
The threshold based on the global distribution is ignored from the automatic selection unless the mixed models are overlapping.
Note: If assignCells=TRUE
, the highest threshold is used to select cells.
However, keep in mind that this function is only meant to ease the selection
of the threshold, and we highly recommend to look at the AUC histograms and
adjust the threshold manually if needed.
We recommend to be specially aware on gene-sets with few genes (10-15) and
thresholds that are set extremely low.
List with the following elements for each gene-set:
'aucThr' Thresholds calculated with each method (see 'details' section), and the number of cells that would be assigned using that threshold.
If assignCells=TRUE, the threshold selected automatically is the highest value (in most cases, excluding the global distribution).
'assignment' List of cells that pass the selected AUC threshold (if assignCells=TRUE
)
If plotHist=TRUE
the AUC histogram is also plot,
including the distributions calculated and the corresponding thresholds
in the same color (dashed vertical lines).
The threshold that is automatically selected is shown as a thicker
non-dashed vertical line.
Previous step in the workflow: AUCell_calcAUC
.
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ######### Previous steps in the workflow ######### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene sets: random genes) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## ############## Step 3: Assign cells ############## # 1. Plot histograms and obtain some pre-computed thresholds # (this example is only meant to show the interface/arguments of the function, # see the vignette for meaningful examples) set.seed(123) par(mfrow=c(1,2)) # Plot is divided into one row and two columns thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=TRUE) thresholds$geneSet1$aucThr # 2. Obtain cells over a given threshold: names(which(getAUC(cells_AUC)["geneSet1",] > 0.19)) # Alternative 1: assign cells according to the 'automatic' threshold cells_assignment <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE, assignCells=TRUE) # Cells assigned: getAssignments(cells_assignment) # Threshold applied: getThresholdSelected(cells_assignment) # Alternative 2: choose a threshold manually and assign cells newThresholds = getThresholdSelected(cells_assignment) newThresholds['geneSet1'] = 0.8 newAssignments = AUCell_assignCells(cells_AUC, newThresholds) getAssignments(newAssignments)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ######### Previous steps in the workflow ######### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene sets: random genes) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## ############## Step 3: Assign cells ############## # 1. Plot histograms and obtain some pre-computed thresholds # (this example is only meant to show the interface/arguments of the function, # see the vignette for meaningful examples) set.seed(123) par(mfrow=c(1,2)) # Plot is divided into one row and two columns thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=TRUE) thresholds$geneSet1$aucThr # 2. Obtain cells over a given threshold: names(which(getAUC(cells_AUC)["geneSet1",] > 0.19)) # Alternative 1: assign cells according to the 'automatic' threshold cells_assignment <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE, assignCells=TRUE) # Cells assigned: getAssignments(cells_assignment) # Threshold applied: getThresholdSelected(cells_assignment) # Alternative 2: choose a threshold manually and assign cells newThresholds = getThresholdSelected(cells_assignment) newThresholds['geneSet1'] = 0.8 newAssignments = AUCell_assignCells(cells_AUC, newThresholds) getAssignments(newAssignments)
Plots the distribution of AUC across the cells (for each gene-set) as an histogram.
AUCell_plotHist( cellsAUC, aucThr = max(cellsAUC), nBreaks = 100, onColor = "dodgerblue4", offColor = "slategray2", ... )
AUCell_plotHist( cellsAUC, aucThr = max(cellsAUC), nBreaks = 100, onColor = "dodgerblue4", offColor = "slategray2", ... )
cellsAUC |
Subset of the object returned by |
aucThr |
AUC value planned to use as threshold (to make sure the X axis includes it), if any. Otherwise, the X axis extends to cover only the AUC values plotted. |
nBreaks |
Number of 'bars' to plot (breaks argument for hist function). |
onColor |
Color for the bars that pass the AUC threshold |
offColor |
Color for the bars that do not pass the AUC threshold |
... |
Other arguments to pass to |
List of histogram objects (invisible).
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ############# Begining of the workflow ########### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene set: 10 random genes) genes <- sample(rownames(exprMatrix), 10) geneSets <- list(geneSet1=genes) # (aucMaxRank=5 to run with this fake example, it will return 'high' AUC values) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## # Plot histogram: AUCell_plotHist(cells_AUC["geneSet1",], nBreaks=10)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) ################################################## ############# Begining of the workflow ########### # Step 1. cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE) # Step 2. # (Gene set: 10 random genes) genes <- sample(rownames(exprMatrix), 10) geneSets <- list(geneSet1=genes) # (aucMaxRank=5 to run with this fake example, it will return 'high' AUC values) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5) ################################################## # Plot histogram: AUCell_plotHist(cells_AUC["geneSet1",], nBreaks=10)
Plots the AUC histogram and t-SNE coloured by AUC, binary activity and TF expression
AUCell_plotTSNE( tSNE, exprMat = NULL, cellsAUC = NULL, thresholds = NULL, reorderGeneSets = FALSE, cex = 1, alphaOn = 1, alphaOff = 0.2, borderColor = adjustcolor("lightgray", alpha.f = 0.1), offColor = "lightgray", plots = c("histogram", "binaryAUC", "AUC", "expression"), exprCols = c("goldenrod1", "darkorange", "brown"), asPNG = FALSE, ... )
AUCell_plotTSNE( tSNE, exprMat = NULL, cellsAUC = NULL, thresholds = NULL, reorderGeneSets = FALSE, cex = 1, alphaOn = 1, alphaOff = 0.2, borderColor = adjustcolor("lightgray", alpha.f = 0.1), offColor = "lightgray", plots = c("histogram", "binaryAUC", "AUC", "expression"), exprCols = c("goldenrod1", "darkorange", "brown"), asPNG = FALSE, ... )
tSNE |
t-SNE coordinates (e.g. |
exprMat |
Expression matrix |
cellsAUC |
AUC (as returned by calcAUC) |
thresholds |
Thresholds returned by AUCell |
reorderGeneSets |
Whether to reorder the gene sets based on AUC similarity |
cex |
Scaling factor for the dots in the scatterplot |
alphaOn |
Transparency for the dots representing "active" cells |
alphaOff |
Transparency for the dots representing "inactive" cells |
borderColor |
Border color for the dots (scatterplot) |
offColor |
Color for the dots representing "inactive" cells |
plots |
Which plots to generate? Select one or multiple: |
exprCols |
Color scale for the expression |
asPNG |
Output each individual plot in a .png file? (can also be a directory) |
... |
Other arguments to pass to |
To avoid calculating thresholds, set thresholds to FALSE
Returns invisible: cells_trhAssignment
List of vignettes included in the package: vignette(package="AUCell")
###### # Fake run of AUCell set.seed(123) exprMatrix <- matrix( data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats = FALSE) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) selectedThresholds <- rowMeans(getAUC(cells_AUC)) cellsTsne<- Rtsne::Rtsne(t(exprMatrix),max_iter = 10)$Y # cellsTsne<- tsne::tsne(t(exprMatrix),max_iter = 10) rownames(cellsTsne) <- colnames(exprMatrix) ###### par(mfrow=c(2,3)) thrs <- AUCell_plotTSNE(tSNE=cellsTsne, exprMat=NULL, cellsAUC=cells_AUC, thresholds=selectedThresholds, plots = c("histogram", "binaryAUC", "AUC")) ##### # Color based on the known phenodata: cellInfo <- data.frame(cellType1=sample(LETTERS[1:3],ncol(exprMatrix), replace=TRUE), cellType2=sample(letters[5:7],ncol(exprMatrix), replace=TRUE), nGenes=abs(rnorm(ncol(exprMatrix))), row.names=colnames(exprMatrix)) colVars <- list(cellType2=setNames(c("skyblue","magenta", "darkorange"),letters[5:7])) # dev.off() plotTsne_cellProps(cellsTsne, cellInfo, colVars=colVars)
###### # Fake run of AUCell set.seed(123) exprMatrix <- matrix( data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats = FALSE) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) selectedThresholds <- rowMeans(getAUC(cells_AUC)) cellsTsne<- Rtsne::Rtsne(t(exprMatrix),max_iter = 10)$Y # cellsTsne<- tsne::tsne(t(exprMatrix),max_iter = 10) rownames(cellsTsne) <- colnames(exprMatrix) ###### par(mfrow=c(2,3)) thrs <- AUCell_plotTSNE(tSNE=cellsTsne, exprMat=NULL, cellsAUC=cells_AUC, thresholds=selectedThresholds, plots = c("histogram", "binaryAUC", "AUC")) ##### # Color based on the known phenodata: cellInfo <- data.frame(cellType1=sample(LETTERS[1:3],ncol(exprMatrix), replace=TRUE), cellType2=sample(letters[5:7],ncol(exprMatrix), replace=TRUE), nGenes=abs(rnorm(ncol(exprMatrix))), row.names=colnames(exprMatrix)) colVars <- list(cellType2=setNames(c("skyblue","magenta", "darkorange"),letters[5:7])) # dev.off() plotTsne_cellProps(cellsTsne, cellInfo, colVars=colVars)
Runs AUCell (calculates the ranking + score genesets)
AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL, ... ) ## S4 method for signature 'dgCMatrix' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL ) ## S4 method for signature 'matrix' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL ) ## S4 method for signature 'SummarizedExperiment' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL, assayName = NULL )
AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL, ... ) ## S4 method for signature 'dgCMatrix' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL ) ## S4 method for signature 'matrix' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL ) ## S4 method for signature 'SummarizedExperiment' AUCell_run( exprMat, geneSets, featureType = "genes", keepZeroesAsNA = FALSE, normAUC = TRUE, aucMaxRank = ceiling(0.05 * nrow(exprMat)), BPPARAM = NULL, assayName = NULL )
exprMat |
Expression matrix (genes/regions as rows, cells as columns) The expression matrix can also be provided as one of the R/Bioconductor classes:
|
geneSets |
List of gene-sets (or signatures) to test in the cells.
The gene-sets should be provided as |
featureType |
Name for the rows (e.g. "genes"). Only for naming the rankings, not used internally. |
keepZeroesAsNA |
Convert zeroes to NA instead of locating randomly at the end of the ranking. |
normAUC |
Whether to normalize the maximum possible AUC to 1 (Default: TRUE). |
aucMaxRank |
Threshold to calculate the AUC (see 'details' section) |
BPPARAM |
Set to use multiple cores. Only used if 'splitByBlocks=TRUE' |
... |
Other arguments |
assayName |
Name of the assay containing the expression matrix (e.g. in SingleCellExperiment objects) |
In a simplified way, the AUC value represents the fraction of genes, within the top X genes in the ranking, that are included in the signature. The parameter 'aucMaxRank' allows to modify the number of genes (maximum ranking) that is used to perform this computation. By default, it is set to 5% of the total number of genes in the rankings. Common values may range from 1 to 20%.
Matrix with the AUC values (gene-sets as rows, cells as columns).
Includes AUCell_buildRankings
and AUCell_calcAUC
.
Next step in the workflow: AUCell_exploreThresholds
.
See the package vignette for examples and more details:
vignette("AUCell")
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) exprMatrix <- as(exprMatrix, "dgCMatrix") # In this example we use two gene sets: 10 and 5 random genes # (see other formatting examples at the end) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets # Calculate AUCell score for the genes in the sets # To be able to run this fake example (which contain only 20 genes), # we use aucMaxRank=5 (top 25% of the genes in the ranking) cells_AUC <- AUCell_run(exprMatrix, geneSets, aucMaxRank=5) ## To run in paralell: # cells_AUC <- AUCell_run(exprMatrix, geneSets, aucMaxRank=5, # BPPARAM=BiocParallel::MulticoreParam(5)) # Format of the output: cells_AUC # To subset & access the AUC slot (as matrix): cells_AUC[1:2,] cells_AUC[,3:4] getAUC(cells_AUC)[,1:5] # These methods are also available: dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC)[1:4] rownames(cells_AUC) ######################################################### # Alternatives for the input of gene sets: # a) Character vector (i.e. only one gene-set) # It will take the default name 'geneSet' fewGenes test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # b) List geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # c) GeneSet object (from GSEABase) library(GSEABase) geneSetOne <- GeneSet(fewGenes, setName="geneSetOne") geneSetOne test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # d) GeneSetCollection object (from GSEABase) geneSetTwo <- GeneSet(otherGenes, setName="geneSetTwo") geneSets <- GeneSetCollection(geneSetOne, geneSetTwo) geneSets test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake expression matrix ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) exprMatrix <- as(exprMatrix, "dgCMatrix") # In this example we use two gene sets: 10 and 5 random genes # (see other formatting examples at the end) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets # Calculate AUCell score for the genes in the sets # To be able to run this fake example (which contain only 20 genes), # we use aucMaxRank=5 (top 25% of the genes in the ranking) cells_AUC <- AUCell_run(exprMatrix, geneSets, aucMaxRank=5) ## To run in paralell: # cells_AUC <- AUCell_run(exprMatrix, geneSets, aucMaxRank=5, # BPPARAM=BiocParallel::MulticoreParam(5)) # Format of the output: cells_AUC # To subset & access the AUC slot (as matrix): cells_AUC[1:2,] cells_AUC[,3:4] getAUC(cells_AUC)[,1:5] # These methods are also available: dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC)[1:4] rownames(cells_AUC) ######################################################### # Alternatives for the input of gene sets: # a) Character vector (i.e. only one gene-set) # It will take the default name 'geneSet' fewGenes test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # b) List geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) geneSets test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # c) GeneSet object (from GSEABase) library(GSEABase) geneSetOne <- GeneSet(fewGenes, setName="geneSetOne") geneSetOne test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5) # d) GeneSetCollection object (from GSEABase) geneSetTwo <- GeneSet(otherGenes, setName="geneSetTwo") geneSets <- GeneSetCollection(geneSetOne, geneSetTwo) geneSets test <- AUCell_run(exprMatrix, fewGenes, aucMaxRank=5)
This class extends SummarizedExperiment to contain the AUC matrix and cell rankings (as 'assays').
The results are stored in the assays slot, but they can be accessed through the regular methods (i.e. nrow, rownames... )
Types:
- "AUC": The assays contains the AUC for the gene-sets (or region-sets) & cells.
- "ranking": The assays contains the gene rankings for each cell.
## S4 method for signature 'aucellResults' show(object) getAUC(object) ## S4 method for signature 'aucellResults' getAUC(object) getRanking(object) ## S4 method for signature 'aucellResults' getRanking(object) ## S4 method for signature 'aucellResults' cbind(..., deparse.level = 1) ## S4 method for signature 'aucellResults' rbind(..., deparse.level = 1)
## S4 method for signature 'aucellResults' show(object) getAUC(object) ## S4 method for signature 'aucellResults' getAUC(object) getRanking(object) ## S4 method for signature 'aucellResults' getRanking(object) ## S4 method for signature 'aucellResults' cbind(..., deparse.level = 1) ## S4 method for signature 'aucellResults' rbind(..., deparse.level = 1)
object |
Results from |
... |
(Only for cbind) or |
deparse.level |
(Only for cbind) |
show: Prints a summary of the object
getAUC: Returns the matrix containing the AUC
getRanking: Returns the matrix containing the rankings
cbind: Combines objects by columns (cbind on assays
); other other slots are conserved from the first object provided as argument.
Both, ranking and AUC are calculated by column (cell or sample). Therefore, it is fine to merge objects as long as they come from equivalent datasets (and keep same genes/genesets, etc...)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake run of AUCell ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) # Running AUCell cells_rankings <- AUCell_buildRankings(exprMatrix) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) ############################################## #Exploring the output: cells_AUC class(cells_AUC) # Extracting the AUC matrix: getAUC(cells_AUC)[,1:5] # Subsetting and regular manipulation methods are also available: cells_AUC[1:2,] cells_AUC[,3:4] dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC) rownames(cells_AUC) ### Merging 2 objects (ranking or AUC): sample1 <- cells_AUC[,10:20] sample2 <- cells_AUC[,100:140] cbind(sample1, sample2)
# This example is run using a fake expression matrix. # Therefore, the output will be meaningless. ############# Fake run of AUCell ############# set.seed(123) exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) dim(exprMatrix) # Running AUCell cells_rankings <- AUCell_buildRankings(exprMatrix) fewGenes <- sample(rownames(exprMatrix), 10) otherGenes <- sample(rownames(exprMatrix), 5) geneSets <- list(geneSet1=fewGenes, geneSet2=otherGenes) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) ############################################## #Exploring the output: cells_AUC class(cells_AUC) # Extracting the AUC matrix: getAUC(cells_AUC)[,1:5] # Subsetting and regular manipulation methods are also available: cells_AUC[1:2,] cells_AUC[,3:4] dim(cells_AUC) nrow(cells_AUC) ncol(cells_AUC) colnames(cells_AUC) rownames(cells_AUC) ### Merging 2 objects (ranking or AUC): sample1 <- cells_AUC[,10:20] sample2 <- cells_AUC[,100:140] cbind(sample1, sample2)
Returns the gene set name (i.e. selects the given pattern)
getSetNames(aucMat, patterns, startChr = "^", endChr = " |_")
getSetNames(aucMat, patterns, startChr = "^", endChr = " |_")
aucMat |
AUC matrix |
patterns |
patterns |
startChr |
Character to indicate the start (typically "^") |
endChr |
Character at the end of the gene set name, i.e. space or "_" after a transcription factor name |
Returns the gene set name (i.e. selects the given pattern)
Functions to manipulate GeneSet and GeneSetCollection objects (from package GSEABase)
nGenes(geneSet) ## S4 method for signature 'GeneSet' nGenes(geneSet) ## S4 method for signature 'GeneSetCollection' nGenes(geneSet) subsetGeneSets(geneSets, geneNames) ## S4 method for signature 'GeneSetCollection' subsetGeneSets(geneSets, geneNames) setGeneSetNames(geneSets, newNames) ## S4 method for signature 'GeneSetCollection' setGeneSetNames(geneSets, newNames)
nGenes(geneSet) ## S4 method for signature 'GeneSet' nGenes(geneSet) ## S4 method for signature 'GeneSetCollection' nGenes(geneSet) subsetGeneSets(geneSets, geneNames) ## S4 method for signature 'GeneSetCollection' subsetGeneSets(geneSets, geneNames) setGeneSetNames(geneSets, newNames) ## S4 method for signature 'GeneSetCollection' setGeneSetNames(geneSets, newNames)
geneSet |
One gene-set ( |
geneSets |
Gene-set collection
( |
geneNames |
Gene names (for subset) |
newNames |
New names (to assign to the gene sets) |
- **nGenes()**: provides the number of genes in the gene-set, or each of the gene-sets in a collection
- **subsetGeneSets()**: Subsets each of the gene-sets in a collection to contain only the genes inthe given list. Equivalent to intersect(), but keeping the original gene-set name.
- **setGeneSetNames()**: Modifies the name of each gene-set in a collection
library(GSEABase) genes_1 <- GeneSet(paste("Gene", 1:20, sep=""), setName="geneSet1") genes_2 <- GeneSet(paste("Gene", 18:22, sep=""), setName="geneSet2") geneSets <- GeneSetCollection(genes_1, genes_2) nGenes(genes_1) nGenes(geneSets) subsetGeneSets(geneSets, paste("Gene", 15:20, sep="")) geneSets_newNames <- setGeneSetNames(geneSets, c("one", "two")) names(geneSets_newNames)
library(GSEABase) genes_1 <- GeneSet(paste("Gene", 1:20, sep=""), setName="geneSet1") genes_2 <- GeneSet(paste("Gene", 18:22, sep=""), setName="geneSet2") geneSets <- GeneSetCollection(genes_1, genes_2) nGenes(genes_1) nGenes(geneSets) subsetGeneSets(geneSets, paste("Gene", 15:20, sep="")) geneSets_newNames <- setGeneSetNames(geneSets, c("one", "two")) names(geneSets_newNames)
Reorder the gene-sets based on AUC similarity
orderAUC(auc)
orderAUC(auc)
auc |
AUC (as returned by calcAUC) |
gene-set names in the suggested order
# cellsAUC <- cellsAUC[orderAUC(cellsAUC),]
# cellsAUC <- cellsAUC[orderAUC(cellsAUC),]
Colors the embeddings (t-SNE/Umap) based on the activity of 3 (groups of) geneSets
plotEmb_rgb( aucMat, embedding, geneSetsByCol, aucType = "AUC", aucMaxContrast = 0.8, offColor = "#c0c0c030", showPlot = TRUE, showLegend = TRUE, ... )
plotEmb_rgb( aucMat, embedding, geneSetsByCol, aucType = "AUC", aucMaxContrast = 0.8, offColor = "#c0c0c030", showPlot = TRUE, showLegend = TRUE, ... )
aucMat |
AUC matrix (continuous or binary) |
embedding |
AUC matrix (continuous or binary) |
geneSetsByCol |
Gene sets to plot |
aucType |
"AUC" or "Binary" |
aucMaxContrast |
To increase the AUC contrast decrease the value. |
offColor |
Color por the cells completelly off. To deactivate (color as black), set to NULL. |
showPlot |
Whether to plot the coloured embeddings. |
showLegend |
Whether to plot add a legend to the plot. |
... |
Other arguments to pass to the |
The cell colors (invisible)
Plots a histogram and boxplot for the number of genes detected in each cell.
plotGeneCount(exprMat, plotStats = TRUE, verbose = TRUE)
plotGeneCount(exprMat, plotStats = TRUE, verbose = TRUE)
exprMat |
Expression matrix (genes as rows, cells as columns) |
plotStats |
Logical. If true, it plots the histogram, otherwise only calculates the percentages of genes detected. |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
It is important to check that most cells have at least the number of expressed/detected genes that are going to be used to calculate the AUC ('aucMaxRank' in 'calcAUC()'). The histogram provided by 'AUCell_buildRankings()' allows to quickly check this distribution. 'plotGeneCount(exprMatrix)' allows to obtain only the plot before building the rankings.
Quantiles with the number of genes detected by cell (invisible). his result is also printed if verbose=TRUE.
See the package vignette for more details: vignette("AUCell")
### (Fake expression matrix) exprMatrix <- matrix(sample(c(rep(0, 500), sample(1:3, 500, replace=TRUE))), nrow=20) rownames(exprMatrix) <- paste("Gene", 1:20, sep="") colnames(exprMatrix) <- paste("Sample", 1:50, sep="") ### plotGeneCount(exprMatrix) title(sub="Fake expression matrix")
### (Fake expression matrix) exprMatrix <- matrix(sample(c(rep(0, 500), sample(1:3, 500, replace=TRUE))), nrow=20) rownames(exprMatrix) <- paste("Gene", 1:20, sep="") colnames(exprMatrix) <- paste("Sample", 1:50, sep="") ### plotGeneCount(exprMatrix) title(sub="Fake expression matrix")
Plots the t-SNE coloured based on the known the cell properties
plotTsne_cellProps( tSNE, cellInfo, colVars = NULL, cex = 1, sub = "", gradientCols = c("yellow", "orange", "red"), showLegend = TRUE )
plotTsne_cellProps( tSNE, cellInfo, colVars = NULL, cex = 1, sub = "", gradientCols = c("yellow", "orange", "red"), showLegend = TRUE )
tSNE |
t-SNE coordinates (e.g. |
cellInfo |
Dataframe with cell phenodata |
colVars |
Colors for the cell properties (optional) |
cex |
Scaling factor for the dots in the scatterplot |
sub |
Subtitle (e.g. tSNE type) |
gradientCols |
Gradient colors for numerical variables |
showLegend |
Whether to show the legend |
Plots the t-SNE
###### # Fake run of AUCell set.seed(123) exprMatrix <- matrix( data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats = FALSE) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) selectedThresholds <- rowMeans(getAUC(cells_AUC)) cellsTsne<- Rtsne::Rtsne(t(exprMatrix),max_iter = 10)$Y # cellsTsne<- tsne::tsne(t(exprMatrix),max_iter = 10) rownames(cellsTsne) <- colnames(exprMatrix) ###### par(mfrow=c(2,3)) thrs <- AUCell_plotTSNE(tSNE=cellsTsne, exprMat=NULL, cellsAUC=cells_AUC, thresholds=selectedThresholds, plots = c("histogram", "binaryAUC", "AUC")) ##### # Color based on the known phenodata: cellInfo <- data.frame(cellType1=sample(LETTERS[1:3],ncol(exprMatrix), replace=TRUE), cellType2=sample(letters[5:7],ncol(exprMatrix), replace=TRUE), nGenes=abs(rnorm(ncol(exprMatrix))), row.names=colnames(exprMatrix)) colVars <- list(cellType2=setNames(c("skyblue","magenta", "darkorange"),letters[5:7])) # dev.off() plotTsne_cellProps(cellsTsne, cellInfo, colVars=colVars)
###### # Fake run of AUCell set.seed(123) exprMatrix <- matrix( data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))), nrow=20, dimnames=list(paste("Gene", 1:20, sep=""), paste("Cell", 1:500, sep=""))) geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10), geneSet2=sample(rownames(exprMatrix), 5)) cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats = FALSE) cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5, nCores=1) selectedThresholds <- rowMeans(getAUC(cells_AUC)) cellsTsne<- Rtsne::Rtsne(t(exprMatrix),max_iter = 10)$Y # cellsTsne<- tsne::tsne(t(exprMatrix),max_iter = 10) rownames(cellsTsne) <- colnames(exprMatrix) ###### par(mfrow=c(2,3)) thrs <- AUCell_plotTSNE(tSNE=cellsTsne, exprMat=NULL, cellsAUC=cells_AUC, thresholds=selectedThresholds, plots = c("histogram", "binaryAUC", "AUC")) ##### # Color based on the known phenodata: cellInfo <- data.frame(cellType1=sample(LETTERS[1:3],ncol(exprMatrix), replace=TRUE), cellType2=sample(letters[5:7],ncol(exprMatrix), replace=TRUE), nGenes=abs(rnorm(ncol(exprMatrix))), row.names=colnames(exprMatrix)) colVars <- list(cellType2=setNames(c("skyblue","magenta", "darkorange"),letters[5:7])) # dev.off() plotTsne_cellProps(cellsTsne, cellInfo, colVars=colVars)
Updates the AUC scores provided by AUCell from a previous version.
updateAucellResults(oldAucObject, objectType = "AUC")
updateAucellResults(oldAucObject, objectType = "AUC")
oldAucObject |
Object to update |
objectType |
Either "AUC" or "ranking" indicating the object type |
Updated version of the object as aucellResults
.
oldAuc <- matrix(data=1:2000, nrow=50, ncol=40) updateAucellResults(oldAuc)
oldAuc <- matrix(data=1:2000, nrow=50, ncol=40) updateAucellResults(oldAuc)