Title: | Differential cell-type-specific allelic imbalance |
---|---|
Description: | Airpart identifies sets of genes displaying differential cell-type-specific allelic imbalance across cell types or states, utilizing single-cell allelic counts. It makes use of a generalized fused lasso with binomial observations of allelic counts to partition cell types by their allelic imbalance. Alternatively, a nonparametric method for partitioning cell types is offered. The package includes a number of visualizations and quality control functions for examining single cell allelic imbalance datasets. |
Authors: | Wancen Mu [aut, cre] , Michael Love [aut, ctb] |
Maintainer: | Wancen Mu <[email protected]> |
License: | GPL-2 |
Version: | 1.15.0 |
Built: | 2024-10-30 03:28:16 UTC |
Source: | https://github.com/bioc/airpart |
This function performs additional inference on the allelic ratio across cell types, giving posterior mean and credible intervals per cell type. A Cauchy prior is centered for each cell type on the allelic ratio from the fused lasso across all genes in the gene cluster (or using a weighted means if the fused lasso is not provided).
allelicRatio(sce, formula, nogroup = FALSE, level = 0.95, DAItest = FALSE, ...)
allelicRatio(sce, formula, nogroup = FALSE, level = 0.95, DAItest = FALSE, ...)
sce |
A SingleCellExperiment containing assays
( |
formula |
The same |
nogroup |
Indicate whether there is previous group step. Default is FALSE represents either Generalized fused lasso or nonparametric method is used to derive partition. nogroup == TRUE means the hypothesis that groups of cell types sharing a common regulatory context is not valid |
level |
the level of credible interval (default is 0.95) |
DAItest |
Indicate whether to do Likelihood Ratio test on differential allelic imbalance(DAI) or equivalent to heterogeneity. |
... |
arguments to pass to |
posterior mean ("ar"
) for allelic ratio
estimate is returned in the rowData for each cell type,
as well as the "s"
value, "fsr"
false sign rate and
credible interval ("lower"
and "upper"
). One can use "fsr"
< 0.005 or
credible intervals contain 0.5 or not for AI test significance. "p.value"
shows DAI test result
and "adj.p.value"
is false discovery rate corrected p values.
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub, DAItest = TRUE)
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub, DAItest = TRUE)
Quality control on cells
cellQC( sce, spike, threshold = 0, mad_sum = 5, mad_detected = 3, mad_spikegenes = 5 )
cellQC( sce, spike, threshold = 0, mad_sum = 5, mad_detected = 3, mad_spikegenes = 5 )
sce |
SingleCellExperiment with |
spike |
the character name of spike genes.
If missing, |
threshold |
A numeric scalar specifying the threshold above which a gene is considered to be detected. |
mad_sum |
A numeric scalar specifying exceed how many median absolute deviations from the median log10-counts a cell is considered to be filtered out. Default is 5. |
mad_detected |
A numeric scalar specifying exceed how many median absolute deviations from the median detected features a cell is considered to be filtered out. Default is 5. |
mad_spikegenes |
A numeric scalar specifying exceed how many median absolute deviations from the median spike genes expression percentage a cell is considered to be filtered out. Default is 5. |
A DataFrame of QC statistics includes
sum the sum of counts of each cell
detected the number of features above threshold
spikePercent the percentage of counts assignes to spike genes
filter_sum indicate whether log10-counts
within mad_sum
median absolute deviations
from the median log10-counts for the dataset
filter_detected indicate whether features detected by this cell
within mad_detected
median absolute deviations
from the median detected features for the dataset
filter_spike indicate whether percentage expressed by spike genes
within mad_spikegenes
median absolute deviations from the median
spike genes expression percentage for the dataset
sce <- makeSimulatedData() sce <- preprocess(sce) cellQCmetrics <- cellQC(sce) keep_cell <- ( cellQCmetrics$filter_sum | # sufficient features (genes) # sufficient molecules counted cellQCmetrics$filter_detected | # sufficient features expressed compared to spike genes cellQCmetrics$filter_spike ) sce <- sce[, keep_cell] # or manually setting threshold cellQCmetrics <- cellQC(sce, spike = "Ercc", mad_detected = 4, mad_spikegenes = 4 ) keep_cell <- ( cellQCmetrics$sum > 2.4 | cellQCmetrics$detected > 110 )
sce <- makeSimulatedData() sce <- preprocess(sce) cellQCmetrics <- cellQC(sce) keep_cell <- ( cellQCmetrics$filter_sum | # sufficient features (genes) # sufficient molecules counted cellQCmetrics$filter_detected | # sufficient features expressed compared to spike genes cellQCmetrics$filter_spike ) sce <- sce[, keep_cell] # or manually setting threshold cellQCmetrics <- cellQC(sce, spike = "Ercc", mad_detected = 4, mad_spikegenes = 4 ) keep_cell <- ( cellQCmetrics$sum > 2.4 | cellQCmetrics$detected > 110 )
Derive consensus partitions of an ensemble fused lasso partitions.
consensusPart(sce)
consensusPart(sce)
sce |
SingleCellExperiment |
A matrix grouping factor partition is replaced in metadata.
Consensus Partation also stored in colData"part"
.
library(smurf) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) f <- ratio ~ p(x, pen = "gflasso") # formula for the GFL sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, niter = 2, ncores = 2, se.rule.nct = 3 ) sce_sub <- consensusPart(sce_sub)
library(smurf) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) f <- ratio ~ p(x, pen = "gflasso") # formula for the GFL sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, niter = 2, ncores = 2, se.rule.nct = 3 ) sce_sub <- consensusPart(sce_sub)
Estimate overdispersion parameter of a beta-binomial
estDisp(sce, genecluster, type = c("plot", "values"))
estDisp(sce, genecluster, type = c("plot", "values"))
sce |
SingleCellExperiment with |
genecluster |
the gene cluster for which to estimate the over-dispersion parameter. Default is the cluster with the most cells |
type |
whether to output the over-dispersion estimates as a plot or a value |
A ggplot object of the dispersion estimates over the mean, or a data.frame of the mean and dispersion estimates (theta)
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) estDisp(sce, genecluster = 1)
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) estDisp(sce, genecluster = 1)
results extracts a result table from an airpart analysis giving posterior allelic ratio estimates, s values, false sign rate(fsr), upper confidence interval and lower confidence interval.
extractResult(sce, estimates = c("ar", "svalue", "fsr", "lower", "upper"))
extractResult(sce, estimates = c("ar", "svalue", "fsr", "lower", "upper"))
sce |
SingleCellExperiment |
estimates |
the estimates want to be extracted. Default is allelic ratio estimates,
can be |
a DataFrame of estimates
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) ar <- extractResult(sce_sub) ar
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) ar <- extractResult(sce_sub) ar
Quality control on features
featureQC(sce, spike, detection_limit = 1, threshold = 0.25, sd = 0.03, pc = 2)
featureQC(sce, spike, detection_limit = 1, threshold = 0.25, sd = 0.03, pc = 2)
sce |
SingleCellExperiment with |
spike |
the character name of spike genes. The default is |
detection_limit |
Numeric scalar providing the value above which observations are deemed to be expressed. |
threshold |
A numeric scalar specifying the threshold above which percentage of cells expressed within each cell type. Default is 0.25 |
sd |
A numeric scalar specifying the cell type weighted allelic ratio mean standard deviation threshold above which are interested features with highly variation. Default is 0.03 |
pc |
pseudocount in the |
A DataFrame of QC statistics includes
filter_celltype indicate whether genes expressed in more than
threshold
cells for all cell types
sd read counts standard deviation for each feature
filter_sd indicate whether gene standard deviation
exceed sd
filter_spike indicate no spike genes
sce <- makeSimulatedData() sce <- preprocess(sce) featureQCmetric <- featureQC(sce) keep_feature <- (featureQCmetric$filter_celltype & featureQCmetric$filter_sd & featureQCmetric$filter_spike) sce <- sce[keep_feature, ] # or manually setting threshold featureQCmetric <- featureQC(sce, spike = "Ercc", threshold = 0.25, sd = 0.03, pc = 2 ) keep_feature <- (featureQCmetric$filter_celltype & featureQCmetric$sd > 0.02)
sce <- makeSimulatedData() sce <- preprocess(sce) featureQCmetric <- featureQC(sce) keep_feature <- (featureQCmetric$filter_celltype & featureQCmetric$filter_sd & featureQCmetric$filter_spike) sce <- sce[keep_feature, ] # or manually setting threshold featureQCmetric <- featureQC(sce, spike = "Ercc", threshold = 0.25, sd = 0.03, pc = 2 ) keep_feature <- (featureQCmetric$filter_celltype & featureQCmetric$sd > 0.02)
Fits generalized fused lasso with either binomial(link="logit")
or Gaussian likelihood, leveraging functions from the
smurf
package.
fusedLasso( sce, formula, model = c("binomial", "gaussian"), genecluster, niter = 1, pen.weights, lambda = "cv1se.dev", k = 5, adj.matrix, lambda.length = 25L, se.rule.nct = 8, se.rule.mult = 0.5, ... )
fusedLasso( sce, formula, model = c("binomial", "gaussian"), genecluster, niter = 1, pen.weights, lambda = "cv1se.dev", k = 5, adj.matrix, lambda.length = 25L, se.rule.nct = 8, se.rule.mult = 0.5, ... )
sce |
A SingleCellExperiment containing assays ( |
formula |
A |
model |
Either |
genecluster |
which gene cluster to run the fused lasso on.
Usually one first identifies an interesting gene cluster pattern by
|
niter |
number of iterations to run; recommended to run 5 times if allelic ratio differences across cell types are within [0.05, 0.1] |
pen.weights |
argument as described in |
lambda |
argument as described in |
k |
number of cross-validation folds |
adj.matrix |
argument as described in |
lambda.length |
argument as described in |
se.rule.nct |
the number of cell types to trigger a different SE-based rule than 1 SE
(to prioritize larger models, less fusing,
good for detecting smaller, e.g. 0.05, allelic ratio differences).
When the number of cell types is less than or equal to this value,
|
se.rule.mult |
the multiplier of the SE in determining the lambda:
the chosen lambda is within |
... |
additional arguments passed to |
Usually, we used a Generalized Fused Lasso penalty for the cell states in order to regularize all possible coefficient differences. Another possibility would be to use the Graph-Guided Fused Lasso penalty to only regularize the differences of coefficients of neighboring cell states.
When using a Graph-Guided Fused Lasso penalty, the adjacency matrix corresponding to the graph needs to be provided. The elements of this matrix are zero when two levels are not connected, and one when they are adjacent.
See the package vignette for more details and a complete description of a use case.
A SummarizedExperiment with attached metadata and colData:
a matrix grouping factor partition
and the penalized parameter lambda
are returned in metadata "partition"
and "lambda"
.
Partition and logistic group allelic estimates are stored in
colData: "part"
and "coef"
.
This function leverages the glmsmurf function from the smurf package. For more details see the following manuscript:
Devriendt S, Antonio K, Reynkens T, et al. Sparse regression with multi-type regularized feature modeling[J]. Insurance: Mathematics and Economics, 2021, 96: 248-261.
glmsmurf
,
glmsmurf.control
,
p
, glm
library(S4Vectors) library(smurf) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) f <- ratio ~ p(x, pen = "gflasso") # formula for the GFL sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, ncores = 1) metadata(sce_sub)$partition metadata(sce_sub)$lambda # can add covariates or `gene` to the formula f2 <- ratio ~ p(x, pen = "gflasso") + gene sce_sub <- fusedLasso(sce[1:5,], formula = f2, model = "binomial", genecluster = 1, ncores = 1) # Suppose we have 4 cell states, if we only want neibouring cell states # to be grouped together with other cell states. Note here the names of # the cell states should be given as row and column names. nct <- nlevels(sce$x) adjmatrix <- makeOffByOneAdjMat(nct) colnames(adjmatrix) <- rownames(adjmatrix) <- levels(sce$x) f <- ratio ~ p(x, pen = "ggflasso") # use graph-guided fused lasso sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, lambda = 0.5, ncores = 1, adj.matrix = list(x = adjmatrix) ) metadata(sce_sub)$partition
library(S4Vectors) library(smurf) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) f <- ratio ~ p(x, pen = "gflasso") # formula for the GFL sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, ncores = 1) metadata(sce_sub)$partition metadata(sce_sub)$lambda # can add covariates or `gene` to the formula f2 <- ratio ~ p(x, pen = "gflasso") + gene sce_sub <- fusedLasso(sce[1:5,], formula = f2, model = "binomial", genecluster = 1, ncores = 1) # Suppose we have 4 cell states, if we only want neibouring cell states # to be grouped together with other cell states. Note here the names of # the cell states should be given as row and column names. nct <- nlevels(sce$x) adjmatrix <- makeOffByOneAdjMat(nct) colnames(adjmatrix) <- rownames(adjmatrix) <- levels(sce$x) f <- ratio ~ p(x, pen = "ggflasso") # use graph-guided fused lasso sce_sub <- fusedLasso(sce, formula = f, model = "binomial", genecluster = 1, lambda = 0.5, ncores = 1, adj.matrix = list(x = adjmatrix) ) metadata(sce_sub)$partition
Gene clustering based on allelic ratio matrix with pseudo-count
geneCluster( sce, G, method = c("GMM", "hierarchical"), minClusterSize = 3, plot = TRUE, ... )
geneCluster( sce, G, method = c("GMM", "hierarchical"), minClusterSize = 3, plot = TRUE, ... )
sce |
SingleCellExperiment containing assays |
G |
An integer vector specifying the numbers of clusters for which the BIC is to be calculated. The default is G=c(8, 12, 16, 20, 24). |
method |
the method to do gene clustering. The default is the Gaussian
Mixture Modeling which
is likely to be more accurate. |
minClusterSize |
Minimum cluster size of |
plot |
logical, whether to make a PCA plot |
... |
Catches unused arguments in indirect or list calls via do.call
as described in |
gene cluster IDs are stored in the rowData column cluster
and a table of gene cluster is returned in metadata geneCluster
This function leverages Mclust from the mclust package, or hclust.
For mclust see: Luca Scrucca and Michael Fop and T. Brendan Murphy, Adrian E. Raftery "mclust 5: clustering, classification and density estimation using Gaussian finite mixture models" 2016. The R Journal. doi: 10.32614/RJ-2016-021
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4))
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4))
Draw a forest plot to visualized cell type specific
allelic ratio estimator and confidence interval.
It is based on the forestplot-package's forestplot
function.
makeForest( sce, genepoi, ctpoi = seq_len(nlevels(sce$x)), showtext = FALSE, xticks, boxsize = 0.25, xlab = "Allelic Ratio", col, grid = structure(seq(0.1, 0.9, 0.1), gp = gpar(lty = 2, col = "#CCCCFF")), ... )
makeForest( sce, genepoi, ctpoi = seq_len(nlevels(sce$x)), showtext = FALSE, xticks, boxsize = 0.25, xlab = "Allelic Ratio", col, grid = structure(seq(0.1, 0.9, 0.1), gp = gpar(lty = 2, col = "#CCCCFF")), ... )
sce |
A SingleCellExperiment containing colData allelic ratio estimator in the third column and last two column is the confidence interval. |
genepoi |
the gene position index or gene name vector that want to be plotted. Ordered by increased cell type svalue. Default is the top 40 genes that has minimum svalue in any cell type or all genes if number of genes smaller than 40. |
ctpoi |
the cell type position index that want to be plotted. |
showtext |
indicate whether show the svalue information along the forestplot. |
xticks |
argument as described in |
boxsize |
Override the default box size based on precision |
xlab |
x-axis label. Default is "Allelic Ratio" |
col |
Set the colors for all the elements. See
|
grid |
If you want a discrete gray dashed grid
at the level of the ticks
you can set this parameter to TRUE.
If you set the parameter to a vector of values lines will be drawn at the
corresponding positions.
If you want to specify the |
... |
Passsed on the other argument in
|
generates a forest plot
forestplot
,
fpColors
,
fpShapesGp
, fpLegend
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeForest(sce_sub, showtext = TRUE) # if want to change some properties, like ticks position library(forestplot) xticks <- seq(from = 0, to = 1, by = 0.25) xtlab <- rep(c(TRUE, FALSE), length.out = length(xticks)) attr(xticks, "labels") <- xtlab genepoi <- paste0("gene", seq_len(5)) ctpoi <- c(1, 3) makeForest(sce_sub, genepoi, ctpoi, xticks = xticks, col = fpColors(box = c("blue", "red", "black", "darkgreen")) )
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeForest(sce_sub, showtext = TRUE) # if want to change some properties, like ticks position library(forestplot) xticks <- seq(from = 0, to = 1, by = 0.25) xtlab <- rep(c(TRUE, FALSE), length.out = length(xticks)) attr(xticks, "labels") <- xtlab genepoi <- paste0("gene", seq_len(5)) ctpoi <- c(1, 3) makeForest(sce_sub, genepoi, ctpoi, xticks = xticks, col = fpColors(box = c("blue", "red", "black", "darkgreen")) )
Plot allelic ratio as heatmap
makeHeatmap( sce, assay = c("ratio_pseudo", "ratio", "counts"), genecluster = NULL, show_row_names = FALSE, order_by_group = TRUE, ... )
makeHeatmap( sce, assay = c("ratio_pseudo", "ratio", "counts"), genecluster = NULL, show_row_names = FALSE, order_by_group = TRUE, ... )
sce |
SingleCellExperiment |
assay |
the assay to be plotted. Choices are |
genecluster |
an integer indicates which gene cluster heatmap want to be returned. |
show_row_names |
show row names or not |
order_by_group |
indicate whether order by group or order by cell types |
... |
Passsed on the other argument in
|
generates a heatmap
set.seed(2021) sce <- makeSimulatedData(p.vec = c(0.3, 0.5, 0.5, 0.3), ncl = 1) sce <- preprocess(sce) # display allelic ratio pattern in whole dataset makeHeatmap(sce) sce <- geneCluster(sce, G = seq_len(4), plot = FALSE) sce_sub <- wilcoxExt(sce, genecluster = 1) # display specific gene cluster partition result makeHeatmap(sce_sub) # display by cell type orders makeHeatmap(sce_sub, order_by_group = FALSE)
set.seed(2021) sce <- makeSimulatedData(p.vec = c(0.3, 0.5, 0.5, 0.3), ncl = 1) sce <- preprocess(sce) # display allelic ratio pattern in whole dataset makeHeatmap(sce) sce <- geneCluster(sce, G = seq_len(4), plot = FALSE) sce_sub <- wilcoxExt(sce, genecluster = 1) # display specific gene cluster partition result makeHeatmap(sce_sub) # display by cell type orders makeHeatmap(sce_sub, order_by_group = FALSE)
To use the Graph-Guided Fused Lasso penalty to only regularize the differences of coefficients of neighboring areas, suitable for time/spatial analysis. The adjacency matrix corresponding to the graph needs to be provided. The elements of this matrix are zero when two levels are not connected, and one when they are adjacent.
makeOffByOneAdjMat(nct)
makeOffByOneAdjMat(nct)
nct |
the number of cell types/states |
If manually input the adjacency matrix, this matrix has to be symmetric and the names of the cell states should be given as row and column names.
sce <- makeSimulatedData() nct <- nlevels(sce$x) adjmatrix <- makeOffByOneAdjMat(nct) colnames(adjmatrix) <- rownames(adjmatrix) <- levels(sce$x)
sce <- makeSimulatedData() nct <- nlevels(sce$x) adjmatrix <- makeOffByOneAdjMat(nct) colnames(adjmatrix) <- rownames(adjmatrix) <- levels(sce$x)
Make simulated data for airpart
makeSimulatedData( mu1 = 2, mu2 = 10, nct = 4, n = 30, ngenecl = 50, theta = 20, ncl = 3, p.vec = rep(c(0.2, 0.8, 0.5, 0.5, 0.7, 0.9), each = 2), totalClusters = FALSE )
makeSimulatedData( mu1 = 2, mu2 = 10, nct = 4, n = 30, ngenecl = 50, theta = 20, ncl = 3, p.vec = rep(c(0.2, 0.8, 0.5, 0.5, 0.7, 0.9), each = 2), totalClusters = FALSE )
mu1 |
low count (typical of "noisy" ratio estimates) |
mu2 |
high count |
nct |
number of cell types |
n |
number of cells per cell type |
ngenecl |
number of genes per cluster |
theta |
overdispersion parameter (higher is closer to binomial) |
ncl |
number of gene cluster |
p.vec |
the allelic ratio vector which follows gene cluster order. (length is nct * ncl) |
totalClusters |
logical, whether cell types should cluster by total count |
SingleCellExperiment with the following elements as assays
a1 allelic count matrix for the numerator/effect allele
a2 allelic count matrix for the denominator/non-effect allele
true.ratio a matrix of the true probabilities (allelic ratios) for the cell types
Also x
in the colData is a vector of annotated
cell types in the same order as cells in count matrix
library(SummarizedExperiment) sce <- makeSimulatedData() assayNames(sce)
library(SummarizedExperiment) sce <- makeSimulatedData() assayNames(sce)
Plot group partition and posterior allelic ratio estimates by step
makeStep(sce, xlab = "cell type")
makeStep(sce, xlab = "cell type")
sce |
SingleCellExperiment |
xlab |
the x axis name. |
a ggplot2 object.
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeStep(sce_sub)
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeStep(sce_sub)
Posterior mean allelic ratio estimates in violin plots
makeViolin(sce, xlab = "cell type", ylim = c(0, 1))
makeViolin(sce, xlab = "cell type", ylim = c(0, 1))
sce |
SingleCellExperiment |
xlab |
the x axis name. |
ylim |
the y axis range |
a ggplot2 object, n
represents number of cells in that cell type.
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeViolin(sce_sub)
sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) sce_sub <- wilcoxExt(sce, genecluster = 1) sce_sub <- allelicRatio(sce_sub) makeViolin(sce_sub)
Preprocess the SingleCellExperiment
preprocess(sce, pc = 2)
preprocess(sce, pc = 2)
sce |
SingleCellExperiment with |
pc |
pseudocount for calculating the smoothed ratio |
SingleCellExperiment with total count, allelic ratio = a1/(a1 + a2), and pseud-ocount-smoothed ratio
library(SummarizedExperiment) sce <- makeSimulatedData() sce <- preprocess(sce) assayNames(sce)
library(SummarizedExperiment) sce <- makeSimulatedData() sce <- preprocess(sce) assayNames(sce)
Oroduce allelic ratio summaries for each gene cluster
summaryAllelicRatio(sce, genecluster)
summaryAllelicRatio(sce, genecluster)
sce |
SingleCellExperiment |
genecluster |
an optional vector of gene cluster IDs. if nothing is given, all cluster's summaries will be calculated |
a list of gene cluster summary tables containing:
weighted.mean weighted mean of allelic ratio for the cell types
mean mean allelic ratio for the cell types
var variance of allelic ratio for the cell types
library(S4Vectors) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) summary <- summaryAllelicRatio(sce, genecluster = c(1, 3)) summary
library(S4Vectors) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = 1:4) summary <- summaryAllelicRatio(sce, genecluster = c(1, 3)) summary
Extends the Pairwise Mann Whitney Wilcoxon Test by combining hierarchical clustering for partition.
wilcoxExt( sce, genecluster, threshold, adj.matrix, p.adjust.method = "none", ncores = NULL, ... )
wilcoxExt( sce, genecluster, threshold, adj.matrix, p.adjust.method = "none", ncores = NULL, ... )
sce |
A SingleCellExperiment containing assays ( |
genecluster |
which gene cluster result want to be returned.
Usually identified interesting gene cluster pattern by
|
threshold |
a vector with candidate thresholds for raw p-value cut-off. Default is 10^seq(from=-2,to=-0.4,by=0.2). For details please see vignette |
adj.matrix |
an adjacency matrix with 1 indicates cell states allowed to be grouped together, 0 otherwise. |
p.adjust.method |
method for adjusting p-values
(see |
ncores |
A cluster object created by |
... |
additional arguments to pass to |
A matrix grouping factor partition and
the significant cut-off threshold
are returned in metadata "partition"
and "threshold"
.
Partation also stored in colData"part"
. Note we recommend the returned
"threshold"
is not at the ends of input "threshold"
.
library(S4Vectors) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) sce_sub <- wilcoxExt(sce, genecluster = 1) metadata(sce_sub)$partition metadata(sce_sub)$threshold # Suppose we have 4 cell states, if we don't want cell state 1 # to be grouped together with other cell states adj.matrix <- 1 - diag(4) colnames(adj.matrix) <- rownames(adj.matrix) <- levels(sce$x) adj.matrix[1, c(2, 3, 4)] <- 0 adj.matrix[c(2, 3, 4), 1] <- 0 thrs <- 10^seq(from = -2, to = -0.4, by = 0.1) sce_sub <- wilcoxExt(sce, genecluster = 1, threshold = thrs, adj.matrix = adj.matrix ) metadata(sce_sub)$partition
library(S4Vectors) sce <- makeSimulatedData() sce <- preprocess(sce) sce <- geneCluster(sce, G = seq_len(4)) sce_sub <- wilcoxExt(sce, genecluster = 1) metadata(sce_sub)$partition metadata(sce_sub)$threshold # Suppose we have 4 cell states, if we don't want cell state 1 # to be grouped together with other cell states adj.matrix <- 1 - diag(4) colnames(adj.matrix) <- rownames(adj.matrix) <- levels(sce$x) adj.matrix[1, c(2, 3, 4)] <- 0 adj.matrix[c(2, 3, 4), 1] <- 0 thrs <- 10^seq(from = -2, to = -0.4, by = 0.1) sce_sub <- wilcoxExt(sce, genecluster = 1, threshold = thrs, adj.matrix = adj.matrix ) metadata(sce_sub)$partition