library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)
library(MouseGastrulationData)
For this vignette we will use the mouse gastrulation single-cell data from Pijuan-Sala et al. 2019. The dataset can be downloaded as a SingleCellExperiment object from the MouseGastrulationData package on Bioconductor. To make computations faster, here we will download just a subset of samples: 4 samples at stage E7.5 and 2 samples at stage E7.0.
This dataset has already been pre-processed and contains a pca.corrected dimensionality reduction, which was built after batch correction using fastMNN.
select_samples <- c(2, 3, 6, 4, #15,
# 19,
10, 14#, 20 #30
#31, 32
)
embryo_data = EmbryoAtlasData(samples = select_samples)
embryo_data
## class: SingleCellExperiment
## dim: 29452 7558
## metadata(0):
## assays(1): counts
## rownames(29452): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000096730 ENSMUSG00000095742
## rowData names(2): ENSEMBL SYMBOL
## colnames(7558): cell_361 cell_362 ... cell_29013 cell_29014
## colData names(17): cell barcode ... colour sizeFactor
## reducedDimNames(2): pca.corrected umap
## mainExpName: NULL
## altExpNames(0):
We recompute the UMAP embedding for this subset of cells to visualize the data.
embryo_data <- embryo_data[,apply(reducedDim(embryo_data, "pca.corrected"), 1, function(x) !all(is.na(x)))]
embryo_data <- runUMAP(embryo_data, dimred = "pca.corrected", name = 'umap')
plotReducedDim(embryo_data, colour_by="stage", dimred = "umap")
We will test for significant differences in abundance of cells between these stages of development, and the associated gene signatures.
For differential abundance analysis on graph neighbourhoods we first construct a Milo object. This extends the SingleCellExperiment class to store information about neighbourhoods on the KNN graph.
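The call that creates the object printed below is not shown in this section. A minimal sketch, assuming the `Milo()` constructor is applied directly to the filtered `embryo_data` object from the previous step:

```r
library(miloR)

# Wrap the SingleCellExperiment in a Milo object. The neighbourhood-related
# slots start out empty and are filled in by the later steps.
embryo_milo <- Milo(embryo_data)
embryo_milo
```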
## class: Milo
## dim: 29452 6875
## metadata(0):
## assays(1): counts
## rownames(29452): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000096730 ENSMUSG00000095742
## rowData names(2): ENSEMBL SYMBOL
## colnames(6875): cell_361 cell_362 ... cell_29013 cell_29014
## colData names(17): cell barcode ... colour sizeFactor
## reducedDimNames(2): pca.corrected umap
## mainExpName: NULL
## altExpNames(0):
## nhoods dimensions(2): 1 1
## nhoodCounts dimensions(2): 1 1
## nhoodDistances dimension(1): 0
## graph names(0):
## nhoodIndex names(1): 0
## nhoodExpression dimension(2): 1 1
## nhoodReducedDim names(0):
## nhoodGraph names(0):
## nhoodAdjacency dimension(2): 1 1
We need to add the KNN graph to the Milo object. This is stored in the graph slot, in igraph format. The miloR package includes functionality to build and store the graph from the PCA dimensions stored in the reducedDim slot. In this case, we specify that we want to build the graph from the MNN corrected PCA dimensions.
For graph building you need to define a few parameters:
d: the number of reduced dimensions to use for KNN refinement. We recommend using the same d used for KNN graph building, or selecting PCs by inspecting the scree plot.
k: this affects the power of DA testing, since we need to have enough cells from each sample represented in a neighbourhood to estimate the variance between replicates. On the other hand, increasing k too much might lead to over-smoothing. We suggest starting with the same value for k used for KNN graph building for clustering and UMAP visualization. We will later use some heuristics to evaluate whether the value of k should be increased.
Alternatively, one can add a precomputed KNN graph (for example constructed with Seurat or scanpy) to the graph slot using the adjacency matrix, through the helper function buildFromAdjacency.
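The graph-building call itself is not shown in this section. A minimal sketch, assuming `k = 30` and `d = 30` (chosen here only to match the values passed to `makeNhoods()` in this vignette) and the `pca.corrected` reduction:

```r
library(miloR)

# Build the KNN graph on the batch-corrected PCA and store it in the graph slot.
# k and d are assumptions that mirror the makeNhoods() call used in this vignette.
embryo_milo <- buildGraph(embryo_milo, k = 30, d = 30, reduced.dim = "pca.corrected")
```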
We define the neighbourhood of a cell, the index, as the group of cells connected by an edge in the KNN graph to the index cell. For efficiency, we don’t test for DA in the neighbourhood of every cell, but we sample as indices a subset of representative cells, using a KNN sampling algorithm used by Gut et al. 2015.
As well as d and k, for sampling we need to define a few additional parameters:
prop: the proportion of cells to randomly sample to start with. We suggest using prop=0.1 for datasets of less than 30k cells. For bigger datasets using prop=0.05 should be sufficient (and makes computation faster).
refined: indicates whether you want to use the sampling refinement algorithm, or just pick cells at random. The default and recommended way to go is to use refinement. The only situation in which you might consider using random sampling instead is if you have batch corrected your data with a graph based correction algorithm, such as BBKNN, but the results of DA testing will be suboptimal.
embryo_milo <- makeNhoods(embryo_milo, prop = 0.1, k = 30, d=30, refined = TRUE, reduced_dims = "pca.corrected")
Once we have defined neighbourhoods, we plot the distribution of neighbourhood sizes (i.e. how many cells form each neighbourhood) to evaluate whether the value of k used for graph building was appropriate. We can check this using the plotNhoodSizeHist function.
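A minimal sketch of this check, assuming `embryo_milo` already holds the neighbourhoods from `makeNhoods()`. The second half computes the mean neighbourhood size next to the 5 x N_samples rule of thumb used in this section; it assumes that `nhoods()` returns a cells x neighbourhoods indicator matrix.

```r
library(miloR)
library(SingleCellExperiment)

# Histogram of the number of cells in each neighbourhood
plotNhoodSizeHist(embryo_milo)

# Mean neighbourhood size compared with 5 x the number of samples.
# Assumption: nhoods() is a cells x neighbourhoods indicator matrix,
# so column sums give neighbourhood sizes.
nhood_sizes <- Matrix::colSums(nhoods(embryo_milo))
n_samples <- length(unique(colData(embryo_milo)$sample))
c(mean_size = mean(nhood_sizes), rule_of_thumb = 5 * n_samples)
```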
As a rule of thumb we want to have an average neighbourhood size over 5 x N_samples. If the mean is lower, or if the distribution is
Milo leverages the variation in cell numbers between replicates for the same experimental condition to test for differential abundance. Therefore we have to count how many cells from each sample are in each neighbourhood. We need to use the cell metadata and specify which column contains the sample information.
embryo_milo <- countCells(embryo_milo, meta.data = as.data.frame(colData(embryo_milo)), sample="sample")
This adds to the Milo object an n × m matrix, where n is the number of neighbourhoods and m is the number of experimental samples. Values indicate the number of cells from each sample counted in a neighbourhood. This count matrix will be used for DA testing.
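The matrix printed below can be inspected with the `nhoodCounts()` accessor. A minimal sketch, assuming `countCells()` has already been run on `embryo_milo`:

```r
library(miloR)

# Neighbourhoods x samples count matrix filled in by countCells()
head(nhoodCounts(embryo_milo))
```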
## 6 x 6 sparse Matrix of class "dgCMatrix"
## 2 3 6 4 10 14
## 1 . 2 21 1 40 31
## 2 . 1 17 1 112 19
## 3 12 8 43 2 2 11
## 4 2 4 57 4 15 11
## 5 4 3 54 . 10 9
## 6 5 4 73 5 9 7
Now we are all set to test for differential abundance in neighbourhoods. We implement this hypothesis testing in a generalized linear model (GLM) framework, specifically using the Negative Binomial GLM implementation in edgeR.
We first need to think about our experimental design. The design matrix should match each sample to the experimental condition of interest for DA testing. In this case, we want to detect DA between embryonic stages, stored in the stage column of the dataset colData. We also include the sequencing.batch column in the design matrix. This represents a known technical covariate that we want to account for in DA testing.
embryo_design <- data.frame(colData(embryo_milo))[,c("sample", "stage", "sequencing.batch")]
## Convert batch info from integer to factor
embryo_design$sequencing.batch <- as.factor(embryo_design$sequencing.batch)
embryo_design <- distinct(embryo_design)
rownames(embryo_design) <- embryo_design$sample
embryo_design
## sample stage sequencing.batch
## 2 2 E7.5 1
## 3 3 E7.5 1
## 6 6 E7.5 1
## 4 4 E7.5 1
## 10 10 E7.0 1
## 14 14 E7.0 2
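To see how a model formula maps onto this table, here is a self-contained sketch in base R. It rebuilds the data frame printed above by hand (under the hypothetical name `design_toy`, so that it does not overwrite `embryo_design`) and expands the formula `~ sequencing.batch + stage`, the one passed to `testNhoods` in this vignette, with `model.matrix()`.

```r
# Rebuild the design table by hand; values are copied from the output above.
# `design_toy` is a hypothetical name used only for this illustration.
design_toy <- data.frame(
  sample = c(2, 3, 6, 4, 10, 14),
  stage = c("E7.5", "E7.5", "E7.5", "E7.5", "E7.0", "E7.0"),
  sequencing.batch = factor(c(1, 1, 1, 1, 1, 2))
)
rownames(design_toy) <- design_toy$sample

# Expand the formula: one row per sample, one column per model coefficient
design_mat <- model.matrix(~ sequencing.batch + stage, data = design_toy)
colnames(design_mat)
design_mat
```

The stage term becomes a single column, `stageE7.5`, with E7.0 as the reference level. This is consistent with the logFC being described as the change in E7.5 relative to E7.0, assuming the test is run on that last coefficient.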
Milo uses an adaptation of the Spatial FDR correction introduced by cydar, where we correct p-values accounting for the amount of overlap between neighbourhoods. Specifically, each hypothesis test P-value is weighted by the reciprocal of the kth nearest neighbour distance. To use this statistic we first need to store the distances between nearest neighbours in the Milo object. This is done by the calcNhoodDistance function (N.B. this step is the most time consuming of the analysis workflow and might take a couple of minutes for large datasets).
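The corresponding call is not shown in this section. A minimal sketch, assuming the same `d = 30` and `pca.corrected` reduction used for neighbourhood definition:

```r
library(miloR)

# Store the distances between nearest neighbours in the Milo object.
# d = 30 is an assumption chosen to match the makeNhoods() call above.
embryo_milo <- calcNhoodDistance(embryo_milo, d = 30, reduced.dim = "pca.corrected")
```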
Now we can do the DA test, explicitly defining our experimental design. In this case, we want to test for differences between experimental stages, while accounting for the variability between technical batches. (You can find more info on how to use formulas to define a testing design in R here.)
da_results <- testNhoods(embryo_milo, design = ~ sequencing.batch + stage, design.df = embryo_design, reduced.dim="pca.corrected")
head(da_results)
## logFC logCPM F PValue FDR Nhood SpatialFDR
## 1 -2.74019830 11.89271 10.33729134 1.334768e-03 2.517474e-03 1 2.530372e-03
## 2 -4.56833749 12.32693 24.33984225 9.086333e-07 4.835227e-06 2 4.211712e-06
## 3 3.39264532 11.67069 7.92840315 4.937711e-03 8.456539e-03 3 8.577635e-03
## 4 0.09368285 11.54817 0.01250761 9.109690e-01 9.212741e-01 4 9.217265e-01
## 5 0.37954639 11.27013 0.12777327 7.208083e-01 7.487341e-01 5 7.497681e-01
## 6 1.22231716 11.60894 1.75470956 1.855115e-01 2.316302e-01 6 2.345769e-01
This calculates a Fold-change and corrected P-value for each neighbourhood, which indicates whether there is significant differential abundance between developmental stages. The main statistics we consider here are:
logFC: indicates the log-Fold change in cell numbers between samples from E7.5 and samples from E7.0
PValue: reports P-values before FDR correction
SpatialFDR: reports P-values corrected for multiple testing accounting for overlap between neighbourhoods
## logFC logCPM F PValue FDR Nhood SpatialFDR
## 304 -8.679683 11.43163 52.05160 9.014308e-13 4.029396e-10 304 3.104397e-10
## 198 -8.768282 11.74875 49.72141 2.829119e-12 6.323081e-10 198 4.947630e-10
## 241 -6.206875 11.52020 45.43242 2.340204e-11 3.486904e-09 241 2.703463e-09
## 432 -6.116118 11.56844 43.40681 6.370755e-11 7.105226e-09 432 5.502451e-09
## 95 -7.790937 12.20009 42.57754 9.606245e-11 7.105226e-09 95 5.640548e-09
## 265 -8.061842 11.56340 42.74222 8.853610e-11 7.105226e-09 265 5.640548e-09
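The SpatialFDR column in the table above comes from a weighted multiple-testing correction. As a rough, self-contained illustration of the idea (a sketch of a weighted Benjamini-Hochberg procedure, not miloR's internal code), the function below weights each P-value by a user-supplied weight. The name `weighted_bh` and the distance values are hypothetical; the weights stand in for the reciprocal of the kth nearest neighbour distance.

```r
# Weighted Benjamini-Hochberg correction: a sketch under stated assumptions.
# `weights` stands in for 1 / (kth nearest neighbour distance) per neighbourhood.
weighted_bh <- function(pvalues, weights) {
  stopifnot(length(pvalues) == length(weights), all(weights > 0))
  o <- order(pvalues)
  p_sorted <- pvalues[o]
  w_sorted <- weights[o]
  # Scale each ordered P-value by total weight / cumulative weight, then
  # enforce monotonicity starting from the largest P-value.
  adj_sorted <- rev(cummin(rev(sum(w_sorted) * p_sorted / cumsum(w_sorted))))
  adj <- numeric(length(pvalues))
  adj[o] <- pmin(adj_sorted, 1)
  adj
}

pvals <- c(0.001, 0.02, 0.03, 0.4, 0.8)
kth_dist <- c(1.2, 0.8, 0.9, 2.0, 1.5)  # hypothetical kth nearest neighbour distances
weighted_bh(pvals, 1 / kth_dist)

# With equal weights the procedure reduces to the ordinary BH correction
all.equal(weighted_bh(pvals, rep(1, length(pvals))), p.adjust(pvals, method = "BH"))
```

Neighbourhoods in dense regions of the graph have small kth-neighbour distances and therefore large weights, so heavily overlapping neighbourhoods are not simply counted as independent tests.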
We can start inspecting the results of our DA analysis from a couple of standard diagnostic plots. We first inspect the distribution of uncorrected P values, to verify that the test was balanced.
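A minimal sketch of the P-value histogram, assuming `da_results` is the data frame returned by `testNhoods` (the bin count is an arbitrary choice):

```r
library(ggplot2)

# Distribution of uncorrected P-values across neighbourhoods
pval_hist <- ggplot(da_results, aes(PValue)) +
  geom_histogram(bins = 50)
pval_hist
```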
Then we visualize the test results with a volcano plot (remember that each point here represents a neighbourhood, not a cell).
ggplot(da_results, aes(logFC, -log10(SpatialFDR))) +
geom_point() +
geom_hline(yintercept = 1) ## Mark significance threshold (10% FDR)
It looks like we have detected several neighbourhoods where there is a significant difference in cell abundances between developmental stages.
To visualize DA results relating them to the embedding of single cells, we can build an abstracted graph of neighbourhoods that we can superimpose on the single-cell embedding. Here each node represents a neighbourhood, while edges indicate how many cells two neighbourhoods have in common. Here the layout of nodes is determined by the position of the index cell in the UMAP embedding of all single-cells. The neighbourhoods displaying significant DA are colored by their log-Fold Change.
embryo_milo <- buildNhoodGraph(embryo_milo)
## Plot single-cell UMAP
umap_pl <- plotReducedDim(embryo_milo, dimred = "umap", colour_by="stage", text_by = "celltype",
text_size = 3, point_size=0.5) +
guides(fill="none")
## Plot neighbourhood graph
nh_graph_pl <- plotNhoodGraphDA(embryo_milo, da_results, layout="umap",alpha=0.1)
umap_pl + nh_graph_pl +
plot_layout(guides="collect")
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
We might also be interested in visualizing whether DA is particularly evident in certain cell types. To do this, we assign a cell type label to each neighbourhood by finding the most abundant cell type within cells in each neighbourhood. We can label neighbourhoods in the results data.frame using the function annotateNhoods. This also saves the fraction of cells harbouring the label.
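The annotation call is not shown in this section. A minimal sketch, assuming the cell type labels live in the `celltype` column of `colData` (the column used for `text_by` in the UMAP plot above):

```r
library(miloR)

# Label each neighbourhood with its most abundant cell type and
# record the fraction of its cells that carry that label.
da_results <- annotateNhoods(embryo_milo, da_results, coldata_col = "celltype")
head(da_results)
```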
## logFC logCPM F PValue FDR Nhood SpatialFDR
## 1 -2.74019830 11.89271 10.33729134 1.334768e-03 2.517474e-03 1 2.530372e-03
## 2 -4.56833749 12.32693 24.33984225 9.086333e-07 4.835227e-06 2 4.211712e-06
## 3 3.39264532 11.67069 7.92840315 4.937711e-03 8.456539e-03 3 8.577635e-03
## 4 0.09368285 11.54817 0.01250761 9.109690e-01 9.212741e-01 4 9.217265e-01
## 5 0.37954639 11.27013 0.12777327 7.208083e-01 7.487341e-01 5 7.497681e-01
## 6 1.22231716 11.60894 1.75470956 1.855115e-01 2.316302e-01 6 2.345769e-01
## celltype celltype_fraction
## 1 Primitive Streak 0.8421053
## 2 Epiblast 0.9800000
## 3 ExE mesoderm 0.4102564
## 4 ExE endoderm 1.0000000
## 5 ExE ectoderm 1.0000000
## 6 ExE ectoderm 1.0000000
While neighbourhoods tend to be homogeneous, we can define a threshold for celltype_fraction to exclude neighbourhoods that are a mix of cell types.
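A minimal sketch of such a filter. The cutoff of 0.7 is an assumption made for illustration; the section does not state which value was used, only that neighbourhoods below the threshold end up labelled "Mixed".

```r
library(ggplot2)

# Inspect the distribution of label purity before choosing a cutoff
ggplot(da_results, aes(celltype_fraction)) +
  geom_histogram(bins = 50)

# Relabel neighbourhoods below the cutoff as "Mixed".
# 0.7 is an assumed threshold, not one given in the text.
da_results$celltype <- ifelse(da_results$celltype_fraction < 0.7,
                              "Mixed",
                              as.character(da_results$celltype))
```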
Now we can visualize the distribution of DA Fold Changes in different cell types.
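A minimal sketch of this plot, assuming `da_results` already carries the `celltype` annotation:

```r
library(miloR)

# One point per neighbourhood, grouped by its assigned cell type
# and placed along the log fold change axis.
celltype_swarm <- plotDAbeeswarm(da_results, group.by = "celltype")
celltype_swarm
```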
This is already quite informative: we can see that certain early development cell types, such as epiblast and primitive streak, are enriched in the earliest time stage, while others are enriched later in development, such as ectoderm cells. Interestingly, we also see plenty of DA neighbourhoods with a mixed label. This could indicate that transitional states show changes in abundance over time.
Once you have found your neighbourhoods showing significant DA between conditions, you might want to find gene signatures specific to the cells in those neighbourhoods. The function findNhoodGroupMarkers runs a one-VS-all differential gene expression test to identify marker genes for a group of neighbourhoods of interest. Before running this function you will need to define your neighbourhood groups depending on your biological question; these need to be stored as a NhoodGroup column in the da_results data.frame.
In a case where all the DA neighbourhoods seem to belong to the same region of the graph, you might just want to test the significant DA neighbourhoods with the same logFC against all the rest (N.B. for illustration purposes, here I am testing on a randomly selected set of 10 genes).
## Add log normalized count to Milo object
embryo_milo <- logNormCounts(embryo_milo)
da_results$NhoodGroup <- as.numeric(da_results$SpatialFDR < 0.1 & da_results$logFC < 0)
da_nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = rownames(embryo_milo)[1:10])
## Warning: Zero sample variances detected, have been offset away from zero
## Warning: Zero sample variances detected, have been offset away from zero
## GeneID logFC_1 adj.P.Val_1 logFC_0 adj.P.Val_0
## 1 ENSMUSG00000025900 0.0001254895 1.0000000000 -0.0001254895 1.0000000000
## 2 ENSMUSG00000025902 -0.0935026113 0.0006458443 0.0935026113 0.0006458443
## 3 ENSMUSG00000025903 0.0581409688 0.0117021846 -0.0581409688 0.0117021846
## 4 ENSMUSG00000033813 0.0357617077 0.2055973888 -0.0357617077 0.2055973888
## 5 ENSMUSG00000033845 0.0680548988 0.0117021846 -0.0680548988 0.0117021846
## 6 ENSMUSG00000051951 0.0000000000 1.0000000000 0.0000000000 1.0000000000
For this analysis we recommend aggregating the neighbourhood expression profiles by experimental samples (the same used for DA testing), by setting aggregate.samples=TRUE. This way single-cells will not be considered as “replicates” during DGE testing, and dispersion will be estimated between true biological replicates. Like so:
da_nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = rownames(embryo_milo)[1:10],
aggregate.samples = TRUE, sample_col = "sample")
## Warning: Zero sample variances detected, have been offset away from zero
## Warning: Zero sample variances detected, have been offset away from zero
## GeneID logFC_1 adj.P.Val_1 logFC_0 adj.P.Val_0
## 1 ENSMUSG00000025900 -0.0008350905 1 0.0008350905 1
## 2 ENSMUSG00000025902 0.1294318379 1 -0.1294318379 1
## 3 ENSMUSG00000025903 0.0305468380 1 -0.0305468380 1
## 4 ENSMUSG00000033813 -0.0253068419 1 0.0253068419 1
## 5 ENSMUSG00000033845 0.0462633800 1 -0.0462633800 1
## 6 ENSMUSG00000051951 0.0000000000 1 0.0000000000 1
(Notice the difference in p values)
In many cases, such as this example, DA neighbourhoods are found in different areas of the KNN graph, and grouping together all significant DA populations might not be ideal, as they might include cells of very different celltypes. For this kind of scenario, we have implemented a neighbourhood function that uses community detection to partition neighbourhoods into groups on the basis of (1) the number of shared cells between 2 neighbourhoods; (2) the direction of fold-change for DA neighbourhoods; (3) the difference in fold change.
## Run buildNhoodGraph to store nhood adjacency matrix
embryo_milo <- buildNhoodGraph(embryo_milo)
## Find groups
da_results <- groupNhoods(embryo_milo, da_results, max.lfc.delta = 10)
head(da_results)
## logFC logCPM F PValue FDR Nhood SpatialFDR
## 1 -2.74019830 11.89271 10.33729134 1.334768e-03 2.517474e-03 1 2.530372e-03
## 2 -4.56833749 12.32693 24.33984225 9.086333e-07 4.835227e-06 2 4.211712e-06
## 3 3.39264532 11.67069 7.92840315 4.937711e-03 8.456539e-03 3 8.577635e-03
## 4 0.09368285 11.54817 0.01250761 9.109690e-01 9.212741e-01 4 9.217265e-01
## 5 0.37954639 11.27013 0.12777327 7.208083e-01 7.487341e-01 5 7.497681e-01
## 6 1.22231716 11.60894 1.75470956 1.855115e-01 2.316302e-01 6 2.345769e-01
## celltype celltype_fraction NhoodGroup
## 1 Primitive Streak 0.8421053 1
## 2 Epiblast 0.9800000 2
## 3 Mixed 0.4102564 3
## 4 ExE endoderm 1.0000000 4
## 5 ExE ectoderm 1.0000000 5
## 6 ExE ectoderm 1.0000000 5
Let’s have a look at the detected groups
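A minimal sketch of this visualization, assuming `da_results` holds the `NhoodGroup` column added by `groupNhoods` and that the UMAP layout is used as in the earlier neighbourhood graph plot:

```r
library(miloR)

# Neighbourhood graph coloured by the group assigned to each neighbourhood
nhood_groups_pl <- plotNhoodGroups(embryo_milo, da_results, layout = "umap")
nhood_groups_pl
```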
We can easily check how changing the grouping parameters changes the groups we obtain, starting with the LFC delta by plotting with different values of max.lfc.delta (not executed here).
# code not run - uncomment to run.
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 1) , group.by = "NhoodGroup") + ggtitle("max LFC delta=1")
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 2) , group.by = "NhoodGroup") + ggtitle("max LFC delta=2")
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 3) , group.by = "NhoodGroup") + ggtitle("max LFC delta=3")
…and we can do the same for the minimum overlap between neighbourhoods… (code not executed).
# code not run - uncomment to run.
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 5, overlap=1) , group.by = "NhoodGroup") + ggtitle("overlap=1")
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 5, overlap=5) , group.by = "NhoodGroup") + ggtitle("overlap=5")
# plotDAbeeswarm(groupNhoods(embryo_milo, da_results, max.lfc.delta = 5, overlap=10) , group.by = "NhoodGroup") + ggtitle("overlap=10")
In these examples we settle for overlap=5 and max.lfc.delta=5, as we need at least 2 neighbourhoods assigned to each group.
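Whether every group has at least 2 neighbourhoods can be checked directly from the results table. A minimal sketch in base R, assuming `da_results` carries the `NhoodGroup` column from `groupNhoods`:

```r
# Number of neighbourhoods assigned to each group
group_sizes <- table(da_results$NhoodGroup)
group_sizes

# Groups with fewer than 2 neighbourhoods (empty if the requirement is met)
names(group_sizes)[group_sizes < 2]
```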
Once we have grouped neighbourhoods using groupNhoods we are all set to identify gene signatures between neighbourhood groups.
Let’s restrict the testing to highly variable genes in this case
## Exclude zero counts genes
keep.rows <- rowSums(logcounts(embryo_milo)) != 0
embryo_milo <- embryo_milo[keep.rows, ]
## Find HVGs
set.seed(101)
dec <- modelGeneVar(embryo_milo)
hvgs <- getTopHVGs(dec, n=2000)
# this vignette randomly fails to identify HVGs for some reason
if(!length(hvgs)){
set.seed(42)
dec <- modelGeneVar(embryo_milo)
hvgs <- getTopHVGs(dec, n=2000)
}
head(hvgs)
## [1] "ENSMUSG00000032083" "ENSMUSG00000095180" "ENSMUSG00000061808"
## [4] "ENSMUSG00000002985" "ENSMUSG00000024990" "ENSMUSG00000024391"
We run findNhoodGroupMarkers to test for one-vs-all differential gene expression for each neighbourhood group.
set.seed(42)
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs,
aggregate.samples = TRUE, sample_col = "sample")
head(nhood_markers)
## GeneID logFC_1 adj.P.Val_1 logFC_2 adj.P.Val_2 logFC_3
## 1 ENSMUSG00000000031 -1.65303404 0.1453528 -1.6405350 0.1203701 -0.53707545
## 2 ENSMUSG00000000078 -0.15831282 0.6478173 -0.2287234 0.3918569 0.05301000
## 3 ENSMUSG00000000088 -0.35549044 0.4967186 -0.3487443 0.3185433 -0.34395007
## 4 ENSMUSG00000000125 0.06140842 0.6981318 -0.1103835 0.3675965 0.13258090
## 5 ENSMUSG00000000149 0.01297753 0.9455496 -0.1385986 0.2888017 -0.06455307
## 6 ENSMUSG00000000184 0.40432901 0.6142346 -0.3687256 0.5317216 1.70010612
## adj.P.Val_3 logFC_4 adj.P.Val_4 logFC_5 adj.P.Val_5 logFC_6
## 1 0.6340646564 3.143179651 2.444001e-07 1.1915758 0.17492131 1.2757413
## 2 0.8705871844 0.005001341 9.827421e-01 -0.1530989 0.50561579 0.1822290
## 3 0.3575354646 0.739411897 1.210485e-03 -0.1941067 0.51219145 0.3417394
## 4 0.3275663836 0.014419687 8.783042e-01 -0.1604204 0.11880272 -0.1147870
## 5 0.6425568541 0.261938341 2.037219e-03 -0.1354282 0.21344827 0.1386168
## 6 0.0001196805 -1.051898019 1.389260e-02 -1.0435900 0.03509479 -0.4546416
## adj.P.Val_6 logFC_7 adj.P.Val_7 logFC_8 adj.P.Val_8
## 1 0.1553903 -1.7401872 0.08180048 -0.6814130 0.7209963758
## 2 0.4835026 -0.1643664 0.48234473 1.1487033 0.0001157561
## 3 0.2742285 -0.2920122 0.35021121 1.0255057 0.0237406534
## 4 0.3028047 0.2168659 0.06559294 -0.1285892 0.5525549196
## 5 0.2282143 -0.1727427 0.16729856 0.2031156 0.3123760929
## 6 0.4394821 1.1464773 0.04639828 -0.7937188 0.4416863102
Let’s check out the markers for group 5 for example
gr5_markers <- nhood_markers[c("logFC_5", "adj.P.Val_5")]
colnames(gr5_markers) <- c("logFC", "adj.P.Val")
head(gr5_markers[order(gr5_markers$adj.P.Val), ])
## logFC adj.P.Val
## 777 2.6283800 9.320809e-39
## 21 1.8837638 3.461591e-36
## 636 1.6013236 2.935699e-32
## 1470 2.4556267 2.947958e-31
## 1589 1.1103890 4.740793e-30
## 1902 0.9409209 2.812070e-28
If you already know you are interested only in the markers for group 5, you might want to test just 5-VS-all using the subset.groups parameter:
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs,
aggregate.samples = TRUE, sample_col = "sample",
subset.groups = c("5")
)
head(nhood_markers)
## logFC_5 adj.P.Val_5 GeneID
## ENSMUSG00000027186 2.6283800 9.320809e-39 ENSMUSG00000027186
## ENSMUSG00000001025 1.8837638 3.461591e-36 ENSMUSG00000001025
## ENSMUSG00000025056 1.6013236 2.935699e-32 ENSMUSG00000025056
## ENSMUSG00000042367 2.4556267 2.947958e-31 ENSMUSG00000042367
## ENSMUSG00000048752 1.1103890 4.740793e-30 ENSMUSG00000048752
## ENSMUSG00000073243 0.9409209 2.812070e-28 ENSMUSG00000073243
You might also want to compare a subset of neighbourhoods between each other. You can specify the neighbourhoods to use for testing by setting the parameter subset.nhoods.
For example, you might want to compare just one pair of neighbourhood groups against each other:
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs,
subset.nhoods = da_results$NhoodGroup %in% c('5','6'),
aggregate.samples = TRUE, sample_col = "sample")
## Warning: Zero sample variances detected, have been offset away from zero
## Warning: Zero sample variances detected, have been offset away from zero
## GeneID logFC_5 adj.P.Val_5 logFC_6 adj.P.Val_6
## 1 ENSMUSG00000000031 -0.09466293 0.737807561 0.09466293 0.737807561
## 2 ENSMUSG00000000078 -0.43918659 0.064050846 0.43918659 0.064050846
## 3 ENSMUSG00000000088 -0.60287161 0.007022271 0.60287161 0.007022271
## 4 ENSMUSG00000000125 -0.05555124 0.033767647 0.05555124 0.033767647
## 5 ENSMUSG00000000149 -0.21571524 0.002422481 0.21571524 0.002422481
## 6 ENSMUSG00000000184 -0.30264666 0.012003366 0.30264666 0.012003366
or you might use subset.nhoods to exclude singleton neighbourhoods, or to subset to the neighbourhoods that show significant DA.
Let’s select marker genes for group 5 at FDR 1% and log-fold-Change > 1.
ggplot(nhood_markers, aes(logFC_5, -log10(adj.P.Val_5 ))) +
geom_point(alpha=0.5, size=0.5) +
geom_hline(yintercept = 3)
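The `markers` vector used in the next plotting call is not defined in this section. A minimal sketch of the selection, assuming `nhood_markers` is the result of the group 5 versus group 6 comparison above and applying the thresholds just stated (adjusted P-value below 0.01 and log fold change above 1):

```r
# Genes up-regulated in group 5 at 1% FDR with log fold change > 1
markers <- nhood_markers$GeneID[nhood_markers$adj.P.Val_5 < 0.01 &
                                  nhood_markers$logFC_5 > 1]
head(markers)
length(markers)
```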
We can visualize the expression in neighbourhoods using plotNhoodExpressionGroups.
set.seed(42)
plotNhoodExpressionGroups(embryo_milo, da_results, features=intersect(rownames(embryo_milo), markers[1:10]),
subset.nhoods = da_results$NhoodGroup %in% c('6','5'),
scale=TRUE,
grid.space = "fixed")
## Warning in plotNhoodExpressionGroups(embryo_milo, da_results, features =
## intersect(rownames(embryo_milo), : Nothing in nhoodExpression(x): computing for
## requested features...
In some cases you might want to test for differential expression between cells in different conditions within the same neighbourhood group. You can do that using testDiffExp:
dge_6 <- testDiffExp(embryo_milo, da_results, design = ~ stage, meta.data = data.frame(colData(embryo_milo)),
subset.row = rownames(embryo_milo)[1:5], subset.nhoods=da_results$NhoodGroup=="6")
dge_6
## $`6`
## logFC AveExpr t P.Value adj.P.Val
## ENSMUSG00000033845 -0.26338036 2.559112956 -2.803225 0.005428539 0.02714270
## ENSMUSG00000025902 0.28454115 2.168222541 2.375870 0.018210009 0.04552502
## ENSMUSG00000025900 -0.01156166 0.003562159 -1.590895 0.112810486 0.18801748
## ENSMUSG00000051951 0.00000000 0.000000000 0.000000 1.000000000 1.00000000
## ENSMUSG00000102343 0.00000000 0.000000000 0.000000 1.000000000 1.00000000
## B Nhood.Group
## ENSMUSG00000033845 -9.137091 6
## ENSMUSG00000025902 -10.220620 6
## ENSMUSG00000025900 -11.759897 6
## ENSMUSG00000051951 -13.024134 6
## ENSMUSG00000102343 -13.024134 6
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MouseGastrulationData_1.19.0 SpatialExperiment_1.15.1
## [3] MouseThymusAgeing_1.13.0 patchwork_1.3.0
## [5] dplyr_1.1.4 scran_1.33.2
## [7] scater_1.33.4 ggplot2_3.5.1
## [9] scuttle_1.15.5 SingleCellExperiment_1.27.2
## [11] SummarizedExperiment_1.35.5 Biobase_2.67.0
## [13] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2
## [15] IRanges_2.39.2 S4Vectors_0.43.2
## [17] BiocGenerics_0.53.0 MatrixGenerics_1.17.1
## [19] matrixStats_1.4.1 miloR_2.3.0
## [21] edgeR_4.3.21 limma_3.61.12
## [23] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.8.9
## [4] magrittr_2.0.3 magick_2.8.5 ggbeeswarm_0.7.2
## [7] farver_2.1.2 rmarkdown_2.28 zlibbioc_1.51.2
## [10] vctrs_0.6.5 memoise_2.0.1 htmltools_0.5.8.1
## [13] S4Arrays_1.5.11 AnnotationHub_3.15.0 curl_5.2.3
## [16] BiocNeighbors_2.1.0 SparseArray_1.5.45 sass_0.4.9
## [19] pracma_2.4.4 bslib_0.8.0 cachem_1.1.0
## [22] buildtools_1.0.0 igraph_2.1.1 mime_0.12
## [25] lifecycle_1.0.4 pkgconfig_2.0.3 rsvd_1.0.5
## [28] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0
## [31] GenomeInfoDbData_1.2.13 digest_0.6.37 numDeriv_2016.8-1.1
## [34] colorspace_2.1-1 AnnotationDbi_1.69.0 dqrng_0.4.1
## [37] irlba_2.3.5.1 ExperimentHub_2.13.1 RSQLite_2.3.7
## [40] beachmat_2.23.0 labeling_0.4.3 filelock_1.0.3
## [43] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7
## [46] abind_1.4-8 compiler_4.4.1 bit64_4.5.2
## [49] withr_3.0.2 BiocParallel_1.41.0 viridis_0.6.5
## [52] DBI_1.2.3 highr_0.11 ggforce_0.4.2
## [55] MASS_7.3-61 rappdirs_0.3.3 DelayedArray_0.33.1
## [58] rjson_0.2.23 bluster_1.17.0 gtools_3.9.5
## [61] tools_4.4.1 vipor_0.4.7 beeswarm_0.4.0
## [64] glue_1.8.0 grid_4.4.1 cluster_2.1.6
## [67] generics_0.1.3 gtable_0.3.6 tidyr_1.3.1
## [70] BiocSingular_1.23.0 tidygraph_1.3.1 ScaledMatrix_1.13.0
## [73] metapod_1.13.0 utf8_1.2.4 XVector_0.45.0
## [76] RcppAnnoy_0.0.22 ggrepel_0.9.6 BiocVersion_3.21.1
## [79] pillar_1.9.0 stringr_1.5.1 BumpyMatrix_1.15.0
## [82] splines_4.4.1 tweenr_2.0.3 BiocFileCache_2.15.0
## [85] lattice_0.22-6 FNN_1.1.4.1 bit_4.5.0
## [88] tidyselect_1.2.1 locfit_1.5-9.10 maketools_1.3.1
## [91] Biostrings_2.75.0 knitr_1.48 gridExtra_2.3
## [94] xfun_0.48 graphlayouts_1.2.0 statmod_1.5.0
## [97] stringi_1.8.4 UCSC.utils_1.1.0 yaml_2.3.10
## [100] evaluate_1.0.1 codetools_0.2-20 ggraph_2.2.1
## [103] tibble_3.2.1 BiocManager_1.30.25 cli_3.6.3
## [106] uwot_0.2.2 munsell_0.5.1 jquerylib_0.1.4
## [109] Rcpp_1.0.13 dbplyr_2.5.0 png_0.1-8
## [112] parallel_4.4.1 blob_1.2.4 viridisLite_0.4.2
## [115] scales_1.3.0 purrr_1.0.2 crayon_1.5.3
## [118] rlang_1.1.4 cowplot_1.1.3 KEGGREST_1.45.1