library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)
library(MouseThymusAgeing)
library(scuttle)
We have seen how Milo uses graph neighbourhoods to model cell state
abundance differences in an experiment, when comparing 2 groups.
However, we are often interested in testing between 2 specific groups in
our analysis when our experiment has collected data from > 2 groups. We can focus our analysis to a
2 group comparison and still make use of all of the data for things like
dispersion estimation, by using contrasts. For an in-depth use
of contrasts we recommend users refer to the limma
and
edgeR
Biostars and Bioconductor community forum threads on
the subject. Here I will give an overview of how to use contrasts in the
context of a Milo analysis.
We will use the MouseThymusAgeing
data package as there
are multiple groups that we can compare.
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## field not found in version - adding
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## see ?MouseThymusAgeing and browseVignettes('MouseThymusAgeing') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## class: SingleCellExperiment
## dim: 48801 2327
## metadata(0):
## assays(2): counts logcounts
## rownames(48801): ERCC-00002 ERCC-00003 ... ENSMUSG00000064371
## ENSMUSG00000064372
## rowData names(6): Geneid Chr ... Strand Length
## colnames(2327): B13.B002229.1_52.1.32.1_S109
## B13.B002297.1_32.4.52.1_S73 ... P9.B002345.5_52.1.32.1_S93
## P9.B002450.5_4.52.16.1_S261
## colData names(11): CellID ClusterID ... SubType sizeFactor
## reducedDimNames(1): PCA
## mainExpName: NULL
## altExpNames(0):
thy.sce <- runUMAP(thy.sce) # add a UMAP for plotting results later
thy.milo <- Milo(thy.sce) # from the SCE object
reducedDim(thy.milo, "UMAP") <- reducedDim(thy.sce, "UMAP")
plotUMAP(thy.milo, colour_by="SubType") + plotUMAP(thy.milo, colour_by="Age")
These UMAPs shows how the different thymic epithelial cell subtypes and cells from different aged mice are distributed across our single-cell data set. Next we build the KNN graph and define neighbourhoods to quantify cell abundance across our experimental samples.
## Constructing kNN graph with k:11
thy.milo <- makeNhoods(thy.milo, prop = 0.2, k = 11, d=40, refined = TRUE, refinement_scheme="graph") # make nhoods using graph-only as this is faster
## Checking valid object
## Running refined sampling with graph
colData(thy.milo)$Sample <- paste(colData(thy.milo)$SortDay, colData(thy.milo)$Age, sep="_")
thy.milo <- countCells(thy.milo, meta.data = data.frame(colData(thy.milo)), samples="Sample") # make the nhood X sample counts matrix
## Checking meta.data validity
## Counting cells in neighbourhoods
Now we have the pieces in place for DA testing to demonstrate how to use contrasts. We will use these contrasts to explicitly define which groups will be compared to each other.
thy.design <- data.frame(colData(thy.milo))[,c("Sample", "SortDay", "Age")]
thy.design <- distinct(thy.design)
rownames(thy.design) <- thy.design$Sample
## Reorder rownames to match columns of nhoodCounts(milo)
thy.design <- thy.design[colnames(nhoodCounts(thy.milo)), , drop=FALSE]
table(thy.design$Age)
##
## 16wk 1wk 32wk 4wk 52wk
## 5 5 5 5 5
To demonstrate the use of contrasts we will fit the whole model to the whole data set, but we will compare sequential pairs of time points. I’ll start with week 1 vs. week 4 to illustrate the syntax.
rownames(thy.design) <- thy.design$Sample
contrast.1 <- c("Age1wk - Age4wk") # the syntax is <VariableName><ConditionLevel> - <VariableName><ControlLevel>
# we need to use the ~ 0 + Variable expression here so that we have all of the levels of our variable as separate columns in our model matrix
da_results <- testNhoods(thy.milo, design = ~ 0 + Age, design.df = thy.design, model.contrasts = contrast.1,
fdr.weighting="graph-overlap", norm.method="TMM")
## Using TMM normalisation
## Running with model contrasts
## Performing spatial FDR correction with graph-overlap weighting
##
## FALSE TRUE
## 329 39
This calculates a Fold-change and corrected P-value for each neighbourhood, which indicates whether there is significant differential abundance between conditions for 39 neighbourhoods.
You will notice that the syntax for the contrasts is quite specific.
It starts with the name of the column variable that contains the
different group levels; in this case it is the Age
variable. We then define the comparison levels as
level1 - level2
. To understand this syntax we need to
consider what we are concretely comparing. In this case we are asking
what is the ratio of the average cell count at week1 compared to the
average cell count at week 4, where the averaging is across the
replicates. The reason we express this as a difference rather than a
ratio is because we are dealing with the log fold change.
We can also pass multiple comparisons at the same time, for instance if we wished to compare each sequential pair of time points. This will give us a better intuition behind how to use contrasts to compare multiple groups.
contrast.all <- c("Age1wk - Age4wk", "Age4wk - Age16wk", "Age16wk - Age32wk", "Age32wk - Age52wk")
# contrast.all <- c("Age1wk - Age4wk", "Age16wk - Age32wk")
# this is the edgeR code called by `testNhoods`
model <- model.matrix(~ 0 + Age, data=thy.design)
mod.constrast <- makeContrasts(contrasts=contrast.all, levels=model)
mod.constrast
## Contrasts
## Levels Age1wk - Age4wk Age4wk - Age16wk Age16wk - Age32wk Age32wk - Age52wk
## Age16wk 0 -1 1 0
## Age1wk 1 0 0 0
## Age32wk 0 0 -1 1
## Age4wk -1 1 0 0
## Age52wk 0 0 0 -1
This shows the contrast matrix. If we want to test each of these
comparisons then we need to pass them sequentially to
testNhoods
, then apply an additional multiple testing
correction to the spatial FDR values.
contrast1.res <- testNhoods(thy.milo, design=~0+ Age, design.df=thy.design, fdr.weighting="graph-overlap", model.contrasts = contrast.all)
## Using TMM normalisation
## Running with model contrasts
## Performing spatial FDR correction with graph-overlap weighting
## logFC.Age1wk...Age4wk logFC.Age4wk...Age16wk logFC.Age16wk...Age32wk
## 1 1.60883944 2.609730 0.0000000
## 2 0.09368977 1.008753 2.4314527
## 3 0.00000000 -1.760337 -1.9251921
## 4 3.32094579 1.318364 0.0000000
## 5 -0.85859868 1.201711 0.8231479
## 6 -2.88346102 -1.199472 0.7592478
## logFC.Age32wk...Age52wk logCPM F PValue FDR Nhood
## 1 0.0000000 13.13414 5.026768 5.671633e-04 0.003040631 1
## 2 0.0000000 13.16451 3.471621 8.172310e-03 0.020690684 2
## 3 -0.3428480 13.24631 4.775808 8.543897e-04 0.003820024 3
## 4 0.0000000 13.17266 7.165642 1.251995e-05 0.000295905 4
## 5 -2.1235046 13.57888 1.178679 3.191056e-01 0.412793428 5
## 6 -0.1002945 13.35662 2.291891 5.870124e-02 0.102866938 6
## SpatialFDR
## 1 0.0038717577
## 2 0.0253918206
## 3 0.0049056411
## 4 0.0004389594
## 5 0.3950148402
## 6 0.1245685742
This matrix of contrasts will perform a quasi-likelihood F-test over
all 5 contrasts, hence a single p-value and spatial FDR are returned.
Log fold changes are returned for each contrast of the Age
variable, which gives 1 log-fold change column for each - this is the
default behaviour of glmQLFTest
in the edgeR
package which is what Milo uses for hypothesis testing. In general, and
to avoid confusion, we recommend testing each pair of contrasts
separately if these are the comparisons of interest, as shown below.
# compare weeks 4 and 16, with week 4 as the reference.
cont.4vs16.res <- testNhoods(thy.milo, design=~0+ Age, design.df=thy.design, fdr.weighting="graph-overlap", model.contrasts = c("Age4wk - Age16wk"))
## Using TMM normalisation
## Running with model contrasts
## Performing spatial FDR correction with graph-overlap weighting
## logFC logCPM F PValue FDR Nhood SpatialFDR
## 1 2.609730 13.13414 3.2029104 0.07416981 0.3899213 1 0.2419657
## 2 1.008753 13.16451 0.7209931 0.39618472 0.7367617 2 0.6627135
## 3 -1.760337 13.24631 2.0627528 0.15740385 0.5550350 3 0.4048074
## 4 1.318364 13.17266 1.0348738 0.33801788 0.6871559 4 0.6054043
## 5 1.201711 13.57888 1.0989926 0.29493742 0.6871559 5 0.5730064
## 6 -1.199472 13.35662 1.1779743 0.27833153 0.6729756 6 0.5473406
Now we have a single logFC which compares nhood abundance between week 4 and week 16 - as we can see the LFC estimates should be the same, but the SpatialFDR will be different.
par(mfrow=c(1, 2))
plot(contrast1.res$logFC.Age4wk...Age16wk, cont.4vs16.res$logFC,
xlab="4wk vs. 16wk LFC\nsingle contrast", ylab="4wk vs. 16wk LFC\nmultiple contrast")
plot(contrast1.res$SpatialFDR, cont.4vs16.res$SpatialFDR,
xlab="Spatial FDR\nsingle contrast", ylab="Spatial FDR\nmultiple contrast")
Contrasts are not limited to these simple pair-wise comparisons, we can also group levels together for comparisons. For instance, imagine we want to know what the effect of the cell counts in the week 1 mice is compared to all other time points.
model <- model.matrix(~ 0 + Age, data=thy.design)
ave.contrast <- c("Age1wk - (Age4wk + Age16wk + Age32wk + Age52wk)/4")
mod.constrast <- makeContrasts(contrasts=ave.contrast, levels=model)
mod.constrast
## Contrasts
## Levels Age1wk - (Age4wk + Age16wk + Age32wk + Age52wk)/4
## Age16wk -0.25
## Age1wk 1.00
## Age32wk -0.25
## Age4wk -0.25
## Age52wk -0.25
In this contrasts matrix we can see that we have taken the average
effect over the other time points. Now running this using
testNhoods
da_results <- testNhoods(thy.milo, design = ~ 0 + Age, design.df = thy.design, model.contrasts = ave.contrast, fdr.weighting="graph-overlap")
## Using TMM normalisation
## Running with model contrasts
## Performing spatial FDR correction with graph-overlap weighting
##
## FALSE TRUE
## 240 128
## logFC logCPM F PValue FDR Nhood
## 1 3.56613667 13.13414 1.847428e+01 2.106352e-05 3.100549e-04 1
## 2 2.06598102 13.16451 7.524685e+00 6.282731e-03 3.450813e-02 2
## 3 -2.36856093 13.24631 4.850860e-07 9.995778e-01 9.996948e-01 3
## 4 4.30971866 13.17266 2.850528e+01 1.368539e-07 5.919042e-06 4
## 5 -0.07661779 13.57888 1.697783e-03 9.671478e-01 9.996948e-01 5
## 6 -3.42851513 13.35662 7.887936e+00 5.186211e-03 3.215547e-02 6
## SpatialFDR
## 1 3.459150e-04
## 2 4.221079e-02
## 3 9.996948e-01
## 4 6.513718e-06
## 5 9.996948e-01
## 6 3.971554e-02
The results table In this comparison there are 128 DA nhoods - which we can visualise on a superimposed single-cell UMAP.
thy.milo <- buildNhoodGraph(thy.milo)
plotUMAP(thy.milo, colour_by="SubType") + plotNhoodGraphDA(thy.milo, da_results, alpha=0.1) +
plot_layout(guides="auto" )
## Adding nhood effect sizes to neighbourhood graph attributes
In these side-by-side UMAPs we can see that there is an enrichment of the Perinatal cTEC and Proliferating TEC populations in the 1 week old compared to the other time points.
For a more extensive description of the uses of contrasts please take a look at the edgeR documentation .
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MouseThymusAgeing_1.14.0 patchwork_1.3.0
## [3] dplyr_1.1.4 scran_1.35.0
## [5] scater_1.35.0 ggplot2_3.5.1
## [7] scuttle_1.17.0 SingleCellExperiment_1.29.1
## [9] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [11] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
## [13] IRanges_2.41.2 S4Vectors_0.45.2
## [15] BiocGenerics_0.53.3 generics_0.1.3
## [17] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [19] miloR_2.3.0 edgeR_4.5.1
## [21] limma_3.63.2 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_1.8.9
## [4] magrittr_2.0.3 ggbeeswarm_0.7.2 farver_2.1.2
## [7] rmarkdown_2.29 vctrs_0.6.5 memoise_2.0.1
## [10] htmltools_0.5.8.1 S4Arrays_1.7.1 AnnotationHub_3.15.0
## [13] curl_6.0.1 BiocNeighbors_2.1.2 SparseArray_1.7.2
## [16] sass_0.4.9 pracma_2.4.4 bslib_0.8.0
## [19] cachem_1.1.0 buildtools_1.0.0 igraph_2.1.2
## [22] mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
## [25] rsvd_1.0.5 Matrix_1.7-1 R6_2.5.1
## [28] fastmap_1.2.0 GenomeInfoDbData_1.2.13 digest_0.6.37
## [31] numDeriv_2016.8-1.1 colorspace_2.1-1 AnnotationDbi_1.69.0
## [34] dqrng_0.4.1 irlba_2.3.5.1 ExperimentHub_2.15.0
## [37] RSQLite_2.3.9 beachmat_2.23.5 labeling_0.4.3
## [40] filelock_1.0.3 httr_1.4.7 polyclip_1.10-7
## [43] abind_1.4-8 compiler_4.4.2 bit64_4.5.2
## [46] withr_3.0.2 BiocParallel_1.41.0 viridis_0.6.5
## [49] DBI_1.2.3 ggforce_0.4.2 MASS_7.3-61
## [52] rappdirs_0.3.3 DelayedArray_0.33.3 bluster_1.17.0
## [55] gtools_3.9.5 tools_4.4.2 vipor_0.4.7
## [58] beeswarm_0.4.0 glue_1.8.0 grid_4.4.2
## [61] cluster_2.1.8 gtable_0.3.6 tidyr_1.3.1
## [64] BiocSingular_1.23.0 tidygraph_1.3.1 ScaledMatrix_1.15.0
## [67] metapod_1.15.0 XVector_0.47.1 ggrepel_0.9.6
## [70] BiocVersion_3.21.1 pillar_1.10.0 stringr_1.5.1
## [73] splines_4.4.2 tweenr_2.0.3 BiocFileCache_2.15.0
## [76] lattice_0.22-6 FNN_1.1.4.1 bit_4.5.0.1
## [79] tidyselect_1.2.1 locfit_1.5-9.10 maketools_1.3.1
## [82] Biostrings_2.75.3 knitr_1.49 gridExtra_2.3
## [85] xfun_0.49 graphlayouts_1.2.1 statmod_1.5.0
## [88] stringi_1.8.4 UCSC.utils_1.3.0 yaml_2.3.10
## [91] evaluate_1.0.1 codetools_0.2-20 ggraph_2.2.1
## [94] tibble_3.2.1 BiocManager_1.30.25 cli_3.6.3
## [97] uwot_0.2.2 munsell_0.5.1 jquerylib_0.1.4
## [100] Rcpp_1.0.13-1 dbplyr_2.5.0 png_0.1-8
## [103] parallel_4.4.2 blob_1.2.4 viridisLite_0.4.2
## [106] scales_1.3.0 purrr_1.0.2 crayon_1.5.3
## [109] rlang_1.1.4 cowplot_1.1.3 KEGGREST_1.47.0