muscat
For details on the concepts presented here, consider having a look at our publication:
Crowell HL, Soneson C*, Germain P-L*, Calini D,
Collin L, Raposo C, Malhotra D, and Robinson MD:
muscat detects subpopulation-specific state transitions from
multi-sample multi-condition single-cell transcriptomics data.
Nature Communications 11, 6077 (2020).
DOI: 10.1038/s41467-020-19894-4
To demonstrate muscat’s simulation framework, we will use a SingleCellExperiment (SCE) containing 10x droplet-based scRNA-seq PBCM data from 8 Lupus patients obtained befor and after 6h-treatment with IFN-β (Kang et al. 2018). The complete raw data, as well as gene and cell metadata is available through the NCBI GEO, accession number GSE96583.
muscat’s simulation framework comprises: i) estimation of negative binomial (NB) parameters from a reference multi-subpopulation, multi-sample dataset; ii) sampling of gene and cell parameters to use for simulation; and, iii) simulation of gene expression data as NB distributions of mixtures thereof. See Fig. @ref(fig:1a).
Let Y = (ygc) ∈ ℕ0G × C denote the count matrix of a multi-sample multi-subpopulation reference dataset with genes 𝒢 = {g1, …, gG} and sets of cells 𝒞sk = {c1sk, ..., cCsksk} for each sample s and subpopulation k (Csk is the number of cells for sample s, subpopulation k). For each gene g, we fit a model to estimate sample-specific means βgs, for each sample s, and dispersion parameters ϕg using ’s function with default parameters. Thus, we model the reference count data as NB distributed:
Ygc ∼ NB(μgc, ϕg)
for gene g and cell c, where the mean μgc = exp (βgs(c)) ⋅ λc. Here, βgs(c) is the relative abundance of gene g in sample s(c), λc is the library size (total number of counts), and ϕg is the dispersion.
For each subpopulation, we randomly assign each gene to a given
differential distribution (DD) category (Korthauer et al. 2016) according to a
probability vector p_dd
= (pEE, pEP, pDE, pDP, pDM, pDB).
For each gene and subpopulation, we draw a vector of fold changes (FCs)
from a Gamma distribution with shape parameter α = 4 and rate β = 4/μlogFC,
where μlogFC is the
desired average logFC across all genes and subpopulations specified via
argument lfc
. The direction of differential expression is
randomized for each gene, with equal probability of up- and
down-regulation.
Next, we split the cells in a given subpopulations into two sets (representing treatment groups), 𝒯A and 𝒯B, which are in turn split again into two sets each (representing subpopulations within the given treatment group.), 𝒯A1/𝒯A2 and 𝒯B1/𝒯B2.
For EE genes, counts for 𝒯A and 𝒯B are drawn using identical means.For EP genes, we multiply the effective means for identical fractions of cells per group by the sample FCs, i.e., cells are split such that dim 𝒯A1 = dim 𝒯B1 and dim 𝒯A2 = dim 𝒯B2. For DE genes, the means of one group, A or B, are multiplied with the samples FCs. DP genes are simulated analogously to EP genes with dim 𝒯A1 = a ⋅ dim 𝒯A and dim 𝒯B1 = b ⋅ dim 𝒯B, where a + b = 1 and a ≠ b. For DM genes, 50% of cells from one group are simulated at μ ⋅ logFC. For DB genes, all cells from one group are simulated at μ ⋅ logFC/2, and the second group is split into equal proportions of cells simulated at μ and μ ⋅ logFC, respectively. See Fig. @ref(fig:1b).
prepSim
: Preparing data for simulationTo prepare a reference SingleCellExperiment
(SCE) for simulation of multi-sample multi-group scRNA-seq data,
prepSim
will
Importantly, we want to introduce known changes in states
across conditions; thus, only replicates from a single condition should
go into the simulation. The group to be kept for simulation may be
specified via group_keep
, in which case samples from all
other groups (sce$group_id != group_keep
) will be dropped.
By default (group_keep = NULL
), prepSim
will
select the first group available as reference.
Arguments min_count
, min_cells
,
min_genes
and min_size
are used to tune the
filtering of genes, cells and subpopulation-instances as follows:
> min_count
in
>= min_cells
will be retained> 0
in
>= min_genes
will be retained>= min_size
cells will be retained; min_size = NULL
will skip this
step# estimate simulation parameters
data(example_sce)
ref <- prepSim(example_sce, verbose = FALSE)
# only samples from `ctrl` group are retained
table(ref$sample_id)
##
## ctrl101 ctrl107
## 200 200
# cell parameters: library sizes
sub <- assay(example_sce[rownames(ref), colnames(ref)])
all.equal(exp(ref$offset), as.numeric(colSums(sub)))
## [1] "names for target but not for current"
## [2] "Mean relative difference: 0.4099568"
## DataFrame with 6 rows and 4 columns
## ENSEMBL SYMBOL beta disp
## <character> <character> <DataFrame> <numeric>
## ISG15 ENSG00000187608 ISG15 -7.84582:-0.24689653:-1.039481 4.6562351
## AURKAIP1 ENSG00000175756 AURKAIP1 -7.84854: 0.00760804:-1.171909 0.1769946
## MRPL20 ENSG00000242485 MRPL20 -8.31434:-0.58684228:-0.304824 0.6429744
## SSU72 ENSG00000160075 SSU72 -8.05155:-0.17122919:-0.793248 0.0342695
## RER1 ENSG00000157916 RER1 -7.75327: 0.10732490:-1.261825 0.5043444
## RPL22 ENSG00000116251 RPL22 -8.03551:-0.03356007: 0.143487 0.2013805
simData
: Simulating complex designsProvided with a reference SCE as returned by prepSim
, a
variery of simulation scenarios can be generated using the
simData
function, which will again return an SCE containg
the following elements:
assay
counts
containing the simulated
count datacolData
columns cluster/sample/group_id
containing each cells cluster, sample, and group ID (A or B).metadata$gene_info
containing a data.frame
listing, for each gene and cluster
category
logFC
; note that this will only approximate
log2(sim_mean.B/sim_mean.A)
for genes of the
de
category as other types of state changes use mixtures
for NBs, and will consequently not exhibit a shift in means of the same
magnitude as logFC
sim_gene
from which dispersion
sim_disp
and sample-specific means
beta.<sample_id>
were usedsim_mean.A/B
for each
groupIn the code chunk that follows, we run a simple simulation with
p_dd = c(1,0,...0)
, i.e., 10% of EE genesnk = 3
subpopulations and ns = 3
replicates for each of 2 groupsng = 1000
genes and nc = 2000
cells,
resulting in 2000/2/ns/nk
≈ 111 cells for 2 groups with 3 samples each
and 3 subpopulations# simulated 10% EE genes
sim <- simData(ref,
nc = 2e3, ng = 1e3, force = TRUE,
p_dd = diag(6)[1, ], nk = 3, ns = 3)
# number of cells per sample and subpopulation
table(sim$sample_id, sim$cluster_id)
##
## cluster1 cluster2 cluster3
## sample1.A 113 105 103
## sample2.A 119 119 111
## sample3.A 98 108 115
## sample1.B 120 114 112
## sample2.B 117 112 104
## sample3.B 99 115 116
By default, we have drawn a random reference sample from
levels(ref$sample_id)
for every simulated sample in each
group, resulting in an unpaired design:
## A B
## sample1 "ctrl107" "ctrl107"
## sample2 "ctrl101" "ctrl107"
## sample3 "ctrl101" "ctrl101"
Alternatively, we can re-run the above simulation with
paired = TRUE
such that both groups will use the same set
of reference samples, resulting in a paired design:
# simulated paired design
sim <- simData(ref,
nk = 3, ns = 3, paired = TRUE,
nc = 2e3, ng = 1e3, force = TRUE)
# same set of reference samples for both groups
ref_sids <- metadata(sim)$ref_sids
all(ref_sids[, 1] == ref_sids[, 2])
## [1] TRUE
p_dd
: Simulating differential distributionsArgument p_dd
specifies the fraction of cells to
simulate for each DD category. Its values should thus lie in [0, 1] and sum to 1. Expression densities for
an exemplary set of genes simulated from the code below is shown in Fig.
@ref(fig:densities).
# simulate genes from all DD categories
sim <- simData(ref,
p_dd = c(0.5, rep(0.1, 5)),
nc = 2e3, ng = 1e3, force = TRUE)
We can retrieve the category assigned to each gene in each cluster
from the gene_info
table stored in the output SCE’s
metadata
:
##
## ee ep de dp dm db
## 1012 204 200 184 196 204
rel_lfc
: Simulating cluster-specific state changesBy default, for each gene and subpopulation, we draw a vector of fold
changes (FCs) from a Gamma distribution with rate parameter β ∝ μlogFC,
where μlogFC is the
desired average logFC across all genes and subpopulations specified via
argument lfc
. This results in state changes that are of
same magnitute for each subpopulation.
Now, suppose we wanted to have a subpopulation that does not
exhibit any state changes across conditions, or vary the magnitute of
changes across subpopulations. To this end, argument
rel_lfc
supplies a subpopulation-specific factor applied to
the FCs sampled for subpopulation. Fig. @ref(fig:rel-lfc) demonstrates
how this manifests in in two-dimensional embeddings of the cells: Here,
we generate a set of 3 simulations with
rel_lfc=c(1,1,1)
rel_lfc=c(1,1,3)
rel_lfc=c(0,1,2)
p_type
: Simulating type featuresThe idea underlying differential state (DS) analysis to test for subpopulation-specific changes in expression across experimental conditions is based on the idea that we i) use stable moleculare signatures (i.e., type features) to group cells into meaningful subpopulations; and, ii) perform statistical tests on state features that are more transiently expression and may be subject to changes in expression upon, for example, treatment or during disease.
The fraction of type features introduced into each subpopulation is
specified via argument p_type
. Note that, without
introducing any differential states, a non-zero fraction of type genes
will result in separation of cells into clusters. Fig. @ref(fig:p-type)
demonstrates how increasing values for p_type
lead to more
and more separation of the cells when coloring by cluster ID, but that
the lack of state changes leads to homogenous mixing of cells when
coloring by group ID.
## Warning in get_plot_component(plot, "guide-box"): Multiple components found;
## returning the first one. To return all, use `return_all = TRUE`.
## Warning in get_plot_component(plot, "guide-box"): Multiple components found;
## returning the first one. To return all, use `return_all = TRUE`.
simData
contains three parameters that control how
subpopulations relate to and differ from one another:
p_type
determines the percentage of type genes
exclusive to each clusterphylo_tree
represents a phylogenetic tree specifying of
clusters relate to one anotherphylo_pars
controls how branch distances are to be
interpretedNote that, when supplied with a cluster phylogeny, argument
nk
is ignored and simData
extracts the number
of clusters to be simulated from phylo_tree
.
p_type
: Introducing type featuresTo exemplify the effect of the parameter p_type
, we
simulate a dataset with ≈ 5% of type
genes per cluster, and one group only via
probs = list(..., c(1, 0)
(i.e., Prob(cell is in group 2) = 0):
# simulate 5% of type genes; one group only
sim <- simData(ref, p_type = 0.1,
nc = 2e3, ng = 1e3, force = TRUE,
probs = list(NULL, NULL, c(1, 0)))
# do log-library size normalization
sim <- logNormCounts(sim)
For visualizing the above simulation, we select for genes that are of
class type (rowData()$class == "type"
) and have a
decent simulated expression mean. Furthermore, we sample a subset of
cells for each cluster. The resulting heatmap (Fig.
@ref(fig:heatmap-type)) shows that the 3 clusters separate well from one
another, but that type genes aren’t necessarily expressed higher in a
single cluster. This is the case because a gene selected as reference
for a type gene in a given cluster may indeed have a lower expression
than the gene used for the remainder of clusters.
# extract gene metadata & number of clusters
rd <- rowData(sim)
nk <- nlevels(sim$cluster_id)
# filter for type genes with high expression mean
is_type <- rd$class == "type"
is_high <- rowMeans(assay(sim, "logcounts")) > 1
gs <- rownames(sim)[is_type & is_high]
# sample 100 cells per cluster for plotting
cs <- lapply(split(seq_len(ncol(sim)), sim$cluster_id), sample, 100)
plotHeatmap(sim[, unlist(cs)], features = gs, center = TRUE,
colour_columns_by = "cluster_id", cutree_cols = nk)
phylo_tree
: Introducing a cluster phylogenyThe scenario illustrated above is arguably not very realistic. Instead, in a biology setting, subpopulations don’t differ from one another by a specific subset of genes, but may share some of the genes decisive for their biological role. I.e., the set type features is not exclusive for every given subpopulation, and some subpopulations are more similar to one another than others.
To introduce a more realistic subpopulation structure,
simData
can be supplied with a phylogenetic tree,
phylo_tree
, that specifies the relationship and distances
between clusters. The tree should be written in Newick format as in the
following example:
# specify cluster phylogeny
tree <- "(('cluster1':0.4,'cluster2':0.4):0.4,('cluster3':
0.5,('cluster4':0.2,'cluster5':0.2,'cluster6':0.2):0.4):0.4);"
# visualize cluster tree
library(phylogram)
plot(read.dendrogram(text = tree))
# simulate 5% of type genes; one group only
sim <- simData(ref,
phylo_tree = tree, phylo_pars = c(0.1, 1),
nc = 800, ng = 1e3, dd = FALSE, force = TRUE)
# do log-library size normalization
sim <- logNormCounts(sim)
# extract gene metadata & number of clusters
rd <- rowData(sim)
nk <- nlevels(sim$cluster_id)
# filter for type & shared genes with high expression mean
is_type <- rd$class != "state"
is_high <- rowMeans(assay(sim, "logcounts")) > 1
gs <- rownames(sim)[is_type & is_high]
# sample 100 cells per cluster for plotting
cs <- lapply(split(seq_len(ncol(sim)), sim$cluster_id), sample, 50)
plotHeatmap(sim[, unlist(cs)], features = gs,
center = TRUE, show_rownames = FALSE,
colour_columns_by = "cluster_id")
under development.
As is the case with any simulation, it is crutial to verify the
qualitation of the simulated data; i.e., how well key characteristics of
the reference data are captured in the simulation. While we have
demonstrated that muscat
s simulation framework is capable
of reproducing key features of scRNA-seq dataset at both the single-cell
and pseudobulk level (Crowell2019-muscat?),
simulation quality will vary depending on the reference dataset and
could suffer from too extreme simulation parameters. Therefore, we
advise anyone interested in using the framework presented herein for any
type of method evaluation or comparison to generate countsimQC
report (Soneson and Robinson 2018) as it
is extremly simple to make and very comprehensive.
The code chunk below (not evaluated here) illustrates how to generate
a report comparing an exemplary simData
simulation with the
reference data provided in ref
. Runtimes are mainly
determined by argument maxNForCorr
and
maxNForDisp
, and computing a full-blown report can be
very time intensive. We thus advice using a sufficient but low
number of cells/genes for these steps.
# load required packages
library(countsimQC)
library(DESeq2)
# simulate data
sim <- simData(ref,
ng = nrow(ref),
nc = ncol(ref),
dd = FALSE)
# construct 'DESeqDataSet's for both,
# simulated and reference dataset
dds_sim <- DESeqDataSetFromMatrix(
countData = counts(sim),
colData = colData(sim),
design = ~ sample_id)
dds_ref <- DESeqDataSetFromMatrix(
countData = counts(ref),
colData = colData(ref),
design = ~ sample_id)
dds_list <- list(sim = dds_sim, data = dds_ref)
# generate 'countsimQC' report
countsimQCReport(
ddsList = dds_list,
outputFile = "<file_name>.html",
outputDir = "<output_path>",
outputFormat = "html_document",
maxNForCorr = 200,
maxNForDisp = 500)
A variety of functions for calculation and visualizing performance metrics for evaluation of ranking and binary classification (assignment) methods is provided in the iCOBRA package (Soneson and Robinson 2016).
We firstly define a wrapper that takes as input a method
passed pbDS
and reformats the results as a
data.frame
in tidy format, which is in turn
right_join
ed with simulation gene metadata. As each methods
may return results for different subsets of gene-subpopulation
instances, the latter steps assures that the dimensions of all method
results will match.
# 'm' is a character string specifying a valid `pbDS` method
.run_method <- function(m) {
res <- pbDS(pb, method = m, verbose = FALSE)
tbl <- resDS(sim, res)
left_join(gi, tbl, by = c("gene", "cluster_id"))
}
Having computed result data.frame
s for a set of methods,
we next define a wrapper that prepares the data for evaluation with
iCOBRA
using the COBRAData
constructor, and
calculates any performance measures of interest (specified via
aspects
) with calculate_performance
:
# 'x' is a list of result 'data.frame's
.calc_perf <- function(x, facet = NULL) {
cd <- COBRAData(truth = gi,
pval = data.frame(bind_cols(map(x, "p_val"))),
padj = data.frame(bind_cols(map(x, "p_adj.loc"))))
perf <- calculate_performance(cd,
binary_truth = "is_de", maxsplit = 1e6,
splv = ifelse(is.null(facet), "none", facet),
aspects = c("fdrtpr", "fdrtprcurve", "curve"))
}
Putting it all together, we can finally simulate some data, run a set
of DS analysis methods, calculate their performance, and plot a variety
of performance metrics depending on the aspects
calculated
by .calc_perf
:
# simulation with all DD types
sim <- simData(ref,
p_dd = c(rep(0.3, 2), rep(0.1, 4)),
ng = 1e3, nc = 2e3, ns = 3, force = TRUE)
# aggregate to pseudobulks
pb <- aggregateData(sim)
# extract gene metadata
gi <- metadata(sim)$gene_info
# add truth column (must be numeric!)
gi$is_de <- !gi$category %in% c("ee", "ep")
gi$is_de <- as.numeric(gi$is_de)
# specify methods for comparison & run them
# (must set names for methods to show in visualizations!)
names(mids) <- mids <- c("edgeR", "DESeq2", "limma-trend", "limma-voom")
res <- lapply(mids, .run_method)
# calculate performance measures
# and prep. for plotting with 'iCOBRA'
library(iCOBRA)
perf <- .calc_perf(res, "cluster_id")
pd <- prepare_data_for_plot(perf)
# plot FDR-TPR curves by cluster
plot_fdrtprcurve(pd,
linewidth = 0.8, pointsize = 2) +
facet_wrap(~ splitval, nrow = 1) +
scale_x_continuous(trans = "sqrt") +
theme(aspect.ratio = 1)
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] iCOBRA_1.33.0 phylogram_2.1.0
## [3] reshape2_1.4.4 tidyr_1.3.1
## [5] UpSetR_1.4.0 scater_1.33.4
## [7] scuttle_1.15.5 SingleCellExperiment_1.27.2
## [9] SummarizedExperiment_1.35.5 Biobase_2.67.0
## [11] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2
## [13] IRanges_2.39.2 S4Vectors_0.43.2
## [15] MatrixGenerics_1.17.1 matrixStats_1.4.1
## [17] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [19] BiocFileCache_2.15.0 dbplyr_2.5.0
## [21] BiocGenerics_0.53.0 purrr_1.0.2
## [23] muscat_1.21.0 limma_3.61.12
## [25] ggplot2_3.5.1 dplyr_1.1.4
## [27] cowplot_1.1.3 patchwork_1.3.0
## [29] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RcppAnnoy_0.0.22 later_1.3.2 splines_4.4.1
## [4] filelock_1.0.3 bitops_1.0-9 tibble_3.2.1
## [7] lifecycle_1.0.4 Rdpack_2.6.1 edgeR_4.3.21
## [10] doParallel_1.0.17 globals_0.16.3 lattice_0.22-6
## [13] MASS_7.3-61 backports_1.5.0 magrittr_2.0.3
## [16] sass_0.4.9 rmarkdown_2.28 jquerylib_0.1.4
## [19] yaml_2.3.10 shinyBS_0.61.1 httpuv_1.6.15
## [22] sctransform_0.4.1 DBI_1.2.3 buildtools_1.0.0
## [25] minqa_1.2.8 RColorBrewer_1.1-3 abind_1.4-8
## [28] zlibbioc_1.51.2 Rtsne_0.17 EnvStats_3.0.0
## [31] glmmTMB_1.1.10 rappdirs_0.3.3 circlize_0.4.16
## [34] GenomeInfoDbData_1.2.13 ggrepel_0.9.6 pbkrtest_0.5.3
## [37] irlba_2.3.5.1 listenv_0.9.1 maketools_1.3.1
## [40] pheatmap_1.0.12 parallelly_1.38.0 codetools_0.2-20
## [43] DelayedArray_0.33.1 DT_0.33 tidyselect_1.2.1
## [46] shape_1.4.6.1 UCSC.utils_1.1.0 farver_2.1.2
## [49] lme4_1.1-35.5 ScaledMatrix_1.13.0 viridis_0.6.5
## [52] jsonlite_1.8.9 GetoptLong_1.0.5 BiocNeighbors_2.1.0
## [55] iterators_1.0.14 foreach_1.5.2 tools_4.4.1
## [58] progress_1.2.3 Rcpp_1.0.13 blme_1.0-6
## [61] glue_1.8.0 gridExtra_2.3 SparseArray_1.5.45
## [64] xfun_0.48 mgcv_1.9-1 DESeq2_1.47.0
## [67] stageR_1.27.0 shinydashboard_0.7.2 withr_3.0.2
## [70] numDeriv_2016.8-1.1 BiocManager_1.30.25 fastmap_1.2.0
## [73] boot_1.3-31 fansi_1.0.6 caTools_1.18.3
## [76] digest_0.6.37 rsvd_1.0.5 mime_0.12
## [79] R6_2.5.1 colorspace_2.1-1 Cairo_1.6-2
## [82] gtools_3.9.5 RSQLite_2.3.7 RhpcBLASctl_0.23-42
## [85] utf8_1.2.4 generics_0.1.3 variancePartition_1.35.4
## [88] data.table_1.16.2 corpcor_1.6.10 htmlwidgets_1.6.4
## [91] prettyunits_1.2.0 httr_1.4.7 S4Arrays_1.5.11
## [94] uwot_0.2.2 pkgconfig_2.0.3 gtable_0.3.6
## [97] blob_1.2.4 ComplexHeatmap_2.23.0 XVector_0.45.0
## [100] sys_3.4.3 remaCor_0.0.18 htmltools_0.5.8.1
## [103] TMB_1.9.15 clue_0.3-65 scales_1.3.0
## [106] png_0.1-8 fANCOVA_0.6-1 reformulas_0.3.0
## [109] knitr_1.48 rjson_0.2.23 curl_5.2.3
## [112] nlme_3.1-166 nloptr_2.1.1 cachem_1.1.0
## [115] GlobalOptions_0.1.2 stringr_1.5.1 BiocVersion_3.21.1
## [118] KernSmooth_2.23-24 parallel_4.4.1 vipor_0.4.7
## [121] AnnotationDbi_1.69.0 pillar_1.9.0 grid_4.4.1
## [124] vctrs_0.6.5 gplots_3.2.0 promises_1.3.0
## [127] BiocSingular_1.23.0 beachmat_2.23.0 xtable_1.8-4
## [130] cluster_2.1.6 beeswarm_0.4.0 evaluate_1.0.1
## [133] mvtnorm_1.3-1 cli_3.6.3 locfit_1.5-9.10
## [136] compiler_4.4.1 rlang_1.1.4 crayon_1.5.3
## [139] future.apply_1.11.3 labeling_0.4.3 plyr_1.8.9
## [142] ggbeeswarm_0.7.2 stringi_1.8.4 viridisLite_0.4.2
## [145] BiocParallel_1.41.0 Biostrings_2.75.0 lmerTest_3.1-3
## [148] munsell_0.5.1 aod_1.3.3 Matrix_1.7-1
## [151] hms_1.1.3 bit64_4.5.2 future_1.34.0
## [154] shiny_1.9.1 KEGGREST_1.45.1 statmod_1.5.0
## [157] highr_0.11 ROCR_1.0-11 rbibutils_2.3
## [160] memoise_2.0.1 broom_1.0.7 bslib_0.8.0
## [163] bit_4.5.0 ape_5.8