Introduction
This is the tutorial for ADAPT
(analysis of microbiome
differential abundance analysis by pooling Tobit models). Differential
abundance analysis (DAA) is one of the fundamental steps in microbiome
research1. The metagenomics
count data have excessive zeros and are compositional, causing
challenges for detecting microbiome taxa whose abundances are associated
with different conditions. ADAPT
overcomes these challenges
differently from other DAA methods. First, ADAPT
treats
zero counts as left censored observations and apply Tobit models which
are censored regressions2 for
modeling the log count ratios. Second, ADAPT
has an
innovative and theoretically justified way of finding non-differentially
abundant taxa as reference taxa. It then applies Tobit models to log
count ratios between individual taxa and the summed counts of reference
taxa to identify differentially abundant taxa. The overall workflow is
presented in the graph below.
There are two ways to install the package, either through GitHub or
Bioconductor.
# install through GitHub
if(!require("ADAPT")) BiocManager::install("mkbwang/ADAPT", build_vignettes = TRUE)
# install through Bioconductor
if(!require("ADAPT")) BiocManager::install("ADAPT", version="devel",
build_vignettes = TRUE)
Main function adapt
The main function adapt
takes in a phyloseq3 object that needs to have at least
a count table and a sample metadata. The metadata must contain one
variable that represents the condition to test on. We currently allow
binary and continuous conditions. If there are more than two groups for
the condition variable, the user may single out one group to compare
with all the others for DAA. ADAPT
allows adjusting for
extra covariates. There are eight arguments in the adapt
function:
input_data
is the phyloseq object with the count matrix
and metadata. This argument is required.
cond.var
is the variable in the metadata that
represents the condition to test on. This argument is required.
base.cond
is the group that the user is interested in
comparing against other groups. This argument is only needed if the
condition is categorical.
adj.var
contains the covariates to adjust for. It can
be empty or be a vector of variable names.
censor
is the value to censor at for all the zero
counts. By default the zeros are censored at one.
prev.filter
is the threshold for filtering out rare
taxa. By default taxa with prevalence lower than 0.05 are filtered out
before analysis.
depth.filter
is the threshold for filtering out samples
with low sequencing depths. The default threshold is 1000 reads.
alpha
is the cutoff for the Benjamini-hochberg adjusted
p-values to decide differentially abundant taxa. The default is
0.05.
The ADAPT
package contains two metagenomics datasets
from an early childhood dental caries study4. One corresponds to 16S rRNA
sequencing of 161 saliva samples collected from 12-month-old infants
(ecc_saliva
). The other corresponds to shotgun metagenomic
sequencing of 30 plaque samples collected from kids between 36 and 60
months old (ecc_plaque
). In this vignette let’s use the
saliva data to find out differentially abundant taxa between kids who
developed early childhood caries (ECC) after 36 months old and those who
didn’t. Besides the main variable CaseStatus
, there is
another variable Site
representing the site where each
sample was collected. Both variables are discrete and contain two
categories each.
library(ADAPT)
#> Warning: multiple methods tables found for 'intersect'
#> Warning: multiple methods tables found for 'intersect'
data(ecc_saliva)
We can run adapt
with or without covariate adjustment.
The returned object is a customized S4-type object with slots
corresponding to analysis name, reference taxa, differentially abundant
taxa, detailed analysis results (a dataframe) and input phyloseq
object.
saliva_output_noadjust <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control")
#> Choose 'Control' as the baseline condition
#> 155 taxa and 161 samples being analyzed...
#> Selecting Reference Set... 77 taxa selected as reference
#> 27 differentially abundant taxa detected
saliva_output_adjust <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control", adj.var="Site")
#> Choose 'Control' as the baseline condition
#> 155 taxa and 161 samples being analyzed...
#> Selecting Reference Set... 77 taxa selected as reference
#> 28 differentially abundant taxa detected
The Site
is not a confounding variable, therefore the
DAA results only differ by one taxon.
Explore Analysis Results
We have developed two utility functions for result exploration.
summary
returns a dataframe with the estimated log10
abundance fold changes, hypothesis test p-values and taxonomy (if
provided in the phyloseq object). The user can choose to return results
of all the taxa, the differentially abundant taxa only or the reference
taxa only through the select
variable (“all”, “da” or
“ref”). We choose to return the results of only the DA taxa below.
DAtaxa_result <- summary(saliva_output_noadjust, select="da")
#> Differential abundance analysis for CaseStatus (Case VS Control)
#> 155 taxa and 161 samples analyzed
#> 77 taxa are selected as reference and 27 taxa are differentially abundant
head(DAtaxa_result)
#> Taxa prevalence log10foldchange teststat pval adjusted_pval
#> ASV3 ASV3 0.9751553 -0.3618780 9.313926 2.274187e-03 1.855258e-02
#> ASV8 ASV8 0.9689441 0.3567131 9.881400 1.669578e-03 1.437693e-02
#> ASV9 ASV9 0.8695652 -0.8925900 19.157084 1.203899e-05 4.665108e-04
#> ASV14 ASV14 0.7204969 0.9203963 13.159234 2.861058e-04 4.031491e-03
#> ASV18 ASV18 0.7080745 -1.3929574 28.068324 1.171070e-07 9.075792e-06
#> ASV20 ASV20 0.4099379 2.2046044 31.758612 1.745735e-08 2.705889e-06
#> Kingdom Phylum Class Order
#> ASV3 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales
#> ASV8 Bacteria Firmicutes Bacilli Lactobacillales
#> ASV9 Bacteria Proteobacteria Betaproteobacteria Neisseriales
#> ASV14 Bacteria Firmicutes Bacilli Lactobacillales
#> ASV18 Bacteria Fusobacteria Fusobacteriia Fusobacteriales
#> ASV20 Bacteria Bacteroidetes Bacteroidia Bacteroidales
#> Family Genus Species
#> ASV3 Pasteurellaceae Haemophilus parainfluenzae
#> ASV8 Streptococcaceae Streptococcus salivarius
#> ASV9 Neisseriaceae Neisseria subflava
#> ASV14 Streptococcaceae Streptococcus <NA>
#> ASV18 Fusobacteriaceae Fusobacterium periodonticum
#> ASV20 Prevotellaceae Prevotella <NA>
We have also developed a plot
function to generate a
volcano plot. The DA taxa are highlighted in red. The user can decide
how many points with the smallest p-values are labeled (default 5) with
n.label
argument.
plot(saliva_output_noadjust, n.label=5)