Title: | Analysis of Microbiome Differential Abundance by Pooling Tobit Models |
---|---|
Description: | ADAPT carries out differential abundance analysis for microbiome metagenomics data in phyloseq format. It has two innovations. One is to treat zero counts as left censored and use Tobit models for log count ratios. The other is an innovative way to find non-differentially abundant taxa as reference, then use the reference taxa to find the differentially abundant ones. |
Authors: | Mukai Wang [aut, cre], Simon Fontaine [ctb], Hui Jiang [ctb], Gen Li [aut, ctb] |
Maintainer: | Mukai Wang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2024-12-17 06:15:00 UTC |
Source: | https://github.com/bioc/ADAPT |
Analysis of microbiome differential abundance by pooling tobit models
adapt(
  input_data,
  cond.var,
  base.cond = NULL,
  adj.var = NULL,
  censor = 1,
  prev.filter = 0.05,
  depth.filter = 1000,
  alpha = 0.05
)
input_data | a phyloseq object |
cond.var | the variable representing the conditions to compare, a character string |
base.cond | the condition chosen as baseline. This is only used when the condition is categorical. |
adj.var | the names of the variables to be adjusted, a vector of character strings |
censor | the value to censor at for zero counts, default 1 |
prev.filter | taxa whose prevalences are smaller than the cutoff will be excluded from analysis, default 0.05 |
depth.filter | a sample would be discarded if its library size is smaller than the threshold |
alpha | the cutoff of the adjusted p values |
ADAPT takes in a metagenomics count table as a phyloseq object. The phyloseq object needs to have metadata containing at least one variable, cond.var, representing the conditions that the user is testing on. The condition variable cond.var can be numeric (a continuous variable) or character (a categorical variable). ADAPT does not support multigroup comparison yet. If there are more than two conditions, the user can specify the condition to single out through base.cond. ADAPT then carries out differential abundance analysis (DAA) between the selected base.cond and all the others. ADAPT allows adjusting for other covariates. The user can specify the covariates to adjust for by setting adj.var to a vector of variable names.
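The comparison of base.cond against all the other conditions can be pictured as collapsing a categorical variable into a two-group indicator. The following is a minimal base R sketch of that idea, not ADAPT's internal code; the vector `condition` and its three levels are made up for illustration.

```r
# Hypothetical condition labels with three levels (illustration only)
condition <- c("Control", "Mild", "Severe", "Control", "Severe", "Mild")
base.cond <- "Control"

# TRUE for samples in any condition other than the baseline
is_other <- condition != base.cond

# Cross-tabulate the original levels against the collapsed two-group indicator
table(condition, is_other)
```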
Differential abundance analysis may be too challenging for rare taxa and samples with too low sequencing depth. The users can filter out taxa whose prevalences are lower than prev.filter (default 0.05). The users can also filter out samples whose sequencing depths (library sizes) are smaller than depth.filter (default 1000).
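The two filters can be illustrated on a plain count matrix. This is a base R sketch under stated assumptions rather than ADAPT's own code: the matrix `counts` is made up, prevalence is taken to be the proportion of samples with a nonzero count, and both filters are computed on the unfiltered table (the order in which ADAPT applies them is not stated here).

```r
# Hypothetical count table: rows are taxa, columns are samples
counts <- matrix(
  c(  0,   0,   0,   0,
    500, 700,  20, 900,
    600, 400,  10, 300),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:4))
)
prev.filter <- 0.05
depth.filter <- 1000

# Prevalence: proportion of samples in which a taxon has a nonzero count
prevalence <- rowMeans(counts > 0)
# Library size: total count per sample
depth <- colSums(counts)

# Drop taxa below the prevalence cutoff and samples below the depth threshold
filtered <- counts[prevalence >= prev.filter, depth >= depth.filter, drop = FALSE]
dim(filtered)
```

In this toy table, taxon1 is never observed and sample3 has a library size of 30, so both are dropped.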
One major feature of ADAPT is treating zero counts as left censored observations and using Tobit models for log count ratios. By default, zero counts are left censored at one. The users can change the censoring value through censor.
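To make the left-censoring idea concrete, the sketch below fits a single Tobit regression to simulated data with `survreg` from the survival package. Everything in it is an assumption for illustration: the simulated counts, the `group` covariate, and the use of one hypothetical reference count per sample. It shows the general technique only and does not reproduce ADAPT's pooling procedure or its reference selection.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(1)
n <- 40
group <- rep(c(0, 1), each = n / 2)                          # hypothetical two-group covariate
taxon_counts <- rpois(n, lambda = ifelse(group == 1, 3, 1))  # simulated counts, some are zero
ref_counts <- rpois(n, lambda = 200) + 1                     # hypothetical reference counts, all positive

censor <- 1
is_zero <- taxon_counts == 0

# For a zero count the log ratio is only known to lie below log(censor / ref),
# so the censoring value stands in for the count and the observation is flagged
log_ratio <- log(ifelse(is_zero, censor, taxon_counts) / ref_counts)

# Gaussian survreg with left censoring is a Tobit model;
# the second argument of Surv is TRUE for fully observed values
fit <- survreg(Surv(log_ratio, !is_zero, type = "left") ~ group, dist = "gaussian")
coef(fit)
```

The `group` coefficient estimates the shift in the log count ratio between the two simulated groups while accounting for the censored zeros.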
Change the cutoff of BH-adjusted p-values with alpha (default 0.05) for calling DA taxa.
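As a reminder of what the cutoff acts on, the base R sketch below applies the Benjamini-Hochberg adjustment to a handful of made-up raw p-values and compares them with alpha. The taxon names and p-values are hypothetical, and whether ADAPT uses a strict or non-strict inequality at the cutoff is an assumption of this sketch.

```r
# Hypothetical raw p-values for five taxa
pvals <- c(taxonA = 0.001, taxonB = 0.01, taxonC = 0.02, taxonD = 0.2, taxonE = 0.8)
alpha <- 0.05

# Benjamini-Hochberg adjustment
adjusted <- p.adjust(pvals, method = "BH")

# Taxa whose adjusted p-value falls below the cutoff (strict inequality assumed)
da_taxa <- names(adjusted)[adjusted < alpha]
da_taxa
```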
The returned value of adapt is a customized S4 type called DAresult. We have developed two helper functions, summary and plot, for this special data type.
A DAresult type object that contains the input and the output. Use summary and plot to explore the output.
data(ecc_plaque)
plaque_results <- adapt(input_data=ecc_plaque, cond.var="CaseStatus", base.cond="case")
data(ecc_saliva)
saliva_results <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control", adj.var="Site")
An S4 class to represent ADAPT analysis results
The analysis result object contains the analysis name, the reference taxa, the DA taxa, the detailed analysis results as a dataframe and the input phyloseq object. The analysis name contains the condition variable. The reference taxa (reference) must be nonempty. The DA taxa (signal) may be an empty string if no taxa are differentially abundant. The details dataframe contains the taxa names, the prevalence of taxa, the estimated log10 absolute abundance fold changes, the raw hypothesis test p-values and the BH-adjusted p-values.
DAAname
The name of differential abundance analysis
reference
A vector of taxa names corresponding to all the reference taxa
signal
A vector of taxa names corresponding to all the DA taxa
details
A dataframe with the analysis results for all taxa
input
Input phyloseq object
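The slot layout above can be mimicked with a small stand-in class to show how such an object is built and read. `DAresultSketch` is a hypothetical class written for this illustration; it is not the DAresult class shipped with ADAPT. The slot types, the analysis name, the `Taxa` column and the wording of the validity message are all assumptions.

```r
library(methods)

# Hypothetical stand-in class; the real DAresult class is defined by ADAPT
setClass(
  "DAresultSketch",
  representation(
    DAAname = "character",
    reference = "character",
    signal = "character",
    details = "data.frame",
    input = "ANY"  # a phyloseq object in the real class
  ),
  validity = function(object) {
    # The reference taxa must be nonempty
    if (length(object@reference) == 0) "reference must be nonempty" else TRUE
  }
)

res <- new(
  "DAresultSketch",
  DAAname = "DAA for CaseStatus",  # made-up analysis name
  reference = c("taxon1", "taxon2"),
  signal = "taxon3",
  details = data.frame(Taxa = paste0("taxon", 1:3)),
  input = NULL
)

# Slots are read with the @ operator
res@signal
```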
A phyloseq object with 30 samples and 610 taxa for whole genome sequencing of plaque samples. The samples were collected from children at 36, 48 or 60 months old. 15 samples were from teeth with dental lesions and 15 samples were controls. The samples of cases were collected at the onset visit.
data(ecc_plaque)
The metadata of ecc_plaque has two columns: whether the children had dental caries, and the location of sample collection (Site 1 or Site 2).
The original publication of this data is "Evaluating the ecological hypothesis: early life salivary microbiome assembly predicts dental caries in a longitudinal case-control study" (Blostein et al., 2022). The sequence data are available under Project number PRJNA752888.
A phyloseq object with 161 samples and 280 ASVs for 16S sequencing of saliva samples. The samples were collected from 12-month-old infants. 84 out of 161 children developed dental caries after 36 months old. All samples have been de-identified.
data(ecc_saliva)
The metadata of ecc_saliva has two columns: whether the child developed dental caries after 36 months old, and the location of sample collection (Site 1 or Site 2).
The original publication of this data is "Evaluating the ecological hypothesis: early life salivary microbiome assembly predicts dental caries in a longitudinal case-control study" (Blostein et al., 2022). The sequence data are available under Project number PRJNA752888.
Volcano plot of ADAPT results
## S4 method for signature 'DAresult,ANY'
plot(x, n.label = 5)
x | analysis result, a DAresult type object |
n.label | Number of taxa to label on the plot. Note that no taxa will be labeled if there are no DA taxa. |
The customized plot function for DAresult type objects generates a volcano plot with the differentially abundant taxa highlighted. The users can decide how many taxa with the smallest p-values are labeled on the plot.
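Picking the taxa with the smallest p-values amounts to a simple ranking step. The base R sketch below shows that step on a made-up results table; the data frame `details` and its column names are hypothetical, and the plotting itself is left out.

```r
# Hypothetical results table (column names are made up for this sketch)
details <- data.frame(
  taxon = paste0("taxon", 1:6),
  pval = c(0.20, 0.001, 0.03, 0.0004, 0.50, 0.01),
  stringsAsFactors = FALSE
)
n.label <- 3

# Row indices of the n.label smallest p-values
to_label <- head(order(details$pval), n.label)
details$taxon[to_label]
```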
A ggplot object of the volcano plot
data(ecc_saliva)
saliva_results <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control", adj.var="Site")
plot(saliva_results, n.label=10)
Summary function for DAresult type object
## S4 method for signature 'DAresult'
summary(object, select = c("all", "da", "ref"))
object | analysis result, a DAresult type object |
select | Taxa whose results are to be returned: all the taxa ("all"), only the differentially abundant taxa ("da") or only the reference taxa ("ref"). |
This customized summary function reports the dimension of the input count table, the number of reference taxa and the number of differentially abundant taxa. It also returns a data frame with the detailed analysis result and taxonomy of all the taxa. The user can choose to only get the detailed analysis result of the DA taxa or the reference taxa through the select parameter.
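The behaviour of the select parameter can be sketched as row subsetting of the details table. `select_taxa` below is a hypothetical helper written for this illustration, not a function exported by ADAPT, and the toy table and its column names are made up.

```r
# Hypothetical helper mirroring the three choices of `select`
select_taxa <- function(details, reference, signal, select = c("all", "da", "ref")) {
  select <- match.arg(select)  # defaults to "all" and rejects unknown values
  switch(select,
    all = details,
    da = details[details$taxon %in% signal, , drop = FALSE],
    ref = details[details$taxon %in% reference, , drop = FALSE]
  )
}

# Toy inputs (made up)
details <- data.frame(
  taxon = paste0("taxon", 1:4),
  pval = c(0.6, 0.7, 0.001, 0.3),
  stringsAsFactors = FALSE
)
reference <- c("taxon1", "taxon2")
signal <- "taxon3"

select_taxa(details, reference, signal, select = "da")
```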
A dataframe with detailed analysis results
data(ecc_saliva)
saliva_results <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control", adj.var="Site")
summary(saliva_results, select="da")