Title: Benchmark of differential abundance methods on microbiome data
Description: Starting from a microbiome dataset (16S or WMS with absolute count values), several analyses can be performed to assess the performance of many differential abundance detection methods. A basic, standardized version of the main differential abundance analysis methods is supplied, but users can also add their own methods to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within- and between-method concordances, iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.
Authors: Matteo Calgaro [aut, cre], Chiara Romualdi [aut], Davide Risso [aut], Nicola Vitulo [aut]
Maintainer: Matteo Calgaro <[email protected]>
License: Artistic-2.0
Version: 1.13.1
Built: 2024-11-21 03:09:18 UTC
Source: https://github.com/bioc/benchdamic
Add a priori knowledge for each feature tested by a method.
addKnowledge(method, priorKnowledge, enrichmentCol, namesCol = NULL)
method: output of a differential abundance detection method, in which the DA information has been extracted by the getDA function.
priorKnowledge: a data.frame (with feature names as row names) containing feature-level metadata to use as a priori knowledge (e.g. microbial metabolism).
enrichmentCol: name of the column containing information for enrichment analysis.
namesCol: name of the column containing new names for features (default namesCol = NULL).

A data.frame with a new column containing information for enrichment analysis.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = "TMM" ) DA <- getDA(method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL) # Add a priori information DA_info <- addKnowledge(method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames")
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = "TMM" ) DA <- getDA(method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL) # Add a priori information DA_info <- addKnowledge(method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames")
Compute the area between the bisector and the concordance curve.
areaCAT(concordance, plotIt = FALSE)
concordance: a long format data.frame, output of the createConcordance function.
plotIt: plot the concordance (default plotIt = FALSE).
A long format data.frame object with several columns:

comparison: which indicates the comparison number;
n_features: which indicates the total number of taxa in the comparison dataset;
method1: which contains the first method's name;
method2: which contains the second method's name;
rank: the length of the top lists being compared;
concordance: which is defined as the cardinality of the intersection of the top rank elements of each list, divided by rank, i.e. length(intersect(L[1:rank], M[1:rank])) / rank, where L and M represent the lists of the extracted statistics of method1 and method2 respectively;
heightOver: which is the distance between the bisector and the concordance value;
areaOver: which is the cumulative sum of the heightOver values.
createConcordance and plotConcordance.
data(ps_plaque_16S)
# Balanced design for dependent samples
my_splits <- createSplits(
    object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = TRUE,
    paired = "RSID", N = 10 # N = 100 suggested
)
# Make sure the subject ID variable is a factor
phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor(
    phyloseq::sample_data(ps_plaque_16S)[["RSID"]])
# Initialize some limma based methods
my_limma <- set_limma(design = ~ RSID + HMP_BODY_SUBSITE,
    coef = "HMP_BODY_SUBSITESupragingival Plaque",
    norm = c("TMM", "CSS"))
# Set the normalization methods according to the DA methods
my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"),
    method = c("TMM", "CSS"))
# Run methods on split datasets
results <- runSplits(split_list = my_splits, method_list = my_limma,
    normalization_list = my_norm, object = ps_plaque_16S)
# Concordance for p-values
concordance_pvalues <- createConcordance(
    object = results, slot = "pValMat", colName = "rawP", type = "pvalue"
)
# Add area over the concordance curve
concordance_area <- areaCAT(concordance = concordance_pvalues)
For the i top-ranked members of each list, concordance is defined as length(intersect(vec1[1:i], vec2[1:i]))/i.
CAT(vec1, vec2, maxrank = min(length(vec1), length(vec2)))
vec1, vec2: two numeric vectors for computing concordance. If these are numeric vectors with names, the numeric values will be used for sorting and the names will be used for calculating concordance. Otherwise, they are assumed to be already-ranked vectors, and the values themselves will be used for calculating concordance.
maxrank: optionally specify the maximum size of top-ranked items that you want to plot.
a data.frame with two columns: rank, containing the length of the top lists, and concordance, which is the fraction in common that the two provided lists have in the top rank items.
vec1 <- c("A" = 10, "B" = 5, "C" = 20, "D" = 15) vec2 <- c("A" = 1, "B" = 2, "C" = 3, "D" = 4) CAT(vec1, vec2)
vec1 <- c("A" = 10, "B" = 5, "C" = 20, "D" = 15) vec2 <- c("A" = 1, "B" = 2, "C" = 3, "D" = 4) CAT(vec1, vec2)
Check if the normalization function's name and the method's name to compute normalization/scaling factors are correctly matched.
checkNormalization(fun, method, ...)
fun: a character with the name of the normalization function (e.g. "norm_edgeR", "norm_DESeq2", "norm_CSS"...).
method: a character with the normalization method (e.g. "TMM", "upperquartile"... if the "norm_edgeR" function is chosen).
...: other arguments if needed (e.g. for the norm_edgeR normalization).
a list object containing the normalization method and its parameters.
setNormalizations, norm_edgeR, norm_DESeq2, norm_CSS, norm_TSS.
# Check if TMM normalization belongs to "norm_edgeR"
check_TMM_normalization <- checkNormalization(fun = "norm_edgeR",
    method = "TMM")
Produce a qualitative set of colors.
createColors(variable)
variable: character vector or factor variable.
A named vector containing the color codes.
# Given qualitative variable
cond <- factor(c("A", "A", "B", "B", "C", "D"),
    levels = c("A", "B", "C", "D"))
# Associate a color to each level (or unique value, if not a factor)
cond_colors <- createColors(cond)
Compute the between and within method concordances comparing the lists of extracted statistics from the outputs of the differential abundance detection methods.
createConcordance(object, slot = "pValMat", colName = "rawP", type = "pvalue")
object: output of the differential abundance detection methods (e.g. produced by runSplits; see the vignette for the expected structure).
slot: a character vector with 1 or number-of-methods-times repeats of the slot names where to extract values for each method (default slot = "pValMat").
colName: a character vector with 1 or number-of-methods-times repeats of the column name of the slot where to extract values for each method (default colName = "rawP").
type: a character vector with 1 or number-of-methods-times repeats of the value type of the selected column where to extract values for each method. Two values are possible: "pvalue" or "logfc".
A long format data.frame object with several columns:

comparison: which indicates the comparison number;
n_features: which indicates the total number of taxa in the comparison dataset;
method1: which contains the first method's name;
method2: which contains the second method's name;
rank: the length of the top lists being compared;
concordance: which is defined as the cardinality of the intersection of the top rank elements of each list, divided by rank, i.e. length(intersect(L[1:rank], M[1:rank])) / rank, where L and M represent the lists of the extracted statistics of method1 and method2 respectively (averaged values between subset1 and subset2).
extractStatistics and areaCAT.
data(ps_plaque_16S)
# Balanced design
my_splits <- createSplits(
    object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = TRUE,
    paired = "RSID", N = 10 # N = 100 suggested
)
# Make sure the subject ID variable is a factor
phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor(
    phyloseq::sample_data(ps_plaque_16S)[["RSID"]])
# Initialize some limma based methods
my_limma <- set_limma(design = ~ RSID + HMP_BODY_SUBSITE,
    coef = "HMP_BODY_SUBSITESupragingival Plaque",
    norm = c("TMM", "CSS"))
# Set the normalization methods according to the DA methods
my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"),
    method = c("TMM", "CSS"))
# Run methods on split datasets
results <- runSplits(split_list = my_splits, method_list = my_limma,
    normalization_list = my_norm, object = ps_plaque_16S)
# Concordance for p-values
concordance_pvalues <- createConcordance(
    object = results, slot = "pValMat", colName = "rawP", type = "pvalue"
)
# Concordance for log fold changes
concordance_logfc <- createConcordance(
    object = results, slot = "statInfo", colName = "logFC", type = "logfc"
)
# Concordance for log fold changes in the first method and p-values in the
# other
concordance_logfc_pvalues <- createConcordance(
    object = results, slot = c("statInfo", "pValMat"),
    colName = c("logFC", "rawP"), type = c("logfc", "pvalue")
)
Create a data.frame object with the information needed to perform enrichment analysis.
createEnrichment( object, priorKnowledge, enrichmentCol, namesCol = NULL, slot = "pValMat", colName = "adjP", type = "pvalue", direction = NULL, threshold_pvalue = 1, threshold_logfc = 0, top = NULL, alternative = "greater", verbose = FALSE )
object: output of the differential abundance detection methods (e.g. produced by runDA; see the vignette for the expected structure).
priorKnowledge: a data.frame (with feature names as row names) containing feature-level metadata to use as a priori knowledge (e.g. microbial metabolism).
enrichmentCol: name of the column containing information for enrichment analysis.
namesCol: name of the column containing new names for features (default namesCol = NULL).
slot: a character vector with 1 or number-of-methods-times repeats of the slot names where to extract values for each method (default slot = "pValMat").
colName: a character vector with 1 or number-of-methods-times repeats of the column name of the slot where to extract values for each method (default colName = "adjP").
type: a character vector with 1 or number-of-methods-times repeats of the value type of the selected column where to extract values for each method. Two values are possible: "pvalue" or "logfc".
direction: a character vector with 1 or number-of-methods-times repeats of the statInfo column name containing the direction of differential abundance (e.g. "logFC") for each method (default direction = NULL).
threshold_pvalue: a single value or a numeric vector of thresholds for p-values. If present, features with p-values lower than threshold_pvalue are considered differentially abundant.
threshold_logfc: a single value or a numeric vector of thresholds for log fold changes. If present, features with absolute log fold changes higher than threshold_logfc are considered differentially abundant.
top: if not NULL, the top number of features, ordered by p-values or absolute log fold changes, to consider as differentially abundant (default top = NULL).
alternative: indicates the alternative hypothesis of the Fisher exact test and must be one of "greater" (default), "less", or "two.sided".
verbose: boolean to display the kind of extracted values (default verbose = FALSE).
a list of objects, one for each method. Each element contains:

data: a data.frame object with DA directions, statistics, and feature names;
tables: a list of 2x2 contingency tables;
tests: the list of Fisher exact tests' p-values for each contingency table;
summaries: a list with the first element of each contingency table and its p-value (for graphical purposes).
addKnowledge, extractDA, and enrichmentTest.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE)
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE)
Given the number of samples of the dataset from which the mocks should be created, this function produces a data.frame object with as many rows as the number of mocks and as many columns as the number of samples. If an odd number of samples is given, the lower even integer will be considered in order to obtain a balanced design for the mocks.
createMocks(nsamples, N = 1000)
nsamples: an integer representing the total number of samples.
N: number of mock comparisons to generate.
a data.frame containing N rows and nsamples columns (if even). Each cell of the data frame contains the "grp1" or "grp2" characters which represent the mock groups pattern.
# Generate the pattern for 100 mock comparisons for an experiment with 30
# samples
mocks <- createMocks(nsamples = 30, N = 100)
head(mocks)
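For intuition, each row of the returned data.frame is simply a random, balanced relabelling of the samples into "grp1" and "grp2". A hand-rolled sketch of a single mock pattern (an illustration of the idea, not the package's internal code):

set.seed(42)
nsamples <- 29
# Keep the largest even number of samples to obtain a balanced design
n_even <- nsamples - nsamples %% 2
one_mock <- sample(rep(c("grp1", "grp2"), each = n_even / 2))
table(one_mock)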
Inspect the list of p-values and/or log fold changes from the output of the differential abundance detection methods and count the True Positives (TP) and the False Positives (FP).
createPositives( object, priorKnowledge, enrichmentCol, namesCol = NULL, slot = "pValMat", colName = "adjP", type = "pvalue", direction = NULL, threshold_pvalue = 1, threshold_logfc = 0, top = NULL, alternative = "greater", verbose = FALSE, TP, FP )
object: output of the differential abundance detection methods (e.g. produced by runDA; see the vignette for the expected structure).
priorKnowledge: a data.frame (with feature names as row names) containing feature-level metadata to use as a priori knowledge (e.g. microbial metabolism).
enrichmentCol: name of the column containing information for enrichment analysis.
namesCol: name of the column containing new names for features (default namesCol = NULL).
slot: a character vector with 1 or number-of-methods-times repeats of the slot names where to extract values for each method (default slot = "pValMat").
colName: a character vector with 1 or number-of-methods-times repeats of the column name of the slot where to extract values for each method (default colName = "adjP").
type: a character vector with 1 or number-of-methods-times repeats of the value type of the selected column where to extract values for each method. Two values are possible: "pvalue" or "logfc".
direction: a character vector with 1 or number-of-methods-times repeats of the statInfo column name containing the direction of differential abundance (e.g. "logFC") for each method (default direction = NULL).
threshold_pvalue: a single value or a numeric vector of thresholds for p-values. If present, features with p-values lower than threshold_pvalue are considered differentially abundant.
threshold_logfc: a single value or a numeric vector of thresholds for log fold changes. If present, features with absolute log fold changes higher than threshold_logfc are considered differentially abundant.
top: if not NULL, the top number of features, ordered by p-values or absolute log fold changes, to consider as differentially abundant (default top = NULL).
alternative: indicates the alternative hypothesis of the Fisher exact test and must be one of "greater" (default), "less", or "two.sided".
verbose: boolean to display the kind of extracted values (default verbose = FALSE).
TP: a list of length-2 vectors. The entries in each vector are the direction ("UP Abundant", "DOWN Abundant", or "non-DA") in the first position, and the level of the enrichment variable (enrichmentCol) in the second position. Features matching these pairs are counted as True Positives.
FP: a list of length-2 vectors. The entries in each vector are the direction ("UP Abundant", "DOWN Abundant", or "non-DA") in the first position, and the level of the enrichment variable (enrichmentCol) in the second position. Features matching these pairs are counted as False Positives.
a data.frame object which contains the number of TP and FP features for each method and for each threshold of the top argument.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Count TPs and FPs, from the top 1 to the top 20 features. # As direction is supplied, features are ordered by "logFC" absolute values. positives <- createPositives(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames", slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 1:20, alternative = "greater", verbose = FALSE, TP = list(c("DOWN Abundant", "Anaerobic"), c("UP Abundant", "Aerobic")), FP = list(c("DOWN Abundant", "Aerobic"), c("UP Abundant", "Anaerobic"))) # Plot the TP-FP differences for each threshold plotPositives(positives = positives)
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Count TPs and FPs, from the top 1 to the top 20 features. # As direction is supplied, features are ordered by "logFC" absolute values. positives <- createPositives(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames", slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 1:20, alternative = "greater", verbose = FALSE, TP = list(c("DOWN Abundant", "Anaerobic"), c("UP Abundant", "Aerobic")), FP = list(c("DOWN Abundant", "Aerobic"), c("UP Abundant", "Anaerobic"))) # Plot the TP-FP differences for each threshold plotPositives(positives = positives)
Given a phyloseq or TreeSummarizedExperiment object from which the random splits should be created, this function produces a list of 2 data.frame objects, Subset1 and Subset2, with as many rows as the number of splits and as many columns as half of the number of samples.
createSplits( object, assay_name = "counts", varName = NULL, paired = NULL, balanced = TRUE, N = 1000 )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
varName: name of a factor variable of interest.
paired: name of the unique subject identifier variable. If specified, paired samples will remain in the same split (default = NULL).
balanced: if TRUE (default), a balanced design is kept when generating the splits.
N: number of splits to generate.
A list of 2 data.frame objects, Subset1 and Subset2, containing N rows and half of the total number of samples as columns. Each cell contains a unique sample identifier.
data(ps_plaque_16S)
set.seed(123)
# Balanced design for repeated measures
# Balanced design for independent samples
splits_df <- createSplits(
    object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = TRUE,
    N = 100
)
# Unbalanced design
splits_df <- createSplits(
    object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = FALSE,
    N = 100
)
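Each row of Subset1 and Subset2 holds the sample identifiers of one split, so a single split can be materialized by hand, e.g. with phyloseq::prune_samples (a minimal sketch, assuming the identifiers match sample_names(ps_plaque_16S)):

# Keep only the samples of the first random split (first row of Subset1)
first_split <- as.character(unlist(splits_df$Subset1[1, ]))
ps_subset1 <- phyloseq::prune_samples(first_split, ps_plaque_16S)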
Extract the list of p-values from the outputs of the differential abundance detection methods to compute several statistics to study the ability to control the type I error and the p-values distribution.
createTIEC(object)
object: output of the differential abundance tests on mock comparisons. Must follow a specific structure with comparison, method, matrix of p-values, and method's name (see the vignette for detailed information).
A list of data.frames:

df_pval: 5 columns per number_of_features x methods x comparisons rows data.frame. The five columns are called Comparison, Method, variable (containing the feature names), pval, and padj;
df_FPR: 5 columns per methods x comparisons rows data.frame. For each set of method and comparison, the proportions of false positives, considering 3 thresholds (0.01, 0.05, 0.1), are reported;
df_FDR: 4 columns per methods rows data.frame. For each method, the average proportion of mock comparisons in which false positives are found, considering 3 thresholds (0.01, 0.05, 0.1), is reported. Each value is an estimate of the nominal False Discovery Rate (FDR);
df_QQ: contains the coordinates to draw the QQ-plot comparing the mean observed p-value distribution across comparisons with the theoretical uniform distribution;
df_KS: 5 columns and methods x comparisons rows data.frame. For each set of method and comparison, the Kolmogorov-Smirnov test statistics and p-values are reported in the KS and KS_pval columns respectively.
# Load some data
data(ps_stool_16S)
# Generate the patterns for 10 mock comparison for an experiment
# (N = 1000 is suggested)
mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10)
head(mocks)
# Add some normalization/scaling factors to the phyloseq object
my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"),
    method = c("TMM", "CSS"))
ps_stool_16S <- runNormalizations(normalization_list = my_norm,
    object = ps_stool_16S)
# Initialize some limma based methods
my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS"))
# Run methods on mock datasets
results <- runMocks(mocks = mocks, method_list = my_limma,
    object = ps_stool_16S)
# Prepare results for Type I Error Control
TIEC_summary <- createTIEC(results)
# Plot the results
plotFPR(df_FPR = TIEC_summary$df_FPR)
plotFDR(df_FDR = TIEC_summary$df_FDR)
plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1))
plotKS(df_KS = TIEC_summary$df_KS)
plotLogP(df_QQ = TIEC_summary$df_QQ)
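As a back-of-the-envelope check, the proportions reported in df_FPR are, in essence, the fractions of p-values falling below each threshold: in a mock comparison the groups are random, so every discovery is a false positive. For a single method and comparison this could be computed as (hypothetical p-values):

# Hypothetical raw p-values for one method in one mock comparison
set.seed(1)
pvals <- runif(100)
sapply(c(FPR_obs0.01 = 0.01, FPR_obs0.05 = 0.05, FPR_obs0.1 = 0.1),
    function(thr) mean(pvals < thr, na.rm = TRUE))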
Fast run for the ALDEx2 differential abundance detection method. Support for Welch's t, Wilcoxon, Kruskal-Wallis, Kruskal-Wallis glm ANOVA-like, and glm tests.
DA_ALDEx2( object, assay_name = "counts", pseudo_count = FALSE, design = NULL, mc.samples = 128, test = c("t", "wilcox", "kw", "kw_glm", "glm"), paired.test = FALSE, denom = "all", contrast = NULL, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
design: a character with the name of a variable to group samples and compare them, or a formula to compute a model.matrix (when test = "glm").
mc.samples: an integer. The number of Monte Carlo samples to use when estimating the underlying distributions. Since we are estimating central tendencies, 128 is usually sufficient.
test: a character string. Indicates which tests to perform. "t" runs Welch's t test while "wilcox" runs the Wilcoxon test. "kw" runs the Kruskal-Wallis test while "kw_glm" runs the glm ANOVA-like test. "glm" runs a generalized linear model.
paired.test: a boolean. Toggles whether to do paired-sample tests. Applies to the "t" and "wilcox" tests.
denom: a character string ("all" (default), "iqlr", "zero", or "median") or a numeric vector of row indices supplied by the user, indicating the features to use as the denominator for the CLR calculation.
contrast: character vector with exactly three elements: the name of a variable used in "design", the name of the level of interest, and the name of the reference level. If "kw" or "kw_glm" is used as test, the contrast vector is not used.
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
aldex for the Dirichlet-Multinomial model estimation. Several other, more complex tests are available in the ALDEx2 framework.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 300, size = 3, prob = 0.5),
    nrow = 50, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Differential abundance with t test and denom defined by the user
DA_t <- DA_ALDEx2(ps, design = "group", test = "t", denom = c(1, 2, 3),
    paired.test = FALSE, contrast = c("group", "B", "A"))
# Differential abundance with wilcox test and denom = "iqlr"
DA_w <- DA_ALDEx2(ps, design = "group", test = "wilcox", denom = "iqlr",
    paired.test = FALSE, contrast = c("group", "B", "A"))
# Differential abundance with kw test and denom = "zero"
# mc.samples = 2 to speed up (128 suggested)
DA_kw <- DA_ALDEx2(ps, design = "group", test = "kw", denom = "zero",
    mc.samples = 2)
# Differential abundance with kw_glm test and denom = "median"
DA_kw_glm <- DA_ALDEx2(ps, design = "group", test = "kw_glm",
    denom = "median", mc.samples = 2)
# Differential abundance with glm test and denom = "all"
DA_glm <- DA_ALDEx2(ps, design = ~ group, test = "glm", denom = "all",
    mc.samples = 2, contrast = c("group", "B", "A"))
Fast run for ANCOM and ANCOM-BC2 differential abundance detection methods.
DA_ANCOM( object, assay_name = "counts", pseudo_count = FALSE, fix_formula = NULL, adj_formula = NULL, rand_formula = NULL, lme_control = lme4::lmerControl(), contrast = NULL, alpha = 0.05, p_adj_method = "BH", struc_zero = FALSE, BC = TRUE, n_cl = 1, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
fix_formula: the fixed-effects formula, given as a character string (e.g. "group").
adj_formula: the formula for covariate adjustment, given as a character string (default NULL).
rand_formula: optionally, the formula for random effects, given as a character string in lme4 style (default NULL).
lme_control: a list of control parameters for mixed model fitting. See ?lme4::lmerControl for details (default lme4::lmerControl()).
contrast: character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level.
alpha: numeric. Level of significance. Default is 0.05.
p_adj_method: character. Method to adjust p-values. Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"; see stats::p.adjust for details (default p_adj_method = "BH").
struc_zero: logical. Whether to detect structural zeros based on the grouping variable (default struc_zero = FALSE).
BC: boolean for the ANCOM method to use. If TRUE the bias correction (ANCOM-BC2) is computed (default BC = TRUE).
n_cl: numeric. The number of nodes to be forked. For details, see ?parallel::makeCluster (default n_cl = 1).
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function. ANCOM (BC = FALSE) does not produce p-values but W statistics. Hence, the 'pValMat' matrix is filled with 1 - W / (nfeatures - 1) values, which are not p-values. To find DA features, a threshold on this statistic can be used (liberal < 0.4, < 0.3, < 0.2, < 0.1 conservative).
ancombc for analysis of microbiome compositions with bias correction, or ancom without it.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Differential abundance
DA_ANCOM(object = ps, pseudo_count = FALSE, fix_formula = "group",
    contrast = c("group", "B", "A"), verbose = FALSE)
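Since ANCOM (BC = FALSE) stores 1 - W / (nfeatures - 1) in 'pValMat' rather than true p-values, candidate DA features can be selected by thresholding that statistic directly. A hedged sketch, using a stand-in matrix in place of a real DA_ANCOM(..., BC = FALSE) output (the column name "rawP" is an assumption):

# Stand-in for the 'pValMat' slot filled with 1 - W / (nfeatures - 1) values
set.seed(1)
pValMat <- matrix(runif(10), ncol = 1,
    dimnames = list(paste0("OTU_", 1:10), "rawP"))
# Conservative selection: smaller values correspond to larger W statistics
candidate_DA <- rownames(pValMat)[pValMat[, "rawP"] < 0.2]
candidate_DA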
Fast run for basic differential abundance detection methods such as wilcox and t tests.
DA_basic( object, assay_name = "counts", pseudo_count = FALSE, contrast = NULL, test = c("t", "wilcox"), paired = FALSE, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
contrast: character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level.
test: name of the test to perform. Choose between "t" or "wilcox".
paired: boolean. Choose whether the test is paired or not (default paired = FALSE).
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
DA_Seurat for a similar implementation of basic tests.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Differential abundance
DA_basic(object = ps, pseudo_count = FALSE, contrast = c("group", "B", "A"),
    test = "t", verbose = FALSE)
Fast run for corncob differential abundance detection method.
DA_corncob( object, assay_name = "counts", pseudo_count = FALSE, formula, phi.formula, formula_null, phi.formula_null, test, boot = FALSE, coefficient = NULL, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
formula: an object of class formula without the response: a symbolic description of the model to be fitted to the abundance.
phi.formula: an object of class formula without the response: a symbolic description of the model to be fitted to the dispersion.
formula_null: formula for the mean under the null, without response.
phi.formula_null: formula for the overdispersion under the null, without response.
test: character. Hypothesis testing procedure to use. One of "Wald" or "LRT" (likelihood ratio test).
boot: boolean. Defaults to FALSE. Indicates whether to use the parametric bootstrap algorithm for the testing procedure.
coefficient: the coefficient of interest as a single word formed by the variable name and the non-reference level (e.g. 'ConditionDisease' if the reference level for the variable 'Condition' is 'control').
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
bbdml and differentialTest for differential abundance and differential variance evaluation.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Differential abundance
DA_corncob(object = ps, formula = ~ group, phi.formula = ~ group,
    formula_null = ~ 1, phi.formula_null = ~ group,
    coefficient = "groupB", test = "Wald")
Fast run for dearseq differential abundance detection method.
DA_dearseq( object, assay_name = "counts", pseudo_count = FALSE, covariates = NULL, variables2test = NULL, sample_group = NULL, test = c("permutation", "asymptotic"), preprocessed = FALSE, n_perm = 1000, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
covariates: a character vector containing the colnames of the covariates to include in the model.
variables2test: a character vector containing the colnames of the variable of interest.
sample_group: a vector of length equal to the number of samples indicating whether samples should be grouped (e.g. paired samples). Default is NULL, in which case no grouping is performed.
test: a character string indicating which method to use to approximate the variance component score test, either 'permutation' or 'asymptotic' (default test = "permutation").
preprocessed: a logical flag indicating whether the expression data have already been preprocessed (e.g. log2 transformed). Default is FALSE.
n_perm: the number of permutations. Default is n_perm = 1000.
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo' which are still the p-values as this method does not produce other values, and a suggested 'name' of the final object considering the parameters passed to the function.
dear_seq for analysis of differential expression/abundance through a variance component test.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Differential abundance
DA_dearseq(object = ps, pseudo_count = FALSE, covariates = NULL,
    variables2test = "group", sample_group = NULL, test = "asymptotic",
    preprocessed = FALSE, verbose = TRUE)
Fast run for DESeq2 differential abundance detection method.
DA_DESeq2( object, assay_name = "counts", pseudo_count = FALSE, design = NULL, contrast = NULL, alpha = 0.05, norm = c("ratio", "poscounts", "iterate"), weights, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
design: character or formula to specify the model matrix.
contrast: character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
alpha: the significance cutoff used for optimizing the independent filtering (by default 0.05). If the adjusted p-value cutoff (FDR) will be a value other than 0.05, alpha should be set to that value.
norm: name of the normalization method to use in the differential abundance analysis. Choose between the native DESeq2 normalization methods, such as "ratio", "poscounts", or "iterate".
weights: an optional numeric matrix giving observational weights.
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', the dispersion estimates 'dispEsts', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
phyloseq_to_deseq2 for phyloseq to DESeq2 object conversion, DESeq and results for the differential abundance method.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the poscounts size factors
ps_NF <- norm_DESeq2(object = ps, method = "poscounts")
# The phyloseq object now contains the size factors:
sizeFacts <- phyloseq::sample_data(ps_NF)[, "NF.poscounts"]
head(sizeFacts)
# Differential abundance
DA_DESeq2(object = ps_NF, pseudo_count = FALSE, design = ~ group,
    contrast = c("group", "B", "A"), norm = "poscounts")
Fast run for edgeR differential abundance detection method.
DA_edgeR( object, assay_name = "counts", pseudo_count = FALSE, group_name = NULL, design = NULL, robust = FALSE, coef = 2, norm = c("TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", "none"), weights, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
group_name: character giving the name of the column containing information about experimental group/condition for each sample/library.
design: character or formula to specify the model matrix.
robust: logical, should the dispersion estimation be robustified against outliers (see estimateGLMRobustDisp)?
coef: integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero.
norm: name of the normalization method to use in the differential abundance analysis. Choose between the native edgeR normalization methods, such as "TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", or "none".
weights: an optional numeric matrix giving observational weights.
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values pValMat, the dispersion estimates dispEsts, the matrix of summary statistics for each tag statInfo, and a suggested name of the final object considering the parameters passed to the function.
DGEList for the edgeR DEG object creation, estimateDisp and estimateGLMRobustDisp for dispersion estimation, and glmQLFit and glmQLFTest for the quasi-likelihood negative binomial model fit.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the TMM normalization factors
ps_NF <- norm_edgeR(object = ps, method = "TMM")
# The phyloseq object now contains the normalization factors:
normFacts <- phyloseq::sample_data(ps_NF)[, "NF.TMM"]
head(normFacts)
# Differential abundance
DA_edgeR(object = ps_NF, pseudo_count = FALSE, group_name = "group",
    design = ~ group, coef = 2, robust = FALSE, norm = "TMM")
Fast run for limma voom differential abundance detection method.
DA_limma( object, assay_name = "counts", pseudo_count = FALSE, design = NULL, coef = 2, norm = c("TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", "none"), weights, verbose = TRUE )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
pseudo_count: add 1 to all counts if TRUE (default pseudo_count = FALSE).
design: character name of the metadata columns, formula, or design matrix with rows corresponding to samples and columns to coefficients to be estimated.
coef: integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero.
norm: name of the normalization method to use in the differential abundance analysis. Choose between the native edgeR normalization methods, such as "TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", or "none".
weights: an optional numeric matrix giving observational weights.
verbose: an optional logical value. If TRUE (default), informative messages are printed during the analysis.
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
voom for the mean-variance relationship estimation, lmFit for the linear model framework.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5),
    nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the TMM normalization factors
ps_NF <- norm_edgeR(object = ps, method = "TMM")
# The phyloseq object now contains the normalization factors:
normFacts <- phyloseq::sample_data(ps_NF)[, "NF.TMM"]
head(normFacts)
# Differential abundance
DA_limma(object = ps_NF, pseudo_count = FALSE, design = ~ group, coef = 2,
    norm = "TMM")
Fast run for linda differential abundance detection method.
DA_linda( object, assay_name = "counts", formula = NULL, contrast = NULL, is.winsor = TRUE, outlier.pct = 0.03, zero.handling = c("pseudo-count", "imputation"), pseudo.cnt = 0.5, alpha = 0.05, p.adj.method = "BH", verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
formula |
Character. For example: formula = '~x1*x2+x3+(1|id)'. At least one fixed effect is required. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
is.winsor |
Boolean. If TRUE (default), the Winsorization process will be conducted for the OTU table. |
outlier.pct |
A real value between 0 and 1; Winsorization cutoff (percentile) for the OTU table, e.g., 0.03 (the default in this wrapper). If NULL, the Winsorization process will not be conducted. |
zero.handling |
Character. Specifies the method to handle zeros in the OTU table. Options are "pseudo-count" or "imputation" (default is "pseudo-count"). If "imputation", zeros in the OTU table will be imputed using the formula in the referenced paper. If "pseudo-count", a small constant (pseudo.cnt) will be added to each value in the OTU table. |
pseudo.cnt |
A positive real value. Default is 0.5. If zero.handling is set to "pseudo-count", this constant will be added to each value in the OTU table. |
alpha |
A real value between 0 and 1; significance level of differential abundance. Default is 0.05. |
p.adj.method |
Character; p-value adjusting approach. See R function p.adjust. Default is 'BH'. |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_linda(object = ps, formula = "~ group", contrast = c("group", "B", "A"), is.winsor = TRUE, zero.handling = "pseudo-count", verbose = FALSE)
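As a hedged variant of the example above (not from the original documentation), zeros can alternatively be handled by the imputation strategy instead of a pseudo-count; all other arguments are unchanged.
# Same toy data, with zero.handling = "imputation" instead of "pseudo-count".
DA_linda(object = ps, formula = "~ group", contrast = c("group", "B", "A"),
    is.winsor = TRUE, zero.handling = "imputation", verbose = FALSE)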
Fast run for Maaslin2 differential abundance detection method.
DA_Maaslin2( object, assay_name = "counts", normalization = c("TSS", "CLR", "CSS", "NONE", "TMM"), transform = c("LOG", "LOGIT", "AST", "NONE"), analysis_method = c("LM", "CPLM", "ZICP", "NEGBIN", "ZINB"), correction = "BH", random_effects = NULL, fixed_effects = NULL, contrast = NULL, reference = NULL, verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
normalization |
The normalization method to apply. |
transform |
The transform to apply. |
analysis_method |
The analysis method to apply. |
correction |
The correction method for computing the q-value. |
random_effects |
The random effects for the model, comma-delimited for multiple effects. |
fixed_effects |
The fixed effects for the model, comma-delimited for multiple effects. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
reference |
The factor to use as a reference for a variable with more than two levels provided as a string of 'variable,reference' semi-colon delimited for multiple variables. |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_Maaslin2(object = ps, normalization = "CLR", transform = "NONE", analysis_method = "LM", correction = "BH", random_effects = NULL, fixed_effects = "group", contrast = c("group", "B", "A"), verbose = FALSE)
Fast run for MAST differential abundance detection method.
DA_MAST( object, assay_name = "counts", pseudo_count = FALSE, rescale = c("median", "default"), design, coefficient = NULL, verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
rescale |
Rescale count data, per million if 'default', or per median library size if 'median' ('median' is suggested for metagenomics data). |
design |
The model for the count distribution. Can be the variable name, or a character similar to "~ 1 + group", or a formula, or a 'model.matrix' object. |
coefficient |
The coefficient of interest as a single word formed by the variable name and the non-reference level (e.g., 'ConditionDisease' if the reference level for the variable 'Condition' is 'control'). |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
zlm
for the Truncated Gaussian Hurdle model
estimation.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_MAST(object = ps, pseudo_count = FALSE, rescale = "median", design = ~ group, coefficient = "groupB")
Fast run for the metagenomeSeq's differential abundance detection method.
DA_metagenomeSeq( object, assay_name = "counts", pseudo_count = FALSE, design = NULL, coef = 2, norm = "CSS", model = "fitFeatureModel", verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
design |
the model for the count distribution. Can be the variable name, or a character similar to "~ 1 + group", or a formula. |
coef |
coefficient of interest to grab log fold-changes. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose the native metagenomeSeq normalization method
|
model |
character equal to "fitFeatureModel" for differential abundance analysis using a zero-inflated log-normal model, or "fitZig" for a complex mathematical optimization routine that estimates the probability that a zero for a particular feature in a sample is a technical zero or not. The latter model relies heavily on the limma package (default
|
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
fitZig
for the Zero-Inflated Gaussian
regression model estimation and MRfulltable
for results extraction.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Calculate the CSS normalization factors ps_NF <- norm_CSS(object = ps, method = "CSS") # The phyloseq object now contains the normalization factors: normFacts <- phyloseq::sample_data(ps_NF)[, "NF.CSS"] head(normFacts) # Differential abundance DA_metagenomeSeq(object = ps_NF, pseudo_count = FALSE, design = ~ group, coef = 2, norm = "CSS")
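As a hedged variant of the example above (not from the original documentation), the zero-inflated Gaussian model described in the model argument can be requested with model = "fitZig"; design, coefficient, and normalization are unchanged. On such a tiny toy dataset the estimation may warn or behave poorly.
# Same toy data, analysed with "fitZig" instead of the default "fitFeatureModel".
DA_metagenomeSeq(object = ps_NF, pseudo_count = FALSE, design = ~ group,
    coef = 2, norm = "CSS", model = "fitZig", verbose = FALSE)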
Fast run for the mixMC sPLS-DA method for biomarker identification. It performs a CLR transformation on the 'counts + pseudo_counts' values. Then the sPLS-DA is tuned through a leave-one-out cross-validation procedure.
DA_mixMC( object, pseudo_count = 1, assay_name = "counts", contrast = NULL, ID_variable = NULL, verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
pseudo_count |
a positive numeric value for the pseudo-count to be added. Default is 1. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
ID_variable |
a character string indicating the name of the variable corresponding to the repeated measures units (e.g., the subject ID). |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat',
a matrix of summary statistics for each tag 'statInfo', and a suggested
'name' of the final object considering the parameters passed to the
function. mixMC does not produce p-values. The frequency and the importance
values are produced instead. The frequency indicates the stability of
the features across the folds of the cross validation. The importance
indicates the magnitude of the discrimination for the features and their
direction. Hence, the 'pValMat' matrix is filled with 1 - frequency values, which are not p-values. To find discriminant features, a threshold on this statistic can be used (from liberal, < 1 or < 0.5, to conservative, < 0.1).
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_mixMC(object = ps, pseudo_count = 1, contrast = c("group", "B", "A"), verbose = FALSE)
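Because 'pValMat' stores 1 - frequency rather than p-values, a simple use is to threshold it directly. A minimal, hedged sketch continuing the example above; the "rawP" column name is an assumption based on the usual 'pValMat' layout.
# Keep features selected in at least half of the cross-validation folds,
# i.e. with 1 - frequency below 0.5 (a moderately liberal cutoff).
da.mixMC <- DA_mixMC(object = ps, pseudo_count = 1,
    contrast = c("group", "B", "A"), verbose = FALSE)
rownames(da.mixMC$pValMat)[da.mixMC$pValMat[, "rawP"] < 0.5]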
Fast run for NOISeqBIO differential abundance detection method. It computes differential expression between two experimental conditions.
DA_NOISeq( object, assay_name = "counts", pseudo_count = FALSE, contrast = NULL, norm = c("rpkm", "uqua", "tmm", "n"), verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose between the native edgeR normalization methods,
such as |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat',
a matrix of summary statistics for each tag 'statInfo', and a suggested
'name' of the final object considering the parameters passed to the
function. NOISeq does not produce p-values but an estimated probability of
differential expression for each feature. Note that these probabilities are
not equivalent to p-values. The higher the probability, the more likely that
the difference in abundance is due to the change in the experimental
condition and not to chance. Hence, the 'pValMat' matrix is filled with 1 - prob values, which can be interpreted as FDR values; the FDR, in turn, can be considered an adjusted p-value (see the NOISeq vignette).
noiseqbio
for analysis of differential
expression/abundance between two experimental conditions from read count
data.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_NOISeq(object = ps, pseudo_count = FALSE, contrast = c("group", "B", "A"), norm = "tmm", verbose = FALSE)
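Since the reported values behave like an FDR rather than raw p-values, a hedged sketch of how they might be thresholded, continuing the example above; the "rawP" column name is an assumption based on the usual 'pValMat' layout.
# Features with 1 - prob below 0.05 correspond, roughly, to an FDR < 0.05
# selection in NOISeqBIO terms.
da.noiseq <- DA_NOISeq(object = ps, pseudo_count = FALSE,
    contrast = c("group", "B", "A"), norm = "tmm", verbose = FALSE)
sum(da.noiseq$pValMat[, "rawP"] < 0.05, na.rm = TRUE)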
Fast run for Seurat differential abundance detection method.
DA_Seurat( object, assay_name = "counts", pseudo_count = FALSE, norm = "LogNormalize", scale.factor = 10000, test = "wilcox", contrast = NULL, verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
norm |
Method for normalization.
|
scale.factor |
Sets the scale factor for cell-level normalization |
test |
Denotes which test to use. Available options are:
|
contrast |
character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change. |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', the matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
CreateSeuratObject
to create the Seurat
object, AddMetaData
to add metadata information,
NormalizeData
to compute the normalization for the
counts, FindVariableFeatures
to estimate the
mean-variance trend, ScaleData
to scale and center
features in the dataset, and FindMarkers
to perform
differential abundance analysis.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_Seurat(object = ps, contrast = c("group","B","A")) # Perform a simple Wilcoxon test using Seurat on raw data DA_Seurat(object = ps, contrast = c("group","B","A"), norm = "none", test = "wilcox")
Fast run for ZicoSeq differential abundance detection method.
DA_ZicoSeq( object, assay_name = "counts", contrast = NULL, strata = NULL, adj.name = NULL, feature.dat.type = c("count", "proportion", "other"), is.winsor = TRUE, outlier.pct = 0.03, winsor.end = c("top", "bottom", "both"), is.post.sample = TRUE, post.sample.no = 25, perm.no = 99, link.func = list(function(x) sign(x) * (abs(x))^0.5), ref.pct = 0.5, stage.no = 6, excl.pct = 0.2, verbose = TRUE )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
strata |
a factor such as subject IDs indicating the permutation strata or characters indicating the strata variable in |
adj.name |
the name(s) for the variable(s) to be adjusted. Multiple variables are allowed.
They could be numeric or categorical; should be in |
feature.dat.type |
the type of the feature data. It could be "count", "proportion" or "other". For the "proportion" data type, posterior sampling will not be performed, but the reference-based ratio approach will still be used to address compositional effects. For the "other" data type, neither posterior sampling nor the reference-based ratio approach will be used. |
is.winsor |
a logical value indicating whether winsorization should be performed to replace outliers. The default is TRUE. |
outlier.pct |
the expected percentage of outliers. These outliers will be winsorized. The default is 0.03. For count/proportion data,
|
winsor.end |
a character indicating whether the outliers at the "top", "bottom" or "both" will be winsorized.
The default is "top". If the |
is.post.sample |
a logical value indicating whether to perform posterior sampling of the underlying proportions. Only relevant when the feature data are counts. |
post.sample.no |
the number of posterior samples if posterior sampling is used. The default is 25. |
perm.no |
the number of permutations. If the raw p-values are of major interest, set |
link.func |
a list of transformation functions for the feature data or the ratios. Based on our experience, square-root transformation is a robust choice for many datasets. |
ref.pct |
percentage of reference taxa. The default is 0.5. |
stage.no |
the number of stages if multiple-stage normalization is used. The default is 6. |
excl.pct |
the maximum percentage of significant features (nominal p-value < 0.05) in the reference set that should be removed. Only relevant when multiple-stage normalization is used. |
verbose |
an optional logical value. If |
A list object containing the matrix of p-values 'pValMat', a matrix of summary statistics for each tag 'statInfo', and a suggested 'name' of the final object considering the parameters passed to the function.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Differential abundance DA_ZicoSeq(object = ps, feature.dat.type = "count", contrast = c("group", "B", "A"), is.winsor = TRUE, winsor.end = "top", is.post.sample = FALSE, verbose = FALSE)
Perform the Fisher exact test for all the possible 2x2 contingency tables, considering differential abundance direction and enrichment variable.
enrichmentTest(method, enrichmentCol, alternative = "greater")
method |
Output of differential abundance detection method in which
DA information is extracted by the |
enrichmentCol |
name of the column containing information for enrichment analysis. |
alternative |
indicates the alternative hypothesis and must be
one of |
a list of objects:
data: a data.frame object with DA directions, statistics, and feature names;
tables: a list of 2x2 contingency tables;
tests: the list of Fisher exact test p-values for each contingency table;
summaries: a list with the first element of each contingency table and its p-value (for graphical purposes).
extractDA
, addKnowledge
, and
createEnrichment
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = "TMM" ) DA <- getDA(method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL) # Add a priori information DA_info <- addKnowledge(method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames") # Create contingency tables and compute F tests DA_info_enriched <- enrichmentTest(method = DA_info, enrichmentCol = "Type", alternative = "greater")
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = "TMM" ) DA <- getDA(method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL) # Add a priori information DA_info <- addKnowledge(method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames") # Create contingency tables and compute F tests DA_info_enriched <- enrichmentTest(method = DA_info, enrichmentCol = "Type", alternative = "greater")
Inspect the list of p-values and/or log fold changes from the output of differential abundance detection methods.
extractDA( object, slot = "pValMat", colName = "adjP", type = "pvalue", direction = NULL, threshold_pvalue = 1, threshold_logfc = 0, top = NULL, verbose = FALSE )
object |
Output of differential abundance detection methods.
|
slot |
A character vector with 1 or number-of-methods-times repeats of
the slot names where to extract values for each method
(default |
colName |
A character vector with 1 or number-of-methods-times repeats
of the column name of the slot where to extract values for each method
(default |
type |
A character vector with 1 or number-of-methods-times repeats
of the value type of the column selected where to extract values for each
method. Two values are possible: |
direction |
A character vector with 1 or number-of-methods-times repeats
of the |
threshold_pvalue |
A single or a numeric vector of thresholds for
p-values. If present, features with p-values lower than
|
threshold_logfc |
A single or a numeric vector of thresholds for log
fold changes. If present, features with log fold change absolute values
higher than |
top |
If not null, the |
verbose |
Boolean to display the kind of extracted values
(default |
A data.frame with several columns for each method:
stat: contains the p-values or the absolute log fold change values;
direction: present if direction was supplied; contains the information about the directionality of differential abundance (usually log fold changes);
DA: can contain several values according to the thresholds and inputs: "DA" or "non-DA" if direction = NULL; "UP Abundant", "DOWN Abundant", or "non-DA" otherwise.
data("ps_plaque_16S") # Add scaling factors my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Perform DA analysis my_methods <- set_limma(design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = c("TMM", "CSS")) Plaque_16S_DA <- runDA(method_list = my_methods, object = ps_plaque_16S) # Top 10 features (ordered by 'direction') are DA DA_1 <- extractDA( object = Plaque_16S_DA, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 10 ) # Features with p-value < 0.05 and |logFC| > 1 are DA DA_2 <- extractDA( object = Plaque_16S_DA, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL )
data("ps_plaque_16S") # Add scaling factors my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Perform DA analysis my_methods <- set_limma(design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = c("TMM", "CSS")) Plaque_16S_DA <- runDA(method_list = my_methods, object = ps_plaque_16S) # Top 10 features (ordered by 'direction') are DA DA_1 <- extractDA( object = Plaque_16S_DA, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 10 ) # Features with p-value < 0.05 and |logFC| > 1 are DA DA_2 <- extractDA( object = Plaque_16S_DA, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL )
Extract the list of p-values and/or log fold changes from the outputs of the differential abundance detection methods.
extractStatistics( object, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL, verbose = FALSE )
object |
Output of differential abundance detection methods.
|
slot |
A character vector with 1 or number-of-methods-times repeats of
the slot names where to extract values for each method
(default |
colName |
A character vector with 1 or number-of-methods-times repeats
of the column name of the slot where to extract values for each method
(default |
type |
A character vector with 1 or number-of-methods-times repeats
of the value type of the column selected where to extract values for each
method. Two values are possible: |
direction |
A character vector with 1 or number-of-methods-times repeats
of the |
verbose |
Boolean to display the kind of extracted values
(default |
A vector or a data.frame for each method. If direction = NULL, the colName column values of the slot are reported, transformed according to type (not transformed if type = "pvalue", -abs(value) if type = "logfc"); otherwise, the direction column of the statInfo matrix is added to the output.
data("ps_plaque_16S") # Add scaling factors my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Perform DA analysis my_methods <- set_limma(design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = c("TMM", "CSS")) Plaque_16S_DA <- runDA(method_list = my_methods, object = ps_plaque_16S) ### Extract statistics for concordance analysis: # Only p-values extracted_pvalues <- extractStatistics( object = Plaque_16S_DA, slot = "pValMat", colName = "rawP", type = "pvalue" ) # Only transformed log fold changes -abs(logFC) extracted_abslfc <- extractStatistics( object = Plaque_16S_DA, slot = "statInfo", colName = "logFC", type = "logfc" ) # Only transformed log fold changes for a method and p-values for the other extracted_abslfc_pvalues <- extractStatistics( object = Plaque_16S_DA, slot = c("statInfo", "pValMat"), colName = c("logFC", "rawP"), type = c("logfc", "pvalue") ) ### Extract statistics for enrichment analysis: # p-values and log fold changes extracted_pvalues_and_lfc <- extractStatistics( object = Plaque_16S_DA, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC" ) # transformed log fold changes and untouched log fold changes extracted_abslfc_and_lfc <- extractStatistics( object = Plaque_16S_DA, slot = "statInfo", colName = "logFC", type = "logfc", direction = "logFC" ) # Only transformed log fold changes for a method and p-values for the other extracted_mix <- extractStatistics( object = Plaque_16S_DA, slot = c("statInfo", "pValMat"), colName = c("logFC", "rawP"), type = c("logfc", "pvalue"), direction = "logFC" )
data("ps_plaque_16S") # Add scaling factors my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Perform DA analysis my_methods <- set_limma(design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = c("TMM", "CSS")) Plaque_16S_DA <- runDA(method_list = my_methods, object = ps_plaque_16S) ### Extract statistics for concordance analysis: # Only p-values extracted_pvalues <- extractStatistics( object = Plaque_16S_DA, slot = "pValMat", colName = "rawP", type = "pvalue" ) # Only transformed log fold changes -abs(logFC) extracted_abslfc <- extractStatistics( object = Plaque_16S_DA, slot = "statInfo", colName = "logFC", type = "logfc" ) # Only transformed log fold changes for a method and p-values for the other extracted_abslfc_pvalues <- extractStatistics( object = Plaque_16S_DA, slot = c("statInfo", "pValMat"), colName = c("logFC", "rawP"), type = c("logfc", "pvalue") ) ### Extract statistics for enrichment analysis: # p-values and log fold changes extracted_pvalues_and_lfc <- extractStatistics( object = Plaque_16S_DA, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC" ) # transformed log fold changes and untouched log fold changes extracted_abslfc_and_lfc <- extractStatistics( object = Plaque_16S_DA, slot = "statInfo", colName = "logFC", type = "logfc", direction = "logFC" ) # Only transformed log fold changes for a method and p-values for the other extracted_mix <- extractStatistics( object = Plaque_16S_DA, slot = c("statInfo", "pValMat"), colName = c("logFC", "rawP"), type = c("logfc", "pvalue"), direction = "logFC" )
Fit a Dirichlet-Multinomial (DM) distribution for each taxon of the count data. The model estimation procedure is performed by the MGLM MGLMreg function without assuming the presence of any group in the samples (design matrix equal to a column of ones).
fitDM(object, assay_name = "counts", verbose = TRUE)
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
verbose |
an optional logical value. If |
A data frame containing the continuity corrected logarithms of the
average fitted values for each row of the matrix of counts in the Y
column, and the estimated probability to observe a zero in the Y0
column.
# Generate some random counts counts = matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) # Fit model on the counts matrix DM <- fitDM(counts) head(DM)
Fit a truncated Gaussian hurdle model for each taxon of the count data. The hurdle model estimation procedure is performed by the MAST zlm function without assuming the presence of any group in the samples (design matrix equal to a column of ones).
fitHURDLE(object, assay_name = "counts", scale = "default", verbose = TRUE)
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
scale |
Character vector, either |
verbose |
an optional logical value. If |
A data frame containing the continuity corrected logarithms of the
average fitted values for each row of the matrix of counts in the Y
column, and the estimated probability to observe a zero in the Y0
column.
# Generate some random counts counts = matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) # Fit model on the counts matrix HURDLE <- fitHURDLE(counts, scale = "median") head(HURDLE)
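A brief, hedged follow-up reusing the counts matrix above: fitting with both scale options side by side shows how the rescaling choice affects the estimated mean values (the Y column of the value description).
HURDLE_default <- fitHURDLE(counts, scale = "default")
head(cbind("median" = HURDLE[, "Y"], "default" = HURDLE_default[, "Y"]))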
A wrapper function that fits the specified models for each taxon of the count data and computes the mean difference (MD) and zero probability difference (ZPD) between estimated and observed values.
fitModels( object, assay_name = "counts", models = c("NB", "ZINB", "DM", "ZIG", "HURDLE"), scale_HURDLE = c("default", "median"), verbose = TRUE )
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
models |
character vector which assumes the values |
scale_HURDLE |
character vector, either |
verbose |
an optional logical value. If |
list of data.frame
objects for each model. The first two
columns contain the properly transformed observed values for mean and zero
proportion, while the third and the fourth columns contain the estimated
values for the mean and the zero rate respectively.
fitNB
, fitZINB
, fitDM
,
fitZIG
, and fitHURDLE
for the model estimations,
prepareObserved
for raw counts preparation, and
meanDifferences
for the Mean Difference (MD) and Zero
Probability Difference (ZPD) computations.
# Generate some random counts counts <- matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) # Estimate the counts assuming several distributions GOF <- fitModels( object = counts, models = c( "NB", "ZINB", "DM", "ZIG", "HURDLE" ), scale_HURDLE = c("median", "default") ) head(GOF)
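A brief, hedged sketch using the GOF object from the example above: each list element is a data.frame whose columns follow the value description, so observed and estimated values can be compared model by model.
names(GOF)     # one element per fitted model
head(GOF[[1]]) # observed and estimated mean / zero-probability values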
Fit a Negative Binomial (NB) distribution for each taxon of the count data.
The NB estimation procedure is performed by edgeR glmFit
function, using TMM
normalized counts, tag-wise dispersion estimation,
and not assuming the presence of any group in the samples (design matrix
equal to a column of ones).
fitNB(object, assay_name = "counts", verbose = TRUE)
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
verbose |
an optional logical value. If |
A data frame containing the continuity corrected logarithms of the average fitted values for each row of the 'counts' matrix in the 'Y' column, and the estimated probability to observe a zero in the 'Y0' column.
# Generate some random counts counts = matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) # Fit model on the matrix of counts NB <- fitNB(counts) head(NB)
Fit a Zero-Inflated Gaussian (ZIG) distribution for each taxon of the count data. The model estimation procedure is performed by the metagenomeSeq fitZig function without assuming the presence of any group in the samples (design matrix equal to a column of ones).
fitZIG(object, assay_name = "counts", verbose = TRUE)
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
verbose |
an optional logical value. If |
A data frame containing the continuity corrected logarithms of the
average fitted values for each row of the matrix of counts in the Y
column, and the estimated probability to observe a zero in the Y0
column.
# Generate some random counts counts = matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) # Fit model on the counts matrix ZIG <- fitZIG(counts) head(ZIG)
Fit a Zero-Inflated Negative Binomial (ZINB) distribution for each taxon of the count data. The ZINB estimation procedure is performed by the zinbwave zinbFit function with commondispersion = FALSE, regularization parameter epsilon = 1e10, and without assuming the presence of any group in the samples (design matrix equal to a column of ones).
fitZINB(object, assay_name = "counts", verbose = TRUE)
object |
a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
verbose |
an optional logical value. If |
A data frame containing the continuity corrected logarithms of the
average fitted values for each row of the matrix of counts in the Y
column, and the estimated probability to observe a zero in the Y0
column.
# Generate some random counts counts = matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) # Fit model on the counts matrix ZINB <- fitZINB(counts) head(ZINB)
Check whether the input object is a phyloseq or a TreeSummarizedExperiment, then extract the requested data slots.
get_counts_metadata( object, assay_name = "counts", min_counts = 0, min_samples = 0 )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
min_counts |
Parameter to filter taxa. Set this number to keep features
with more than |
min_samples |
Parameter to filter taxa. Set this number to keep
features with a |
a list of results:
counts: the otu_table slot or the assay_name assay of the phyloseq or TreeSummarizedExperiment object;
metadata: the sample_data or colData slot of the phyloseq or TreeSummarizedExperiment object;
is_phyloseq: a boolean equal to TRUE if the input is a phyloseq object.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) get_counts_metadata(ps) # Or with a TreeSummarizedExperiment tse <- TreeSummarizedExperiment::TreeSummarizedExperiment( assays = list("counts" = counts), colData = metadata) get_counts_metadata(tse)
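A hedged sketch reusing the ps object from the example above: the element names follow the value description, and min_counts / min_samples keep only features observed with more than min_counts counts in more than min_samples samples.
out <- get_counts_metadata(ps, min_counts = 1, min_samples = 2)
dim(out$counts)    # filtered count matrix
head(out$metadata) # sample metadata
out$is_phyloseq    # TRUE, since 'ps' is a phyloseq object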
Inspect the list of p-values and/or log fold changes from the output of a differential abundance detection method.
getDA( method, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL, threshold_pvalue = 1, threshold_logfc = 0, top = NULL, verbose = FALSE )
method |
Output of a differential abundance detection method.
|
slot |
The slot name where to extract values
(default |
colName |
The column name of the slot where to extract values
(default |
type |
The value type of the column selected where to extract values.
Two values are possible: |
direction |
|
threshold_pvalue |
Threshold value for p-values. If present, features
with p-values lower than |
threshold_logfc |
Threshold value for log fold changes. If present,
features with log fold change absolute values higher than
|
top |
If not null, the |
verbose |
Boolean to display the kind of extracted values
(default |
A data.frame with several columns:
stat: contains the p-values or the absolute log fold change values;
direction: present if method was a data.frame object; contains the information about the directionality of differential abundance (usually log fold changes);
DA: can contain several values according to the thresholds and inputs: "DA" or "non-DA" if the method object was a vector; "UP Abundant", "DOWN Abundant", or "non-DA" if method was a data.frame.
data("ps_plaque_16S") # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) # features with p-value < 0.1 as DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL, threshold_pvalue = 0.1, threshold_logfc = 0, top = NULL ) # top 10 feature with highest logFC are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 10 ) # features with p-value < 0.1 and |logFC| > 1 are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = NULL ) # top 10 features with |logFC| > 1 are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 1, top = 10 )
data("ps_plaque_16S") # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) # features with p-value < 0.1 as DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL, threshold_pvalue = 0.1, threshold_logfc = 0, top = NULL ) # top 10 feature with highest logFC are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 10 ) # features with p-value < 0.1 and |logFC| > 1 are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = NULL ) # top 10 features with |logFC| > 1 are DA getDA( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 1, top = 10 )
Inspect the list of p-values and/or log fold changes from the output of a differential abundance detection method and count the True Positives (TP) and the False Positives (FP).
getPositives(method, enrichmentCol, TP, FP)
method |
Output of differential abundance detection method in which
DA information is extracted by the |
enrichmentCol |
name of the column containing information for enrichment analysis. |
TP |
A list of length-2 vectors. The entries in the vector are the
direction ("UP Abundant", "DOWN Abundant", or "non-DA") in the first
position, and the level of the enrichment variable ( |
FP |
A list of length-2 vectors. The entries in the vector are the
direction ("UP Abundant", "DOWN Abundant", or "non-DA") in the first
position, and the level of the enrichment variable ( |
A named vector containing the number of TPs and FPs.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"] ) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) DA <- getDA( method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL ) # Add a priori information DA_info <- addKnowledge( method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames" ) # Create contingency tables and compute F tests DA_info_enriched <- enrichmentTest( method = DA_info, enrichmentCol = "Type", alternative = "greater" ) # Count True and False Positives DA_TP_FP <- getPositives( method = DA_info_enriched, enrichmentCol = "Type", TP = list(c("UP Abundant", "Aerobic"), c("DOWN Abundant", "Anaerobic")), FP = list(c("UP Abundant", "Anaerobic"), c("DOWN Abundant", "Aerobic")) )
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"] ) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # DA Analysis # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) DA <- getDA( method = da.limma, slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.05, threshold_logfc = 1, top = NULL ) # Add a priori information DA_info <- addKnowledge( method = DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames" ) # Create contingency tables and compute F tests DA_info_enriched <- enrichmentTest( method = DA_info, enrichmentCol = "Type", alternative = "greater" ) # Count True and False Positives DA_TP_FP <- getPositives( method = DA_info_enriched, enrichmentCol = "Type", TP = list(c("UP Abundant", "Aerobic"), c("DOWN Abundant", "Anaerobic")), FP = list(c("UP Abundant", "Anaerobic"), c("DOWN Abundant", "Aerobic")) )
Extract the list of p-values and/or log fold changes from the output of a differential abundance detection method.
getStatistics( method, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL, verbose = FALSE )
method |
Output of a differential abundance detection method.
|
slot |
The slot name where to extract values
(default |
colName |
The column name of the slot where to extract values
(default |
type |
The value type of the column selected where to extract values.
Two values are possible: |
direction |
|
verbose |
Boolean to display the kind of extracted values
(default |
A vector or a data.frame. If direction = NULL, the colName column values of the slot are reported, transformed according to type (not transformed if type = "pvalue", -abs(value) if type = "logfc"); otherwise, the direction column of the statInfo matrix is added to the output.
data("ps_plaque_16S") # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) # get p-values getStatistics( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL ) # get negative abs(logFC) values getStatistics( method = da.limma, slot = "statInfo", colName = "logFC", type = "logfc", direction = NULL ) # get p-values and logFC getStatistics( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC" )
data("ps_plaque_16S") # Add scaling factors ps_plaque_16S <- norm_edgeR(object = ps_plaque_16S, method = "TMM") # DA analysis da.limma <- DA_limma( object = ps_plaque_16S, design = ~ 1 + HMP_BODY_SUBSITE, coef = 2, norm = "TMM" ) # get p-values getStatistics( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = NULL ) # get negative abs(logFC) values getStatistics( method = da.limma, slot = "statInfo", colName = "logFC", type = "logfc", direction = NULL ) # get p-values and logFC getStatistics( method = da.limma, slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC" )
Turn the chosen factor columns of the input data.frame into ordered factors. For each factor, the order of the levels is given by the number of elements in each level of that factor.
iterative_ordering(df, var_names, i = 1, decreasing = TRUE)
df: a data.frame object.
var_names: character vector containing the names of one or more columns of df to order.
i: iteration index (default i = 1).
decreasing: logical value, or a vector of logical values, one for each variable in var_names (default decreasing = TRUE).
The input data.frame with the var_names variables as ordered factors.
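For a single column, the same kind of ordering can be sketched with base R alone; the helper below is illustrative only (order_by_cardinality is not part of the package).

# Minimal sketch: order the levels of one factor by the number of
# elements in each level (decreasing), as iterative_ordering does
# for every column listed in var_names.
order_by_cardinality <- function(x, decreasing = TRUE) {
    counts <- sort(table(x), decreasing = decreasing)
    factor(x, levels = names(counts), ordered = TRUE)
}
mpg <- data.frame(ggplot2::mpg)
head(levels(order_by_cardinality(mpg$manufacturer)))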
# From a dataset with some factor columns mpg <- data.frame(ggplot2::mpg) # Order the levels of the desired factors based on the cardinality of each # level. ordered_mpg <- iterative_ordering(df = mpg, var_names = c("manufacturer", "model"), decreasing = c(TRUE, TRUE)) # Now the levels of the factors are ordered in a decreasing way levels(ordered_mpg$manufacturer) levels(ordered_mpg$model)
Compute the differences between the estimated and the observed continuity corrected logarithms of the average count values (MD), and between the estimated average probability to observe a zero and the observed zero rate (ZPD).
meanDifferences(estimated, observed)
estimated: a two-column data.frame of estimated values, as produced by the model fitting functions (see fitModels).
observed: a two-column data.frame, output of the prepareObserved function.
A data.frame containing the differences between the estimated and the observed continuity corrected logarithms of the average count values in the MD column, and between the estimated average probability to observe a zero and the observed zero rate in the ZPD column.
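In other words, both columns are element-wise differences between the two inputs. A minimal sketch of that idea (not the package implementation), assuming both data.frames carry the Y and Y0 columns described above:

# Sketch: MD and ZPD as simple element-wise differences
# (estimated minus observed) of the Y and Y0 columns.
manual_mean_differences <- function(estimated, observed) {
    data.frame(
        MD  = estimated$Y  - observed$Y,   # difference of log average counts
        ZPD = estimated$Y0 - observed$Y0   # difference of zero probabilities/rates
    )
}
observed  <- data.frame(Y = rpois(10, 5), Y0 = runif(10, 0, 1))
estimated <- data.frame(Y = rpois(10, 5), Y0 = runif(10, 0, 1))
head(manual_mean_differences(estimated, observed))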
# Randomly generate the observed and estimated data.frames observed <- data.frame(Y = rpois(10, 5), Y0 = runif(10, 0, 1)) estimated <- data.frame(Y = rpois(10, 5), Y0 = runif(10, 0, 1)) # Compute the mean differences between estimated and observed data.frames meanDifferences(estimated, observed)
Aerobic, Anaerobic, or Facultative Anaerobic microbes by genus (NYC-HANES study).
data(microbial_metabolism)
A data.frame
object
Calculate normalization factors from a phyloseq or TreeSummarizedExperiment
object. Inherited from metagenomeSeq
calcNormFactors
function which performs the
Cumulative Sum Scaling normalization.
norm_CSS(object, assay_name = "counts", method = "CSS", verbose = TRUE)
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
method: normalization method to be used (only "CSS").
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
A new column containing the CSS normalization factors is added to the sample_data slot of the phyloseq object or the colData slot of the TreeSummarizedExperiment object.
See calcNormFactors for details. See also setNormalizations and runNormalizations to quickly set and run normalizations.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the normalization factors
ps_NF <- norm_CSS(object = ps, method = "CSS")
# The phyloseq object now contains the normalization factors:
CSSFacts <- phyloseq::sample_data(ps_NF)[["NF.CSS"]]
head(CSSFacts)
# VERY IMPORTANT: metagenomeSeq uses scaling factors to normalize counts
# (even though they are called normalization factors). These factors are used
# internally by a regression model. To make CSS size factors available for
# edgeR, we need to transform them into normalization factors. This is
# possible by dividing the factors by the library sizes and renormalizing.
normCSSFacts <- CSSFacts / colSums(phyloseq::otu_table(ps_NF))
# Renormalize: make the factors multiply to 1
normFacts <- normCSSFacts / exp(mean(log(normCSSFacts)))
Calculate size factors from a phyloseq or TreeSummarizedExperiment object.
Inherited from DESeq2 estimateSizeFactors
function.
norm_DESeq2( object, assay_name = "counts", method = c("ratio", "poscounts", "iterate"), verbose = TRUE, ... )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
method: method for estimation: either "ratio", "poscounts", or "iterate".
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
...: other parameters passed to DESeq2's estimateSizeFactors function.
A new column containing the chosen DESeq2-based size factors is added to the sample_data slot of the phyloseq object or the colData slot of the TreeSummarizedExperiment object.
See estimateSizeFactors for details. See also setNormalizations and runNormalizations to quickly set and run normalizations.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the size factors
ps_NF <- norm_DESeq2(object = ps, method = "poscounts")
# The phyloseq object now contains the size factors:
sizeFacts <- phyloseq::sample_data(ps_NF)[["NF.poscounts"]]
head(sizeFacts)
# VERY IMPORTANT: DESeq2 uses size factors to normalize counts.
# These factors are used internally by a regression model. To make DESeq2
# size factors available for edgeR, we need to transform them into
# normalization factors. This is possible by dividing the factors by the
# library sizes and renormalizing.
normSizeFacts <- sizeFacts / colSums(phyloseq::otu_table(ps_NF))
# Renormalize: make the factors multiply to 1
normFacts <- normSizeFacts / exp(mean(log(normSizeFacts)))
Calculate normalization factors from a phyloseq or TreeSummarizedExperiment
object. Inherited from edgeR calcNormFactors
function.
norm_edgeR( object, assay_name = "counts", method = c("TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", "none"), refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05, doWeighting = TRUE, Acutoff = -1e+10, p = 0.75, verbose = TRUE, ... )
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
method: normalization method to be used. Choose between "TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", and "none".
refColumn: column to use as reference for method = "TMM".
logratioTrim: the fraction (0 to 0.5) of observations to be trimmed from each tail of the distribution of log-ratios (M-values) before computing the mean. Used by method = "TMM".
sumTrim: the fraction (0 to 0.5) of observations to be trimmed from each tail of the distribution of A-values before computing the mean. Used by method = "TMM".
doWeighting: logical, whether to use (asymptotic binomial precision) weights when computing the mean M-values. Used by method = "TMM".
Acutoff: minimum cutoff applied to A-values. Count pairs with lower A-values are ignored. Used by method = "TMM".
p: numeric value between 0 and 1 specifying which quantile of the counts should be used by method = "upperquartile".
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
...: other arguments are not currently used.
A new column containing the chosen edgeR-based normalization factors is added to the sample_data slot of the phyloseq object or the colData slot of the TreeSummarizedExperiment object.
See calcNormFactors for details. See also setNormalizations and runNormalizations to quickly set and run normalizations.
set.seed(1)
# Create a very simple phyloseq object
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6)
metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"),
    "group" = as.factor(c("A", "A", "A", "B", "B", "B")))
ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(metadata))
# Calculate the normalization factors
ps_NF <- norm_edgeR(object = ps, method = "TMM")
# The phyloseq object now contains the normalization factors:
normFacts <- phyloseq::sample_data(ps_NF)[["NF.TMM"]]
head(normFacts)
# VERY IMPORTANT: edgeR uses normalization factors to normalize library sizes,
# not counts. They are used internally by a regression model. To make edgeR
# normalization factors available for other methods, such as DESeq2 or other
# DA methods based on scaling or size factors, we need to transform them into
# size factors. This is possible by multiplying the factors by the library
# sizes and renormalizing.
normLibSize <- normFacts * colSums(phyloseq::otu_table(ps_NF))
# Renormalize: make the factors multiply to 1
sizeFacts <- normLibSize / exp(mean(log(normLibSize)))
Calculate the Total Sum Scaling (TSS) factors for a phyloseq or a TreeSummarizedExperiment object, i.e. the library sizes. If the counts are divided by the scaling factors, a relative abundance is obtained.
norm_TSS(object, assay_name = "counts", method = "TSS", verbose = TRUE)
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
method: normalization method to be used (only "TSS").
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
A new column containing the TSS scaling factors is added to the sample_data slot of the phyloseq object or the colData slot of the TreeSummarizedExperiment object.
See also setNormalizations and runNormalizations to quickly set and run normalizations.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Calculate the scaling factors ps_NF <- norm_TSS(object = ps) # The phyloseq object now contains the scaling factors: scaleFacts <- phyloseq::sample_data(ps_NF)[, "NF.TSS"] head(scaleFacts)
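Because the TSS factors are just the library sizes, dividing each sample's counts by its factor gives relative abundances. A short sketch reusing the ps_NF object from the example above (sweep and the matrix coercion are plain R/phyloseq idioms, not a benchdamic API):

# Sketch: counts divided column-wise by the TSS factors = relative abundances.
count_mat <- as(phyloseq::otu_table(ps_NF), "matrix")   # taxa x samples
tss_facts <- phyloseq::sample_data(ps_NF)[["NF.TSS"]]   # library sizes
rel_abund <- sweep(count_mat, MARGIN = 2, STATS = tss_facts, FUN = "/")
colSums(rel_abund)  # each sample now sums to 1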
Produce a list of graphical outputs summarizing the between and within method concordance.
plotConcordance(concordance, threshold = NULL, cols = NULL)
concordance: a long format data.frame produced by the createConcordance function.
threshold: the threshold for rank (x-axis upper limit if all methods have a higher number of computed statistics).
cols: a named vector containing the color hex codes.
A list of 2 ggplot2 class objects: concordanceDendrogram, which contains the vertically oriented dendrogram for the methods involved in the concordance analysis; concordanceHeatmap, which contains the heatmap of between and within method concordances.
data(ps_plaque_16S) # Balanced design my_splits <- createSplits( object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = TRUE, paired = "RSID", N = 10 # N = 100 suggested ) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Initialize some limma based methods my_limma <- set_limma(design = ~ RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Set the normalization methods according to the DA methods my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) # Run methods on split datasets results <- runSplits(split_list = my_splits, method_list = my_limma, normalization_list = my_norm, object = ps_plaque_16S) # Concordance for p-values concordance_pvalues <- createConcordance( object = results, slot = "pValMat", colName = "rawP", type = "pvalue" ) # plot concordances from rank 1 to 50. plotConcordance( concordance = concordance_pvalues, threshold = 50 )
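The two ggplot objects returned above can be arranged side by side; one option (an assumption on our side, cowplot is not a benchdamic requirement) is cowplot::plot_grid, reusing the concordance_pvalues object from the example above.

# Sketch: arrange the dendrogram and the heatmap in a single figure.
pC <- plotConcordance(concordance = concordance_pvalues, threshold = 50)
cowplot::plot_grid(
    plotlist = list(pC$concordanceDendrogram, pC$concordanceHeatmap),
    nrow = 1, align = "h", axis = "tb", rel_widths = c(1, 3)
)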
Plot of the contingency tables for the chosen method. The top-left cells are colored, according to the Fisher exact test p-values, if the features in those cells are enriched.
plotContingency(enrichment, method, levels_to_plot)
enrichment |
enrichment object produced by createEnrichment function. |
method |
name of the method to plot. |
levels_to_plot |
A character vector containing the levels of the enrichment variable to plot. |
a ggplot2
object.
createEnrichment
, plotEnrichment
, and
plotMutualFindings
.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
Summary plot for the number of differentially abundant (DA) features and their association with the enrichment variable. If some features are UP-Abundant or DOWN-Abundant (or just DA), several bars are represented in the corresponding direction. Significance thresholds are reported over/under each bar, according to the Fisher exact tests.
plotEnrichment(enrichment, enrichmentCol, levels_to_plot)
enrichment |
enrichment object produced by createEnrichment function. |
enrichmentCol |
name of the column containing information for enrichment analysis. |
levels_to_plot |
A character vector containing the levels of the enrichment variable to plot. |
a ggplot2
object.
createEnrichment
, plotContingency
, and
plotMutualFindings
.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
Draw the nominal false discovery rates for the 0.01, 0.05, and 0.1 levels.
plotFDR(df_FDR, cols = NULL)
df_FDR: a data.frame produced by the createTIEC function, containing the false discovery rate values for each method.
cols: named vector of colors.
A ggplot object.
# Load some data data(ps_stool_16S) # Generate the patterns for 10 mock comparison for an experiment # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S) # Prepare results for Type I Error Control TIEC_summary <- createTIEC(results) # Plot the results plotFPR(df_FPR = TIEC_summary$df_FPR) plotFDR(df_FDR = TIEC_summary$df_FDR) plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1)) plotKS(df_KS = TIEC_summary$df_KS) plotLogP(df_QQ = TIEC_summary$df_QQ)
Draw the boxplots of the proportions of p-values lower than 0.01, 0.05, and 0.1 thresholds for each method.
plotFPR(df_FPR, cols = NULL)
df_FPR: a data.frame produced by the createTIEC function, containing the proportions of p-values below the 0.01, 0.05, and 0.1 thresholds for each method and mock comparison.
cols: named vector of colors.
A ggplot object.
# Load some data data(ps_stool_16S) # Generate the patterns for 10 mock comparison for an experiment # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S) # Prepare results for Type I Error Control TIEC_summary <- createTIEC(results) # Plot the results plotFPR(df_FPR = TIEC_summary$df_FPR) plotFDR(df_FDR = TIEC_summary$df_FDR) plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1)) plotKS(df_KS = TIEC_summary$df_KS) plotLogP(df_QQ = TIEC_summary$df_QQ)
Draw the boxplots of the Kolmogorov-Smirnov test statistics for the p-value distributions across the mock comparisons.
plotKS(df_KS, cols = NULL)
df_KS: a data.frame produced by the createTIEC function, containing the Kolmogorov-Smirnov test statistics for each method and mock comparison.
cols: named vector of colors.
A ggplot object.
# Load some data data(ps_stool_16S) # Generate the patterns for 10 mock comparison for an experiment # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S) # Prepare results for Type I Error Control TIEC_summary <- createTIEC(results) # Plot the results plotFPR(df_FPR = TIEC_summary$df_FPR) plotFDR(df_FDR = TIEC_summary$df_FDR) plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1)) plotKS(df_KS = TIEC_summary$df_KS) plotLogP(df_QQ = TIEC_summary$df_QQ)
Draw the p-values or the average p-values distribution across the mock comparisons in logarithmic scale.
plotLogP(df_pval = NULL, df_QQ = NULL, cols = NULL)
df_pval: a data.frame containing the p-values for each method and mock comparison (default df_pval = NULL).
df_QQ: a data.frame produced by the createTIEC function, containing the average p-values for each method across the mock comparisons (default df_QQ = NULL).
cols: named vector of colors.
A ggplot object.
# Load some data data(ps_stool_16S) # Generate the patterns for 10 mock comparison for an experiment # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S) # Prepare results for Type I Error Control TIEC_summary <- createTIEC(results) # Plot the results plotFPR(df_FPR = TIEC_summary$df_FPR) plotFDR(df_FDR = TIEC_summary$df_FDR) plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1)) plotKS(df_KS = TIEC_summary$df_KS) plotLogP(df_QQ = TIEC_summary$df_QQ)
A function to plot mean difference (MD) and zero probability difference (ZPD) values between estimated and observed values.
plotMD(data, difference = NULL, split = TRUE)
data: a list, output of the fitModels function.
difference: character vector, either "MD" or "ZPD" (default difference = NULL).
split: display each model's mean differences in different facets (default split = TRUE).
a ggplot
object.
fitModels
and RMSE
for the model
estimations and the RMSE computations respectively. plotRMSE
for the graphical evaluation of the RMSE values.
# Generate some random counts counts = matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) # Estimate the counts assuming several distributions GOF <- fitModels( object = counts, models = c( "NB", "ZINB", "DM", "ZIG", "HURDLE" ), scale_HURDLE = c("median", "default") ) # Plot the results plotMD(data = GOF, difference = "MD", split = TRUE) plotMD(data = GOF, difference = "ZPD", split = TRUE)
Plot and filter the features which are simultaneously considered differentially abundant by a specified number of methods.
plotMutualFindings(enrichment, enrichmentCol, levels_to_plot, n_methods = 1)
enrichment |
enrichment object produced by createEnrichment function. |
enrichmentCol |
name of the column containing information for enrichment analysis. |
levels_to_plot |
A character vector containing the levels of the enrichment variable to plot. |
n_methods |
minimum number of methods that mutually find the features. |
a ggplot2
object.
createEnrichment
, plotEnrichment
, and
plotContingency
.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Enrichment analysis enrichment <- createEnrichment(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "GENUS", slot = "pValMat", colName = "adjP", type = "pvalue", direction = "logFC", threshold_pvalue = 0.1, threshold_logfc = 1, top = 10, verbose = TRUE) # Contingency tables plotContingency(enrichment = enrichment, method = "limma.TMM") # Barplots plotEnrichment(enrichment, enrichmentCol = "Type") # Mutual findings plotMutualFindings( enrichment = enrichment, enrichmentCol = "Type", n_methods = 1 )
Plot the difference between the number of true positives (TP)
and false positives (FP) for each method and for each 'top' threshold
provided by the createPositives()
function.
plotPositives(positives, cols = NULL)
positives: a data.frame produced by the createPositives function, containing the TP and FP counts for each method and top threshold.
cols: named vector of colors (default cols = NULL).
a ggplot2
object.
getPositives
, createPositives
.
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Count TPs and FPs, from the top 1 to the top 20 features. # As direction is supplied, features are ordered by "logFC" absolute values. positives <- createPositives(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames", slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 1:20, alternative = "greater", verbose = FALSE, TP = list(c("DOWN Abundant", "Anaerobic"), c("UP Abundant", "Aerobic")), FP = list(c("DOWN Abundant", "Aerobic"), c("UP Abundant", "Anaerobic"))) # Plot the TP-FP differences for each threshold plotPositives(positives = positives)
data("ps_plaque_16S") data("microbial_metabolism") # Extract genera from the phyloseq tax_table slot genera <- phyloseq::tax_table(ps_plaque_16S)[, "GENUS"] # Genera as rownames of microbial_metabolism data.frame rownames(microbial_metabolism) <- microbial_metabolism$Genus # Match OTUs to their metabolism priorInfo <- data.frame(genera, "Type" = microbial_metabolism[genera, "Type"]) # Unmatched genera becomes "Unknown" unknown_metabolism <- is.na(priorInfo$Type) priorInfo[unknown_metabolism, "Type"] <- "Unknown" priorInfo$Type <- factor(priorInfo$Type) # Add a more informative names column priorInfo[, "newNames"] <- paste0(rownames(priorInfo), priorInfo[, "GENUS"]) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_plaque_16S <- runNormalizations(normalization_list = my_norm, object = ps_plaque_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ 1 + RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Perform DA analysis Plaque_16S_DA <- runDA(method_list = my_limma, object = ps_plaque_16S) # Count TPs and FPs, from the top 1 to the top 20 features. # As direction is supplied, features are ordered by "logFC" absolute values. positives <- createPositives(object = Plaque_16S_DA, priorKnowledge = priorInfo, enrichmentCol = "Type", namesCol = "newNames", slot = "pValMat", colName = "rawP", type = "pvalue", direction = "logFC", threshold_pvalue = 1, threshold_logfc = 0, top = 1:20, alternative = "greater", verbose = FALSE, TP = list(c("DOWN Abundant", "Anaerobic"), c("UP Abundant", "Aerobic")), FP = list(c("DOWN Abundant", "Aerobic"), c("UP Abundant", "Anaerobic"))) # Plot the TP-FP differences for each threshold plotPositives(positives = positives)
Draw the average QQ-plots across the mock comparisons.
plotQQ(df_QQ, cols = NULL, zoom = c(0, 0.1), split = FALSE)
df_QQ: coordinates to draw the QQ-plot to compare the mean observed p-value distribution across comparisons with the theoretical uniform distribution.
cols: named vector of colors.
zoom: 2-dimensional vector containing the starting and the final coordinates (default zoom = c(0, 0.1)).
split: boolean value controlling whether the methods are displayed in separate facets (default split = FALSE).
A ggplot object.
# Load some data data(ps_stool_16S) # Generate the patterns for 10 mock comparison for an experiment # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S) # Prepare results for Type I Error Control TIEC_summary <- createTIEC(results) # Plot the results plotFPR(df_FPR = TIEC_summary$df_FPR) plotFDR(df_FDR = TIEC_summary$df_FDR) plotQQ(df_QQ = TIEC_summary$df_QQ, zoom = c(0, 0.1)) plotKS(df_KS = TIEC_summary$df_KS) plotLogP(df_QQ = TIEC_summary$df_QQ)
A function to plot RMSE values computed for mean difference (MD) and zero probability difference (ZPD) values between estimated and observed values.
plotRMSE(data, difference = NULL, plotIt = TRUE)
data: a list, output of the fitModels function.
difference: character vector, either "MD" or "ZPD" (default difference = NULL).
plotIt: logical. Should plotting be done? (default plotIt = TRUE).
a ggplot
object.
fitModels
and RMSE
for the model
estimations and the RMSE computations respectively. plotMD
for
the graphical evaluation.
# Generate some random counts counts = matrix(rnbinom(n = 600, size = 3, prob = 0.5), nrow = 100, ncol = 6) # Estimate the counts assuming several distributions GOF <- fitModels( object = counts, models = c( "NB", "ZINB", "DM", "ZIG", "HURDLE" ), scale_HURDLE = c("median", "default") ) # Plot the RMSE results plotRMSE(data = GOF, difference = "MD") plotRMSE(data = GOF, difference = "ZPD")
Continuity corrected logarithms of the average counts and fraction of zeroes by feature.
prepareObserved(object, assay_name = "counts", scale = NULL)
object: a phyloseq object, a TreeSummarizedExperiment object, or a matrix of counts.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
scale: if specified, it refers to the character vector used for the HURDLE model (see scale_HURDLE in fitModels): either "default" (CPM scale) or "median" (median library size scale). Default scale = NULL.
A data.frame containing, for each taxon of the count matrix, the continuity corrected logarithm of the raw count mean values in the Y column and the observed zero rate in the Y0 column. If scale is specified, the continuity corrected logarithm of the mean CPM (scale = "default") or of the mean counts per median library size (scale = "median") is computed instead.
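As a rough mental model, Y and Y0 can be sketched per taxon as a continuity corrected log mean count and the fraction of zero counts. The log1p correction below is an assumption for illustration; the package's exact correction and scaling may differ, so use prepareObserved for the real values.

# Illustrative sketch only; not the package implementation.
counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6)
manual_observed <- data.frame(
    Y  = log1p(rowMeans(counts)),  # continuity corrected log of mean counts
    Y0 = rowMeans(counts == 0)     # observed zero rate per taxon
)
head(manual_observed)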
# Generate some random counts counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) observed1 <- prepareObserved(counts) # For the comparison with HURDLE model observed2 <- prepareObserved(counts, scale = "median")
A demonstrative purpose dataset containing microbial abundances for a total of 88 OTUs. The 60 paired Gingival Plaque samples belong to the Human Microbiome Project. This particular subset contains 30 Supragingival and 30 Subgingival Plaque samples, restricted to SEX = "Male", RUN_CENTER = "WUCG", and VISITNO = "1". The same dataset can be obtained by applying basic filters (removing taxa with zero counts) and collapsing the counts to the genus level; the HMP16SData Bioconductor package was used to download the data.
data(ps_plaque_16S)
An object of class phyloseq
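A quick way to inspect the dataset and check the subsite balance described above (HMP_BODY_SUBSITE is the grouping variable used throughout this manual):

# 60 paired plaque samples, 88 genus-level OTUs.
data("ps_plaque_16S")
ps_plaque_16S
table(phyloseq::sample_data(ps_plaque_16S)[["HMP_BODY_SUBSITE"]])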
A demonstrative purpose dataset containing microbial abundances for a total of 71 OTUs. The 32 Stool samples belong to the Human Microbiome Project. This particular subset contains the SEX = "Male", RUN_CENTER = "BI", and VISITNO = "1" samples. The same dataset can be obtained by applying basic filters (removing taxa with zero counts) and collapsing the counts to the genus level; the HMP16SData Bioconductor package was used to download the data.
data(ps_stool_16S)
An object of class phyloseq
Computes the Root Mean Square Error (RMSE) from a vector of differences.
RMSE(differences)
differences |
a vector of differences. |
RMSE value
prepareObserved
and meanDifferences
.
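For reference, the usual RMSE formula on a vector of differences is the square root of the mean of the squared entries; a one-line sketch (not the package source) is:

# Sketch of the standard RMSE computation on a vector of differences.
rmse_manual <- function(differences) sqrt(mean(differences^2, na.rm = TRUE))
rmse_manual(rnorm(10))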
# Generate the data.frame of Mean Differences and Zero Probability Difference MD_df <- data.frame(MD = rpois(10, 5), ZPD = runif(10, -1, 1)) # Calculate RMSE for MD and ZPD values RMSE(MD_df[, "MD"]) RMSE(MD_df[, "ZPD"])
Run the differential abundance detection methods.
runDA(method_list, object, weights = NULL, verbose = TRUE)
method_list: a list object containing the methods and their parameters.
object: a phyloseq object.
weights: an optional numeric matrix giving observational weights.
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
A named list containing the results for each method.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Set some simple normalizations my_norm <- setNormalizations() # Add them to the phyloseq object ps <- runNormalizations(normalization_list = my_norm, object = ps) # Set some limma instances my_methods <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "poscounts", "CSS")) # Run the methods results <- runDA(method_list = my_methods, object = ps)
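Each element of the returned list is named after the method instance (the method.normalization pattern used elsewhere in this manual, e.g. limma.TMM) and carries the pValMat and statInfo slots read by getStatistics and getDA. A short sketch, assuming the results object created above:

# Sketch: inspect the runDA() output structure.
names(results)                    # e.g. "limma.TMM", "limma.poscounts", "limma.CSS"
head(results$limma.TMM$pValMat)   # raw and adjusted p-values
head(results$limma.TMM$statInfo)  # full statistics, e.g. logFC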
Run the differential abundance detection methods on mock datasets.
runMocks( mocks, method_list, object, weights = NULL, verbose = TRUE, BPPARAM = BiocParallel::SerialParam() )
mocks: the mock group patterns produced by the createMocks function.
method_list: a list object containing the methods and their parameters.
object: a phyloseq or TreeSummarizedExperiment object.
weights: an optional numeric matrix giving observational weights.
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
BPPARAM: an optional BiocParallelParam instance defining the parallel back-end to be used during evaluation (default BPPARAM = BiocParallel::SerialParam()).
A named list containing the results for each method.
# Load some data data(ps_stool_16S) # Generate the pattern for 10 mock comparisons # (N = 1000 is suggested) mocks <- createMocks(nsamples = phyloseq::nsamples(ps_stool_16S), N = 10) head(mocks) # Add some normalization/scaling factors to the phyloseq object my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) ps_stool_16S <- runNormalizations(normalization_list = my_norm, object = ps_stool_16S) # Initialize some limma based methods my_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "CSS")) # Run methods on mock datasets results <- runMocks(mocks = mocks, method_list = my_limma, object = ps_stool_16S)
Add normalization/scaling factors to a phyloseq object
runNormalizations( normalization_list, object, assay_name = "counts", verbose = TRUE )
normalization_list: a list object containing the normalization methods and their parameters.
object: a phyloseq or TreeSummarizedExperiment object.
assay_name: the name of the assay to extract from the TreeSummarizedExperiment object (default assay_name = "counts").
verbose: an optional logical value. If TRUE (the default), informative messages are printed during the computation.
A phyloseq object containing the normalization/scaling factors.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6"), "group" = as.factor(c("A", "A", "A", "B", "B", "B"))) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Set some simple normalizations my_normalizations <- setNormalizations() # Add them to the phyloseq object ps <- runNormalizations(normalization_list = my_normalizations, object = ps)
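The factors are appended as NF.* columns of the sample_data (NF.TMM, NF.poscounts, NF.CSS, NF.TSS, depending on the chosen normalizations); a quick sketch to list them for the ps object created above:

# Sketch: show the normalization/scaling factor columns added to sample_data.
sample_info <- data.frame(phyloseq::sample_data(ps))
sample_info[, grep("^NF\\.", colnames(sample_info)), drop = FALSE]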
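The computed factors are appended to the sample metadata of the returned object, one column per normalization. A minimal sketch for inspecting them; the exact column names (e.g. an "NF." prefix) are an assumption.
# Check the added normalization/scaling factor columns (sketch)
head(phyloseq::sample_data(ps))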
Run the differential abundance detection methods on split datasets.
runSplits( split_list, method_list, normalization_list, object, assay_name = "counts", min_counts = 0, min_samples = 0, verbose = TRUE, BPPARAM = BiocParallel::SerialParam() )
split_list |
A list of 2 data.frames, usually produced by the createSplits() function, containing the sample names of the two random subsets for each comparison. |
method_list |
a list object containing the methods and their parameters. |
normalization_list |
a list object containing the normalization method
names and their parameters, usually produced by the setNormalizations() function. |
object |
a phyloseq object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
min_counts |
Parameter to filter taxa. Set this number to keep features
with more than min_counts counts in more than min_samples samples (default min_counts = 0). |
min_samples |
Parameter to filter taxa. Set this number to keep
features with a number of counts greater than min_counts in more than min_samples samples (default min_samples = 0). |
verbose |
an optional logical value. If TRUE, informative messages are printed during the computation (default verbose = TRUE). |
BPPARAM |
an optional BiocParallelParam instance defining the parallel back-end to be used during evaluation (default BiocParallel::SerialParam()). |
A named list containing the results for each method.
data(ps_plaque_16S) # Balanced design my_splits <- createSplits( object = ps_plaque_16S, varName = "HMP_BODY_SUBSITE", balanced = TRUE, paired = "RSID", N = 10 # N = 100 suggested ) # Make sure the subject ID variable is a factor phyloseq::sample_data(ps_plaque_16S)[, "RSID"] <- as.factor( phyloseq::sample_data(ps_plaque_16S)[["RSID"]]) # Initialize some limma based methods my_limma <- set_limma(design = ~ RSID + HMP_BODY_SUBSITE, coef = "HMP_BODY_SUBSITESupragingival Plaque", norm = c("TMM", "CSS")) # Set the normalization methods according to the DA methods my_norm <- setNormalizations(fun = c("norm_edgeR", "norm_CSS"), method = c("TMM", "CSS")) # Run methods on split datasets results <- runSplits(split_list = my_splits, method_list = my_limma, normalization_list = my_norm, object = ps_plaque_16S)
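The runSplits output is typically the input of the concordance analysis. A minimal sketch follows, assuming createConcordance() accepts the split results together with the slot and column used to rank features (a getDA()-like interface); these argument names are assumptions.
# Within- and between-method concordance on the split results (sketch)
concordance <- createConcordance(object = results, slot = "pValMat", colName = "rawP", type = "pvalue")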
Set the parameters for ALDEx2 differential abundance detection method.
set_ALDEx2( assay_name = "counts", pseudo_count = FALSE, design = NULL, mc.samples = 128, test = "t", paired.test = FALSE, denom = "all", contrast = NULL, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
design |
a character with the name of a variable to group samples and
compare them, or a formula to compute a model.matrix (when test = "glm"). |
mc.samples |
an integer. The number of Monte Carlo samples to use when estimating the underlying distributions. Since we are estimating central tendencies, 128 is usually sufficient. |
test |
a character string indicating which tests to perform. "t" runs Welch's t-test, "wilcox" runs the Wilcoxon test, "kw" runs the Kruskal-Wallis test, "kw_glm" runs a glm ANOVA-like test, and "glm" runs a generalized linear model. |
paired.test |
A boolean. Toggles whether to do paired-sample tests.
Applies to the "t" test. |
denom |
a character string indicating which features to use as the denominator for the geometric mean calculation: "all" (default), "iqlr", "zero", or "lvha". |
contrast |
character vector with exactly three elements: the name of a variable used in "design", the name of the level of interest, and the name of the reference level. If the test is "kw" or "kw_glm", the contrast vector is not used. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for the DA_ALDEx2 method.
# Set some basic combinations of parameters for ALDEx2 base_ALDEx2 <- set_ALDEx2(design = "group", contrast = c("group", "grp2", "grp1")) # Set a specific set of normalization for ALDEx2 (even of other # packages!) setNorm_ALDEx2 <- set_ALDEx2(design = "group", contrast = c("group", "grp2", "grp1")) # Set many possible combinations of parameters for ALDEx2 all_ALDEx2 <- set_ALDEx2(design = "group", denom = c("iqlr", "zero"), test = c("t", "wilcox"), contrast = c("group", "grp2", "grp1"))
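Since every set_* function returns a named list of method instances, instances of different methods can be concatenated with c() and passed together to runDA(), runMocks(), or runSplits(). A minimal sketch (the ps object and its "group" variable are assumed to be set up as in the runDA example above):
# Combine instances of different DA methods into one list (sketch)
my_methods <- c(base_ALDEx2, set_limma(design = ~ group, coef = 2, norm = "TMM"))
# results <- runDA(method_list = my_methods, object = ps)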
Set the parameters for ANCOM differential abundance detection method.
set_ANCOM( assay_name = "counts", pseudo_count = FALSE, fix_formula = NULL, adj_formula = NULL, rand_formula = NULL, lme_control = lme4::lmerControl(), contrast = NULL, alpha = 0.05, p_adj_method = "BH", struc_zero = FALSE, BC = TRUE, n_cl = 1, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
fix_formula |
Used when BC = TRUE (ANCOM-BC2): the character string expressing how the microbial absolute abundances for each taxon depend on the fixed effects in the metadata. |
adj_formula |
Used when BC = FALSE (ANCOM): the character string representing the adjustment formula. |
rand_formula |
Optionally used when a random effect is needed: the character string expressing the random effects in lme4 style (e.g., "(1|RSID)"). |
lme_control |
a list of control parameters for mixed model fitting.
See ?lme4::lmerControl for details. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
alpha |
numeric. Level of significance. Default is 0.05. |
p_adj_method |
character. Method to adjust p-values (default here is "BH"). Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". See ?stats::p.adjust for more details. |
struc_zero |
logical. Whether to detect structural zeros based on
the group variable (default struc_zero = FALSE). |
BC |
boolean for ANCOM method to use. If TRUE the bias correction
(ANCOM-BC2) is computed (default BC = TRUE). |
n_cl |
numeric. The number of nodes to be forked. For details, see
?parallel::makeCluster. The default is 1. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_ANCOM
method.
# Set some basic combinations of parameters for ANCOM with bias correction base_ANCOMBC <- set_ANCOM(pseudo_count = FALSE, fix_formula = "group", contrast = c("group", "B", "A"), BC = TRUE, expand = FALSE) many_ANCOMs <- set_ANCOM(pseudo_count = c(TRUE, FALSE), fix_formula = "group", contrast = c("group", "B", "A"), struc_zero = c(TRUE, FALSE), BC = c(TRUE, FALSE))
Set the parameters for basic differential abundance detection methods such as t and wilcox.
set_basic( assay_name = "counts", pseudo_count = FALSE, contrast = NULL, test = c("t", "wilcox"), paired = FALSE, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
test |
name of the test to perform. Choose between "t" or "wilcox". |
paired |
boolean. Choose whether the test is paired or not (default
paired = FALSE). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_basic
method.
# Set some basic methods basic_methods <- set_basic(pseudo_count = FALSE, test = c("t", "wilcox"), contrast = c("group", "B", "A"), expand = TRUE)
Set the parameters for corncob differential abundance detection method.
set_corncob( assay_name = "counts", pseudo_count = FALSE, formula = NULL, phi.formula = NULL, formula_null = NULL, phi.formula_null = NULL, test = c("Wald", "LRT"), boot = FALSE, coefficient = NULL, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
formula |
an object of class formula without the response: a symbolic description of the model to be fitted to the abundance. |
phi.formula |
an object of class formula without the response: a symbolic description of the model to be fitted to the dispersion. |
formula_null |
Formula for mean under null, without response |
phi.formula_null |
Formula for overdispersion under null, without response |
test |
Character. Hypothesis testing procedure to use. One of
"Wald" or "LRT" (likelihood ratio test). |
boot |
Boolean. Defaults to FALSE. Indicates whether or not to use the parametric bootstrap algorithm. |
coefficient |
The coefficient of interest as a single word formed by the variable name and the non reference level. (e.g.: 'ConditionDisease' if the reference level for the variable 'Condition' is 'control'). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_corncob
method.
# Set some basic combinations of parameters for corncob base_corncob <- set_corncob(formula = ~ group, phi.formula = ~ group, formula_null = ~ 1, phi.formula_null = ~ group, coefficient = "groupB") # Set many possible combinations of parameters for corncob all_corncob <- set_corncob(pseudo_count = c(TRUE, FALSE), formula = ~ group, phi.formula = ~ group, formula_null = ~ 1, phi.formula_null = ~ group, coefficient = "groupB", boot = c(TRUE, FALSE))
Set the parameters for dearseq differential abundance detection method.
set_dearseq( assay_name = "counts", pseudo_count = FALSE, covariates = NULL, variables2test = NULL, sample_group = NULL, test = c("permutation", "asymptotic"), preprocessed = FALSE, n_perm = 1000, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
covariates |
a character vector containing the colnames of the covariates to include in the model. |
variables2test |
a character vector containing the colnames of the variable of interest. |
sample_group |
a vector of length n (number of samples) indicating whether the samples should be grouped (e.g., paired samples or repeated measures). Coerced to a factor. Default is NULL, in which case no grouping is performed. |
test |
a character string indicating which method to use to approximate
the variance component score test, either 'permutation' or 'asymptotic'
(default test = "permutation"). |
preprocessed |
a logical flag indicating whether the expression data have
already been preprocessed (e.g., log2 transformed). Default is FALSE. |
n_perm |
the number of permutations. Default is 1000. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_dearseq
method.
# Set some basic combinations of parameters for dearseq base_dearseq <- set_dearseq(pseudo_count = FALSE, variables2test = "group", test = c("permutation", "asymptotic"), expand = TRUE)
Set the parameters for DESeq2 differential abundance detection method.
set_DESeq2( assay_name = "counts", pseudo_count = FALSE, design = NULL, contrast = NULL, alpha = 0.05, norm = c("ratio", "poscounts", "iterate"), weights_logical = FALSE, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
design |
character or formula to specify the model matrix. |
contrast |
character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change. |
alpha |
the significance cutoff used for optimizing the independent filtering (by default 0.05). If the adjusted p-value cutoff (FDR) will be a value other than 0.05, alpha should be set to that value. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose between the native DESeq2 normalization methods,
such as "ratio", "poscounts", or "iterate". |
weights_logical |
logical vector, if TRUE a matrix of observational
weights will be used for differential abundance analysis (default
weights_logical = FALSE). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_DESeq2
method.
# Set some basic combinations of parameters for DESeq2 base_DESeq2 <- set_DESeq2(design = ~ group, contrast = c("group", "B", "A")) # Set a specific set of normalization for DESeq2 setNorm_DESeq2 <- set_DESeq2(design = ~ group, contrast = c("group", "B", "A"), norm = c("ratio", "poscounts")) # Set many possible combinations of parameters for DESeq2 all_DESeq2 <- set_DESeq2(pseudo_count = c(TRUE, FALSE), design = ~ group, contrast = c("group", "B", "A"), weights_logical = c(TRUE,FALSE))
Set the parameters for edgeR differential abundance detection method.
set_edgeR( assay_name = "counts", pseudo_count = FALSE, group_name = NULL, design = NULL, robust = FALSE, coef = 2, norm = c("TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", "none"), weights_logical = FALSE, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
group_name |
character giving the name of the column containing information about experimental group/condition for each sample/library. |
design |
character or formula to specify the model matrix. |
robust |
logical, should the estimation of the prior degrees of freedom be robustified against outliers? (default robust = FALSE). |
coef |
integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose between the native edgeR normalization methods,
such as "TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", or "none". |
weights_logical |
logical vector, if TRUE a matrix of observational weights will be used for the differential abundance analysis (default weights_logical = FALSE). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_edgeR
method.
# Set some basic combinations of parameters for edgeR base_edgeR <- set_edgeR(group_name = "group", design = ~ group, coef = 2) # Set a specific set of normalization for edgeR setNorm_edgeR <- set_edgeR(group_name = "group", design = ~ group, coef = 2, norm = c("TMM", "RLE")) # Set many possible combinations of parameters for edgeR all_edgeR <- set_edgeR(pseudo_count = c(TRUE, FALSE), group_name = "group", design = ~ group, robust = c(TRUE, FALSE), coef = 2, weights_logical = c(TRUE, FALSE))
Set the parameters for limma differential abundance detection method.
set_limma( assay_name = "counts", pseudo_count = FALSE, design = NULL, coef = 2, norm = c("TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", "none"), weights_logical = FALSE, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
design |
character name of the metadata columns, formula, or design matrix with rows corresponding to samples and columns to coefficients to be estimated. |
coef |
integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose between the native edgeR normalization methods,
such as "TMM", "TMMwsp", "RLE", "upperquartile", "posupperquartile", or "none". |
weights_logical |
logical vector, if TRUE a matrix of observational
weights will be used for differential abundance analysis (default
weights_logical = FALSE). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_limma
method.
# Set some basic combinations of parameters for limma base_limma <- set_limma(design = ~ group, coef = 2) # Set a specific set of normalization for limma (even of other packages!) setNorm_limma <- set_limma(design = ~ group, coef = 2, norm = c("TMM", "upperquartile")) # Set many possible combinations of parameters for limma all_limma <- set_limma(pseudo_count = c(TRUE, FALSE), design = ~ group, coef = 2, weights_logical = c(TRUE, FALSE))
Set the parameters for linda differential abundance detection method.
set_linda( assay_name = "counts", formula = NULL, contrast = NULL, is.winsor = TRUE, outlier.pct = 0.03, zero.handling = c("pseudo-count", "imputation"), pseudo.cnt = 0.5, alpha = 0.05, p.adj.method = "BH", expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
formula |
Character. For example: formula = '~x1*x2+x3+(1|id)'. At least one fixed effect is required. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
is.winsor |
Boolean. If TRUE (default), the Winsorization process will be conducted for the OTU table. |
outlier.pct |
A real value between 0 and 1; Winsorization cutoff (percentile) for the OTU table (default here is 0.03). If NULL, the Winsorization process will not be conducted. |
zero.handling |
Character. Specifies the method to handle zeros in the OTU table. Options are "pseudo-count" or "imputation" (default is "pseudo-count"). If "imputation", zeros in the OTU table will be imputed using the formula in the referenced paper. If "pseudo-count", a small constant (pseudo.cnt) will be added to each value in the OTU table. |
pseudo.cnt |
A positive real value. Default is 0.5. If zero.handling is set to "pseudo-count", this constant will be added to each value in the OTU table. |
alpha |
A real value between 0 and 1; significance level of differential abundance. Default is 0.05. |
p.adj.method |
Character; p-value adjusting approach. See R function p.adjust. Default is 'BH'. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_linda
method.
# Set some basic combinations of parameters for linda base_linda <- set_linda(formula = "~ group", contrast = c("group", "B", "A"), zero.handling = "pseudo-count", expand = TRUE) many_linda <- set_linda(formula = "~ group", contrast = c("group", "B", "A"), is.winsor = c(TRUE, FALSE), zero.handling = c("pseudo-count", "imputation"), expand = TRUE)
Set the parameters for Maaslin2 differential abundance detection method.
set_Maaslin2( assay_name = "counts", normalization = c("TSS", "CLR", "CSS", "NONE", "TMM"), transform = c("LOG", "LOGIT", "AST", "NONE"), analysis_method = c("LM", "CPLM", "ZICP", "NEGBIN", "ZINB"), correction = "BH", random_effects = NULL, fixed_effects = NULL, contrast = NULL, reference = NULL, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
normalization |
The normalization method to apply. |
transform |
The transform to apply. |
analysis_method |
The analysis method to apply. |
correction |
The correction method for computing the q-value. |
random_effects |
The random effects for the model, comma-delimited for multiple effects. |
fixed_effects |
The fixed effects for the model, comma-delimited for multiple effects. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
reference |
The factor to use as a reference for a variable with more than two levels provided as a string of 'variable,reference' semi-colon delimited for multiple variables. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_Maaslin2
method.
# Set some basic combinations of parameters for Maaslin2 base_Maaslin2 <- set_Maaslin2(normalization = "TSS", transform = "LOG", analysis_method = "LM", fixed_effects = "group", contrast = c("group", "B", "A")) many_Maaslin2 <- set_Maaslin2(normalization = c("TSS", "CLR", "CSS", "TMM", "NONE"), transform = c("LOG", "NONE"), analysis_method = c("LM", "NEGBIN"), fixed_effects = "group", contrast = c("group", "B", "A"))
Set the parameters for MAST differential abundance detection method.
set_MAST( assay_name = "counts", pseudo_count = FALSE, rescale = c("median", "default"), design = NULL, coefficient = NULL, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
rescale |
Rescale count data, per million if 'default', or per median library size if 'median' ('median' is suggested for metagenomics data). |
design |
The model for the count distribution. Can be the variable name, or a character similar to "~ 1 + group", or a formula, or a 'model.matrix' object. |
coefficient |
The coefficient of interest as a single word formed by the variable name and the non reference level. (e.g.: 'ConditionDisease' if the reference level for the variable 'Condition' is 'control'). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_MAST
method.
# Set some basic combinations of parameters for MAST base_MAST <- set_MAST(design = ~ group, coefficient = "groupB") # Set many possible combinations of parameters for MAST all_MAST <- set_MAST(pseudo_count = c(TRUE, FALSE), rescale = c("median", "default"), design = ~ group, coefficient = "groupB")
Set the parameters for metagenomeSeq differential abundance detection method.
set_metagenomeSeq( assay_name = "counts", pseudo_count = FALSE, design = NULL, coef = 2, norm = "CSS", model = "fitFeatureModel", expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
design |
the model for the count distribution. Can be the variable name, or a character similar to "~ 1 + group", or a formula. |
coef |
coefficient of interest to grab log fold-changes. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose the native metagenomeSeq normalization method
"CSS" (the default). |
model |
character equal to "fitFeatureModel" for differential abundance
analysis using a zero-inflated log-normal model, "fitZig" for a complex
mathematical optimization routine to estimate probabilities that a zero for
a particular feature in a sample is a technical zero or not. The latter model
relies heavily on the limma package (default
model = "fitFeatureModel"). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for the DA_metagenomeSeq method.
# Set a basic combination of parameters for metagenomeSeq base_mgs <- set_metagenomeSeq(design = ~ group, coef = 2) # Set a specific model for metagenomeSeq setModel_mgs <- set_metagenomeSeq(design = ~ group, coef = 2, model = "fitZig") # Set many possible combinations of parameters for metagenomeSeq all_mgs <- set_metagenomeSeq(pseudo_count = c(TRUE, FALSE), design = ~ group, coef = 2, model = c("fitFeatureModel", "fitZig"), norm = "CSS")
Set the parameters for mixMC sPLS-DA.
set_mixMC( assay_name = "counts", pseudo_count = 1, contrast = NULL, ID_variable = NULL, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
a positive numeric value for the pseudo-count to be added. Default is 1. |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
ID_variable |
a character string indicating the name of the variable corresponding to the repeated measures unit (e.g., the subject ID). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_mixMC
method.
# Set some basic combinations of parameters for mixMC base_mixMC <- set_mixMC(pseudo_count = 1, contrast = c("group", "B", "A")) many_mixMC <- set_mixMC(pseudo_count = c(0.1, 0.5, 1), contrast = c("group", "B", "A"))
Set the parameters for NOISeq differential abundance detection method.
set_NOISeq( assay_name = "counts", pseudo_count = FALSE, contrast = NULL, norm = c("rpkm", "uqua", "tmm", "n"), expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
norm |
name of the normalization method to use in the differential
abundance analysis. Choose between the native NOISeq normalization methods: "rpkm", "uqua" (upper quartile), "tmm", or "n" (none). |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_NOISeq
method.
# Set a basic combination of parameters for NOISeq with 'tmm' normalization base_NOISeq <- set_NOISeq(pseudo_count = FALSE, norm = "tmm", contrast = c("group", "B", "A"), expand = FALSE) # try many normalizations many_NOISeq <- set_NOISeq(pseudo_count = FALSE, norm = c("tmm", "uqua", "rpkm", "n"), contrast = c("group", "B", "A"))
Set the parameters for Seurat differential abundance detection method.
set_Seurat( assay_name = "counts", pseudo_count = FALSE, test = "wilcox", contrast = NULL, norm = "LogNormalize", scale.factor = 10000, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
pseudo_count |
add 1 to all counts if TRUE (default
|
test |
Denotes which test to use. Available options are:
"wilcox" (default), "t", "negbinom", "poisson", and the other tests supported by Seurat::FindMarkers. |
contrast |
character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change. |
norm |
Method for normalization.
"LogNormalize" (default), "CLR", "RC", or "none". |
scale.factor |
Sets the scale factor for cell-level normalization |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_Seurat
method.
# Set some basic combinations of parameters for Seurat base_Seurat <- set_Seurat(contrast = c("group", "B", "A")) # Set many possible combinations of parameters for Seurat all_Seurat <- set_Seurat(test = c("wilcox", "t", "negbinom", "poisson"), norm = c("LogNormalize", "CLR", "RC", "none"), scale.factor = c(1000, 10000), contrast = c("group", "B", "A"))
Set the parameters for ZicoSeq differential abundance detection method.
set_ZicoSeq( assay_name = "counts", contrast = NULL, strata = NULL, adj.name = NULL, feature.dat.type = c("count", "proportion", "other"), is.winsor = TRUE, outlier.pct = 0.03, winsor.end = c("top", "bottom", "both"), is.post.sample = TRUE, post.sample.no = 25, perm.no = 99, link.func = list(function(x) sign(x) * (abs(x))^0.5), ref.pct = 0.5, stage.no = 6, excl.pct = 0.2, expand = TRUE )
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
contrast |
character vector with exactly three elements: a string indicating the name of the factor whose levels are the conditions to be compared, the name of the level of interest, and the name of the other level. |
strata |
a factor such as subject IDs indicating the permutation strata, or a character indicating the strata variable in the metadata; permutations are confined within each stratum (default strata = NULL). |
adj.name |
the name(s) for the variable(s) to be adjusted. Multiple variables are allowed.
They could be numeric or categorical; they should be variables in the metadata. |
feature.dat.type |
the type of the feature data. It could be "count", "proportion" or "other". For the "proportion" data type, posterior sampling will not be performed, but the reference-based ratio approach will still be used to address compositional effects. For the "other" data type, neither posterior sampling nor the reference-based ratio approach will be used. |
is.winsor |
a logical value indicating whether winsorization should be performed to replace outliers. The default is TRUE. |
outlier.pct |
the expected percentage of outliers. These outliers will be winsorized. The default is 0.03. For count/proportion data,
|
winsor.end |
a character indicating whether the outliers at the "top", "bottom" or "both" will be winsorized.
The default is "top". If the |
is.post.sample |
a logical value indicating whether to perform posterior sampling of the underlying proportions. Only relevant when the feature data are counts. |
post.sample.no |
the number of posterior samples if posterior sampling is used. The default is 25. |
perm.no |
the number of permutations. If the raw p-values are of major interest, set a larger number (e.g., 999 or more); the default is 99. |
link.func |
a list of transformation functions for the feature data or the ratios. Based on our experience, square-root transformation is a robust choice for many datasets. |
ref.pct |
percentage of reference taxa. The default is 0.5. |
stage.no |
the number of stages if multiple-stage normalization is used. The default is 6. |
excl.pct |
the maximum percentage of significant features (nominal p-value < 0.05) in the reference set that should be removed. Only relevant when multiple-stage normalization is used. |
expand |
logical, if TRUE create all combinations of input parameters
(default |
A named list containing the set of parameters for DA_ZicoSeq
method.
# Set some basic combinations of parameters for ZicoSeq base_ZicoSeq <- set_ZicoSeq(contrast = c("group", "B", "A"), feature.dat.type = "count", winsor.end = "top") many_ZicoSeq <- set_ZicoSeq(contrast = c("group", "B", "A"), feature.dat.type = "count", outlier.pct = c(0.03, 0.05), winsor.end = "top", is.post.sample = c(TRUE, FALSE))
Set the methods and parameters to compute normalization/scaling factors.
setNormalizations( fun = c("norm_edgeR", "norm_DESeq2", "norm_CSS"), method = c("TMM", "poscounts", "CSS") )
fun |
a character with the name of normalization function (e.g. "norm_edgeR", "norm_DESeq2", "norm_CSS"...). |
method |
a character with the normalization method (e.g.
"TMM", "upperquartile"... if the |
a list object containing the normalization methods and their parameters.
runNormalizations, norm_edgeR, norm_DESeq2, norm_CSS, norm_TSS
# Set a TMM normalization my_TMM_normalization <- setNormalizations(fun = "norm_edgeR", method = "TMM") # Set some simple normalizations my_normalizations <- setNormalizations() # Add a custom normalization my_normalizations <- c(my_normalizations, myNormMethod1 = list("myNormMethod", "parameter1", "parameter2"))
Computes the observational weights of the counts under a zero-inflated negative binomial (ZINB) model. For each count, the ZINB distribution is parametrized by three parameters: the mean value and the dispersion of the negative binomial distribution, and the probability of the zero component.
weights_ZINB( object, assay_name = "counts", design, K = 0, commondispersion = TRUE, zeroinflation = TRUE, verbose = FALSE, ... )
object |
a phyloseq or TreeSummarizedExperiment object. |
assay_name |
the name of the assay to extract from the
TreeSummarizedExperiment object (default |
design |
character name of the metadata columns, formula, or design matrix with rows corresponding to samples and columns to coefficients to be estimated (the user needs to explicitly include the intercept in the design). |
K |
integer. Number of latent factors. |
commondispersion |
Whether or not a single dispersion for all features is estimated (default TRUE). |
zeroinflation |
Whether or not a ZINB model should be fitted. If FALSE, a negative binomial model is fitted instead. |
verbose |
Print helpful messages. |
... |
Additional parameters to describe the model, see
the zinbwave::zinbFit help page. |
A matrix of weights.
zinbFit for zero-inflated negative binomial parameters' estimation and computeObservationalWeights for weights extraction.
set.seed(1) # Create a very simple phyloseq object counts <- matrix(rnbinom(n = 60, size = 3, prob = 0.5), nrow = 10, ncol = 6) metadata <- data.frame("Sample" = c("S1", "S2", "S3", "S4", "S5", "S6")) ps <- phyloseq::phyloseq(phyloseq::otu_table(counts, taxa_are_rows = TRUE), phyloseq::sample_data(metadata)) # Calculate the ZINB weights zinbweights <- weights_ZINB(object = ps, K = 0, design = "~ 1")
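The resulting matrix is meant to be passed through the weights argument of runDA(), runMocks(), or runSplits(), where it is used by the method instances created with weights_logical = TRUE. A minimal sketch follows; the "group" variable, the TMM factors, and the assumption that runDA() accepts a weights argument (as runMocks() does) are added here only for illustration.
# Use the ZINB weights with a weighted limma instance (sketch)
phyloseq::sample_data(ps)[, "group"] <- as.factor(c("A", "A", "A", "B", "B", "B"))
ps <- norm_edgeR(object = ps, method = "TMM")
w_limma <- set_limma(design = ~ group, coef = 2, norm = "TMM", weights_logical = TRUE)
res_w <- runDA(method_list = w_limma, object = ps, weights = zinbweights)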