Title: | R implementation of the LEfSE method for microbiome biomarker discovery |
---|---|
Description: | lefser is the R implementation of the popular microbiome biomarker discovery too, LEfSe. It uses the Kruskal-Wallis test, Wilcoxon-Rank Sum test, and Linear Discriminant Analysis to find biomarkers from two-level classes (and optional sub-classes). |
Authors: | Sehyun Oh [cre, ctb] , Asya Khleborodova [aut], Samuel Gamboa-Tuz [ctb], Marcel Ramos [ctb] , Ludwig Geistlinger [ctb] , Levi Waldron [ctb] |
Maintainer: | Sehyun Oh <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.17.1 |
Built: | 2024-10-31 03:38:23 UTC |
Source: | https://github.com/bioc/lefser |
A terminal node in a taxonomy does not have any child nodes. For example, a species is a terminal node if there are no subspecies or strains that belong to that species. This function identifies which elements of a vector are terminal nodes simply by checking whether that element appears as a substring in any other element of the vector.
get_terminal_nodes(string)
get_terminal_nodes(string)
string |
A character vector of strings to check for terminal nodes |
A logical vector indicating which elements of the string are terminal nodes
# What does it do? data("zeller14") rownames(zeller14)[988:989] get_terminal_nodes(rownames(zeller14)[988:989]) # How do I use it to keep only terminal nodes for a lefser analysis? terminal_nodes <- get_terminal_nodes(rownames(zeller14)) zeller14sub <- zeller14[terminal_nodes, ] # Then continue with your analysis!
# What does it do? data("zeller14") rownames(zeller14)[988:989] get_terminal_nodes(rownames(zeller14)[988:989]) # How do I use it to keep only terminal nodes for a lefser analysis? terminal_nodes <- get_terminal_nodes(rownames(zeller14)) zeller14sub <- zeller14[terminal_nodes, ] # Then continue with your analysis!
Perform a LEfSe analysis: the function carries out differential analysis between two sample classes for multiple features and uses linear discriminant analysis to establish their effect sizes. Subclass information for each class can be incorporated into the analysis (see examples). Features with large differences between two sample classes are identified as biomarkers.
lefser( relab, kruskal.threshold = 0.05, wilcox.threshold = 0.05, lda.threshold = 2, classCol = "CLASS", subclassCol = NULL, assay = 1L, trim.names = FALSE, checkAbundances = TRUE, method = "none", ..., groupCol, blockCol )
lefser( relab, kruskal.threshold = 0.05, wilcox.threshold = 0.05, lda.threshold = 2, classCol = "CLASS", subclassCol = NULL, assay = 1L, trim.names = FALSE, checkAbundances = TRUE, method = "none", ..., groupCol, blockCol )
relab |
A SummarizedExperiment with relative abundances in the assay |
kruskal.threshold |
numeric(1) The p-value for the Kruskal-Wallis Rank Sum Test (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values. |
wilcox.threshold |
numeric(1) The p-value for the Wilcoxon Rank-Sum Test when 'subclassCol' is present (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values. |
lda.threshold |
numeric(1) The effect size threshold (default 2.0). |
classCol |
character(1) Column name in |
subclassCol |
character(1) Optional column name in |
assay |
The i-th assay matrix in the |
trim.names |
Default is |
checkAbundances |
|
method |
Default is "none" as in the original LEfSe implementation. Character string of length one, passed on to p.adjust to set option for multiple testing. For multiple pairwise comparisons, each comparison is adjusted separately. Options are "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" (synonym for "BH"), and "none". |
groupCol |
(DEFUNCT) Column name in |
blockCol |
(DEFUNCT) Optional column name in |
... |
Additional inputs to lower level functions (not used). |
The LEfSe method expects relative abundances in the expr
input. A warning
will be emitted if the column sums do not result in 1. Use the relativeAb
helper function to convert the data in the SummarizedExperiment
to relative
abundances. The checkAbundances
argument enables checking the data
for presence of relative abundances and can be turned off by setting the
argument to FALSE
.
The function returns a data.frame
with two columns, which are
names of features and their LDA scores.
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_class <- lefser(zeller14tn_ra, classCol = "study_condition") # (2) Using classes and sub-classes res_subclass <- lefser(zeller14tn_ra, classCol = "study_condition", subclassCol = "age_category")
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_class <- lefser(zeller14tn_ra, classCol = "study_condition") # (2) Using classes and sub-classes res_subclass <- lefser(zeller14tn_ra, classCol = "study_condition", subclassCol = "age_category")
lefesrCaldes
Agglomerates the features abundance at different
taxonomic ranks using mia::splitByRanks
and performs lefser at
each rank. The analysis is run at the species, genus, family, order,
class, and phylum levels.
lefserClades(relab, ...)
lefserClades(relab, ...)
relab |
A (Tree) SummarizedExperiment with full taxonomy in the rowData. |
... |
Arguments passed to the |
When running lefserClades
, all features with NAs in the rowData will
be dropped. This is to avoid creating artificial clades with NAs.
An object of class 'lefser_df_clades', "lefser_df", and 'data.frame'.
data("zeller14") z14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(z14)) z14tn <- z14[tn, ] z14tn_ra <- relativeAb(z14tn) z14_input <- rowNames2RowData(z14tn_ra) resCl <- lefserClades(relab = z14_input, classCol = "study_condition")
data("zeller14") z14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(z14)) z14tn <- z14[tn, ] z14tn_ra <- relativeAb(z14tn) z14_input <- rowNames2RowData(z14tn_ra) resCl <- lefserClades(relab = z14_input, classCol = "study_condition")
lefser
functionThis function plots the biomarkers found by LEfSe, that are ranked according to their effect sizes and linked to their abundance in each class.
lefserPlot( df, colors = "c", trim.names = TRUE, title = "", label.font.size = 3 )
lefserPlot( df, colors = "c", trim.names = TRUE, title = "", label.font.size = 3 )
df |
Data frame produced by |
colors |
Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names. |
trim.names |
Under the default ( |
title |
A character(1). The title of the plot. |
label.font.size |
A numeric(1). The font size of the feature labels.
The default is |
Function returns plot of effect size scores produced by lefser
.
Positive scores represent the biomarker is more abundant in class '1'.
Negative scores represent the biomarker is more abundant in class '0'.
example("lefser") lefserPlot(res_class)
example("lefser") lefserPlot(res_class)
lefserPlotClad
plots a cladogram from the results of
lefserClades
.
lefserPlotClad(df, colors = "c", showTipLabels = FALSE, showNodeLabels = "p")
lefserPlotClad(df, colors = "c", showTipLabels = FALSE, showNodeLabels = "p")
df |
An object of class "lefesr_df_clades". |
colors |
Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names. |
showTipLabels |
Logical. If TRUE, show tip labels. Default is FALSE. |
showNodeLabels |
Label's to be shown in the tree. Options: "p" = phylum, "c" = class, "o" = order, "f" = family, "g" = genus, "s" = species, "t" = strain. It can accept several options, e.g., c("p", "c"). |
A ggtree object.
data("zeller14") z14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(z14)) z14tn <- z14[tn, ] z14tn_ra <- relativeAb(z14tn) z14_input <- rowNames2RowData(z14tn_ra) resCl <- lefserClades(relab = z14_input, classCol = "study_condition") ggt <- lefserPlotClad(df = resCl)
data("zeller14") z14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(z14)) z14tn <- z14[tn, ] z14tn_ra <- relativeAb(z14tn) z14_input <- rowNames2RowData(z14tn_ra) resCl <- lefserClades(relab = z14_input, classCol = "study_condition") ggt <- lefserPlotClad(df = resCl)
lefserPlotFeat
plots the abundance data of a DA feature across all
samples.
lefserPlotFeat(res, fName, colors = "colorblind")
lefserPlotFeat(res, fName, colors = "colorblind")
res |
An object of class lefser_df,
output of the |
fName |
A character string. The name of a feature in the lefser_df object. |
colors |
Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names. |
The solid lines represent the mean by class or by class+subclass (if the subclass variable is present). The dashed lines represent the median by class or by class+subclass (if the subclass variable is present).
A ggplot object.
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_class <- lefser(zeller14tn_ra, classCol = "study_condition") # (2) Using classes and sub-classes res_subclass <- lefser(zeller14tn_ra, classCol = "study_condition", subclassCol = "age_category") plot_class <- lefserPlotFeat(res_class, res_class$features[[1]]) plot_subclass <- lefserPlotFeat(res_subclass, res_subclass$features[[2]])
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_class <- lefser(zeller14tn_ra, classCol = "study_condition") # (2) Using classes and sub-classes res_subclass <- lefser(zeller14tn_ra, classCol = "study_condition", subclassCol = "age_category") plot_class <- lefserPlotFeat(res_class, res_class$features[[1]]) plot_subclass <- lefserPlotFeat(res_subclass, res_subclass$features[[2]])
The function calculates the column totals and divides each value within the column by the respective column total.
This function calculates the relative abundance of each feature in the SummarizedExperiment object containing count data, expressed as counts per million (CPM)
relativeAb(se, assay = 1L)
relativeAb(se, assay = 1L)
se |
A SummarizedExperiment object with counts |
assay |
The i-th assay matrix in the |
returns a new SummarizedExperiment object with counts per million calculated and added as a new assay named rel_abs.
se <- SummarizedExperiment( assays = list( counts = matrix( rep(1, 4), ncol = 1, dimnames = list(LETTERS[1:4], "SAMP") ) ) ) assay(se) assay(relativeAb(se))
se <- SummarizedExperiment( assays = list( counts = matrix( rep(1, 4), ncol = 1, dimnames = list(LETTERS[1:4], "SAMP") ) ) ) assay(se) assay(relativeAb(se))
rowNames2RowData
transforms the taxonomy stored in the row names to
the rowData in a SummarizedExperiment.
rowNames2RowData(x)
rowNames2RowData(x)
x |
A SummarizedExperiment with the features taxonomy in the rownames. |
The same SummarizedExpriment with the taxonomy now in the rowData.
data("zeller14") ## Keep only "CRC" and "control" (dichotomous variable) z14 <- zeller14[, zeller14$study_condition %in% c("control", "CRC")] ## Get terminal nodes tn <- get_terminal_nodes(rownames(z14)) z14_tn <- z14[tn, ] ## Normalize to relative abundance (also known as Total Sum Scaling) z14_tn_ra <- relativeAb(z14_tn) ## Add the taxonomy to the rowData input_se <- rowNames2RowData(z14_tn_ra)
data("zeller14") ## Keep only "CRC" and "control" (dichotomous variable) z14 <- zeller14[, zeller14$study_condition %in% c("control", "CRC")] ## Get terminal nodes tn <- get_terminal_nodes(rownames(z14)) z14_tn <- z14[tn, ] ## Normalize to relative abundance (also known as Total Sum Scaling) z14_tn_ra <- relativeAb(z14_tn) ## Add the taxonomy to the rowData input_se <- rowNames2RowData(z14_tn_ra)
The ZellerG_2014 dataset contains microbiome count data for CRC patients and controls. It was for curatedMetagenomicData using the script in the package directory "data-raw".
data("zeller14")
data("zeller14")
A SummarizedExperiment with 1585 features, 199 samples
adenoma, control, CRC
adult, senior
https://pubmed.ncbi.nlm.nih.gov/25432777/