Title: | R implementation of the LEfSE method for microbiome biomarker discovery |
---|---|
Description: | lefser is an implementation in R of the popular "LDA Effect Size (LEfSe)" method for microbiome biomarker discovery. It uses the Kruskal-Wallis test, Wilcoxon-Rank Sum test, and Linear Discriminant Analysis to find biomarkers of groups and sub-groups. |
Authors: | Sehyun Oh [cre, ctb] , Asya Khleborodova [aut], Ludwig Geistlinger [ctb] , Marcel Ramos [ctb] , Samuel Gamboa-Tuz [ctb], Levi Waldron [ctb] |
Maintainer: | Sehyun Oh <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.4 |
Built: | 2024-07-18 02:46:45 UTC |
Source: | https://github.com/bioc/lefser |
A terminal node in a taxonomy does not have any child nodes. For example, a species is a terminal node if there are no subspecies or strains that belong to that species. This function identifies which elements of a vector are terminal nodes simply by checking whether that element appears as a substring in any other element of the vector.
get_terminal_nodes(string)
get_terminal_nodes(string)
string |
A character vector of strings to check for terminal nodes |
A logical vector indicating which elements of the string are terminal nodes
# What does it do? data("zeller14") rownames(zeller14)[988:989] get_terminal_nodes(rownames(zeller14)[988:989]) # How do I use it to keep only terminal nodes for a lefser analysis? terminal_nodes <- get_terminal_nodes(rownames(zeller14)) zeller14sub <- zeller14[terminal_nodes, ] # Then continue with your analysis!
# What does it do? data("zeller14") rownames(zeller14)[988:989] get_terminal_nodes(rownames(zeller14)[988:989]) # How do I use it to keep only terminal nodes for a lefser analysis? terminal_nodes <- get_terminal_nodes(rownames(zeller14)) zeller14sub <- zeller14[terminal_nodes, ] # Then continue with your analysis!
Perform a LEfSe analysis: the function carries out differential analysis between two sample groups for multiple features and uses linear discriminant analysis to establish their effect sizes. Subclass information for each class can be incorporated into the analysis (see examples). Features with large differences between two sample groups are identified as biomarkers.
lefser( relab, kruskal.threshold = 0.05, wilcox.threshold = 0.05, lda.threshold = 2, groupCol = "GROUP", blockCol = NULL, assay = 1L, trim.names = FALSE, checkAbundances = TRUE, method = "none", ..., expr )
lefser( relab, kruskal.threshold = 0.05, wilcox.threshold = 0.05, lda.threshold = 2, groupCol = "GROUP", blockCol = NULL, assay = 1L, trim.names = FALSE, checkAbundances = TRUE, method = "none", ..., expr )
relab |
A SummarizedExperiment with relative abundances in the assay |
kruskal.threshold |
numeric(1) The p-value for the Kruskal-Wallis Rank Sum Test (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values. |
wilcox.threshold |
numeric(1) The p-value for the Wilcoxon Rank-Sum Test when 'blockCol' is present (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values. |
lda.threshold |
numeric(1) The effect size threshold (default 2.0). |
groupCol |
character(1) Column name in |
blockCol |
character(1) Optional column name in |
assay |
The i-th assay matrix in the |
trim.names |
Default is |
checkAbundances |
|
method |
Default is "none" as in the original LEfSe implementation. Character string of length one, passed on to p.adjust to set option for multiple testing. For multiple pairwise comparisons, each comparison is adjusted separately. Options are "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" (synonym for "BH"), and "none". |
expr |
( |
... |
Additional inputs to lower level functions (not used). |
The LEfSe method expects relative abundances in the expr
input. A warning
will be emitted if the column sums do not result in 1. Use the relativeAb
helper function to convert the data in the SummarizedExperiment
to relative
abundances. The checkAbundances
argument enables checking the data
for presence of relative abundances and can be turned off by setting the
argument to FALSE
.
The function returns a data.frame
with two columns, which are
names of features and their LDA scores.
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_group <- lefser(zeller14tn_ra, groupCol = "study_condition") # (2) Using classes and sub-classes res_block <- lefser(zeller14tn_ra, groupCol = "study_condition", blockCol = "age_category")
data(zeller14) zeller14 <- zeller14[, zeller14$study_condition != "adenoma"] tn <- get_terminal_nodes(rownames(zeller14)) zeller14tn <- zeller14[tn,] zeller14tn_ra <- relativeAb(zeller14tn) # (1) Using classes only res_group <- lefser(zeller14tn_ra, groupCol = "study_condition") # (2) Using classes and sub-classes res_block <- lefser(zeller14tn_ra, groupCol = "study_condition", blockCol = "age_category")
lefser
functionThis function plots the biomarkers found by LEfSe, that are ranked according to their effect sizes and linked to their abundance in each class.
lefserPlot( df, colors = c("red", "forestgreen"), trim.names = TRUE, title = "", label.font.size = 3 )
lefserPlot( df, colors = c("red", "forestgreen"), trim.names = TRUE, title = "", label.font.size = 3 )
df |
Data frame produced by |
colors |
A character(2). Colors corresponding to class 0 and 1.
Defaults to |
trim.names |
Under the default ( |
title |
A character(1). The title of the plot. |
label.font.size |
A numeric(1). The font size of the feature labels.
The default is |
Function returns plot of effect size scores produced by lefser
.
Positive scores represent the biomarker is more abundant in class '1'.
Negative scores represent the biomarker is more abundant in class '0'.
example("lefser") lefserPlot(res_group)
example("lefser") lefserPlot(res_group)
The function calculates the column totals and divides each value within the column by the respective column total.
This function calculates the relative abundance of each feature in the SummarizedExperiment object containing count data, expressed as counts per million (CPM)
relativeAb(se, assay = 1L)
relativeAb(se, assay = 1L)
se |
A SummarizedExperiment object with counts |
assay |
The i-th assay matrix in the |
returns a new SummarizedExperiment object with counts per million calculated and added as a new assay named rel_abs.
se <- SummarizedExperiment( assays = list( counts = matrix( rep(1, 4), ncol = 1, dimnames = list(LETTERS[1:4], "SAMP") ) ) ) assay(se) assay(relativeAb(se))
se <- SummarizedExperiment( assays = list( counts = matrix( rep(1, 4), ncol = 1, dimnames = list(LETTERS[1:4], "SAMP") ) ) ) assay(se) assay(relativeAb(se))
The ZellerG_2014 dataset contains microbiome count data for CRC patients and controls. It was for curatedMetagenomicData using the script in the package directory "data-raw".
data("zeller14")
data("zeller14")
A SummarizedExperiment with 1585 features, 199 samples
adenoma, control, CRC
adult, senior
https://pubmed.ncbi.nlm.nih.gov/25432777/