Package 'lefser'

Title: R implementation of the LEfSe method for microbiome biomarker discovery
Description: lefser is the R implementation of the popular microbiome biomarker discovery tool, LEfSe. It uses the Kruskal-Wallis test, Wilcoxon Rank-Sum test, and Linear Discriminant Analysis to find biomarkers from two-level classes (and optional sub-classes).
Authors: Sehyun Oh [cre, ctb], Asya Khleborodova [aut], Samuel Gamboa-Tuz [ctb], Marcel Ramos [ctb], Ludwig Geistlinger [ctb], Levi Waldron [ctb]
Maintainer: Sehyun Oh <[email protected]>
License: Artistic-2.0
Version: 1.17.1
Built: 2024-10-31 03:38:23 UTC
Source: https://github.com/bioc/lefser

Help Index


Identify which elements of a string are terminal nodes

Description

A terminal node in a taxonomy does not have any child nodes. For example, a species is a terminal node if there are no subspecies or strains that belong to that species. This function identifies which elements of a vector are terminal nodes simply by checking whether that element appears as a substring in any other element of the vector.
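
As a rough illustration of the substring check described above, the following base R sketch flags an element as terminal when no other element of the vector contains it. The helper name `is_terminal_sketch` and the toy taxonomy strings are made up for this example; it is not the package's own implementation, which may differ in detail.

```r
# Hypothetical helper, for illustration only (not lefser's own code).
# An element is treated as terminal when it is not a substring of
# any other element of the vector.
is_terminal_sketch <- function(x) {
  vapply(seq_along(x), function(i) {
    !any(grepl(x[i], x[-i], fixed = TRUE))
  }, logical(1))
}

# Toy taxonomy strings (assumed pipe-delimited, MetaPhlAn-style format)
taxa <- c(
  "k__Bacteria",
  "k__Bacteria|p__Firmicutes",
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Archaea"
)
terminal <- is_terminal_sketch(taxa)
taxa[terminal]
```

In this toy input only the last two strings are flagged as terminal, because the first two each appear inside a longer string.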

Usage

get_terminal_nodes(string)

Arguments

string

A character vector of strings to check for terminal nodes

Value

A logical vector indicating which elements of the string are terminal nodes

Examples

# What does it do?
data("zeller14")
rownames(zeller14)[988:989]
get_terminal_nodes(rownames(zeller14)[988:989])
# How do I use it to keep only terminal nodes for a lefser analysis?
terminal_nodes <- get_terminal_nodes(rownames(zeller14))
zeller14sub <- zeller14[terminal_nodes, ]
# Then continue with your analysis!

R implementation of the LEfSe method

Description

Perform a LEfSe analysis: the function carries out differential analysis between two sample classes for multiple features and uses linear discriminant analysis to establish their effect sizes. Subclass information for each class can be incorporated into the analysis (see examples). Features with large differences between two sample classes are identified as biomarkers.
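
The first step of the analysis described above, a per-feature test for a difference between two classes, can be sketched in base R with `kruskal.test`. The matrix, feature names, and class labels below are invented toy data, and the sketch covers only the screening idea; it does not reproduce the sub-class comparison or the linear discriminant analysis that lefser performs.

```r
# Toy abundance matrix: 2 features (rows) x 20 samples (columns).
# All names and values are made up for illustration.
class_labels <- factor(rep(c("control", "case"), each = 10))
abund <- rbind(
  featA = c(1:10, 21:30),                          # clearly shifted between classes
  featB = c(seq(1, 19, by = 2), seq(2, 20, by = 2)) # interleaved, no clear shift
)

# Kruskal-Wallis p-value for each feature (row)
kw_p <- apply(abund, 1, function(v) kruskal.test(v, class_labels)$p.value)

# Keep features whose p-value falls below the threshold
kruskal_threshold <- 0.05
kept <- names(kw_p)[kw_p < kruskal_threshold]
kept
```

Only `featA` passes the screen here, since its values are completely separated between the two classes.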

Usage

lefser(
  relab,
  kruskal.threshold = 0.05,
  wilcox.threshold = 0.05,
  lda.threshold = 2,
  classCol = "CLASS",
  subclassCol = NULL,
  assay = 1L,
  trim.names = FALSE,
  checkAbundances = TRUE,
  method = "none",
  ...,
  groupCol,
  blockCol
)

Arguments

relab

A SummarizedExperiment with relative abundances in the assay

kruskal.threshold

numeric(1) The p-value threshold for the Kruskal-Wallis Rank Sum Test (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values.

wilcox.threshold

numeric(1) The p-value threshold for the Wilcoxon Rank-Sum Test when 'subclassCol' is present (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values.

lda.threshold

numeric(1) The effect size threshold (default 2.0).

classCol

character(1) Column name in colData(relab) indicating class, usually a factor with two levels (e.g., c("cases", "controls"); default "CLASS").

subclassCol

character(1) Optional column name in colData(relab) indicating the subclasses, usually a factor with two levels (e.g., c("adult", "senior"); default NULL), but can be more than two levels.

assay

The i-th assay matrix in the SummarizedExperiment ('relab'; default 1).

trim.names

Default is FALSE. If TRUE, this function extracts the most specific taxonomic rank of each organism.

checkAbundances

logical(1) Whether to check if the assay data in the relab input are relative abundances or counts. If counts are found, a warning will be emitted (default TRUE).

method

Default is "none" as in the original LEfSe implementation. Character string of length one, passed on to p.adjust to set option for multiple testing. For multiple pairwise comparisons, each comparison is adjusted separately. Options are "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" (synonym for "BH"), and "none".

groupCol

(DEFUNCT) Column name in colData(relab) indicating groups, usually a factor with two levels (e.g., c("cases", "controls"); default "GROUP").

blockCol

(DEFUNCT) Optional column name in colData(relab) indicating the blocks, usually a factor with two levels (e.g., c("adult", "senior"); default NULL).

...

Additional inputs to lower level functions (not used).
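
To illustrate how the method argument interacts with the p-value thresholds, the sketch below applies `p.adjust` to a small set of invented raw p-values and compares which features pass a 0.05 cutoff before and after correction. The feature names and p-values are hypothetical.

```r
# Hypothetical raw p-values for four features
raw_p <- c(featA = 0.001, featB = 0.02, featC = 0.04, featD = 0.30)
threshold <- 0.05

# method = "none" leaves the p-values unchanged
none_p <- p.adjust(raw_p, method = "none")
pass_none <- names(none_p)[none_p < threshold]

# method = "BH" applies the Benjamini-Hochberg correction
bh_p <- p.adjust(raw_p, method = "BH")
pass_bh <- names(bh_p)[bh_p < threshold]

pass_none
pass_bh
```

With these values, `featC` passes the uncorrected cutoff but not the BH-corrected one, which is why the threshold should be read as applying to corrected p-values whenever a correction method is selected.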

Details

The LEfSe method expects relative abundances in the relab input. A warning will be emitted if the column sums do not result in 1. Use the relativeAb helper function to convert the data in the SummarizedExperiment to relative abundances. The checkAbundances argument enables checking the data for presence of relative abundances and can be turned off by setting the argument to FALSE.

Value

The function returns a data.frame with two columns, which are names of features and their LDA scores.

Examples

data(zeller14)
zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
tn <- get_terminal_nodes(rownames(zeller14))
zeller14tn <- zeller14[tn,]
zeller14tn_ra <- relativeAb(zeller14tn)

# (1) Using classes only
res_class <- lefser(zeller14tn_ra,
                    classCol = "study_condition")
# (2) Using classes and sub-classes
res_subclass <- lefser(zeller14tn_ra,
                    classCol = "study_condition",
                    subclassCol = "age_category")

Run lefser at different clades

Description

lefserClades agglomerates the feature abundances at different taxonomic ranks using mia::splitByRanks and performs lefser at each rank. The analysis is run at the species, genus, family, order, class, and phylum levels.

Usage

lefserClades(relab, ...)

Arguments

relab

A (Tree) SummarizedExperiment with full taxonomy in the rowData.

...

Arguments passed to the lefser function.

Details

When running lefserClades, all features with NAs in the rowData will be dropped. This is to avoid creating artificial clades with NAs.
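
The effect of dropping features with missing taxonomy can be sketched with a plain data frame standing in for the rowData. The column names and values below are hypothetical, and `complete.cases` is used only to show the idea; the package's internal filtering may be implemented differently.

```r
# Toy taxonomy table standing in for rowData (names are made up)
tax <- data.frame(
  phylum = c("Firmicutes", "Firmicutes", "Bacteroidetes"),
  genus = c("Lactobacillus", NA, "Bacteroides"),
  row.names = c("featA", "featB", "featC")
)

# Keep only features with no NA at any rank
keep <- complete.cases(tax)
tax_complete <- tax[keep, , drop = FALSE]
rownames(tax_complete)
```

Here `featB` is removed because its genus is missing, so it cannot contribute to an artificial "NA" clade at the genus level.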

Value

An object of class 'lefser_df_clades', 'lefser_df', and 'data.frame'.

Examples

data("zeller14")
z14 <- zeller14[, zeller14$study_condition != "adenoma"]
tn <- get_terminal_nodes(rownames(z14))
z14tn <- z14[tn, ]
z14tn_ra <- relativeAb(z14tn)
z14_input <- rowNames2RowData(z14tn_ra)

resCl <- lefserClades(relab = z14_input, classCol = "study_condition")

Plots results from lefser function

Description

This function plots the biomarkers found by LEfSe, ranked by their effect sizes and linked to their abundance in each class.

Usage

lefserPlot(
  df,
  colors = "c",
  trim.names = TRUE,
  title = "",
  label.font.size = 3
)

Arguments

df

Data frame produced by lefser. This data frame contains two columns labeled as c("features", "scores").

colors

Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names.

trim.names

Under the default (TRUE), this function extracts the most specific taxonomic rank of each organism.

title

A character(1). The title of the plot.

label.font.size

A numeric(1). The font size of the feature labels. The default is 3.
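
As an illustration of what trimming a feature name to its most specific rank can look like, the sketch below strips everything up to the last pipe character. It assumes pipe-delimited taxonomy strings, and the example names are invented; the package's own trimming logic is not shown in this manual and may differ.

```r
# Hypothetical full-length feature names (pipe-delimited taxonomy)
full_names <- c(
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Bacteroidetes"
)

# Remove everything up to and including the last "|"
trimmed <- sub("^.*\\|", "", full_names)
trimmed
```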

Value

The function returns a plot of the effect size scores produced by lefser. Positive scores indicate that the biomarker is more abundant in class '1'; negative scores indicate that it is more abundant in class '0'.
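
The sign convention can be read off a results table directly. The sketch below uses an invented data frame with the two documented columns, features and scores, and adds a helper column (`enriched_in`, a name made up for this example) before ordering the rows by score.

```r
# Toy result table with the two documented columns (values are made up)
res_toy <- data.frame(
  features = c("featA", "featB", "featC"),
  scores = c(3.2, -2.5, 2.1)
)

# Positive score: more abundant in class 1; negative: class 0
res_toy$enriched_in <- ifelse(res_toy$scores > 0, "class 1", "class 0")

# Order by effect size, as a ranked bar plot would
res_toy <- res_toy[order(res_toy$scores), ]
res_toy
```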

Examples

example("lefser")
lefserPlot(res_class)

LEfSer plot cladogram

Description

lefserPlotClad plots a cladogram from the results of lefserClades.

Usage

lefserPlotClad(df, colors = "c", showTipLabels = FALSE, showNodeLabels = "p")

Arguments

df

An object of class "lefser_df_clades".

colors

Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names.

showTipLabels

Logical. If TRUE, show tip labels. Default is FALSE.

showNodeLabels

Labels to be shown in the tree. Options: "p" = phylum, "c" = class, "o" = order, "f" = family, "g" = genus, "s" = species, "t" = strain. It can accept several options, e.g., c("p", "c").

Value

A ggtree object.

Examples

data("zeller14")
z14 <- zeller14[, zeller14$study_condition != "adenoma"]
tn <- get_terminal_nodes(rownames(z14))
z14tn <- z14[tn, ]
z14tn_ra <- relativeAb(z14tn)
z14_input <- rowNames2RowData(z14tn_ra)

resCl <- lefserClades(relab = z14_input, classCol = "study_condition")
ggt <- lefserPlotClad(df = resCl)

Plot Feature

Description

lefserPlotFeat plots the abundance data of a differentially abundant (DA) feature across all samples.

Usage

lefserPlotFeat(res, fName, colors = "colorblind")

Arguments

res

An object of class lefser_df, output of the lefser function.

fName

A character string. The name of a feature in the lefser_df object.

colors

Colors corresponding to class 0 and 1. Options: "c" (colorblind), "l" (lefse), "g" (greyscale). Defaults to "c". This argument also accepts a character(2) with two color names.

Details

The solid lines represent the mean by class or by class+subclass (if the subclass variable is present). The dashed lines represent the median by class or by class+subclass (if the subclass variable is present).
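
The per-class summaries that the solid and dashed lines represent can be computed by hand for a single feature. The sketch below uses invented abundance values and class labels with base R `tapply`; it only shows the summary statistics, not the plot itself.

```r
# Toy abundances of one feature across six samples (values are made up)
toy <- data.frame(
  abundance = c(1, 2, 3, 10, 20, 60),
  class = rep(c("control", "CRC"), each = 3)
)

# Mean (solid line) and median (dashed line) per class
class_means <- tapply(toy$abundance, toy$class, mean)
class_medians <- tapply(toy$abundance, toy$class, median)

class_means
class_medians
```

In the CRC group the mean (30) sits well above the median (20) because of one large value, which is the kind of skew that showing both lines helps reveal.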

Value

A ggplot object.

Examples

data(zeller14)
zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
tn <- get_terminal_nodes(rownames(zeller14))
zeller14tn <- zeller14[tn,]
zeller14tn_ra <- relativeAb(zeller14tn)

# (1) Using classes only
res_class <- lefser(zeller14tn_ra,
                    classCol = "study_condition")
# (2) Using classes and sub-classes
res_subclass <- lefser(zeller14tn_ra,
                    classCol = "study_condition",
                    subclassCol = "age_category")
plot_class <- lefserPlotFeat(res_class, res_class$features[[1]])
plot_subclass <- lefserPlotFeat(res_subclass, res_subclass$features[[2]])

Utility function to calculate relative abundances

Description

The function calculates the column totals and divides each value within the column by the respective column total.

This function calculates the relative abundance of each feature in the SummarizedExperiment object containing count data, expressed as counts per million (CPM).
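
The calculation described above can be sketched on a plain matrix with base R. The count matrix and its dimnames are invented for this example, and the sketch works on a bare matrix rather than a SummarizedExperiment.

```r
# Toy count matrix: 3 features (rows) x 2 samples (columns)
counts <- matrix(
  c(10, 30, 60,
    5, 5, 10),
  nrow = 3,
  dimnames = list(c("featA", "featB", "featC"), c("S1", "S2"))
)

# Divide each value by its column total
prop <- sweep(counts, 2, colSums(counts), "/")

# Express as counts per million
cpm <- prop * 1e6

colSums(prop)
colSums(cpm)
```

Each column of `prop` sums to 1 and each column of `cpm` sums to one million, regardless of the original sequencing depth of the sample.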

Usage

relativeAb(se, assay = 1L)

Arguments

se

A SummarizedExperiment object with counts

assay

The i-th assay matrix in the SummarizedExperiment ('se'; default 1).

Value

Returns a new SummarizedExperiment object with counts per million calculated and added as a new assay named rel_abs.

Examples

se <- SummarizedExperiment(
    assays = list(
        counts = matrix(
            rep(1, 4), ncol = 1, dimnames = list(LETTERS[1:4], "SAMP")
        )
    )
)
assay(se)
assay(relativeAb(se))

RowNames to RowData

Description

rowNames2RowData transforms the taxonomy stored in the row names to the rowData in a SummarizedExperiment.
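
The general idea of turning taxonomy strings into one column per rank can be sketched in base R. The example assumes pipe-delimited names with rank prefixes such as `k__` and the same depth for every feature; the column names chosen here are assumptions, since the columns that rowNames2RowData actually produces are not listed in this manual.

```r
# Hypothetical row names with the same number of ranks each
tax <- c(
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Bacteroidetes|c__Bacteroidia"
)

# Split on the literal "|" and bind the pieces into a table
parts <- strsplit(tax, "|", fixed = TRUE)
tax_table <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(tax_table) <- c("kingdom", "phylum", "class")  # assumed names

# Strip the rank prefixes ("k__", "p__", ...)
tax_table[] <- lapply(tax_table, function(col) sub("^[a-z]__", "", col))
tax_table
```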

Usage

rowNames2RowData(x)

Arguments

x

A SummarizedExperiment with the feature taxonomy in the rownames.

Value

The same SummarizedExperiment with the taxonomy now in the rowData.

Examples

data("zeller14")

## Keep only "CRC" and "control" (dichotomous variable)
z14 <- zeller14[, zeller14$study_condition %in% c("control", "CRC")]

## Get terminal nodes
tn <- get_terminal_nodes(rownames(z14))
z14_tn <- z14[tn, ]

## Normalize to relative abundance (also known as Total Sum Scaling)
z14_tn_ra <- relativeAb(z14_tn)

## Add the taxonomy to the rowData
input_se <- rowNames2RowData(z14_tn_ra)

Example dataset for lefser

Description

The ZellerG_2014 dataset contains microbiome count data for CRC patients and controls. It was obtained from curatedMetagenomicData using the script in the package directory "data-raw".

Usage

data("zeller14")

Format

A SummarizedExperiment with 1585 features, 199 samples

study_condition

adenoma, control, CRC

age_category

adult, senior

Source

https://pubmed.ncbi.nlm.nih.gov/25432777/