Title: | Microbiome Analytics |
---|---|
Description: | Utilities for microbiome analysis. |
Authors: | Leo Lahti [aut, cre] , Sudarshan Shetty [aut] |
Maintainer: | Leo Lahti <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 1.29.0 |
Built: | 2024-12-19 03:23:18 UTC |
Source: | https://github.com/bioc/microbiome |
Brief summary of the microbiome package
Package: | microbiome |
Type: | Package |
Version: | See sessionInfo() or DESCRIPTION file |
Date: | 2014-2017 |
License: | FreeBSD |
LazyLoad: | yes |
R package for microbiome studies
Leo Lahti et al. [email protected]
See citation('microbiome') http://microbiome.github.io
citation('microbiome')
citation('microbiome')
Retrieves the taxon abundance table from phyloseq-class object and ensures it is systematically returned as taxa x samples matrix.
abundances(x, transform = "identity")
abundances(x, transform = "identity")
x |
|
transform |
Transformation to apply. The options include: 'compositional' (ie relative abundance), 'Z', 'log10', 'log10p', 'hellinger', 'identity', 'clr', 'alr', or any method from the vegan::decostand function. |
Abundance matrix (OTU x samples).
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) a <- abundances(dietswap) # b <- abundances(dietswap, transform='compositional')
data(dietswap) a <- abundances(dietswap) # b <- abundances(dietswap, transform='compositional')
best_hist
to a phyloseq-class
ObjectAdd the lowest classification for an OTU or ASV.
add_besthit(x, sep = ":")
add_besthit(x, sep = ":")
x |
|
sep |
separator e.g. ASV161:Roseburia |
Most commonly it is observed that taxa names are either OTU ids or
ASV ids. In such cases it is useful to know the taxonomic identity.
For this purpose, best_hist
identifies the best available
taxonomic identity and adds it to the OTU ids or ASV ids. If genus
and species columns are present in input the function internally
combines the names.
phyloseq-class
object phyloseq-class
Contact: Sudarshan A. Shetty [email protected]
## Not run: # Example data library(microbiome) data(dietswap) p0.f <- add_besthit(atlas1006, sep=":") ## End(Not run)
## Not run: # Example data library(microbiome) data(dietswap) p0.f <- add_besthit(atlas1006, sep=":") ## End(Not run)
refseq
Slot for dada2
based phyloseq
ObjectUtility to add refseq slot for dada2
based
phyloseq
Object. Here, the taxa_names which are unique
sequences, are stored in refseq
slot of phyloseq
.
Sequence ids are converted to ids using tag option.
add_refseq(x, tag = "ASV")
add_refseq(x, tag = "ASV")
x |
|
tag |
Provide name for Ids, Default="ASV". |
phyloseq-class
object
Contact: Sudarshan A. Shetty [email protected]
# ps <- add_refseq(p0,tag="ASV") # ps
# ps <- add_refseq(p0,tag="ASV") # ps
Combining rare taxa.
aggregate_rare(x, level, detection, prevalence, include.lowest = FALSE, ...)
aggregate_rare(x, level, detection, prevalence, include.lowest = FALSE, ...)
x |
|
level |
Summarization level (from |
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
... |
Arguments to pass. |
phyloseq-class
object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) s <- aggregate_rare(dietswap, level = 'Phylum', detection = 0.1/100, prevalence = 5/100)
data(dietswap) s <- aggregate_rare(dietswap, level = 'Phylum', detection = 0.1/100, prevalence = 5/100)
Summarize phyloseq data into a higher phylogenetic level.
aggregate_taxa(x, level, verbose = FALSE)
aggregate_taxa(x, level, verbose = FALSE)
x |
|
level |
Summarization level (from |
verbose |
verbose |
This provides a convenient way to aggregate phyloseq OTUs (or other taxa) when the phylogenetic tree is missing. Calculates the sum of OTU abundances over all OTUs that map to the same higher-level group. Removes ambiguous levels from the taxonomy table. Returns a phyloseq object with the summarized abundances.
Summarized phyloseq object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) s <- aggregate_taxa(dietswap, 'Phylum')
data(dietswap) s <- aggregate_taxa(dietswap, 'Phylum')
Global indicators of the ecoystem state, including richness, evenness, diversity, and other indicators
alpha(x, index = "all", zeroes = TRUE)
alpha(x, index = "all", zeroes = TRUE)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
index |
Default is ‘NULL’, meaning that all available indices will be included. For specific options, see details. |
zeroes |
Include zero counts in the diversity estimation. |
This function returns various indices of the ecosystem state.
The function is named alpha (global in some previous versions of this
package) as these indices can be viewed as measures of
alpha diversity. The function uses default choices for detection,
prevalence and other parameters for
simplicity and standardization. See the individual functions for more
options. All indicators from the richness, diversity, evenness,
dominance, and rarity functions are available. Some additional measures,
such as Chao1 and ACE are available via estimate_richness
function in the phyloseq package but not included here.
The index names are given the prefix richness_, evenness_, diversity_,
dominance_, or rarity_ in the output table to avoid confusion between
similarly named but different indices (e.g. Simpson diversity and Simpson
dominance). All parameters are set to their default. To experiment with
different parameterizations, see the more specific index functions
(richness, diversity, evenness, dominance, rarity).
A data.frame of samples x alpha diversity indicators
Contact: Leo Lahti [email protected]
See citation('microbiome')
dominance, rarity, phyloseq::estimate_richness
data(dietswap) d <- alpha(dietswap, index='shannon') # d <- alpha(dietswap, index='all')
data(dietswap) d <- alpha(dietswap, index='shannon') # d <- alpha(dietswap, index='all')
Cross-correlate columns of the input matrices.
associate( x, y = NULL, method = "spearman", p.adj.threshold = Inf, cth = NULL, order = FALSE, n.signif = 0, mode = "table", p.adj.method = "fdr", verbose = FALSE, filter.self.correlations = FALSE )
associate( x, y = NULL, method = "spearman", p.adj.threshold = Inf, cth = NULL, order = FALSE, n.signif = 0, mode = "table", p.adj.method = "fdr", verbose = FALSE, filter.self.correlations = FALSE )
x |
matrix (samples x features if annotation matrix) |
y |
matrix (samples x features if cross-correlated with annotations) |
method |
association method ('pearson', or 'spearman' for continuous) |
p.adj.threshold |
q-value threshold to include features |
cth |
correlation threshold to include features |
order |
order the results |
n.signif |
mininum number of significant correlations for each element |
mode |
Specify output format ('table' or 'matrix') |
p.adj.method |
p-value multiple testing correction method. One of the methods in p.adjust function ('BH' and others; see help(p.adjust)). Default: 'fdr' |
verbose |
verbose |
filter.self.correlations |
Filter out correlations between identical items. |
The p-values in the output table depend on the method. For the spearman and pearson correlation values, the p-values are provided by the default method in the cor.test function.
List with cor, pval, pval.adjusted
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(peerj32) d1 <- peerj32$microbes[1:20, 1:10] d2 <- peerj32$lipids[1:20,1:10] cc <- associate(d1, d2, method='pearson')
data(peerj32) d1 <- peerj32$microbes[1:20, 1:10] d2 <- peerj32$lipids[1:20,1:10] cc <- associate(d1, d2, method='pearson')
This data set contains genus-level microbiota profiling with HITChip for 1006 western adults with no reported health complications, reported in Lahti et al. (2014) https://doi.org/10.1038/ncomms5344.
data(atlas1006)
data(atlas1006)
The data set in phyloseq-class
format.
The data is also available for download from the Data Dryad http://doi.org/10.5061/dryad.pk75d.
Loads the data set in R.
Leo Lahti [email protected]
Lahti et al. Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344, 2014. To cite the microbiome R package, see citation('microbiome')
Identify and select the baseline timepoint samples in a
phyloseq
object.
baseline(x, na.omit = TRUE)
baseline(x, na.omit = TRUE)
x |
phyloseq object. Assuming that the sample_data(x) has the fields 'time', 'sample' and 'subject' |
na.omit |
Logical. Ignore samples with no time point information. If this is FALSE, the first sample for each subject is selected even when there is no time information. |
Arranges the samples by time and picks the first sample for each subject. Compared to simple subsetting at time point zero, this checks NAs and possibility for multiple samples at the baseline, and guarantees that a single sample per subject is selected.
Phyloseq object with only baseline time point samples selected.
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(peerj32) a <- baseline(peerj32$phyloseq)
data(peerj32) a <- baseline(peerj32$phyloseq)
Estimate bimodality scores.
bimodality( x, method = "potential_analysis", peak.threshold = 1, bw.adjust = 1, bs.iter = 100, min.density = 1, verbose = TRUE )
bimodality( x, method = "potential_analysis", peak.threshold = 1, bw.adjust = 1, bs.iter = 100, min.density = 1, verbose = TRUE )
x |
A vector, matrix, or a phyloseq object |
method |
bimodality quantification method ('potential_analysis', 'Sarle.finite.sample', or 'Sarle.asymptotic'). If method='all', then a data.frame with all scores is returned. |
peak.threshold |
Mode detection threshold |
bw.adjust |
Bandwidth adjustment |
bs.iter |
Bootstrap iterations |
min.density |
minimum accepted density for a maximum; as a multiple of kernel height |
verbose |
Verbose |
Sarle.finite.sample Coefficient of bimodality for finite sample. See SAS 2012.
Sarle.asymptotic Coefficient of bimodality, used and described in Shade et al. (2014) and Ellison AM (1987).
potential_analysis Repeats potential analysis (Livina et al. 2010) multiple times with bootstrap sampling for each row of the input data (as in Lahti et al. 2014) and returns the bootstrap score.
The coefficient lies in (0, 1).
The 'Sarle.asymptotic' version is defined as
. This is coefficient of bimodality from Ellison AM Am. J. Bot. 1987, for microbiome analysis it has been used for instance in Shade et al. 2014. The formula for 'Sarle.finite.sample' (SAS 2012):
where n is sample size and
In both formulas, is sample skewness and
is the kth
standardized moment (also called the sample kurtosis, or
excess kurtosis).
A list with following elements:
scoreFraction of bootstrap samples where multiple modes are observed
nmodesThe most frequently observed number of modes in bootstrap sampling results.
resultsFull results of potential_analysis for each row of the input matrix.
Leo Lahti [email protected]
Livina et al. (2010). Potential analysis reveals changing number of climate states during the last 60 kyr. Climate of the Past, 6, 77-82.
Lahti et al. (2014). Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344.
Shade et al. mBio 5(4):e01371-14, 2014.
AM Ellison, Am. J. Bot 74:1280-8, 1987.
SAS Institute Inc. (2012). SAS/STAT 12.1 user's guide. Cary, NC.
To cite the microbiome R package, see citation('microbiome')
A classical test of multimodality is provided by dip.test
in the DIP package.
# In practice, use more bootstrap iterations b <- bimodality(c(rnorm(100, mean=0), rnorm(100, mean=5)), method = "Sarle.finite.sample", bs.iter=5) # The classical DIP test: # quantifies unimodality. Values range between 0 to 1. # dip.test(x, simulate.p.value=TRUE, B=200)$statistic # Values less than 0.05 indicate significant deviation from unimodality. # Therefore, to obtain an increasing multimodality score, use # library(diptest) # multimodality.dip <- apply(abundances(pseq), 1, # function (x) {1 - unname(dip.test(x)$p.value)})
# In practice, use more bootstrap iterations b <- bimodality(c(rnorm(100, mean=0), rnorm(100, mean=5)), method = "Sarle.finite.sample", bs.iter=5) # The classical DIP test: # quantifies unimodality. Values range between 0 to 1. # dip.test(x, simulate.p.value=TRUE, B=200)$statistic # Values less than 0.05 indicate significant deviation from unimodality. # Therefore, to obtain an increasing multimodality score, use # library(diptest) # multimodality.dip <- apply(abundances(pseq), 1, # function (x) {1 - unname(dip.test(x)$p.value)})
Sarle's bimodality coefficient.
bimodality_sarle(x, bs.iter = 1, type = "Sarle.finite.sample")
bimodality_sarle(x, bs.iter = 1, type = "Sarle.finite.sample")
x |
Data vector for which bimodality will be quantified |
bs.iter |
Bootstrap iterations |
type |
Score type ('Sarle.finite.sample' or 'Sarle.asymptotic') |
The coefficient lies in (0, 1).
The 'Sarle.asymptotic' version is defined as
. This is coefficient of bimodality from Ellison AM Am. J. Bot. 1987, for microbiome analysis it has been used for instance in Shade et al. 2014.
The formula for 'Sarle.finite.sample' (SAS 2012):
where n is sample size and
In both formulas, is sample skewness and
is the kth
standardized moment (also called the sample kurtosis, or
excess kurtosis).
Bimodality score
Contact: Leo Lahti [email protected]
Shade et al. mBio 5(4):e01371-14, 2014.
Ellison AM (1987) Am J Botany 74(8):1280-1288.
SAS Institute Inc. (2012). SAS/STAT 12.1 user's guide. Cary, NC.
To cite the microbiome R package, see citation('microbiome')
Check the dip.test from the DIP package for a classical test of multimodality.
# b <- bimodality_sarle(rnorm(50), type='Sarle.finite.sample')
# b <- bimodality_sarle(rnorm(50), type='Sarle.finite.sample')
Plot phyloseq abundances.
boxplot_abundance( d, x, y, line = NULL, violin = FALSE, na.rm = FALSE, show.points = TRUE )
boxplot_abundance( d, x, y, line = NULL, violin = FALSE, na.rm = FALSE, show.points = TRUE )
d |
|
x |
Metadata variable to map to the horizontal axis. |
y |
OTU to map on the vertical axis |
line |
The variable to map on lines |
violin |
Use violin version of the boxplot |
na.rm |
Remove NAs |
show.points |
Include data points in the figure |
The directionality of change in paired boxplot is indicated by the colors of the connecting lines.
A ggplot
plot object
data(peerj32) p <- boxplot_abundance(peerj32$phyloseq, x='time', y='Akkermansia', line='subject')
data(peerj32) p <- boxplot_abundance(peerj32$phyloseq, x='time', y='Akkermansia', line='subject')
Plot alpha index.
boxplot_alpha( x, x_var = NULL, index = NULL, violin = FALSE, na.rm = FALSE, show.points = TRUE, zeroes = TRUE, element.alpha = 0.5, element.width = 0.2, fill.colors = NA, outlier.fill = "grey50" )
boxplot_alpha( x, x_var = NULL, index = NULL, violin = FALSE, na.rm = FALSE, show.points = TRUE, zeroes = TRUE, element.alpha = 0.5, element.width = 0.2, fill.colors = NA, outlier.fill = "grey50" )
x |
|
x_var |
Metadata variable to map to the horizontal axis. |
index |
Alpha index to plot. See function |
violin |
Use violin version of the boxplot |
na.rm |
Remove NAs |
show.points |
Include data points in the figure |
zeroes |
Include zero counts in diversity estimation. Default is TRUE |
element.alpha |
Alpha value for plot elements. Controls the transparency of plots elements. |
element.width |
Width value for plot elements. Controls the transparency of plots elements. |
fill.colors |
Specify a list of colors passed on to ggplot2
|
outlier.fill |
If using boxplot and and points together how to deal with outliers. See ggplot2 outlier.fill argument in geom_ elements. |
A simple wrapper to visualize alpha diversity index.
A ggplot
plot object
data("dietswap") p <- boxplot_alpha(dietswap, x_var = "sex", index="observed", violin=FALSE, na.rm=FALSE, show.points=TRUE, zeroes=TRUE, element.alpha=0.5, element.width=0.2, fill.colors= c("steelblue", "firebrick"), outlier.fill="white") p
data("dietswap") p <- boxplot_alpha(dietswap, x_var = "sex", index="observed", violin=FALSE, na.rm=FALSE, show.points=TRUE, zeroes=TRUE, element.alpha=0.5, element.width=0.2, fill.colors= c("steelblue", "firebrick"), outlier.fill="white") p
Arrange correlation matrices from associate into a table format.
cmat2table(res, verbose = FALSE)
cmat2table(res, verbose = FALSE)
res |
Output from associate |
verbose |
verbose |
Correlation table
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(peerj32) d1 <- peerj32$microbes[1:20, 1:10] d2 <- peerj32$lipids[1:20,1:10] cc <- associate(d1, d2, mode='matrix', method='pearson') cmat <- associate(d1, d2, mode='table', method='spearman')
data(peerj32) d1 <- peerj32$microbes[1:20, 1:10] d2 <- peerj32$lipids[1:20,1:10] cc <- associate(d1, d2, mode='matrix', method='pearson') cmat <- associate(d1, d2, mode='table', method='spearman')
Collapse samples, mostly meant for technical replicates.
collapse_replicates( x, method = "sample", replicate_id = NULL, replicate_fields = NULL )
collapse_replicates( x, method = "sample", replicate_id = NULL, replicate_fields = NULL )
x |
|
method |
Collapsing method. Only random sampling ("sample") implemented. |
replicate_id |
Replicate identifier. A character vector. |
replicate_fields |
Metadata fields used to determine replicates. |
Collapsed phyloseq object.
Contact: Leo Lahti [email protected]
To cite the microbiome R package, see citation('microbiome')
data(atlas1006) pseq <- collapse_replicates(atlas1006, method = "sample", replicate_fields = c("subject", "time"))
data(atlas1006) pseq <- collapse_replicates(atlas1006, method = "sample", replicate_fields = c("subject", "time"))
Filter the phyloseq object to include only prevalent taxa.
core(x, detection, prevalence, include.lowest = FALSE, ...)
core(x, detection, prevalence, include.lowest = FALSE, ...)
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
... |
Arguments to pass. |
Filtered phyloseq object including only prevalent taxa
Contact: Leo Lahti [email protected]
Salonen A, Salojarvi J, Lahti L, de Vos WM. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16-20, 2012 To cite the microbiome R package, see citation('microbiome')
core_members, rare_members
data(dietswap) # Detection threshold 0 (strictly greater by default); # Prevalence threshold 50 percent (strictly greater by default) pseq <- core(dietswap, 0, 50/100) # Detection threshold 0 (strictly greater by default); # Prevalence threshold exactly 100 percent; for this set # include.lowest=TRUE, otherwise the required prevalence is # strictly greater than 100 pseq <- core(dietswap, 0, 100/100, include.lowest = TRUE)
data(dietswap) # Detection threshold 0 (strictly greater by default); # Prevalence threshold 50 percent (strictly greater by default) pseq <- core(dietswap, 0, 50/100) # Detection threshold 0 (strictly greater by default); # Prevalence threshold exactly 100 percent; for this set # include.lowest=TRUE, otherwise the required prevalence is # strictly greater than 100 pseq <- core(dietswap, 0, 100/100, include.lowest = TRUE)
Calculates the community core abundance index.
core_abundance( x, detection = 0.1/100, prevalence = 50/100, include.lowest = FALSE )
core_abundance( x, detection = 0.1/100, prevalence = 50/100, include.lowest = FALSE )
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
The core abundance index gives the relative proportion of the core species (in [0,1]). The core taxa are defined as those that exceed the given population prevalence threshold at the given detection level.
A vector of core abundance indices
Contact: Leo Lahti [email protected]
rarity
data(dietswap) d <- core_abundance(dietswap, detection=0.1/100, prevalence=50/100)
data(dietswap) d <- core_abundance(dietswap, detection=0.1/100, prevalence=50/100)
Core heatmap.
core_heatmap(x, dets, cols, min.prev, taxa.order)
core_heatmap(x, dets, cols, min.prev, taxa.order)
x |
OTU matrix |
dets |
A vector or a scalar indicating the number of intervals in (0, log10(max(data))). The dets are calculated for relative abundancies. |
cols |
colours for the heatmap |
min.prev |
If minimum prevalence is set, then filter out those rows (taxa) and columns (dets) that never exceed this prevalence. This helps to zoom in on the actual core region of the heatmap. |
taxa.order |
Ordering of the taxa. |
Used for its side effects
Contact: Leo Lahti [email protected]
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the microbiome R package, see citation('microbiome')
Creates the core matrix.
core_matrix(x, prevalences = seq(0.1, 1, , 1), detections = NULL)
core_matrix(x, prevalences = seq(0.1, 1, , 1), detections = NULL)
x |
|
prevalences |
a vector of prevalence percentages in [0,1] |
detections |
a vector of intensities around the data range |
Estimated core microbiota
Contact: Jarkko Salojarvi [email protected]
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the microbiome R package, see citation('microbiome')
# Not exported #data(peerj32) #core <- core_matrix(peerj32$phyloseq)
# Not exported #data(peerj32) #core <- core_matrix(peerj32$phyloseq)
Determine members of the core microbiota with given abundance and prevalences
core_members(x, detection = 1/100, prevalence = 50/100, include.lowest = FALSE)
core_members(x, detection = 1/100, prevalence = 50/100, include.lowest = FALSE)
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
For phyloseq object, lists taxa that are more prevalent with the given detection threshold. For matrix, lists columns that satisfy these criteria.
Vector of core members
Contact: Leo Lahti [email protected]
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the microbiome R package, see citation('microbiome')
data(dietswap) # Detection threshold 1 (strictly greater by default); # Note that the data (dietswap) is here in absolute counts # (and not compositional, relative abundances) # Prevalence threshold 50 percent (strictly greater by default) a <- core_members(dietswap, 1, 50/100)
data(dietswap) # Detection threshold 1 (strictly greater by default); # Note that the data (dietswap) is here in absolute counts # (and not compositional, relative abundances) # Prevalence threshold 50 percent (strictly greater by default) a <- core_members(dietswap, 1, 50/100)
Community coverage index.
coverage(x, threshold = 0.5)
coverage(x, threshold = 0.5)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
threshold |
Indicates the fraction of the ecosystem to be occupied by the N most abundant species (N is returned by this function). If the detection argument is a vector, then a data.frame is returned, one column for each detection threshold. |
The coverage index gives the number of groups needed to have a given proportion of the ecosystem occupied (by default 0.5 ie 50
A vector of coverage indices
Contact: Leo Lahti [email protected]
dominance, alpha
data(dietswap) d <- coverage(dietswap, threshold=0.5)
data(dietswap) d <- coverage(dietswap, threshold=0.5)
Default colors for different variables.
default_colors(x, v = NULL)
default_colors(x, v = NULL)
x |
Name of the variable type ("Phylum") |
v |
Optional. Vector of elements to color. |
Named character vector of default colors
Leo Lahti [email protected]
See citation("microbiome")
col <- default_colors("Phylum")
col <- default_colors("Phylum")
Density visualization for data points overlaid on cross-plot.
densityplot( x, main = NULL, x.ticks = 10, rounding = 0, add.points = TRUE, col = "black", adjust = 1, size = 1, legend = FALSE, shading = TRUE, shading.low = "white", shading.high = "black", point.opacity = 0.75 )
densityplot( x, main = NULL, x.ticks = 10, rounding = 0, add.points = TRUE, col = "black", adjust = 1, size = 1, legend = FALSE, shading = TRUE, shading.low = "white", shading.high = "black", point.opacity = 0.75 )
x |
Data matrix to plot. The first two columns will be visualized as a cross-plot. |
main |
title text |
x.ticks |
Number of ticks on the X axis |
rounding |
Rounding for X axis tick values |
add.points |
Plot the data points as well |
col |
Color of the data points. NAs are marked with darkgray. |
adjust |
Kernel width adjustment |
size |
point size |
legend |
plot legend TRUE/FALSE |
shading |
Shading |
shading.low |
Color for shading low density regions |
shading.high |
Color for shading high density regions |
point.opacity |
Transparency-level for points |
ggplot2 object
Contact: Leo Lahti [email protected]
See citation('microbiome')
# p <- densityplot(cbind(rnorm(100), rnorm(100)))
# p <- densityplot(cbind(rnorm(100), rnorm(100)))
The diet swap data set represents a study with African and African American groups undergoing a two-week diet swap. For details, see dx.doi.org/10.1038/ncomms7342.
data(dietswap)
data(dietswap)
The data set in phyloseq-class
format.
The data is also available for download from the Data Dryad repository http://datadryad.org/resource/doi:10.5061/dryad.1mn1n.
Loads the data set in R.
Leo Lahti [email protected]
O'Keefe et al. Nature Communications 6:6342, 2015. dx.doi.org/10.1038/ncomms7342 To cite the microbiome R package, see citation('microbiome')
Quantify microbiota divergence (heterogeneity) within a given sample set with respect to a reference.
divergence(x, y, method = "bray")
divergence(x, y, method = "bray")
x |
phyloseq object or a vector |
y |
Reference sample. A vector. |
method |
dissimilarity method: any method available via phyloseq::distance function. Note that some methods ("jsd" and 'unifrac' for instance) do not work with the group divergence. |
Microbiota divergence (heterogeneity / spread) within a given sample set can be quantified by the average sample dissimilarity or beta diversity with respect to a given reference sample.
This measure is sensitive to sample size. Subsampling or bootstrapping can be applied to equalize sample sizes between comparisons.
Vector with dissimilarities; one for each sample, quantifying the dissimilarity of the sample from the reference sample.
Leo Lahti [email protected]
To cite this R package, see citation('microbiome')
the vegdist function from the vegan package provides many standard beta diversity measures
# Assess beta diversity among the African samples # in a diet swap study (see \code{help(dietswap)} for references) data(dietswap) pseq <- subset_samples(dietswap, nationality == 'AFR') reference <- apply(abundances(pseq), 1, median) b <- divergence(pseq, reference, method = "bray")
# Assess beta diversity among the African samples # in a diet swap study (see \code{help(dietswap)} for references) data(dietswap) pseq <- subset_samples(dietswap, nationality == 'AFR') reference <- apply(abundances(pseq), 1, median) b <- divergence(pseq, reference, method = "bray")
Various community diversity indices.
diversity(x, index = "all", zeroes = TRUE)
diversity(x, index = "all", zeroes = TRUE)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
index |
Diversity index. See details for options. |
zeroes |
Include zero counts in the diversity estimation. |
By default, returns all diversity indices. The available diversity indices include the following:
inverse_simpson Inverse Simpson diversity: $1/lambda$ where $lambda=sum(p^2)$ and $p$ are relative abundances.
gini_simpson Gini-Simpson diversity $1 - lambda$. This is also called Gibbs–Martin, or Blau index in sociology, psychology and management studies.
shannon Shannon diversity ie entropy
fisher Fisher alpha; as implemented in the vegan package
coverage Number of species needed to cover 50% of the ecosystem. For other quantiles, apply the function coverage directly.
A vector of diversity indices
Contact: Leo Lahti [email protected]
Beisel J-N. et al. A Comparative Analysis of Diversity Index Sensitivity. Internal Rev. Hydrobiol. 88(1):3-15, 2003. URL: https://portais.ufg.br/up/202/o/2003-comparative_evennes_index.pdf
Bulla L. An index of diversity and its associated diversity measure. Oikos 70:167–171, 1994
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12.
Smith B and Wilson JB. A Consumer's Guide to Diversity Indices. Oikos 76(1):70-82, 1996.
dominance, richness, evenness, rarity, alpha
data(dietswap) d <- alpha(dietswap, 'shannon')
data(dietswap) d <- alpha(dietswap, 'shannon')
Calculates the community dominance index.
dominance(x, index = "all", rank = 1, relative = TRUE, aggregate = TRUE)
dominance(x, index = "all", rank = 1, relative = TRUE, aggregate = TRUE)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
index |
If the index is given, it will override the other parameters. See the details below for description and references of the standard dominance indices. By default, this function returns the Berger-Parker index, ie relative dominance at rank 1. |
rank |
Optional. The rank of the dominant taxa to consider. |
relative |
Use relative abundances (default: TRUE) |
aggregate |
Aggregate (TRUE; default) the top members or not. If aggregate=TRUE, then the sum of relative abundances is returned. Otherwise the relative abundance is returned for the single taxa with the indicated rank. |
The dominance index gives the abundance of the most abundant species. This has been used also in microbiomics context (Locey & Lennon (2016)). The following indices are provided:
'absolute' This is the most simple variant, giving the absolute abundance of the most abundant species (Magurran & McGill 2011). By default, this refers to the single most dominant species (rank=1) but it is possible to calculate the absolute dominance with rank n based on the abundances of top-n species by tuning the rank argument.
'relative' Relative abundance of the most abundant species. This is with rank=1 by default but can be calculated for other ranks.
'DBP' Berger–Parker index, a special case of relative dominance with rank 1; This also equals the inverse of true diversity of the infinite order.
'DMN' McNaughton’s dominance. This is the sum of the relative abundance of the two most abundant taxa, or a special case of relative dominance with rank 2
'simpson' Simpson's index ($sum(p^2)$) where p are relative abundances has an interpretation as a dominance measure. Also the version ($sum(q * (q-1)) / S(S-1)$) based on absolute abundances q has been proposed by Simpson (1949) but not included here as it is not within [0,1] range, and it is highly correlated with the simpler Simpson dominance. Finally, it is also possible to calculated dominances up to an arbitrary rank by setting the rank argument
'core_abundance' Relative proportion of the core species that exceed detection level 0.2% in over 50% of the samples
'gini' Gini index is calculated with the function
inequality
.
By setting aggregate=FALSE, the abundance for the single n'th most dominant taxa (n=rank) is returned instead the sum of abundances up to that rank (the default).
A vector of dominance indices
Contact: Leo Lahti [email protected]
Kenneth J. Locey and Jay T. Lennon. Scaling laws predict global microbial diversity. PNAS 2016 113 (21) 5970-5975; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12
coverage, core_abundance, rarity, alpha
data(dietswap) # vector d <- dominance(abundances(dietswap)[,1], rank=1, relative=TRUE) # matrix # d <- dominance(abundances(dietswap), rank=1, relative=TRUE) # Phyloseq object # d <- dominance(dietswap, rank=1, relative=TRUE)
data(dietswap) # vector d <- dominance(abundances(dietswap)[,1], rank=1, relative=TRUE) # matrix # d <- dominance(abundances(dietswap), rank=1, relative=TRUE) # Phyloseq object # d <- dominance(dietswap, rank=1, relative=TRUE)
Returns the dominant taxonomic group for each sample.
dominant(x, level = NULL)
dominant(x, level = NULL)
x |
A |
level |
Optional. Taxonomic level. |
A vector of dominance indices
Leo Lahti [email protected]
data(dietswap) # vector d <- dominant(dietswap)
data(dietswap) # vector d <- dominant(dietswap)
Various community evenness indices.
evenness(x, index = "all", zeroes = TRUE, detection = 0)
evenness(x, index = "all", zeroes = TRUE, detection = 0)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
index |
Evenness index. See details for options. |
zeroes |
Include zero counts in the evenness estimation. |
detection |
Detection threshold |
By default, Pielou's evenness is returned.
The available evenness indices include the following: 1) 'camargo': Camargo's evenness (Camargo 1992) 2) 'simpson': Simpson’s evenness (inverse Simpson diversity / S) 3) 'pielou': Pielou's evenness (Pielou, 1966), also known as Shannon or Shannon-Weaver/Wiener/Weiner evenness; H/ln(S). The Shannon-Weaver is the preferred term; see A tribute to Claude Shannon (1916 –2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon–Wiener’ Index. Spellerberg and Fedor. Alpha Ecology & Biogeography (2003) 12, 177–197 4) 'evar': Smith and Wilson’s Evar index (Smith & Wilson 1996) 5) 'bulla': Bulla’s index (O) (Bulla 1994)
Desirable statistical evenness metrics avoid strong bias towards very large or very small abundances; are independent of richness; and range within [0,1] with increasing evenness (Smith & Wilson 1996). Evenness metrics that fulfill these criteria include at least camargo, simpson, smith-wilson, and bulla. Also see Magurran & McGill (2011) and Beisel et al. (2003) for further details.
A vector of evenness indices
Contact: Leo Lahti [email protected]
Beisel J-N. et al. A Comparative Analysis of Evenness Index Sensitivity. Internal Rev. Hydrobiol. 88(1):3-15, 2003. URL: https://portais.ufg.br/up/202/o/2003-comparative_evennes_index.pdf
Bulla L. An index of evenness and its associated diversity measure. Oikos 70:167–171, 1994
Camargo, JA. New diversity index for assessing structural alterations in aquatic communities. Bull. Environ. Contam. Toxicol. 48:428–434, 1992.
Locey KJ and Lennon JT. Scaling laws predict global microbial diversity. PNAS 113(21):5970-5975, 2016; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12.
Pielou, EC. The measurement of diversity in different types of biological collections. Journal of Theoretical Biology 13:131–144, 1966.
Smith B and Wilson JB. A Consumer's Guide to Evenness Indices. Oikos 76(1):70-82, 1996.
coverage, core_abundance, rarity, alpha
data(dietswap) # phyloseq object #d <- evenness(dietswap, 'pielou') # matrix #d <- evenness(abundances(dietswap), 'pielou') # vector d <- evenness(abundances(dietswap)[,1], 'pielou')
data(dietswap) # phyloseq object #d <- evenness(dietswap, 'pielou') # matrix #d <- evenness(abundances(dietswap), 'pielou') # vector d <- evenness(abundances(dietswap)[,1], 'pielou')
Detect optima, excluding local optima below peak.threshold.
find_optima(f, peak.threshold = 0, bw = 1, min.density = 1)
find_optima(f, peak.threshold = 0, bw = 1, min.density = 1)
f |
density |
peak.threshold |
Mode detection threshold |
bw |
bandwidth |
min.density |
Minimun accepted density for a maximum; as a multiple of kernel height |
A list with min (minima), max (maxima), and peak.threshold (minimum detection density)
Leo Lahti [email protected]
See citation('microbiome')
# Not exported # o <- find_optima(rnorm(100), bw=1)
# Not exported # o <- find_optima(rnorm(100), bw=1)
Measure association between nominal (no order for levels) variables
gktau(x, y)
gktau(x, y)
x |
first variable |
y |
second variable |
Measure association between nominal (no order for levels) variables using Goodman and Kruskal tau. Code modified from the original source: r-bloggers.com/measuring-associations-between-non-numeric-variables/ An important feature of this procedure is that it allows missing values in either of the variables x or y, treating 'missing' as an additional level. In practice, this is sometimes very important since missing values in one variable may be strongly associated with either missing values in another variable or specific non-missing levels of that variable. An important characteristic of Goodman and Kruskal's tau measure is its asymmetry: because the variables x and y enter this expression differently, the value of a(y,x) is not the same as the value of a(x, y), in general. This stands in marked contrast to either the product-moment correlation coefficient or the Spearman rank correlation coefficient, which are both symmetric, giving the same association between x and y as that between y and x. The fundamental reason for the asymmetry of the general class of measures defined above is that they quantify the extent to which the variable x is useful in predicting y, which may be very different than the extent to which the variable y is useful in predicting x.
Dependency measure
Contact: Leo Lahti [email protected]
Code modified from the original source: http://r-bloggers.com/measuring-associations-between-non-numeric-variables/ To cite the microbiome R package, see citation('microbiome')
data(peerj32) v1 <- factor(peerj32$microbes[,1]) v2 <- factor(peerj32$meta$gender) tc <- gktau(v1, v2)
data(peerj32) v1 <- factor(peerj32$microbes[,1]) v2 <- factor(peerj32$meta$gender) tc <- gktau(v1, v2)
Cut age information to discrete factors.
group_age( x, breaks = "decades", n = 10, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 3, ordered_result = FALSE )
group_age( x, breaks = "decades", n = 10, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 3, ordered_result = FALSE )
x |
Numeric vector (age in years) |
breaks |
Class break points. Either a vector of breakpoints, or one of the predefined options ("years", "decades", "even"). |
n |
Number of groups for the breaks = "even" option. |
labels |
labels for the levels of the resulting category. By default,
labels are constructed using |
include.lowest |
logical, indicating if an ‘x[i]’ equal to
the lowest (or highest, for |
right |
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa. |
dig.lab |
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers. |
ordered_result |
logical: should the result be an ordered factor? |
Regarding the breaks arguments, the "even" option aims to cut the samples in groups with approximately the same size (by quantiles). The "years" and "decades" options are self-explanatory.
Factor of age groups.
Contact: Leo Lahti [email protected]
See citation('microbiome')
base::cut
data(atlas1006) age.numeric <- meta(atlas1006)$age age.factor <- group_age(age.numeric)
data(atlas1006) age.numeric <- meta(atlas1006)$age age.factor <- group_age(age.numeric)
Cut BMI information to standard discrete factors.
group_bmi( x, breaks = "standard", n = 10, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 3, ordered_result = FALSE )
group_bmi( x, breaks = "standard", n = 10, labels = NULL, include.lowest = TRUE, right = FALSE, dig.lab = 3, ordered_result = FALSE )
x |
Numeric vector (BMI) |
breaks |
Class break points. Either a vector of breakpoints, or one of the predefined options ("standard", "standard_truncated", "even"). |
n |
Number of groups for the breaks = "even" option. |
labels |
labels for the levels of the resulting category. By default,
labels are constructed using |
include.lowest |
logical, indicating if an ‘x[i]’ equal to
the lowest (or highest, for |
right |
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa. |
dig.lab |
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers. |
ordered_result |
logical: should the result be an ordered factor? |
Regarding the breaks arguments, the "even" option aims to cut the samples in groups with approximately the same size (by quantiles). The "standard" option corresponds to standard obesity categories defined by the cutoffs <18.5 (underweight); <25 (lean); <30 (obese); <35 (severe obese); <40 (morbid obese); <45 (super obese). The standard_truncated combines the severe, morbid and super obese into a single group.
Factor of BMI groups.
Contact: Leo Lahti [email protected]
See citation('microbiome')
base::cut
bmi.numeric <- range(rnorm(100, mean = 25, sd = 3)) bmi.factor <- group_bmi(bmi.numeric)
bmi.numeric <- range(rnorm(100, mean = 25, sd = 3)) bmi.factor <- group_bmi(bmi.numeric)
Visualizes n x m association table as heatmap.
heat( df, Xvar = names(df)[[1]], Yvar = names(df)[[2]], fill = names(df)[[3]], star = NULL, p.adj.threshold = 1, association.threshold = 0, step = 0.2, colours = c("darkblue", "blue", "white", "red", "darkred"), limits = NULL, legend.text = "", order.rows = TRUE, order.cols = TRUE, filter.significant = TRUE, star.size = NULL, plot.values = FALSE )
heat( df, Xvar = names(df)[[1]], Yvar = names(df)[[2]], fill = names(df)[[3]], star = NULL, p.adj.threshold = 1, association.threshold = 0, step = 0.2, colours = c("darkblue", "blue", "white", "red", "darkred"), limits = NULL, legend.text = "", order.rows = TRUE, order.cols = TRUE, filter.significant = TRUE, star.size = NULL, plot.values = FALSE )
df |
Data frame. Each row corresponds to a pair of associated variables. The columns give variable names, association scores and significance estimates. |
Xvar |
X axis variable column name. For instance 'X'. |
Yvar |
Y axis variable column name. For instance 'Y'. |
fill |
Column to be used for heatmap coloring. For instance 'association'. |
star |
Column to be used for cell highlighting. For instance 'p.adj'. |
p.adj.threshold |
Significance threshold for the stars. |
association.threshold |
Include only elements that have absolute association higher than this value |
step |
color interval |
colours |
heatmap colours |
limits |
colour scale limits |
legend.text |
legend text |
order.rows |
Order rows to enhance visualization interpretability. If this is logical, then hclust is applied. If this is a vector then the rows are ordered using this index. |
order.cols |
Order columns to enhance visualization interpretability. If this is logical, then hclust is applied. If this is a vector then the rows are ordered using this index. |
filter.significant |
Keep only the elements with at least one significant entry |
star.size |
NULL Determine size of the highlight symbols |
plot.values |
Show values as text |
ggplot2 object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(peerj32) d1 <- peerj32$lipids[, 1:10] d2 <- peerj32$microbes[, 1:10] cc <- associate(d1, d2, method='pearson') p <- heat(cc, 'X1', 'X2', 'Correlation', star='p.adj')
data(peerj32) d1 <- peerj32$lipids[, 1:10] d2 <- peerj32$microbes[, 1:10] cc <- associate(d1, d2, method='pearson') p <- heat(cc, 'X1', 'X2', 'Correlation', star='p.adj')
HITChip taxonomy table.
data(hitchip.taxonomy)
data(hitchip.taxonomy)
List with the element 'filtered', including a simplified version of the HITChip taxonomy.
Loads the data set in R.
Leo Lahti [email protected]
Lahti et al. Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344, 2014. To cite the microbiome R package, see citation('microbiome')
Coloured bimodality plot.
hotplot( x, taxon, tipping.point = NULL, lims = NULL, shift = 0.001, log10 = TRUE )
hotplot( x, taxon, tipping.point = NULL, lims = NULL, shift = 0.001, log10 = TRUE )
x |
|
taxon |
Taxonomic group to visualize. |
tipping.point |
Indicate critical point for abundance variations to be highlighted. |
lims |
Optional. Figure X axis limits. |
shift |
Small constant to avoid problems with zeroes in log10 |
log10 |
Use log10 abundances for the OTU table and tipping point |
ggplot
object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r') pseq <- transform(pseq, 'compositional') # Set a tipping point manually tipp <- .3/100 # .3 percent relative abundance # Bimodality is often best visible at log10 relative abundances p <- hotplot(pseq, 'Dialister', tipping.point=tipp, log10=TRUE)
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r') pseq <- transform(pseq, 'compositional') # Set a tipping point manually tipp <- .3/100 # .3 percent relative abundance # Bimodality is often best visible at log10 relative abundances p <- hotplot(pseq, 'Dialister', tipping.point=tipp, log10=TRUE)
Calculate Gini indices for a phyloseq object.
inequality(x)
inequality(x)
x |
|
Gini index is a common measure for relative inequality in economical income, but can also be used as a community diversity measure. Gini index is between [0,1], and increasing gini index implies increasing inequality.
A vector of Gini indices
Contact: Leo Lahti [email protected]
Relative Distribution Methods in the Social Sciences. Mark S. Handcock and Martina Morris, Springer-Verlag, Inc., New York, 1999. ISBN 0387987789.
diversity, reldist::gini (inspired by that implementation but independently written here to avoid external depedencies)
data(dietswap) d <- inequality(dietswap)
data(dietswap) d <- inequality(dietswap)
Quantify intermediate stability with respect to a given reference point.
intermediate_stability( x, reference.point = NULL, method = "correlation", output = "scores" )
intermediate_stability( x, reference.point = NULL, method = "correlation", output = "scores" )
x |
phyloseq object. Includes abundances (variables x samples) and sample_data data.frame (samples x features) with 'subject' and 'time' field for each sample. |
reference.point |
Calculate stability of the data w.r.t. this point. By default the intermediate range is used (min + (max - min)/2). If a vector of points is provided, then the scores will be calculated for every point and a data.frame is returned. |
method |
'lm' (linear model) or 'correlation'; the linear model takes time into account as a covariate |
output |
Specify the return mode. Either the 'full' set of stability analysis outputs, or the 'scores' of intermediate stability. |
Decomposes each column in x into differences between consecutive time points. For each variable and time point we calculate for the data values: (i) the distance from reference point; (ii) distance from the data value at the consecutive time point. The 'correlation' method calculates correlation between these two variables. Negative correlations indicate that values closer to reference point tend to have larger shifts in the consecutive time point. The 'lm' method takes the time lag between the consecutive time points into account as this may affect the comparison and is not taken into account by the straightforward correlation. Here the coefficients of the following linear model are used to assess stability: abs(change) ~ time + abs(start.reference.distance). Samples with missing data, and subjects with less than two time point are excluded. The absolute count data x is logarithmized before the analysis with the log10(1 + x) trick to circumvent logarithmization of zeroes.
A list with following elements: stability: estimated stability data: processed data set used in calculations
Leo Lahti [email protected]
data(atlas1006) x <- subset_samples(atlas1006, DNA_extraction_method == 'r') x <- prune_taxa(c('Akkermansia', 'Dialister'), x) res <- intermediate_stability(x, reference.point=NULL)
data(atlas1006) x <- subset_samples(atlas1006, DNA_extraction_method == 'r') x <- prune_taxa(c('Akkermansia', 'Dialister'), x) res <- intermediate_stability(x, reference.point=NULL)
Test if phyloseq object is compositional.
is_compositional(x, tolerance = 1e-06)
is_compositional(x, tolerance = 1e-06)
x |
|
tolerance |
Tolerance for detecting compositionality. |
This function tests that the sum of abundances within each sample is almost zero, within the tolerance of 1e-6 by default.
Logical TRUE/FALSE
transform
data(dietswap) a <- is_compositional(dietswap) b <- is_compositional(transform(dietswap, "identity")) c <- is_compositional(transform(dietswap, "compositional"))
data(dietswap) a <- is_compositional(dietswap) b <- is_compositional(transform(dietswap, "identity")) c <- is_compositional(transform(dietswap, "compositional"))
Calculates the community rarity index by log-modulo skewness.
log_modulo_skewness(x, q = 0.5, n = 50)
log_modulo_skewness(x, q = 0.5, n = 50)
x |
Abundance matrix (taxa x samples) with counts |
q |
Arithmetic abundance classes are evenly cut up to to this quantile of the data. The assumption is that abundances higher than this are not common, and they are classified in their own group. |
n |
The number of arithmetic abundance classes from zero to the quantile cutoff indicated by q. |
The rarity index characterizes the concentration of species at low abundance. Here, we use the skewness of the frequency distribution of arithmetic abundance classes (see Magurran & McGill 2011). These are typically right-skewed; to avoid taking log of occasional negative skews, we follow Locey & Lennon (2016) and use the log-modulo transformation that adds a value of one to each measure of skewness to allow logarithmization.
A vector of rarity indices
Contact: Leo Lahti [email protected]
Kenneth J. Locey and Jay T. Lennon. Scaling laws predict global microbial diversity. PNAS 2016 113 (21) 5970-5975; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12
core_abundance, low_abundance, alpha
data(dietswap) d <- log_modulo_skewness(dietswap)
data(dietswap) d <- log_modulo_skewness(dietswap)
Calculates the concentration of low-abundance taxa below the indicated detection threshold.
low_abundance(x, detection = 0.2/100)
low_abundance(x, detection = 0.2/100)
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
The low_abundance index gives the concentration of species at low abundance, or the relative proportion of rare species in [0,1]. The species that are below the indicated detection threshold are considered rare. Note that population prevalence is not considered. If the detection argument is a vector, then a data.frame is returned, one column for each detection threshold.
A vector of indicators.
Contact: Leo Lahti [email protected]
core_abundance, rarity, global
data(dietswap) d <- low_abundance(dietswap, detection=0.2/100)
data(dietswap) d <- low_abundance(dietswap, detection=0.2/100)
Map taxa between hierarchy levels.
map_levels(taxa = NULL, from, to, data)
map_levels(taxa = NULL, from, to, data)
taxa |
taxa to convert; if NULL then considering all taxa in the tax.table |
from |
convert from taxonomic level |
to |
convert to taxonomic level |
data |
Either a |
mappings
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) m <- map_levels('Akkermansia', from='Genus', to='Phylum', tax_table(dietswap)) m <- map_levels('Verrucomicrobia', from='Phylum', to='Genus', tax_table(dietswap))
data(dietswap) m <- map_levels('Akkermansia', from='Genus', to='Phylum', tax_table(dietswap)) m <- map_levels('Verrucomicrobia', from='Phylum', to='Genus', tax_table(dietswap))
Merge taxonomic groups into a single group.
merge_taxa2(x, taxa = NULL, pattern = NULL, name = "Merged")
merge_taxa2(x, taxa = NULL, pattern = NULL, name = "Merged")
x |
|
taxa |
A vector of taxa names to merge. |
pattern |
Taxa that match this pattern will be merged. |
name |
Name of the merged group. |
In some cases it is necessary to place certain OTUs or other groups into an "other" category. For instance, unclassified groups. This wrapper makes this easy. This function differs from phyloseq::merge_taxa by the last two arguments. Here, in merge_taxa2 the user can specify the name of the new merged group. And the merging can be done based on common pattern in the name.
Modified phyloseq object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) s <- merge_taxa(dietswap, c())
data(dietswap) s <- merge_taxa(dietswap, c())
The output of the phyloseq::sample_data() function does not return data.frame, which is needed for many applications. This function retrieves the sample data as a data.frame
meta(x)
meta(x)
x |
a phyloseq object |
Sample metadata as a data.frame
Leo Lahti [email protected]
sample_data
in the phyloseq package
data(dietswap); df <- meta(dietswap)
data(dietswap); df <- meta(dietswap)
Multimodality score based on bootstrapped potential analysis.
multimodality( x, peak.threshold = 1, bw.adjust = 1, bs.iter = 100, min.density = 1, verbose = TRUE )
multimodality( x, peak.threshold = 1, bw.adjust = 1, bs.iter = 100, min.density = 1, verbose = TRUE )
x |
A vector, or data matrix (variables x samples) |
peak.threshold |
Mode detection threshold |
bw.adjust |
Bandwidth adjustment |
bs.iter |
Bootstrap iterations |
min.density |
minimum accepted density for a maximum; as a multiple of kernel height |
verbose |
Verbose |
Repeats potential analysis (Livina et al. 2010) multiple times with bootstrap sampling for each row of the input data (as in Lahti et al. 2014) and returns the specified results.
A list with following elements:
scoreFraction of bootstrap samples with multiple observed modes
nmodesThe most frequently observed number of modes in bootstrap
resultsFull results of potential_analysis for each row of the input matrix.
Leo Lahti [email protected]
Livina et al. (2010). Potential analysis reveals changing number of climate states during the last 60 kyr. Climate of the Past, 6, 77-82.
Lahti et al. (2014). Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344.
#data(peerj32) #s <- multimodality(t(peerj32$microbes[, c('Akkermansia', 'Dialister')]))
#data(peerj32) #s <- multimodality(t(peerj32$microbes[, c('Akkermansia', 'Dialister')]))
Order matrix or phyloseq OTU table based on the neatmap approach.
neat( x, arrange = "both", method = "NMDS", distance = "bray", first.feature = NULL, first.sample = NULL, ... )
neat( x, arrange = "both", method = "NMDS", distance = "bray", first.feature = NULL, first.sample = NULL, ... )
x |
A matrix or phyloseq object. |
arrange |
Order 'features', 'samples' or 'both' (for matrices). For matrices, it is assumed that the samples are on the columns and features are on the rows. For phyloseq objects, features are the taxa of the OTU table. |
method |
Ordination method. Only NMDS implemented for now. |
distance |
Distance method. See |
first.feature |
Optionally provide the name of the first feature to start the ordering |
first.sample |
Optionally provide the name of the first sample to start the ordering |
... |
Arguments to pass. |
Borrows elements from the heatmap implementation in the phyloseq package. The row/column sorting is not available there as a separate function. Therefore I implemented this function to provide an independent method for easy sample/taxon reordering for phyloseq objects. The ordering is cyclic so we can start at any point. The choice of the first sample may somewhat affect the overall ordering
Sorted matrix
This function is partially based on code derived from the phyloseq package. However for the original neatmap approach for heatmap sorting, see (and cite): Rajaram, S., & Oono, Y. (2010). NeatMap–non-clustering heat map alternatives in R. BMC Bioinformatics, 11, 45.
data(peerj32) # Take subset to speed up example x <- peerj32$microbes[1:10,1:10] xo <- neat(x, 'both', method='NMDS', distance='bray')
data(peerj32) # Take subset to speed up example x <- peerj32$microbes[1:10,1:10] xo <- neat(x, 'both', method='NMDS', distance='bray')
Sort samples or features based on the neatmap approach.
neatsort(x, target, method = "NMDS", distance = "bray", first = NULL, ...)
neatsort(x, target, method = "NMDS", distance = "bray", first = NULL, ...)
x |
|
target |
For |
method |
Ordination method. See |
distance |
Distance method. See |
first |
Optionally provide the name of the first sample/taxon to start the ordering (the ordering is cyclic so we can start at any point). The choice of the first sample may somewhat affect the overall ordering. |
... |
Arguments to be passed. |
This function borrows elements from the heatmap implementation in the phyloseq package. The row/column sorting is there not available as a separate function at present, however, hindering reuse in other tools. Implemented in the microbiome package to provide an independent method for easy sample/taxon reordering for phyloseq objects.
Vector of ordered elements
This function is partially based on code derived from the phyloseq package. For the original neatmap approach for heatmap sorting, see (and cite): Rajaram, S., & Oono, Y. (2010). NeatMap–non-clustering heat map alternatives in R. BMC Bioinformatics, 11, 45.
data(peerj32) pseq <- peerj32$phyloseq # For Phyloseq sort.otu <- neatsort(pseq, target='species') # For matrix # sort.rows <- neatsort(abundances(pseq), target='rows')
data(peerj32) pseq <- peerj32$phyloseq # For Phyloseq sort.otu <- neatsort(pseq, target='species') # For matrix # sort.rows <- neatsort(abundances(pseq), target='rows')
Quantify microbiota 'overlap' between samples.
overlap(x, detection = 0)
overlap(x, detection = 0)
x |
|
detection |
Detection threshold. |
Overlap matrix
Contact: Leo Lahti [email protected]
Bashan, A., Gibson, T., Friedman, J. et al. Universality of human microbial dynamics. Nature 534, 259–262 (2016). https://doi.org/10.1038/nature18301
data(atlas1006) o <- overlap(atlas1006, detection = 0.1/100)
data(atlas1006) o <- overlap(atlas1006, detection = 0.1/100)
The peerj32 data set contains high-through profiling data from 389 human blood serum lipids and 130 intestinal genus-level bacteria from 44 samples (22 subjects from 2 time points; before and after probiotic/placebo intervention). The data set can be used to investigate associations between intestinal bacteria and host lipid metabolism. For details, see http://dx.doi.org/10.7717/peerj.32.
data(peerj32)
data(peerj32)
List of the following data matrices as described in detail in Lahti et al. (2013):
lipids: Quantification of 389 blood serum lipids across 44 samples
microbes: Quantification of 130 genus-like taxa across 44 samples
meta: Sample metadata including time point, sex, subjectID, sampleID and treatment group (probiotic LGG / Placebo)
phyloseq The microbiome data set converted into a
phyloseq-class
object.
Loads the data set in R.
Leo Lahti [email protected]
Lahti et al. (2013) PeerJ 1:e32 http://dx.doi.org/10.7717/peerj.32
Show all samples of a microbiota collection, colored by specific factor levels (x axis) and signal (y axis).
plot_atlas(pseq, x, y, ncol = 2)
plot_atlas(pseq, x, y, ncol = 2)
pseq |
phyloseq object |
x |
Sorting variable for X axis and sample coloring |
y |
Signal variable for Y axis |
ncol |
Number of legend columns. |
Arranges the samples based on the given grouping factor (x), and plots the signal (y) on the Y axis. The samples are randomly ordered within each factor level. The factor levels are ordered by standard deviation of the signal (y axis).
ggplot object
Leo Lahti [email protected]
See citation('microbiome'); Visualization inspired by Kilpinen et al. 2008, Genome Biology 9:R139. DOI: 10.1186/gb-2008-9-9-r139
data(atlas1006) p <- plot_atlas(atlas1006, 'DNA_extraction_method', 'diversity') p <- plot_atlas(atlas1006, 'DNA_extraction_method', 'Bifidobacterium')
data(atlas1006) p <- plot_atlas(atlas1006, 'DNA_extraction_method', 'diversity') p <- plot_atlas(atlas1006, 'DNA_extraction_method', 'Bifidobacterium')
Plot taxon abundance for samples.
plot_composition( x, sample.sort = NULL, otu.sort = NULL, x.label = "sample", plot.type = "barplot", verbose = FALSE, average_by = NULL, group_by = NULL, ... )
plot_composition( x, sample.sort = NULL, otu.sort = NULL, x.label = "sample", plot.type = "barplot", verbose = FALSE, average_by = NULL, group_by = NULL, ... )
x |
|
sample.sort |
Order samples. Various criteria are available:
|
otu.sort |
Order taxa. Same options as for the sample.sort argument but instead of metadata, taxonomic table is used. Also possible to sort by 'abundance'. |
x.label |
Specify how to label the x axis. This should be one of the variables in sample_variables(x). |
plot.type |
Plot type: 'barplot' or 'heatmap' |
verbose |
verbose
(but not in sample/taxon ordering).
The options are 'Z-OTU', 'Z-Sample', 'log10' and 'compositional'.
See the |
average_by |
Average the samples by the average_by variable |
group_by |
Group by this variable (in plot.type "barplot") |
... |
Arguments to be passed (for |
A ggplot
plot object.
library(dplyr) data(atlas1006) pseq <- atlas1006 %>% subset_samples(DNA_extraction_method == "r") %>% aggregate_taxa(level = "Phylum") %>% transform(transform = "compositional") p <- plot_composition(pseq, sample.sort = "Firmicutes", otu.sort = "abundance", verbose = TRUE) + scale_fill_manual(values = default_colors("Phylum")[taxa(pseq)])
library(dplyr) data(atlas1006) pseq <- atlas1006 %>% subset_samples(DNA_extraction_method == "r") %>% aggregate_taxa(level = "Phylum") %>% transform(transform = "compositional") p <- plot_composition(pseq, sample.sort = "Firmicutes", otu.sort = "abundance", verbose = TRUE) + scale_fill_manual(values = default_colors("Phylum")[taxa(pseq)])
Core visualization (2D).
plot_core( x, prevalences = seq(0.1, 1, 0.1), detections = 20, plot.type = "lineplot", colours = NULL, min.prevalence = NULL, taxa.order = NULL, horizontal = FALSE )
plot_core( x, prevalences = seq(0.1, 1, 0.1), detections = 20, plot.type = "lineplot", colours = NULL, min.prevalence = NULL, taxa.order = NULL, horizontal = FALSE )
x |
A |
prevalences |
a vector of prevalence percentages in [0,1] |
detections |
a vector of intensities around the data range, or a scalar indicating the number of intervals in the data range. |
plot.type |
Plot type ('lineplot' or 'heatmap') |
colours |
colours for the heatmap |
min.prevalence |
If minimum prevalence is set, then filter out those rows (taxa) and columns (detections) that never exceed this prevalence. This helps to zoom in on the actual core region of the heatmap. Only affects the plot.type='heatmap'. |
taxa.order |
Ordering of the taxa: a vector of names. |
horizontal |
Logical. Horizontal figure. |
A list with three elements: the ggplot object and the data. The data has a different form for the lineplot and heatmap. Finally, the applied parameters are returned.
Contact: Leo Lahti [email protected]
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the microbiome R package, see citation('microbiome')
data(dietswap) p <- plot_core(transform(dietswap, "compositional"), prevalences=seq(0.1, 1, .1), detections=seq(0.01, 1, length = 10))
data(dietswap) p <- plot_core(transform(dietswap, "compositional"), prevalences=seq(0.1, 1, .1), detections=seq(0.01, 1, length = 10))
Plot abundance density across samples for a given taxon.
plot_density( x, variable = NULL, log10 = FALSE, adjust = 1, kernel = "gaussian", trim = FALSE, na.rm = FALSE, fill = "gray", tipping.point = NULL, xlim = NULL )
plot_density( x, variable = NULL, log10 = FALSE, adjust = 1, kernel = "gaussian", trim = FALSE, na.rm = FALSE, fill = "gray", tipping.point = NULL, xlim = NULL )
x |
|
variable |
OTU or metadata variable to visualize |
log10 |
Logical. Show log10 abundances or not. |
adjust |
see stat_density |
kernel |
see stat_density |
trim |
see stat_density |
na.rm |
see stat_density |
fill |
Fill color |
tipping.point |
Optional. Indicate critical point for abundance variations to be highlighted. |
xlim |
X axis limits |
A ggplot
plot object.
# Load gut microbiota data on 1006 western adults # (see help(atlas1006) for references and details) data(dietswap) # Use compositional abundances instead of absolute signal pseq.rel <- transform(dietswap, 'compositional') # Population density for Dialister spp.; with log10 on the abundance (X) # axis library(ggplot2) p <- plot_density(pseq.rel, variable='Dialister') + scale_x_log10()
# Load gut microbiota data on 1006 western adults # (see help(atlas1006) for references and details) data(dietswap) # Use compositional abundances instead of absolute signal pseq.rel <- transform(dietswap, 'compositional') # Population density for Dialister spp.; with log10 on the abundance (X) # axis library(ggplot2) p <- plot_density(pseq.rel, variable='Dialister') + scale_x_log10()
Plot relative frequencies within each Group for the levels of the given factor.
plot_frequencies(x, Groups, Factor)
plot_frequencies(x, Groups, Factor)
x |
|
Groups |
Name of the grouping variable |
Factor |
Name of the frequency variable |
For table with the indicated frequencies, see the returned phyloseq object.
ggplot
plot object.
data(dietswap) p <- plot_frequencies(meta(dietswap), 'group', 'sex')
data(dietswap) p <- plot_frequencies(meta(dietswap), 'group', 'sex')
Wrapper for visualizing sample similarity landscape ie. sample density in various 2D projections.
plot_landscape( x, method = "PCoA", distance = "bray", transformation = "identity", col = NULL, main = NULL, x.ticks = 10, rounding = 0, add.points = TRUE, adjust = 1, size = 1, legend = FALSE, shading = TRUE, shading.low = "#ebf4f5", shading.high = "#e9b7ce", point.opacity = 0.75 )
plot_landscape( x, method = "PCoA", distance = "bray", transformation = "identity", col = NULL, main = NULL, x.ticks = 10, rounding = 0, add.points = TRUE, adjust = 1, size = 1, legend = FALSE, shading = TRUE, shading.low = "#ebf4f5", shading.high = "#e9b7ce", point.opacity = 0.75 )
x |
|
method |
Ordination method, see phyloseq::plot_ordination; or "PCA", or "t-SNE" (from the Rtsne package) |
distance |
Ordination distance, see phyloseq::plot_ordination; for method = "PCA", only euclidean distance is implemented now. |
transformation |
Transformation applied on the input object x |
col |
Variable name to highlight samples (points) with colors |
main |
title text |
x.ticks |
Number of ticks on the X axis |
rounding |
Rounding for X axis tick values |
add.points |
Plot the data points as well |
adjust |
Kernel width adjustment |
size |
point size |
legend |
plot legend TRUE/FALSE |
shading |
Add shading in the background. |
shading.low |
Color for shading low density regions |
shading.high |
Color for shading high density regions |
point.opacity |
Transparency-level for points |
For consistent results, set random seet (set.seed) before function call. Note that the distance and transformation arguments may have a drastic effect on the outputs.
A ggplot
plot object.
data(dietswap) # PCoA p <- plot_landscape(transform(dietswap, "compositional"), distance = "bray", method = "PCoA") p <- plot_landscape(dietswap, method = "t-SNE", distance = "bray", transformation = "compositional") # PCA p <- plot_landscape(dietswap, method = "PCA", transformation = "clr")
data(dietswap) # PCoA p <- plot_landscape(transform(dietswap, "compositional"), distance = "bray", method = "PCoA") p <- plot_landscape(dietswap, method = "t-SNE", distance = "bray", transformation = "compositional") # PCA p <- plot_landscape(dietswap, method = "PCA", transformation = "clr")
Draw regression curve with smoothed error bars with Visually-Weighted Regression by Solomon M. Hsiang; see http://www.fight-entropy.com/2012/07/visually-weighted-regression.html The R is modified from Felix Schonbrodt's original code http://www.nicebread.de/ visually-weighted-watercolor-plots-new-variants-please-vote
plot_regression( formula, data, B = 1000, shade = TRUE, shade.alpha = 0.1, spag = FALSE, mweight = TRUE, show.lm = FALSE, show.median = TRUE, median.col = "white", show.CI = FALSE, method = loess, slices = 200, ylim = NULL, quantize = "continuous", show.points = TRUE, color = NULL, pointsize = NULL, ... )
plot_regression( formula, data, B = 1000, shade = TRUE, shade.alpha = 0.1, spag = FALSE, mweight = TRUE, show.lm = FALSE, show.median = TRUE, median.col = "white", show.CI = FALSE, method = loess, slices = 200, ylim = NULL, quantize = "continuous", show.points = TRUE, color = NULL, pointsize = NULL, ... )
formula |
formula |
data |
data |
B |
number bootstrapped smoothers |
shade |
plot the shaded confidence region? |
shade.alpha |
shade.alpha: should the CI shading fade out at the edges? (by reducing alpha; 0=no alpha decrease, 0.1=medium alpha decrease, 0.5=strong alpha decrease) |
spag |
plot spaghetti lines? |
mweight |
visually weight the median smoother |
show.lm |
plot the linear regression line |
show.median |
show median smoother |
median.col |
median color |
show.CI |
should the 95% CI limits be plotted? |
method |
the fitting function for the spaghettis; default: loess |
slices |
number of slices in x and y direction for the shaded region. Higher numbers make a smoother plot, but takes longer to draw. I wouldn'T go beyond 500 |
ylim |
restrict range of the watercoloring |
quantize |
either 'continuous', or 'SD'. In the latter case, we get three color regions for 1, 2, and 3 SD (an idea of John Mashey) |
show.points |
Show points. |
color |
Point colors |
pointsize |
Point sizes |
... |
further parameters passed to the fitting function, in the case of loess, for example, 'span=.9', or 'family='symmetric” |
ggplot2 object
Based on the original version from F. Schonbrodt. Modified by Leo Lahti [email protected]
See citation('microbiome')
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r' & sex == "female" & nationality == "UKIE", B=10, slices=10 # non-default used here to speed up examples ) p <- plot_regression(diversity ~ age, meta(pseq)[1:20,], slices=10, B=10)
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r' & sex == "female" & nationality == "UKIE", B=10, slices=10 # non-default used here to speed up examples ) p <- plot_regression(diversity ~ age, meta(pseq)[1:20,], slices=10, B=10)
Create taxa prevalence plots at various taxonomic levels.
plot_taxa_prevalence(x, level, detection = 0)
plot_taxa_prevalence(x, level, detection = 0)
x |
|
level |
Phylum/Order/Class/Family |
detection |
Detection threshold for presence (prevalance) |
This helps to obtain first insights into how is the taxa distribution in the data. It also gives an idea about the taxonomic affiliation of rare and abundant taxa in the data. This may be helpful for data filtering or other downstream analysis.
A ggplot
plot object.
Sudarshan A. Shetty [email protected]
data(atlas1006) # Pick data subset just to speed up example p0 <- subset_samples(atlas1006, DNA_extraction_method == "r") p0 <- prune_taxa(taxa(p0)[grep("Bacteroides", taxa(p0))], p0) # Detection threshold (0 by default; higher especially with HITChip) p <- plot_taxa_prevalence(p0, 'Phylum', detection = 1) print(p)
data(atlas1006) # Pick data subset just to speed up example p0 <- subset_samples(atlas1006, DNA_extraction_method == "r") p0 <- prune_taxa(taxa(p0)[grep("Bacteroides", taxa(p0))], p0) # Detection threshold (0 by default; higher especially with HITChip) p <- plot_taxa_prevalence(p0, 'Phylum', detection = 1) print(p)
Plot variation in taxon abundance for many subjects.
plot_tipping( x, taxon, tipping.point = NULL, lims = NULL, shift = 0.001, xlim = NULL )
plot_tipping( x, taxon, tipping.point = NULL, lims = NULL, shift = 0.001, xlim = NULL )
x |
|
taxon |
Taxonomic group to visualize. |
tipping.point |
Optional. Indicate critical point for abundance variations to be highlighted. |
lims |
Optional. Figure X axis limits. |
shift |
Small constant to avoid problems with zeroes in log10 |
xlim |
Horizontal axis limits |
Assuming the sample_data(x) has 'subject' field and some subjects have multiple time points.
ggplot
object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r') pseq <- transform(pseq, 'compositional') p <- plot_tipping(pseq, 'Dialister', tipping.point=1)
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r') pseq <- transform(pseq, 'compositional') p <- plot_tipping(pseq, 'Dialister', tipping.point=1)
Analysis of multimodality based on bootstrapped potential analysis of Livina et al. (2010) as described in Lahti et al. (2014).
potential_analysis( x, peak.threshold = 0, bw.adjust = 1, bs.iter = 100, min.density = 1 )
potential_analysis( x, peak.threshold = 0, bw.adjust = 1, bs.iter = 100, min.density = 1 )
x |
Input data vector |
peak.threshold |
Mode detection threshold |
bw.adjust |
Bandwidth adjustment |
bs.iter |
Bootstrap iterations |
min.density |
minimum accepted density for a maximum; as a multiple of kernel height |
List with following elements:
modesNumber of modes for the input data vector (the most frequent number of modes from bootstrap)
minimaAverage of potential minima across the bootstrap samples (for the most frequent number of modes)
maximaAverage of potential maxima across the bootstrap samples (for the most frequent number of modes)
unimodality.supportFraction of bootstrap samples exhibiting unimodality
bwsBandwidths
Livina et al. (2010). Potential analysis reveals changing number of climate states during the last 60 kyr. Climate of the Past, 6, 77-82.
Lahti et al. (2014). Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344.
plot_potential
# Example data; see help(peerj32) for details data(peerj32) # Log10 abundance of Dialister x <- abundances(transform(peerj32$phyloseq, "clr"))['Dialister',] # Bootstrapped potential analysis # In practice, use more bootstrap iterations # res <- potential_analysis(x, peak.threshold=0, bw.adjust=1, # bs.iter=9, min.density=1)
# Example data; see help(peerj32) for details data(peerj32) # Log10 abundance of Dialister x <- abundances(transform(peerj32$phyloseq, "clr"))['Dialister',] # Bootstrapped potential analysis # In practice, use more bootstrap iterations # res <- potential_analysis(x, peak.threshold=0, bw.adjust=1, # bs.iter=9, min.density=1)
One-dimensional potential estimation for univariate timeseries.
potential_univariate( x, std = 1, bw = "nrd", weights = c(), grid.size = NULL, peak.threshold = 1, bw.adjust = 1, density.smoothing = 0, min.density = 1 )
potential_univariate( x, std = 1, bw = "nrd", weights = c(), grid.size = NULL, peak.threshold = 1, bw.adjust = 1, density.smoothing = 0, min.density = 1 )
x |
Univariate data (vector) for which the potentials shall be estimated |
std |
Standard deviation of the noise (defaults to 1; this will set scaled potentials) |
bw |
kernel bandwidth estimation method |
weights |
optional weights in ksdensity (used by potential_slidingaverages). |
grid.size |
Grid size for potential estimation. of density kernel height dnorm(0, sd=bandwidth)/N |
peak.threshold |
Mode detection threshold |
bw.adjust |
The real bandwidth will be bw.adjust*bw; defaults to 1 |
density.smoothing |
Add a small constant density across the whole observation range to regularize density estimation (and to avoid zero probabilities within the observation range). This parameter adds uniform density across the observation range, scaled by density.smoothing. |
min.density |
minimum accepted density for a maximum; as a multiple of kernel height |
potential_univariate
returns a list with the
following elements:
xi the grid of points on which the potential is estimated
pot The estimated potential: -log(f)*std^2/2, where f is the density.
density Density estimate corresponding to the potential.
min.inds indices of the grid points at which the density has minimum values; (-potentials; neglecting local optima)
max.inds indices the grid points at which the density has maximum values; (-potentials; neglecting local optima)
bw bandwidth of kernel used
min.points grid point values at which the density has minimum values; (-potentials; neglecting local optima)
max.points grid point values at which the density has maximum values; (-potentials; neglecting local optima)
Based on Matlab code from Egbert van Nes modified by Leo Lahti. Extended from the initial version in the earlywarnings R package.
Livina et al. (2010). Potential analysis reveals changing number of climate states during the last 60 kyr. Climate of the Past, 6, 77-82.
Lahti et al. (2014). Tipping elements of the human intestinal ecosystem. Nature Communications 5:4344.
# res <- potential_univariate(x)
# res <- potential_univariate(x)
Simple prevalence measure.
prevalence( x, detection = 0, sort = FALSE, count = FALSE, include.lowest = FALSE )
prevalence( x, detection = 0, sort = FALSE, count = FALSE, include.lowest = FALSE )
x |
A vector, data matrix or |
detection |
Detection threshold for absence/presence (strictly greater by default). |
sort |
Sort the groups by prevalence |
count |
Logical. Indicate prevalence as fraction of samples (in percentage [0, 1]; default); or in absolute counts indicating the number of samples where the OTU is detected (strictly) above the given abundance threshold. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
For vectors, calculates the fraction (count=FALSE) or number (count=TRUE) of samples that exceed the detection. For matrices, calculates this for each matrix column. For phyloseq objects, calculates this for each OTU. The relative prevalence (count=FALSE) is simply the absolute prevalence (count=TRUE) divided by the number of samples.
For each OTU, the fraction of samples where a given OTU is detected. The output is readily given as a percentage.
Contact: Leo Lahti [email protected]
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the microbiome R package, see citation('microbiome')
data(peerj32) pr <- prevalence(peerj32$phyloseq, detection=0, sort=TRUE, count=TRUE)
data(peerj32) pr <- prevalence(peerj32$phyloseq, detection=0, sort=TRUE, count=TRUE)
phyloseq-class
object to long data formatAn alternative to psmelt function from
phyloseq-class
object.
psmelt2(x, sample.column = NULL, feature.column = NULL)
psmelt2(x, sample.column = NULL, feature.column = NULL)
x |
|
sample.column |
A single character string specifying name of the column to hold sample names. |
feature.column |
A single character string specifying name of the column to hold OTU or ASV names. |
A tibble
in long format
Contact: Sudarshan A. Shetty [email protected]
data("dietswap") ps.melt <- psmelt2(dietswap, sample.column="SampleID", feature.column="Feature") head(ps.melt)
data("dietswap") ps.melt <- psmelt2(dietswap, sample.column="SampleID", feature.column="Feature") head(ps.melt)
Filter the phyloseq object to include only rare (non-core) taxa.
rare(x, detection, prevalence, include.lowest = FALSE, ...)
rare(x, detection, prevalence, include.lowest = FALSE, ...)
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]; strictly greater by default) |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs in core calculation. FALSE by default. |
... |
Arguments to pass. |
Filtered phyloseq object including only rare taxa
Contact: Leo Lahti [email protected]
Salonen A, Salojarvi J, Lahti L, de Vos WM. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16-20, 2012 To cite the microbiome R package, see citation('microbiome')
core_members
data(dietswap) # Detection threshold 0 (strictly greater by default); # Prevalence threshold 50 percent (strictly greater by default) pseq <- rare(dietswap, 0, 50/100)
data(dietswap) # Detection threshold 0 (strictly greater by default); # Prevalence threshold 50 percent (strictly greater by default) pseq <- rare(dietswap, 0, 50/100)
Calculates the rare abundance community index.
rare_abundance( x, detection = 0.1/100, prevalence = 50/100, include.lowest = FALSE )
rare_abundance( x, detection = 0.1/100, prevalence = 50/100, include.lowest = FALSE )
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
This index gives the relative proportion of rare species (ie. those that are not part of the core microbiota) in the interval [0,1]. This is the complement (1-x) of the core abundance. The rarity function provides the abundance of the least abundant taxa within each sample, regardless of the population prevalence.
A vector of indices
Contact: Leo Lahti [email protected]
core_abundance, rarity, diversity
data(dietswap) d <- rare_abundance(dietswap, detection=0.1/100, prevalence=50/100)
data(dietswap) d <- rare_abundance(dietswap, detection=0.1/100, prevalence=50/100)
Determine members of the rare microbiota with given abundance and prevalence threshold.
rare_members(x, detection = 1/100, prevalence = 50/100, include.lowest = FALSE)
rare_members(x, detection = 1/100, prevalence = 50/100, include.lowest = FALSE)
x |
|
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
include.lowest |
Include the lower boundary of the detection and prevalence cutoffs. FALSE by default. |
For phyloseq object, lists taxa that are less prevalent than the given prevalence threshold. Optionally, never exceeds the given abundance threshold (by default, all abundanecs accepted). For matrix, lists columns that satisfy these criteria.
Vector of rare taxa
Leo Lahti [email protected]
To cite the microbiome R package, see citation('microbiome')
core_members
data(dietswap) # Detection threshold: the taxa never exceed the given detection threshold # Prevalence threshold 20 percent (strictly greater by default) a <- rare_members(dietswap, detection=100/100, prevalence=20/100)
data(dietswap) # Detection threshold: the taxa never exceed the given detection threshold # Prevalence threshold 20 percent (strictly greater by default) a <- rare_members(dietswap, detection=100/100, prevalence=20/100)
Calculates the community rarity index.
rarity(x, index = "all", detection = 0.2/100, prevalence = 20/100)
rarity(x, index = "all", detection = 0.2/100, prevalence = 20/100)
x |
|
index |
If the index is given, it will override the other parameters. See the details below for description and references of the standard rarity indices. |
detection |
Detection threshold for absence/presence (strictly greater by default). |
prevalence |
Prevalence threshold (in [0, 1]). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE. |
The rarity index characterizes the concentration of species at low abundance.
The following rarity indices are provided:
log_modulo_skewness Quantifies the concentration of the least abundant species by the log-modulo skewness of the arithmetic abundance classes (see Magurran & McGill 2011). These are typically right-skewed; to avoid taking log of occasional negative skews, we follow Locey & Lennon (2016) and use the log-modulo transformation that adds a value of one to each measure of skewness to allow logarithmization. The values q=0.5 and n=50 are used here.
low_abundance Relative proportion of the least abundant species, below the detection level of 0.2%. The least abundant species are determined separately for each sample regardless of their prevalence.
rare_abundance Relative proportion of the non-core species, exceed the given detection level (default 20 at the given prevalence (default 20 This is complement of the core with the same thresholds.
A vector of rarity indices
Contact: Leo Lahti [email protected]
Kenneth J. Locey and Jay T. Lennon. Scaling laws predict global microbial diversity. PNAS 2016 113 (21) 5970-5975; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12
alpha, log_modulo_skewness, rare_abundance, low_abundance
data(dietswap) d <- rarity(dietswap, index='low_abundance') # d <- rarity(dietswap, index='all')
data(dietswap) d <- rarity(dietswap, index='low_abundance') # d <- rarity(dietswap, index='all')
Read biom and mapping files into a phyloseq-class
object.
read_biom2phyloseq( biom.file = NULL, taxonomy.file = NULL, metadata.file = NULL, ... )
read_biom2phyloseq( biom.file = NULL, taxonomy.file = NULL, metadata.file = NULL, ... )
biom.file |
A biom file with '.biom' extension |
taxonomy.file |
NULL the latest version has taxonomic information within the biom |
metadata.file |
A simple metadata/mapping file with .csv extension |
... |
Arguments to pass for import_biom |
Biom file and mapping files will be converted to
phyloseq-class
.
phyloseq-class
object.
Sudarshan A. Shetty [email protected]
p0 <- read_biom2phyloseq() #biom.file <- qiita1629.biom" #meta.file <- qiita1629_mapping.csv" #p0 <- read_biom2phyloseq(biom.file = biom.file, # metadata.file = meta.file, # taxonomy.file = NULL)
p0 <- read_biom2phyloseq() #biom.file <- qiita1629.biom" #meta.file <- qiita1629_mapping.csv" #p0 <- read_biom2phyloseq(biom.file = biom.file, # metadata.file = meta.file, # taxonomy.file = NULL)
Read simple OTU tables, mapping and taxonomy files into a
phyloseq-class
object.
read_csv2phyloseq( otu.file = NULL, taxonomy.file = NULL, metadata.file = NULL, sep = "," )
read_csv2phyloseq( otu.file = NULL, taxonomy.file = NULL, metadata.file = NULL, sep = "," )
otu.file |
A simple otu_table with '.csv' extension |
taxonomy.file |
A simple taxonomy file with '.csv' extension |
metadata.file |
A simple metadata/mapping file with .csv extension |
sep |
CSV file separator |
Simple OTU tables, mapping and taxonomy files will be converted
to phyloseq-class
.
phyloseq-class
object.
Sudarshan A. Shetty [email protected]
# NOTE: the system.file command reads these example files from the # microbiome R package. To use your own local files, simply write # otu.file <- "/path/to/my/file.csv" etc. #otu.file <- # system.file("extdata/qiita1629_otu_table.csv", # package='microbiome') #tax.file <- system.file("extdata/qiita1629_taxonomy_table.csv", # package='microbiome') #meta.file <- system.file("extdata/qiita1629_mapping_subset.csv", # package='microbiome') #p0 <- read_csv2phyloseq( # otu.file=otu.file, # taxonomy.file=tax.file, # metadata.file=meta.file)
# NOTE: the system.file command reads these example files from the # microbiome R package. To use your own local files, simply write # otu.file <- "/path/to/my/file.csv" etc. #otu.file <- # system.file("extdata/qiita1629_otu_table.csv", # package='microbiome') #tax.file <- system.file("extdata/qiita1629_taxonomy_table.csv", # package='microbiome') #meta.file <- system.file("extdata/qiita1629_mapping_subset.csv", # package='microbiome') #p0 <- read_csv2phyloseq( # otu.file=otu.file, # taxonomy.file=tax.file, # metadata.file=meta.file)
Read mothur shared and consensus taxonomy files into a
phyloseq-class
object.
read_mothur2phyloseq(shared.file, consensus.taxonomy.file, mapping.file = NULL)
read_mothur2phyloseq(shared.file, consensus.taxonomy.file, mapping.file = NULL)
shared.file |
A shared file produced by mothur. Identified from the .shared extension |
consensus.taxonomy.file |
Consensus taxonomy file produced by mothur. Identified from with the .taxonomy extension. See http://www.mothur.org/wiki/ConTaxonomy_file. |
mapping.file |
Metadata/mapping file with .csv extension |
Mothur shared and consensus taxonomy files will be converted to
phyloseq-class
.
phyloseq-class
object.
Sudarshan A. Shetty [email protected]
#otu.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_fit.final.tx.1.subsample.shared", # package='microbiome') #tax.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_fit.final.tx.1.cons.taxonomy", # package='microbiome') #meta.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_mapping.csv", # package='microbiome') #p0 <- read_mothur2phyloseq( # shared.file=otu.file, # consensus.taxonomy.file=tax.file, # mapping.file=meta.file)
#otu.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_fit.final.tx.1.subsample.shared", # package='microbiome') #tax.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_fit.final.tx.1.cons.taxonomy", # package='microbiome') #meta.file <- system.file( #"extdata/Baxter_FITs_Microbiome_2016_mapping.csv", # package='microbiome') #p0 <- read_mothur2phyloseq( # shared.file=otu.file, # consensus.taxonomy.file=tax.file, # mapping.file=meta.file)
Read the otu, taxonomy and metadata from various formats.
read_phyloseq( otu.file = NULL, taxonomy.file = NULL, metadata.file = NULL, type = c("simple", "mothur", "biom"), sep = "," )
read_phyloseq( otu.file = NULL, taxonomy.file = NULL, metadata.file = NULL, type = c("simple", "mothur", "biom"), sep = "," )
otu.file |
File containing the OTU table (for mothur this is the file with the .shared extension) |
taxonomy.file |
(for mothur this is typically the consensus taxonomy file with the .taxonomy extension) |
metadata.file |
File containing samples x variables metadata. |
type |
Input data type: 'mothur' or 'simple' or 'biom' type. |
sep |
CSV file separator |
See help(read_mothur2phyloseq) for details on the Mothur input format; and help(read_biom2phyloseq) for details on the biom format. The simple format refers to the set of CSV files.
phyloseq-class
object
Sudarshan A. Shetty [email protected]
# pseq <- read_phyloseq(otu.file, # taxonomy.file, # metadata.file, # type=c('mothur', 'simple', 'biom'))
# pseq <- read_phyloseq(otu.file, # taxonomy.file, # metadata.file, # type=c('mothur', 'simple', 'biom'))
Total Read Count
readcount(x)
readcount(x)
x |
|
Vector of read counts.
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) d <- readcount(dietswap)
data(dietswap) d <- readcount(dietswap)
Filter out selected samples from a phyloseq object.
remove_samples(samples = NULL, x)
remove_samples(samples = NULL, x)
samples |
Names of samples to be removed. |
x |
|
This complements the phyloseq function prune_samples by providing a way to exclude given groups from a phyloseq object.
Filtered phyloseq object.
Contact: Leo Lahti [email protected]
To cite the microbiome R package, see citation('microbiome')
phyloseq::prune_samples, phyloseq::subset_samples
data(dietswap) pseq <- remove_samples(c("Sample-182", "Sample-222"), dietswap)
data(dietswap) pseq <- remove_samples(c("Sample-182", "Sample-222"), dietswap)
Filter out selected taxa from a phyloseq object.
remove_taxa(taxa = NULL, x)
remove_taxa(taxa = NULL, x)
taxa |
Names of taxa to be removed. |
x |
|
This complements the phyloseq function prune_taxa by providing a way to exclude given groups from a phyloseq object.
Filtered phyloseq object.
Contact: Leo Lahti [email protected]
To cite the microbiome R package, see citation('microbiome')
phyloseq::prune_taxa, phyloseq::subset_taxa
data(dietswap) pseq <- remove_taxa(c("Akkermansia", "Dialister"), dietswap)
data(dietswap) pseq <- remove_taxa(c("Akkermansia", "Dialister"), dietswap)
Community richness index.
richness(x, index = c("observed", "chao1"), detection = 0)
richness(x, index = c("observed", "chao1"), detection = 0)
x |
A species abundance vector, or matrix (taxa/features x samples)
with the absolute count data (no relative abundances), or
|
index |
"observed" or "chao1" |
detection |
Detection threshold. Used for the "observed" index. |
By default, returns the richness for multiple detection thresholds defined by the data quantiles. If the detection argument is provided, returns richness with that detection threshold. The "observed" richness corresponds to index="observed", detection=0.
A vector of richness indices
Contact: Leo Lahti [email protected]
alpha
data(dietswap) d <- richness(dietswap, detection=0)
data(dietswap) d <- richness(dietswap, detection=0)
Visualize abundance spread for OTUs
spreadplot(x, trunc = 0.001/100, alpha = 0.15, width = 0.35)
spreadplot(x, trunc = 0.001/100, alpha = 0.15, width = 0.35)
x |
|
trunc |
Truncate abundances lower than this to zero |
alpha |
Alpha level for point transparency |
width |
Width for point spread |
ggplot2 object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(dietswap) p <- spreadplot(transform(dietswap, "compositional"))
data(dietswap) p <- spreadplot(transform(dietswap, "compositional"))
Prints basic information of data.
summarize_phyloseq(x)
summarize_phyloseq(x)
x |
Input is a |
The summarize_phyloseq function will give information on weather data is compositional or not, reads (min. max, median, average), sparsity, presence of singletons and sample variables.
Prints basic information of phyloseq-class
object.
Contact: Sudarshan A. Shetty [email protected]
data(dietswap) summarize_phyloseq(dietswap)
data(dietswap) summarize_phyloseq(dietswap)
List the names of taxonomic groups in a phyloseq object.
taxa(x)
taxa(x)
x |
|
A handy shortcut for phyloseq::taxa_names, with a potential to add to add some extra tweaks later.
A vector with taxon names.
Contact: Leo Lahti [email protected]
To cite the microbiome R package, see citation('microbiome')
data(dietswap) x <- taxa(dietswap)
data(dietswap) x <- taxa(dietswap)
phyloseq-class
Slots to TibblesUtility to convert phyloseq slots to tibbles.
otu_tibble(x, column.id = "FeatureID") tax_tibble(x, column.id = "FeatureID") sample_tibble(x, column.id = "SampleID") combine_otu_tax(x, column.id = "FeatureID")
otu_tibble(x, column.id = "FeatureID") tax_tibble(x, column.id = "FeatureID") sample_tibble(x, column.id = "SampleID") combine_otu_tax(x, column.id = "FeatureID")
x |
|
column.id |
Provide name for the column which will hold the rownames. of slot. |
Convert different phyloseq
slots into tibbles.
otu_tibble
gets the otu_table in tibble format.
tax_tibble
gets the taxa_table in tibble format.
combine_otu_tax
combines otu_table and taxa_table into one tibble.
A tibble
Contact: Sudarshan A. Shetty [email protected]
library(microbiome) data("dietswap") otu_tib <- otu_tibble(dietswap,column.id="FeatureID") tax_tib <- tax_tibble(dietswap,column.id="FeatureID") sample_tib <- sample_tibble(dietswap,column.id="SampleID") otu_tax <- combine_otu_tax(dietswap,column.id = "FeatureID") head(otu_tax)
library(microbiome) data("dietswap") otu_tib <- otu_tibble(dietswap,column.id="FeatureID") tax_tib <- tax_tibble(dietswap,column.id="FeatureID") sample_tib <- sample_tibble(dietswap,column.id="SampleID") otu_tax <- combine_otu_tax(dietswap,column.id = "FeatureID") head(otu_tax)
Shift the time field in phyloseq sample_data such that the first time point of each subject is always 0.
time_normalize(x)
time_normalize(x)
x |
phyloseq object. The sample_data(x) should contain the following fields: subject, time |
Phyloseq object with a normalized time field
data(peerj32) pseq <- time_normalize(peerj32$phyloseq)
data(peerj32) pseq <- time_normalize(peerj32$phyloseq)
Within each subject, sort samples by time and calculate distance from the baseline point (minimum time).
time_sort(x)
time_sort(x)
x |
A metadata data.frame including the following columns: time, subject, sample, signal. Or a phyloseq object. |
A list with sorted metadata (data.frame) for each subject.
Leo Lahti [email protected]
See citation('microbiome')
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == "r") ts <- time_sort(meta(pseq))
data(atlas1006) pseq <- subset_samples(atlas1006, DNA_extraction_method == "r") ts <- time_sort(meta(pseq))
For each subject, return temporally consecutive sample pairs together with the corresponding time difference.
timesplit(x)
timesplit(x)
x |
phyloseq object. |
data.frame
Leo Lahti [email protected]
data(atlas1006) x <- timesplit(subset_samples(atlas1006, DNA_extraction_method == 'r' & sex == "male"))
data(atlas1006) x <- timesplit(subset_samples(atlas1006, DNA_extraction_method == 'r' & sex == "male"))
Identify top entries in a vector or given field in data frame.
top( x, field = NULL, n = NULL, output = "vector", round = NULL, na.rm = FALSE, include.rank = FALSE )
top( x, field = NULL, n = NULL, output = "vector", round = NULL, na.rm = FALSE, include.rank = FALSE )
x |
data.frame, matrix, or vector |
field |
Field or column to check for a data.frame or matrix |
n |
Number of top entries to show |
output |
Output format: vector or data.frame |
round |
Optional rounding |
na.rm |
Logical. Remove NA before calculating the statistics. |
include.rank |
Include ranking if the output is data.frame. Logical. |
Vector of top counts, named by the corresponding entries
Leo Lahti [email protected]
See citation("bibliographica")
data(dietswap) p <- top(meta(dietswap), "group", 10)
data(dietswap) p <- top(meta(dietswap), "group", 10)
Return n most abundant taxa (based on total abundance over all samples), sorted by abundance
top_taxa(x, n = ntaxa(x))
top_taxa(x, n = ntaxa(x))
x |
phyloseq object |
n |
Number of top taxa to return (default: all) |
Character vector listing the top taxa
data(dietswap) topx <- top_taxa(dietswap, n=10)
data(dietswap) topx <- top_taxa(dietswap, n=10)
Standard transforms for phyloseq-class
.
transform( x, transform = "identity", target = "OTU", shift = 0, scale = 1, log10 = TRUE, reference = 1, ... )
transform( x, transform = "identity", target = "OTU", shift = 0, scale = 1, log10 = TRUE, reference = 1, ... )
x |
|
transform |
Transformation to apply. The options include: 'compositional' (ie relative abundance), 'Z', 'log10', 'log10p', 'hellinger', 'identity', 'clr', 'alr', or any method from the vegan::decostand function. |
target |
Apply the transform for 'sample' or 'OTU'. Does not affect the log transform. |
shift |
A constant indicating how much to shift the baseline abundance (in transform='shift') |
scale |
Scaling constant for the abundance values when transform = "scale". |
log10 |
Used only for Z transformation. Apply log10 before Z. |
reference |
Reference feature for the alr transformation. |
... |
arguments to be passed |
In transformation typ, the 'compositional' abundances are returned as relative abundances in [0, 1] (convert to percentages by multiplying with a factor of 100). The Hellinger transform is square root of the relative abundance but instead given at the scale [0,1]. The log10p transformation refers to log10(1 + x). The log10 transformation is applied as log10(1 + x) if the data contains zeroes. CLR transform applies a pseudocount of min(relative abundance)/2 to exact zero relative abundance entries in OTU table before taking logs.
Transformed phyloseq
object
data(dietswap) x <- dietswap # No transformation xt <- transform(x, 'identity') # OTU relative abundances # xt <- transform(x, 'compositional') # Z-transform for OTUs # xt <- transform(x, 'Z', 'OTU') # Z-transform for samples # xt <- transform(x, 'Z', 'sample') # Log10 transform (log10(1+x) if the data contains zeroes) # xt <- transform(x, 'log10') # Log10p transform (log10(1+x) always) # xt <- transform(x, 'log10p') # CLR transform # Note that small pseudocount is added if data contains zeroes xt <- microbiome::transform(x, 'clr') # ALR transform # The pseudocount must be specified explicitly # The reference feature is 1 by default xt <- microbiome::transform(x, 'alr', shift=1, reference=1) # Shift the baseline # xt <- transform(x, 'shift', shift=1) # Scale # xt <- transform(x, 'scale', scale=1)
data(dietswap) x <- dietswap # No transformation xt <- transform(x, 'identity') # OTU relative abundances # xt <- transform(x, 'compositional') # Z-transform for OTUs # xt <- transform(x, 'Z', 'OTU') # Z-transform for samples # xt <- transform(x, 'Z', 'sample') # Log10 transform (log10(1+x) if the data contains zeroes) # xt <- transform(x, 'log10') # Log10p transform (log10(1+x) always) # xt <- transform(x, 'log10p') # CLR transform # Note that small pseudocount is added if data contains zeroes xt <- microbiome::transform(x, 'clr') # ALR transform # The pseudocount must be specified explicitly # The reference feature is 1 by default xt <- microbiome::transform(x, 'alr', shift=1, reference=1) # Shift the baseline # xt <- transform(x, 'shift', shift=1) # Scale # xt <- transform(x, 'scale', scale=1)