Package 'microSTASIS' reference manual

Title:	Microbiota STability ASsessment via Iterative cluStering
Description:	The toolkit 'µSTASIS', or microSTASIS, has been developed for the stability analysis of microbiota in a temporal framework by leveraging iterative clustering. Concretely, the core function runs the Hartigan-Wong k-means algorithm as many times as possible to stress-test whether paired samples from the same individual remain together across multiple numbers of clusters over a whole data set of individuals. The package also includes functions to subset samples from paired times, validate the results and visualize the output.
Authors:	Pedro Sánchez-Sánchez [aut, cre] , Alfonso Benítez-Páez [aut]
Maintainer:	Pedro Sánchez-Sánchez <[email protected]>
License:	GPL-3
Version:	1.7.0
Built:	2025-03-29 05:03:51 UTC
Source:	https://github.com/bioc/microSTASIS

Detected ASVs from multiple individuals at four different sampling times.

Description

A dataset containing the amplicon sequence variants (ASVs) of 131 samples from the gut microbiota of 43 individuals. The values are counts transformed with the centred log-ratio (CLR) transformation.
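As an illustration, the centred log-ratio transformation can be sketched in base R: the log values of each sample are centred on their own mean, which is equivalent to dividing by the sample's geometric mean before taking logs. The samples-in-rows layout and the pseudocount of 1 are assumptions of this sketch; it is not necessarily how `clr` was produced, and `clrSketch()` is a hypothetical name.

```r
# Hypothetical CLR sketch (not the code used to build `clr`).
# Samples are ASSUMED to be rows; a pseudocount is ASSUMED because log(0) is undefined.
clrSketch <- function(counts, pseudocount = 1) {
  logx <- log(counts + pseudocount)
  # subtract each row's mean log value, i.e. log(x / geometric mean)
  sweep(logx, 1, rowMeans(logx))
}

counts <- matrix(c(10, 0, 5,
                   3, 8, 1), nrow = 2, byrow = TRUE)
z <- clrSketch(counts)
```

By construction, the CLR values of each sample sum to zero.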

Usage

data(clr)

Format

A data.frame with 131 rows and 226 variables

References

Gloria M. Agudelo-Ochoa, Beatriz E. Valdés-Duque, Nubia A. Giraldo-Giraldo, Ana M. Jaillier-Ramírez, Adriana Giraldo-Villa, Irene Acevedo-Castaño, Mónica A. Yepes-Molina, Janeth Barbosa-Barbosa, Alfonso Benítez-Páez, Gut microbiota profiles in critically ill patients, potential biomarkers and risk variables for sepsis, Gut Microbes, Volume 12, Issue 1, January 2020, https://doi.org/10.1080/19490976.2019.1707610

Pedro Sánchez-Sánchez, Francisco J Santonja, Alfonso Benítez-Páez, Assessment of human microbiota stability across longitudinal samples using iteratively growing-partitioned clustering, Briefings in Bioinformatics, Volume 23, Issue 2, March 2022, bbac055, https://doi.org/10.1093/bib/bbac055

Stability of individuals after iteratively performing Hartigan-Wong k-means clustering.

Description

Perform the Hartigan-Wong stats::kmeans() algorithm as many times as possible, with k ranging from 2 to the number of samples minus 1. For each value of k, an individual whose paired samples are assigned to the same cluster scores 1. If the paired samples fall in different clusters, the individual scores 0, unless the Euclidean distance between the two samples is smaller than the distance from each sample to its own centroid, in which case it also scores 1. The scores are summed over all values of k and the sum is divided by the number of values of k tested, giving a value between 0 and 1.
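The scoring rule above can be sketched in base R with `stats::kmeans()`. This is a simplified illustration under stated assumptions (exactly two samples per individual, rownames of the form `<ID><common><time>`, a single random start per k), not the code of `iterativeClustering()`; the helper name `mSsketch()` is hypothetical.

```r
# Hypothetical sketch of the mS scoring rule; NOT the package implementation.
# ASSUMES exactly two rows (sampling times) per individual and rownames
# of the form "<ID><common><time>".
mSsketch <- function(x, common = "_") {
  ids <- vapply(strsplit(rownames(x), common, fixed = TRUE), `[`, character(1), 1)
  individuals <- unique(ids)
  ks <- seq(2, nrow(x) - 1)                     # k = 2, ..., samples - 1
  hits <- matrix(0, nrow = length(ks), ncol = length(individuals),
                 dimnames = list(NULL, individuals))
  dist2 <- function(a, b) sqrt(sum((a - b)^2))  # Euclidean distance
  for (i in seq_along(ks)) {
    fit <- stats::kmeans(x, centers = ks[i], algorithm = "Hartigan-Wong")
    for (id in individuals) {
      pair <- which(ids == id)
      a <- pair[1]; b <- pair[2]
      ca <- fit$cluster[[a]]; cb <- fit$cluster[[b]]
      if (ca == cb) {
        hits[i, id] <- 1                        # same cluster: scores 1
      } else {
        # different clusters: still scores 1 if the two samples are closer
        # to each other than each one is to its own centroid
        d_ab <- dist2(x[a, ], x[b, ])
        if (d_ab < dist2(x[a, ], fit$centers[ca, ]) &&
            d_ab < dist2(x[b, ], fit$centers[cb, ])) hits[i, id] <- 1
      }
    }
  }
  colMeans(hits)                                # divide by the number of k values
}

set.seed(1)
profiles <- matrix(rnorm(6 * 5), nrow = 6)      # 6 simulated individuals, 5 features
x <- rbind(profiles + rnorm(30, sd = 0.1), profiles + rnorm(30, sd = 0.1))
rownames(x) <- c(paste0("ID", 1:6, "_1"), paste0("ID", 1:6, "_2"))
mS <- mSsketch(x)
```

Because `kmeans()` starts from random centres, the scores can vary between runs unless a seed is set.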

Usage

iterativeClustering(
  pairedTimes,
  BPPARAM = BiocParallel::bpparam(),
  common = "_"
)

Arguments

`pairedTimes`	list of matrices with paired times, i.e. the samples to be subjected to multiple clustering iterations. Output of `pairedTimes()`.
`BPPARAM`	a `BiocParallel` parameter object, e.g. `BiocParallel::SerialParam()` on Windows or `BiocParallel::bpparam()` otherwise.
`common`	pattern that separates the ID and the sampling time.

Value

µSTASIS stability score (mS) for the individuals from the corresponding paired times.

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")

Cross validation of the iterative Hartigan-Wong k-means clustering.

Description

Perform cross validation of the stability results from `iterativeClustering()` by leave-one-out (LOO) or leave-k-out, i.e. removing k individuals each time before recalculating the metric over the remaining individuals.
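A leave-k-out scheme needs two steps: choosing which individuals to leave out in each round and dropping their samples from the paired-times matrix. The sketch below assumes consecutive blocks of k individuals; the exact partitioning used by `iterativeClusteringCV()` is not stated here, and both helper names are hypothetical.

```r
# Hypothetical helpers; NOT the package implementation.
# ASSUMPTION: individuals are left out in consecutive blocks of size k.
leaveKOutSets <- function(ids, k = 1L) {
  ids <- unique(ids)
  split(ids, ceiling(seq_along(ids) / k))
}

# Drop all rows (sampling times) of the left-out individuals from a matrix
# whose rownames are "<ID><common><time>".
dropIndividuals <- function(x, out, common = "_") {
  id <- vapply(strsplit(rownames(x), common, fixed = TRUE), `[`, character(1), 1)
  x[!id %in% out, , drop = FALSE]
}

sets <- leaveKOutSets(rep(c("A", "B", "C", "D", "E"), each = 2), k = 2L)
x <- matrix(1:8, nrow = 4, dimnames = list(c("A_1", "A_2", "B_1", "B_2"), NULL))
reduced <- dropIndividuals(x, out = "A")
```

Each reduced matrix would then be scored again and compared with the scores from the full matrix.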

Usage

iterativeClusteringCV(
  pairedTimes,
  results,
  name,
  common = "_",
  k = 1L,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`pairedTimes`	list of matrices with paired times, i.e. the samples to be subjected to multiple clustering iterations. Output of `pairedTimes()`.
`results`	the list output of `iterativeClustering()`.
`name`	character; name of the paired times whose stability is being assessed.
`common`	pattern that separates the ID and the sampling time.
`k`	integer; number of individuals to remove from the data for each time running `iterativeClustering()`.
`BPPARAM`	a `BiocParallel` parameter object, e.g. `BiocParallel::SerialParam()` on Windows or `BiocParallel::bpparam()` otherwise.

Value

Multiple lists with multiple objects of class "kmeans".

Examples

data(clr)
times <- pairedTimes(data = clr[, 1:20], sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
cv_klist_t1_t25_k2 <- iterativeClusteringCV(pairedTimes = times, 
                                            results = mS, name = "t1_t25",
                                            common = "_0_", k = 2L)

Microbiota STability ASsessment via Iterative cluStering

Description

The toolkit 'µSTASIS' has been developed for the stability analysis of microbiota in a temporal framework by leveraging iterative clustering. Concretely, the core function runs the Hartigan-Wong k-means algorithm as many times as possible to stress-test whether paired samples from the same individual remain together across multiple numbers of clusters over a whole data set of individuals. The package also includes functions to subset samples from paired times, validate the results and visualize the output.

Compute the mean absolute error (MAE) in percentage after `iterativeClusteringCV()`.

Description

Compute the mean absolute error (MAE), in percentage, of each individual's mS score across the subsets of the original matrix of paired times generated by the cross validation.
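Assuming the MAE is taken, for each individual, as the mean absolute difference between the cross-validated mS scores and the mS score from the full matrix, scaled by 100, it could be computed as below. The input layout (one row per cross-validation subset, NA where the individual was left out) and the function name are hypothetical.

```r
# Hypothetical MAE sketch; NOT the code of mSerrorCV().
# original: named numeric vector of mS scores from the full matrix.
# cv: matrix with one row per CV subset and one column per individual;
#     NA where that individual was left out of the subset (ASSUMED layout).
maePercent <- function(original, cv) {
  cv <- cv[, names(original), drop = FALSE]
  err <- abs(sweep(cv, 2, original))   # |cv score - original score|
  colMeans(err, na.rm = TRUE) * 100    # mean over subsets, as a percentage
}

original <- c(A = 0.8, B = 0.5)
cv <- rbind(c(A = 0.7, B = NA),
            c(A = 0.9, B = 0.6),
            c(A = NA,  B = 0.5))
mae <- maePercent(original, cv)
```

Here individual A deviates by 0.1 in both subsets that include it (MAE of 10) and B by 0.1 and 0 (MAE of 5).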

Usage

mSerrorCV(pairedTime, CVklist, k = 1L)

Arguments

`pairedTime`	input matrix with the paired times whose stability has been assessed; one element of the list returned by `pairedTimes()`.
`CVklist`	list resulting from `iterativeClusteringCV()`.
`k`	integer; number of individuals removed from the data in each cross-validation iteration. Must be the same value used in `iterativeClusteringCV()`.

Value

A vector with MAE values for each individual's mS score.

Examples

data(clr)
times <- pairedTimes(data = clr[, 1:20], sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
cv_klist_t1_t25_k2 <- iterativeClusteringCV(pairedTimes = times, 
                                            results = mS, name = "t1_t25",
                                            common = "_0_", k = 2L)
MAE_t1_t25 <- mSerrorCV(pairedTime = times$t1_t25, 
                        CVklist = cv_klist_t1_t25_k2,  k = 2L)
MAE <- mSpreviz(results = list(MAE_t1_t25), 
                times = list(t1_t25 = times$t1_t25))
plotmSheatmap(results = MAE, times = c("t1_t25", "t25_t26"), label = TRUE,
              high = 'red2',  low = 'forestgreen', midpoint = 5)

Internal function for `pairedTimes()`.

Description

Internal function for pairedTimes().

Usage

mSinternalPairedTimes(data, specifiedTimePoints, common = "_")

Arguments

`data`	matrix with rownames including ID, common pattern and sampling time.
`specifiedTimePoints`	character vector to specify the selection of concrete paired times.
`common`	pattern separating the ID and the sampling time in rownames.

Value

A list of matrices with the same number of columns as input and with samples from paired sampling times as rows.

Examples

data(clr)
t1_t2 <- mSinternalPairedTimes(data = clr, 
                               specifiedTimePoints = c("1", "25"), 
                               common = "_0_")

Easily extract groups of individuals from sample metadata.

Description

Easily extract groups of individuals from sample metadata.

Usage

mSmetadataGroups(
  metadata,
  samples,
  individuals,
  variable,
  common,
  ID,
  timePoints
)

Arguments

`metadata`	input data.frame with data corresponding to samples. It can be the `SummarizedExperiment::colData()` from the TreeSummarizedExperiment.
`samples`	vector from metadata with the sample IDs, if applicable; should be NULL if ID and timePoints are provided instead, e.g. from a TreeSummarizedExperiment.
`individuals`	vector of individuals; first column of the `mSpreviz()` output.
`variable`	column name with the variable used for grouping individuals.
`common`	pattern that separates the ID and the sampling time in rownames, if applicable.
`ID`	if applicable, the name of the colData() column of the TreeSummarizedExperiment that identifies individuals.
`timePoints`	if applicable, the name of the colData() column of the TreeSummarizedExperiment that contains the sampling times.

Value

A vector with the same length as the number of rows in the mSpreviz() output.
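One way to map sample-level metadata onto individuals is to recover each sample's ID from its name and look up the group of that individual's first sample. This hypothetical sketch covers only the matrix-rownames case (it ignores the TreeSummarizedExperiment route) and assumes each individual belongs to a single group.

```r
# Hypothetical sketch; NOT the code of mSmetadataGroups().
# ASSUMES sample names "<ID><common><time>" and one group per individual.
groupsSketch <- function(metadata, samples, individuals, variable, common) {
  sampleID <- vapply(strsplit(samples, common, fixed = TRUE), `[`, character(1), 1)
  # match() returns the first sample of each individual
  metadata[[variable]][match(individuals, sampleID)]
}

metadata <- data.frame(Sample = c("A_0_1", "A_0_25", "B_0_1", "B_0_25"),
                       age = c("youth", "youth", "old", "old"),
                       stringsAsFactors = FALSE)
g <- groupsSketch(metadata, metadata$Sample, individuals = c("A", "B", "A"),
                  variable = "age", common = "_0_")
```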

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
results <- mSpreviz(results = mS, times = times)
metadata <- data.frame(Sample = rownames(clr), age = c(rep("youth", 65), 
                       rep("old", 131-65)))
group <- mSmetadataGroups(metadata = metadata, samples = metadata$Sample, 
                          common = "_0_", individuals = results$individual, 
                          variable = "age")

Process the `iterativeClustering()` output to a new format ready for the implemented visualization functions.

Description

Process the iterativeClustering() output to a new format ready for the implemented visualization functions.
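Based on how the other functions use this output (an `individual` first column and one column per paired time), the reshaping can be sketched as follows. The input is assumed to be a named list of named numeric vectors, and individuals missing from a paired time are filled with NA; both are assumptions, and `previzSketch()` is hypothetical.

```r
# Hypothetical sketch of a wide "individual x paired time" table;
# NOT the code of mSpreviz().
previzSketch <- function(results) {
  ids <- sort(unique(unlist(lapply(results, names))))
  out <- data.frame(individual = ids, stringsAsFactors = FALSE)
  for (nm in names(results)) {
    # indexing by name gives NA for individuals absent from this paired time
    out[[nm]] <- unname(results[[nm]][ids])
  }
  out
}

res <- list(t1_t25 = c(A = 0.9, B = 0.4), t25_t26 = c(A = 0.7, C = 1))
df <- previzSketch(res)
```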

Usage

mSpreviz(results, times)

Arguments

`results`	list; output of `iterativeClustering()`.
`times`	list; output of `pairedTimes()`.

Value

A data frame ready for use with the implemented visualization functions and other downstream analyses.

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
results <- mSpreviz(results = mS, times = times)

Generate one or multiple matrices with paired times.

Description

Generate one or multiple matrices with paired times.

Usage

pairedTimes(data, ...)

## S4 method for signature 'matrix'
pairedTimes(data, sequential, common, specifiedTimePoints)

## S4 method for signature 'TreeSummarizedExperiment'
pairedTimes(
  data,
  sequential,
  assay,
  alternativeExp,
  ID,
  timePoints,
  specifiedTimePoints
)

Arguments

`data`	input object: either a matrix with rownames including ID, common pattern and sampling time, or a TreeSummarizedExperiment object.
`...`	additional arguments; currently not used.
`sequential`	logical; TRUE if the paired times to analyse are consecutive and their alphanumerical order matches the desired temporal order.
`common`	If is.matrix(data), pattern that separates the ID and the sampling time in rownames.
`specifiedTimePoints`	character vector to specify the selection of concrete paired times.
`assay`	If class(data) == "TreeSummarizedExperiment", name of the assay to use.
`alternativeExp`	If class(data) == "TreeSummarizedExperiment", name of the alternative experiment to use (if applicable).
`ID`	If class(data) == "TreeSummarizedExperiment", one of the colData(data) colnames should be given as individuals.
`timePoints`	If class(data) == "TreeSummarizedExperiment", one of the colData(data) colnames should be given as sampling times.
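With `sequential = TRUE`, consecutive time points are presumably paired in the order they sort alphanumerically, which is why that order must match the intended temporal order. The hypothetical helper below illustrates the idea using the `t<time>_t<time>` naming seen in the examples; note that `"10"` sorts before `"2"`.

```r
# Hypothetical helper; NOT the package code. Pairs each time point with the
# next one in alphanumerical (character) order.
sequentialPairNames <- function(timePoints) {
  tp <- sort(unique(timePoints))
  paste0("t", head(tp, -1), "_t", tail(tp, -1))
}

pairs <- sequentialPairNames(c("25", "1", "26", "1", "27"))
pitfall <- sequentialPairNames(c("2", "10"))   # "10" sorts before "2"
```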

Value

A list of matrices with the same number of columns as input and with samples from paired sampling times as rows.
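Building a single paired-times matrix amounts to splitting the rownames at `common`, keeping the individuals sampled at both time points and stacking their rows. A minimal base R sketch, with a hypothetical function name and rows ordered by ID and then time (an assumption):

```r
# Hypothetical sketch of one paired-times matrix; NOT the package code.
# ASSUMES rownames "<ID><common><time>".
pairTimesSketch <- function(x, times, common = "_") {
  parts <- strsplit(rownames(x), common, fixed = TRUE)
  id <- vapply(parts, `[`, character(1), 1)
  tp <- vapply(parts, `[`, character(1), 2)
  # keep only individuals sampled at both requested time points
  shared <- intersect(id[tp == times[1]], id[tp == times[2]])
  keep <- id %in% shared & tp %in% times
  out <- x[keep, , drop = FALSE]
  out[order(id[keep], tp[keep]), , drop = FALSE]
}

x <- matrix(1:10, nrow = 5,
            dimnames = list(c("A_0_1", "A_0_25", "B_0_1", "C_0_1", "C_0_25"), NULL))
p <- pairTimesSketch(x, times = c("1", "25"), common = "_0_")
```

Individual B, sampled only at time 1, is dropped.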

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
times_b <- pairedTimes(data = clr, sequential = FALSE, common = "_0_", 
                       specifiedTimePoints = c("1", "26"))

Generate boxplots of the stability dynamics throughout sampling times by groups.

Description

Generate boxplots of the stability dynamics throughout sampling times by groups.

Usage

plotmSdynamics(results, groups, points = TRUE, linetype = 2)

Arguments

`results`	input data.frame resulting from `mSpreviz()`.
`groups`	vector with the same length as individuals, i.e. the number of rows in the `mSpreviz()` output.
`points`	logical; FALSE to only visualize boxplots or TRUE to also add individual points.
`linetype`	numeric; type of the line connecting the median values across paired times; 0 to omit the line.
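The medians connected by that line can be computed per group and paired time as below. This base R sketch covers only the summary, not the plotting; it assumes the wide `mSpreviz()` layout and a `groups` vector aligned with its rows, and `groupMedians()` is hypothetical.

```r
# Hypothetical sketch of the per-group medians; NOT the code of plotmSdynamics().
groupMedians <- function(results, groups, times) {
  long <- data.frame(
    time  = rep(times, each = nrow(results)),
    group = rep(groups, times = length(times)),
    mS    = unlist(results[times], use.names = FALSE),  # column by column
    stringsAsFactors = FALSE
  )
  aggregate(mS ~ time + group, data = long, FUN = median)
}

df <- data.frame(individual = c("A", "B", "C", "D"),
                 t1_t25 = c(0.9, 0.7, 0.2, 0.4),
                 t25_t26 = c(0.5, 0.6, 0.8, 1.0))
m <- groupMedians(df, groups = c("youth", "youth", "old", "old"),
                  times = c("t1_t25", "t25_t26"))
```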

Value

A plot with as many boxes as paired times by group in the form of a ggplot2::ggplot() object.

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
results <- mSpreviz(results = mS, times = times)
metadata <- data.frame(Sample = rownames(clr), age = c(rep("youth", 65), 
                       rep("old", 131-65)))
group <- mSmetadataGroups(metadata = metadata, samples = metadata$Sample, 
                          common = "_0_", individuals = results$individual, 
                          variable = "age")
plotmSdynamics(results, groups = group, points = TRUE, linetype = 0)

Plot a heatmap of the stability results.

Description

Plot a heatmap of the stability results.

Usage

plotmSheatmap(
  results,
  order = NULL,
  times,
  label = FALSE,
  low = "red2",
  mid = "yellow",
  high = "forestgreen",
  midpoint = 0.5
)

Arguments

`results`	input data.frame resulting from `mSpreviz()`.
`order`	NULL object or character: none, mean or median; if the individuals should be sorted by any of those statistics of the stability values.
`times`	character; names of the paired times to plot, i.e. colnames of results.
`label`	logical; TRUE to print the mS score or FALSE to not.
`low`	color for the lowest value.
`mid`	color for the middle value.
`high`	color for the highest value.
`midpoint`	value at which the middle color is placed.
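Sorting individuals by a summary of their stability values, as the `order` argument does, can be sketched as follows. Ascending order and NA removal are assumptions, and `orderIndividuals()` is hypothetical.

```r
# Hypothetical sketch; NOT the package code. Orders individuals by the mean
# or median of their mS scores over the selected paired times (ascending).
orderIndividuals <- function(results, times, by = c("mean", "median")) {
  by <- match.arg(by)
  FUN <- if (by == "mean") mean else median
  stat <- apply(results[, times, drop = FALSE], 1, FUN, na.rm = TRUE)
  results$individual[order(stat)]
}

df <- data.frame(individual = c("A", "B", "C"),
                 t1_t25 = c(0.9, 0.2, 0.5),
                 t25_t26 = c(0.7, 0.4, 0.6))
ord <- orderIndividuals(df, times = c("t1_t25", "t25_t26"), by = "mean")
```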

Value

A heatmap of the stability values in the form of a ggplot2::ggplot() object.

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
results <- mSpreviz(results = mS, times = times)
plotmSheatmap(results = results, order = "mean", times = c("t1_t25", "t25_t26"), 
              label = TRUE)

Plot the stability values after `iterativeClusteringCV()`.

Description

Plot lines connecting the mS score for each subset of the original matrix of paired times.

Usage

plotmSlinesCV(pairedTime, CVklist, k = 1L, points = TRUE, sizeLine = 0.5)

Arguments

`pairedTime`	input matrix with the paired times whose stability has been assessed; one element of the list returned by `pairedTimes()`.
`CVklist`	list resulting from `iterativeClusteringCV()`.
`k`	integer; number of individuals removed from the data in each cross-validation iteration. Must be the same value used in `iterativeClusteringCV()`.
`points`	logical; FALSE to plot only lines or TRUE to also add points marking the mS scores of the original matrix of paired samples.
`sizeLine`	numeric; width of the lines.

Value

A line plot in the form of a ggplot2::ggplot() object with the values of stability for the multiple subsets and the original matrix of paired samples (points).

Examples

data(clr)
times <- pairedTimes(data = clr[, 1:20], sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
cv_klist_t1_t25_k2 <- iterativeClusteringCV(pairedTimes = times, 
                                            results = mS, name = "t1_t25",
                                            common = "_0_", k = 2L)
plotmSlinesCV(pairedTime = times$t1_t25, CVklist = cv_klist_t1_t25_k2, k = 2L)

Plot a scatter and side boxplot of the stability results.

Description

Plot a scatter and side boxplot of the stability results.

Usage

plotmSscatter(results, order = NULL, times, gridLines = FALSE, sideScale = 0.3)

Arguments

`results`	input data.frame resulting from `mSpreviz()`.
`order`	NULL object or character: mean or median; if the individuals should be sorted by any of those statistics of the stability values.
`times`	a vector with the names of each paired time, e.g. "t1_t2".
`gridLines`	logical; FALSE to print a blank background or TRUE to include a gray grid.
`sideScale`	numeric; scale of the side boxplot.

Value

A scatter plot and a side boxplot of the stability values in the form of a ggplot2::ggplot() object.

Examples

data(clr)
times <- pairedTimes(data = clr, sequential = TRUE, common = "_0_")
mS <- iterativeClustering(pairedTimes = times, common = "_")
results <- mSpreviz(results = mS, times = times)
plotmSscatter(results = results, order = "median", 
              times = c("t1_t25", "t25_t26"), 
              gridLines = TRUE, sideScale = 0.2)
