Title: | Microbiome analysis |
---|---|
Description: | mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common task are implemented such as community indices calculation and summarization. |
Authors: | Tuomas Borman [aut, cre] , Felix G.M. Ernst [aut] , Sudarshan A. Shetty [aut] , Leo Lahti [aut] , Yang Cao [ctb], Nathan D. Olson [ctb], Levi Waldron [ctb], Marcel Ramos [ctb], Héctor Corrada Bravo [ctb], Jayaram Kancherla [ctb], Domenick Braccia [ctb], Basil Courbayre [ctb], Muluh Muluh [ctb], Giulio Benedetti [ctb], Moritz Emanuel Beber [ctb] , Nitesh Turaga [ctb], Chouaib Benchraka [ctb], Akewak Jeba [ctb], Himmi Lindgren [ctb], Noah De Gunst [ctb], Théotime Pralas [ctb], Shadman Ishraq [ctb], Eineje Ameh [ctb], Artur Sannikov [ctb], Hervé Pagès [ctb], Rajesh Shigdel [ctb], Katariina Pärnänen [ctb], Pande Erawijantari [ctb], Danielle Callan [ctb] |
Maintainer: | Tuomas Borman <[email protected]> |
License: | Artistic-2.0 | file LICENSE |
Version: | 1.15.6 |
Built: | 2025-01-15 03:14:20 UTC |
Source: | https://github.com/bioc/mia |
mia
Package.mia
implements tools for microbiome analysis based on the
SummarizedExperiment
, SingleCellExperiment
and
TreeSummarizedExperiment
infrastructure. Data wrangling and analysis
in the context of taxonomic data is the main scope. Additional functions for
common task are implemented such as community indices calculation and
summarization.
Maintainer: Tuomas Borman [email protected] (ORCID)
Authors:
Felix G.M. Ernst [email protected] (ORCID)
Sudarshan A. Shetty [email protected] (ORCID)
Leo Lahti [email protected] (ORCID)
Other contributors:
Yang Cao [contributor]
Nathan D. Olson [email protected] [contributor]
Levi Waldron [contributor]
Marcel Ramos [contributor]
Héctor Corrada Bravo [contributor]
Jayaram Kancherla [contributor]
Domenick Braccia [email protected] [contributor]
Basil Courbayre [contributor]
Muluh Muluh [contributor]
Giulio Benedetti [contributor]
Moritz Emanuel Beber [email protected] (ORCID) [contributor]
Nitesh Turaga [contributor]
Chouaib Benchraka [contributor]
Akewak Jeba [contributor]
Himmi Lindgren [contributor]
Noah De Gunst [contributor]
Théotime Pralas [contributor]
Shadman Ishraq [contributor]
Eineje Ameh [contributor]
Artur Sannikov [contributor]
Hervé Pagès [contributor]
Rajesh Shigdel [contributor]
Katariina Pärnänen [contributor]
Pande Erawijantari [contributor]
Danielle Callan [contributor]
The function estimates alpha diversity indices optionally using rarefaction,
then stores results in colData
.
addAlpha( x, assay.type = "counts", index = c("coverage_diversity", "fisher_diversity", "faith_diversity", "gini_simpson_diversity", "inverse_simpson_diversity", "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance", "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance", "relative_dominance", "simpson_lambda_dominance", "camargo_evenness", "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness", "ace_richness", "chao1_richness", "hill_richness", "observed_richness"), name = index, niter = NULL, ... ) ## S4 method for signature 'SummarizedExperiment' addAlpha( x, assay.type = "counts", index = c("coverage_diversity", "fisher_diversity", "faith_diversity", "gini_simpson_diversity", "inverse_simpson_diversity", "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance", "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance", "relative_dominance", "simpson_lambda_dominance", "camargo_evenness", "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness", "ace_richness", "chao1_richness", "hill_richness", "observed_richness"), name = index, niter = NULL, ... )
addAlpha( x, assay.type = "counts", index = c("coverage_diversity", "fisher_diversity", "faith_diversity", "gini_simpson_diversity", "inverse_simpson_diversity", "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance", "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance", "relative_dominance", "simpson_lambda_dominance", "camargo_evenness", "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness", "ace_richness", "chao1_richness", "hill_richness", "observed_richness"), name = index, niter = NULL, ... ) ## S4 method for signature 'SummarizedExperiment' addAlpha( x, assay.type = "counts", index = c("coverage_diversity", "fisher_diversity", "faith_diversity", "gini_simpson_diversity", "inverse_simpson_diversity", "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance", "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance", "relative_dominance", "simpson_lambda_dominance", "camargo_evenness", "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness", "ace_richness", "chao1_richness", "hill_richness", "observed_richness"), name = index, niter = NULL, ... )
x |
a |
assay.type |
|
index |
|
name |
|
niter |
|
... |
optional arguments:
|
Alpha diversity is a joint quantity that combines elements or community richness and evenness. Diversity increases, in general, when species richness or evenness increase.
The following diversity indices are available:
'coverage': Number of species needed to cover a given fraction of the ecosystem (50 percent by default). Tune this with the threshold argument.
'faith': Faith's phylogenetic alpha diversity index measures how long the taxonomic distance is between taxa that are present in the sample. Larger values represent higher diversity. Using this index requires rowTree. (Faith 1992)
If the data includes features that are not in tree's tips but in
internal nodes, there are two options. First, you can keep those features,
and prune the tree to match features so that each tip can be found from
the features. Other option is to remove all features that are not tips.
(See only.tips
parameter)
'fisher': Fisher's alpha; as implemented in
vegan::fisher.alpha
. (Fisher et al. 1943)
'gini_simpson': Gini-Simpson diversity i.e. ,
where
is the
Simpson index, calculated as the sum of squared relative abundances.
This corresponds to the diversity index
'simpson' in
vegan::diversity
.
This is also called Gibbs–Martin, or Blau index in sociology,
psychology and management studies. The Gini-Simpson index (1-lambda)
should not be
confused with Simpson's dominance (lambda), Gini index, or
inverse Simpson index (1/lambda).
'inverse_simpson': Inverse Simpson diversity:
where
and p refers to relative
abundances.
This corresponds to the diversity index
'invsimpson' in vegan::diversity. Don't confuse this with the
closely related Gini-Simpson index
'log_modulo_skewness': The rarity index characterizes the concentration of species at low abundance. Here, we use the skewness of the frequency distribution of arithmetic abundance classes (see Magurran & McGill 2011). These are typically right-skewed; to avoid taking log of occasional negative skews, we follow Locey & Lennon (2016) and use the log-modulo transformation that adds a value of one to each measure of skewness to allow logarithmization.
'shannon': Shannon diversity (entropy).
A dominance index quantifies the dominance of one or few species in a community. Greater values indicate higher dominance.
Dominance indices are in general negatively correlated with alpha diversity indices (species richness, evenness, diversity, rarity). More dominant communities are less diverse.
The following community dominance indices are available:
'absolute': Absolute index equals to the absolute abundance of the
most dominant n species of the sample (specify the number with the argument
ntaxa
). Index gives positive integer values.
'dbp': Berger-Parker index (See Berger & Parker 1970) calculation is a special case of the 'relative' index. dbp is the relative abundance of the most abundant species of the sample. Index gives values in interval 0 to 1, where bigger value represent greater dominance.
where is the absolute abundance of the most
dominant species and
is the sum of absolute abundances of all
species.
'core_abundance': Core abundance index is related to core species. Core species are species that are most abundant in all samples, i.e., in whole data set. Core species are defined as those species that have prevalence over 50\ species must be prevalent in 50\ calculate the core abundance index. Core abundance index is sum of relative abundances of core species in the sample. Index gives values in interval 0 to 1, where bigger value represent greater dominance.
where is the sum of absolute
abundance of the core species and
is the sum of absolute
abundances of all species.
'gini': Gini index is probably best-known from socio-economic contexts (Gini 1921). In economics, it is used to measure, for example, how unevenly income is distributed among population. Here, Gini index is used similarly, but income is replaced with abundance.
If there is small group of species that represent large portion of total abundance of microbes, the inequality is large and Gini index closer to 1. If all species has equally large abundances, the equality is perfect and Gini index equals 0. This index should not be confused with Gini-Simpson index, which quantifies diversity.
'dmn': McNaughton’s index is the sum of relative abundances of the two most abundant species of the sample (McNaughton & Wolf, 1970). Index gives values in the unit interval:
where and
are the absolute
abundances of the two most dominant species and
is the sum of
absolute abundances of all species.
'relative': Relative index equals to the relative abundance of the
most dominant n species of the sample (specify the number with the
argument ntaxa
).
This index gives values in interval 0 to 1.
where is the absolute abundance of the most
dominant species and
is the sum of absolute abundances of all
species.
'simpson_lambda': Simpson's (dominance) index or Simpson's lambda is the sum of squared relative abundances. This index gives values in the unit interval. This value equals the probability that two randomly chosen individuals belongs to the same species. The higher the probability, the greater the dominance (See e.g. Simpson 1949).
where p refers to relative abundances.
There is also a more advanced Simpson dominance index (Simpson 1949). However, this is not provided and the simpler squared sum of relative abundances is used instead as the alternative index is not in the unit interval and it is highly correlated with the simpler variant implemented here.
Evenness is a standard index in community ecology, and it quantifies how evenly the abundances of different species are distributed. The following evenness indices are provided:
By default, this function returns all indices.
The available evenness indices include the following (all in lowercase):
'camargo': Camargo's evenness (Camargo 1992)
'simpson_evenness': Simpson’s evenness is calculated as inverse Simpson diversity (1/lambda) divided by observed species richness S: (1/lambda)/S.
'pielou': Pielou's evenness (Pielou, 1966), also known as Shannon or Shannon-Weaver/Wiener/Weiner evenness; H/ln(S). The Shannon-Weaver is the preferred term; see Spellerberg and Fedor (2003).
'evar': Smith and Wilson’s Evar index (Smith & Wilson 1996).
'bulla': Bulla’s index (O) (Bulla 1994).
Desirable statistical evenness metrics avoid strong bias towards very large or very small abundances; are independent of richness; and range within the unit interval with increasing evenness (Smith & Wilson 1996). Evenness metrics that fulfill these criteria include at least camargo, simpson, smith-wilson, and bulla. Also see Magurran & McGill (2011) and Beisel et al. (2003) for further details.
The richness is calculated per sample. This is a standard index in community ecology, and it provides an estimate of the number of unique species in the community. This is often not directly observed for the whole community but only for a limited sample from the community. This has led to alternative richness indices that provide different ways to estimate the species richness.
Richness index differs from the concept of species diversity or evenness in that it ignores species abundance, and focuses on the binary presence/absence values that indicate simply whether the species was detected.
The function takes all index names in full lowercase. The user can provide
the desired spelling through the argument name
(see examples).
The following richness indices are provided.
'ace': Abundance-based coverage estimator (ACE) is another
nonparametric richness
index that uses sample coverage, defined based on the sum of the
probabilities
of the observed species. This method divides the species into abundant
(more than 10
reads or observations) and rare groups
in a sample and tends to underestimate the real number of species. The
ACE index
ignores the abundance information for the abundant species,
based on the assumption that the abundant species are observed regardless
of their
exact abundance. We use here the bias-corrected version
(O'Hara 2005, Chiu et al. 2014) implemented in
estimateR
.
For an exact formulation, see estimateR
.
Note that this index comes with an additional column with standard
error information.
'chao1': This is a nonparametric estimator of species richness. It
assumes that rare species carry information about the (unknown) number
of unobserved species. We use here the bias-corrected version
(O'Hara 2005, Chiu et al. 2014) implemented in
estimateR
. This index implicitly
assumes that every taxa has equal probability of being observed. Note
that it gives a lower bound to species richness. The bias-corrected
for an exact formulation, see estimateR
.
This estimator uses only the singleton and doubleton counts, and
hence it gives more weight to the low abundance species.
Note that this index comes with an additional column with standard
error information.
'hill': Effective species richness aka Hill index (see e.g. Chao et al. 2016). Currently only the case 1D is implemented. This corresponds to the exponent of Shannon diversity. Intuitively, the effective richness indicates the number of species whose even distribution would lead to the same diversity than the observed community, where the species abundances are unevenly distributed.
'observed': The observed richness gives the number of species that
is detected above a given detection
threshold in the observed sample
(default 0). This is conceptually the simplest richness index. The
corresponding index in the vegan package is "richness".
x
with additional colData
column(s) named code
Beisel J-N. et al. (2003) A Comparative Analysis of Diversity Index Sensitivity. Internal Rev. Hydrobiol. 88(1):3-15. https://portais.ufg.br/up/202/o/2003-comparative_evennes_index.pdf
Berger WH & Parker FL (1970) Diversity of Planktonic Foraminifera in Deep-Sea Sediments. Science 168(3937):1345-1347. doi: 10.1126/science.168.3937.1345
Bulla L. (1994) An index of diversity and its associated diversity measure. Oikos 70:167–171
Camargo, JA. (1992) New diversity index for assessing structural alterations in aquatic communities. Bull. Environ. Contam. Toxicol. 48:428–434.
Chao A. (1984) Non-parametric estimation of the number of classes in a population. Scand J Stat. 11:265–270.
Chao A, Chun-Huo C, Jost L (2016). Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers. Biodiversity Conservation and Phylogenetic Systematics, Springer International Publishing, pp. 141–172, doi:10.1007/978-3-319-22461-9_8.
Chiu, C.H., Wang, Y.T., Walther, B.A. & Chao, A. (2014). Improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70, 671-682.
Faith D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61(1):1-10.
Fisher R.A., Corbet, A.S. & Williams, C.B. (1943) The relation between the number of species and the number of individuals in a random sample of animal population. Journal of Animal Ecology 12, 42-58.
Gini C (1921) Measurement of Inequality of Incomes. The Economic Journal 31(121): 124-126. doi: 10.2307/2223319
Locey KJ and Lennon JT. (2016) Scaling laws predict global microbial diversity. PNAS 113(21):5970-5975; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12.
McNaughton, SJ and Wolf LL. (1970). Dominance and the niche in ecological systems. Science 167:13, 1–139
O'Hara, R.B. (2005). Species richness estimators: how many species can dance on the head of a pin? J. Anim. Ecol. 74, 375-386.
Pielou, EC. (1966) The measurement of diversity in different types of biological collections. J Theoretical Biology 13:131–144.
Simpson EH (1949) Measurement of Diversity. Nature 163(688). doi: 10.1038/163688a0
Smith B and Wilson JB. (1996) A Consumer's Guide to Evenness Indices. Oikos 76(1):70-82.
Spellerberg and Fedor (2003). A tribute to Claude Shannon (1916 –2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon–Wiener’ Index. Alpha Ecology & Biogeography 12, 177–197.
data("GlobalPatterns") tse <- GlobalPatterns # Calculate the default Shannon index with no rarefaction tse <- addAlpha(tse, index = "shannon") # Shows the estimated Shannon index tse$shannon # Calculate observed richness with 10 rarefaction rounds tse <- addAlpha(tse, assay.type = "counts", index = "observed_richness", sample = min(colSums(assay(tse, "counts")), na.rm = TRUE), niter=10) # Shows the estimated observed richness tse$observed_richness
data("GlobalPatterns") tse <- GlobalPatterns # Calculate the default Shannon index with no rarefaction tse <- addAlpha(tse, index = "shannon") # Shows the estimated Shannon index tse$shannon # Calculate observed richness with 10 rarefaction rounds tse <- addAlpha(tse, assay.type = "counts", index = "observed_richness", sample = min(colSums(assay(tse, "counts")), na.rm = TRUE), niter=10) # Shows the estimated observed richness tse$observed_richness
This function returns a SummarizedExperiment
with clustering
information in its colData or rowData
addCluster( x, BLUSPARAM, assay.type = assay_name, assay_name = "counts", by = MARGIN, MARGIN = "rows", full = FALSE, name = "clusters", clust.col = "clusters", ... ) ## S4 method for signature 'SummarizedExperiment' addCluster( x, BLUSPARAM, assay.type = assay_name, assay_name = "counts", by = MARGIN, MARGIN = "rows", full = FALSE, name = "clusters", clust.col = "clusters", ... )
addCluster( x, BLUSPARAM, assay.type = assay_name, assay_name = "counts", by = MARGIN, MARGIN = "rows", full = FALSE, name = "clusters", clust.col = "clusters", ... ) ## S4 method for signature 'SummarizedExperiment' addCluster( x, BLUSPARAM, assay.type = assay_name, assay_name = "counts", by = MARGIN, MARGIN = "rows", full = FALSE, name = "clusters", clust.col = "clusters", ... )
x |
A
|
BLUSPARAM |
A BlusterParam object specifying the algorithm to use. |
assay.type |
|
assay_name |
Deprecated. Use |
by |
|
MARGIN |
Deprecated. Use |
full |
Logical scalar indicating whether the full clustering statistics should be returned for each method. |
name |
|
clust.col |
|
... |
Additional parameters to use altExps for example |
This is a wrapper for the clusterRows
function from the
bluster
package.
When setting full = TRUE
, the clustering information will be stored in
the metadata of the object.
By default, clustering is done on the features.
addCluster
returns an object of the same type as the x
parameter
with clustering information named clusters
stored in colData
or rowData
.
library(bluster) data(GlobalPatterns, package = "mia") tse <- GlobalPatterns # Cluster on rows using Kmeans tse <- addCluster(tse, KmeansParam(centers = 3)) # Clustering done on the samples using Hclust tse <- addCluster(tse, by = "samples", HclustParam(metric = "bray", dist.fun = vegan::vegdist)) # Getting the clusters colData(tse)$clusters
library(bluster) data(GlobalPatterns, package = "mia") tse <- GlobalPatterns # Cluster on rows using Kmeans tse <- addCluster(tse, KmeansParam(centers = 3)) # Clustering done on the samples using Hclust tse <- addCluster(tse, by = "samples", HclustParam(metric = "bray", dist.fun = vegan::vegdist)) # Getting the clusters colData(tse)$clusters
Estimate divergence against a given reference sample.
addDivergence(x, name = "divergence", ...) ## S4 method for signature 'SummarizedExperiment' addDivergence(x, name = "divergence", ...) getDivergence( x, assay.type = assay_name, assay_name = "counts", reference = "median", method = "bray", ... ) ## S4 method for signature 'SummarizedExperiment' getDivergence( x, assay.type = assay_name, assay_name = "counts", reference = "median", method = "bray", ... )
addDivergence(x, name = "divergence", ...) ## S4 method for signature 'SummarizedExperiment' addDivergence(x, name = "divergence", ...) getDivergence( x, assay.type = assay_name, assay_name = "counts", reference = "median", method = "bray", ... ) ## S4 method for signature 'SummarizedExperiment' getDivergence( x, assay.type = assay_name, assay_name = "counts", reference = "median", method = "bray", ... )
x |
a |
name |
|
... |
optional arguments passed to
|
assay.type |
|
assay_name |
Deprecated. Use |
reference |
|
method |
|
Microbiota divergence (heterogeneity / spread) within a given sample set can be quantified by the average sample dissimilarity or beta diversity with respect to a given reference sample.
The calculation makes use of the function getDissimilarity()
. The
divergence
measure is sensitive to sample size. Subsampling or bootstrapping can be
applied to equalize sample sizes between comparisons.
x
with additional colData
named name
data(GlobalPatterns) tse <- GlobalPatterns # By default, reference is median of all samples. The name of column where # results is "divergence" by default, but it can be specified. tse <- addDivergence(tse) # The method that are used to calculate distance in divergence and # reference can be specified. Here, euclidean distance is used. Reference is # the first sample. It is recommended # to add reference to colData. tse[["reference"]] <- rep(colnames(tse)[[1]], ncol(tse)) tse <- addDivergence( tse, name = "divergence_first_sample", reference = "reference", method = "euclidean") # Here we compare samples to global mean tse <- addDivergence(tse, name = "divergence_average", reference = "mean") # All three divergence results are stored in colData. colData(tse)
data(GlobalPatterns) tse <- GlobalPatterns # By default, reference is median of all samples. The name of column where # results is "divergence" by default, but it can be specified. tse <- addDivergence(tse) # The method that are used to calculate distance in divergence and # reference can be specified. Here, euclidean distance is used. Reference is # the first sample. It is recommended # to add reference to colData. tse[["reference"]] <- rep(colnames(tse)[[1]], ncol(tse)) tse <- addDivergence( tse, name = "divergence_first_sample", reference = "reference", method = "euclidean") # Here we compare samples to global mean tse <- addDivergence(tse, name = "divergence_average", reference = "mean") # All three divergence results are stored in colData. colData(tse)
These functions perform Latent Dirichlet Allocation on data stored in a
TreeSummarizedExperiment
object.
getLDA(x, ...) addLDA(x, ...) ## S4 method for signature 'SummarizedExperiment' getLDA(x, k = 2, assay.type = "counts", eval.metric = "perplexity", ...) ## S4 method for signature 'SummarizedExperiment' addLDA(x, k = 2, assay.type = "counts", name = "LDA", ...)
getLDA(x, ...) addLDA(x, ...) ## S4 method for signature 'SummarizedExperiment' getLDA(x, k = 2, assay.type = "counts", eval.metric = "perplexity", ...) ## S4 method for signature 'SummarizedExperiment' addLDA(x, k = 2, assay.type = "counts", name = "LDA", ...)
x |
a
|
... |
optional arguments passed to |
k |
|
assay.type |
|
eval.metric |
|
name |
|
The functions getLDA
and addLDA
internally use
LDA
to compute the ordination matrix and
feature loadings.
For getLDA
, the ordination matrix with feature loadings matrix
as attribute "loadings"
.
For addLDA
, a
TreeSummarizedExperiment
object is returned containing the ordination matrix in
reducedDim(..., name)
with feature loadings matrix as attribute
"loadings"
.
data(GlobalPatterns) tse <- GlobalPatterns # Reduce the number of features tse <- agglomerateByPrevalence(tse, rank="Phylum") # Run LDA and add the result to reducedDim(tse, "LDA") tse <- addLDA(tse) # Extract feature loadings loadings <- attr(reducedDim(tse, "LDA"), "loadings") head(loadings) # Estimate models with number of topics from 2 to 10 tse <- addLDA(tse, k = c(2, 3, 4, 5, 6, 7, 8, 9, 10), name = "LDA_10") # Get the evaluation metrics tab <- attr(reducedDim(tse, "LDA_10"),"eval_metrics") # Plot plot(tab[["k"]], tab[["perplexity"]], xlab = "k", ylab = "perplexity")
data(GlobalPatterns) tse <- GlobalPatterns # Reduce the number of features tse <- agglomerateByPrevalence(tse, rank="Phylum") # Run LDA and add the result to reducedDim(tse, "LDA") tse <- addLDA(tse) # Extract feature loadings loadings <- attr(reducedDim(tse, "LDA"), "loadings") head(loadings) # Estimate models with number of topics from 2 to 10 tse <- addLDA(tse, k = c(2, 3, 4, 5, 6, 7, 8, 9, 10), name = "LDA_10") # Get the evaluation metrics tab <- attr(reducedDim(tse, "LDA_10"),"eval_metrics") # Plot plot(tab[["k"]], tab[["perplexity"]], xlab = "k", ylab = "perplexity")
These functions perform Non-negative Matrix Factorization on data stored in a
TreeSummarizedExperiment
object.
getNMF(x, ...) addNMF(x, ...) ## S4 method for signature 'SummarizedExperiment' getNMF(x, k = 2, assay.type = "counts", eval.metric = "evar", ...) ## S4 method for signature 'SummarizedExperiment' addNMF( x, k = 2, assay.type = "counts", eval.metric = "evar", name = "NMF", ... )
getNMF(x, ...) addNMF(x, ...) ## S4 method for signature 'SummarizedExperiment' getNMF(x, k = 2, assay.type = "counts", eval.metric = "evar", ...) ## S4 method for signature 'SummarizedExperiment' addNMF( x, k = 2, assay.type = "counts", eval.metric = "evar", name = "NMF", ... )
x |
a
|
... |
optional arguments passed to |
k |
|
assay.type |
|
eval.metric |
|
name |
|
The functions getNMF
and addNMF
internally use nmf::NMF
compute the ordination matrix and
feature loadings.
If k is a vector of integers, NMF output is calculated for all the rank
values contained in k, and the best fit is selected based on
eval.metric
value.
For getNMF
, the ordination matrix with feature loadings matrix
as attribute "loadings"
.
For addNMF
, a
TreeSummarizedExperiment
object is returned containing the ordination matrix in
reducedDims(x, name)
with the following attributes:
"loadings" which is a matrix containing the feature loadings
"NMF_output" which is the output of function nmf::NMF
"best_fit" which is the result of the best fit if k is a vector of integers
data(GlobalPatterns) tse <- GlobalPatterns # Reduce the number of features tse <- agglomerateByPrevalence(tse, rank = "Phylum") # Run NMF and add the result to reducedDim(tse, "NMF"). tse <- addNMF(tse, k = 2, name = "NMF") # Extract feature loadings loadings_NMF <- attr(reducedDim(tse, "NMF"), "loadings") head(loadings_NMF) # Estimate models with number of topics from 2 to 4. Perform 2 runs. tse <- addNMF(tse, k = c(2, 3, 4), name = "NMF_4", nrun = 2) # Extract feature loadings loadings_NMF_4 <- attr(reducedDim(tse, "NMF_4"), "loadings") head(loadings_NMF_4)
data(GlobalPatterns) tse <- GlobalPatterns # Reduce the number of features tse <- agglomerateByPrevalence(tse, rank = "Phylum") # Run NMF and add the result to reducedDim(tse, "NMF"). tse <- addNMF(tse, k = 2, name = "NMF") # Extract feature loadings loadings_NMF <- attr(reducedDim(tse, "NMF"), "loadings") head(loadings_NMF) # Estimate models with number of topics from 2 to 4. Perform 2 runs. tse <- addNMF(tse, k = c(2, 3, 4), name = "NMF_4", nrun = 2) # Extract feature loadings loadings_NMF_4 <- attr(reducedDim(tse, "NMF_4"), "loadings") head(loadings_NMF_4)
Agglomeration functions can be used to sum-up data based on specific criteria such as taxonomic ranks, variables or prevalence.
agglomerateByRank
can be used to sum up data based on associations
with certain taxonomic ranks, as defined in rowData
. Only available
taxonomyRanks
can be used.
agglomerateByVariable
merges data on rows or columns of a
SummarizedExperiment
as defined by a factor
alongside the
chosen dimension. This function allows agglomeration of data based on other
variables than taxonomy ranks.
Metadata from the rowData
or colData
are
retained as defined by archetype
.
assay
are
agglomerated, i.e. summed up. If the assay contains values other than counts
or absolute values, this can lead to meaningless values being produced.
agglomerateByRanks
takes a SummarizedExperiment
, splits it along the
taxonomic ranks, aggregates the data per rank, converts the input to a
SingleCellExperiment
objects and stores the aggregated data as
alternative experiments. unsplitByRanks
takes these alternative
experiments and flattens them again into a single
SummarizedExperiment
.
agglomerateByRank(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], update.tree = agglomerateTree, agglomerate.tree = agglomerateTree, agglomerateTree = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], altexp = NULL, altexp.rm = strip_altexp, strip_altexp = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], empty.rm = TRUE, empty.fields = c(NA, "", " ", "\t", "-", "_"), ... ) agglomerateByVariable(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByVariable( x, by, group = f, f, update.tree = mergeTree, mergeTree = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' agglomerateByVariable(x, by, group = f, f, ...) agglomerateByRanks(x, ...) ## S4 method for signature 'SummarizedExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) splitByRanks(x, ...) unsplitByRanks(x, ...) ## S4 method for signature 'SingleCellExperiment' unsplitByRanks( x, ranks = taxonomyRanks(x), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' unsplitByRanks( x, ranks = taxonomyRanks(x), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... )
agglomerateByRank(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], update.tree = agglomerateTree, agglomerate.tree = agglomerateTree, agglomerateTree = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], altexp = NULL, altexp.rm = strip_altexp, strip_altexp = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' agglomerateByRank( x, rank = taxonomyRanks(x)[1], empty.rm = TRUE, empty.fields = c(NA, "", " ", "\t", "-", "_"), ... ) agglomerateByVariable(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByVariable( x, by, group = f, f, update.tree = mergeTree, mergeTree = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' agglomerateByVariable(x, by, group = f, f, ...) agglomerateByRanks(x, ...) ## S4 method for signature 'SummarizedExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByRanks( x, ranks = taxonomyRanks(x), na.rm = TRUE, as.list = FALSE, ... ) splitByRanks(x, ...) unsplitByRanks(x, ...) ## S4 method for signature 'SingleCellExperiment' unsplitByRanks( x, ranks = taxonomyRanks(x), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' unsplitByRanks( x, ranks = taxonomyRanks(x), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... )
x |
|
... |
arguments passed to |
rank |
|
update.tree |
|
agglomerate.tree |
Deprecated. Use |
agglomerateTree |
Deprecated. Use |
altexp |
|
altexp.rm |
|
strip_altexp |
Deprecated. Use |
empty.rm |
|
empty.fields |
|
by |
|
group |
|
f |
Deprecated. Use |
mergeTree |
Deprecated. Use |
ranks |
|
na.rm |
|
as.list |
|
keep.dimred |
|
keep_reducedDims |
Deprecated. Use |
Agglomeration sums up the values of assays at the specified taxonomic level. With certain assays, e.g. those that include binary or negative values, this summing can produce meaningless values. In those cases, consider performing agglomeration first, and then applying the transformation afterwards.
agglomerateByVariable
works similarly to
sumCountsAcrossFeatures
.
However, additional support for TreeSummarizedExperiment
was added and
science field agnostic names were used. In addition the archetype
argument lets the user select how to preserve row or column data.
For merge data of assays the function from scuttle
are used.
agglomerateByRanks
will use by default all available taxonomic ranks, but
this can be controlled by setting ranks
manually. NA
values
are removed by default, since they would not make sense, if the result
should be used for unsplitByRanks
at some point. The input data
remains unchanged in the returned SingleCellExperiment
objects.
unsplitByRanks
will remove any NA
value on each taxonomic rank
so that no ambiguous data is created. In additional, a column
taxonomicLevel
is created or overwritten in the rowData
to
specify from which alternative experiment this originates from. This can also
be used for splitAltExps
to
split the result along the same factor again. The input data from the base
objects is not returned, only the data from the altExp()
. Be aware that
changes to rowData
of the base object are not returned, whereas only
the colData
of the base object is kept.
agglomerateByRank
returns a taxonomically-agglomerated,
optionally-pruned object of the same class as x
.
agglomerateByVariable
returns an object of the same class as x
with the specified entries merged into one entry in all relevant components.
agglomerateByRank
returns a taxonomically-agglomerated,
optionally-pruned object of the same class as x
.
For agglomerateByRanks
:
If as.list = TRUE
: SummarizedExperiment
objects in a
SimpleList
If as.list = FALSE
: The SummarizedExperiment
passed as a
parameter and now containing the SummarizedExperiment
objects in its
altExps
For unsplitByRanks
: x
, with rowData
and assay
data replaced by the unsplit data. colData
of x is kept as well
and any existing rowTree
is dropped as well, since existing
rowLinks
are not valid anymore.
splitOn
unsplitOn
agglomerateByVariable
,
sumCountsAcrossFeatures
,
agglomerateByRank
,
altExps
,
splitAltExps
### Agglomerate data based on taxonomic information data(GlobalPatterns) # print the available taxonomic ranks colnames(rowData(GlobalPatterns)) taxonomyRanks(GlobalPatterns) # agglomerate at the Family taxonomic rank x1 <- agglomerateByRank(GlobalPatterns, rank="Family") ## How many taxa before/after agglomeration? nrow(GlobalPatterns) nrow(x1) # agglomerate the tree as well x2 <- agglomerateByRank(GlobalPatterns, rank="Family", update.tree = TRUE) nrow(x2) # same number of rows, but rowTree(x1) # ... different rowTree(x2) # ... tree # If assay contains binary or negative values, summing might lead to # meaningless values, and you will get a warning. In these cases, you might # want to do agglomeration again at chosen taxonomic level. tse <- transformAssay(GlobalPatterns, method = "pa") tse <- agglomerateByRank(tse, rank = "Genus") tse <- transformAssay(tse, method = "pa") # Removing empty labels by setting empty.rm = TRUE sum(is.na(rowData(GlobalPatterns)$Family)) x3 <- agglomerateByRank(GlobalPatterns, rank="Family", empty.rm = TRUE) nrow(x3) # different from x2 # Because all the rownames are from the same rank, rownames do not include # prefixes, in this case "Family:". print(rownames(x3[1:3,])) # To add them, use getTaxonomyLabels function. rownames(x3) <- getTaxonomyLabels(x3, with.rank = TRUE) print(rownames(x3[1:3,])) # use 'empty.ranks.rm' to remove columns that include only NAs x4 <- agglomerateByRank( GlobalPatterns, rank="Phylum", empty.ranks.rm = TRUE) head(rowData(x4)) # If the assay contains NAs, you might want to specify na.rm=TRUE, # since summing-up NAs lead to NA x5 <- GlobalPatterns # Replace first value with NA assay(x5)[1,1] <- NA x6 <- agglomerateByRank(x5, "Kingdom") head( assay(x6) ) # Use na.rm=TRUE x6 <- agglomerateByRank(x5, "Kingdom", na.rm = TRUE) head( assay(x6) ) ## Look at enterotype dataset... data(enterotype) ## Print the available taxonomic ranks. Shows only 1 available rank, ## not useful for agglomerateByRank taxonomyRanks(enterotype) ### Merge TreeSummarizedExperiments on rows and columns data(esophagus) esophagus plot(rowTree(esophagus)) # Get a factor for merging f <- factor(regmatches(rownames(esophagus), regexpr("^[0-9]*_[0-9]*",rownames(esophagus)))) merged <- agglomerateByVariable( esophagus, by = "rows", f, update.tree = TRUE) plot(rowTree(merged)) # data(GlobalPatterns) GlobalPatterns merged <- agglomerateByVariable( GlobalPatterns, by = "cols", colData(GlobalPatterns)$SampleType) merged data(GlobalPatterns) # print the available taxonomic ranks taxonomyRanks(GlobalPatterns) # agglomerateByRanks # tse <- agglomerateByRanks(GlobalPatterns) altExps(tse) altExp(tse,"Kingdom") altExp(tse,"Species") # unsplitByRanks tse <- unsplitByRanks(tse) tse
### Agglomerate data based on taxonomic information data(GlobalPatterns) # print the available taxonomic ranks colnames(rowData(GlobalPatterns)) taxonomyRanks(GlobalPatterns) # agglomerate at the Family taxonomic rank x1 <- agglomerateByRank(GlobalPatterns, rank="Family") ## How many taxa before/after agglomeration? nrow(GlobalPatterns) nrow(x1) # agglomerate the tree as well x2 <- agglomerateByRank(GlobalPatterns, rank="Family", update.tree = TRUE) nrow(x2) # same number of rows, but rowTree(x1) # ... different rowTree(x2) # ... tree # If assay contains binary or negative values, summing might lead to # meaningless values, and you will get a warning. In these cases, you might # want to do agglomeration again at chosen taxonomic level. tse <- transformAssay(GlobalPatterns, method = "pa") tse <- agglomerateByRank(tse, rank = "Genus") tse <- transformAssay(tse, method = "pa") # Removing empty labels by setting empty.rm = TRUE sum(is.na(rowData(GlobalPatterns)$Family)) x3 <- agglomerateByRank(GlobalPatterns, rank="Family", empty.rm = TRUE) nrow(x3) # different from x2 # Because all the rownames are from the same rank, rownames do not include # prefixes, in this case "Family:". print(rownames(x3[1:3,])) # To add them, use getTaxonomyLabels function. rownames(x3) <- getTaxonomyLabels(x3, with.rank = TRUE) print(rownames(x3[1:3,])) # use 'empty.ranks.rm' to remove columns that include only NAs x4 <- agglomerateByRank( GlobalPatterns, rank="Phylum", empty.ranks.rm = TRUE) head(rowData(x4)) # If the assay contains NAs, you might want to specify na.rm=TRUE, # since summing-up NAs lead to NA x5 <- GlobalPatterns # Replace first value with NA assay(x5)[1,1] <- NA x6 <- agglomerateByRank(x5, "Kingdom") head( assay(x6) ) # Use na.rm=TRUE x6 <- agglomerateByRank(x5, "Kingdom", na.rm = TRUE) head( assay(x6) ) ## Look at enterotype dataset... data(enterotype) ## Print the available taxonomic ranks. Shows only 1 available rank, ## not useful for agglomerateByRank taxonomyRanks(enterotype) ### Merge TreeSummarizedExperiments on rows and columns data(esophagus) esophagus plot(rowTree(esophagus)) # Get a factor for merging f <- factor(regmatches(rownames(esophagus), regexpr("^[0-9]*_[0-9]*",rownames(esophagus)))) merged <- agglomerateByVariable( esophagus, by = "rows", f, update.tree = TRUE) plot(rowTree(merged)) # data(GlobalPatterns) GlobalPatterns merged <- agglomerateByVariable( GlobalPatterns, by = "cols", colData(GlobalPatterns)$SampleType) merged data(GlobalPatterns) # print the available taxonomic ranks taxonomyRanks(GlobalPatterns) # agglomerateByRanks # tse <- agglomerateByRanks(GlobalPatterns) altExps(tse) altExp(tse,"Kingdom") altExp(tse,"Species") # unsplitByRanks tse <- unsplitByRanks(tse) tse
Agglomerate data based on population prevalence
agglomerateByPrevalence(x, ...) ## S4 method for signature 'SummarizedExperiment' agglomerateByPrevalence( x, rank = NULL, other.name = other_label, other_label = "Other", ... ) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByPrevalence( x, rank = NULL, other.name = other_label, other_label = "Other", update.tree = FALSE, ... )
agglomerateByPrevalence(x, ...) ## S4 method for signature 'SummarizedExperiment' agglomerateByPrevalence( x, rank = NULL, other.name = other_label, other_label = "Other", ... ) ## S4 method for signature 'TreeSummarizedExperiment' agglomerateByPrevalence( x, rank = NULL, other.name = other_label, other_label = "Other", update.tree = FALSE, ... )
x |
|
... |
arguments passed to |
rank |
|
other.name |
|
other_label |
Deprecated. use |
update.tree |
|
agglomerateByPrevalence
sums up the values of assays at the taxonomic
level specified by rank
(by default the highest taxonomic level
available) and selects the summed results that exceed the given population
prevalence at the given detection level. The other summed values (below the
threshold) are agglomerated in an additional row taking the name indicated by
other.name
(by default "Other").
agglomerateByPrevalence
returns a taxonomically-agglomerated object
of the same class as x and based on prevalent taxonomic results.
## Data can be aggregated based on prevalent taxonomic results data(GlobalPatterns) tse <- GlobalPatterns tse <- transformAssay(tse, method = "relabundance") tse <- agglomerateByPrevalence( tse, rank = "Phylum", assay.type = "relabundance", detection = 1/100, prevalence = 50/100) tse # Here data is aggregated at the taxonomic level "Phylum". The five phyla # that exceed the population prevalence threshold of 50/100 represent the # five first rows of the assay in the aggregated data. The sixth and last row # named by default "Other" takes the summed up values of all the other phyla # that are below the prevalence threshold. assay(tse)[,1:5]
## Data can be aggregated based on prevalent taxonomic results data(GlobalPatterns) tse <- GlobalPatterns tse <- transformAssay(tse, method = "relabundance") tse <- agglomerateByPrevalence( tse, rank = "Phylum", assay.type = "relabundance", detection = 1/100, prevalence = 50/100) tse # Here data is aggregated at the taxonomic level "Phylum". The five phyla # that exceed the population prevalence threshold of 50/100 represent the # five first rows of the assay in the aggregated data. The sixth and last row # named by default "Other" takes the summed up values of all the other phyla # that are below the prevalence threshold. assay(tse)[,1:5]
These functions are accessors for functions implemented in the
DirichletMultinomial
package
calculateDMN(x, ...) ## S4 method for signature 'ANY' calculateDMN( x, k = 1, BPPARAM = SerialParam(), seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' calculateDMN( x, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... ) runDMN(x, name = "DMN", ...) getDMN(x, name = "DMN", ...) ## S4 method for signature 'SummarizedExperiment' getDMN(x, name = "DMN") bestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC"), ...) ## S4 method for signature 'SummarizedExperiment' bestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC")) getBestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC"), ...) ## S4 method for signature 'SummarizedExperiment' getBestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC")) calculateDMNgroup(x, ...) ## S4 method for signature 'ANY' calculateDMNgroup( x, variable, k = 1, seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' calculateDMNgroup( x, variable, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... ) performDMNgroupCV(x, ...) ## S4 method for signature 'ANY' performDMNgroupCV( x, variable, k = 1, seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' performDMNgroupCV( x, variable, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... )
calculateDMN(x, ...) ## S4 method for signature 'ANY' calculateDMN( x, k = 1, BPPARAM = SerialParam(), seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' calculateDMN( x, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... ) runDMN(x, name = "DMN", ...) getDMN(x, name = "DMN", ...) ## S4 method for signature 'SummarizedExperiment' getDMN(x, name = "DMN") bestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC"), ...) ## S4 method for signature 'SummarizedExperiment' bestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC")) getBestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC"), ...) ## S4 method for signature 'SummarizedExperiment' getBestDMNFit(x, name = "DMN", type = c("laplace", "AIC", "BIC")) calculateDMNgroup(x, ...) ## S4 method for signature 'ANY' calculateDMNgroup( x, variable, k = 1, seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' calculateDMNgroup( x, variable, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... ) performDMNgroupCV(x, ...) ## S4 method for signature 'ANY' performDMNgroupCV( x, variable, k = 1, seed = runif(1, 0, .Machine$integer.max), ... ) ## S4 method for signature 'SummarizedExperiment' performDMNgroupCV( x, variable, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", transposed = FALSE, ... )
x |
a numeric matrix with samples as rows or a
|
... |
optional arguments not used. |
k |
|
BPPARAM |
A
|
seed |
|
assay.type |
|
assay_name |
Deprecated. Use |
exprs_values |
Deprecated. Use |
transposed |
|
name |
|
type |
|
variable |
|
calculateDMN
and getDMN
return a list of DMN
objects,
one element for each value of k provided.
bestDMNFit
returns the index for the best fit and getBestDMNFit
returns a single DMN
object.
calculateDMNgroup
returns a
DMNGroup
object
performDMNgroupCV
returns a data.frame
DMN-class
,
DMNGroup-class
,
dmn
,
dmngroup
,
cvdmngroup
,
accessors for DMN objects
fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") counts <- as.matrix(read.csv(fl, row.names=1)) fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) colData <- DataFrame(pheno = pheno) tse <- TreeSummarizedExperiment(assays = list(counts = counts), colData = colData) library(bluster) # Compute DMM algorithm and store result in metadata tse <- addCluster(tse, name = "DMM", DmmParam(k = 1:3, type = "laplace"), by = "samples", full = TRUE) # Get the list of DMN objects metadata(tse)$DMM$dmm # Get and display which objects fits best bestFit <- metadata(tse)$DMM$best bestFit # Get the model that generated the best fit bestModel <- metadata(tse)$DMM$dmm[[bestFit]] bestModel # Get the sample-cluster assignment probability matrix head(metadata(tse)$DMM$prob) # Get the weight of each component for the best model bestModel@mixture$Weight
fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") counts <- as.matrix(read.csv(fl, row.names=1)) fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) colData <- DataFrame(pheno = pheno) tse <- TreeSummarizedExperiment(assays = list(counts = counts), colData = colData) library(bluster) # Compute DMM algorithm and store result in metadata tse <- addCluster(tse, name = "DMM", DmmParam(k = 1:3, type = "laplace"), by = "samples", full = TRUE) # Get the list of DMN objects metadata(tse)$DMM$dmm # Get and display which objects fits best bestFit <- metadata(tse)$DMM$best bestFit # Get the model that generated the best fit bestModel <- metadata(tse)$DMM$dmm[[bestFit]] bestModel # Get the sample-cluster assignment probability matrix head(metadata(tse)$DMM$prob) # Get the weight of each component for the best model bestModel@mixture$Weight
TreeSummarizedExperiment
object from ‘DADA2’ resultsCreate a TreeSummarizedExperiment
object from ‘DADA2’ results
convertFromDADA2(...)
convertFromDADA2(...)
... |
Additional arguments. For |
convertFromDADA2
is a wrapper for the
mergePairs
function from the dada2
package.
A count matrix is constructed via
makeSequenceTable(mergePairs(...))
and rownames are dynamically
created as ASV(N)
with N
from 1 to nrow
of the count
tables. The colnames and rownames from the output of makeSequenceTable
are stored as colnames
and in the referenceSeq
slot of the
TreeSummarizedExperiment
, respectively.
convertFromDADA2
returns an object of class
TreeSummarizedExperiment
### Coerce DADA2 results to a TreeSE object if(requireNamespace("dada2")) { fnF <- system.file("extdata", "sam1F.fastq.gz", package="dada2") fnR = system.file("extdata", "sam1R.fastq.gz", package="dada2") dadaF <- dada2::dada(fnF, selfConsist=TRUE) dadaR <- dada2::dada(fnR, selfConsist=TRUE) tse <- convertFromDADA2(dadaF, fnF, dadaR, fnR) tse }
### Coerce DADA2 results to a TreeSE object if(requireNamespace("dada2")) { fnF <- system.file("extdata", "sam1F.fastq.gz", package="dada2") fnR = system.file("extdata", "sam1R.fastq.gz", package="dada2") dadaF <- dada2::dada(fnF, selfConsist=TRUE) dadaR <- dada2::dada(fnR, selfConsist=TRUE) tse <- convertFromDADA2(dadaF, fnF, dadaR, fnR) tse }
TreeSummarizedExperiment
object from a phyloseq objectCreate a TreeSummarizedExperiment
object from a phyloseq object
Create a phyloseq object from a TreeSummarizedExperiment
object
convertFromPhyloseq(x) convertToPhyloseq(x, ...) ## S4 method for signature 'SummarizedExperiment' convertToPhyloseq(x, assay.type = "counts", assay_name = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' convertToPhyloseq(x, tree.name = tree_name, tree_name = "phylo", ...)
convertFromPhyloseq(x) convertToPhyloseq(x, ...) ## S4 method for signature 'SummarizedExperiment' convertToPhyloseq(x, assay.type = "counts", assay_name = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' convertToPhyloseq(x, tree.name = tree_name, tree_name = "phylo", ...)
x |
a
|
... |
Additional arguments. Not used currently. |
assay.type |
|
assay_name |
Deprecated. Use |
tree.name |
|
tree_name |
Deprecated. Use |
convertFromPhyloseq
converts phyloseq
objects into
TreeSummarizedExperiment
objects.
All data stored in a phyloseq
object is transferred.
convertToPhyloseq
creates a phyloseq object from a
TreeSummarizedExperiment
object. By using assay.type
, it is possible to specify which table
from assay
is added to the phyloseq object.
convertFromPhyloseq
returns an object of class
TreeSummarizedExperiment
convertToPhyloseq
returns an object of class
phyloseq
### Coerce a phyloseq object to a TreeSE object if (requireNamespace("phyloseq")) { data(GlobalPatterns, package="phyloseq") convertFromPhyloseq(GlobalPatterns) data(enterotype, package="phyloseq") convertFromPhyloseq(enterotype) data(esophagus, package="phyloseq") convertFromPhyloseq(esophagus) } ### Coerce a TreeSE object to a phyloseq object # Get tse object data(GlobalPatterns) tse <- GlobalPatterns # Create a phyloseq object from it phy <- convertToPhyloseq(tse) phy # By default the chosen table is counts, but if there are other tables, # they can be chosen with assay.type. # Counts relative abundances table tse <- transformAssay(tse, method = "relabundance") phy2 <- convertToPhyloseq(tse, assay.type = "relabundance") phy2
### Coerce a phyloseq object to a TreeSE object if (requireNamespace("phyloseq")) { data(GlobalPatterns, package="phyloseq") convertFromPhyloseq(GlobalPatterns) data(enterotype, package="phyloseq") convertFromPhyloseq(enterotype) data(esophagus, package="phyloseq") convertFromPhyloseq(esophagus) } ### Coerce a TreeSE object to a phyloseq object # Get tse object data(GlobalPatterns) tse <- GlobalPatterns # Create a phyloseq object from it phy <- convertToPhyloseq(tse) phy # By default the chosen table is counts, but if there are other tables, # they can be chosen with assay.type. # Counts relative abundances table tse <- transformAssay(tse, method = "relabundance") phy2 <- convertToPhyloseq(tse, assay.type = "relabundance") phy2
These functions will be deprecated. Please use other functions instead.
cluster(x, ...) ## S4 method for signature 'SummarizedExperiment' cluster(x, ...) addTaxonomyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' addTaxonomyTree(x, ...) taxonomyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' taxonomyTree(x, ...) mergeRows(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeRows(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeRows(x, ...) mergeCols(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeCols(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeCols(x, ...) mergeFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeatures(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeFeatures(x, ...) mergeSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeSamples(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeSamples(x, ...) mergeFeaturesByRank(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeaturesByRank(x, ...) ## S4 method for signature 'SingleCellExperiment' mergeFeaturesByRank(x, ...) mergeFeaturesByPrevalence(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeaturesByPrevalence(x, ...) getExperimentCrossAssociation(x, ...) ## S4 method for signature 'MultiAssayExperiment' getExperimentCrossAssociation(x, ...) ## S4 method for signature 'SummarizedExperiment' getExperimentCrossAssociation(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeFeaturesByRank(x, ...) testExperimentCrossAssociation(x, ...) ## S4 method for signature 'ANY' testExperimentCrossAssociation(x, ...) testExperimentCrossCorrelation(x, ...) ## S4 method for signature 'ANY' testExperimentCrossCorrelation(x, ...) getExperimentCrossCorrelation(x, ...) ## S4 method for signature 'ANY' getExperimentCrossCorrelation(x, ...) loadFromBiom(...) loadFromQIIME2(...) readQZA(...) loadFromMothur(...) loadFromMetaphlan(...) loadFromHumann(...) countDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' countDominantFeatures(x, ...) subsetByRareTaxa(x, ...) ## S4 method for signature 'ANY' subsetByRareTaxa(x, ...) subsetByRareFeatures(x, ...) ## S4 method for signature 'ANY' subsetByRareFeatures(x, ...) subsetByPrevalentTaxa(x, ...) ## S4 method for signature 'ANY' subsetByPrevalentTaxa(x, ...) subsetByPrevalentFeatures(x, ...) ## S4 method for signature 'ANY' subsetByPrevalentFeatures(x, ...) countDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' countDominantTaxa(x, ...) full_join(x, ...) ## S4 method for signature 'ANY' full_join(x, ...) inner_join(x, ...) ## S4 method for signature 'ANY' inner_join(x, ...) left_join(x, ...) ## S4 method for signature 'ANY' left_join(x, ...) right_join(x, ...) ## S4 method for signature 'ANY' right_join(x, ...) plotNMDS(x, ...) estimateDivergence(x, ...) ## S4 method for signature 'SummarizedExperiment' estimateDivergence(x, ...) meltAssay(x, ...) ## S4 method for signature 'SummarizedExperiment' meltAssay(x, ...) transformSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' transformSamples(x, ...) ZTransform(x, ...) ## S4 method for signature 'SummarizedExperiment' ZTransform(x, ...) relAbundanceCounts(x, ...) ## S4 method for signature 'SummarizedExperiment' relAbundanceCounts(x, ...) transformCounts(x, ...) transformFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' transformFeatures(x, ...) getUniqueFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getUniqueFeatures(x, ...) getUniqueTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getUniqueTaxa(x, ...) getTopFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getTopFeatures(x, ...) getTopTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getTopTaxa(x, ...) getRareFeatures(x, ...) ## S4 method for signature 'ANY' getRareFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getRareFeatures(x, ...) getRareTaxa(x, ...) ## S4 method for signature 'ANY' getRareTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getRareTaxa(x, ...) getPrevalentFeatures(x, ...) ## S4 method for signature 'ANY' getPrevalentFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getPrevalentFeatures(x, ...) getPrevalentTaxa(x, ...) ## S4 method for signature 'ANY' getPrevalentTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getPrevalentTaxa(x, ...) subsampleCounts(x, ...) ## S4 method for signature 'SummarizedExperiment' subsampleCounts(x, ...) addPerSampleDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' addPerSampleDominantFeatures(x, ...) addPerSampleDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' addPerSampleDominantTaxa(x, ...) perSampleDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' perSampleDominantFeatures(x, ...) perSampleDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' perSampleDominantTaxa(x, ...) makePhyloseqFromTreeSE(x, ...) ## S4 method for signature 'SummarizedExperiment' makePhyloseqFromTreeSE(x) ## S4 method for signature 'TreeSummarizedExperiment' makePhyloseqFromTreeSE(x) makePhyloseqFromTreeSummarizedExperiment(x) ## S4 method for signature 'ANY' makePhyloseqFromTreeSummarizedExperiment(x) makeTreeSummarizedExperimentFromPhyloseq(x) ## S4 method for signature 'ANY' makeTreeSummarizedExperimentFromPhyloseq(x) makeTreeSEFromBiom(...) makeTreeSummarizedExperimentFromBiom(...) makeTreeSEFromDADA2(...) makeTreeSummarizedExperimentFromDADA2(...) makeTreeSEFromPhyloseq(x) estimateEvenness(x, ...) ## S4 method for signature 'ANY' estimateEvenness(x, ...) estimateRichness(x, ...) ## S4 method for signature 'ANY' estimateRichness(x, ...) estimateDiversity(x, ...) ## S4 method for signature 'ANY' estimateDiversity(x, ...) estimateFaith(x, ...) ## S4 method for signature 'ANY' estimateFaith(x, ...) estimateDominance(x, ...) ## S4 method for signature 'ANY' estimateDominance(x, ...) subsetSamples(x, ...) subsetFeatures(x, ...) subsetTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetTaxa(x, ...) relabundance(x, ...) relabundance(x) <- value ## S4 method for signature 'SummarizedExperiment' relabundance(x) ## S4 replacement method for signature 'SummarizedExperiment' relabundance(x) <- value runOverlap(x, ...) ## S4 method for signature 'SummarizedExperiment' runOverlap(x, ...) calculateOverlap(x, ...) ## S4 method for signature 'ANY' calculateOverlap(x, ...) calculateJSD(x, ...) ## S4 method for signature 'ANY' calculateJSD(x, ...) runJSD(x, ...) ## S4 method for signature 'SummarizedExperiment' runJSD(x, ...) calculateUnifrac(x, ...) ## S4 method for signature 'ANY' calculateUnifrac(x, ...) runUnifrac(mat, tree, ...)
cluster(x, ...) ## S4 method for signature 'SummarizedExperiment' cluster(x, ...) addTaxonomyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' addTaxonomyTree(x, ...) taxonomyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' taxonomyTree(x, ...) mergeRows(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeRows(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeRows(x, ...) mergeCols(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeCols(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeCols(x, ...) mergeFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeatures(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeFeatures(x, ...) mergeSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeSamples(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeSamples(x, ...) mergeFeaturesByRank(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeaturesByRank(x, ...) ## S4 method for signature 'SingleCellExperiment' mergeFeaturesByRank(x, ...) mergeFeaturesByPrevalence(x, ...) ## S4 method for signature 'SummarizedExperiment' mergeFeaturesByPrevalence(x, ...) getExperimentCrossAssociation(x, ...) ## S4 method for signature 'MultiAssayExperiment' getExperimentCrossAssociation(x, ...) ## S4 method for signature 'SummarizedExperiment' getExperimentCrossAssociation(x, ...) ## S4 method for signature 'TreeSummarizedExperiment' mergeFeaturesByRank(x, ...) testExperimentCrossAssociation(x, ...) ## S4 method for signature 'ANY' testExperimentCrossAssociation(x, ...) testExperimentCrossCorrelation(x, ...) ## S4 method for signature 'ANY' testExperimentCrossCorrelation(x, ...) getExperimentCrossCorrelation(x, ...) ## S4 method for signature 'ANY' getExperimentCrossCorrelation(x, ...) loadFromBiom(...) loadFromQIIME2(...) readQZA(...) loadFromMothur(...) loadFromMetaphlan(...) loadFromHumann(...) countDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' countDominantFeatures(x, ...) subsetByRareTaxa(x, ...) ## S4 method for signature 'ANY' subsetByRareTaxa(x, ...) subsetByRareFeatures(x, ...) ## S4 method for signature 'ANY' subsetByRareFeatures(x, ...) subsetByPrevalentTaxa(x, ...) ## S4 method for signature 'ANY' subsetByPrevalentTaxa(x, ...) subsetByPrevalentFeatures(x, ...) ## S4 method for signature 'ANY' subsetByPrevalentFeatures(x, ...) countDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' countDominantTaxa(x, ...) full_join(x, ...) ## S4 method for signature 'ANY' full_join(x, ...) inner_join(x, ...) ## S4 method for signature 'ANY' inner_join(x, ...) left_join(x, ...) ## S4 method for signature 'ANY' left_join(x, ...) right_join(x, ...) ## S4 method for signature 'ANY' right_join(x, ...) plotNMDS(x, ...) estimateDivergence(x, ...) ## S4 method for signature 'SummarizedExperiment' estimateDivergence(x, ...) meltAssay(x, ...) ## S4 method for signature 'SummarizedExperiment' meltAssay(x, ...) transformSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' transformSamples(x, ...) ZTransform(x, ...) ## S4 method for signature 'SummarizedExperiment' ZTransform(x, ...) relAbundanceCounts(x, ...) ## S4 method for signature 'SummarizedExperiment' relAbundanceCounts(x, ...) transformCounts(x, ...) transformFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' transformFeatures(x, ...) getUniqueFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getUniqueFeatures(x, ...) getUniqueTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getUniqueTaxa(x, ...) getTopFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getTopFeatures(x, ...) getTopTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getTopTaxa(x, ...) getRareFeatures(x, ...) ## S4 method for signature 'ANY' getRareFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getRareFeatures(x, ...) getRareTaxa(x, ...) ## S4 method for signature 'ANY' getRareTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getRareTaxa(x, ...) getPrevalentFeatures(x, ...) ## S4 method for signature 'ANY' getPrevalentFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' getPrevalentFeatures(x, ...) getPrevalentTaxa(x, ...) ## S4 method for signature 'ANY' getPrevalentTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' getPrevalentTaxa(x, ...) subsampleCounts(x, ...) ## S4 method for signature 'SummarizedExperiment' subsampleCounts(x, ...) addPerSampleDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' addPerSampleDominantFeatures(x, ...) addPerSampleDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' addPerSampleDominantTaxa(x, ...) perSampleDominantFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' perSampleDominantFeatures(x, ...) perSampleDominantTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' perSampleDominantTaxa(x, ...) makePhyloseqFromTreeSE(x, ...) ## S4 method for signature 'SummarizedExperiment' makePhyloseqFromTreeSE(x) ## S4 method for signature 'TreeSummarizedExperiment' makePhyloseqFromTreeSE(x) makePhyloseqFromTreeSummarizedExperiment(x) ## S4 method for signature 'ANY' makePhyloseqFromTreeSummarizedExperiment(x) makeTreeSummarizedExperimentFromPhyloseq(x) ## S4 method for signature 'ANY' makeTreeSummarizedExperimentFromPhyloseq(x) makeTreeSEFromBiom(...) makeTreeSummarizedExperimentFromBiom(...) makeTreeSEFromDADA2(...) makeTreeSummarizedExperimentFromDADA2(...) makeTreeSEFromPhyloseq(x) estimateEvenness(x, ...) ## S4 method for signature 'ANY' estimateEvenness(x, ...) estimateRichness(x, ...) ## S4 method for signature 'ANY' estimateRichness(x, ...) estimateDiversity(x, ...) ## S4 method for signature 'ANY' estimateDiversity(x, ...) estimateFaith(x, ...) ## S4 method for signature 'ANY' estimateFaith(x, ...) estimateDominance(x, ...) ## S4 method for signature 'ANY' estimateDominance(x, ...) subsetSamples(x, ...) subsetFeatures(x, ...) subsetTaxa(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetSamples(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetFeatures(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetTaxa(x, ...) relabundance(x, ...) relabundance(x) <- value ## S4 method for signature 'SummarizedExperiment' relabundance(x) ## S4 replacement method for signature 'SummarizedExperiment' relabundance(x) <- value runOverlap(x, ...) ## S4 method for signature 'SummarizedExperiment' runOverlap(x, ...) calculateOverlap(x, ...) ## S4 method for signature 'ANY' calculateOverlap(x, ...) calculateJSD(x, ...) ## S4 method for signature 'ANY' calculateJSD(x, ...) runJSD(x, ...) ## S4 method for signature 'SummarizedExperiment' runJSD(x, ...) calculateUnifrac(x, ...) ## S4 method for signature 'ANY' calculateUnifrac(x, ...) runUnifrac(mat, tree, ...)
x |
A
|
... |
Additional parameters. See dedicated function. |
value |
a matrix to store as the ‘relabundance’ assay |
mat |
An abundance matrix |
tree |
A phylogenetic tree |
dmn_se is a dataset on twins' microbiome where samples are stratified by their community composition through Dirichlet Multinomial Mixtures (DMM). It was derived from the DirichletMultinomial package.
data(dmn_se)
data(dmn_se)
A SummarizedExperiment with 130 features and 278 samples. The rowData contains no taxonomic information. The colData includes:
participant's weight condition (Lean, Overwt and Obese)
Turnbaugh, PJ et al.
Holmes I, Harris K, Quince C (2012). Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. https://doi.org/10.1371/journal.pone.0030126
Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, et al. (2009). A core gut microbiome in obese and lean twins. Nature 457: 480–484. https://doi.org/10.1038/nature07540
The enterotype data of the human gut microbiome includes taxonomic profiling for 280 fecal samples from 22 subjects based on shotgun DNA sequencing. The authors claimed that the data naturally clumps into three community-level clusters, or "enterotypes", that are not immediately explained by sequencing technology or demographic features of the subjects. In a later addendum from 2014 the authors stated that enterotypes should not be seen as discrete clusters, but as a way of stratifying samples to reduce complexity. It was converted into a TreeSummarizedExperiment from the phyloseq package.
data(enterotype)
data(enterotype)
A TreeSummarizedExperiment with 553 features and 280 samples. The rowData contains taxonomic information at Genus level. The colData includes:
enterotype the sample belongs to (1, 2 and 3)
sample ID of samples from all studies
sequencing technology
sample ID of complete samples
original project from which sample was obtained (gill06, turnbaugh09, MetaHIT, MicroObes, MicroAge and kurokawa07)
participant's nationality (american, danish, spanish, french, italian and japanese)
participant's gender (F or M)
participant's age (0.25 – 87)
participant's clinical status (healthy, obese, CD, UC and elderly)
Arumugam, M., Raes, J., et al.
http://www.bork.embl.de/Docu/Arumugam_et_al_2011/downloads.html
Arumugam, M., et al. (2011). Enterotypes of the human gut microbiome. Nature, 473(7346), 174-180. https://doi.org/10.1038/nature09944
Arumugam, M., et al. (2014). Addendum: Enterotypes of the human gut microbiome. Nature 506, 516 (2014). https://doi.org/10.1038/nature13075
This small dataset from a human esophageal community includes 3 samples from 3 human adults based on biopsies analysed with 16S rDNA PCR. The 16S rRNA sequence processing is provided in the mothur wiki from the link below. It was converted into a TreeSummarizedExperiment from the phyloseq package.
data(esophagus)
data(esophagus)
A TreeSummarizedExperiment with 58 features and 3 samples. The rowData contains no taxonomic information. The colData is empty.
Pei et al. [email protected].
http://www.mothur.org/wiki/Esophageal_community_analysis
Pei, Z., Bini, E. J., Yang, L., Zhou, M., Francois, F., & Blaser, M. J. (2004). Bacterial biota in the human distal esophagus. Proceedings of the National Academy of Sciences of the United States of America, 101(12), 4250-4255. https://doi.org/10.1073/pnas.0306398101
McMurdie, J. & Holmes, S. (2013) phyloseq: An R Package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 8(4):e61217. https://doi.org/10.1371/journal.pone.0061217
Calculate correlations between features of two experiments.
getCrossAssociation(x, ...) ## S4 method for signature 'MultiAssayExperiment' getCrossAssociation( x, experiment1 = 1, experiment2 = 2, assay.type1 = assay_name1, assay_name1 = "counts", assay.type2 = assay_name2, assay_name2 = "counts", altexp1 = NULL, altexp2 = NULL, col.var1 = colData_variable1, colData_variable1 = NULL, col.var2 = colData_variable2, colData_variable2 = NULL, by = MARGIN, MARGIN = 1, method = c("kendall", "spearman", "categorical", "pearson"), mode = "table", p.adj.method = p_adj_method, p_adj_method = c("fdr", "BH", "bonferroni", "BY", "hochberg", "holm", "hommel", "none"), p.adj.threshold = p_adj_threshold, p_adj_threshold = NULL, cor.threshold = cor_threshold, cor_threshold = NULL, sort = FALSE, filter.self.cor = filter_self_correlations, filter_self_correlations = FALSE, verbose = TRUE, test.signif = test_significance, test_significance = FALSE, show.warnings = show_warnings, show_warnings = TRUE, paired = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getCrossAssociation(x, experiment2 = x, ...)
getCrossAssociation(x, ...) ## S4 method for signature 'MultiAssayExperiment' getCrossAssociation( x, experiment1 = 1, experiment2 = 2, assay.type1 = assay_name1, assay_name1 = "counts", assay.type2 = assay_name2, assay_name2 = "counts", altexp1 = NULL, altexp2 = NULL, col.var1 = colData_variable1, colData_variable1 = NULL, col.var2 = colData_variable2, colData_variable2 = NULL, by = MARGIN, MARGIN = 1, method = c("kendall", "spearman", "categorical", "pearson"), mode = "table", p.adj.method = p_adj_method, p_adj_method = c("fdr", "BH", "bonferroni", "BY", "hochberg", "holm", "hommel", "none"), p.adj.threshold = p_adj_threshold, p_adj_threshold = NULL, cor.threshold = cor_threshold, cor_threshold = NULL, sort = FALSE, filter.self.cor = filter_self_correlations, filter_self_correlations = FALSE, verbose = TRUE, test.signif = test_significance, test_significance = FALSE, show.warnings = show_warnings, show_warnings = TRUE, paired = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getCrossAssociation(x, experiment2 = x, ...)
x |
A
|
... |
Additional arguments:
|
experiment1 |
|
experiment2 |
|
assay.type1 |
|
assay_name1 |
Deprecated. Use |
assay.type2 |
|
assay_name2 |
Deprecated. Use |
altexp1 |
|
altexp2 |
|
col.var1 |
|
colData_variable1 |
Deprecated. Use |
col.var2 |
|
colData_variable2 |
Deprecated. Use |
by |
A |
MARGIN |
Deprecated. Use |
method |
|
mode |
|
p.adj.method |
|
p_adj_method |
Deprecated. Use |
p.adj.threshold |
|
p_adj_threshold |
Deprecated. Use |
cor.threshold |
|
cor_threshold |
Deprecated. Use |
sort |
|
filter.self.cor |
|
filter_self_correlations |
Deprecated. Use |
verbose |
|
test.signif |
|
test_significance |
Deprecated. Use |
show.warnings |
|
show_warnings |
Deprecated. use |
paired |
|
The function getCrossAssociation
calculates associations between
features of two experiments. By default, it not only computes associations
but also tests their significance. If desired, setting
test.signif
to FALSE disables significance calculation.
We recommend the non-parametric Kendall's tau as the default method for association analysis. Kendall's tau has desirable statistical properties and robustness at lower sample sizes. Spearman rank correlation can provide faster solutions when running times are critical.
This function returns associations in table or matrix format. In table format, returned value is a data frame that includes features and associations (and p-values) in columns. In matrix format, returned value is a one matrix when only associations are calculated. If also significances are tested, then returned value is a list of matrices.
data(HintikkaXOData) mae <- HintikkaXOData # Subset so that less observations / quicker to run, just for example mae[[1]] <- mae[[1]][1:20, 1:10] mae[[2]] <- mae[[2]][1:20, 1:10] # Several rows in the counts assay have a standard deviation of zero # Remove them, since they do not add useful information about # cross-association mae[[1]] <- mae[[1]][rowSds(assay(mae[[1]])) > 0, ] # Transform data mae[[1]] <- transformAssay(mae[[1]], method = "rclr") # Calculate cross-correlations result <- getCrossAssociation(mae, method = "pearson", assay.type2 = "nmr") # Show first 5 entries head(result, 5) # Use altExp option to specify alternative experiment from the experiment altExp(mae[[1]], "Phylum") <- agglomerateByRank(mae[[1]], rank = "Phylum") # Transform data altExp(mae[[1]], "Phylum") <- transformAssay( altExp(mae[[1]], "Phylum"), method = "relabundance") # When mode = "matrix", the return value is a matrix result <- getCrossAssociation( mae, experiment2 = 2, assay.type1 = "relabundance", assay.type2 = "nmr", altexp1 = "Phylum", method = "pearson", mode = "matrix") # Show first 5 entries head(result, 5) # If test.signif = TRUE, then getCrossAssociation additionally returns # significances # filter.self.cor = TRUE filters self correlations # p.adj.threshold can be used to filter those features that do not # have any correlations whose p-value is lower than the threshold result <- getCrossAssociation( mae[[1]], experiment2 = mae[[1]], method = "pearson", filter.self.cor = TRUE, p.adj.threshold = 0.05, test.signif = TRUE) # Show first 5 entries head(result, 5) # Returned value is a list of matrices names(result) # Calculate Bray-Curtis dissimilarity between samples. If dataset includes # paired samples, you can use paired = TRUE. result <- getCrossAssociation( mae[[1]], mae[[1]], by = 2, paired = FALSE, association.fun = getDissimilarity, method = "bray") # If experiments are equal and measure is symmetric # (e.g., taxa1 vs taxa2 == taxa2 vs taxa1), # it is possible to speed-up calculations by calculating association only # for unique variable-pairs. Use "symmetric" to choose whether to measure # association for only other half of of variable-pairs. result <- getCrossAssociation( mae, experiment1 = "microbiota", experiment2 = "microbiota", assay.type1 = "counts", assay.type2 = "counts", symmetric = TRUE) # For big data sets, the calculations might take a long time. # To speed them up, you can take a random sample from the data. # When dealing with complex biological problems, random samples can be # enough to describe the data. Here, our random sample is 30 % of whole data. sample_size <- 0.3 tse <- mae[[1]] tse_sub <- tse[ sample( seq_len( nrow(tse) ), sample_size * nrow(tse) ), ] result <- getCrossAssociation(tse_sub) # It is also possible to choose variables from colData and calculate # association between assay and sample metadata or between variables of # sample metadata mae[[1]] <- addAlpha(mae[[1]]) # col.var works similarly to assay.type. Instead of fetching # an assay named assay.type from assay slot, it fetches a column named # col.var from colData. result <- getCrossAssociation( mae[[1]], assay.type1 = "counts", col.var2 = c("shannon_diversity", "coverage_diversity"), test.signif = TRUE)
data(HintikkaXOData) mae <- HintikkaXOData # Subset so that less observations / quicker to run, just for example mae[[1]] <- mae[[1]][1:20, 1:10] mae[[2]] <- mae[[2]][1:20, 1:10] # Several rows in the counts assay have a standard deviation of zero # Remove them, since they do not add useful information about # cross-association mae[[1]] <- mae[[1]][rowSds(assay(mae[[1]])) > 0, ] # Transform data mae[[1]] <- transformAssay(mae[[1]], method = "rclr") # Calculate cross-correlations result <- getCrossAssociation(mae, method = "pearson", assay.type2 = "nmr") # Show first 5 entries head(result, 5) # Use altExp option to specify alternative experiment from the experiment altExp(mae[[1]], "Phylum") <- agglomerateByRank(mae[[1]], rank = "Phylum") # Transform data altExp(mae[[1]], "Phylum") <- transformAssay( altExp(mae[[1]], "Phylum"), method = "relabundance") # When mode = "matrix", the return value is a matrix result <- getCrossAssociation( mae, experiment2 = 2, assay.type1 = "relabundance", assay.type2 = "nmr", altexp1 = "Phylum", method = "pearson", mode = "matrix") # Show first 5 entries head(result, 5) # If test.signif = TRUE, then getCrossAssociation additionally returns # significances # filter.self.cor = TRUE filters self correlations # p.adj.threshold can be used to filter those features that do not # have any correlations whose p-value is lower than the threshold result <- getCrossAssociation( mae[[1]], experiment2 = mae[[1]], method = "pearson", filter.self.cor = TRUE, p.adj.threshold = 0.05, test.signif = TRUE) # Show first 5 entries head(result, 5) # Returned value is a list of matrices names(result) # Calculate Bray-Curtis dissimilarity between samples. If dataset includes # paired samples, you can use paired = TRUE. result <- getCrossAssociation( mae[[1]], mae[[1]], by = 2, paired = FALSE, association.fun = getDissimilarity, method = "bray") # If experiments are equal and measure is symmetric # (e.g., taxa1 vs taxa2 == taxa2 vs taxa1), # it is possible to speed-up calculations by calculating association only # for unique variable-pairs. Use "symmetric" to choose whether to measure # association for only other half of of variable-pairs. result <- getCrossAssociation( mae, experiment1 = "microbiota", experiment2 = "microbiota", assay.type1 = "counts", assay.type2 = "counts", symmetric = TRUE) # For big data sets, the calculations might take a long time. # To speed them up, you can take a random sample from the data. # When dealing with complex biological problems, random samples can be # enough to describe the data. Here, our random sample is 30 % of whole data. sample_size <- 0.3 tse <- mae[[1]] tse_sub <- tse[ sample( seq_len( nrow(tse) ), sample_size * nrow(tse) ), ] result <- getCrossAssociation(tse_sub) # It is also possible to choose variables from colData and calculate # association between assay and sample metadata or between variables of # sample metadata mae[[1]] <- addAlpha(mae[[1]]) # col.var works similarly to assay.type. Instead of fetching # an assay named assay.type from assay slot, it fetches a column named # col.var from colData. result <- getCrossAssociation( mae[[1]], assay.type1 = "counts", col.var2 = c("shannon_diversity", "coverage_diversity"), test.signif = TRUE)
These functions are designed to calculate dissimilarities on data stored
within a
TreeSummarizedExperiment
object. For overlap, Unifrac, and Jensen-Shannon Divergence (JSD)
dissimilarities, the functions use mia internal functions, while for other
types of dissimilarities, they rely on vegdist
by default.
addDissimilarity(x, method, ...) ## S4 method for signature 'SummarizedExperiment' addDissimilarity(x, method = "bray", name = method, ...) getDissimilarity(x, method, ...) ## S4 method for signature 'SummarizedExperiment' getDissimilarity( x, method = "bray", assay.type = "counts", niter = NULL, transposed = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' getDissimilarity( x, method = "bray", assay.type = "counts", niter = NULL, transposed = FALSE, ... ) ## S4 method for signature 'ANY' getDissimilarity(x, method = "bray", niter = NULL, ...)
addDissimilarity(x, method, ...) ## S4 method for signature 'SummarizedExperiment' addDissimilarity(x, method = "bray", name = method, ...) getDissimilarity(x, method, ...) ## S4 method for signature 'SummarizedExperiment' getDissimilarity( x, method = "bray", assay.type = "counts", niter = NULL, transposed = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment' getDissimilarity( x, method = "bray", assay.type = "counts", niter = NULL, transposed = FALSE, ... ) ## S4 method for signature 'ANY' getDissimilarity(x, method = "bray", niter = NULL, ...)
x |
|
method |
|
... |
other arguments passed into
|
name |
|
assay.type |
|
niter |
The number of iterations performed. If |
transposed |
|
Overlap reflects similarity between sample-pairs. When overlap is calculated using relative abundances, the higher the value the higher the similarity is. When using relative abundances, overlap value 1 means that all the abundances of features are equal between two samples, and 0 means that samples have completely different relative abundances.
Unifrac is calculated with rbiom:unifrac()
.
If rarefaction is enabled, vegan:avgdist()
is
utilized.
For JSD implementation: Susan Holmes [email protected]. Adapted for phyloseq by Paul J. McMurdie. Adapted for mia by Felix G.M. Ernst
getDissimilarity
returns a sample-by-sample dissimilarity matrix.
addDissimilarity
returns x
that includes dissimilarity matrix
in its metadata.
For unifrac dissimilarity: http://bmf.colorado.edu/unifrac/
See also additional descriptions of Unifrac in the following articles:
Lozupone, Hamady and Knight, “Unifrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.”, BMC Bioinformatics 2006, 7:371
Lozupone, Hamady, Kelley and Knight, “Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.” Appl Environ Microbiol. 2007
Lozupone C, Knight R. “Unifrac: a new phylogenetic method for comparing microbial communities.” Appl Environ Microbiol. 2005 71 (12):8228-35.
For JSD dissimilarity: Jensen-Shannon Divergence and Hilbert space embedding. Bent Fuglede and Flemming Topsoe University of Copenhagen, Department of Mathematics http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf
http://en.wikipedia.org/wiki/Jensen-Shannon_divergence
library(mia) library(scater) # load dataset data(GlobalPatterns) tse <- GlobalPatterns ### Overlap dissimilarity tse <- addDissimilarity(tse, method = "overlap", detection = 0.25) metadata(tse)[["overlap"]][1:6, 1:6] ### JSD dissimilarity tse <- addDissimilarity(tse, method = "jsd") metadata(tse)[["jsd"]][1:6, 1:6] # Multi Dimensional Scaling applied to JSD dissimilarity matrix tse <- runMDS(tse, FUN = getDissimilarity, method = "overlap", assay.type = "counts") metadata(tse)[["MDS"]][1:6, ] ### Unifrac dissimilarity res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE) dim(as.matrix((res))) tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE) metadata(tse)[["unifrac"]][1:6, 1:6]
library(mia) library(scater) # load dataset data(GlobalPatterns) tse <- GlobalPatterns ### Overlap dissimilarity tse <- addDissimilarity(tse, method = "overlap", detection = 0.25) metadata(tse)[["overlap"]][1:6, 1:6] ### JSD dissimilarity tse <- addDissimilarity(tse, method = "jsd") metadata(tse)[["jsd"]][1:6, 1:6] # Multi Dimensional Scaling applied to JSD dissimilarity matrix tse <- runMDS(tse, FUN = getDissimilarity, method = "overlap", assay.type = "counts") metadata(tse)[["MDS"]][1:6, ] ### Unifrac dissimilarity res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE) dim(as.matrix((res))) tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE) metadata(tse)[["unifrac"]][1:6, 1:6]
These functions return information about the most dominant taxa in a
SummarizedExperiment
object.
getDominant( x, assay.type = assay_name, assay_name = "counts", group = rank, rank = NULL, other.name = "Other", n = NULL, complete = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getDominant( x, assay.type = assay_name, assay_name = "counts", group = rank, rank = NULL, other.name = "Other", n = NULL, complete = TRUE, ... ) addDominant(x, name = "dominant_taxa", other.name = "Other", n = NULL, ...) ## S4 method for signature 'SummarizedExperiment' addDominant( x, name = "dominant_taxa", other.name = "Other", n = NULL, complete = FALSE, ... )
getDominant( x, assay.type = assay_name, assay_name = "counts", group = rank, rank = NULL, other.name = "Other", n = NULL, complete = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getDominant( x, assay.type = assay_name, assay_name = "counts", group = rank, rank = NULL, other.name = "Other", n = NULL, complete = TRUE, ... ) addDominant(x, name = "dominant_taxa", other.name = "Other", n = NULL, ...) ## S4 method for signature 'SummarizedExperiment' addDominant( x, name = "dominant_taxa", other.name = "Other", n = NULL, complete = FALSE, ... )
x |
|
assay.type |
|
assay_name |
Deprecated. Use |
group |
|
rank |
Deprecated. Use |
other.name |
|
n |
|
complete |
|
... |
Additional arguments passed on to |
name |
|
addDominant
extracts the most abundant taxa in a
SummarizedExperiment
object, and stores the information in the colData
. It is a wrapper for
getDominant
.
With group
parameter, it is possible to agglomerate rows based on
groups. If the value is one of the columns in taxonomyRanks()
,
agglomerateByRank()
is applied. Otherwise,
agglomerateByVariable()
is utilized.
E.g. if 'Genus' rank is used, all abundances of same Genus
are added together, and agglomerated features are returned.
See corresponding functions for additional arguments to deal with
missing values or special characters.
getDominant
returns a named character vector x
while addDominant
returns
SummarizedExperiment
with additional column in colData
named *name*
.
data(GlobalPatterns) x <- GlobalPatterns # Finds the dominant taxa. sim.dom <- getDominant(x, group = "Genus") # Add information to colData x <- addDominant(x, group = "Genus", name ="dominant_genera") colData(x)
data(GlobalPatterns) x <- GlobalPatterns # Finds the dominant taxa. sim.dom <- getDominant(x, group = "Genus") # Add information to colData x <- addDominant(x, group = "Genus", name ="dominant_genera") colData(x)
getMediation
and addMediation
provide a wrapper of
mediate
for
SummarizedExperiment
.
addMediation(x, ...) ## S4 method for signature 'SummarizedExperiment' addMediation( x, outcome, treatment, name = "mediation", mediator = NULL, assay.type = NULL, dimred = NULL, family = gaussian(), covariates = NULL, p.adj.method = "holm", add.metadata = FALSE, verbose = TRUE, ... ) getMediation(x, ...) ## S4 method for signature 'SummarizedExperiment' getMediation( x, outcome, treatment, mediator = NULL, assay.type = "counts", dimred = NULL, family = gaussian(), covariates = NULL, p.adj.method = "holm", add.metadata = FALSE, verbose = TRUE, ... )
addMediation(x, ...) ## S4 method for signature 'SummarizedExperiment' addMediation( x, outcome, treatment, name = "mediation", mediator = NULL, assay.type = NULL, dimred = NULL, family = gaussian(), covariates = NULL, p.adj.method = "holm", add.metadata = FALSE, verbose = TRUE, ... ) getMediation(x, ...) ## S4 method for signature 'SummarizedExperiment' getMediation( x, outcome, treatment, mediator = NULL, assay.type = "counts", dimred = NULL, family = gaussian(), covariates = NULL, p.adj.method = "holm", add.metadata = FALSE, verbose = TRUE, ... )
x |
|
... |
additional parameters that can be passed to
|
outcome |
|
treatment |
|
name |
|
mediator |
|
assay.type |
|
dimred |
|
family |
|
covariates |
|
p.adj.method |
|
add.metadata |
|
verbose |
|
This wrapper of mediate
for
SummarizedExperiment
provides a simple method to analyse the effect of a treatment variable on an
outcome variable found in colData(se)
through the mediation of either
another variable in colData (argument mediator
) or an assay
(argument assay.type
) or a reducedDim (argument dimred
).
Importantly, those three arguments are mutually exclusive.
getMediation
returns a data.frame
of adjusted p-values and
effect sizes for the ACMEs and ADEs of every mediator given as input, whereas
addMediation
returns an updated
SummarizedExperiment
instance with the same data.frame
stored in the metadata as
"mediation". Its columns include:
the mediator variable
the Average Causal Mediation Effect (ACME) estimate
the Average Direct Effect (ADE) estimate
the adjusted p-value for the ACME estimate
the adjusted p-value for the ADE estimate
## Not run: # Import libraries library(mia) library(scater) # Load dataset data(hitchip1006, package = "miaTime") tse <- hitchip1006 # Agglomerate features by family (merely to speed up execution) tse <- agglomerateByRank(tse, rank = "Phylum") # Convert BMI variable to numeric tse$bmi_group <- as.numeric(tse$bmi_group) # Analyse mediated effect of nationality on BMI via alpha diversity # 100 permutations were done to speed up execution, but ~1000 are recommended med_df <- getMediation(tse, outcome = "bmi_group", treatment = "nationality", mediator = "diversity", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, add.metadata = TRUE) # Visualise model statistics for 1st mediator plot(attr(med_df, "metadata")[[1]]) # Apply clr transformation to counts assay tse <- transformAssay(tse, method = "clr", pseudocount = 1) # Analyse mediated effect of nationality on BMI via clr-transformed features # 100 permutations were done to speed up execution, but ~1000 are recommended tse <- addMediation(tse, name = "assay_mediation", outcome = "bmi_group", treatment = "nationality", assay.type = "clr", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, p.adj.method = "fdr") # Show results for first 5 mediators head(metadata(tse)$assay_mediation, 5) # Perform ordination tse <- runMDS(tse, name = "MDS", method = "euclidean", assay.type = "clr", ncomponents = 3) # Analyse mediated effect of nationality on BMI via NMDS components # 100 permutations were done to speed up execution, but ~1000 are recommended tse <- addMediation(tse, name = "reddim_mediation", outcome = "bmi_group", treatment = "nationality", dimred = "MDS", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, p.adj.method = "fdr") # Show results for first 5 mediators head(metadata(tse)$reddim_mediation, 5) ## End(Not run)
## Not run: # Import libraries library(mia) library(scater) # Load dataset data(hitchip1006, package = "miaTime") tse <- hitchip1006 # Agglomerate features by family (merely to speed up execution) tse <- agglomerateByRank(tse, rank = "Phylum") # Convert BMI variable to numeric tse$bmi_group <- as.numeric(tse$bmi_group) # Analyse mediated effect of nationality on BMI via alpha diversity # 100 permutations were done to speed up execution, but ~1000 are recommended med_df <- getMediation(tse, outcome = "bmi_group", treatment = "nationality", mediator = "diversity", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, add.metadata = TRUE) # Visualise model statistics for 1st mediator plot(attr(med_df, "metadata")[[1]]) # Apply clr transformation to counts assay tse <- transformAssay(tse, method = "clr", pseudocount = 1) # Analyse mediated effect of nationality on BMI via clr-transformed features # 100 permutations were done to speed up execution, but ~1000 are recommended tse <- addMediation(tse, name = "assay_mediation", outcome = "bmi_group", treatment = "nationality", assay.type = "clr", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, p.adj.method = "fdr") # Show results for first 5 mediators head(metadata(tse)$assay_mediation, 5) # Perform ordination tse <- runMDS(tse, name = "MDS", method = "euclidean", assay.type = "clr", ncomponents = 3) # Analyse mediated effect of nationality on BMI via NMDS components # 100 permutations were done to speed up execution, but ~1000 are recommended tse <- addMediation(tse, name = "reddim_mediation", outcome = "bmi_group", treatment = "nationality", dimred = "MDS", covariates = c("sex", "age"), treat.value = "Scandinavia", control.value = "CentralEurope", boot = TRUE, sims = 100, p.adj.method = "fdr") # Show results for first 5 mediators head(metadata(tse)$reddim_mediation, 5) ## End(Not run)
These functions perform PERMANOVA to assess the significance of group
differences based on a specified dissimilarity matrix. The results can be
returned directly or added to metadata in an object of class
TreeSummarizedExperiment
.
getPERMANOVA(x, ...) addPERMANOVA(x, ...) ## S4 method for signature 'SingleCellExperiment' getPERMANOVA(x, ...) ## S4 method for signature 'SummarizedExperiment' getPERMANOVA(x, assay.type = "counts", formula = NULL, col.var = NULL, ...) ## S4 method for signature 'ANY' getPERMANOVA(x, formula, data, method = "bray", test.homogeneity = TRUE, ...) ## S4 method for signature 'SummarizedExperiment' addPERMANOVA(x, name = "permanova", ...)
getPERMANOVA(x, ...) addPERMANOVA(x, ...) ## S4 method for signature 'SingleCellExperiment' getPERMANOVA(x, ...) ## S4 method for signature 'SummarizedExperiment' getPERMANOVA(x, assay.type = "counts", formula = NULL, col.var = NULL, ...) ## S4 method for signature 'ANY' getPERMANOVA(x, formula, data, method = "bray", test.homogeneity = TRUE, ...) ## S4 method for signature 'SummarizedExperiment' addPERMANOVA(x, name = "permanova", ...)
x |
|
... |
additional arguments passed to
|
assay.type |
|
formula |
|
col.var |
|
data |
|
method |
|
test.homogeneity |
|
name |
|
PERMANOVA is a non-parametric method used to test whether the centroids of different groups (as defined by the formula or covariates) are significantly different in terms of multivariate space.
PERMANOVA relies on the assumption of group homogeneity, meaning the groups should be distinct and have similar variances within each group. This assumption is essential as PERMANOVA is sensitive to differences in within-group dispersion, which can otherwise confound results. This is why the functions return homogeneity test results by default.
The functions utilize vegan::adonis2
to compute
PERMANOVA. For homogeneity testing,
vegan::betadisper
along
with vegan::permutest
are utilized by default,
which allow testing for equal dispersion across groups and validate the
homogeneity assumption.
PERMANOVA and distance-based redundancy analysis (dbRDA) are closely related methods for analyzing multivariate data. PERMANOVA is non-parametric, making fewer assumptions about the data. In contrast, dbRDA assumes a linear relationship when constructing the ordination space, although it also employs a PERMANOVA-like approach to test the significance of predictors within this space. dbRDA offers a broader scope overall, as it provides visualization of the constrained ordination, which can reveal patterns and relationships. However, when the underlying data structure is non-linear, the results from the two methods can differ significantly due to dbRDA's reliance on linear assumptions.
getPERMANOVA
returns the PERMANOVA results or a list containing the
PERMANOVA results and homogeneity test results if
test.homogeneity = TRUE
. addPERMANOVA
adds these results to
metadata of x
.
For more details on the actual implementation see
vegan::adonis2
,
vegan::betadisper
, and
vegan::permutest
. See also
addCCA
and addRDA
data(GlobalPatterns) tse <- GlobalPatterns # Apply relative transformation tse <- transformAssay(tse, method = "relabundance") # Perform PERMANOVA tse <- addPERMANOVA( tse, assay.type = "relabundance", method = "bray", formula = x ~ SampleType, permutations = 99 ) # The results are stored to metadata metadata(tse)[["permanova"]] # Calculate dbRDA rda_res <- getRDA( tse, assay.type = "relabundance", method = "bray", formula = x ~ SampleType, permutations = 99) # Significance results are similar to PERMANOVA attr(rda_res, "significance")
data(GlobalPatterns) tse <- GlobalPatterns # Apply relative transformation tse <- transformAssay(tse, method = "relabundance") # Perform PERMANOVA tse <- addPERMANOVA( tse, assay.type = "relabundance", method = "bray", formula = x ~ SampleType, permutations = 99 ) # The results are stored to metadata metadata(tse)[["permanova"]] # Calculate dbRDA rda_res <- getRDA( tse, assay.type = "relabundance", method = "bray", formula = x ~ SampleType, permutations = 99) # Significance results are similar to PERMANOVA attr(rda_res, "significance")
These functions calculate the population prevalence for taxonomic ranks in a
SummarizedExperiment-class
object.
getPrevalence(x, ...) ## S4 method for signature 'ANY' getPrevalence( x, detection = 0, include.lowest = include_lowest, include_lowest = FALSE, sort = FALSE, na.rm = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalence( x, assay.type = assay_name, assay_name = "counts", rank = NULL, ... ) getPrevalent(x, ...) ## S4 method for signature 'ANY' getPrevalent( x, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalent( x, rank = NULL, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) getRare(x, ...) ## S4 method for signature 'ANY' getRare( x, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getRare( x, rank = NULL, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) subsetByPrevalent(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetByPrevalent(x, rank = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' subsetByPrevalent(x, update.tree = FALSE, ...) subsetByRare(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetByRare(x, rank = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' subsetByRare(x, update.tree = FALSE, ...) getPrevalentAbundance( x, assay.type = assay_name, assay_name = "relabundance", ... ) ## S4 method for signature 'ANY' getPrevalentAbundance( x, assay.type = assay_name, assay_name = "relabundance", ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalentAbundance(x, assay.type = assay_name, assay_name = "counts", ...)
getPrevalence(x, ...) ## S4 method for signature 'ANY' getPrevalence( x, detection = 0, include.lowest = include_lowest, include_lowest = FALSE, sort = FALSE, na.rm = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalence( x, assay.type = assay_name, assay_name = "counts", rank = NULL, ... ) getPrevalent(x, ...) ## S4 method for signature 'ANY' getPrevalent( x, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalent( x, rank = NULL, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) getRare(x, ...) ## S4 method for signature 'ANY' getRare( x, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getRare( x, rank = NULL, prevalence = 50/100, include.lowest = include_lowest, include_lowest = FALSE, ... ) subsetByPrevalent(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetByPrevalent(x, rank = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' subsetByPrevalent(x, update.tree = FALSE, ...) subsetByRare(x, ...) ## S4 method for signature 'SummarizedExperiment' subsetByRare(x, rank = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' subsetByRare(x, update.tree = FALSE, ...) getPrevalentAbundance( x, assay.type = assay_name, assay_name = "relabundance", ... ) ## S4 method for signature 'ANY' getPrevalentAbundance( x, assay.type = assay_name, assay_name = "relabundance", ... ) ## S4 method for signature 'SummarizedExperiment' getPrevalentAbundance(x, assay.type = assay_name, assay_name = "counts", ...)
x |
|
... |
additional arguments
|
detection |
|
include.lowest |
|
include_lowest |
Deprecated. Use |
sort |
|
na.rm |
|
assay.type |
|
assay_name |
Deprecated. Use |
rank |
|
prevalence |
Prevalence threshold (in 0 to 1). The
required prevalence is strictly greater by default. To include the
limit, set |
update.tree |
|
getPrevalence
calculates the frequency of samples that exceed
the detection threshold. For SummarizedExperiment
objects, the
prevalence is calculated for the selected taxonomic rank, otherwise for the
rows. The absolute population prevalence can be obtained by multiplying the
prevalence by the number of samples (ncol(x)
).
The core abundance index from getPrevalentAbundance
gives the relative
proportion of the core species (in between 0 and 1). The core taxa are
defined as those that exceed the given population prevalence threshold at the
given detection level as set for getPrevalent
.
subsetPrevalent
and subsetRareFeatures
return a subset of
x
.
The subset includes the most prevalent or rare taxa that are calculated with
getPrevalent
or getRare
respectively.
getPrevalent
returns taxa that are more prevalent with the
given detection threshold for the selected taxonomic rank.
getRare
returns complement of getPrevalent
.
subsetPrevalent
and subsetRareFeatures
return subset of
x
.
All other functions return a named vectors:
getPrevalence
returns a numeric
vector with the
names being set to either the row names of x
or the names after
agglomeration.
getPrevalentAbundance
returns a numeric
vector with
the names corresponding to the column name of x
and include the
joint abundance of prevalent taxa.
getPrevalent
and getRare
return a
character
vector with only the names exceeding the threshold set
by prevalence
, if the rownames
of x
is set.
Otherwise an integer
vector is returned matching the rows in
x
.
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16 20, 2012. To cite the R package, see citation('mia')
data(GlobalPatterns) tse <- GlobalPatterns # Get prevalence estimates for individual ASV/OTU prevalence.frequency <- getPrevalence(tse, detection = 0, sort = TRUE) head(prevalence.frequency) # Get prevalence estimates for phyla # - the getPrevalence function itself always returns population frequencies prevalence.frequency <- getPrevalence(tse, rank = "Phylum", detection = 0, sort = TRUE) head(prevalence.frequency) # - to obtain population counts, multiply frequencies with the sample size, # which answers the question "In how many samples is this phylum detectable" prevalence.count <- prevalence.frequency * ncol(tse) head(prevalence.count) # Detection threshold 1 (strictly greater by default); # Note that the data (GlobalPatterns) is here in absolute counts # (and not compositional, relative abundances) # Prevalence threshold 50 percent (strictly greater by default) prevalent <- getPrevalent( tse, rank = "Phylum", detection = 10, prevalence = 50/100) head(prevalent) # Add relative aundance data tse <- transformAssay(tse, assay.type = "counts", method = "relabundance") # Gets a subset of object that includes prevalent taxa altExp(tse, "prevalent") <- subsetByPrevalent(tse, rank = "Family", assay.type = "relabundance", detection = 0.001, prevalence = 0.55) altExp(tse, "prevalent") # getRare returns the inverse rare <- getRare(tse, rank = "Phylum", assay.type = "relabundance", detection = 1/100, prevalence = 50/100) head(rare) # Gets a subset of object that includes rare taxa altExp(tse, "rare") <- subsetByRare( tse, rank = "Class", assay.type = "relabundance", detection = 0.001, prevalence = 0.001) altExp(tse, "rare") # Names of both experiments, prevalent and rare, can be found from slot # altExpNames tse data(esophagus) getPrevalentAbundance(esophagus, assay.type = "counts")
data(GlobalPatterns) tse <- GlobalPatterns # Get prevalence estimates for individual ASV/OTU prevalence.frequency <- getPrevalence(tse, detection = 0, sort = TRUE) head(prevalence.frequency) # Get prevalence estimates for phyla # - the getPrevalence function itself always returns population frequencies prevalence.frequency <- getPrevalence(tse, rank = "Phylum", detection = 0, sort = TRUE) head(prevalence.frequency) # - to obtain population counts, multiply frequencies with the sample size, # which answers the question "In how many samples is this phylum detectable" prevalence.count <- prevalence.frequency * ncol(tse) head(prevalence.count) # Detection threshold 1 (strictly greater by default); # Note that the data (GlobalPatterns) is here in absolute counts # (and not compositional, relative abundances) # Prevalence threshold 50 percent (strictly greater by default) prevalent <- getPrevalent( tse, rank = "Phylum", detection = 10, prevalence = 50/100) head(prevalent) # Add relative aundance data tse <- transformAssay(tse, assay.type = "counts", method = "relabundance") # Gets a subset of object that includes prevalent taxa altExp(tse, "prevalent") <- subsetByPrevalent(tse, rank = "Family", assay.type = "relabundance", detection = 0.001, prevalence = 0.55) altExp(tse, "prevalent") # getRare returns the inverse rare <- getRare(tse, rank = "Phylum", assay.type = "relabundance", detection = 1/100, prevalence = 50/100) head(rare) # Gets a subset of object that includes rare taxa altExp(tse, "rare") <- subsetByRare( tse, rank = "Class", assay.type = "relabundance", detection = 0.001, prevalence = 0.001) altExp(tse, "rare") # Names of both experiments, prevalent and rare, can be found from slot # altExpNames tse data(esophagus) getPrevalentAbundance(esophagus, assay.type = "counts")
GlobalPatterns compared the microbial communities from 25 environmental samples
and three known "mock communities" at a an average depth of 3.1 million reads
per sample. Authors reproduced diversity patterns seen in many other
published studies, while investigating technical bias by applying the same
techniques to simulated microbial communities of known composition. Special
thanks are given to J. Gregory Caporaso for providing the OTU-clustered data
files for inclusion in the phyloseq package, from which this data was
converted to TreeSummarizedExperiment
.
data(GlobalPatterns)
data(GlobalPatterns)
A TreeSummarizedExperiment with 19216 features and 26 samples. The rowData contains taxonomic information at Kingdom, Phylum, Class, Order, Family, Genus and Species levels. The colData includes:
Sample ID taken from the corresponding study
primer used for sequencing
final barcode (6 nucleotides)
truncated barcode with an added tyrosine (6 nucleotides)
complete barcode with a length of 11 nucleotides
sampling type by collection site (Soil, Feces, Skin, Tongue, Freshwater, Creek Freshwater, Ocean, Estuary Sediment and Mock)
additional information (sampling location, environmental factors and study type)
Caporaso, J. G., et al.
Caporaso, J. G., et al. (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. PNAS, 108, 4516-4522. https://doi.org/10.1073/pnas.1000080107
These functions generate a hierarchy tree using taxonomic information from a
SummarizedExperiment
object and add this hierarchy tree into the rowTree
.
getHierarchyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' getHierarchyTree(x, ...) addHierarchyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' addHierarchyTree(x, ...)
getHierarchyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' getHierarchyTree(x, ...) addHierarchyTree(x, ...) ## S4 method for signature 'SummarizedExperiment' addHierarchyTree(x, ...)
x |
|
... |
optional arguments not used currently. |
addHierarchyTree
calculates a hierarchy tree from the available
taxonomic information and add it to rowTree
.
getHierarchyTree
generates a hierarchy tree from the available
taxonomic information. Internally it uses
toTree
and
resolveLoop
to sanitize
data if needed.
Please note that a hierarchy tree is not an actual phylogenetic tree. A phylogenetic tree represents evolutionary relationships among features. On the other hand, a hierarchy tree organizes species into a hierarchical structure based on their taxonomic ranks.
addHierarchyTree
: a TreeSummarizedExperiment
whose
phylo
tree represents the hierarchy among available taxonomy
information.
getHierarchyTree
: a phylo
tree representing the
hierarchy among available taxonomy information.
# Generate a tree based on taxonomic rank hierarchy (a hierarchy tree). data(GlobalPatterns) tse <- GlobalPatterns getHierarchyTree(tse) # Add a hierarchy tree to a TreeSummarizedExperiment. # Please note that any tree already stored in rowTree() will be overwritten. tse <- addHierarchyTree(tse) tse
# Generate a tree based on taxonomic rank hierarchy (a hierarchy tree). data(GlobalPatterns) tse <- GlobalPatterns getHierarchyTree(tse) # Add a hierarchy tree to a TreeSummarizedExperiment. # Please note that any tree already stored in rowTree() will be overwritten. tse <- addHierarchyTree(tse) tse
HintikkaXO is a multiomics dataset from a rat experiment studying effect of fat and prebiotics in diet. It contains high-throughput profiling data from 40 rat samples, including 39 biomarkers, 38 metabolites (NMR), and 12706 OTUs from 318 species, measured from Cecum. This is diet comparison study with High/Low fat diet and xylo-oligosaccaride supplementation. Column metadata is common for all experiments (microbiota, metabolites, biomarkers) and is described below.
data(HintikkaXOData)
data(HintikkaXOData)
A MultiAssayExperiment with 3 experiments (microbiota, metabolites and biomarkers). rowData of the microbiota experiment contains taxonomic information at Phylum, Class, Order, Family, Genus, Species and OTU levels. The metabolites and biomarkers experiments contain 38 NMR metabolites and 39 biomarkers, respectively. The colData includes:
Sample ID (character)
Rat ID (factor)
Site of measurement ("Cecum"); single value
Diet group (factor; combination of the Fat and XOS fields)
Fat in Diet (factor; Low/High)
XOS Diet Supplement (numeric; 0/1)
Hintikka L et al.
Hintikka L et al. (2021): Xylo-oligosaccharides in prevention of hepatic steatosis and adipose tissue inflammation: associating taxonomic and metabolomic patterns in fecal microbiota with biclustering. International Journal of Environmental Research and Public Health 18(8):4049. https://doi.org/10.3390/ijerph18084049
TreeSummarizedExperiment
object to/from ‘BIOM’
resultsFor convenience, a few functions are available to convert BIOM, DADA2 and
phyloseq objects to
TreeSummarizedExperiment
objects, and
TreeSummarizedExperiment
objects to phyloseq objects.
importBIOM(file, ...) convertFromBIOM( x, prefix.rm = removeTaxaPrefixes, removeTaxaPrefixes = FALSE, rank.from.prefix = rankFromPrefix, rankFromPrefix = FALSE, artifact.rm = remove.artifacts, remove.artifacts = FALSE, ... ) convertToBIOM(x, assay.type = "counts", ...) ## S4 method for signature 'SummarizedExperiment' convertToBIOM(x, assay.type = "counts", ...)
importBIOM(file, ...) convertFromBIOM( x, prefix.rm = removeTaxaPrefixes, removeTaxaPrefixes = FALSE, rank.from.prefix = rankFromPrefix, rankFromPrefix = FALSE, artifact.rm = remove.artifacts, remove.artifacts = FALSE, ... ) convertToBIOM(x, assay.type = "counts", ...) ## S4 method for signature 'SummarizedExperiment' convertToBIOM(x, assay.type = "counts", ...)
file |
BIOM file location |
... |
Additional arguments. Not used currently. |
x |
|
prefix.rm |
|
removeTaxaPrefixes |
Deprecated. Use |
rank.from.prefix |
|
rankFromPrefix |
Deprecated.Use |
artifact.rm |
|
remove.artifacts |
Deprecated. Use |
assay.type |
|
convertFromBIOM
coerces a biom
object to a
TreeSummarizedExperiment
object.
convertToBIOM
coerces a
TreeSummarizedExperiment
object to a biom
object.
importBIOM
loads a BIOM file and creates a
TreeSummarizedExperiment
object from the BIOM object contained in the loaded file.
convertFromBIOM
returns an object of class
TreeSummarizedExperiment
importBIOM
returns an object of class
TreeSummarizedExperiment
importMetaPhlAn
convertFromPhyloseq
convertFromBIOM
convertFromDADA2
importQIIME2
importMothur
importHUMAnN
# Convert BIOM results to a TreeSE # Load biom file library(biomformat) biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "biomformat") # Make TreeSE from BIOM object biom_object <- biomformat::read_biom(biom_file) tse <- convertFromBIOM(biom_object) # Convert TreeSE object to BIOM biom <- convertToBIOM(tse) # Load biom file library(biomformat) biom_file <- system.file( "extdata", "rich_dense_otu_table.biom", package = "biomformat") # Make TreeSE from biom file tse <- importBIOM(biom_file) # Get taxonomyRanks from prefixes and remove prefixes tse <- importBIOM( biom_file, rank.from.prefix = TRUE, prefix.rm = TRUE) # Load another biom file biom_file <- system.file( "extdata", "Aggregated_humanization2.biom", package = "mia") # Clean artifacts from taxonomic data tse <- importBIOM(biom_file, artifact.rm = TRUE)
# Convert BIOM results to a TreeSE # Load biom file library(biomformat) biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "biomformat") # Make TreeSE from BIOM object biom_object <- biomformat::read_biom(biom_file) tse <- convertFromBIOM(biom_object) # Convert TreeSE object to BIOM biom <- convertToBIOM(tse) # Load biom file library(biomformat) biom_file <- system.file( "extdata", "rich_dense_otu_table.biom", package = "biomformat") # Make TreeSE from biom file tse <- importBIOM(biom_file) # Get taxonomyRanks from prefixes and remove prefixes tse <- importBIOM( biom_file, rank.from.prefix = TRUE, prefix.rm = TRUE) # Load another biom file biom_file <- system.file( "extdata", "Aggregated_humanization2.biom", package = "mia") # Clean artifacts from taxonomic data tse <- importBIOM(biom_file, artifact.rm = TRUE)
TreeSummarizedExperiment
Import HUMAnN results to TreeSummarizedExperiment
file |
|
col.data |
a DataFrame-like object that includes sample names in
rownames, or a single |
colData |
Deprecated. Use |
... |
additional arguments:
|
Import HUMAnN (currently version 3.0 supported) results of functional
predictions based on metagenome composition (e.g. pathways or gene families).
The input must be in merged HUMAnN format. (See
the HUMAnN
documentation and humann_join_tables
method.)
The function parses gene/pathway information along with taxonomy information
from the input file. This information is stored to rowData
. Abundances
are stored to assays
.
Usually the workflow includes also taxonomy data from Metaphlan. See
importMetaPhlAn to load the data to TreeSE
.
A
TreeSummarizedExperiment
object
Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, Mailyan A, Manghi P, Scholz M, Thomas AM, Valles-Colomer M, Weingart G, Zhang Y, Zolfo M, Huttenhower C, Franzosa EA, & Segata N (2021) Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 10:e65088.
importMetaPhlAn
convertFromPhyloseq
convertFromBIOM
convertFromDADA2
importQIIME2
importMothur
# File path file_path <- system.file("extdata", "humann_output.tsv", package = "mia") # Import data tse <- importHUMAnN(file_path) tse
# File path file_path <- system.file("extdata", "humann_output.tsv", package = "mia") # Import data tse <- importHUMAnN(file_path) tse
TreeSummarizedExperiment
Import Metaphlan results to TreeSummarizedExperiment
file |
|
col.data |
a DataFrame-like object that includes sample names in
rownames, or a single |
colData |
Deprecated. use |
sample_meta |
Deprecated. Use |
tree.file |
|
phy_tree |
Deprecated. Use |
... |
additional arguments:
|
Import Metaphlan (versions 2, 3 and 4 supported) results.
Input must be in merged Metaphlan format.
(See
the Metaphlan documentation and merge_metaphlan_tables
method.)
Data is imported so that data at the lowest rank is imported as a
TreeSummarizedExperiment
object. Data at higher rank is imported as a
SummarizedExperiment
objects which are stored to altExp
of
TreeSummarizedExperiment
object.
Currently Metaphlan versions 2, 3, and 4 are supported.
A
TreeSummarizedExperiment
object
Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, Mailyan A, Manghi P, Scholz M, Thomas AM, Valles-Colomer M, Weingart G, Zhang Y, Zolfo M, Huttenhower C, Franzosa EA, & Segata N (2021) Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 10:e65088. doi: 10.7554/eLife.65088
convertFromPhyloseq
convertFromBIOM
convertFromDADA2
importQIIME2
importMothur
# (Data is from tutorial # https://github.com/biobakery/biobakery/wiki/metaphlan3#merge-outputs) # File path file_path <- system.file( "extdata", "merged_abundance_table.txt", package = "mia") # Import data tse <- importMetaPhlAn(file_path) # Data at the lowest rank tse # Data at higher rank is stored in altExp altExps(tse) # Higher rank data is in SE format, for example, Phylum rank altExp(tse, "phylum")
# (Data is from tutorial # https://github.com/biobakery/biobakery/wiki/metaphlan3#merge-outputs) # File path file_path <- system.file( "extdata", "merged_abundance_table.txt", package = "mia") # Import data tse <- importMetaPhlAn(file_path) # Data at the lowest rank tse # Data at higher rank is stored in altExp altExps(tse) # Higher rank data is in SE format, for example, Phylum rank altExp(tse, "phylum")
TreeSummarizedExperiment
This method creates a TreeSummarizedExperiment
object from Mothur
files provided as input.
importMothur( assay.file = sharedFile, sharedFile, taxonomyFile = NULL, row.file = taxonomyFile, designFile = NULL, col.file = designFile )
importMothur( assay.file = sharedFile, sharedFile, taxonomyFile = NULL, row.file = taxonomyFile, designFile = NULL, col.file = designFile )
assay.file |
|
sharedFile |
Deprecated. Use |
taxonomyFile |
Deprecated. Use |
row.file |
|
designFile |
Deprecated. Use |
col.file |
|
Results exported from Mothur can be imported as a
SummarizedExperiment
using importMothur
. Except for the
assay.file
, the other data types, row.file
, and
col.file
, are optional, but are highly encouraged to be provided.
A
TreeSummarizedExperiment
object
https://mothur.org/ https://mothur.org/wiki/shared_file/ https://mothur.org/wiki/taxonomy_file/ https://mothur.org/wiki/constaxonomy_file/ https://mothur.org/wiki/design_file/
convertFromPhyloseq
convertFromBIOM
convertFromDADA2
importQIIME2
# Abundance table counts <- system.file("extdata", "mothur_example.shared", package = "mia") # Taxa table (in "cons.taxonomy" or "taxonomy" format) taxa <- system.file("extdata", "mothur_example.cons.taxonomy", package = "mia") #taxa <- system.file("extdata", "mothur_example.taxonomy", package = "mia") # Sample meta data meta <- system.file("extdata", "mothur_example.design", package = "mia") # Creates se object from files se <- importMothur(assay.file = counts, row.file = taxa, col.file = meta) # Convert SE to TreeSE tse <- as(se, "TreeSummarizedExperiment") tse
# Abundance table counts <- system.file("extdata", "mothur_example.shared", package = "mia") # Taxa table (in "cons.taxonomy" or "taxonomy" format) taxa <- system.file("extdata", "mothur_example.cons.taxonomy", package = "mia") #taxa <- system.file("extdata", "mothur_example.taxonomy", package = "mia") # Sample meta data meta <- system.file("extdata", "mothur_example.design", package = "mia") # Creates se object from files se <- importMothur(assay.file = counts, row.file = taxa, col.file = meta) # Convert SE to TreeSE tse <- as(se, "TreeSummarizedExperiment") tse
TreeSummarizedExperiment
Results exported from QIMME2 can be imported as a
TreeSummarizedExperiment
using importQIIME2
. Except for the
assay.file
, the other data types, row.file
,
refseq.file
and tree.file
, are optional, but are highly
encouraged to be provided.
Import the QIIME2 artifacts to R.
importQIIME2( assay.file = featureTableFile, featureTableFile, row.file = taxonomyTableFile, taxonomyTableFile = NULL, col.file = sampleMetaFile, sampleMetaFile = NULL, as.refseq = featureNamesAsRefSeq, featureNamesAsRefSeq = TRUE, refseq.file = refSeqFile, refSeqFile = NULL, tree.file = phyTreeFile, phyTreeFile = NULL, ... ) importQZA(file, temp.dir = temp, temp = tempdir(), ...)
importQIIME2( assay.file = featureTableFile, featureTableFile, row.file = taxonomyTableFile, taxonomyTableFile = NULL, col.file = sampleMetaFile, sampleMetaFile = NULL, as.refseq = featureNamesAsRefSeq, featureNamesAsRefSeq = TRUE, refseq.file = refSeqFile, refSeqFile = NULL, tree.file = phyTreeFile, phyTreeFile = NULL, ... ) importQZA(file, temp.dir = temp, temp = tempdir(), ...)
assay.file |
|
featureTableFile |
Deprecated. use |
row.file |
|
taxonomyTableFile |
Deprecated. use |
col.file |
|
sampleMetaFile |
Deprecated. Use |
as.refseq |
|
featureNamesAsRefSeq |
Deprecated. Use |
refseq.file |
|
refSeqFile |
Deprecated. Use |
tree.file |
|
phyTreeFile |
Deprecated. Use |
... |
additional arguments:
|
file |
character, path of the input qza file. Only files in format of
|
temp.dir |
character, a temporary directory in which the qza file will be
decompressed to, default |
temp |
Deprecated. Use |
Both arguments as.refseq
and refseq.file
can be used
to define reference sequences of features. as.refseq
is
only taken into account, if refseq.file
is NULL
. No reference
sequences are tried to be created, if featureNameAsRefSeq
is
FALSE
and refseq.file
is NULL
.
A
TreeSummarizedExperiment
object
matrix
object for feature table, DataFrame
for taxonomic table,
ape::phylo
object for phylogenetic tree,
Biostrings::DNAStringSet
for representative sequences of taxa.
Yang Cao
Bolyen E et al. 2019: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9
convertFromPhyloseq
convertFromBIOM
convertFromDADA2
importMothur
assay.file <- system.file("extdata", "table.qza", package = "mia") row.file <- system.file("extdata", "taxonomy.qza", package = "mia") col.file <- system.file("extdata", "sample-metadata.tsv", package = "mia") tree.file <- system.file("extdata", "tree.qza", package = "mia") refseq.file <- system.file("extdata", "refseq.qza", package = "mia") tse <- importQIIME2( assay.file = assay.file, row.file = row.file, col.file = col.file, refseq.file = refseq.file, tree.file = tree.file ) tse # Read individual files assay.file <- system.file("extdata", "table.qza", package = "mia") row.file <- system.file("extdata", "taxonomy.qza", package = "mia") col.file <- system.file("extdata", "sample-metadata.tsv", package = "mia") assay <- importQZA(assay.file) rowdata <- importQZA(row.file, prefix.rm = TRUE) coldata <- read.table(col.file, header = TRUE, sep = "\t", comment.char = "") # Assign rownames rownames(coldata) <- coldata[, 1] coldata[, 1] <- NULL # Order coldata based on assay coldata <- coldata[match(colnames(assay), rownames(coldata)), ] # Create SE from individual files se <- SummarizedExperiment(assays = list(assay), rowData = rowdata, colData = coldata) se
assay.file <- system.file("extdata", "table.qza", package = "mia") row.file <- system.file("extdata", "taxonomy.qza", package = "mia") col.file <- system.file("extdata", "sample-metadata.tsv", package = "mia") tree.file <- system.file("extdata", "tree.qza", package = "mia") refseq.file <- system.file("extdata", "refseq.qza", package = "mia") tse <- importQIIME2( assay.file = assay.file, row.file = row.file, col.file = col.file, refseq.file = refseq.file, tree.file = tree.file ) tse # Read individual files assay.file <- system.file("extdata", "table.qza", package = "mia") row.file <- system.file("extdata", "taxonomy.qza", package = "mia") col.file <- system.file("extdata", "sample-metadata.tsv", package = "mia") assay <- importQZA(assay.file) rowdata <- importQZA(row.file, prefix.rm = TRUE) coldata <- read.table(col.file, header = TRUE, sep = "\t", comment.char = "") # Assign rownames rownames(coldata) <- coldata[, 1] coldata[, 1] <- NULL # Order coldata based on assay coldata <- coldata[match(colnames(assay), rownames(coldata)), ] # Create SE from individual files se <- SummarizedExperiment(assays = list(assay), rowData = rowdata, colData = coldata) se
TreeSummarizedExperiment
Import taxpasta-specific BIOM results to
TreeSummarizedExperiment
importTaxpasta(file, add.tree = TRUE, ...)
importTaxpasta(file, add.tree = TRUE, ...)
file |
|
add.tree |
|
... |
additional arguments
|
importTaxpasta
imports data that is returned from Taxonomic Profile
Aggregation and Standardization (taxpasta) pipeline. See more information on
taxpasta from
taxpasta documentation.
A TreeSummarizedExperiment
object.
## Not run: # File path to BIOM file file_path <- system.file("extdata", "complete.biom", package = "mia") # Import BIOM as TreeSE, and set ranks. tse <- importTaxpasta(file_path, set.ranks = TRUE) # Import BIOM as TreeSE without adding hierarchy tree tse <- importTaxpasta(file_path, add.tree = FALSE) ## End(Not run)
## Not run: # File path to BIOM file file_path <- system.file("extdata", "complete.biom", package = "mia") # Import BIOM as TreeSE, and set ranks. tse <- importTaxpasta(file_path, set.ranks = TRUE) # Import BIOM as TreeSE without adding hierarchy tree tse <- importTaxpasta(file_path, add.tree = FALSE) ## End(Not run)
The decontam
functions isContaminant
and
isNotContaminant
are made available for
SummarizedExperiment
objects.
## S4 method for signature 'SummarizedExperiment' isContaminant( seqtab, assay.type = assay_name, assay_name = "counts", name = "isContaminant", concentration = NULL, control = NULL, batch = NULL, threshold = 0.1, normalize = TRUE, detailed = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' isNotContaminant( seqtab, assay.type = assay_name, assay_name = "counts", name = "isNotContaminant", control = NULL, threshold = 0.5, normalize = TRUE, detailed = FALSE, ... ) addContaminantQC(x, name = "isContaminant", ...) ## S4 method for signature 'SummarizedExperiment' addContaminantQC(x, name = "isContaminant", ...) addNotContaminantQC(x, name = "isNotContaminant", ...) ## S4 method for signature 'SummarizedExperiment' addNotContaminantQC(x, name = "isNotContaminant", ...)
## S4 method for signature 'SummarizedExperiment' isContaminant( seqtab, assay.type = assay_name, assay_name = "counts", name = "isContaminant", concentration = NULL, control = NULL, batch = NULL, threshold = 0.1, normalize = TRUE, detailed = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' isNotContaminant( seqtab, assay.type = assay_name, assay_name = "counts", name = "isNotContaminant", control = NULL, threshold = 0.5, normalize = TRUE, detailed = FALSE, ... ) addContaminantQC(x, name = "isContaminant", ...) ## S4 method for signature 'SummarizedExperiment' addContaminantQC(x, name = "isContaminant", ...) addNotContaminantQC(x, name = "isNotContaminant", ...) ## S4 method for signature 'SummarizedExperiment' addNotContaminantQC(x, name = "isNotContaminant", ...)
seqtab , x
|
|
assay.type |
|
assay_name |
Deprecated. Use |
name |
|
concentration |
|
control |
|
batch |
|
threshold |
|
normalize |
|
detailed |
|
... |
|
for isContaminant
/ isNotContaminant
a DataFrame
or for addContaminantQC
/addNotContaminantQC
a modified object
of class(x)
decontam:isContaminant
,
decontam:isNotContaminant
data(esophagus) # setup of some mock data colData(esophagus)$concentration <- c(1,2,3) colData(esophagus)$control <- c(FALSE,FALSE,TRUE) isContaminant(esophagus, method = "frequency", concentration = "concentration") esophagus <- addContaminantQC(esophagus, method = "frequency", concentration = "concentration") colData(esophagus) isNotContaminant(esophagus, control = "control") esophagus <- addNotContaminantQC(esophagus, control = "control") colData(esophagus)
data(esophagus) # setup of some mock data colData(esophagus)$concentration <- c(1,2,3) colData(esophagus)$control <- c(FALSE,FALSE,TRUE) isContaminant(esophagus, method = "frequency", concentration = "concentration") esophagus <- addContaminantQC(esophagus, method = "frequency", concentration = "concentration") colData(esophagus) isNotContaminant(esophagus, control = "control") esophagus <- addNotContaminantQC(esophagus, control = "control") colData(esophagus)
SummarizedExperiment
object into a long data.framemeltSE
Converts a
SummarizedExperiment
object into a long data.frame which can be used for tidyverse
-tools.
meltSE(x, ...) ## S4 method for signature 'SummarizedExperiment' meltSE( x, assay.type = assay_name, assay_name = "counts", add.row = add_row_data, add_row_data = NULL, add.col = add_col_data, add_col_data = NULL, row.name = feature_name, feature_name = "FeatureID", col.name = sample_name, sample_name = "SampleID", ... )
meltSE(x, ...) ## S4 method for signature 'SummarizedExperiment' meltSE( x, assay.type = assay_name, assay_name = "counts", add.row = add_row_data, add_row_data = NULL, add.col = add_col_data, add_col_data = NULL, row.name = feature_name, feature_name = "FeatureID", col.name = sample_name, sample_name = "SampleID", ... )
x |
|
... |
optional arguments:
|
assay.type |
|
assay_name |
Deprecated. Use |
add.row |
|
add_row_data |
Deprecated. Use |
add.col |
|
add_col_data |
Deprecated. Use |
row.name |
|
feature_name |
Deprecated. Use |
col.name |
|
sample_name |
Deprecated. Use |
If the colData
contains a column “SampleID” or the
rowData
contains a column “FeatureID”, they will be renamed to
“SampleID_col” and “FeatureID_row”, if row names or column
names are set.
A tibble
with the molten data. The assay values are given in a
column named like the selected assay assay.type
. In addition, a
column “FeatureID” will contain the rownames, if set, and analogously
a column “SampleID” with the colnames, if set.
data(GlobalPatterns) molten_tse <- meltSE( GlobalPatterns, assay.type = "counts", add.row = TRUE, add.col = TRUE ) molten_tse
data(GlobalPatterns) molten_tse <- meltSE( GlobalPatterns, assay.type = "counts", add.row = TRUE, add.col = TRUE ) molten_tse
Merge SE objects into single SE object.
mergeSEs(x, ...) ## S4 method for signature 'SimpleList' mergeSEs( x, assay.type = "counts", assay_name = NULL, join = "full", missing.values = missing_values, missing_values = NA, collapse.cols = collapse_samples, collapse_samples = FALSE, collapse.rows = collapse_features, collapse_features = TRUE, verbose = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' mergeSEs(x, y = NULL, ...) ## S4 method for signature 'list' mergeSEs(x, ...)
mergeSEs(x, ...) ## S4 method for signature 'SimpleList' mergeSEs( x, assay.type = "counts", assay_name = NULL, join = "full", missing.values = missing_values, missing_values = NA, collapse.cols = collapse_samples, collapse_samples = FALSE, collapse.rows = collapse_features, collapse_features = TRUE, verbose = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' mergeSEs(x, y = NULL, ...) ## S4 method for signature 'list' mergeSEs(x, ...)
x |
|
... |
optional arguments (not used). |
assay.type |
|
assay_name |
Deprecated. Use |
join |
|
missing.values |
|
missing_values |
Deprecated. Use |
collapse.cols |
|
collapse_samples |
Deprecated. Use |
collapse.rows |
|
collapse_features |
Deprecated. Use |
verbose |
|
y |
a |
This function merges multiple SummarizedExperiment
objects. It combines
rowData
, assays
, and colData
so that the output includes
each unique row and column ones. The merging is done based on rownames
and
colnames
. rowTree
and colTree
are preserved if linkage
between rows/cols and the tree is found.
Equally named rows are interpreted as equal. Further
matching based on rowData
is not done. For samples, collapsing
is disabled by default meaning that equally named samples that are stored
in different objects are interpreted as unique. Collapsing can be enabled
with collapse.cols = TRUE
when equally named samples describe the same
sample.
If, for example, all rows are not shared with
individual objects, there are missing values in assays
. The notation of missing
can be specified with the missing.values
argument. If input consists of
TreeSummarizedExperiment
objects, also rowTree
, colTree
, and
referenceSeq
are preserved if possible. The data is preserved if
all the rows or columns can be found from it.
Compared to cbind
and rbind
mergeSEs
allows more freely merging since cbind
and rbind
expect
that rows and columns are matching, respectively.
You can choose joining methods from 'full'
, 'inner'
,
'left'
, and 'right'
. In all the methods, all the samples are
included in the result object. However, with different methods, it is possible
to choose which rows are included.
full
– all unique features
inner
– all shared features
left
– all the features of the first object
right
– all the features of the second object
The output depends on the input. If the input contains SummarizedExperiment
object, then the output will be SummarizedExperiment
. When all the input
objects belong to TreeSummarizedExperiment
, the output will be
TreeSummarizedExperiment
.
A single SummarizedExperiment
object.
TreeSummarizedExperiment::cbind
TreeSummarizedExperiment::rbind
data(GlobalPatterns) data(esophagus) data(enterotype) # Take only subsets so that it wont take so long tse1 <- GlobalPatterns[1:100, ] tse2 <- esophagus tse3 <- enterotype[1:100, ] # Merge two TreeSEs tse <- mergeSEs(tse1, tse2) # Merge a list of TreeSEs list <- SimpleList(tse1, tse2, tse3) tse <- mergeSEs(list, assay.type = "counts", missing.values = 0) tse # With 'join', it is possible to specify the merging method. Subsets are used # here just to show the functionality tse_temp <- mergeSEs(tse[1:10, 1:10], tse[5:100, 11:20], join = "left") tse_temp # If your objects contain samples that describe one and same sample, # you can collapse equally named samples to one by specifying 'collapse.cols' tse_temp <- mergeSEs(list(tse[1:10, 1], tse[1:20, 1], tse[1:5, 1]), collapse.cols = TRUE, join = "inner") tse_temp # Merge all available assays tse <- transformAssay(tse, method="relabundance") ts1 <- transformAssay(tse1, method="relabundance") tse_temp <- mergeSEs(tse, tse1, assay.type = assayNames(tse))
data(GlobalPatterns) data(esophagus) data(enterotype) # Take only subsets so that it wont take so long tse1 <- GlobalPatterns[1:100, ] tse2 <- esophagus tse3 <- enterotype[1:100, ] # Merge two TreeSEs tse <- mergeSEs(tse1, tse2) # Merge a list of TreeSEs list <- SimpleList(tse1, tse2, tse3) tse <- mergeSEs(list, assay.type = "counts", missing.values = 0) tse # With 'join', it is possible to specify the merging method. Subsets are used # here just to show the functionality tse_temp <- mergeSEs(tse[1:10, 1:10], tse[5:100, 11:20], join = "left") tse_temp # If your objects contain samples that describe one and same sample, # you can collapse equally named samples to one by specifying 'collapse.cols' tse_temp <- mergeSEs(list(tse[1:10, 1], tse[1:20, 1], tse[1:5, 1]), collapse.cols = TRUE, join = "inner") tse_temp # Merge all available assays tse <- transformAssay(tse, method="relabundance") ts1 <- transformAssay(tse1, method="relabundance") tse_temp <- mergeSEs(tse, tse1, assay.type = assayNames(tse))
mia provides various datasets derived from independent experimental studies. The datasets represent instances of the TreeSummarizedExperiment and MultiAssayExperiment containers and can serve as tools to practice the mia functionality.
Currently, the following datasets are available:
dmn_se
: A SummarizedExperiment with 130 features and
278 samples
enterotype
: A TreeSummarizedExperiment with 553
features and 280 samples
esophagus
: A TreeSummarizedExperiment with 58 features
and 3 samples
GlobalPatterns
: A TreeSummarizedExperiment with 19216
features and 26 samples
HintikkaXOData
: A MultiAssayExperiment with 3
experiments (microbiota, metabolites and biomarkers)
peerj13075
: A TreeSummarizedExperiment with 674
features and 58 samples
Tengeler2020
: A TreeSummarizedExperiment with 151
features and 27 samples
# Load dataset from mia library(mia) data("GlobalPatterns", package = "mia") # In this case, the dataset is a TreeSE, so it is renamed as tse tse <- GlobalPatterns # Print summary tse
# Load dataset from mia library(mia) data("GlobalPatterns", package = "mia") # In this case, the dataset is a TreeSE, so it is renamed as tse tse <- GlobalPatterns # Print summary tse
peerj13075 includes skin microbial profiles of 58 volunteers with multiple factors. 16S r-RNA sequencing of V3-V4 regions was done to generate millions of read using illumina platform. A standard bioinformatic and statistical analysis done to explore skin bacterial diversity and its association with age, diet, geographical locations. The authors investigated significant association of skin microbiota with individual’s geographical location.
data(peerj13075)
data(peerj13075)
A TreeSummarizedExperiment with 674 features and 58 samples. The rowData contains taxonomic information at kingdom, phylum, class, order, family and genus level. The colData includes:
sample ID
city where participant lives (Ahmednagar, Pune and Nashik)
participant's gender (Male or Female)
participant's age group (Middle_age, Adult and Elderly)
participant's diet (Veg or Mixed)
Potbhare, R., et al.
Potbhare, R., RaviKumar, A., Munukka, E., Lahti, L., & Ashma, R. (2022). Skin microbiota diversity among genetically unrelated individuals of Indian origin. PeerJ, 10, e13075. https://doi.org/10.7717/peerj.13075 Supplemental information includes OTU table and taxonomy table publicly-accessible from: https://www.doi.org/10.7717/peerj.13075/supp-1 https://www.doi.org/10.7717/peerj.13075/supp-2
rarefyAssay
randomly subsamples counts within a
SummarizedExperiment
object and returns a new
SummarizedExperiment
containing the original assay and the new
subsampled assay.
rarefyAssay( x, assay.type = assay_name, assay_name = "counts", sample = min_size, min_size = min(colSums2(assay(x))), replace = TRUE, name = "subsampled", verbose = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' rarefyAssay( x, assay.type = assay_name, assay_name = "counts", sample = min_size, min_size = min(colSums2(assay(x, assay.type))), replace = TRUE, name = "subsampled", verbose = TRUE, ... )
rarefyAssay( x, assay.type = assay_name, assay_name = "counts", sample = min_size, min_size = min(colSums2(assay(x))), replace = TRUE, name = "subsampled", verbose = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' rarefyAssay( x, assay.type = assay_name, assay_name = "counts", sample = min_size, min_size = min(colSums2(assay(x, assay.type))), replace = TRUE, name = "subsampled", verbose = TRUE, ... )
x |
|
assay.type |
|
assay_name |
Deprecated. Use |
sample |
|
min_size |
Deprecated. Use |
replace |
|
name |
|
verbose |
|
... |
additional arguments not used |
Although the subsampling approach is highly debated in microbiome research,
we include the rarefyAssay
function because there may be some
instances where it can be useful.
Note that the output of rarefyAssay
is not the equivalent as the
input and any result have to be verified with the original dataset.
Subsampling/Rarefying may undermine downstream analyses and have unintended consequences. Therefore, make sure this normalization is appropriate for your data.
To maintain the reproducibility, please define the seed using set.seed() before implement this function.
rarefyAssay
return x
with subsampled data.
McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014 Apr 3;10(4):e1003531.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V & Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8: 2224. doi: 10.3389/fmicb.2017.02224
Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017 Dec;5(1):1-8.
# When samples in TreeSE are less than specified sample, they will be removed. # If after subsampling features are not present in any of the samples, # they will be removed. data(GlobalPatterns) tse <- GlobalPatterns set.seed(123) tse_subsampled <- rarefyAssay(tse, sample = 60000, name = "subsampled") tse_subsampled dim(tse) dim(assay(tse_subsampled, "subsampled"))
# When samples in TreeSE are less than specified sample, they will be removed. # If after subsampling features are not present in any of the samples, # they will be removed. data(GlobalPatterns) tse <- GlobalPatterns set.seed(123) tse_subsampled <- rarefyAssay(tse, sample = 60000, name = "subsampled") tse_subsampled dim(tse) dim(assay(tse_subsampled, "subsampled"))
These functions perform Canonical Correspondence Analysis on data stored
in a SummarizedExperiment
.
getCCA(x, ...) addCCA(x, ...) getRDA(x, ...) addRDA(x, ...) calculateCCA(x, ...) runCCA(x, ...) ## S4 method for signature 'ANY' getCCA(x, formula, data, ...) ## S4 method for signature 'SummarizedExperiment' getCCA( x, formula = NULL, col.var = variables, variables = NULL, test.signif = TRUE, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", ... ) ## S4 method for signature 'SingleCellExperiment' addCCA(x, altexp = NULL, name = "CCA", ...) calculateRDA(x, ...) runRDA(x, ...) ## S4 method for signature 'ANY' getRDA(x, formula, data, ...) ## S4 method for signature 'SummarizedExperiment' getRDA( x, formula = NULL, col.var = variables, variables = NULL, test.signif = TRUE, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", ... ) ## S4 method for signature 'SingleCellExperiment' addRDA(x, altexp = NULL, name = "RDA", ...)
getCCA(x, ...) addCCA(x, ...) getRDA(x, ...) addRDA(x, ...) calculateCCA(x, ...) runCCA(x, ...) ## S4 method for signature 'ANY' getCCA(x, formula, data, ...) ## S4 method for signature 'SummarizedExperiment' getCCA( x, formula = NULL, col.var = variables, variables = NULL, test.signif = TRUE, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", ... ) ## S4 method for signature 'SingleCellExperiment' addCCA(x, altexp = NULL, name = "CCA", ...) calculateRDA(x, ...) runRDA(x, ...) ## S4 method for signature 'ANY' getRDA(x, formula, data, ...) ## S4 method for signature 'SummarizedExperiment' getRDA( x, formula = NULL, col.var = variables, variables = NULL, test.signif = TRUE, assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", ... ) ## S4 method for signature 'SingleCellExperiment' addRDA(x, altexp = NULL, name = "RDA", ...)
x |
|
... |
additional arguments passed to vegan::cca or vegan::dbrda and other internal functions.
|
formula |
|
data |
|
col.var |
|
variables |
Deprecated. Use |
test.signif |
|
assay.type |
|
assay_name |
Deprecated. Use |
exprs_values |
Deprecated. Use |
altexp |
|
name |
|
*CCA functions utilize vegan:cca
and *RDA functions
vegan:dbRDA
. By default, dbRDA is done with euclidean distances, which
is equivalent to RDA. col.var
and formula
can be missing,
which turns the CCA analysis into a CA analysis and dbRDA into PCoA/MDS.
Significance tests are done with vegan:anova.cca
(PERMANOVA). Group
dispersion, i.e., homogeneity within groups is analyzed with
vegan::betadisper
(multivariate homogeneity of groups dispersions
(variances)) and statistical significance of homogeneity is tested with a
test specified by homogeneity.test
parameter.
For getCCA
a matrix with samples as rows and CCA dimensions as
columns. Attributes include output from scores
,
eigenvalues, the cca
/rda
object and significance analysis
results.
For addCCA
a modified x
with the results stored in
reducedDim
as the given name
.
For more details on the actual implementation see
cca
and dbrda
library(miaViz) data("enterotype", package = "mia") tse <- enterotype # Perform CCA and exclude any sample with missing ClinicalStatus tse <- addCCA( tse, formula = data ~ ClinicalStatus, na.action = na.exclude ) # Plot CCA plotCCA(tse, "CCA", colour_by = "ClinicalStatus") # Fetch significance results attr(reducedDim(tse, "CCA"), "significance") tse <- transformAssay(tse, method = "relabundance") # Specify dissimilarity measure tse <- addRDA( tse, formula = data ~ ClinicalStatus, assay.type = "relabundance", method = "bray", name = "RDA_bray", na.action = na.exclude ) # To scale values when using *RDA functions, use # transformAssay(MARGIN = "features", ...) tse <- transformAssay(tse, method = "standardize", MARGIN = "features") # Data might include taxa that do not vary. Remove those because after # z-transform their value is NA tse <- tse[rowSums(is.na(assay(tse, "standardize"))) == 0, ] # Calculate RDA tse <- addRDA( tse, formula = data ~ ClinicalStatus, assay.type = "standardize", name = "rda_scaled", na.action = na.omit ) # Plot RDA plotRDA(tse, "rda_scaled", colour_by = "ClinicalStatus") # A common choice along with PERMANOVA is ANOVA when statistical significance # of homogeneity of groups is analysed. Moreover, full significance test # results can be returned. tse <- addRDA( tse, formula = data ~ ClinicalStatus, homogeneity.test = "anova", full = TRUE ) # Example showing how to pass extra parameters, such as 'permutations', # to anova.cca tse <- addRDA( tse, formula = data ~ ClinicalStatus, permutations = 500 )
library(miaViz) data("enterotype", package = "mia") tse <- enterotype # Perform CCA and exclude any sample with missing ClinicalStatus tse <- addCCA( tse, formula = data ~ ClinicalStatus, na.action = na.exclude ) # Plot CCA plotCCA(tse, "CCA", colour_by = "ClinicalStatus") # Fetch significance results attr(reducedDim(tse, "CCA"), "significance") tse <- transformAssay(tse, method = "relabundance") # Specify dissimilarity measure tse <- addRDA( tse, formula = data ~ ClinicalStatus, assay.type = "relabundance", method = "bray", name = "RDA_bray", na.action = na.exclude ) # To scale values when using *RDA functions, use # transformAssay(MARGIN = "features", ...) tse <- transformAssay(tse, method = "standardize", MARGIN = "features") # Data might include taxa that do not vary. Remove those because after # z-transform their value is NA tse <- tse[rowSums(is.na(assay(tse, "standardize"))) == 0, ] # Calculate RDA tse <- addRDA( tse, formula = data ~ ClinicalStatus, assay.type = "standardize", name = "rda_scaled", na.action = na.omit ) # Plot RDA plotRDA(tse, "rda_scaled", colour_by = "ClinicalStatus") # A common choice along with PERMANOVA is ANOVA when statistical significance # of homogeneity of groups is analysed. Moreover, full significance test # results can be returned. tse <- addRDA( tse, formula = data ~ ClinicalStatus, homogeneity.test = "anova", full = TRUE ) # Example showing how to pass extra parameters, such as 'permutations', # to anova.cca tse <- addRDA( tse, formula = data ~ ClinicalStatus, permutations = 500 )
Double Principal Correspondance analysis is made available via the
ade4
package in typical fashion. Results are stored in the
reducedDims
and are available for all the expected functions.
getDPCoA(x, y, ...) ## S4 method for signature 'ANY,ANY' getDPCoA( x, y, ncomponents = 2, ntop = NULL, subset.row = subset_row, subset_row = NULL, scale = FALSE, transposed = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment,missing' getDPCoA( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", tree.name = tree_name, tree_name = "phylo" ) calculateDPCoA(x, ...) addDPCoA(x, ..., altexp = NULL, name = "DPCoA") runDPCoA(x, ...)
getDPCoA(x, y, ...) ## S4 method for signature 'ANY,ANY' getDPCoA( x, y, ncomponents = 2, ntop = NULL, subset.row = subset_row, subset_row = NULL, scale = FALSE, transposed = FALSE, ... ) ## S4 method for signature 'TreeSummarizedExperiment,missing' getDPCoA( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", tree.name = tree_name, tree_name = "phylo" ) calculateDPCoA(x, ...) addDPCoA(x, ..., altexp = NULL, name = "DPCoA") runDPCoA(x, ...)
x |
|
y |
a |
... |
Currently not used. |
ncomponents |
|
ntop |
|
subset.row |
|
subset_row |
Deprecated. Use |
scale |
|
transposed |
|
assay.type |
|
assay_name |
Deprecated. Use |
exprs_values |
Deprecated. Use |
tree.name |
|
tree_name |
Deprecated. Use |
altexp |
|
name |
|
For addDPCoA
a TreeSummarizedExperiment containing the
expression values as well as a rowTree
to calculate y
using
cophenetic.phylo
.
In addition to the reduced dimension on the features, the reduced dimension
for samples are returned as well as sample_red
attribute.
eig
, feature_weights
and sample_weights
are
returned as attributes as well.
For getDPCoA
a matrix with samples as rows and CCA dimensions as
columns
For addDPCoA
a modified x
with the results stored in
reducedDim
as the given name
data(esophagus) dpcoa <- getDPCoA(esophagus) head(dpcoa) esophagus <- addDPCoA(esophagus) reducedDims(esophagus) library(scater) plotReducedDim(esophagus, "DPCoA")
data(esophagus) dpcoa <- getDPCoA(esophagus) head(dpcoa) esophagus <- addDPCoA(esophagus) reducedDims(esophagus) library(scater) plotReducedDim(esophagus, "DPCoA")
Perform non-metric multi-dimensional scaling (nMDS) on samples, based on the
data in a SingleCellExperiment
object.
getNMDS(x, ...) ## S4 method for signature 'ANY' getNMDS( x, FUN = vegdist, nmds.fun = nmdsFUN, nmdsFUN = c("isoMDS", "monoMDS"), ncomponents = 2, ntop = 500, subset.row = subset_row, subset_row = NULL, scale = FALSE, transposed = FALSE, keep.dist = keep_dist, keep_dist = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getNMDS( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", FUN = vegdist ) ## S4 method for signature 'SingleCellExperiment' getNMDS( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", dimred = NULL, ndimred = n_dimred, n_dimred = NULL, FUN = vegdist ) calculateNMDS(x, ...) addNMDS(x, ..., altexp = NULL, name = "NMDS") runNMDS(x, ...)
getNMDS(x, ...) ## S4 method for signature 'ANY' getNMDS( x, FUN = vegdist, nmds.fun = nmdsFUN, nmdsFUN = c("isoMDS", "monoMDS"), ncomponents = 2, ntop = 500, subset.row = subset_row, subset_row = NULL, scale = FALSE, transposed = FALSE, keep.dist = keep_dist, keep_dist = FALSE, ... ) ## S4 method for signature 'SummarizedExperiment' getNMDS( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", FUN = vegdist ) ## S4 method for signature 'SingleCellExperiment' getNMDS( x, ..., assay.type = assay_name, assay_name = exprs_values, exprs_values = "counts", dimred = NULL, ndimred = n_dimred, n_dimred = NULL, FUN = vegdist ) calculateNMDS(x, ...) addNMDS(x, ..., altexp = NULL, name = "NMDS") runNMDS(x, ...)
x |
|
... |
additional arguments to pass to |
FUN |
|
nmds.fun |
|
nmdsFUN |
Deprecated. Use |
ncomponents |
|
ntop |
|
subset.row |
|
subset_row |
Deprecated. Use |
scale |
|
transposed |
|
keep.dist |
|
keep_dist |
Deprecated. Use |
assay.type |
|
assay_name |
Deprecated. Use |
exprs_values |
Deprecated. Use |
dimred |
|
ndimred |
|
n_dimred |
Deprecated. Use |
altexp |
|
name |
|
For addNMDS
a SingleCellExperiment
Either MASS::isoMDS
or
vegan::monoMDS
are used internally to compute
the NMDS components. If you supply a custom FUN
, make sure that
the arguments of FUN
and nmds.fun
do not collide.
For getNMDS
, a matrix is returned containing the MDS
coordinates for each sample (row) and dimension (column).
MASS::isoMDS
,
vegan::monoMDS
for NMDS component calculation.
plotMDS
, to quickly visualize the
results.
# generate some example data mat <- matrix(1:60, nrow = 6) df <- DataFrame(n = c(1:6)) tse <- TreeSummarizedExperiment(assays = list(counts = mat), rowData = df) # getNMDS(tse) # data(esophagus) esophagus <- addNMDS(esophagus, FUN = vegan::vegdist, name = "BC") esophagus <- addNMDS(esophagus, FUN = vegan::vegdist, name = "euclidean", method = "euclidean") reducedDims(esophagus)
# generate some example data mat <- matrix(1:60, nrow = 6) df <- DataFrame(n = c(1:6)) tse <- TreeSummarizedExperiment(assays = list(counts = mat), rowData = df) # getNMDS(tse) # data(esophagus) esophagus <- addNMDS(esophagus, FUN = vegan::vegdist, name = "BC") esophagus <- addNMDS(esophagus, FUN = vegan::vegdist, name = "euclidean", method = "euclidean") reducedDims(esophagus)
TreeSummarizedExperiment
column-wise or row-wise based on
grouping variableSplit TreeSummarizedExperiment
column-wise or row-wise based on
grouping variable
splitOn(x, ...) ## S4 method for signature 'SummarizedExperiment' splitOn(x, group = f, f = NULL, ...) ## S4 method for signature 'SingleCellExperiment' splitOn(x, group = f, f = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' splitOn( x, group = f, f = NULL, update.tree = update_rowTree, update_rowTree = FALSE, ... ) unsplitOn(x, ...) ## S4 method for signature 'list' unsplitOn(x, update.tree = update_rowTree, update_rowTree = FALSE, ...) ## S4 method for signature 'SimpleList' unsplitOn(x, update.tree = update_rowTree, update_rowTree = FALSE, ...) ## S4 method for signature 'SingleCellExperiment' unsplitOn( x, altexp = altExpNames, altExpNames = names(altExps(x)), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... )
splitOn(x, ...) ## S4 method for signature 'SummarizedExperiment' splitOn(x, group = f, f = NULL, ...) ## S4 method for signature 'SingleCellExperiment' splitOn(x, group = f, f = NULL, ...) ## S4 method for signature 'TreeSummarizedExperiment' splitOn( x, group = f, f = NULL, update.tree = update_rowTree, update_rowTree = FALSE, ... ) unsplitOn(x, ...) ## S4 method for signature 'list' unsplitOn(x, update.tree = update_rowTree, update_rowTree = FALSE, ...) ## S4 method for signature 'SimpleList' unsplitOn(x, update.tree = update_rowTree, update_rowTree = FALSE, ...) ## S4 method for signature 'SingleCellExperiment' unsplitOn( x, altexp = altExpNames, altExpNames = names(altExps(x)), keep.dimred = keep_reducedDims, keep_reducedDims = FALSE, ... )
x |
|
... |
Arguments passed to
|
group |
|
f |
Deprecated. Use |
update.tree |
|
update_rowTree |
Deprecated. Use |
altexp |
|
altExpNames |
Deprecated. Use |
keep.dimred |
|
keep_reducedDims |
Deprecated. Use |
splitOn
split data based on grouping variable. Splitting can be done
column-wise or row-wise. The returned value is a list of
SummarizedExperiment
objects; each element containing members of each
group.
For splitOn
: SummarizedExperiment
objects in a
SimpleList
.
For unsplitOn
: x
, with rowData
and assay
data replaced by the unsplit data. colData
of x is kept as well
and any existing rowTree
is dropped as well, since existing
rowLinks
are not valid anymore.
agglomerateByRanks
agglomerateByVariable
,
sumCountsAcrossFeatures
,
agglomerateByRank
,
altExps
,
splitAltExps
data(GlobalPatterns) tse <- GlobalPatterns # Split data based on SampleType. se_list <- splitOn(tse, group = "SampleType") # List of SE objects is returned. se_list # Create arbitrary groups rowData(tse)$group <- sample(1:3, nrow(tse), replace = TRUE) colData(tse)$group <- sample(1:3, ncol(tse), replace = TRUE) # Split based on rows # Each element is named based on their group name. If you don't want to name # elements, use use_name = FALSE. Since "group" can be found from rowdata and # colData you must use `by`. se_list <- splitOn(tse, group = "group", use.names = FALSE, by = 1) # When column names are shared between elements, you can store the list to # altExps altExps(tse) <- se_list altExps(tse) # If you want to split on columns and update rowTree, you can do se_list <- splitOn(tse, group = colData(tse)$group, update.tree = TRUE) # If you want to combine groups back together, you can use unsplitBy unsplitOn(se_list)
data(GlobalPatterns) tse <- GlobalPatterns # Split data based on SampleType. se_list <- splitOn(tse, group = "SampleType") # List of SE objects is returned. se_list # Create arbitrary groups rowData(tse)$group <- sample(1:3, nrow(tse), replace = TRUE) colData(tse)$group <- sample(1:3, ncol(tse), replace = TRUE) # Split based on rows # Each element is named based on their group name. If you don't want to name # elements, use use_name = FALSE. Since "group" can be found from rowdata and # colData you must use `by`. se_list <- splitOn(tse, group = "group", use.names = FALSE, by = 1) # When column names are shared between elements, you can store the list to # altExps altExps(tse) <- se_list altExps(tse) # If you want to split on columns and update rowTree, you can do se_list <- splitOn(tse, group = colData(tse)$group, update.tree = TRUE) # If you want to combine groups back together, you can use unsplitBy unsplitOn(se_list)
To query a SummarizedExperiment
for interesting features, several
functions are available.
getTop( x, top = 5L, method = c("mean", "sum", "median"), assay.type = assay_name, assay_name = "counts", na.rm = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getTop( x, top = 5L, method = c("mean", "sum", "median", "prevalence"), assay.type = assay_name, assay_name = "counts", na.rm = TRUE, ... ) getUnique(x, ...) ## S4 method for signature 'SummarizedExperiment' getUnique(x, rank = NULL, ...) summarizeDominance(x, group = NULL, name = "dominant_taxa", ...) ## S4 method for signature 'SummarizedExperiment' summarizeDominance(x, group = NULL, name = "dominant_taxa", ...) ## S4 method for signature 'SummarizedExperiment' summary(object, assay.type = assay_name, assay_name = "counts")
getTop( x, top = 5L, method = c("mean", "sum", "median"), assay.type = assay_name, assay_name = "counts", na.rm = TRUE, ... ) ## S4 method for signature 'SummarizedExperiment' getTop( x, top = 5L, method = c("mean", "sum", "median", "prevalence"), assay.type = assay_name, assay_name = "counts", na.rm = TRUE, ... ) getUnique(x, ...) ## S4 method for signature 'SummarizedExperiment' getUnique(x, rank = NULL, ...) summarizeDominance(x, group = NULL, name = "dominant_taxa", ...) ## S4 method for signature 'SummarizedExperiment' summarizeDominance(x, group = NULL, name = "dominant_taxa", ...) ## S4 method for signature 'SummarizedExperiment' summary(object, assay.type = assay_name, assay_name = "counts")
x |
|
top |
|
method |
|
assay.type |
|
assay_name |
Deprecated. Use |
na.rm |
|
... |
Additional arguments passed on to |
rank |
|
group |
With group, it is possible to group the observations in an
overview. Must be one of the column names of |
name |
|
object |
A
|
The getTop
extracts the most top
abundant “FeatureID”s
in a SummarizedExperiment
object.
The getUnique
is a basic function to access different taxa at a
particular taxonomic rank.
summarizeDominance
returns information about most dominant
taxa in a tibble. Information includes their absolute and relative
abundances in whole data set.
The summary
will return a summary of counts for all samples and
features in
SummarizedExperiment
object.
The getTop
returns a vector of the most top
abundant
“FeatureID”s
The getUnique
returns a vector of unique taxa present at a
particular rank
The summarizeDominance
returns an overview in a tibble. It contains dominant taxa
in a column named *name*
and its abundance in the data set.
The summary
returns a list with two tibble
s
perCellQCMetrics
,
perFeatureQCMetrics
,
addPerCellQC
,
addPerFeatureQC
,
quickPerCellQC
data(GlobalPatterns) top_taxa <- getTop(GlobalPatterns, method = "mean", top = 5, assay.type = "counts") top_taxa # Use 'detection' to select detection threshold when using prevalence method top_taxa <- getTop(GlobalPatterns, method = "prevalence", top = 5, assay_name = "counts", detection = 100) top_taxa # Top taxa os specific rank getTop(agglomerateByRank(GlobalPatterns, rank = "Genus", na.rm = TRUE)) # Gets the overview of dominant taxa dominant_taxa <- summarizeDominance(GlobalPatterns, rank = "Genus") dominant_taxa # With group, it is possible to group observations based on specified groups # Gets the overview of dominant taxa dominant_taxa <- summarizeDominance(GlobalPatterns, rank = "Genus", group = "SampleType", na.rm = TRUE) dominant_taxa # Get an overview of sample and taxa counts summary(GlobalPatterns, assay.type= "counts") # Get unique taxa at a particular taxonomic rank # sort = TRUE means that output is sorted in alphabetical order # With na.rm = TRUE, it is possible to remove NAs # sort and na.rm can also be used in function getTop getUnique(GlobalPatterns, "Phylum", sort = TRUE)
data(GlobalPatterns) top_taxa <- getTop(GlobalPatterns, method = "mean", top = 5, assay.type = "counts") top_taxa # Use 'detection' to select detection threshold when using prevalence method top_taxa <- getTop(GlobalPatterns, method = "prevalence", top = 5, assay_name = "counts", detection = 100) top_taxa # Top taxa os specific rank getTop(agglomerateByRank(GlobalPatterns, rank = "Genus", na.rm = TRUE)) # Gets the overview of dominant taxa dominant_taxa <- summarizeDominance(GlobalPatterns, rank = "Genus") dominant_taxa # With group, it is possible to group observations based on specified groups # Gets the overview of dominant taxa dominant_taxa <- summarizeDominance(GlobalPatterns, rank = "Genus", group = "SampleType", na.rm = TRUE) dominant_taxa # Get an overview of sample and taxa counts summary(GlobalPatterns, assay.type= "counts") # Get unique taxa at a particular taxonomic rank # sort = TRUE means that output is sorted in alphabetical order # With na.rm = TRUE, it is possible to remove NAs # sort and na.rm can also be used in function getTop getUnique(GlobalPatterns, "Phylum", sort = TRUE)
rowData
.These function work on data present in rowData
and define a way to
represent taxonomic data alongside the features of a
SummarizedExperiment
.
taxonomyRanks(x) ## S4 method for signature 'SummarizedExperiment' taxonomyRanks(x) taxonomyRankEmpty( x, rank = taxonomyRanks(x)[1L], empty.fields = c(NA, "", " ", "\t", "-", "_") ) ## S4 method for signature 'SummarizedExperiment' taxonomyRankEmpty( x, rank = taxonomyRanks(x)[1], empty.fields = c(NA, "", " ", "\t", "-", "_") ) checkTaxonomy(x, ...) ## S4 method for signature 'SummarizedExperiment' checkTaxonomy(x) setTaxonomyRanks(ranks) getTaxonomyRanks() getTaxonomyLabels(x, ...) ## S4 method for signature 'SummarizedExperiment' getTaxonomyLabels( x, empty.fields = c(NA, "", " ", "\t", "-", "_"), with.rank = with_rank, with_rank = FALSE, make.unique = make_unique, make_unique = TRUE, resolve.loops = resolve_loops, resolve_loops = FALSE, ... ) mapTaxonomy(x, ...) ## S4 method for signature 'SummarizedExperiment' mapTaxonomy( x, taxa = NULL, from = NULL, to = NULL, use.grepl = use_grepl, use_grepl = FALSE ) IdTaxaToDataFrame(from)
taxonomyRanks(x) ## S4 method for signature 'SummarizedExperiment' taxonomyRanks(x) taxonomyRankEmpty( x, rank = taxonomyRanks(x)[1L], empty.fields = c(NA, "", " ", "\t", "-", "_") ) ## S4 method for signature 'SummarizedExperiment' taxonomyRankEmpty( x, rank = taxonomyRanks(x)[1], empty.fields = c(NA, "", " ", "\t", "-", "_") ) checkTaxonomy(x, ...) ## S4 method for signature 'SummarizedExperiment' checkTaxonomy(x) setTaxonomyRanks(ranks) getTaxonomyRanks() getTaxonomyLabels(x, ...) ## S4 method for signature 'SummarizedExperiment' getTaxonomyLabels( x, empty.fields = c(NA, "", " ", "\t", "-", "_"), with.rank = with_rank, with_rank = FALSE, make.unique = make_unique, make_unique = TRUE, resolve.loops = resolve_loops, resolve_loops = FALSE, ... ) mapTaxonomy(x, ...) ## S4 method for signature 'SummarizedExperiment' mapTaxonomy( x, taxa = NULL, from = NULL, to = NULL, use.grepl = use_grepl, use_grepl = FALSE ) IdTaxaToDataFrame(from)
x |
|
rank |
|
empty.fields |
|
... |
optional arguments not used currently. |
ranks |
|
with.rank |
|
with_rank |
Deprecated. Use |
make.unique |
|
make_unique |
Deprecated. Use |
resolve.loops |
|
resolve_loops |
Deprecated. Use |
taxa |
|
from |
|
to |
|
use.grepl |
|
use_grepl |
Deprecated. Use |
taxonomyRanks
returns, which columns of rowData(x)
are regarded
as columns containing taxonomic information.
taxonomyRankEmpty
checks, if a selected rank is empty of information.
checkTaxonomy
checks, if taxonomy information is valid and whether
it contains any problems. This is a soft test, which reports some
diagnostic and might mature into a data validator used upon object
creation.
getTaxonomyLabels
generates a character vector per row consisting of
the lowest taxonomic information possible. If data from different levels,
is to be mixed, the taxonomic level is prepended by default.
IdTaxaToDataFrame
extracts taxonomic results from results of
IdTaxa
.
mapTaxonomy
maps the given features (taxonomic groups; taxa
)
to the specified taxonomic level (to
argument) in rowData
of the SummarizedExperiment
data object
(i.e. rowData(x)[,taxonomyRanks(x)]
). If the argument to
is
not provided, then all matching taxonomy rows in rowData
will be
returned. This function allows handy conversions between different
Taxonomic information from the IdTaxa
function of DECIPHER
package are returned as a special class. With as(taxa,"DataFrame")
the information can be easily converted to a DataFrame
compatible
with storing the taxonomic information a rowData
. Please note that the
assigned confidence information are returned as metatdata
and can
be accessed using metadata(df)$confidence
.
taxonomyRanks
: a character
vector with all the
taxonomic ranks found in colnames(rowData(x))
taxonomyRankEmpty
: a logical
value
mapTaxonomy
: a list
per element of taxa. Each
element is either a DataFrame
, a character
or NULL
.
If all character
results have the length of one, a single
character
vector is returned.
agglomerateByRank
,
toTree
,
resolveLoop
data(GlobalPatterns) GlobalPatterns taxonomyRanks(GlobalPatterns) checkTaxonomy(GlobalPatterns) table(taxonomyRankEmpty(GlobalPatterns,"Kingdom")) table(taxonomyRankEmpty(GlobalPatterns,"Species")) getTaxonomyLabels(GlobalPatterns[1:20,]) # mapTaxonomy ## returns the unique taxonomic information mapTaxonomy(GlobalPatterns) # returns specific unique taxonomic information mapTaxonomy(GlobalPatterns, taxa = "Escherichia") # returns information on a single output mapTaxonomy(GlobalPatterns, taxa = "Escherichia",to="Family") # setTaxonomyRanks tse <- GlobalPatterns colnames(rowData(tse))[1] <- "TAXA1" setTaxonomyRanks(colnames(rowData(tse))) # Taxonomy ranks set to: taxa1 phylum class order family genus species # getTaxonomyRanks is to get/check if the taxonomic ranks is set to "TAXA1" getTaxonomyRanks()
data(GlobalPatterns) GlobalPatterns taxonomyRanks(GlobalPatterns) checkTaxonomy(GlobalPatterns) table(taxonomyRankEmpty(GlobalPatterns,"Kingdom")) table(taxonomyRankEmpty(GlobalPatterns,"Species")) getTaxonomyLabels(GlobalPatterns[1:20,]) # mapTaxonomy ## returns the unique taxonomic information mapTaxonomy(GlobalPatterns) # returns specific unique taxonomic information mapTaxonomy(GlobalPatterns, taxa = "Escherichia") # returns information on a single output mapTaxonomy(GlobalPatterns, taxa = "Escherichia",to="Family") # setTaxonomyRanks tse <- GlobalPatterns colnames(rowData(tse))[1] <- "TAXA1" setTaxonomyRanks(colnames(rowData(tse))) # Taxonomy ranks set to: taxa1 phylum class order family genus species # getTaxonomyRanks is to get/check if the taxonomic ranks is set to "TAXA1" getTaxonomyRanks()
Tengeler2020 includes gut microbiota profiles of 27 persons with ADHD. A standard bioinformatic and statistical analysis done to demonstrate that altered microbial composition could be a driver of altered brain structure and function and concomitant changes in the animals’ behavior. This was investigated by colonizing young, male, germ-free C57BL/6JOlaHsd mice with microbiota from individuals with and without ADHD.
data(Tengeler2020)
data(Tengeler2020)
A TreeSummarizedExperiment with 151 features and 27 samples. The rowData contains taxonomic information at Kingdom, Phylum, Class, Order, Family and Genus level. The colData includes:
clinical status of the patient (ADHD or Control)
cohort to which the patient belongs (Cohort_1, Cohort_2 and Cohort_3)
combination of patient_status and cohort
unique sample ID
A.C. Tengeler, et al.
Tengeler, A.C., Dam, S.A., Wiesmann, M. et al. Gut microbiota from persons with attention-deficit/hyperactivity disorder affects the brain in mice. Microbiome 8, 44 (2020). https://doi.org/10.1186/s40168-020-00816-x
Supplemental information includes Home-cage activity, methods, results and imaging parameters and publicly-accessible from: https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM1_ESM.docx https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM2_ESM.docx https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM3_ESM.docx
The study combined Quantitative Microbiome Profiling (QMP) with extensive patient phenotyping from a group of 589 colorectal cancer (CRC) patients, advanced adenoma (AA) patients, and healthy controls. By implementing confounder control and quantitative profiling methods, the study was able to reveal potential misleading associations between microbial markers and colorectal cancer development that were driven by other factors like intestinal inflammation, rather than the cancer diagnosis itself.
data(Tito2024QMP)
data(Tito2024QMP)
A TreeSummarizedExperiment with 676 features and 589 samples. The rowData contains species. The colData includes:
(character) Sample ID from the corresponding study
(factor) Diagnosis type, with possible values: "ADE" (advanced adenoma), "CRC" (colorectal cancer), "CTL" (control)
(factor) Colonoscopy result, with possible values: "FIT_Positive", "familial_risk_familial_CRC_FCC", "familial_risk_no", "abdomil_complaints"
Shadman Ishraq
Raúl Y. Tito, Sara Verbandt, Marta Aguirre Vazquez, Leo Lahti, Chloe Verspecht, Verónica Lloréns-Rico, Sara Vieira-Silva, Janine Arts, Gwen Falony, Evelien Dekker, Joke Reumers, Sabine Tejpar & Jeroen Raes (2024). Microbiome confounders and quantitative profiling challenge predicted microbial targets in colorectal cancer development. Nature Medicine,30, 1339-1348. https://doi.org/10.1038/s41591-024-02963-2
Variety of transformations for abundance data, stored in assay
.
See details for options.
transformAssay(x, ...) ## S4 method for signature 'SummarizedExperiment' transformAssay( x, assay.type = "counts", assay_name = NULL, method = c("alr", "chi.square", "clr", "css", "frequency", "hellinger", "log", "log10", "log2", "max", "normalize", "pa", "range", "rank", "rclr", "relabundance", "rrank", "standardize", "total", "z"), MARGIN = "samples", name = method, pseudocount = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' transformAssay(x, altexp = altExpNames(x), ...)
transformAssay(x, ...) ## S4 method for signature 'SummarizedExperiment' transformAssay( x, assay.type = "counts", assay_name = NULL, method = c("alr", "chi.square", "clr", "css", "frequency", "hellinger", "log", "log10", "log2", "max", "normalize", "pa", "range", "rank", "rclr", "relabundance", "rrank", "standardize", "total", "z"), MARGIN = "samples", name = method, pseudocount = FALSE, ... ) ## S4 method for signature 'SingleCellExperiment' transformAssay(x, altexp = altExpNames(x), ...)
x |
|
... |
additional arguments passed e.g. on to
|
assay.type |
|
assay_name |
Deprecated. Use |
method |
|
MARGIN |
|
name |
|
pseudocount |
|
altexp |
|
transformAssay
function provides a variety of options for
transforming abundance data. The transformed data is calculated and stored
in a new assay
.
The transformAssay
provides sample-wise (column-wise) or feature-wise
(row-wise) transformation to the abundance table
(assay) based on specified MARGIN
.
The available transformation methods include:
'alr', 'chi.square', 'clr', 'frequency', 'hellinger', 'log',
'normalize', 'pa', 'rank', 'rclr' relabundance', 'rrank', 'standardize',
'total': please refer to
decostand
for details.
'css': Cumulative Sum Scaling (CSS) can be used to normalize count data
by accounting for differences in library sizes. By default, the function
determines the normalization percentile for summing and scaling
counts. If you want to specify the percentile value, good default value
might be 0.5
.The method is inspired by the CSS methods in
metagenomeSeq
package.
'log10': log10 transformation can be used for reducing the skewness of the data.
where is a single value of data.
'log2': log2 transformation can be used for reducing the skewness of the data.
where is a single value of data.
transformAssay
returns the input object x
, with a new
transformed abundance table named name
added in the
assay
.
Paulson, J., Stine, O., Bravo, H. et al. (2013) Differential abundance analysis for microbial marker-gene surveys Nature Methods 10, 1200–1202. doi:10.1038/nmeth.2658
data(GlobalPatterns) tse <- GlobalPatterns # By specifying 'method', it is possible to apply different transformations, # e.g. compositional transformation. tse <- transformAssay(tse, method = "relabundance") # The target of transformation can be specified with "assay.type" # Pseudocount can be added by specifying 'pseudocount'. # Perform CLR with smallest positive value as pseudocount tse <- transformAssay( tse, assay.type = "relabundance", method = "clr", pseudocount = TRUE ) head(assay(tse, "clr")) # Perform CSS normalization. tse <- transformAssay(tse, method = "css") head(assay(tse, "css")) # With MARGIN, you can specify the if transformation is done for samples or # for features. Here Z-transformation is done feature-wise. tse <- transformAssay(tse, method = "standardize", MARGIN = "features") head(assay(tse, "standardize")) # Name of the stored table can be specified. tse <- transformAssay(tse, method="hellinger", name="test") head(assay(tse, "test")) # pa returns presence absence table. tse <- transformAssay(tse, method = "pa") head(assay(tse, "pa")) # rank returns ranks of taxa. tse <- transformAssay(tse, method = "rank") head(assay(tse, "rank")) # In order to use other ranking variants, modify the chosen assay directly: assay(tse, "rank_average", withDimnames = FALSE) <- colRanks( assay(tse, "counts"), ties.method = "average", preserveShape = TRUE) # Using altexp parameter. First agglomerate the data and then apply # transformation. tse <- GlobalPatterns tse <- agglomerateByRanks(tse) tse <- transformAssay(tse, method = "relabundance") # The transformation is applied to all alternative experiments altExp(tse, "Species")
data(GlobalPatterns) tse <- GlobalPatterns # By specifying 'method', it is possible to apply different transformations, # e.g. compositional transformation. tse <- transformAssay(tse, method = "relabundance") # The target of transformation can be specified with "assay.type" # Pseudocount can be added by specifying 'pseudocount'. # Perform CLR with smallest positive value as pseudocount tse <- transformAssay( tse, assay.type = "relabundance", method = "clr", pseudocount = TRUE ) head(assay(tse, "clr")) # Perform CSS normalization. tse <- transformAssay(tse, method = "css") head(assay(tse, "css")) # With MARGIN, you can specify the if transformation is done for samples or # for features. Here Z-transformation is done feature-wise. tse <- transformAssay(tse, method = "standardize", MARGIN = "features") head(assay(tse, "standardize")) # Name of the stored table can be specified. tse <- transformAssay(tse, method="hellinger", name="test") head(assay(tse, "test")) # pa returns presence absence table. tse <- transformAssay(tse, method = "pa") head(assay(tse, "pa")) # rank returns ranks of taxa. tse <- transformAssay(tse, method = "rank") head(assay(tse, "rank")) # In order to use other ranking variants, modify the chosen assay directly: assay(tse, "rank_average", withDimnames = FALSE) <- colRanks( assay(tse, "counts"), ties.method = "average", preserveShape = TRUE) # Using altexp parameter. First agglomerate the data and then apply # transformation. tse <- GlobalPatterns tse <- agglomerateByRanks(tse) tse <- transformAssay(tse, method = "relabundance") # The transformation is applied to all alternative experiments altExp(tse, "Species")