| Title: | Core Utils for Metabolomics Data |
|---|---|
| Description: | MetaboCoreUtils defines metabolomics-related core functionality provided as low-level functions to allow a data structure-independent usage across various R packages. This includes functions to calculate between ion (adduct) and compound mass-to-charge ratios and masses or functions to work with chemical formulas. The package provides also a set of adduct definitions and information on some commercially available internal standard mixes commonly used in MS experiments. |
| Authors: | Johannes Rainer [aut, cre] (ORCID: <https://orcid.org/0000-0002-6977-7147>), Michael Witting [aut] (ORCID: <https://orcid.org/0000-0002-1462-4426>), Andrea Vicini [aut], Liesa Salzer [ctb] (ORCID: <https://orcid.org/0000-0003-0761-0656>), Sebastian Gibb [aut] (ORCID: <https://orcid.org/0000-0001-7406-4443>), Michael Stravs [ctb] (ORCID: <https://orcid.org/0000-0002-1426-8572>), Roger Gine [aut] (ORCID: <https://orcid.org/0000-0003-0288-9619>), William Kumler [aut] (ORCID: <https://orcid.org/0000-0002-5022-8009>), Philippine Louail [aut] (ORCID: <https://orcid.org/0009-0007-5429-6846>), Gabriele Tomè [aut] (ORCID: <https://orcid.org/0000-0002-3976-6068>, fnd: MetaRbolomics4Galaxy project (CUP: D53C25001030003) co-funded by the Autonomous Province of Bolzano under the Joint Projects South Tyrol–Germany 2025 program.) |
| Maintainer: | Johannes Rainer <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.21.1 |
| Built: | 2026-05-10 09:33:46 UTC |
| Source: | https://github.com/bioc/MetaboCoreUtils |
addElements Add one chemical formula to another.
addElements(x, y)addElements(x, y)
x |
|
y |
|
character Resulting formula
Michael Witting and Sebastian Gibb
addElements("C6H12O6", "Na") addElements("C6H12O6", c("Na", "H2O"))addElements("C6H12O6", "Na") addElements("C6H12O6", c("Na", "H2O"))
adductCharge() returns the charge of an adduct.
adductCharge(x)adductCharge(x)
x |
|
integer of length equal to x with the charge of the adduct/ion.
Other adduct related functions:
adductFormula(),
adductNames(),
formula2mz(),
mass2mz(),
mz2mass(),
standardizeSingleCharge()
a <- c("[M+3H]3+", "[M+H]+", "[M-2H]2-", "[M-H]-") adductCharge(a)a <- c("[M+3H]3+", "[M+H]+", "[M-2H]2-", "[M-H]-") adductCharge(a)
adductFormula calculates the chemical formulas for the specified adducts
of provided chemical formulas.
adductFormula(formulas, adduct = "[M+H]+", standardize = TRUE)adductFormula(formulas, adduct = "[M+H]+", standardize = TRUE)
formulas |
|
adduct |
|
standardize |
|
character matrix with formula rows and adducts columns
containing all ion formulas. In case an ion can't be generated (eg.
[M-NH3+H]+ in a molecule that doesn't have nitrogen), a NA is returned
instead.
Roger Gine
adductNames() for a list of all available predefined adducts and
adducts() for the adduct data.frame definition style.
Other adduct related functions:
adductCharge(),
adductNames(),
formula2mz(),
mass2mz(),
mz2mass(),
standardizeSingleCharge()
# Calculate the ion formulas of glucose with adducts [M+H]+, [M+Na]+ and [M+K]+ adductFormula("C6H12O6", c("[M+H]+", "[M+Na]+", "[M+K]+")) # > "[C6H13O6]+" "[C6H12O6Na]+" "[C6H12O6K]+" # Use a custom set of adduct definitions (For instance, a iron (Fe2+) adduct) custom_ads <- data.frame(name = "[M+Fe]2+", mass_multi = 0.5, charge = 2, formula_add = "Fe", formula_sub = "C0", positive = "TRUE") adductFormula("C6H12O6", custom_ads)# Calculate the ion formulas of glucose with adducts [M+H]+, [M+Na]+ and [M+K]+ adductFormula("C6H12O6", c("[M+H]+", "[M+Na]+", "[M+K]+")) # > "[C6H13O6]+" "[C6H12O6Na]+" "[C6H12O6K]+" # Use a custom set of adduct definitions (For instance, a iron (Fe2+) adduct) custom_ads <- data.frame(name = "[M+Fe]2+", mass_multi = 0.5, charge = 2, formula_add = "Fe", formula_sub = "C0", positive = "TRUE") adductFormula("C6H12O6", custom_ads)
adductNames() returns all supported adduct definitions that can be used by
mass2mz() and mz2mass().
adducts returns a data.frame with the adduct definitions.
adductNames(polarity = c("positive", "negative")) adducts(polarity = c("positive", "negative"))adductNames(polarity = c("positive", "negative")) adducts(polarity = c("positive", "negative"))
polarity |
|
for adductNames(): character vector with all valid adduct names
for the selected ion mode. For adducts: data.frame with the adduct
definitions.
Michael Witting, Johannes Rainer
Other adduct related functions:
adductCharge(),
adductFormula(),
formula2mz(),
mass2mz(),
mz2mass(),
standardizeSingleCharge()
## retrieve names of adduct names in positive ion mode adductNames(polarity = "positive") ## retrieve names of adduct names in negative ion mode adductNames(polarity = "negative")## retrieve names of adduct names in positive ion mode adductNames(polarity = "positive") ## retrieve names of adduct names in negative ion mode adductNames(polarity = "negative")
Calculate beta parameters for a chromatographic peak, both its similarity
to a bell curve of varying degrees of skew (i.e., assymmetry or
tailing factor) and the standard deviation of the residuals after the
best-fit bell is normalized and subtracted. This function
requires at least 5 scans or it will return NA for both parameters.
betaValues( intensity = numeric(), rtime = seq_along(intensity), skews = c(3, 3.5, 4, 4.5, 5), zero.rm = TRUE )betaValues( intensity = numeric(), rtime = seq_along(intensity), skews = c(3, 3.5, 4, 4.5, 5), zero.rm = TRUE )
intensity |
A |
rtime |
A numeric vector corresponding to the retention times of each intensity. Retention times are expected to be in increasing order without duplicates. If not provided, intensities will be assumed to be equally spaced. |
skews |
A |
zero.rm |
|
The function compares the actual chromatographic peak to Beta distribution
curves calculated with the dbeta() function.
numeric of length 2.
William Kumler
Kumler W, Hazelton B J and Ingalls A E (2023) "Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics" BMC Bioinformatics 24(1):404. doi: 10.1186/s12859-023-05533-4
dbeta() for the function to calculate the Beta distribution.
## Define intensity values of a typical chromatographic peak skinny_peak <- c(9107, 3326, 9523, 3245, 3429, 9394, 1123, 935, 5128, 8576, 2711, 3427, 7294, 8109, 9288, 6997, 9756, 8034, 1317, 8866, 13877, 14854, 28296, 57101, 92209, 151797, 222386, 299402, 365045, 394255, 402680, 363996, 293985, 222989, 147007, 94947, 52924, 32438, 11511, 10836, 8046, 601, 889, 5917, 2690, 5381, 9901, 8494, 3349, 8283, 3410, 5935, 3332, 7041, 3284, 7478, 76, 3739, 2158, 5507) skinny_peak_rt <- seq_along(skinny_peak)+100 plot(skinny_peak_rt, skinny_peak, xlab = "RT", ylab = "intensity", type = "l") ## Calculate beta parameters for the full region including noise signal ## around the peak res <- betaValues(skinny_peak, skinny_peak_rt) res ## Calculate beta parameters for peak signal res <- betaValues(skinny_peak[20:40], skinny_peak_rt[20:40]) res ## Define noisy chromatographic signal noise_peak <- c(0.288, 0.788, 0.409, 0.883, 0.94, 0.046, 0.528, 0.892, 0.551, 0.457, 0.957, 0.453, 0.678, 0.573, 0.103, 0.9, 0.246, 0.042, 0.328, 0.955, 0.89, 0.693, 0.641, 0.994, 0.656, 0.709, 0.544, 0.594, 0.289, 0.147, 0.963, 0.902, 0.691, 0.795, 0.025, 0.478, 0.758, 0.216, 0.318, 0.232, 0.143, 0.415, 0.414, 0.369, 0.152, 0.139, 0.233, 0.466, 0.266, 0.858, 0.046, 0.442, 0.799, 0.122, 0.561, 0.207, 0.128, 0.753, 0.895, 0.374, 0.665, 0.095, 0.384, 0.274, 0.815, 0.449, 0.81, 0.812, 0.794, 0.44, 0.754, 0.629, 0.71, 0.001, 0.475, 0.22, 0.38, 0.613, 0.352, 0.111, 0.244, 0.668, 0.418, 0.788, 0.103, 0.435, 0.985, 0.893, 0.886, 0.175, 0.131, 0.653, 0.344, 0.657, 0.32, 0.188, 0.782, 0.094, 0.467, 0.512) noise_peak_rt <- seq_along(noise_peak) + 10 res <- betaValues(noise_peak, noise_peak_rt) res## Define intensity values of a typical chromatographic peak skinny_peak <- c(9107, 3326, 9523, 3245, 3429, 9394, 1123, 935, 5128, 8576, 2711, 3427, 7294, 8109, 9288, 6997, 9756, 8034, 1317, 8866, 13877, 14854, 28296, 57101, 92209, 151797, 222386, 299402, 365045, 394255, 402680, 363996, 293985, 222989, 147007, 94947, 52924, 32438, 11511, 10836, 8046, 601, 889, 5917, 2690, 5381, 9901, 8494, 3349, 8283, 3410, 5935, 3332, 7041, 3284, 7478, 76, 3739, 2158, 5507) skinny_peak_rt <- seq_along(skinny_peak)+100 plot(skinny_peak_rt, skinny_peak, xlab = "RT", ylab = "intensity", type = "l") ## Calculate beta parameters for the full region including noise signal ## around the peak res <- betaValues(skinny_peak, skinny_peak_rt) res ## Calculate beta parameters for peak signal res <- betaValues(skinny_peak[20:40], skinny_peak_rt[20:40]) res ## Define noisy chromatographic signal noise_peak <- c(0.288, 0.788, 0.409, 0.883, 0.94, 0.046, 0.528, 0.892, 0.551, 0.457, 0.957, 0.453, 0.678, 0.573, 0.103, 0.9, 0.246, 0.042, 0.328, 0.955, 0.89, 0.693, 0.641, 0.994, 0.656, 0.709, 0.544, 0.594, 0.289, 0.147, 0.963, 0.902, 0.691, 0.795, 0.025, 0.478, 0.758, 0.216, 0.318, 0.232, 0.143, 0.415, 0.414, 0.369, 0.152, 0.139, 0.233, 0.466, 0.266, 0.858, 0.046, 0.442, 0.799, 0.122, 0.561, 0.207, 0.128, 0.753, 0.895, 0.374, 0.665, 0.095, 0.384, 0.274, 0.815, 0.449, 0.81, 0.812, 0.794, 0.44, 0.754, 0.629, 0.71, 0.001, 0.475, 0.22, 0.38, 0.613, 0.352, 0.111, 0.244, 0.668, 0.418, 0.788, 0.103, 0.435, 0.985, 0.893, 0.886, 0.175, 0.131, 0.653, 0.344, 0.657, 0.32, 0.188, 0.782, 0.094, 0.467, 0.512) noise_peak_rt <- seq_along(noise_peak) + 10 res <- betaValues(noise_peak, noise_peak_rt) res
Kendrick mass defect analysis is a way to analyze high-resolution MS data in order to identify homologous series. The Kendrick mass (KM) is calculated by choosing a specific molecular fragment (e.g. CH2) and settings its mass to an integer mass. In case of CH2 the mass of 14.01565 would be set to 14.The Kendrick mass defect (KMD) is defined as the difference between the KM and the nominial (integer) KM. All molecules of homologoues series, e.g. only differing in the number of CH2, will have an identical KMD. In an additional step the KMD can be referenced to the mass defect of specific lipid backbone and by this normalize values to the referenced KMD (RKMD). This leads to values of 0 for saturated species or -1, -2, -3, etc for unsaturated species.
Available functoins are:
calculateKm: calculates the Kendrick mass from an exact mass for a
specific molecular fragment, e.g. "CH2".
calculateKmd: calculates the Kendrick mass defect from an exact mass for
a specific molecular fragment, e.g. "CH2".
calculateRkmd: calculates the referenced Kendrick mass defect from an
exact mass for a specific molecular fragment, e.g. "CH2", and a reference
KMD.
isRkmd: Checks if a calculated RKMD falls within a specific error range
around an negative integer corresponding the number of double bonds, in
case of CH2 as fragment.
calculateKm(x, fragment = 14/14.01565) calculateKmd(x, fragment = 14/14.01565) calculateRkmd(x, fragment = 14/14.01565, rkmd = 0.749206) isRkmd(x, rkmdTolerance = 0.1)calculateKm(x, fragment = 14/14.01565) calculateKmd(x, fragment = 14/14.01565) calculateRkmd(x, fragment = 14/14.01565, rkmd = 0.749206) isRkmd(x, rkmdTolerance = 0.1)
x |
|
fragment |
|
rkmd |
|
rkmdTolerance |
|
numeric or boolean. All functions, except isRkmd return a
numeric with same length as the input corresponding to the KM, KMD or
RMKD.
isRkmd returns a logical with TRUE or FALSE indicating if the
RKMD falls within a specific range around a negative integer
corresponding to the number of double bonds.
Michael Witting
calculateKm(760.5851) calculateKmd(760.5851) calculateRkmd(760.5851, rkmd = 0.749206) isRkmd(calculateRkmd(760.5851, rkmd = 0.749206))calculateKm(760.5851) calculateKmd(760.5851) calculateRkmd(760.5851, rkmd = 0.749206) isRkmd(calculateRkmd(760.5851, rkmd = 0.749206))
calculateMass calculates the exact mass from a formula. Isotopes are also
supported. For isotopes, the isotope type needs to be specified as an
element's prefix, e.g. "[13C]" for carbon 13 or "[2H]" for deuterium.
A formula with 2 carbon 13 isotopes and 3 carbons would thus contain e.g.
"[13C2]C3".
calculateMass(x)calculateMass(x)
x |
|
numeric Resulting exact mass.
Michael Witting
calculateMass("C6H12O6") calculateMass("NH3") calculateMass(c("C6H12O6", "NH3")) ## Calculate masses for formulas containing isotope information. calculateMass(c("C6H12O6", "[13C3]C3H12O6")) ## Calculate mass for a chemical with 5 deuterium. calculateMass("C11[2H5]H7N2O2")calculateMass("C6H12O6") calculateMass("NH3") calculateMass(c("C6H12O6", "NH3")) ## Calculate masses for formulas containing isotope information. calculateMass(c("C6H12O6", "[13C3]C3H12O6")) ## Calculate mass for a chemical with 5 deuterium. calculateMass("C11[2H5]H7N2O2")
containsElements checks if one sum formula is contained in another.
containsElements(x, y)containsElements(x, y)
x |
|
y |
|
logical TRUE if y is contained in x
Michael Witting and Sebastian Gibb
containsElements("C6H12O6", "H2O") containsElements("C6H12O6", "NH3")containsElements("C6H12O6", "H2O") containsElements("C6H12O6", "NH3")
convertMtime performs effective mobility scale transformation of CE(-MS)
data, which is used to overcome variations of the migration times, caused by
differences in the Electroosmotic Flow (EOF) between different runs.
In order to monitor the EOF and perform the transformation, neutral or
charged EOF markers are spiked into the sample before analysis. The
information of the EOF markers (migration time and effective mobility) will
be then used to perform the effective mobility transformation of the
migration time scale.
convertMtime( x = numeric(), rtime = numeric(), mobility = numeric(), tR = 0, U = numeric(), L = numeric() )convertMtime( x = numeric(), rtime = numeric(), mobility = numeric(), tR = 0, U = numeric(), L = numeric() )
x |
|
rtime |
|
mobility |
|
tR |
|
U |
|
L |
|
numeric vector of same length as x with effective mobility values.
Liesa Salzer
rtime <- c(10,20,30,40,50,60,70,80,90,100) marker_rt <- c(20,80) mobility <- c(0, 2000) convertMtime(rtime, marker_rt, mobility)rtime <- c(10,20,30,40,50,60,70,80,90,100) marker_rt <- c(20,80) mobility <- c(0, 2000) convertMtime(rtime, marker_rt, mobility)
correctRindex performs correction of retention indices (RIs) based on
reference substances.
Even after conversion of RTs to RIs slight deviations might exist. These
deviations can be further normalized, if they are linear, by using two
metabolites for which the RIs are known (e.g. internal standards).
correctRindex(x, y)correctRindex(x, y)
x |
|
y |
|
numeric vector of same length than x with corrected retention
indices. Values are floating point decimals. If integer values shall be
used conversion has to be performed manually.
Michael Witting
ref <- data.frame(rindex = c(110, 210), refindex = c(100, 200)) rindex <- c(110, 210) correctRindex(rindex, ref)ref <- data.frame(rindex = c(110, 210), refindex = c(100, 200)) rindex <- c(110, 210) correctRindex(rindex, ref)
countElements parses strings representing a chemical formula into a named
vector of element counts.
countElements(x)countElements(x)
x |
|
list of integer with the element counts (names being elements).
Michael Witting and Sebastian Gibb
countElements(c("C6H12O6", "C11H12N2O2"))countElements(c("C6H12O6", "C11H12N2O2"))
The fit_lm and adjust_lm functions facilitate linear model-based
normalization of abundance matrices. The expected noise in a numeric
data matrix can be modeled with a linear regression model using the
fit_lm function and the data can subsequently be adjusted using the
adjust_lm function (i.e., the modeled noise will be removed from the
abundance values). A typical use case would be to remove injection index
dependent signal drifts in a LC-MS derived metabolomics data:
a linear model of the form y ~ injection_index could be used to model
the measured abundances of each feature (each row in a data matrix) as a
function of the injection index in which a specific sample was measured
during the LC-MS measurement run. The fitted linear regression models
can subsequently be used to adjust the abundance matrix by removing
these dependencies from the data. This allows to perform signal
adjustments as described in (Wehrens et al. 2016).
The two functions are described in more details below:
fit_lm allows to fit a linear regression model (defined with parameter
formula) to each row of the numeric data matrix submitted with parameter
y. Additional covariates of the linear model defined in formula are
expected to be provided as columns in a data.frame supplied via
the data parameter.
The linear model is expected to be defined by a formula starting with
y ~ . To model for example an injection index dependency of values in
y a formula y ~ injection_index could be used, with values for the
injection index being provided as a column "injection_index" in the
data data frame. fit_lm would thus fit this model to each row of
y.
Linear models can be fitted either with the standard least squares of
lm() by setting method = "lm" (the default), or with the more robust
methods from the robustbase package with method = "lmrob".
adjust_lm can be used to adjust abundances in a data matrix y based
on linear regression models provided with parameter lm. Parameter lm
is expected to be a list of length equal to the number of rows of y,
each element being a linear model (i.e., a results from lm or lmrob).
Covariates for the model need to be provided as columns in a data.frame
provided with parameter data. The number of rows of that data.frame
need to match the number of columns of y. The function returns the
input matrix y with values in rows adjusted with the linear models
provided by lm. No adjustment is performed for rows for which the
respective element in lm is NA. See examples below for details or the
vignette for more examples, descriptions and information.
fit_lm( formula, data, y, method = c("lm", "lmrob"), control = NULL, minVals = ceiling(nrow(data) * 0.75), model = TRUE, ..., BPPARAM = SerialParam() ) adjust_lm(y = matrix(), data = data.frame(), lm = list(), ...)fit_lm( formula, data, y, method = c("lm", "lmrob"), control = NULL, minVals = ceiling(nrow(data) * 0.75), model = TRUE, ..., BPPARAM = SerialParam() ) adjust_lm(y = matrix(), data = data.frame(), lm = list(), ...)
formula |
|
data |
|
y |
for |
method |
|
control |
a |
minVals |
|
model |
|
... |
for |
BPPARAM |
parallel processing setup. See |
lm |
|
For `fit_lm`: a `list` with linear models (either of type *lm* or *lmrob*) or length equal to the number of rows of `y`. `NA` is reported for rows with too few non-missing data points (depending on parameter `minValues`). For `adjust_lm`: a numeric matrix (same dimensions as input matrix `y`) with the values adjusted with the provided linear models.
Johannes Rainer
Wehrens R, Hageman JA, van Eeuwijk F, Kooke R, Flood PJ, Wijnker E, Keurentjes JJ, Lommen A, van Eekelen HD, Hall RD Mumm R and de Vos RC. Improved batch correction in untargeted MS-based metabolomics. Metabolomics 2016; 12:88.
## See also the vignette for more details and examples. ## Load a test matrix with abundances of features from a LC-MS experiment. vals <- read.table(system.file("txt", "feature_values.txt", package = "MetaboCoreUtils"), sep = "\t") vals <- as.matrix(vals) ## Define a data.frame with the covariates to be used to model the noise sdata <- data.frame(injection_index = seq_len(ncol(vals))) ## Fit a linear model describing the feature abundances as a ## function of the index in which samples were injected during the LC-MS ## run. We're fitting the model to log2 transformed data. ## Note that such a model should **only** be fitted if the samples ## were randomized, i.e. the injection index is independent of any ## experimental covariate. Alternatively, the injection order dependent ## signal drift could be estimated using QC samples (if they were ## repeatedly injected) - see vignette for more details. ii_lm <- fit_lm(y ~ injection_index, data = sdata, y = log2(vals)) ## The result is a list of linear models ii_lm[[1]] ## Plotting the data for one feature: plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[2]], lty = 2) ## For this feature (row) a decreasing signal intensity with injection ## index was observed (and modeled). ## For another feature an increasing intensity can be observed. plot(x = sdata$injection_index, y = log2(vals[3, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[3]], lty = 2) ## This trend can be removed from the data using the `adjust_lm` function ## by providing the linear models descring the drift. Note that, because ## we're adjusting log2 transformed data, the resulting abundances are ## also in log2 scale. vals_adj <- adjust_lm(log2(vals), data = sdata, lm = ii_lm) ## Plotting the data before (open circles) and after adjustment (filled ## points) plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") points(x = sdata$injection_index, y = vals_adj[2, ], pch = 16) grid() ## Adding the line fitted through the raw data abline(ii_lm[[2]], lty = 2) ## Adding a line fitted through the adjusted data abline(lm(vals_adj[2, ] ~ sdata$injection_index), lty = 1) ## After adjustment there is no more dependency on injection index.## See also the vignette for more details and examples. ## Load a test matrix with abundances of features from a LC-MS experiment. vals <- read.table(system.file("txt", "feature_values.txt", package = "MetaboCoreUtils"), sep = "\t") vals <- as.matrix(vals) ## Define a data.frame with the covariates to be used to model the noise sdata <- data.frame(injection_index = seq_len(ncol(vals))) ## Fit a linear model describing the feature abundances as a ## function of the index in which samples were injected during the LC-MS ## run. We're fitting the model to log2 transformed data. ## Note that such a model should **only** be fitted if the samples ## were randomized, i.e. the injection index is independent of any ## experimental covariate. Alternatively, the injection order dependent ## signal drift could be estimated using QC samples (if they were ## repeatedly injected) - see vignette for more details. ii_lm <- fit_lm(y ~ injection_index, data = sdata, y = log2(vals)) ## The result is a list of linear models ii_lm[[1]] ## Plotting the data for one feature: plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[2]], lty = 2) ## For this feature (row) a decreasing signal intensity with injection ## index was observed (and modeled). ## For another feature an increasing intensity can be observed. plot(x = sdata$injection_index, y = log2(vals[3, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[3]], lty = 2) ## This trend can be removed from the data using the `adjust_lm` function ## by providing the linear models descring the drift. Note that, because ## we're adjusting log2 transformed data, the resulting abundances are ## also in log2 scale. vals_adj <- adjust_lm(log2(vals), data = sdata, lm = ii_lm) ## Plotting the data before (open circles) and after adjustment (filled ## points) plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") points(x = sdata$injection_index, y = vals_adj[2, ], pch = 16) grid() ## Adding the line fitted through the raw data abline(ii_lm[[2]], lty = 2) ## Adding a line fitted through the adjusted data abline(lm(vals_adj[2, ] ~ sdata$injection_index), lty = 1) ## After adjustment there is no more dependency on injection index.
formula2mz calculates the m/z values from a list of molecular formulas and
adduct definitions.
Custom adduct definitions can be passed to the adduct parameter in form of
a data.frame. This data.frame is expected to have columns "mass_add"
and "mass_multi" defining the additive and multiplicative part of the
calculation. See adducts() for examples.
formula2mz(formula, adduct = "[M+H]+", standardize = TRUE)formula2mz(formula, adduct = "[M+H]+", standardize = TRUE)
formula |
|
adduct |
either a |
standardize |
|
Numeric matrix with same number of rows than elements in formula
and number of columns being equal to the length of adduct (adduct names
are used as column names). Each column thus represents the m/z of formula
for each defined adduct.
Roger Gine
Other adduct related functions:
adductCharge(),
adductFormula(),
adductNames(),
mass2mz(),
mz2mass(),
standardizeSingleCharge()
## Calculate m/z values of adducts of a list of formulas formulas <- c("C6H12O6", "C9H11NO3", "C16H13ClN2O") ads <- c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M]+") formula2mz(formulas, ads) formula2mz(formulas, adductNames()) #All available adducts ## Use custom-defined adducts as input custom_ads <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) formula2mz(formulas, custom_ads) ## Use standardize = FALSE to keep formula unaltered formula2mz("H12C6O6") formula2mz("H12C6O6", standardize = FALSE)## Calculate m/z values of adducts of a list of formulas formulas <- c("C6H12O6", "C9H11NO3", "C16H13ClN2O") ads <- c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M]+") formula2mz(formulas, ads) formula2mz(formulas, adductNames()) #All available adducts ## Use custom-defined adducts as input custom_ads <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) formula2mz(formulas, custom_ads) ## Use standardize = FALSE to keep formula unaltered formula2mz("H12C6O6") formula2mz("H12C6O6", standardize = FALSE)
Guess the source of names based on predefined mappings. The function counts
how many elements in the input vector x match the names defined in the
mapping schema for each software and returns the software with the highest
match count.
guessSource(x = character(), map = softwareMappingSchema())guessSource(x = character(), map = softwareMappingSchema())
x |
A |
map |
Optional |
A character(1) with the name of the guessed source software.
Gabriele Tomè
Other name translation functions:
nameMapping(),
softwareMapping(),
softwareMappingSchema(),
translate()
## MS-Dial names x <- c("Average Rt(min)", "Alignment ID", "Average Mz") guessSource(x)## MS-Dial names x <- c("Average Rt(min)", "Alignment ID", "Average Mz") guessSource(x)
indexRtime uses a list of known substances to convert retention times (RTs)
to retention indices (RIs). By this retention information is normalized
for differences in experimental settings, such as gradient delay volume,
dead volume or flow rate. By default linear interpolation is performed,
other ways of calculation can supplied as function.
indexRtime(x, y, FUN = rtiLinear, ...)indexRtime(x, y, FUN = rtiLinear, ...)
x |
|
y |
|
FUN |
|
... |
additional parameter used by |
numeric vector of same length as x with retention indices. Values
floating point decimals. If integer values shall be used conversion has
to be performed manually
Michael Witting
rti <- data.frame(rtime = c(1,2,3), rindex = c(100,200,300)) rtime <- c(1.5, 2.5) indexRtime(rtime, rti)rti <- data.frame(rtime = c(1,2,3), rindex = c(100,200,300)) rtime <- c(1.5, 2.5) indexRtime(rtime, rti)
internalStandardMixNames returns available names of internal standard
mixes provided by the MetaboCoreUtils package.
internalStandardMixNames()internalStandardMixNames()
character names of available IS mixes
Michael Witting
internalStandardMixNames()internalStandardMixNames()
internalStandards returns a table with metabolite standards available
in commercial internal standard mixes. The returned data frame contains
the following columns:
"name": the name of the standard
"formula_salt": chemical formula of the salt that was used to produce
the standard mix
"formula_metabolite": chemical formula of the metabolite in free form
"smiles_salt": SMILES of the salt that was used to produced the
standard mix
"smiles_metabolite": SMILES of the metabolite in free form
"mol_weight_salt": molecular (average) weight of the salt (can be used
for calculation of molar concentration, etc.)
"exact_mass_metabolite": exact mass of free metabolites
"conc": concentration of the metabolite in ug/mL (of salt form)
"mix": name of internal standard mix
internalStandards(mix = "QReSS")internalStandards(mix = "QReSS")
mix |
|
data.frame data on internal standards
Michael Witting
internalStandardMixNames() for provided internal standard mixes.
internalStandards(mix = "QReSS") internalStandards(mix = "UltimateSplashOne")internalStandards(mix = "QReSS") internalStandards(mix = "UltimateSplashOne")
In order to identify potential isotopologues based on only m/z and intensity
values with the isotopologues() function, sets of pre-calculated parameters
are required. This function returns such parameter sets estimated on
different sources/databases. The nomenclature used to describe isotopes
follows the following convention: the number of neutrons is provided in [
as a prefix to the element and the number of atoms of the element as suffix.
[13]C2[37]Cl3 describes thus an isotopic substitution containing 2 [13]C
isotopes and 3 [37]Cl isotopes.
Each row in the returned data.frame is associated with an isotopic
substitution (which can involve isotopes of several elements or different
isotopes of the same element). In general for each isotopic substitution
multiple rows are present in the data.frame. Each row provides parameters
to compute bounds (for the ratio between the isotopologue peak and the
monoisotopic one) on a certain mass range. The provided isotopic
substitutions are in general the most frequently observed substitutions in
the database (e.g. HMDB) on which they were defined. Parameters (columns)
defined for each isotopic substitution are:
"minmass": the minimal mass of a compound for which the isotopic
substitution was found. Peaks with a mass lower than this will most likely
not have the respective isotopic substitution.
"maxmass": the maximal mass of a compound for which the isotopic
substitution was found. Peaks with a mass higher than this will most likely
not have the respective isotopic substitution.
"md": the mass difference between the monoisotopic peak and a peak of an
isotopologue characterized by the respective isotopic substitution.
"leftend": left endpoint of the mass interval.
"rightend": right endpoint of the mass interval.
"LBint": intercept of the lower bound line on the mass interval whose
endpoints are "leftend" and "rightend".
"LBslope": slope of the lower bound line on the mass interval.
"UBint": intercept of the upper bound line on the mass interval.
"UBslope": slope of the upper bound line on the mass interval.
isotopicSubstitutionMatrix(source = c("HMDB_NEUTRAL"))isotopicSubstitutionMatrix(source = c("HMDB_NEUTRAL"))
source |
|
data.frame with parameters to detect the defined isotopic
substitutions
source = "HMDB": most common isotopic substitutions and parameters for
these have been calculated for all compounds from the
Human Metabolome Database (HMDB, July 2021). Note that
the substitutions were calculated on the neutral masses (i.e. the
chemical formulas of the compounds, not considering any adducts).
Andrea Vicini
## Get the substitution matrix calculated on HMDB isotopicSubstitutionMatrix("HMDB_NEUTRAL")## Get the substitution matrix calculated on HMDB isotopicSubstitutionMatrix("HMDB_NEUTRAL")
Given a spectrum (i.e. a peak matrix with m/z and intensity values)
the function identifies groups of potential isotopologue peaks based on
pre-defined mass differences and intensity (probability) ratios that need
to be passed to the function with the substDefinition parameter. Each
isotopic substitution in a compound determines a certain isotopologue and it
is associated with a certain mass difference of that with respect to the
monoisotopic isotopologue. Also each substitution in a compound is linked
to a certain ratio between the intensities of the peaks of the corresponding
isotopologue and the monoisotopic one. This ratio isn't the same for
isotopologues corresponding to the same isotopic substitution but to
different compounds. Through the substDefinition parameter we provide
upper and lower values to compute bounds for each isotopic substitution
dependent on the peak's mass.
isotopologues( x, substDefinition = isotopicSubstitutionMatrix(), tolerance = 0, ppm = 20, seedMz = numeric(), charge = 1, .check = TRUE )isotopologues( x, substDefinition = isotopicSubstitutionMatrix(), tolerance = 0, ppm = 20, seedMz = numeric(), charge = 1, .check = TRUE )
x |
|
substDefinition |
|
tolerance |
|
ppm |
|
seedMz |
|
charge |
|
.check |
|
The function iterates over the peaks (rows) in x. For each peak (which is
assumed to be the monoisotopic peak) it searches other peaks in x with a
difference in mass matching (given ppm and tolerance) any of the
pre-defined mass differences in substDefinitions (column "md"). The mass
is obtained by multiplying the m/z of the peaks for the charge expected
for the ionized compounds.
For matching peaks, the function next evaluates whether their intensity is
within the expected (pre-defined) intensity range. Using "LBint",
"LBslope", "UBint", "UBslope" of the previously matched isotopic
substitution in substDefinition, the function estimates a (mass dependent)
lower and upper intensity ratio limit based on the peak's mass.
When some peaks are grouped together their indexes are excluded from the set of indexes that are searched for further groups (i.e. peaks already assigned to an isotopologue group are not considered/tested again thus each peak can only be part of one isotopologue group).
list of integer vectors. Each integer vector contains the
indixes of the rows in x with potential isotopologues of the same
compound.
Andrea Vicini
## Read theoretical isotope pattern (high resolution) from example file x <- read.table(system.file("exampleSpectra", "serine-alpha-lactose-caffeine.txt", package = "MetaboCoreUtils"), header = TRUE) x <- x[order(x$mz), ] plot(x$mz, x$intensity, type = "h") isos <- isotopologues(x, ppm = 5) isos ## highlight them in the plot for (i in seq_along(isos)) { z <- isos[[i]] points(x$mz[z], x$intensity[z], col = i + 1) }## Read theoretical isotope pattern (high resolution) from example file x <- read.table(system.file("exampleSpectra", "serine-alpha-lactose-caffeine.txt", package = "MetaboCoreUtils"), header = TRUE) x <- x[order(x$mz), ] plot(x$mz, x$intensity, type = "h") isos <- isotopologues(x, ppm = 5) isos ## highlight them in the plot for (i in seq_along(isos)) { z <- isos[[i]] points(x$mz[z], x$intensity[z], col = i + 1) }
mass2mz calculates the m/z value from a neutral mass and an adduct
definition.
Custom adduct definitions can be passed to the adduct parameter in form of
a data.frame. This data.frame is expected to have columns "mass_add"
and "mass_multi" defining the additive and multiplicative part of the
calculation. See adducts() for examples.
mass2mz(x, adduct = "[M+H]+")mass2mz(x, adduct = "[M+H]+")
x |
|
adduct |
either a |
numeric matrix with same number of rows than elements in x and
number of columns being equal to the length of adduct (adduct names
are used as column names). Each column thus represents the m/z of x
for each defined adduct.
Michael Witting, Johannes Rainer
mz2mass() for the reverse calculation, adductNames() for
supported adduct definitions.
Other adduct related functions:
adductCharge(),
adductFormula(),
adductNames(),
formula2mz(),
mz2mass(),
standardizeSingleCharge()
exact_mass <- c(100, 200, 250) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mass2mz(exact_mass, adduct) exact_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mass2mz(exact_mass, adduct) ## Calculate m/z of multiple adducts from neutral mass mass2mz(exact_mass, adduct = adductNames()) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mass2mz(c(100, 200), adds)exact_mass <- c(100, 200, 250) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mass2mz(exact_mass, adduct) exact_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mass2mz(exact_mass, adduct) ## Calculate m/z of multiple adducts from neutral mass mass2mz(exact_mass, adduct = adductNames()) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mass2mz(c(100, 200), adds)
The mclosest function calculates the closest rows between two matrices
(or data frames) considering pairwise differences between values in columns
of x and table. It returns the index of the closest row in table for
each row in x.
mclosest(x, table, ppm = 0, tolerance = Inf)mclosest(x, table, ppm = 0, tolerance = Inf)
x |
|
table |
|
ppm |
|
tolerance |
|
If, for a row of x, two rows of table are closest only the index of first
row will be returned.
For both the tolerance and ppm arguments, if their length is different to
the number of columns of x and table, the input argument will be
replicated to match it.
integer vector of indices indicating the closest row of table for
each row of x. If no suitable match is found for a row in x based on the
specified tolerance and ppm, the corresponding index is set to NA.
Philippine Louail
x <- data.frame(a = 1:5, b = 3:7) table <- data.frame(c = c(11, 23, 3, 5, 1), d = c(32:35, 45)) ## Get for each row of `x` the index of the row in `table` with the smallest ## difference of values (per column) mclosest(x, table) ## If the absolute difference is larger than `tolerance`, return `NA`. Note ## that the tolerance value of `25` is used for difference for each pairwise ## column in `x` and `table`. mclosest(x, table, tolerance = 25)x <- data.frame(a = 1:5, b = 3:7) table <- data.frame(c = c(11, 23, 3, 5, 1), d = c(32:35, 45)) ## Get for each row of `x` the index of the row in `table` with the smallest ## difference of values (per column) mclosest(x, table) ## If the absolute difference is larger than `tolerance`, return `NA`. Note ## that the tolerance value of `25` is used for difference for each pairwise ## column in `x` and `table`. mclosest(x, table, tolerance = 25)
multiplyElements Multiply the number of atoms of each element by a
constant, positive, integer
multiplyElements(x, k)multiplyElements(x, k)
x |
|
k |
|
character strings with the standardized chemical formula.
Roger Gine
multiplyElements("H2O", 3) multiplyElements(c("C6H12O6", "Na", "CH4O"), 2)multiplyElements("H2O", 3) multiplyElements(c("C6H12O6", "Na", "CH4O"), 2)
mz2mass calculates the neutral mass from a given m/z value and adduct
definition.
Custom adduct definitions can be passed to the adduct parameter in form of
a data.frame. This data.frame is expected to have columns "mass_add"
and "mass_multi" defining the additive and multiplicative part of the
calculation. See adducts() for examples.
mz2mass(x, adduct = "[M+H]+")mz2mass(x, adduct = "[M+H]+")
x |
|
adduct |
either a |
numeric matrix with same number of rows than elements in x and
number of columns being equal to the length of adduct (adduct names
are used as column names. Each column thus represents the neutral mass
of x for each defined adduct.
Michael Witting, Johannes Rainer
mass2mz() for the reverse calculation, adductNames() for
supported adduct definitions.
Other adduct related functions:
adductCharge(),
adductFormula(),
adductNames(),
formula2mz(),
mass2mz(),
standardizeSingleCharge()
ion_mass <- c(100, 200, 300) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mz2mass(ion_mass, adduct) ion_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mz2mass(ion_mass, adduct) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mz2mass(c(100, 200), adds)ion_mass <- c(100, 200, 300) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mz2mass(ion_mass, adduct) ion_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mz2mass(ion_mass, adduct) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mz2mass(c(100, 200), adds)
Get a named character vector that defines the mapping between names of two
software based on the mapping schema. The names of the returned vector are
the names of the from software and the values are the corresponding names
of the to software.
nameMapping( from = character(), to = character(), map = softwareMappingSchema() )nameMapping( from = character(), to = character(), map = softwareMappingSchema() )
from |
|
to |
|
map |
Optional |
A named character vector with the mapping between the two software.
Gabriele Tomè
Other name translation functions:
guessSource(),
softwareMapping(),
softwareMappingSchema(),
translate()
nameMapping(from = "MS-Dial", to = "mzTab-M")nameMapping(from = "MS-Dial", to = "mzTab-M")
pasteElements creates a chemical formula from element counts (such as
returned by countElements()).
pasteElements(x)pasteElements(x)
x |
|
character() with the chemical formulas.
Michael Witting and Sebastian Gibb
elements <- c("C" = 6, "H" = 12, "O" = 6) pasteElements(elements)elements <- c("C" = 6, "H" = 12, "O" = 6) pasteElements(elements)
The following functions allow to calculate basic quality assessment estimates typically employed in the analysis of metabolomics data. These functions are designed to be applied to entire rows of data, where each row corresponds to a feature. Subsequently, these estimates can serve as a foundation for feature filtering.
rsd and rowRsd are convenience functions to calculate the relative
standard deviation (i.e. coefficient of variation) of a numerical vector
or for rows of a numerical matrix, respectively.
rowDratio computes the D-ratio or dispersion ratio, defined as the
standard deviation for QC (Quality Control) samples divided by the
standard deviation for biological test samples, for each feature (row) in
the matrix.
percentMissing and rowPercentMissing determine the percentage of
missing values in a vector or for each row of a matrix, respectively.
rowBlank identifies rows (i.e., features) where the mean of test samples
is lower than a specified multiple (defined by the threshold parameter)
of the mean of blank samples. This can be used to flag features that result
from contamination in the solvent of the samples.
These functions are based on standard filtering methods described in the literature, and they are implemented to assist in preprocessing metabolomics data.
rsd(x, na.rm = TRUE, mad = FALSE) rowRsd(x, na.rm = TRUE, mad = FALSE) rowDratio(x, y, na.rm = TRUE, mad = FALSE) percentMissing(x) rowPercentMissing(x) rowBlank(x, y, threshold = 2, na.rm = TRUE)rsd(x, na.rm = TRUE, mad = FALSE) rowRsd(x, na.rm = TRUE, mad = FALSE) rowDratio(x, y, na.rm = TRUE, mad = FALSE) percentMissing(x) rowPercentMissing(x) rowBlank(x, y, threshold = 2, na.rm = TRUE)
x |
|
na.rm |
|
mad |
|
y |
|
threshold |
|
See individual function description above for details.
For rsd and rowRsd the feature abundances are expected to be provided in
natural scale and not e.g. log2 scale as it may lead to incorrect
interpretations.
Philippine Louail, Johannes Rainer
Broadhurst D, Goodacre R, Reinke SN, Kuligowski J, Wilson ID, Lewis MR, Dunn WB. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics. 2018;14(6):72. doi: 10.1007/s11306-018-1367-3. Epub 2018 May 18. PMID: 29805336; PMCID: PMC5960010.
## coefficient of variation a <- c(4.3, 4.5, 3.6, 5.3) rsd(a) A <- rbind(a, a, a) rowRsd(A) ## Dratio x <- c(4.3, 4.5, 3.6, 5.3) X <- rbind(a, a, a) rowDratio(X, X) #' ## Percent Missing b <- c(1, NA, 3, 4, NA) percentMissing(b) B <- matrix(c(1, 2, 3, NA, 5, 6, 7, 8, 9), nrow = 3) rowPercentMissing(B) ## Blank Rows test_samples <- matrix(c(13, 21, 3, 4, 5, 6), nrow = 2) blank_samples <- matrix(c(0, 1, 2, 3, 4, 5), nrow = 2) rowBlank(test_samples, blank_samples)## coefficient of variation a <- c(4.3, 4.5, 3.6, 5.3) rsd(a) A <- rbind(a, a, a) rowRsd(A) ## Dratio x <- c(4.3, 4.5, 3.6, 5.3) X <- rbind(a, a, a) rowDratio(X, X) #' ## Percent Missing b <- c(1, NA, 3, 4, NA) percentMissing(b) B <- matrix(c(1, 2, 3, NA, 5, 6, 7, 8, 9), nrow = 3) rowPercentMissing(B) ## Blank Rows test_samples <- matrix(c(13, 21, 3, 4, 5, 6), nrow = 2) blank_samples <- matrix(c(0, 1, 2, 3, 4, 5), nrow = 2) rowBlank(test_samples, blank_samples)
Get the names of the supported software defined in the default mapping schema
softwareMapping()softwareMapping()
A character vector with the names of the supported software.
Gabriele Tomè
Other name translation functions:
guessSource(),
nameMapping(),
softwareMappingSchema(),
translate()
softwareMapping()softwareMapping()
Get the mapping schema as a data.frame that defines the mapping between
names of different software.
softwareMappingSchema(path = NULL)softwareMappingSchema(path = NULL)
path |
Optional |
A data.frame with the mapping schema.
Gabriele Tomè
Other name translation functions:
guessSource(),
nameMapping(),
softwareMapping(),
translate()
softwareMappingSchema()softwareMappingSchema()
standardizeFormula standardizes a supplied chemical formula according
to the Hill notation system.
standardizeFormula(x)standardizeFormula(x)
x |
|
character strings with the standardized chemical formula.
Michael Witting and Sebastian Gibb
pasteElements() countElements()
standardizeFormula("C6O6H12")standardizeFormula("C6O6H12")
standardizeSingleCharge() replaces the short-hand notation of single
charges eventually present in adduct definition with the standard
notion, i.e., it replaces "+" with "1+" in e.g. "[M+H]+" and "-"
with "1-" in "[M-H]-".
standardizeSingleCharge(x)standardizeSingleCharge(x)
x |
|
character with standardized single charge definitions.
Other adduct related functions:
adductCharge(),
adductFormula(),
adductNames(),
formula2mz(),
mass2mz(),
mz2mass()
a <- c("[M+H]+", "[M-H]-", "[M+H]1+") standardizeSingleCharge(a)a <- c("[M+H]+", "[M-H]-", "[M+H]1+") standardizeSingleCharge(a)
subtractElements subtracts one chemical formula from another.
subtractElements(x, y)subtractElements(x, y)
x |
|
y |
|
character Resulting formula
Michael Witting and Sebastian Gibb
subtractElements("C6H12O6", "H2O") subtractElements("C6H12O6", "NH3")subtractElements("C6H12O6", "H2O") subtractElements("C6H12O6", "NH3")
Map names from one software to another. The function replaces elements
in the input vector x with their corresponding values in the mapping if a
match is found. If an element in x does not have a corresponding mapping,
it is returned unchanged with a warning.
translate(x = character(), mapping = NULL)translate(x = character(), mapping = NULL)
x |
|
mapping |
A named |
A character vector with the translated names.
Gabriele Tomè
Other name translation functions:
guessSource(),
nameMapping(),
softwareMapping(),
softwareMappingSchema()
## MS-Dial names x <- c("Average Rt(min)", "Alignment ID", "Average Mz") map_vec <- nameMapping(from = "MS-Dial", to = "mzTab-M") translate(x, mapping = map_vec)## MS-Dial names x <- c("Average Rt(min)", "Alignment ID", "Average Mz") map_vec <- nameMapping(from = "MS-Dial", to = "mzTab-M") translate(x, mapping = map_vec)