Title: | Tools for high-throughput metabolomics |
---|---|
Description: | Tools to analyze and visualize high-throughput metabolomics data aquired using chromatography-mass spectrometry. These tools preprocess data in a way that enables reliable and powerful differential analysis. At the core of these methods is a peak detection phase that pools information across all samples simultaneously. This is in contrast to other methods that detect peaks in a sample-by-sample basis. |
Authors: | Leslie Myint [cre, aut] , Kasper Daniel Hansen [aut] |
Maintainer: | Leslie Myint <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2025-01-06 06:06:26 UTC |
Source: | https://github.com/bioc/yamss |
The bakedpi
method stands for bivariate approximate kernel density
estimation for peak identification. It performs background correction,
retention time correction, and bivariate kernel density estimation.
bakedpi(cmsRaw, dbandwidth = c(0.005, 10), dgridstep = c(0.005, 1), outfileDens = NULL, dortalign = FALSE, mzsubset = NULL, verbose = TRUE)
bakedpi(cmsRaw, dbandwidth = c(0.005, 10), dgridstep = c(0.005, 1), outfileDens = NULL, dortalign = FALSE, mzsubset = NULL, verbose = TRUE)
cmsRaw |
An object of class |
dbandwidth |
A length-2 vector indicating the kernel density bandwidth
in the M/Z and retention time (scan) directions. Default: |
dgridstep |
A length-2 vector indicating the grid step sizes. Default:
|
outfileDens |
Name of a file to save density estimate. If NULL, no output is saved. |
dortalign |
A logical value. Should retention time correction be performed? |
mzsubset |
A length-2 vector indicating a subset of the M/Z range to
process. |
verbose |
Should the function be verbose? |
bakedpi
first performs region-specific background correction. An
optional retention time correction step follows in which M/Z region-specific
shifts are computed to align the raw data. Next the two-dimensional density
estimate is computed. The purpose of this function is to take the raw data
read in by readMSdata
and perform the steps necessary for bivariate
kernel density estimation. The output of this function is used by
slicepi
to detect peaks and provide peak quantifications.
An object of class CMSproc
containing background corrected intensities,
the bivariate kernel density estimate, and quantiles of the nonzero values in
the density estimate.
## A very small dataset data(cmsRawExample) cmsProc1 <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) ## A longer example which takes a few minutes to run. ## This is still a smaller mz-slice of the full data. if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") files <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE) colData <- DataFrame(sampClasses = rep(c("wild-type", "mutant"), each = 4)) cmsRaw <- readMSdata(files = files, colData = colData, verbose = TRUE) cmsProc2 <- bakedpi(cmsRaw, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), outfileDens = NULL, dortalign = TRUE, mzsubset = c(500, 520)) }
## A very small dataset data(cmsRawExample) cmsProc1 <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) ## A longer example which takes a few minutes to run. ## This is still a smaller mz-slice of the full data. if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") files <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE) colData <- DataFrame(sampClasses = rep(c("wild-type", "mutant"), each = 4)) cmsRaw <- readMSdata(files = files, colData = colData, verbose = TRUE) cmsProc2 <- bakedpi(cmsRaw, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), outfileDens = NULL, dortalign = TRUE, mzsubset = c(500, 520)) }
This class builds on the CMSraw
class to additionally store
background-corrected intensities as well as the bivariate kernel density
estimate.
colData
:a DataFrame
of phenotype and sample information.
rawDT
:a data.table
of raw spectral information.
mzParams
:a list containing the minimum and maximum M/Z value and number of scans in each sample.
rtAlign
:a logical indicating whether the data has been retention time aligned or not.
bgcorrDT
:a data.table
of background-corrected spectral information.
density
:a matrix with rows corresponding to M/Z values and columns corresponding to scans containing the kernel density estimate.
densityQuantiles
:a numeric vector containing the 100 percent quantiles of the nonzero density values.
We have the following utility functions:
show
:The show method; prints the object.
getEICS
:Gets extracted ion chromatograms (EICs) for the supplied M/Z ranges.
plotDensityRegion
:Makes an image plot of the density estimate in a specified M/Z and scan region.
We have the following accessor functions:
colData
:Gets the DataFrame
containing phenotype and sample information.
densityEstimate
:Gets the matrix containing the density estimate.
densityQuantiles
:Gets the quantiles of the nonzero values in the density estimate.
## Construct a completely fake example densmat <- matrix(rnorm(600), nrow = 20, ncol = 30) colnames(densmat) <- 1:ncol(densmat) rownames(densmat) <- seq(350, by = 0.005, length.out = nrow(densmat)) cmsobj <- new("CMSproc", density = densmat) head(densityEstimate(cmsobj)) ## Takes about 20s to run data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) cmsProc
## Construct a completely fake example densmat <- matrix(rnorm(600), nrow = 20, ncol = 30) colnames(densmat) <- 1:ncol(densmat) rownames(densmat) <- seq(350, by = 0.005, length.out = nrow(densmat)) cmsobj <- new("CMSproc", density = densmat) head(densityEstimate(cmsobj)) ## Takes about 20s to run data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) cmsProc
This class saves the data from raw mass spectral data files in a
data.table
and is used in subsequent processing. Objects of
this class are created by the readMSdata
function.
colData
:a DataFrame
of phenotype and sample information.
rawDT
:a data.table
of raw spectral information.
mzParams
:a list containing the minimum and maximum M/Z value and number of scans in each sample.
We have the following utility functions:
show
:The show method; prints the object.
getEICS
:Gets extracted ion chromatograms (EICs) for the supplied M/Z ranges.
We have the following accessor functions:
colData
:Gets the DataFrame
containing phenotype and sample information.
data(cmsRawExample) cmsRawExample if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") files <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE)[1] colData <- DataFrame(group = "wild-type") cmsRaw <- readMSdata(files = files, colData = colData, verbose = TRUE) colData(cmsRaw) }
data(cmsRawExample) cmsRawExample if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") files <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE)[1] colData <- DataFrame(group = "wild-type") cmsRaw <- readMSdata(files = files, colData = colData, verbose = TRUE) colData(cmsRaw) }
This object contains parsed raw data for 4 samples in the MTBLS2 dataset.
cmsRawExample
cmsRawExample
A CMSraw
object containing information on 4 samples
in the MTBLS2 dataset.
An object of class CMSraw
containing parsed data for 4 samples
in the MTBLS2 dataset.
The mtbls2
Bioconductor data package.
This class is based on the SummarizedExperiment
class. It holds
information on peak quantifications, M/Z and scan bounds, sample information,
and preprocessing metadata. Objects of the class can be constructed
using CMSslice
.
We have the following utility functions:
show
:The show method; prints the object.
We have the following accessor functions:
colData
:Gets the DataFrame
containing phenotype and sample information.
densityCutoff
:Gets the value used to threshold the density for peak calling.
densityQuantiles
:Gets the quantiles of the nonzero values in the density estimate.
peakBounds
:Gets the DataFrame
of M/Z bounds, scan bounds, and ID numbers
for detected peaks.
peakQuants
:Gets the matrix of peak quantifications (rows: peaks, columns: samples).
## Construct a fake class quants <- matrix(rnorm(12*5000), nrow = 5000, ncol = 12) bounds <- cbind(mzmin = seq(from = 100, to = 1100, length.out = 5000), mzmax = seq(from = 100, to = 1100, length.out = 5000) + 0.1, scan.min = rep(10,5000), scan.max = rep(20, 5000), peaknum = 1:5000) cmsobj <- CMSslice(assays = SimpleList(peakQuants = quants), rowData = DataFrame(bounds)) head(peakQuants(cmsobj)) ## A better example which takes 20s to run data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) cutoff <- tail(densityQuantiles(cmsProc), 2)[1] sliced <- slicepi(cmsProc, cutoff = cutoff, verbose = TRUE) sliced
## Construct a fake class quants <- matrix(rnorm(12*5000), nrow = 5000, ncol = 12) bounds <- cbind(mzmin = seq(from = 100, to = 1100, length.out = 5000), mzmax = seq(from = 100, to = 1100, length.out = 5000) + 0.1, scan.min = rep(10,5000), scan.max = rep(20, 5000), peaknum = 1:5000) cmsobj <- CMSslice(assays = SimpleList(peakQuants = quants), rowData = DataFrame(bounds)) head(peakQuants(cmsobj)) ## A better example which takes 20s to run data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), dortalign = TRUE, mzsubset = c(500,510)) cutoff <- tail(densityQuantiles(cmsProc), 2)[1] sliced <- slicepi(cmsProc, cutoff = cutoff, verbose = TRUE) sliced
Performs differential abundance analysis on quantification information
in a CMSslice
object.
diffrep(cms, classes)
diffrep(cms, classes)
cms |
An object of class |
classes |
A character vector of class labels for the samples. |
Differential analysis is performed using the limma
package which
uses empirical Bayes methods in the estimation of feature-wise variances.
A data.frame
containing differential analysis information
including log fold changes and p-values.
quantmat <- matrix(rnorm(12*5000), nrow = 5000, ncol = 12) cmsobj <- CMSslice(assays = SimpleList(peakQuants = quantmat)) classes <- rep(c("case", "control"), each = 6) difftab <- diffrep(cmsobj, classes)
quantmat <- matrix(rnorm(12*5000), nrow = 5000, ncol = 12) cmsobj <- CMSslice(assays = SimpleList(peakQuants = quantmat)) classes <- rep(c("case", "control"), each = 6) difftab <- diffrep(cmsobj, classes)
Computes extracted ion chromatrograms (EICs) for the given M/Z ranges. Intensities are on the log2 scale.
getEICS(object, mzranges)
getEICS(object, mzranges)
object |
An object of class |
mzranges |
A 2-column matrix where each row corresponds to one M/Z range and the first and second columns are the minimum and maximum M/Z values for the range respectively. |
In a given M/Z range, the maximum intensity observed in each scan gives the extracted ion chromatogram.
A list with length equal to the number of rows of mzranges
where each list element is a # scans by # samples matrix of EICs (on the
log2 scale).
data(cmsRawExample) mzranges <- rbind(c(500.01, 500.03), c(501.3, 501.5)) eicList <- getEICS(cmsRawExample, mzranges)
data(cmsRawExample) mzranges <- rbind(c(500.01, 500.03), c(501.3, 501.5)) eicList <- getEICS(cmsRawExample, mzranges)
Computes total ion chromatogram (TIC) for a single sample. Intensities
are on the log2 scale. This requires a CMSraw
object, typically
produced from readMSdata
.
getTIC(object, sample)
getTIC(object, sample)
object |
An object of class |
sample |
An integer - for which sample should the TIC be computed?. |
A vector with length equal to the number of scans containing the log2 sum of intensities at each scan.
data(cmsRawExample) tic <- getTIC(cmsRawExample, sample = 1)
data(cmsRawExample) tic <- getTIC(cmsRawExample, sample = 1)
Makes an image
plot of the density estimate in the specified
M/Z and scan region.
plotDensityRegion(cms, mzrange, scanrange)
plotDensityRegion(cms, mzrange, scanrange)
cms |
An object of class |
mzrange |
A length-2 vector indicating the M/Z range to plot. |
scanrange |
A length-2 vector indicating the scan range to plot. |
This function is invoked for its side effect of plotting.
## For illustration purposes, we make a "dummy" object ## with a random matrix as the density estimate densmat <- matrix(rnorm(600), nrow = 20, ncol = 30) colnames(densmat) <- 1:ncol(densmat) rownames(densmat) <- seq(350, by = 0.005, length.out = nrow(densmat)) densityQuantiles <- quantile(densmat, seq(from = 0, to = 1, by = 0.001)) cmsobj <- new("CMSproc", density = densmat, densityQuantiles = densityQuantiles) plotDensityRegion(cmsobj, mzrange = c(350.01, 350.03), scanrange = c(10,20))
## For illustration purposes, we make a "dummy" object ## with a random matrix as the density estimate densmat <- matrix(rnorm(600), nrow = 20, ncol = 30) colnames(densmat) <- 1:ncol(densmat) rownames(densmat) <- seq(350, by = 0.005, length.out = nrow(densmat)) densityQuantiles <- quantile(densmat, seq(from = 0, to = 1, by = 0.001)) cmsobj <- new("CMSproc", density = densmat, densityQuantiles = densityQuantiles) plotDensityRegion(cmsobj, mzrange = c(350.01, 350.03), scanrange = c(10,20))
Creates a CMSraw
object that contains a data.table
of
raw mass spectral information for all samples. The resulting object
also stores phenotype and sample information. This object is the
basic encapsulation of essential raw experimental data and serves as
the output for further processing methods.
readMSdata(files, colData, mzsubset, verbose)
readMSdata(files, colData, mzsubset, verbose)
files |
A character vector of filenames pointing to the raw data. |
colData |
A |
mzsubset |
A length-2 vector indicating a subset of the M/Z range to
process. |
verbose |
Should the function be verbose? |
A vector with length equal to the number of scans containing the log2 sum of intensities at each scan.
if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") file <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE)[1] colData <- DataFrame(group = "wild-type") cmsRaw <- readMSdata(files = file, colData = colData, verbose = TRUE) }
if (require(mtbls2)) { data(mtbls2) filepath <- file.path(find.package("mtbls2"), "mzML") file <- list.files(filepath, pattern = "MSpos-Ex1", recursive = TRUE, full.names = TRUE)[1] colData <- DataFrame(group = "wild-type") cmsRaw <- readMSdata(files = file, colData = colData, verbose = TRUE) }
The slicepi
method uses the bivariate approximate kernel density
estimate computed by bakedpi
and uses a cutoff to bound and quantify
peaks.
slicepi(object, cutoff = NULL, verbose = TRUE)
slicepi(object, cutoff = NULL, verbose = TRUE)
object |
An object of class |
cutoff |
A number indicating the threshold to apply to the density
estimate. |
verbose |
Should the function be verbose? |
slicepi
uses the most intense features in set regions of the M/Z space
to identify a data-driven density cutoff to detect peaks. Once peak bounds have
been computed, the extracted ion chromatograms for the peaks are computed, and
the EICs are integrated to obtain peak quantifications.
An object of class CMSslice
containing peak bounds and quantifications
as well as sample and preprocessing metadata.
data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), outfileDens = NULL, dortalign = FALSE, verbose = TRUE) dqs <- densityQuantiles(cmsProc) cmsSlice <- slicepi(cmsProc, cutoff = dqs[996], verbose = TRUE) cmsSlice
data(cmsRawExample) cmsProc <- bakedpi(cmsRawExample, dbandwidth = c(0.01, 10), dgridstep = c(0.01, 1), outfileDens = NULL, dortalign = FALSE, verbose = TRUE) dqs <- densityQuantiles(cmsProc) cmsSlice <- slicepi(cmsProc, cutoff = dqs[996], verbose = TRUE) cmsSlice