| Title: | LC-MS and GC-MS Data Analysis |
|---|---|
| Description: | Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling. |
| Authors: | Colin A. Smith [aut], Ralf Tautenhahn [aut], Steffen Neumann [aut, cre] (ORCID: <https://orcid.org/0000-0002-7899-7192>), Paul Benton [aut], Christopher Conley [aut], Johannes Rainer [aut] (ORCID: <https://orcid.org/0000-0002-6977-7147>), Michael Witting [ctb], William Kumler [aut] (ORCID: <https://orcid.org/0000-0002-5022-8009>), Philippine Louail [aut] (ORCID: <https://orcid.org/0009-0007-5429-6846>), Pablo Vangeenderhuysen [ctb] (ORCID: <https://orcid.org/0000-0002-5492-6904>), Carl Brunius [ctb] (ORCID: <https://orcid.org/0000-0003-3957-870X>) |
| Maintainer: | Steffen Neumann <[email protected]> |
| License: | GPL (>= 2) + file LICENSE |
| Version: | 4.11.0 |
| Built: | 2026-06-03 07:58:16 UTC |
| Source: | https://github.com/bioc/xcms |
The methods listed on this page allow filtering and subsetting of XCMSnExp
objects. Most of them are inherited from the MSnbase::OnDiskMSnExp class
defined in the MSnbase package and have been adapted for XCMSnExp to
enable correct subsetting of preprocessing results.
[: subsets an XCMSnExp object by spectra. Be aware that this removes
all preprocessing results, except adjusted retention times if
keepAdjustedRtime = TRUE is passed to the method.
[[: extracts a single Spectrum object (defined in MSnbase). The
reported retention time is the adjusted retention time if alignment has
been performed.
filterChromPeaks: subsets the chromPeaks matrix in object. Parameter
method specifies how the chromatographic peaks should be
filtered. Currently, only method = "keep" is supported, which allows
specifying the chromatographic peaks to keep with parameter keep (i.e. a
logical, integer or character vector defining which chromatographic peaks
to keep). Feature definitions (if present) are updated correspondingly.
filterFeatureDefinitions: subsets the feature definitions of an
XCMSnExp object. Parameter features defines which features to keep.
It can be a logical, integer (index of features to keep) or character
(feature IDs) vector.
filterFile: reduces the XCMSnExp to data from only selected
files. Identified chromatographic peaks for these files are retained while
correspondence results (feature definitions) are removed by default. To
force keeping feature definitions use keepFeatures = TRUE. Adjusted
retention times, if present, are retained by default. Use
keepAdjustedRtime = FALSE to drop them.
filterMsLevel: reduces the XCMSnExp object to spectra of the
specified MS level(s). Chromatographic peaks and identified features are
also subsetted to the respective MS level. See also the filterMsLevel
documentation in MSnbase for details and examples.
filterMz: filters the data set based on the provided m/z value range.
All chromatographic peaks and features (grouped peaks) with their apex
falling within the provided mz value range are retained
(i.e. if chromPeaks(object)[, "mz"] is >= mz[1] and <= mz[2]).
Adjusted retention times, if present, are kept.
filterRt: filters the data set based on the provided retention time
range. All chromatographic peaks and features (grouped peaks)
within the specified retention time window are retained
(i.e. if the retention time corresponding to the peak's apex is within the
specified rt range). If retention time correction has been performed,
the method will by default filter the object by adjusted retention times.
The argument adjusted can be used to specify manually whether filtering
should be performed on raw or adjusted retention times. Filtering by
retention time does not drop any preprocessing results nor does it remove
or change alignment results (i.e. adjusted retention times).
The method returns an empty object if no spectrum or feature is within
the specified retention time range.
split: splits an XCMSnExp object into a list of XCMSnExp objects
based on the provided parameter f. Note that by default all
preprocessing results are removed by the splitting; only adjusted
retention times can be kept, by passing the optional argument
keepAdjustedRtime = TRUE.
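The m/z, retention time and keep-based selections described above can be illustrated on a toy matrix. The base R sketch below uses a hypothetical matrix pks that only mimics the "mz" and "rt" columns of chromPeaks(); it shows the row selection logic alone and does not call xcms, which in addition keeps feature definitions and other preprocessing results consistent.

```r
# Hypothetical chromPeaks-like matrix: one row per chromatographic peak
pks <- cbind(mz = c(250.1, 310.2, 355.0, 399.9, 420.3),
             rt = c(2650, 2750, 2810, 2950, 2890))
rownames(pks) <- sprintf("CP%02d", 1:5)

mz <- c(300, 400)
rt <- c(2700, 2900)

# filterMz-style selection: apex m/z within [mz[1], mz[2]]
keep_mz <- pks[, "mz"] >= mz[1] & pks[, "mz"] <= mz[2]
# filterRt-style selection: apex retention time within [rt[1], rt[2]]
keep_rt <- pks[, "rt"] >= rt[1] & pks[, "rt"] <= rt[2]
keep <- keep_mz & keep_rt

# A keep vector can be logical, integer or character; here all three
# forms select the same two peaks (CP02 and CP03)
by_logical <- pks[keep, , drop = FALSE]
by_index <- pks[which(keep), , drop = FALSE]
by_id <- pks[c("CP02", "CP03"), , drop = FALSE]
by_logical
```

Combining both conditions with & mirrors applying filterMz and filterRt one after the other.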
## S4 method for signature 'XCMSnExp,ANY,ANY,ANY'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'XCMSnExp,ANY,ANY'
x[[i, j, drop = FALSE]]
## S4 method for signature 'XCMSnExp'
filterMsLevel(object, msLevel., keepAdjustedRtime = hasAdjustedRtime(object))
## S4 method for signature 'XCMSnExp'
filterFile(
  object,
  file,
  keepAdjustedRtime = hasAdjustedRtime(object),
  keepFeatures = FALSE
)
## S4 method for signature 'XCMSnExp'
filterMz(object, mz, msLevel., ...)
## S4 method for signature 'XCMSnExp'
filterRt(object, rt, msLevel., adjusted = hasAdjustedRtime(object))
## S4 method for signature 'XCMSnExp,ANY'
split(x, f, drop = FALSE, ...)
## S4 method for signature 'XCMSnExp'
filterChromPeaks(
  object,
  keep = rep(TRUE, nrow(chromPeaks(object))),
  method = "keep",
  ...
)
## S4 method for signature 'XCMSnExp'
filterFeatureDefinitions(object, features = integer())
x |
For |
i |
For |
j |
For |
... |
Optional additional arguments. |
drop |
For |
object |
A XCMSnExp object. |
msLevel. |
For |
keepAdjustedRtime |
For |
file |
For |
keepFeatures |
For |
mz |
For |
rt |
For |
adjusted |
For |
f |
For |
keep |
For |
method |
For |
features |
For |
All subsetting methods try to ensure that the returned data is
consistent. Correspondence results, for example, are removed by default if
the data set is subsetted by file, since they depend on the files on
which correspondence was performed. This can be changed by setting
keepFeatures = TRUE.
For adjusted retention times, most subsetting methods
support the argument keepAdjustedRtime (even the [ method)
that forces the adjusted retention times to be retained even if the
default would be to drop them.
All methods return an XCMSnExp object.
The filterFile method also removes process history steps not
related to the files to which the object should be subsetted and updates
the fileIndex attribute accordingly. Also, the method does not
allow arbitrary ordering of the files or re-ordering of the files within
the object.
Note also that most of the filtering methods, as well as the subsetting
operation [, drop all or selected preprocessing results. To
consolidate the alignment results, i.e. ensure that adjusted retention
times are always preserved, use the applyAdjustedRtime()
function on the object that contains the alignment results. This replaces
the raw retention times with the adjusted ones.
Johannes Rainer
XCMSnExp for base class documentation.
XChromatograms() for similar filter functions on
XChromatograms objects.
## Loading a test data set with identified chromatographic peaks
library(xcms)
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Disable parallel processing for this example
register(SerialParam())
## Subset the dataset to the first and third file.
xod_sub <- filterFile(faahko_sub, file = c(1, 3))
## The number of chromatographic peaks per file for the full object
table(chromPeaks(faahko_sub)[, "sample"])
## The number of chromatographic peaks per file for the subset
table(chromPeaks(xod_sub)[, "sample"])
basename(fileNames(faahko_sub))
basename(fileNames(xod_sub))
## Filter on m/z values; chromatographic peaks and features within the
## m/z range are retained (as well as adjusted retention times).
xod_sub <- filterMz(faahko_sub, mz = c(300, 400))
head(chromPeaks(xod_sub))
nrow(chromPeaks(xod_sub))
nrow(chromPeaks(faahko_sub))
## Filter on rt values. All chromatographic peaks and features within the
## retention time range are retained. Filtering is performed by default on
## adjusted retention times, if present.
xod_sub <- filterRt(faahko_sub, rt = c(2700, 2900))
range(rtime(xod_sub))
head(chromPeaks(xod_sub))
range(chromPeaks(xod_sub)[, "rt"])
nrow(chromPeaks(faahko_sub))
nrow(chromPeaks(xod_sub))
## Extract a single Spectrum
faahko_sub[[4]]
## Subsetting using [ removes all preprocessing results; using
## keepAdjustedRtime = TRUE would keep adjusted retention times, if present.
xod_sub <- faahko_sub[fromFile(faahko_sub) == 1]
xod_sub
## split also removes preprocessing results, but it supports the
## optional parameter keepAdjustedRtime.
## Split the object into a list of XCMSnExp objects, one per file
xod_list <- split(faahko_sub, f = fromFile(faahko_sub))
xod_list
Subset an xcmsRaw object by scans. The
returned xcmsRaw object contains values for all
scans specified with argument i. Note that the scanrange
slot of the returned xcmsRaw will be
c(1, length(object@scantime)) and hence not range(i).
## S4 method for signature 'xcmsRaw,logicalOrNumeric,missing,missing'
x[i, j, drop]
x |
The |
i |
Integer or logical vector specifying the scans/spectra to which
|
j |
Not supported. |
drop |
Not supported. |
Only subsetting by scan indices in increasing order or by a logical
vector is supported. If not ordered, argument i is sorted
automatically. Indices larger than the total number of scans
are discarded.
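The index handling described above can be sketched in base R. The helper clean_scan_index() below is hypothetical (it is not part of xcms); it only mirrors the three documented rules: logical vectors are turned into positions, indices are sorted, and indices beyond the last scan are dropped.

```r
# Hypothetical helper mimicking the documented handling of argument i
clean_scan_index <- function(i, n_scans) {
  if (is.logical(i)) i <- which(i)   # logical vector -> scan positions
  i <- sort(as.integer(i))           # enforce increasing order
  i[i <= n_scans]                    # discard indices beyond the last scan
}

clean_scan_index(c(5, 2, 12, 3), n_scans = 10)
clean_scan_index(c(TRUE, FALSE, TRUE), n_scans = 10)
```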
The subsetted xcmsRaw object.
Johannes Rainer
## Load a test file
library(xcms)
library(MsDataHub)
file <- ko15.CDF()
xraw <- xcmsRaw(file, profstep = 0)
## The number of scans/spectra:
length(xraw@scantime)
## Subset the object to scans with a scan time from 3500 to 4000.
xsub <- xraw[xraw@scantime >= 3500 & xraw@scantime <= 4000]
range(xsub@scantime)
## The number of scans:
length(xsub@scantime)
## The number of values of the subset:
length(xsub@env$mz)
Determine which peaks are absent / present in a sample class
object |
|
class |
Name of a sample class from |
minfrac |
minimum fraction of samples necessary in the class to be absent/present |
The functions correctly treat peaks that are only present because
of fillPeaks, i.e. they do not count them as present.
A logical vector with the same length as nrow(groups(object)).
absent(object, ...)
present(object, ...)
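The role of minfrac can be illustrated with a small base R sketch. It assumes that present() flags features detected (not filled) in at least minfrac of the samples of the class, and that absent() flags features missing in at least minfrac of them. The matrix detected and the class labels are hypothetical, and this is not the xcms implementation.

```r
# Hypothetical indicator of detected (not filled) peaks: features x samples
detected <- rbind(f1 = c(TRUE, TRUE, TRUE, FALSE),
                  f2 = c(TRUE, FALSE, FALSE, FALSE),
                  f3 = c(FALSE, FALSE, TRUE, TRUE))
sample_class <- c("KO", "KO", "WT", "WT")
minfrac <- 0.5

# Fraction of samples of class "KO" in which each feature was detected
in_class <- sample_class == "KO"
frac_present <- rowMeans(detected[, in_class, drop = FALSE])

is_present <- frac_present >= minfrac        # assumed present() criterion
is_absent <- (1 - frac_present) >= minfrac   # assumed absent() criterion
is_present
is_absent
```

With minfrac = 0.5, a feature found in exactly half of the class samples (f2) counts as both present and absent.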
The adjustRtime method(s) perform retention time correction (alignment)
between chromatograms of different samples/datasets. Alignment is performed
by default on MS level 1 data. Retention times of spectra from other MS
levels, if present, are subsequently adjusted based on the adjusted
retention times of the MS1 spectra. Note that calling adjustRtime on an
xcms result object will remove any previous alignment results as well as
any correspondence analysis results that may be present. To run a second
round of alignment, raw retention times need to be replaced with adjusted
ones using the applyAdjustedRtime() function.
The alignment method can be specified (and configured) using a dedicated
param argument.
Supported param objects are:
ObiwarpParam: performs retention time adjustment based on the full m/z -
rt data using the obiwarp method (Prince (2006)). It is based on the
original code but additionally supports
alignment of multiple samples by aligning each against a center sample.
The alignment is performed directly on the profile matrix and can hence
be performed independently of the peak detection or peak grouping.
PeakGroupsParam: performs retention time correction based on the
alignment of features defined in all/most samples (corresponding to
house keeping compounds or marker compounds) (Smith 2006). First the
retention time deviation of these features is described by fitting either a
local polynomial regression (smooth = "loess") or a linear (smooth = "linear")
function to the data points. These fits are subsequently used to adjust the
retention time of each spectrum in each sample (including spectra of
MS levels other than MS 1). Since the function is
based on features (i.e. chromatographic peaks grouped across samples), an
initial correspondence analysis has to be performed beforehand using the
groupChromPeaks() function. Alternatively, it is also possible to
manually define a numeric matrix with retention times of markers in each
sample that should be used for alignment. Such a matrix can be passed
to the alignment function using the peakGroupsMatrix parameter of the
PeakGroupsParam parameter object. By default the
adjustRtimePeakGroups() function is used to define this matrix. This
function identifies peak groups (features) for alignment in object
based on the parameters defined in param. See also
do_adjustRtime_peakGroups() for the core API function.
LamaParama: This function performs retention time correction by aligning
chromatographic data to an external reference dataset (concept and initial
implementation by Carl Brunius). The process involves identifying and
aligning peaks within the experimental chromatographic data, represented
as an XcmsExperiment object, to a predefined set of landmark features
called "lamas". These landmark features are characterized by their
mass-to-charge ratio (m/z) and retention time. See LamaParama() for more
information on the method.
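The peak groups approach (fit the retention time deviation of marker features, then correct every spectrum) can be sketched with stats::loess() on simulated data. All values are hypothetical, and the choices degree = 1 and linear interpolation with approx(rule = 2) for spectra outside or between markers are assumptions of this sketch, not a description of how xcms handles these cases internally.

```r
set.seed(1)
# Hypothetical marker (feature) retention times in seconds: reference vs. sample
rt_ref <- sort(runif(40, 2500, 4500))
# Simulated drift: the sample elutes ~10 s late, plus a slow trend and noise
rt_obs <- rt_ref + 10 + 0.005 * (rt_ref - 2500) + rnorm(40, sd = 1)

# Retention time deviation of each marker in the sample
dev <- rt_obs - rt_ref

# Describe the deviation as a smooth function of the observed retention time
fit <- loess(dev ~ rt_obs, span = 0.5, degree = 1, family = "gaussian")

# Hypothetical retention times of all spectra of this sample
rt_spectra <- seq(2600, 4400, by = 1.5)

# Deviation for each spectrum: interpolate the fitted marker values and keep
# the end values constant outside the marker range
dev_fit <- approx(rt_obs, fitted(fit), xout = rt_spectra, rule = 2)$y
rt_adjusted <- rt_spectra - dev_fit
summary(dev_fit)
```

Using smooth = "linear" would correspond to replacing the loess() call with a straight-line fit such as lm(dev ~ rt_obs).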
adjustRtime(object, param, ...)
adjustRtimePeakGroups(object, param, ...)
## S4 method for signature 'MsExperiment,ObiwarpParam'
adjustRtime(object, param, chunkSize = 2L, BPPARAM = bpparam())
## S4 method for signature 'MsExperiment,PeakGroupsParam'
adjustRtime(object, param, msLevel = 1L, ...)
PeakGroupsParam(
  minFraction = 0.9,
  extraPeaks = 1,
  smooth = "loess",
  span = 0.2,
  family = "gaussian",
  peakGroupsMatrix = matrix(nrow = 0, ncol = 0),
  subset = integer(),
  subsetAdjust = c("average", "previous")
)
ObiwarpParam(
  binSize = 1,
  centerSample = integer(),
  response = 1L,
  distFun = "cor_opt",
  gapInit = numeric(),
  gapExtend = numeric(),
  factorDiag = 2,
  factorGap = 1,
  localAlignment = FALSE,
  initPenalty = 0,
  subset = integer(),
  subsetAdjust = c("average", "previous"),
  rtimeDifferenceThreshold = 5
)
## S4 method for signature 'OnDiskMSnExp,ObiwarpParam'
adjustRtime(object, param, msLevel = 1L)
## S4 replacement method for signature 'ObiwarpParam'
binSize(object) <- value
## S4 method for signature 'XCMSnExp,PeakGroupsParam'
adjustRtime(object, param, msLevel = 1L)
## S4 method for signature 'XCMSnExp,ObiwarpParam'
adjustRtime(object, param, msLevel = 1L)
object |
For |
param |
The parameter object defining the alignment method (and its settings). |
... |
ignored. |
chunkSize |
For |
BPPARAM |
parallel processing setup. Defaults to bpparam(). |
msLevel |
For |
minFraction |
For |
extraPeaks |
For |
smooth |
For |
span |
For |
family |
For |
peakGroupsMatrix |
For |
subset |
For |
subsetAdjust |
For |
binSize |
|
centerSample |
|
response |
For |
distFun |
For |
gapInit |
For |
gapExtend |
For |
factorDiag |
For |
factorGap |
For |
localAlignment |
For |
initPenalty |
For |
rtimeDifferenceThreshold |
For |
value |
For all assignment methods: the value to set/replace. |
adjustRtime on an OnDiskMSnExp or XCMSnExp object will return an
XCMSnExp object with the alignment results.
adjustRtime on an MsExperiment or XcmsExperiment will return an
XcmsExperiment with the adjusted retention times stored in a new
spectra variable rtime_adjusted in the object's spectra.
ObiwarpParam, PeakGroupsParam and LamaParama return the respective
parameter object.
adjustRtimePeakGroups returns a matrix with the retention times of marker
features in each sample (each row one feature, each column one sample).
All alignment methods allow performing the retention time correction on a
user-selected subset of samples (e.g. QC samples), after which all samples
not part of that subset are adjusted based on the adjusted retention
times of the closest subset sample (close in terms of index within object
and hence possibly injection index). It is thus suggested to load MS data
files in the order in which their samples were injected in the measurement
run(s).
How the non-subset samples are adjusted also depends on the parameter
subsetAdjust: with subsetAdjust = "previous", each non-subset
sample is adjusted based on the closest previous subset sample, which
in most cases results in the adjusted retention times of the non-subset
sample being identical to those of the subset sample on which the
adjustment is based.
The second, default, option is subsetAdjust = "average", in which case
each non-subset sample is adjusted based on the average retention time
adjustment from the previous and following subset sample. For the average,
a weighted mean is used with weights being the inverse of the distance of
the non-subset sample to the subset samples used for alignment.
See also section "Alignment of experiments including blanks" in the xcms vignette for more details.
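The inverse-distance weighting used by subsetAdjust = "average" can be worked through on toy numbers. This base R sketch assumes, for simplicity, that the adjustments of the two subset samples are given for the same three spectra; the vectors and the helper average_adjustment() are hypothetical, and the interpolation xcms performs between samples with different scan times is not shown.

```r
# Hypothetical retention time adjustments (adjusted minus raw, in seconds) of
# the same three spectra in two subset (e.g. QC) samples
adj_prev <- c(2.0, 2.5, 3.0)   # subset sample at injection index 1
adj_next <- c(4.0, 4.5, 5.0)   # subset sample at injection index 5
idx_prev <- 1
idx_next <- 5

# Weighted mean of both adjustments; weights are inverse index distances
average_adjustment <- function(idx) {
  w <- 1 / c(idx - idx_prev, idx_next - idx)
  (w[1] * adj_prev + w[2] * adj_next) / sum(w)
}

average_adjustment(2)  # closer to the previous sample: 0.75 / 0.25 weights
average_adjustment(3)  # equidistant: plain mean of both adjustments
```

For the sample at index 2 the weights are 1/1 and 1/3, which normalize to 0.75 and 0.25, giving 2.5, 3.0 and 3.5 seconds.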
Colin Smith, Johannes Rainer, Philippine Louail, Carl Brunius
Prince, J. T., and Marcotte, E. M. (2006) "Chromatographic Alignment of ESI-LC-MS Proteomic Data Sets by Ordered Bijective Interpolated Warping" Anal. Chem., 78 (17), 6140-6152. doi: 10.1021/ac0605344
Smith, C.A., Want, E.J., O'Maille, G., Abagyan, R. and Siuzdak, G. (2006). "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 78:779-787. doi: 10.1021/ac051437y
plotAdjustedRtime() for visualization of alignment results.
Alignment is achieved using the adjustRtime() method with a param of
class LamaParama. This method corrects retention time by aligning
chromatographic data with an external reference dataset.
Chromatographic peaks in the experimental data are first matched to predefined (external) landmark features based on their mass-to-charge ratio and retention time and subsequently the data is aligned by minimizing the differences in retention times between the matched chromatographic peaks and lamas. This adjustment is performed file by file.
Adjustable parameters such as ppm, tolerance, and toleranceRt define
acceptable deviations during the matching process. It's crucial to note that
only lamas and chromatographic peaks exhibiting a one-to-one mapping are
considered when estimating retention time shifts. If a file has no peaks
matching with lamas, no adjustment will be performed, and the retention
times will be returned as-is. Users can evaluate this matching, for example,
by checking the number of matches and ranges of the matching peaks, by first
running matchLamasChromPeaks().
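The matching rules above (an m/z window built from ppm and tolerance, an rt window from toleranceRt, and only one-to-one pairs kept) can be sketched in base R. The lamas and peaks matrices are hypothetical, and this simplified logic is an illustration, not the implementation behind matchLamasChromPeaks().

```r
# Hypothetical landmark features ("lamas") with their m/z and retention time
lamas <- cbind(mz = c(200.1, 300.2, 400.3),
               rt = c(1000, 2000, 3000))
# Hypothetical chromatographic peaks (apex m/z and rt) of a single file
peaks <- cbind(mz = c(200.101, 300.205, 300.2001, 400.3005, 500),
               rt = c(1004, 2003, 2002, 3002, 1500))
ppm <- 20
tolerance <- 0
toleranceRt <- 5

# Accepted m/z deviation per lama: absolute tolerance plus ppm of its m/z
mz_tol <- tolerance + ppm * lamas[, "mz"] / 1e6

# Candidate matrix (lamas x peaks): TRUE if the peak is within both windows
cand <- outer(seq_len(nrow(lamas)), seq_len(nrow(peaks)), function(i, j)
  abs(peaks[j, "mz"] - lamas[i, "mz"]) <= mz_tol[i] &
    abs(peaks[j, "rt"] - lamas[i, "rt"]) <= toleranceRt)

# Keep only one-to-one pairs: drop lamas or peaks with more than one candidate
one2one <- cand
one2one[rowSums(cand) != 1, ] <- FALSE
one2one[, colSums(cand) != 1] <- FALSE

idx <- which(one2one, arr.ind = TRUE)
matched <- data.frame(ref = lamas[idx[, 1], "rt"], obs = peaks[idx[, 2], "rt"])
matched
```

The second lama has two candidate peaks and is therefore excluded, so only two reference/observed pairs remain for fitting the warping model.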
Different warping methods are available; users can choose to fit a loess
(method = "loess", the default) or a generalized additive model
(method = "gam") between the reference data points and the observed
matching chromatographic peaks. Additional parameters such as span,
outlierTolerance, zeroWeight, and bs are specific to these models. These
parameters offer flexibility in fine-tuning how the matching chromatographic
peaks are fitted to the lamas, thereby generating a model to align the
overall retention time for a single file.
Other functions related to this method:
LamaParama(): return the respective parameter object for alignment
using adjustRtime() function. It is also the input for the functions
listed below.
matchLamasChromPeaks(): quickly matches each file's ChromPeaks
to Lamas, allowing the user to evaluate the matches for each file.
summarizeLamaMatch(): generates a summary of the LamaParama method.
See below for the details of the return object.
matchedRtimes(): accesses the list of data.frames stored in the
LamaParama object, generated by the matchLamasChromPeaks() function.
plot(): plots the chromatographic peaks versus the reference lamas as
well as the fitting line for the chosen model type. The user can decide
which file to inspect by specifying its number with the parameter
index.
## S4 method for signature 'XcmsExperiment,LamaParama'
adjustRtime(object, param, BPPARAM = bpparam(), ...)
matchLamasChromPeaks(object, param, BPPARAM = bpparam())
summarizeLamaMatch(param)
matchedRtimes(param)
LamaParama(
  lamas = matrix(ncol = 2, nrow = 0, dimnames = list(NULL, c("mz", "rt"))),
  method = c("loess", "gam"),
  span = 0.5,
  outlierTolerance = 3,
  zeroWeight = 10,
  ppm = 20,
  tolerance = 0,
  toleranceRt = 5,
  bs = "tp"
)
## S4 method for signature 'LamaParama,ANY'
plot(
  x,
  index = 1L,
  colPoints = "#00000060",
  colFit = "#00000080",
  xlab = "Matched Chromatographic peaks",
  ylab = "Lamas",
  ...
)
object |
An object of class |
param |
An object of class |
BPPARAM |
For |
... |
For |
lamas |
For |
method |
For |
span |
For |
outlierTolerance |
For |
zeroWeight |
For |
ppm |
For |
tolerance |
For |
toleranceRt |
For |
bs |
For |
x |
For |
index |
For |
colPoints |
For |
colFit |
For |
xlab, ylab
|
For |
For matchLamasChromPeaks(): A LamaParama object with new slot rtMap
composed of a list of matrices representing the 1:1 matches between Lamas
(ref) and ChromPeaks (obs). To access this, matchedRtimes() can be used.
For matchedRtimes(): A list of data.frames representing matches
between chromPeaks and lamas for each file.
For summarizeLamaMatch(): A data.frame with:
"Total_peaks": total number of chromatographic peaks in the file.
"Matched_peak": The number of matched peaks to Lamas.
"Total_Lamas": Total number of Lamas.
"Model_summary": summary.loess or summary.gam object for each file.
If there are no matches when using matchLamasChromPeaks(), the retention
times of that file will not be adjusted when calling adjustRtime() with the
same LamaParama and XcmsExperiment object.
For examples of how to use these methods and their functionality, see the vignette.
Carl Brunius, Philippine Louail
## Load test and reference datasets
library(xcms)
library(MsExperiment)
ref <- loadXcmsData("xmse")
tst <- loadXcmsData("faahko_sub2")
## Create lamas input from the reference dataset
f <- sampleData(ref)$sample_type
f[f == "QC"] <- NA
ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f))
ref_mz_rt <- featureDefinitions(ref)[, c("mzmed", "rtmed")]
## Set up the LamaParama object
param <- LamaParama(lamas = ref_mz_rt, method = "loess", span = 0.5,
                    outlierTolerance = 3, zeroWeight = 10, ppm = 20,
                    tolerance = 0, toleranceRt = 20, bs = "tp")
## Input into adjustRtime()
tst_adjusted <- adjustRtime(tst, param = param)
## Run diagnostic functions to pre-evaluate the alignment
param <- matchLamasChromPeaks(tst, param = param)
mtch <- matchedRtimes(param)
## Access the summary of matches and model information
lama_summary <- summarizeLamaMatch(param)
## Coverage for each file
lama_summary$Matched_peaks / lama_summary$Total_peaks * 100
## Access the information on the model for the first file
lama_summary$model_summary[[1]]
Replaces the raw retention times with the adjusted retention time or returns the object unchanged if none are present.
applyAdjustedRtime(object)
object |
An XCMSnExp or XcmsExperiment object. |
Adjusted retention times are stored in parallel to the raw
retention times in XCMSnExp or XcmsExperiment objects. The
applyAdjustedRtime function replaces the raw (original) retention times
with the adjusted retention times.
An XCMSnExp or XcmsExperiment object with the raw (original) retention
times being replaced with the adjusted retention time.
Replacing the raw retention times with adjusted retention times
disables the possibility to restore raw retention times using the
dropAdjustedRtime() method. This function does not remove the
retention time processing step with the settings of the alignment from
the processHistory() of the object to ensure that the processing
history is preserved.
Johannes Rainer
adjustRtime() for the function to perform the alignment (retention
time correction).
adjustedRtime() for the method to extract adjusted retention times from an XCMSnExp object. dropAdjustedRtime() for the method to delete alignment results and to restore the raw retention times.
## Load a test data set with detected peaks
library(xcms)
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Disable parallel processing for this example
register(SerialParam())
xod <- adjustRtime(faahko_sub, param = ObiwarpParam())
hasAdjustedRtime(xod)
## Replace raw retention times with adjusted retention times.
xod <- applyAdjustedRtime(xod)
## No adjusted retention times present
hasAdjustedRtime(xod)
## Raw retention times have been replaced with adjusted retention times
plot(split(rtime(faahko_sub), fromFile(faahko_sub))[[1]] -
     split(rtime(xod), fromFile(xod))[[1]], type = "l")
## And the process history still contains the settings for the alignment
processHistory(xod)
AutoLockMass: this function determines where the lock mass scans are
in the xcmsRaw object. This is done by using the scan time differences.
object |
An |
A numeric vector of scan locations corresponding to lock mass scans.
signature(object = "xcmsRaw")
Paul Benton, [email protected]
## Not run:
library(xcms)
library(faahKO)
## These files do not have this problem to correct for;
## they are used here just as an example
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xr <- xcmsRaw(cdffiles[1])
xr
## Let's assume that the lock mass starts at scan 1 and recurs every 100 scans
lockMass <- xcms:::makeacqNum(xr, freq = 100, start = 1)
## These are equivalent
lockmass2 <- AutoLockMass(xr)
all(lockMass == lockmass2)
ob <- stitch(xr, lockMass)
## End(Not run)
The methods listed on this page are XCMSnExp()
methods inherited from its parent, the
MSnbase::OnDiskMSnExp() class from the MSnbase
package, that alter the raw data or are related to data subsetting. Thus
calling any of these methods causes all xcms pre-processing
results to be removed from the XCMSnExp() object to ensure
its data integrity.
bin(): bins spectra. See
MSnbase::bin() documentation in the MSnbase package for more
details and examples.
clean(): removes unused 0 intensity data
points. See MSnbase::clean() documentation in the MSnbase package
for details and examples.
filterAcquisitionNum(): filters the XCMSnExp() object keeping only
spectra with the provided acquisition numbers. See
MSnbase::filterAcquisitionNum() for details and examples.
normalize(): performs basic normalization of
spectra intensities. See MSnbase::normalize() documentation
in the MSnbase package for details and examples.
pickPeaks(): performs peak picking. See the documentation of
that function in the MSnbase package for details and examples.
removePeaks(): removes mass peaks (intensities)
lower than a threshold. Note that these peaks refer to mass
peaks, which are different from the chromatographic peaks detected and
analyzed in a metabolomics experiment! See
MSnbase::removePeaks() documentation for details and
examples.
smooth(): smooths spectra. See
MSnbase::smooth() documentation in MSnbase for details and
examples.
## S4 method for signature 'XCMSnExp'
bin(x, binSize = 1L, msLevel.)

## S4 method for signature 'XCMSnExp'
clean(object, all = FALSE, verbose = FALSE, msLevel.)

## S4 method for signature 'XCMSnExp'
filterAcquisitionNum(object, n, file)

## S4 method for signature 'XCMSnExp'
normalize(object, method = c("max", "sum"), ...)

## S4 method for signature 'XCMSnExp'
pickPeaks(
  object,
  halfWindowSize = 3L,
  method = c("MAD", "SuperSmoother"),
  SNR = 0L,
  ...
)

## S4 method for signature 'XCMSnExp'
removePeaks(object, t = "min", verbose = FALSE, msLevel.)

## S4 method for signature 'XCMSnExp'
smooth(
  x,
  method = c("SavitzkyGolay", "MovingAverage"),
  halfWindowSize = 2L,
  verbose = FALSE,
  ...
)
x |
|
binSize |
|
msLevel. |
For |
object |
|
all |
For |
verbose |
|
n |
For |
file |
For |
method |
For |
... |
Optional additional arguments. |
halfWindowSize |
For |
SNR |
For |
t |
For |
For all methods: an XCMSnExp object.
Johannes Rainer
XCMSnExp-filter for methods to filter and subset
XCMSnExp objects.
XCMSnExp() for base class documentation.
MSnbase::OnDiskMSnExp() for the documentation of the
parent class.
This function takes two same-sized numeric vectors x
and y, bins/cuts x into bins (either a pre-defined number
of equal-sized bins or bins of a pre-defined size) and aggregates values
in y corresponding to x values falling within each bin. By
default (i.e. method = "max") the maximal y value for the
corresponding x values is identified. x is expected to be
sorted in increasing order; if it is not, it will be sorted internally (in
which case y will also be reordered according to the order of x).
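The core idea (sort x, assign each x to a bin, keep the maximal y per bin) can be sketched in base R with `findInterval()`. `bin_max()` is a hypothetical helper written for illustration and is not how binYonX() is implemented; it follows the bin boundary rule and the `NA` handling described on this page, and leaves empty bins as `NA` (a stand-in for `baseValue`).

```r
## Hypothetical helper `bin_max()`: a sketch of the binning idea, not the
## xcms implementation of binYonX().
bin_max <- function(x, y, breaks) {
    ## Sort x in increasing order and reorder y accordingly.
    if (is.unsorted(x)) {
        idx <- order(x)
        x <- x[idx]
        y <- y[idx]
    }
    nb <- length(breaks) - 1L
    ## Bin index per x value: [lower, upper) for all bins, [lower, upper]
    ## for the last one (rightmost.closed = TRUE).
    bin <- findInterval(x, breaks, rightmost.closed = TRUE)
    ## Values below the first or above the last break get 0 or nb + 1.
    inside <- bin >= 1L & bin <= nb
    ymax <- rep(NA_real_, nb)          # empty bins stay NA
    for (b in unique(bin[inside])) {
        vals <- y[inside & bin == b]
        vals <- vals[!is.na(vals)]     # ignore NA values in y
        if (length(vals) > 0L)
            ymax[b] <- max(vals)
    }
    list(x = (breaks[-1L] + breaks[-length(breaks)]) / 2, y = ymax)
}

res <- bin_max(x = 1:16, y = 1:16, breaks = seq(2, 12, length.out = 6))
res$x  # bin mid-points: 3, 5, 7, 9, 11
res$y  # maximum y per bin: 3, 5, 7, 9, 12
```

With these breaks the value 12 lands in the last bin because that bin is closed on both sides, while 1 and 13 to 16 fall outside all bins and are ignored.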
binYonX(
  x,
  y,
  breaks,
  nBins,
  binSize,
  binFromX,
  binToX,
  fromIdx = 1L,
  toIdx = length(x),
  method = "max",
  baseValue,
  sortedX = !is.unsorted(x),
  shiftByHalfBinSize = FALSE,
  returnIndex = FALSE,
  returnX = TRUE
)
x |
Numeric vector to be used for binning. |
y |
Numeric vector (same length than |
breaks |
Numeric vector defining the breaks for the bins, i.e. the lower and upper values for each bin. See examples below. |
nBins |
|
binSize |
|
binFromX |
Optional |
binToX |
Same as |
fromIdx |
Integer vector defining the start position of one or multiple
sub-sets of input vector |
toIdx |
Same as |
method |
A character string specifying the method that should be used to
aggregate values in |
baseValue |
The base value for empty bins (i.e. bins into which either
no values in |
sortedX |
Whether |
shiftByHalfBinSize |
Logical specifying whether the bins should be
shifted by half the bin size to the left. Thus, the first bin will have
its center at |
returnIndex |
Logical indicating whether the index of the max (if
|
returnX |
|
The breaks defining the boundary of each bin can be either passed
directly to the function with the argument breaks, or are
calculated on the data based on arguments nBins or binSize
along with fromIdx, toIdx and optionally binFromX
and binToX.
Arguments fromIdx and toIdx allow specifying subset(s) of
the input vector x on which bins should be calculated. By
default the full x vector is considered. Also, if not specified
otherwise with arguments binFromX and binToX, the range
of the bins within each of the sub-sets will be from x[fromIdx]
to x[toIdx]. Arguments binFromX and binToX allow
overwriting this by manually defining a range on which the breaks
should be calculated. See examples below for more details.
Calculation of breaks: for `nBins` the breaks correspond to `seq(min(x[fromIdx]), max(x[toIdx]), length.out = (nBins + 1))`. For `binSize` the breaks correspond to `seq(min(x[fromIdx]), max(x[toIdx]), by = binSize)` with the exception that the last break value is forced to be equal to `max(x[toIdx])`. This ensures that all values from the specified range are covered by the breaks defining the bins. The last bin could however in some instances be slightly larger than `binSize`. See `breaks_on_binSize()` and `breaks_on_nBins()` for more details.
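The `nBins` formula can be checked in base R with the values of the sub-set example on this page (x = 1:16, fromIdx = 4, toIdx = 10, nBins = 5). This is a sketch that only evaluates the formula and assigns the sub-set values with `findInterval()`; it does not call binYonX().

```r
## Breaks for nBins = 5 bins on the sub-set x[4:10] of a sorted vector x,
## following seq(min(x[fromIdx]), max(x[toIdx]), length.out = nBins + 1).
x <- 1:16
fromIdx <- 4
toIdx <- 10
nBins <- 5
brks <- seq(min(x[fromIdx]), max(x[toIdx]), length.out = nBins + 1)
brks        # 4, 5.2, 6.4, 7.6, 8.8, 10

## With binFromX = 1 and binToX = 16 the breaks span that range instead,
## while still only x[fromIdx:toIdx] would be binned.
brks_fixed <- seq(1, 16, length.out = nBins + 1)
brks_fixed  # 1, 4, 7, 10, 13, 16

## Assign the sub-set values to bins ([lower, upper), last bin closed).
findInterval(x[fromIdx:toIdx], brks, rightmost.closed = TRUE)
```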
Returns a list of length 2, the first element (named "x")
contains the bin mid-points, the second element (named "y") the
aggregated values from input vector y within each bin. For
returnIndex = TRUE the list contains an additional element
"index" with the index of the max or min (depending on whether
method = "max" or method = "min") value within each bin in
input vector x.
The function ensures that all values within the range used to define
the breaks are considered in the binning (and assigned to a bin). This
means that for all bins except the last one values in x have to be
>= xlower and < xupper (with xlower
and xupper being the lower and upper boundary, respectively). For
the last bin the condition is x >= xlower & x <= xupper.
Note also that if shiftByHalfBinSize is TRUE the range of
values that is used for binning is expanded by binSize (i.e. the
lower boundary will be fromX - binSize/2, the upper
toX + binSize/2). Setting this argument to TRUE resembles
the binning that is/was used in profBin function from
xcms < 1.51.
`NA` handling: by default the function ignores `NA` values in `y` (thus inherently assumes `na.rm = TRUE`). No `NA` values are allowed in `x`.
Johannes Rainer
########
## Simple example illustrating the breaks and the binning.
##
## Define breaks for 5 bins:
brks <- seq(2, 12, length.out = 6)
## The first bin is then [2,4), the second [4,6) and so on.
brks
## Get the max value falling within each bin.
binYonX(x = 1:16, y = 1:16, breaks = brks)
## Thus, the largest value in x = 1:16 falling into the bin [2,4) (i.e. being
## >= 2 and < 4) is 3, the largest one falling into [4,6) is 5 and so on.
## Note however that the function ensures that the values at the lower and
## upper boundary of the breaks (in this example 2 and 12) fall within a
## bin, i.e. 12 is considered for the last bin.

#######
## Performing the binning on a sub-set of x
##
X <- 1:16
## Bin X from element 4 to 10 into 5 bins.
X[4:10]
binYonX(X, X, nBins = 5L, fromIdx = 4, toIdx = 10)
## This defines breaks for 5 bins on the values from 4 to 10 and bins
## the values into these 5 bins. Alternatively, we could manually specify
## the range for the binning, i.e. the minimal and maximal value for the
## breaks:
binYonX(X, X, nBins = 5L, fromIdx = 4, toIdx = 10, binFromX = 1, binToX = 16)
## In this case the breaks for 5 bins were defined from a value 1 to 16 and
## the values 4 to 10 were binned based on these breaks.

#######
## Bin values within a sub-set of x, second example
##
## This example illustrates how the fromIdx and toIdx parameters can be used.
## x defines 3 times the sequence from 1 to 10, while y is the sequence from
## 1 to 30. In this very simple example x is supposed to represent M/Z values
## from 3 consecutive scans and y the intensities measured for each M/Z in
## each scan. We want to get the maximum intensities for M/Z value bins only
## for the second scan, and thus we use fromIdx = 11 and toIdx = 20. The breaks
## for the bins are defined with nBins.
X <- rep(1:10, 3)
Y <- 1:30
## Bin the M/Z values in the second scan into 5 bins and get the maximum
## intensity for each bin. Note that we have to specify sortedX = TRUE as
## the x and y vectors would be sorted otherwise.
binYonX(X, Y, nBins = 5L, sortedX = TRUE, fromIdx = 11, toIdx = 20)

#######
## Bin in overlapping sub-sets of X
##
## In this example we define overlapping sub-sets of X and perform the binning
## within these.
X <- 1:30
## Define the start and end indices of the sub-sets.
fIdx <- c(2, 8, 21)
tIdx <- c(10, 25, 30)
binYonX(X, nBins = 5L, fromIdx = fIdx, toIdx = tIdx)
## The same, but pre-defining also the desired range of the bins.
binYonX(X, nBins = 5L, fromIdx = fIdx, toIdx = tIdx, binFromX = 4, binToX = 28)
## The same bins are thus used for each sub-set.
The BlankFlag class and method enable users to flag features of an
XcmsExperiment or SummarizedExperiment object based on the relationship
between the intensity of a feature in blanks compared to the intensity in the
samples.
This class and method are part of the possible dispatch of the
generic function filterFeatures. Features below (<) the user-input
threshold will be flagged by calling the filterFeatures function. This
means that an extra column will be created in featureDefinitions or
rowData called possible_contaminants with a logical value for each
feature.
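A possible way to compute such a flag is sketched below on a plain feature-by-sample matrix. The helper `flag_blank_features()` is hypothetical, and the formula is an assumption: the ratio compared against `threshold` is taken to be the mean intensity in the QC samples (`qcIndex`) divided by the mean intensity in the blanks (`blankIndex`). The actual computation performed by BlankFlag may differ.

```r
## Hypothetical sketch of a blank-based contaminant flag. Assumption: ratio =
## mean(QC intensity) / mean(blank intensity) per feature; features with a
## ratio below (<) `threshold` are flagged as possible contaminants.
flag_blank_features <- function(x, blankIndex, qcIndex, threshold = 2,
                                na.rm = TRUE) {
    ## x: numeric matrix, rows = features, columns = samples
    qc_mean <- rowMeans(x[, qcIndex, drop = FALSE], na.rm = na.rm)
    blank_mean <- rowMeans(x[, blankIndex, drop = FALSE], na.rm = na.rm)
    (qc_mean / blank_mean) < threshold
}

vals <- rbind(
    f1 = c(100, 110, 900, 1000),  # strong signal in QCs, low in blanks
    f2 = c(500, 520, 600, 640)    # similar signal in blanks and QCs
)
flag_blank_features(vals, blankIndex = 1:2, qcIndex = 3:4)
```

Only f2 is flagged here, because its QC signal is less than twice its blank signal.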
BlankFlag(
  threshold = 2,
  blankIndex = integer(),
  qcIndex = integer(),
  na.rm = TRUE
)

## S4 method for signature 'XcmsResult,BlankFlag'
filterFeatures(object, filter, ...)

## S4 method for signature 'SummarizedExperiment,BlankFlag'
filterFeatures(object, filter, assay = 1)
threshold |
|
blankIndex |
|
qcIndex |
|
na.rm |
|
object |
|
filter |
The parameter object selecting and configuring the type of
filtering. It can be one of the following classes: |
... |
Optional parameters. For |
assay |
For filtering of |
For BlankFlag: a BlankFlag class instance. filterFeatures returns
the input object with an added column in the features metadata called
possible_contaminants with a logical value for each feature. This is added
to featureDefinitions for XcmsExperiment objects and rowData for
SummarizedExperiment objects.
Philippine Louail
Other Filter features in xcms:
DratioFilter,
PercentMissingFilter,
RsdFilter
Defines breaks for binSize sized bins for values ranging
from fromX to toX.
breaks_on_binSize(fromX, toX, binSize)
fromX |
|
toX |
|
binSize |
|
This function creates breaks for bins of size binSize. The
function ensures that the full data range is included in the bins, i.e. the
last value (upper boundary of the last bin) is always equal to toX. This
however means that the size of the last bin will not always be equal to the
desired bin size.
See examples for more details and a comparison to R's seq() function.
A numeric vector defining the lower and upper bounds of the bins.
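One rule that reproduces the behaviour shown in the examples on this page (a larger last bin for `breaks_on_binSize(1, 10, 0.13)`, an extra, smaller last bin for `breaks_on_binSize(1, 10, 0.51)`) is to choose the number of bins with `round()` and then force the last break to `toX`. The function `breaks_by_size_sketch()` is a hypothetical base-R sketch of that rule; it is not the xcms code, whose exact rule may differ.

```r
## Hypothetical sketch, not the xcms implementation: round the number of
## bins, then force the last break to toX so the full range is covered.
breaks_by_size_sketch <- function(fromX, toX, binSize) {
    n <- max(1, round((toX - fromX) / binSize))
    brks <- fromX + (0:n) * binSize
    brks[n + 1] <- toX
    brks
}

b1 <- breaks_by_size_sketch(1, 10, 0.13)
tail(diff(b1), 1)   # last bin is larger than 0.13
b2 <- breaks_by_size_sketch(1, 10, 0.51)
tail(diff(b2), 1)   # last bin is smaller than 0.51
```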
Johannes Rainer
binYonX() for a binning function.
Other functions to define bins:
breaks_on_nBins()
## Define breaks with a size of 0.13 for a data range from 1 to 10:
breaks_on_binSize(1, 10, 0.13)
## The size of the last bin is however larger than 0.13:
diff(breaks_on_binSize(1, 10, 0.13))
## If we would use seq, the max value would not be included:
seq(1, 10, by = 0.13)

## In the next example we use binSize that leads to an additional last bin with
## a smaller binSize:
breaks_on_binSize(1, 10, 0.51)
## Again, the max value is included, but the size of the last bin is < 0.51.
diff(breaks_on_binSize(1, 10, 0.51))
## Using just seq would result in the following bin definition:
seq(1, 10, by = 0.51)
## Thus it defines one bin (break) less.
Calculate breaks for same-sized bins for data values
from fromX to toX.
breaks_on_nBins(fromX, toX, nBins, shiftByHalfBinSize = FALSE)
fromX |
|
toX |
|
nBins |
|
shiftByHalfBinSize |
Logical indicating whether the bins should be
shifted left by half bin size. This results in centered bins, i.e. the
first bin being centered at |
This generates bins such as a call to
seq(fromX, toX, length.out = nBins + 1) would. The first and second element
in the result vector thus define the lower and upper boundary for the first
bin, the second and third value for the second bin and so on.
A numeric vector of length nBins + 1 defining the lower and
upper bounds of the bins.
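Both cases can be sketched in base R. The unshifted case is the `seq()` call given above. For the shifted case this sketch assumes, based only on the descriptions of shiftByHalfBinSize on this page (first bin centred at fromX, range expanded by half a bin on each side), that the bin size becomes (toX - fromX) / (nBins - 1); `breaks_nbins_sketch()` is hypothetical and xcms may compute the shifted breaks differently.

```r
## Hypothetical sketch of breaks for nBins equal-sized bins.
breaks_nbins_sketch <- function(fromX, toX, nBins, shiftByHalfBinSize = FALSE) {
    if (!shiftByHalfBinSize)
        return(seq(fromX, toX, length.out = nBins + 1))
    ## Assumption: bins centred on fromX ... toX, expanded by half a bin.
    binSize <- (toX - fromX) / (nBins - 1)
    seq(fromX - binSize / 2, toX + binSize / 2, length.out = nBins + 1)
}

brks <- breaks_nbins_sketch(3, 20, nBins = 20, shiftByHalfBinSize = TRUE)
mids <- (brks[-1] + brks[-length(brks)]) / 2
range(mids)   # first bin centred at 3, last bin centred at 20
```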
Johannes Rainer
binYonX() for a binning function.
Other functions to define bins:
breaks_on_binSize()
## Create breaks to bin values from 3 to 20 into 20 bins
breaks_on_nBins(3, 20, nBins = 20)
## The same call but using shiftByHalfBinSize
breaks_on_nBins(3, 20, nBins = 20, shiftByHalfBinSize = TRUE)
Combines the samples and peaks from multiple xcmsSet objects
into a single object. Group and retention time correction data
are discarded. The profinfo list is set to be equal to the
first object.
xs1 |
|
... |
|
An xcmsSet object.
c(xs1, ...)
Colin A. Smith
Calibrate peaks using mz values of known masses/calibrants. mz values of identified peaks are adjusted based on peaks that are close to the provided mz values. See details below for more information.
The isCalibrated function returns TRUE if chromatographic
peaks of the XCMSnExp object x were calibrated and FALSE otherwise.
CalibrantMassParam(
  mz = list(),
  mzabs = 1e-04,
  mzppm = 5,
  neighbors = 3,
  method = "linear"
)

isCalibrated(object)

## S4 method for signature 'XCMSnExp'
calibrate(object, param)
mz |
a |
mzabs |
|
mzppm |
|
neighbors |
|
method |
|
object |
An XCMSnExp object. |
param |
The |
The method first identifies peaks that are close to the provided
mz values and, given that their difference to the calibrants is smaller
than the user-provided cut-off (based on arguments mzabs and mzppm),
replaces their mz values with the provided mz values. The mz values
of all other peaks are either globally shifted (for method = "shift")
or estimated by a linear model through all calibrants.
Peaks are considered close to a calibrant mz if the difference between
the calibrant and its mz is <= mzabs + mz * mzppm /1e6.
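The tolerance rule can be sketched in base R with the CalibrantMassParam defaults (mzabs = 1e-04, mzppm = 5). The helper `match_calibrants()` is hypothetical; it assumes that `mz` in the ppm term refers to the peak m/z and, for each calibrant, returns the index of the closest peak within tolerance (or `NA`). The intensity-based choice among `neighbors` candidates is not modelled.

```r
## Hypothetical sketch of the "close to a calibrant" rule:
## |calibrant - mz| <= mzabs + mz * mzppm / 1e6 (mz = peak m/z, assumed).
match_calibrants <- function(peak_mz, calibrants, mzabs = 1e-4, mzppm = 5) {
    vapply(calibrants, function(cal) {
        d <- abs(peak_mz - cal)
        ok <- which(d <= mzabs + peak_mz * mzppm / 1e6)
        ## nearest peak within tolerance, NA if none qualifies
        if (length(ok) == 0L) NA_integer_ else ok[which.min(d[ok])]
    }, integer(1))
}

peaks <- c(200.0010, 300.5000, 445.1204)
match_calibrants(peaks, calibrants = c(200.0005, 445.1200, 500))
```

At m/z 200 the tolerance is about 0.0011, so the peak 0.0005 away is matched, while no peak lies near the calibrant at 500.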
Adjustment methods: the adjustment function/factor is estimated using the difference between calibrant and peak mz values only for peaks that are close enough to the calibrants. The available methods are:
shift: shifts the m/z of each peak by a global factor which
corresponds to the average difference between peak mz and calibrant mz.
linear: fits a linear model through the differences between
calibrant and peak mz values and adjusts the mz values of all peaks
using this model.
edgeshift: performs the same adjustment as linear for peaks that are
within the mz range of the calibrants and a shift outside of it.
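The "shift" and "linear" adjustments can be sketched with base R and `lm()`, given already matched pairs of peak and calibrant m/z values. The helper `adjust_mz()` is hypothetical and assumes that the linear model regresses the difference (calibrant minus peak m/z) on the peak m/z; "edgeshift" and the replacement of matched peaks by the calibrant values are left out.

```r
## Hypothetical sketch of the "shift" and "linear" m/z adjustments.
adjust_mz <- function(mz, matched_mz, calibrant_mz,
                      method = c("shift", "linear")) {
    method <- match.arg(method)
    d <- calibrant_mz - matched_mz
    ## shift: add the average difference to every m/z value
    if (method == "shift")
        return(mz + mean(d))
    ## linear: model the difference as a function of m/z
    fit <- lm(d ~ matched_mz)
    mz + unname(predict(fit, newdata = data.frame(matched_mz = mz)))
}

## A constant offset of 0.001 is corrected by "shift"
adjust_mz(c(150, 300), c(200, 400), c(200.001, 400.001), "shift")
## An offset proportional to m/z (1e-5 * mz) is corrected by "linear"
adjust_mz(300, c(200, 400), c(200.002, 400.004), "linear")
```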
For more information, details and examples refer to the xcms-direct-injection vignette.
For CalibrantMassParam: a CalibrantMassParam instance.
For calibrate: an XCMSnExp object with chromatographic peaks being
calibrated. Be aware that the actual raw mz values are not (yet)
calibrated, but only the identified chromatographic peaks.
The CalibrantMassParam() function returns an instance of
the CalibrantMassParam class with all settings and properties set.
The calibrate method returns an XCMSnExp object with the
chromatographic peaks being calibrated. Note that only the detected
peaks are calibrated, but not the individual mz values in each spectrum.
CalibrantMassParam classes don't have exported getter or setter
methods.
Joachim Bargsten, Johannes Rainer
Calibrate peaks of an xcmsSet via a set of known masses
object |
a |
calibrants |
a vector or a list of vectors with reference m/z-values |
method |
the used calibrating-method, see below |
mzppm |
the relative error used for matching peaks in ppm (parts per million) |
mzabs |
the absolute error used for matching peaks in Da |
neighbours |
the number of neighbours from which the one with the highest intensity is used (instead of the nearest) |
plotres |
if set to TRUE, a result plot is created showing the found m/z with the distances and the regression |
object |
a |
calibrants |
for each sample different calibrants can be used, if a list of m/z-vectors is given. The length of the list must be the same as the number of samples, alternatively a single vector of masses can be given which is used for all samples. |
method |
"shift" for shifting each m/z, "linear" does a linear regression and adds a linear term to each m/z. "edgeshift" does a linear regression within the range of the mz-calibrants and a shift outside. |
calibrate(object, calibrants,method="linear",
mzabs=0.0001, mzppm=5,
neighbours=3, plotres=FALSE)
chromatogram: extract chromatographic data (such as an extracted ion
chromatogram, a base peak chromatogram or total ion chromatogram) from
an MSnbase::OnDiskMSnExp or XCMSnExp object. See also the help page of
the chromatogram function in the MSnbase package.
## S4 method for signature 'XCMSnExp'
chromatogram(
  object,
  rt,
  mz,
  aggregationFun = "sum",
  missing = NA_real_,
  msLevel = 1L,
  BPPARAM = bpparam(),
  adjustedRtime = hasAdjustedRtime(object),
  filled = FALSE,
  include = c("apex_within", "any", "none"),
  ...
)
object |
Either a MSnbase::OnDiskMSnExp or XCMSnExp object from which the chromatograms should be extracted. |
rt |
|
mz |
|
aggregationFun |
|
missing |
|
msLevel |
|
BPPARAM |
Parallelisation backend to be used, which will
depend on the architecture. Default is
|
adjustedRtime |
For |
filled |
|
include |
|
... |
optional parameters - currently ignored. |
Arguments rt and mz allow to specify the MS data slice (i.e. the m/z
range and retention time window) from which the chromatogram should be
extracted. These parameters can be either a numeric of length 2 with the
lower and upper limit, or a matrix with two columns with the lower and
upper limits to extract multiple EICs at once.
The parameter aggregationFun allows specifying the function to be
used to aggregate the intensities across the m/z range for the same
retention time. Setting aggregationFun = "sum" would e.g. calculate
the total ion chromatogram (TIC) and
aggregationFun = "max" the base peak chromatogram (BPC).
If for a given retention time no intensity is measured in that spectrum a
NA intensity value is returned by default. This can be changed with the
parameter missing, setting missing = 0 would result in a 0 intensity
being returned in these cases.
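How `aggregationFun` and `missing` interact can be sketched in base R on a toy list of spectra. The helper `aggregate_chrom()` and the representation of each spectrum as a `data.frame` with `mz` and `intensity` columns are assumptions for illustration; chromatogram() itself works on MSnbase/xcms objects.

```r
## Hypothetical sketch: aggregate intensities within an m/z range per
## spectrum (one value per retention time); spectra without any peak in
## the range return `missing`.
aggregate_chrom <- function(spectra, mzr, fun = sum, missing = NA_real_) {
    vapply(spectra, function(sp) {
        inr <- sp$mz >= mzr[1] & sp$mz <= mzr[2]
        if (any(inr)) fun(sp$intensity[inr]) else missing
    }, numeric(1))
}

spectra <- list(
    data.frame(mz = c(334.9, 335.1, 400), intensity = c(10, 20, 99)),
    data.frame(mz = c(200, 500), intensity = c(5, 7)),
    data.frame(mz = 335.0, intensity = 42)
)
## TIC-like trace with NA for the spectrum without signal in the range
aggregate_chrom(spectra, mzr = c(334.8, 335.2), fun = sum)
## BPC-like trace reporting 0 instead of NA
aggregate_chrom(spectra, mzr = c(334.8, 335.2), fun = max, missing = 0)
```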
chromatogram returns an XChromatograms object with
the number of columns corresponding to the number of files in
object and number of rows the number of specified ranges (i.e.
number of rows of matrices provided with arguments mz and/or
rt). All chromatographic peaks with their apex position within the
m/z and retention time range are also retained as well as all feature
definitions for these peaks.
For XCMSnExp objects, if adjusted retention times are
available, the chromatogram method will by default report
and use these (for the subsetting based on the provided parameter
rt). This can be changed by setting adjustedRtime = FALSE.
Johannes Rainer
XCMSnExp for the data object.
MSnbase::Chromatogram() for the object representing chromatographic
data.
XChromatograms for the object used to arrange multiple XChromatogram objects. plot() to plot XChromatogram or MSnbase::MChromatograms objects. `as` (`as(x, "data.frame")`) in `MSnbase` for a method to extract the MS data as `data.frame`.
## Load a test data set with identified chromatographic peaks
library(xcms)
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")

## Disable parallel processing for this example
register(SerialParam())

## Extract the ion chromatogram for one chromatographic peak in the data.
chrs <- chromatogram(faahko_sub, rt = c(2700, 2900), mz = 335)
chrs

## Identified chromatographic peaks
chromPeaks(chrs)

## Plot the chromatogram
plot(chrs)

## Extract chromatograms for multiple ranges.
mzr <- matrix(c(335, 335, 344, 344), ncol = 2, byrow = TRUE)
rtr <- matrix(c(2700, 2900, 2600, 2750), ncol = 2, byrow = TRUE)
chrs <- chromatogram(faahko_sub, mz = mzr, rt = rtr)

chromPeaks(chrs)

plot(chrs)

## Get access to all chromatograms for the first mz/rt range
chrs[1, ]
## Plot just that one
plot(chrs[1, , drop = FALSE])
Extract an ion chromatogram (EIC) for each chromatographic peak in an
XcmsExperiment() object. The result is returned as an XChromatograms()
of length equal to the number of chromatographic peaks (and one column).
chromPeakChromatograms(object, ...)

## S4 method for signature 'XcmsExperiment'
chromPeakChromatograms(
  object,
  expandRt = 0,
  expandMz = 0,
  aggregationFun = "max",
  peaks = character(),
  return.type = c("XChromatograms", "MChromatograms"),
  ...,
  progressbar = TRUE
)
object |
An |
... |
currently ignored. |
expandRt |
|
expandMz |
|
aggregationFun |
|
peaks |
optional |
return.type |
|
progressbar |
|
Johannes Rainer
featureChromatograms() to extract an EIC for each feature.
## Load a test data set with detected peaks
library(MSnbase)
library(xcms)
library(MsExperiment)
faahko_sub <- loadXcmsData("faahko_sub2")

## Get EICs for every detected chromatographic peak
chrs <- chromPeakChromatograms(faahko_sub)
chrs

## Order of EICs matches the order in chromPeaks
chromPeaks(faahko_sub) |> head()

## variable "sample_index" provides the index of the sample the EIC was
## extracted from
fData(chrs)$sample_index

## Get the EIC for selected peaks only.
pks <- rownames(chromPeaks(faahko_sub))[c(6, 12)]
pks

## Expand the data on retention time dimension by 5 seconds (on each side)
res <- chromPeakChromatograms(faahko_sub, peaks = pks, expandRt = 5)
plot(res[1, ])
Extract (MS1 or MS2) spectra from an XcmsExperiment or XCMSnExp object
for identified chromatographic peaks. To return spectra for selected
chromatographic peaks, their peak ID (i.e., row name in the chromPeaks
matrix) can be provided with parameter peaks.
For msLevel = 1L (only supported for return.type = "Spectra" or
return.type = "List") MS1 spectra within the retention time boundaries
(in the file in which the peak was detected) are returned. For
msLevel = 2L MS2 spectra are returned for a chromatographic
peak if their retention time and precursor m/z are within the retention
time and m/z range of the chromatographic peak. Parameter method allows
defining whether all or a single spectrum should be returned:
method = "all": (default): return all spectra for each chromatographic
peak.
method = "closest_rt": return the spectrum with the retention time
closest to the peak's retention time (at apex).
method = "closest_mz": return the spectrum with the precursor m/z
closest to the peak's m/z (at apex); only supported for msLevel > 1.
method = "largest_tic": return the spectrum with the largest total
signal (sum of peaks intensities).
method = "largest_bpi": return the spectrum with the largest peak
intensity (maximal peak intensity).
method = "signal": only for object being an XCMSnExp: return the
spectrum with the sum of intensities most similar to the peak's apex
signal ("maxo"); only supported for msLevel = 2L.
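The selection logic described above can be sketched in base R for a single chromatographic peak, using a plain `data.frame` of candidate MS2 spectra. The helper `select_spectra()` and the column names `rt`, `precursorMz`, `tic` and `bpi` are assumptions for illustration; chromPeakSpectra() itself operates on Spectra objects and supports further options (expandRt, expandMz, ppm).

```r
## Hypothetical sketch: keep spectra whose rt and precursor m/z lie within
## the peak's rt and m/z range, then optionally pick a single spectrum.
select_spectra <- function(sps, peak, method = c("all", "closest_rt",
                           "closest_mz", "largest_tic", "largest_bpi")) {
    method <- match.arg(method)
    cand <- sps[sps$rt >= peak$rtmin & sps$rt <= peak$rtmax &
                sps$precursorMz >= peak$mzmin &
                sps$precursorMz <= peak$mzmax, ]
    if (method == "all" || nrow(cand) == 0L)
        return(cand)
    i <- switch(method,
        closest_rt  = which.min(abs(cand$rt - peak$rt)),
        closest_mz  = which.min(abs(cand$precursorMz - peak$mz)),
        largest_tic = which.max(cand$tic),
        largest_bpi = which.max(cand$bpi))
    cand[i, ]
}

sps <- data.frame(id = c("a", "b", "c", "d"),
                  rt = c(100, 104, 107, 130),
                  precursorMz = c(335.10, 335.14, 335.11, 335.12),
                  tic = c(5000, 9000, 7000, 9900),
                  bpi = c(800, 600, 1500, 2000))
pk <- list(rt = 105, rtmin = 98, rtmax = 110,
           mz = 335.12, mzmin = 335.05, mzmax = 335.15)
select_spectra(sps, pk, "closest_rt")$id
```

Spectrum "d" is never returned because its retention time lies outside the peak's retention time range, even though it has the largest total signal.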
Parameter return.type allows specifying the type of the result object.
With return.type = "Spectra" (the default) a Spectra::Spectra object
with all matching spectra is returned. With return.type = "List" a
List of Spectra is returned.
The length of the list is equal to the number of rows
of chromPeaks. Each element of the list contains thus a Spectra with all
spectra for one chromatographic peak (or a Spectra of length 0 if no
spectrum was found for the respective chromatographic peak).
Parameter chromPeakColumns allows the user to add specific metadata
columns from the chromatographic peaks (chromPeaks) to the returned
spectra object. This can be useful to keep information such as the retention
time (rt) or m/z (mz). The columns will be named as they are written in the
chromPeaks object with the prefix "chrom_peak_". The peak ID
(i.e., the row name of the peak in the chromPeaks matrix) is always added
to the spectra object as a metadata column named "chrom_peak_id".
See also the LC-MS/MS data analysis vignette for more details and examples.
chromPeakSpectra(object, ...)

## S4 method for signature 'XcmsExperiment'
chromPeakSpectra(
  object,
  method = c("all", "closest_rt", "closest_mz", "largest_tic", "largest_bpi"),
  msLevel = 2L,
  expandRt = 0,
  expandMz = 0,
  ppm = 0,
  skipFilled = FALSE,
  peaks = character(),
  chromPeakColumns = c("rt", "mz"),
  return.type = c("Spectra", "List"),
  BPPARAM = bpparam()
)

## S4 method for signature 'XCMSnExp'
chromPeakSpectra(
  object,
  msLevel = 2L,
  expandRt = 0,
  expandMz = 0,
  ppm = 0,
  method = c("all", "closest_rt", "closest_mz", "signal", "largest_tic",
    "largest_bpi"),
  skipFilled = FALSE,
  return.type = c("Spectra", "MSpectra", "List", "list"),
  peaks = character()
)
object |
XcmsExperiment or XCMSnExp object with identified chromatographic peaks for which spectra should be returned. |
... |
ignored. |
method |
|
msLevel |
|
expandRt |
|
expandMz |
|
ppm |
|
skipFilled |
|
peaks |
|
chromPeakColumns |
|
return.type |
|
BPPARAM |
parallel processing setup. Defaults to
|
Parameter return.type allows specifying the type of the returned object:
return.type = "Spectra" (default): a Spectra object (defined in the
Spectra package). The result contains all spectra for all peaks.
Metadata column "peak_id" provides the ID of the respective peak
(i.e. its rowname in chromPeaks()).
return.type = "List": List of length equal to the number of
chromatographic peaks is returned, each element being a Spectra with
the spectra for one chromatographic peak.
For backward compatibility, the options "MSpectra" and "list" are also
supported, but their use is discouraged.
return.type = "MSpectra" (deprecated): a MSnbase::MSpectra object
with elements being Spectrum objects. The result object contains all
spectra for all peaks. Metadata column "peak_id" provides the ID of the
respective peak (i.e. its rowname in chromPeaks()).
return.type = "list": list of lists that are either of length
0 or contain Spectrum2 object(s) within the m/z-rt range. The
length of the list matches the number of peaks.
Johannes Rainer
## Read a file with DDA LC-MS/MS data
library(MsExperiment)
library(MsDataHub)
fl <- MsDataHub::PestMix1_DDA.mzML()
dda <- readMsExperiment(fl)
## Perform MS1 peak detection
dda <- findChromPeaks(dda, CentWaveParam(peakwidth = c(5, 15), prefilter = c(5, 1000)))
## Return all MS2 spectra for each chromatographic peak as a Spectra object
ms2_sps <- chromPeakSpectra(dda)
ms2_sps
## The spectra variable *chrom_peak_id* contains the row names of the peaks
## in the chromPeaks matrix and thus allows mapping chromatographic peaks to
## the returned MS2 spectra
ms2_sps$chrom_peak_id
chromPeaks(dda)
## Alternatively, return the result as a List of Spectra objects. This list
## is parallel to chromPeaks, which makes the mapping between chromatographic
## peaks and MS2 spectra easier.
ms2_sps <- chromPeakSpectra(dda, return.type = "List")
names(ms2_sps)
rownames(chromPeaks(dda))
ms2_sps[[1L]]
## Parameter `msLevel` defines from which MS level spectra should be
## returned. The default is `msLevel = 2L`; with `msLevel = 1L` all MS1
## spectra with a retention time within the retention time range of a
## chromatographic peak are returned. Alternatively, selected spectra can be
## returned by specifying the selection criterion with the `method`
## parameter. Below we extract, for each chromatographic peak, the MS1
## spectrum with the retention time closest to the peak's apex position.
## It would also be possible to select the spectrum with the highest total
## signal or the highest (maximal) intensity.
ms1_sps <- chromPeakSpectra(dda, msLevel = 1L, method = "closest_rt")
ms1_sps
## Parameter `peaks` allows extracting spectra for specific peaks only. It can
## be either an `integer` with the index of the peak in the `chromPeaks`
## matrix or a `character` with its row name in `chromPeaks`.
chromPeakSpectra(dda, msLevel = 1L, method = "closest_rt", peaks = c(3, 5))
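The selection performed by the method parameter can be illustrated in base R. The sketch below is a simplification under stated assumptions: the helper select_spectra_sketch() is hypothetical (not part of xcms), spectra are represented by a plain data.frame with made-up columns rtime, precursorMz and tic, and a spectrum is assumed to belong to a peak if its retention time and precursor m/z fall inside the peak's rt and m/z ranges (expandRt, expandMz and ppm are not modelled).

```r
## Hypothetical helper, not part of xcms. A spectrum is assumed to belong to a
## peak if its rt is within [rtmin, rtmax] and its precursor m/z within
## [mzmin, mzmax]; expandRt, expandMz and ppm are ignored in this sketch.
select_spectra_sketch <- function(peak, spectra,
                                  method = c("all", "closest_rt",
                                             "closest_mz", "largest_tic")) {
  method <- match.arg(method)
  in_peak <- spectra$rtime >= peak$rtmin & spectra$rtime <= peak$rtmax &
    spectra$precursorMz >= peak$mzmin & spectra$precursorMz <= peak$mzmax
  cand <- spectra[in_peak, , drop = FALSE]
  ## no matching spectrum: an empty data.frame (cf. a Spectra of length 0)
  if (nrow(cand) == 0L || method == "all")
    return(cand)
  idx <- switch(method,
    closest_rt  = which.min(abs(cand$rtime - peak$rt)),        # nearest apex
    closest_mz  = which.min(abs(cand$precursorMz - peak$mz)),  # nearest m/z
    largest_tic = which.max(cand$tic))                         # most signal
  cand[idx, , drop = FALSE]
}

## Toy data, made up for illustration
peak <- list(rt = 100, rtmin = 95, rtmax = 105,
             mz = 200, mzmin = 199.99, mzmax = 200.01)
spectra <- data.frame(rtime = c(90, 97, 101, 104),
                      precursorMz = c(200, 200.005, 199.995, 200.001),
                      tic = c(10, 50, 20, 30))
select_spectra_sketch(peak, spectra, "closest_rt")
```

The first toy spectrum (rt 90) lies outside the peak's rt range and is therefore never returned.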
The chromPeakSummary() method calculates summary statistics or other
metrics for each of the identified chromatographic peaks in an xcms result
object, such as an XcmsExperiment(). Different metrics can be calculated;
which ones, and how they are configured, is defined by dedicated parameter
classes. The method returns a matrix or data.frame with one row per
chromatographic peak. Each column contains calculated values, depending on
the used method/parameter class.
Currently implemented methods/parameter classes are:
BetaDistributionParam: calculates the beta_cor and beta_snr quality
metrics as described in Kumler et al. (2023). beta_cor is the result of a
(correlation) test of similarity (using Pearson's correlation coefficient)
to a bell curve, and beta_snr is the signal-to-noise ratio calculated on
the residuals of this test.
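The idea behind beta_cor can be sketched in a few lines of base R. This is only an illustration under an explicit assumption: the idealized bell curve is taken to be a symmetric Beta(5, 5) density stretched over the peak's retention time range, which may differ from the shape(s) used by xcms; the helper beta_cor_sketch() is hypothetical, and beta_snr is not reproduced.

```r
## Hypothetical helper, not the xcms implementation: Pearson correlation
## between a peak's intensities and an idealized bell curve. Using a
## symmetric Beta(shape, shape) density as the "ideal" is an assumption.
beta_cor_sketch <- function(rt, intensity, shape = 5) {
  ## rescale the peak's retention times to the [0, 1] support of the beta
  x <- (rt - min(rt)) / (max(rt) - min(rt))
  ideal <- dbeta(x, shape1 = shape, shape2 = shape)
  cor(intensity, ideal, method = "pearson")
}

rt <- seq(100, 120, by = 1)
gaussian_peak <- 1e5 * dnorm(rt, mean = 110, sd = 3)  # bell-shaped signal
ramp <- rt - 100                                      # monotonic, not a peak
beta_cor_sketch(rt, gaussian_peak)  # close to 1
beta_cor_sketch(rt, ramp)           # close to 0
```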
chromPeakSummary(object, param, ...)
## S4 method for signature 'XcmsExperiment,BetaDistributionParam'
chromPeakSummary(object, param, msLevel = 1L, chunkSize = 2L, BPPARAM = bpparam())
BetaDistributionParam()
object |
an xcms result object containing information on identified chromatographic peaks. |
param |
a parameter object defining the method/summaries that should be calculated (see description above for supported parameter classes). |
... |
additional arguments passed to the method implementation. |
msLevel |
|
chunkSize |
|
BPPARAM |
Parallel processing setup. See |
A matrix or data.frame with the same number of rows as there are
chromatographic peaks. Columns contain the calculated values. The number of
columns, their names and content depend on the used parameter object. See
the respective documentation above for more details.
Pablo Vangeenderhuysen, Johannes Rainer, William Kumler
Kumler W, Hazelton B J and Ingalls A E (2023) "Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics" BMC Bioinformatics 24(1):404. doi: 10.1186/s12859-023-05533-4
Collecting Peaks into xcmsFragments from several
MS-runs using xcmsSet and
xcmsRaw.
object |
(empty) |
xs |
A |
compMethod |
("floor", "round", "none"): comparison method used to find the parent peak of an MSn peak by comparing the m/z values of the MS1 peaks with those of the MSn parent peaks. |
snthresh, mzgap, uniq
|
Parameters for the getspec peak picker included in xcmsRaw. |
After running collect(xFragments, xSet), the peak table of the xcmsFragments object includes the MS1 peaks from all experiments stored in the xcmsSet object. It further contains the relevant MSn peaks from the xcmsRaw objects, which are created temporarily from the file paths stored in the xcmsSet.
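The three compMethod options can be illustrated with a small base R sketch. The matching rule shown here (compare floor()-ed, round()-ed or unmodified m/z values for equality) is an assumption based on the option names only; the helper match_parent_sketch() is hypothetical and the actual logic of collect() may differ.

```r
## Hypothetical helper illustrating the compMethod options; the actual
## matching logic used by collect() may differ.
match_parent_sketch <- function(parent_mz, ms1_mz,
                                compMethod = c("floor", "round", "none")) {
  compMethod <- match.arg(compMethod)
  ## transformation applied to both m/z values before comparing them
  f <- switch(compMethod, floor = floor, round = round, none = identity)
  which(f(ms1_mz) == f(parent_mz))
}

ms1_mz <- c(150.2, 150.7, 201.1)
match_parent_sketch(150.4, ms1_mz, "floor")  # both 150.x MS1 peaks
match_parent_sketch(150.4, ms1_mz, "round")  # only 150.2 rounds to 150
match_parent_sketch(150.4, ms1_mz, "none")   # no exact match
```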
A matrix with columns:
peakID |
unique identifier of every peak |
MSnParentPeakID |
peakID of the parent peak of an MSn (msLevel > 1) peak; 0 if the peak is an MS1 peak. |
msLevel |
The msLevel of the peak. |
rt |
retention time of the peak midpoint |
mz |
the m/z value of the peak |
intensity |
the intensity of the peak |
sample |
the number of the sample from the xcmsSet |
GroupPeakMSn |
Used for grouped xcmsSet groups |
CollisionEnergy |
The collision energy of the fragment |
collect(object, ...)
For xcms >= 3.15.3, please use MSnbase::compareChromatograms() instead
of correlate.
Correlate intensities of two chromatograms with each other. If the two
Chromatogram objects have different retention times they are first
aligned to match data points in the first to data points in the second
chromatogram. See help on alignRt in MSnbase::Chromatogram() for more
details.
If correlate is called on a single MSnbase::MChromatograms() object a
pairwise correlation of each chromatogram with each other is performed and
a matrix with the correlation coefficients is returned.
Note that the correlation of two chromatograms also depends on their order,
e.g. correlate(chr1, chr2) might not be identical to
correlate(chr2, chr1), because the retention time alignment is not
symmetric. The lower and upper triangular parts of the correlation matrix
might thus differ.
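Why the order matters can be shown with a simplified base R sketch. It is not MSnbase's alignRt implementation: the hypothetical helper align_closest_sketch() simply pairs every data point of the first chromatogram with the data point of the second one that has the closest retention time, and correlate_sketch() then computes the correlation on these pairs.

```r
## Simplified sketch, not MSnbase's alignRt: pair each data point of the
## first chromatogram with the closest-in-rt data point of the second.
align_closest_sketch <- function(rt_x, rt_y, int_y) {
  idx <- vapply(rt_x, function(r) which.min(abs(rt_y - r)), integer(1))
  int_y[idx]
}

## Correlate intensities after aligning the second chromatogram to the first
correlate_sketch <- function(rt_x, int_x, rt_y, int_y, method = "pearson") {
  cor(int_x, align_closest_sketch(rt_x, rt_y, int_y),
      use = "pairwise.complete.obs", method = method)
}

rt1 <- 1:10
int1 <- c(5, 29, 50, 40, 100, 12, 3, 4, 1, 3)
rt2 <- 3:9
int2 <- c(53, 80, 130, 15, 5, 3, 2)
## Different results: the 10 points of the first chromatogram reuse edge
## values of the shorter one, while the reverse direction uses only 7 pairs.
correlate_sketch(rt1, int1, rt2, int2)
correlate_sketch(rt2, int2, rt1, int1)
```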
## S4 method for signature 'Chromatogram,Chromatogram'
correlate(x, y, use = "pairwise.complete.obs",
  method = c("pearson", "kendall", "spearman"), align = c("closest", "approx"), ...)
## S4 method for signature 'MChromatograms,missing'
correlate(x, y = NULL, use = "pairwise.complete.obs",
  method = c("pearson", "kendall", "spearman"), align = c("closest", "approx"), ...)
## S4 method for signature 'MChromatograms,MChromatograms'
correlate(x, y = NULL, use = "pairwise.complete.obs",
  method = c("pearson", "kendall", "spearman"), align = c("closest", "approx"), ...)
x |
|
y |
|
use |
|
method |
|
align |
|
... |
optional parameters passed along to the |
numeric(1) or matrix (if called on MChromatograms objects)
with the correlation coefficient. If a matrix is returned, the rows
represent the chromatograms in x and the columns the chromatograms in
y.
Michael Witting, Johannes Rainer
library(MSnbase)
chr1 <- Chromatogram(rtime = 1:10 + rnorm(n = 10, sd = 0.3),
    intensity = c(5, 29, 50, NA, 100, 12, 3, 4, 1, 3))
chr2 <- Chromatogram(rtime = 1:10 + rnorm(n = 10, sd = 0.3),
    intensity = c(80, 50, 20, 10, 9, 4, 3, 4, 1, 3))
chr3 <- Chromatogram(rtime = 3:9 + rnorm(7, sd = 0.3),
    intensity = c(53, 80, 130, 15, 5, 3, 2))
chrs <- MChromatograms(list(chr1, chr2, chr3))
## Using `compareChromatograms` instead of `correlate`.
compareChromatograms(chr1, chr2)
compareChromatograms(chr2, chr1)
compareChromatograms(chrs, chrs)
Create a report showing the most significant differences between two sets of samples. Optionally create extracted ion chromatograms for the most significant differences.
object |
the |
class1 |
character vector with the first set of sample classes to be compared |
class2 |
character vector with the second set of sample classes to be compared |
filebase |
base file name to save report, |
eicmax |
number of the most significantly different analytes to create EICs for |
eicwidth |
width (in seconds) of EICs produced |
sortpval |
logical indicating whether the reports should be sorted by p-value |
classeic |
character vector with the sample classes to include in the EICs |
value |
intensity values to be used for the diffreport. |
metlin |
Mass uncertainty used for generating a link to the Metlin metabolite database. The sign of the uncertainty indicates negative or positive mode data for the M+H or M-H calculation. A value of FALSE or 0 removes the column. |
h |
Numeric value for the height of the EIC and boxplot images that are written out. |
w |
Numeric value for the width of the EIC and boxplot images that are written out. |
mzdec |
Number of decimal places of the m/z values in the EIC plot titles. |
missing |
|
... |
optional arguments to be passed to |
This method handles creation of summary reports with statistics about which analytes were most significantly different between two sets of samples. It computes Welch's two-sample t-statistic for each analyte and ranks them by p-value. It returns a summary report that can optionally be written out to a tab-separated file.
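The per-analyte statistics described above can be sketched with base R's t.test(), whose default (var.equal = FALSE) is Welch's two-sample test. This is a hypothetical helper, not the diffreport implementation: it operates on a plain intensity matrix (features in rows, samples in columns), and the sign convention of the t-statistic (positive when class2 is higher) is a choice made for this sketch.

```r
## Hypothetical sketch, not the xcms implementation: per-feature mean fold
## change (always >= 1), Welch's t-statistic and p-value, ranked by p-value.
## Sign convention (positive when class2 is higher) is this sketch's choice.
welch_report_sketch <- function(values, sampclass, class1, class2) {
  in1 <- sampclass == class1
  in2 <- sampclass == class2
  st <- apply(values, 1, function(v) {
    m1 <- mean(v[in1])
    m2 <- mean(v[in2])
    tt <- t.test(v[in2], v[in1], var.equal = FALSE)  # Welch's t-test
    c(fold = max(m1 / m2, m2 / m1),
      tstat = unname(tt$statistic),
      pvalue = tt$p.value)
  })
  res <- as.data.frame(t(st))
  res[order(res$pvalue), , drop = FALSE]
}

## Toy intensity matrix, made up for illustration
values <- rbind(up   = c(10, 11,  9, 30, 31, 29),
                flat = c(10, 12, 11, 11, 10, 12),
                down = c(20, 21, 19, 10,  9, 11))
sampclass <- c("A", "A", "A", "B", "B", "B")
welch_report_sketch(values, sampclass, "A", "B")
```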
Additionally, it does all the heavy lifting involved in creating superimposed extracted ion chromatograms for a given number of analytes. It does so by reading the raw data files associated with the samples of interest one at a time. As it does so, it prints the name of the sample it is currently reading. Depending on the number and size of the samples, this process can take a long time.
If a base file name is provided, the report (see Value section) will be saved to a tab-separated file. If EICs are generated, they will be saved as PNG files (640x480 pixels by default) in a newly created subdirectory; the image size can be changed with the w and h arguments. The numbered file names correspond to the rows in the report.
Chromatographic traces in the EICs are colored and labeled by
their sample class. Sample classes take their color from the
current palette. The color assigned to a sample class depends on
its order in the xcmsSet object, not on the order given in
the class arguments. Thus levels(sampclass(object))[1]
would use color palette()[1] and so on. In that way, sample
classes maintain the same color across any number of different
generated reports.
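The color assignment rule can be illustrated with a short base R sketch. The helper class_colors_sketch() is hypothetical and only mirrors the rule stated above: a class's color is looked up by the position of the class in levels(sampclass), so swapping the order of the requested classes does not change their colors.

```r
## Hypothetical helper mirroring the rule above: the colour of a class is
## palette()[position of the class in levels(sampclass)].
class_colors_sketch <- function(sampclass, classes) {
  setNames(palette()[match(classes, levels(sampclass))], classes)
}

sc <- factor(c("KO", "WT", "KO", "WT"))   # levels: "KO", "WT"
class_colors_sketch(sc, c("WT", "KO"))
class_colors_sketch(sc, c("KO", "WT"))    # same colour per class
```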
When there are multiple sample classes, xcms will produce boxplots of the different classes and will generate a single ANOVA p-value statistic. As with the EICs, the plot number corresponds to the row number in the report.
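A one-way ANOVA p-value for a single analyte across several sample classes can be obtained in base R with lm() and anova(). This is a hypothetical helper, not the code used by diffreport; whether xcms uses exactly this test formulation is an assumption.

```r
## Hypothetical helper: one-way ANOVA p-value for one analyte across
## several sample classes, using base R's lm() and anova().
anova_pvalue_sketch <- function(intensity, sampclass) {
  fit <- lm(intensity ~ factor(sampclass))
  anova(fit)[["Pr(>F)"]][1]   # p-value of the class effect
}

cls <- rep(c("A", "B", "C"), each = 3)
anova_pvalue_sketch(c(1, 2, 3, 10, 11, 12, 20, 21, 22), cls)  # distinct classes
anova_pvalue_sketch(c(1, 2, 3, 2, 3, 1, 3, 1, 2), cls)        # equal class means
```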
A data frame with the following columns:
fold |
mean fold change (always greater than 1, see |
tstat |
Welch's two sample t-statistic, positive for analytes having
greater intensity in |
pvalue |
p-value of t-statistic |
anova |
p-value of the anova statistic if there are multiple classes |
mzmed |
median m/z of peaks in the group |
mzmin |
minimum m/z of peaks in the group |
mzmax |
maximum m/z of peaks in the group |
rtmed |
median retention time of peaks in the group |
rtmin |
minimum retention time of peaks in the group |
rtmax |
maximum retention time of peaks in the group |
npeaks |
number of peaks assigned to the group |
Sample Classes |
number of samples from each sample class represented in the group |
metlin |
A URL to metlin for that mass |
... |
one column for every sample class |
Sample Names |
integrated intensity value for every sample |
... |
one column for every sample |
diffreport(object, class1 = levels(sampclass(object))[1],
class2 = levels(sampclass(object))[2],
filebase = character(), eicmax = 0, eicwidth = 200,
sortpval = TRUE, classeic = c(class1,class2),
value=c("into","maxo","intb"), metlin = FALSE,
h=480,w=640, mzdec=2, missing =
numeric(), ...)
OnDiskMSnExp object
dirname allows getting and setting the path to the directory containing the
source files of the OnDiskMSnExp (or XCMSnExp) object.
## S4 method for signature 'OnDiskMSnExp'
dirname(path)
## S4 replacement method for signature 'OnDiskMSnExp'
dirname(path) <- value
path |
|
value |
|
Johannes Rainer
The function performs retention time correction by assessing
the retention time deviation across all samples using peak groups
(features) containing chromatographic peaks present in most/all samples.
The retention time deviation for these features in each sample is
described by fitting either a local polynomial regression (smooth = "loess")
or a linear (smooth = "linear") model to the data points. The
models are subsequently used to adjust the retention time for each
spectrum in each sample.
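The core of this procedure for a single sample can be sketched in base R. This is not the xcms implementation: it assumes the retention times of the housekeeping features are already known both in the sample (rt_feat) and as a reference (rt_ref, e.g. the median across samples); the loess degree of 1 and the constant extrapolation outside the range covered by the features are choices made for this hypothetical adjust_rt_sketch() helper.

```r
## Minimal sketch of peak-groups style alignment for one sample, not the
## xcms implementation. Assumes feature rts in this sample (rt_feat) and a
## reference rt per feature (rt_ref). loess degree = 1 and the constant
## extrapolation outside the feature range are choices of this sketch.
adjust_rt_sketch <- function(rt_spectra, rt_feat, rt_ref,
                             smooth = c("loess", "linear"), span = 0.2,
                             family = c("gaussian", "symmetric")) {
  smooth <- match.arg(smooth)
  family <- match.arg(family)
  deviation <- rt_feat - rt_ref           # rt deviation of each feature
  if (smooth == "loess") {
    fit <- loess(deviation ~ rt_feat, span = span, degree = 1,
                 family = family)
    ## loess does not extrapolate: clamp to the range covered by features
    rt_new <- pmin(pmax(rt_spectra, min(rt_feat)), max(rt_feat))
  } else {
    fit <- lm(deviation ~ rt_feat)
    rt_new <- rt_spectra
  }
  dev_pred <- unname(predict(fit, newdata = data.frame(rt_feat = rt_new)))
  rt_spectra - dev_pred                   # adjusted retention times
}

## Toy example: this sample elutes 2% later plus a 3 s offset
rt_ref <- seq(100, 1000, length.out = 50)
rt_feat <- 1.02 * rt_ref + 3
rt_spectra <- seq(150, 900, by = 10)
head(adjust_rt_sketch(rt_spectra, rt_feat, rt_ref, smooth = "linear"))
```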
do_adjustRtime_peakGroups(peaks, peakIndex, rtime = list(), minFraction = 0.9,
  extraPeaks = 1, smooth = c("loess", "linear"), span = 0.2,
  family = c("gaussian", "symmetric"),
  peakGroupsMatrix = matrix(ncol = 0, nrow = 0), subset = integer(),
  subsetAdjust = c("average", "previous"))
peaks |
a |
peakIndex |
a |
rtime |
a |
minFraction |
For |
extraPeaks |
For |
smooth |
For |
span |
For |
family |
For |
peakGroupsMatrix |
optional |
subset |
For |
subsetAdjust |
For |
The alignment is based on the presence of compounds that can be found
in all/most samples of an experiment. The retention times of individual
spectra are then adjusted based on the alignment of the features
corresponding to these housekeeping compounds. The parameters
minFraction and extraPeaks can be used to fine-tune which
features should be used for the alignment (i.e. which features
most likely correspond to the above-mentioned housekeeping compounds).
Parameter subset allows defining a subset of samples within the
experiment that should be aligned. All samples not being part of the subset
will be aligned based on the adjustment of the closest sample within the
subset. This allows, e.g., excluding blank samples from the alignment
process while their retention times are still adjusted based on the
alignment results of the real samples.
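How a sample outside the subset could borrow the adjustment of its neighbours is sketched below in base R. The interpretation is an assumption: samples are ordered by injection index, subsetAdjust = "previous" is taken to mean "use the preceding subset sample" and "average" to mean "average the adjustments of the preceding and following subset samples". The helper borrow_adjustment_sketch() is hypothetical.

```r
## Hypothetical sketch: a sample outside `subset` borrows the rt adjustment
## of neighbouring subset samples (by injection index). The meaning given
## to "previous" and "average" here is an assumption.
borrow_adjustment_sketch <- function(i, subset, rt_raw, rt_adj,
                                     subsetAdjust = c("average", "previous")) {
  subsetAdjust <- match.arg(subsetAdjust)
  prev <- subset[subset < i]
  nxt <- subset[subset > i]
  donors <- c(if (length(prev)) max(prev), if (length(nxt)) min(nxt))
  if (subsetAdjust == "previous" && length(prev))
    donors <- max(prev)
  stopifnot(length(donors) > 0L)
  ## deviation (raw - adjusted) of each donor, interpolated at sample i's rts
  devs <- lapply(donors, function(j)
    approx(rt_raw[[j]], rt_raw[[j]] - rt_adj[[j]],
           xout = rt_raw[[i]], rule = 2)$y)
  rt_raw[[i]] - Reduce(`+`, devs) / length(devs)
}

## Toy data: samples 1 and 3 are in the subset, sample 2 (e.g. a blank) is not
rt_raw <- list(seq(10, 100, by = 10), seq(12, 102, by = 10),
               seq(14, 104, by = 10))
rt_adj <- list(rt_raw[[1]] - 2, NULL, rt_raw[[3]] - 4)
borrow_adjustment_sketch(2, c(1L, 3L), rt_raw, rt_adj, "average")
```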
A list with numeric vectors with the adjusted
retention times grouped by sample.
The method ensures that the returned adjusted retention times are in increasing order, just like the raw retention times.
Colin Smith, Johannes Rainer
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787.
This function performs peak density and wavelet based chromatographic peak detection for high resolution LC/MS data in centroid mode (Tautenhahn et al. 2008).
do_findChromPeaks_centWave(mz, int, scantime, valsPerSpect, ppm = 25,
  peakwidth = c(20, 50), snthresh = 10, prefilter = c(3, 100),
  mzCenterFun = "wMean", integrate = 1, mzdiff = -0.001, fitgauss = FALSE,
  noise = 0, verboseColumns = FALSE, roiList = list(),
  firstBaselineCheck = TRUE, roiScales = NULL, sleep = 0,
  extendLengthMSW = FALSE, verboseBetaColumns = FALSE)
mz |
Numeric vector with the individual m/z values from all scans/spectra of one file/sample. |
int |
Numeric vector with the individual intensity values from all scans/spectra of one file/sample. |
scantime |
Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan. |
valsPerSpect |
Numeric vector with the number of values for each spectrum. |
ppm |
|
peakwidth |
|
snthresh |
|
prefilter |
|
mzCenterFun |
Name of the function to calculate the m/z center of the
chromatographic peak. Allowed are: |
integrate |
Integration method. For |
mzdiff |
|
fitgauss |
|
noise |
|
verboseColumns |
|
roiList |
An optional list of regions-of-interest (ROI) representing
detected mass traces. If ROIs are submitted the first analysis step is
omitted and chromatographic peak detection is performed on the submitted
ROIs. Each ROI is expected to have the following elements specified:
|
firstBaselineCheck |
|
roiScales |
Optional numeric vector with length equal to |
sleep |
|
extendLengthMSW |
Option to force centWave to use all scales when
running centWave rather than truncating them based on the EIC length. Uses the
"open" method to extend the EIC to an integer base-2 length prior to
being passed to |
verboseBetaColumns |
Option to calculate two additional metrics of peak
quality via comparison to an idealized bell curve. Adds |
This algorithm is most suitable for high resolution
LC/{TOF,OrbiTrap,FTICR}-MS data in centroid mode. In the first phase
the method identifies regions of interest (ROIs) representing
mass traces that are characterized as regions with less than ppm
m/z deviation in consecutive scans in the LC/MS map. In detail, starting
with a single m/z, a ROI is extended if a m/z can be found in the next scan
(spectrum) for which the difference to the mean m/z of the ROI is smaller
than the user defined ppm of the m/z. The mean m/z of the ROI is then
updated considering also the newly included m/z value.
These ROIs are then, after some cleanup, analyzed using continuous wavelet
transform (CWT) to locate chromatographic peaks on different scales. The
first analysis step is skipped if regions of interest are passed with
the roiList parameter.
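The ROI extension rule described above can be sketched for a single ROI in base R. This greedy version is a simplification and not the centWave implementation: the hypothetical build_roi_sketch() stops at the first scan without a matching m/z and applies no gap tolerance, minimum length or cleanup; mz_list holds the m/z values of the scans following the scan in which start_mz was observed.

```r
## Simplified, greedy sketch of the ROI extension step for a single ROI;
## not the centWave implementation (no gap tolerance, no cleanup).
build_roi_sketch <- function(start_mz, mz_list, ppm = 25) {
  roi_mz <- start_mz
  for (mzs in mz_list) {
    if (length(mzs) == 0L) break
    center <- mean(roi_mz)                       # current mean m/z of the ROI
    diffs <- abs(mzs - center)
    best <- which.min(diffs)
    if (diffs[best] > center * ppm * 1e-6) break # no m/z within ppm: stop
    roi_mz <- c(roi_mz, mzs[best])               # extend ROI; mean is updated
  }
  roi_mz
}

## Toy scans, made up for illustration: the trace is lost in the third scan
scans <- list(c(200.0010, 300), c(199.9995, 250), 210, 200.0)
roi <- build_roi_sketch(200.0, scans, ppm = 25)
roi
mean(roi)
```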
A matrix, each row representing an identified chromatographic peak, with columns:
"mz": Intensity weighted mean of m/z values of the peak across scans.
"mzmin": Minimum m/z of the peak.
"mzmax": Maximum m/z of the peak.
"rt": Retention time of the peak's midpoint.
"rtmin": Minimum retention time of the peak.
"rtmax": Maximum retention time of the peak.
"into": Integrated (original) intensity of the peak.
"intb": Per-peak baseline corrected integrated peak intensity.
"maxo": Maximum intensity of the peak.
"sn": Signal to noise ratio, defined as (maxo - baseline)/sd,
sd being the standard deviation of local chromatographic noise.
"egauss": RMSE of Gaussian fit.
Additional columns for verboseColumns = TRUE:
"mu": Gaussian parameter mu.
"sigma": Gaussian parameter sigma.
"h": Gaussian parameter h.
"f": Region number of the m/z ROI where the peak was localized.
"dppm": m/z deviation of mass trace across scans in ppm.
"scale": Scale on which the peak was localized.
"scpos": Peak position found by wavelet analysis (scan number).
"scmin": Left peak limit found by wavelet analysis (scan number).
"scmax": Right peak limit found by wavelet analysis (scan number).
Additional columns for verboseBetaColumns = TRUE:
"beta_cor": Correlation between an "ideal" bell curve and the raw data.
"beta_snr": Signal-to-noise residuals calculated from the beta_cor fit.
The centWave algorithm was designed to work on data in centroid mode; such data is thus expected as input to this function.
This function exposes core chromatographic peak detection functionality of the *centWave* method. While this function can be called directly, users will generally call the corresponding method for the data object instead.
Ralf Tautenhahn, Johannes Rainer
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann "Highly sensitive feature detection for high resolution LC/MS" BMC Bioinformatics 2008, 9:504 doi: 10.1186/1471-2105-9-504
Other core peak detection functions:
do_findChromPeaks_centWaveWithPredIsoROIs(),
do_findChromPeaks_massifquant(),
do_findChromPeaks_matchedFilter(),
do_findPeaks_MSW()
## Load the test file
faahko_sub <- loadXcmsData("faahko_sub")
## Subset to one file and restrict to a certain retention time range
data <- filterRt(filterFile(faahko_sub, 1), c(2500, 3000))
## Get m/z and intensity values
mzs <- mz(data)
ints <- intensity(data)
## Define the number of values per spectrum
valsPerSpect <- lengths(mzs)
## Call the function. We use large values for noise and prefilter to speed
## up the example; in a real use case we would set them to reasonable
## values or use the defaults.
res <- do_findChromPeaks_centWave(mz = unlist(mzs), int = unlist(ints),
    scantime = rtime(data), valsPerSpect = valsPerSpect,
    noise = 10000, prefilter = c(3, 10000))
head(res)
The do_findChromPeaks_centWaveWithPredIsoROIs performs a
two-step centWave based peak detection: chromatographic peaks are
identified using centWave followed by a prediction of the location of
the identified peaks' isotopes in the mz-retention time space. These
locations are fed as regions of interest (ROIs) to a subsequent
centWave run. All non-overlapping peaks from these two peak detection
runs are reported as the final list of identified peaks.
The do_findChromPeaks_addPredIsoROIs function performs
centWave based peak detection in regions of interest (ROIs)
representing predicted isotopes for the peaks submitted with argument
peaks. The function returns a matrix with the identified peaks
consisting of all input peaks and peaks representing predicted isotopes
of these (if found by the centWave algorithm).
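The isotope prediction step can be illustrated with a base R sketch that lists candidate isotope positions for one detected peak. The assumptions are stated plainly: isotopes are placed at multiples of the 13C/12C mass difference (about 1.003355 Da) divided by the charge, for charges up to maxCharge and up to maxIso isotopes, and each predicted ROI keeps the parent peak's retention time range. The helper predict_iso_rois_sketch() is hypothetical; xcms' own ROI definition (e.g. with mzIntervalExtension) is not reproduced.

```r
## Hypothetical helper, not the xcms implementation: candidate m/z positions
## of isotope peaks for one detected peak, spaced by the 13C - 12C mass
## difference divided by the charge; the rt range of the parent is kept.
predict_iso_rois_sketch <- function(mz, rtmin, rtmax, maxCharge = 3L,
                                    maxIso = 5L) {
  c13_delta <- 1.003355
  grid <- expand.grid(charge = seq_len(maxCharge), iso = seq_len(maxIso))
  data.frame(charge = grid$charge, iso = grid$iso,
             mz = mz + grid$iso * c13_delta / grid$charge,
             rtmin = rtmin, rtmax = rtmax)
}

rois <- predict_iso_rois_sketch(mz = 200, rtmin = 95, rtmax = 105)
head(rois)
```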
do_findChromPeaks_centWaveWithPredIsoROIs(mz, int, scantime, valsPerSpect,
  ppm = 25, peakwidth = c(20, 50), snthresh = 10, prefilter = c(3, 100),
  mzCenterFun = "wMean", integrate = 1, mzdiff = -0.001, fitgauss = FALSE,
  noise = 0, verboseColumns = FALSE, roiList = list(),
  firstBaselineCheck = TRUE, roiScales = NULL, snthreshIsoROIs = 6.25,
  maxCharge = 3, maxIso = 5, mzIntervalExtension = TRUE,
  polarity = "unknown", extendLengthMSW = FALSE, verboseBetaColumns = FALSE)
do_findChromPeaks_addPredIsoROIs(mz, int, scantime, valsPerSpect, ppm = 25,
  peakwidth = c(20, 50), snthresh = 6.25, prefilter = c(3, 100),
  mzCenterFun = "wMean", integrate = 1, mzdiff = -0.001, fitgauss = FALSE,
  noise = 0, verboseColumns = FALSE, peaks. = NULL, maxCharge = 3,
  maxIso = 5, mzIntervalExtension = TRUE, polarity = "unknown")
mz |
Numeric vector with the individual m/z values from all scans/spectra of one file/sample. |
int |
Numeric vector with the individual intensity values from all scans/spectra of one file/sample. |
scantime |
Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan. |
valsPerSpect |
Numeric vector with the number of values for each spectrum. |
ppm |
|
peakwidth |
|
snthresh |
For |
prefilter |
|
mzCenterFun |
Name of the function to calculate the m/z center of the
chromatographic peak. Allowed are: |
integrate |
Integration method. For |
mzdiff |
|
fitgauss |
|
noise |
|
verboseColumns |
|
roiList |
An optional list of regions-of-interest (ROI) representing
detected mass traces. If ROIs are submitted the first analysis step is
omitted and chromatographic peak detection is performed on the submitted
ROIs. Each ROI is expected to have the following elements specified:
|
firstBaselineCheck |
|
roiScales |
Optional numeric vector with length equal to |
snthreshIsoROIs |
|
maxCharge |
|
maxIso |
|
mzIntervalExtension |
|
polarity |
|
extendLengthMSW |
Option to force centWave to use all scales when
running centWave rather than truncating them based on the EIC length. Uses the
"open" method to extend the EIC to an integer base-2 length prior to
being passed to |
verboseBetaColumns |
Option to calculate two additional metrics of peak
quality via comparison to an idealized bell curve. Adds |
peaks. |
A matrix such as one returned by
a call to |
For more details on the centWave algorithm see
centWave().
A matrix, each row representing an identified chromatographic peak. All non-overlapping peaks identified in both centWave runs are reported. The matrix columns are:
"mz": Intensity weighted mean of m/z values of the peaks across scans.
"mzmin": Minimum m/z of the peaks.
"mzmax": Maximum m/z of the peaks.
"rt": Retention time of the peak's midpoint.
"rtmin": Minimum retention time of the peak.
"rtmax": Maximum retention time of the peak.
"into": Integrated (original) intensity of the peak.
"intb": Per-peak baseline corrected integrated peak intensity.
"maxo": Maximum intensity of the peak.
"sn": Signal to noise ratio, defined as (maxo - baseline)/sd,
sd being the standard deviation of local chromatographic noise.
"egauss": RMSE of Gaussian fit.
Additional columns for verboseColumns = TRUE:
"mu": Gaussian parameter mu.
"sigma": Gaussian parameter sigma.
"h": Gaussian parameter h.
"f": Region number of the m/z ROI where the peak was localized.
"dppm": m/z deviation of mass trace across scans in ppm.
"scale": Scale on which the peak was localized.
"scpos": Peak position found by wavelet analysis (scan number).
"scmin": Left peak limit found by wavelet analysis (scan number).
"scmax": Right peak limit found by wavelet analysis (scan number).
Additional columns for verboseBetaColumns = TRUE:
"beta_cor": Correlation between an "ideal" bell curve and the raw data.
"beta_snr": Signal-to-noise residuals calculated from the beta_cor fit.
Hendrik Treutler, Johannes Rainer
Other core peak detection functions:
do_findChromPeaks_centWave(),
do_findChromPeaks_massifquant(),
do_findChromPeaks_matchedFilter(),
do_findPeaks_MSW()
Massifquant is a Kalman filter (KF)-based chromatographic peak
detection method for XC-MS data in centroid mode. The identified peaks
can be further refined with the centWave method (see
do_findChromPeaks_centWave() for details on centWave)
by specifying withWave = TRUE.
do_findChromPeaks_massifquant(mz, int, scantime, valsPerSpect, ppm = 10,
  peakwidth = c(20, 50), snthresh = 10, prefilter = c(3, 100),
  mzCenterFun = "wMean", integrate = 1, mzdiff = -0.001, fitgauss = FALSE,
  noise = 0, verboseColumns = FALSE, criticalValue = 1.125,
  consecMissedLimit = 2, unions = 1, checkBack = 0, withWave = FALSE)
mz |
Numeric vector with the individual m/z values from all scans/spectra of one file/sample. |
int |
Numeric vector with the individual intensity values from all scans/spectra of one file/sample. |
scantime |
Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan. |
valsPerSpect |
Numeric vector with the number of values for each spectrum. |
ppm |
|
peakwidth |
|
snthresh |
|
prefilter |
|
mzCenterFun |
Name of the function to calculate the m/z center of the
chromatographic peak. Allowed are: |
integrate |
Integration method. For |
mzdiff |
|
fitgauss |
|
noise |
|
verboseColumns |
|
criticalValue |
|
consecMissedLimit |
|
unions |
|
checkBack |
|
withWave |
|
This algorithm's performance has been tested rigorously
on high-resolution LC/(Orbitrap, TOF)-MS data in centroid mode.
Simultaneous Kalman filters identify peaks and calculate their
area under the curve. The default parameters are set to operate on
a complex LC-MS Orbitrap sample. Users will find it useful to do some
simple exploratory data analysis to find out where to set a minimum
intensity and how many scans an average peak spans. The
consecMissedLimit parameter has yielded good performance on
Orbitrap data when set to 2, while on TOF data a value of 1 was found
to work best. This may change, as the algorithm has yet to be
tested on many samples. The criticalValue parameter is perhaps the
most difficult to dial in appropriately; visual inspection of peak
identification is the best suggested tool for quick optimization.
The ppm and checkBack parameters have shown less influence
than the other parameters and exist to give users flexibility and
better accuracy.
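To make the roles of criticalValue and consecMissedLimit more tangible, the sketch below follows a single m/z trace across scans with a scalar random-walk Kalman filter. It is a deliberately simplified illustration, not the Massifquant implementation: the function track_mz_trace and the variances q and r are hypothetical, and we assume (as the parameter names suggest) that criticalValue scales the prediction's margin of error and that a trace is terminated once more than consecMissedLimit consecutive scans had no accepted centroid.

```r
## Hypothetical, simplified sketch (not the Massifquant implementation):
## follow one m/z trace across scans with a scalar random-walk Kalman filter.
## A centroid is accepted if it lies within criticalValue * sqrt(innovation
## variance) of the prediction; tracking stops once more than
## consecMissedLimit consecutive scans had no accepted centroid (assumption).
track_mz_trace <- function(mz_per_scan, mz_start, q = 1e-6, r = 1e-5,
                           criticalValue = 1.125, consecMissedLimit = 2) {
    x <- mz_start          # current m/z estimate
    p <- r                 # variance of the estimate
    missed <- 0L
    accepted <- rep(NA_real_, length(mz_per_scan))
    for (i in seq_along(mz_per_scan)) {
        p_pred <- p + q    # predict: m/z assumed constant plus process noise
        s <- p_pred + r    # innovation variance (prediction + measurement)
        cand <- mz_per_scan[[i]]
        hit <- FALSE
        if (length(cand)) {
            z <- cand[which.min(abs(cand - x))]   # closest centroid
            if (abs(z - x) <= criticalValue * sqrt(s)) {
                k <- p_pred / s                   # Kalman gain
                x <- x + k * (z - x)              # update the estimate
                p <- (1 - k) * p_pred
                accepted[i] <- z
                hit <- TRUE
            }
        }
        if (hit) {
            missed <- 0L
        } else {
            p <- p_pred                           # no update in this scan
            missed <- missed + 1L
            if (missed > consecMissedLimit) break
        }
    }
    accepted
}

## Centroids per scan; the third scan contains no signal
scans <- list(c(200.0001, 350), 200.0002, numeric(0), 200.0000,
              c(199.5, 200.0001))
trace <- track_mz_trace(scans, mz_start = 200)
trace
```

A single empty scan is tolerated here, whereas a trace whose centroids drift far from the prediction for three scans in a row would be terminated with the default consecMissedLimit = 2.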
A matrix, each row representing an identified chromatographic peak, with columns:
"mz": Intensity weighted mean of m/z values of the peak across
scans.
"mzmin": Minimum m/z of the peak.
"mzmax": Maximum m/z of the peak.
"rtmin": Minimum retention time of the peak.
"rtmax": Maximum retention time of the peak.
"rt": Retention time of the peak's midpoint.
"into": Integrated (original) intensity of the peak.
"maxo": Maximum intensity of the peak.
If withWave is set to TRUE, the result is the same as
returned by the do_findChromPeaks_centWave() method.
Christopher Conley
Conley CJ, Smith R, Torgrip RJ, Taylor RM, Tautenhahn R and Prince JT "Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection" Bioinformatics 2014, 30(18):2636-43. doi: 10.1093/bioinformatics/btu359
Other core peak detection functions:
do_findChromPeaks_centWave(),
do_findChromPeaks_centWaveWithPredIsoROIs(),
do_findChromPeaks_matchedFilter(),
do_findPeaks_MSW()
## Load the test file
library(xcms)
faahko_sub <- loadXcmsData("faahko_sub")
## Subset to one file and restrict to a certain retention time range
data <- filterRt(filterFile(faahko_sub, 1), c(2500, 3000))
## Get m/z and intensity values
mzs <- mz(data)
ints <- intensity(data)
## Define the values per spectrum:
valsPerSpect <- lengths(mzs)
## Perform the peak detection using massifquant - setting prefilter to
## a high value to speed up the call for the example
res <- do_findChromPeaks_massifquant(mz = unlist(mzs), int = unlist(ints),
    scantime = rtime(data), valsPerSpect = valsPerSpect,
    prefilter = c(3, 10000))
head(res)
This function identifies peaks in the chromatographic
time domain as described in Smith 2006. The intensity values are
binned by cutting the LC/MS data into slices (bins) that are
binSize m/z wide. Within each bin the maximal intensity is
selected. The peak detection is then performed in each bin by
extending it based on the steps parameter to generate slices
comprising bins current_bin - steps + 1 to
current_bin + steps - 1.
Each of these slices is then filtered with matched filtration using
a second-derivative Gaussian as the model peak shape. After filtration,
peaks are detected using a signal-to-noise ratio cut-off. For more details
and illustrations see Smith 2006.
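The matched filtration step can be sketched in base R for a single intensity profile. This is a minimal illustration under stated assumptions, not the xcms code: the kernel is the negative second derivative of a Gaussian ("Mexican hat"), truncated at 4 sigma and scaled to unit energy, sigma is derived from fwhm with the same conversion as the function's default, and the profile is indexed by scan rather than by retention time in seconds. The names mexican_hat and matched_filter_sketch are hypothetical.

```r
## Hypothetical sketch of matched filtration of one intensity profile with a
## second-derivative Gaussian ("Mexican hat") model peak.
mexican_hat <- function(sigma) {
    x <- seq(-ceiling(4 * sigma), ceiling(4 * sigma))  # truncated at +/- 4 sigma
    k <- (1 - x^2 / sigma^2) * exp(-x^2 / (2 * sigma^2))
    k / sqrt(sum(k^2))                                 # unit-energy kernel
}

matched_filter_sketch <- function(y, fwhm) {
    sigma <- fwhm / 2.3548   # same FWHM-to-sigma conversion as the default
    ## symmetric kernel, so convolution and correlation coincide;
    ## the ends of the profile (half a kernel width) become NA
    as.numeric(stats::filter(y, mexican_hat(sigma), sides = 2))
}

## Noise-free Gaussian peak centred on scan 51 (fwhm given in scans here)
scans <- 1:101
y <- 1000 * exp(-(scans - 51)^2 / (2 * 4^2))
yf <- matched_filter_sketch(y, fwhm = 10)
which.max(yf)   # apex of the filtered profile
```

Because the model shape has a positive centre and negative side lobes, a slowly varying baseline is largely suppressed while peak-shaped signal of a similar width is emphasised; the signal-to-noise cut-off is then applied to this filtered profile.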
do_findChromPeaks_matchedFilter( mz, int, scantime, valsPerSpect, binSize = 0.1, impute = "none", baseValue, distance, fwhm = 30, sigma = fwhm/2.3548, max = 5, snthresh = 10, steps = 2, mzdiff = 0.8 - binSize * steps, index = FALSE, sleep = 0 )
mz |
Numeric vector with the individual m/z values from all scans/spectra of one file/sample. |
int |
Numeric vector with the individual intensity values from all scans/spectra of one file/sample. |
scantime |
Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan. |
valsPerSpect |
Numeric vector with the number of values for each spectrum. |
binSize |
|
impute |
Character string specifying the method to be used for missing
value imputation. Allowed values are |
baseValue |
The base value to which empty elements should be set. This
is only considered for |
distance |
For |
fwhm |
|
sigma |
|
max |
|
snthresh |
|
steps |
|
mzdiff |
|
index |
|
sleep |
|
The intensities are binned by the provided m/z values within each
spectrum (scan). Binning is performed such that the bins are centered
around the m/z values (i.e. the first bin includes all m/z values between
min(mz) - binSize/2 and min(mz) + binSize/2).
For more details on binning and missing value imputation see the binYonX() and imputeLinInterpol() functions.
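The centred binning described above can be illustrated with a small base-R sketch for one spectrum, keeping the maximal intensity per bin. In xcms this is done by binYonX(); the function bin_spectrum below is a hypothetical stand-in, and the rule used to choose the number of bins (so that the last bin still covers max(mz)) is our own assumption.

```r
## Hypothetical base-R sketch of centred m/z binning of one spectrum,
## keeping the maximal intensity per bin (xcms itself uses binYonX()).
bin_spectrum <- function(mz, int, binSize, fromX = min(mz), toX = max(mz)) {
    nbins <- floor((toX - fromX) / binSize + 0.5) + 1   # last bin covers toX
    mids <- fromX + (seq_len(nbins) - 1) * binSize      # bin centres
    brks <- c(mids - binSize / 2, mids[nbins] + binSize / 2)
    idx <- findInterval(mz, brks, rightmost.closed = TRUE)
    vals <- rep(NA_real_, nbins)                         # empty bins stay NA
    for (b in unique(idx[idx >= 1 & idx <= nbins]))
        vals[b] <- max(int[idx == b])
    data.frame(mz = mids, intensity = vals)
}

res <- bin_spectrum(mz = c(100.00, 100.04, 100.21, 100.39),
                    int = c(5, 8, 3, 7), binSize = 0.1)
res
```

Empty bins are left as NA in this sketch; in matchedFilter, such empty elements are what the impute (and baseValue/distance) parameters deal with.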
A matrix, each row representing an identified chromatographic peak, with columns:
"mz": Intensity weighted mean of m/z values of the peak across scans.
"mzmin": Minimum m/z of the peak.
"mzmax": Maximum m/z of the peak.
"rt": Retention time of the peak's midpoint.
"rtmin": Minimum retention time of the peak.
"rtmax": Maximum retention time of the peak.
"into": Integrated (original) intensity of the peak.
"intf": Integrated intensity of the filtered peak.
"maxo": Maximum intensity of the peak.
"maxf": Maximum intensity of the filtered peak.
"i": Rank of peak in merged EIC (<= max).
"sn": Signal to noise ratio of the peak.
This function exposes core peak detection functionality of the matchedFilter method.
Colin A Smith, Johannes Rainer
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787. doi: 10.1021/ac051437y
binYonX() for a binning function,
imputeLinInterpol() for the interpolation of missing values.
Other core peak detection functions:
do_findChromPeaks_centWave(),
do_findChromPeaks_centWaveWithPredIsoROIs(),
do_findChromPeaks_massifquant(),
do_findPeaks_MSW()
## Load the test file
library(xcms)
faahko_sub <- loadXcmsData("faahko_sub")
## Subset to one file and restrict to a certain retention time range
data <- filterRt(filterFile(faahko_sub, 1), c(2500, 3000))
## Get m/z and intensity values
mzs <- mz(data)
ints <- intensity(data)
## Define the values per spectrum:
valsPerSpect <- lengths(mzs)
res <- do_findChromPeaks_matchedFilter(mz = unlist(mzs), int = unlist(ints),
    scantime = rtime(data), valsPerSpect = valsPerSpect)
head(res)
This function performs peak detection in direct-injection mass spectrometry spectra using a wavelet-based algorithm.
do_findPeaks_MSW( mz, int, snthresh = 3, verboseColumns = FALSE, scantime = numeric(), valsPerSpect = integer(), ... )
mz |
Numeric vector with the individual m/z values from all scans/spectra of one file/sample. |
int |
Numeric vector with the individual intensity values from all scans/spectra of one file/sample. |
snthresh |
|
verboseColumns |
|
scantime |
ignored. |
valsPerSpect |
ignored. |
... |
Additional parameters to be passed to the
|
This is a wrapper around the peak picker in Bioconductor's
MassSpecWavelet package calling
peakDetectionCWT() and tuneInPeakInfo() functions. See the
xcmsDirect vignette for more information.
A matrix, each row representing an identified peak, with columns:
"mz": m/z value of the peak at the centroid position.
"mzmin": Minimum m/z of the peak.
"mzmax": Maximum m/z of the peak.
"rt": Always -1.
"rtmin": Always -1.
"rtmax": Always -1.
"into": Integrated (original) intensity of the peak.
"maxo": Maximum intensity of the peak.
"intf": Always NA.
"maxf": Maximum MSW-filter response of the peak.
"sn": Signal to noise ratio.
Joachim Kutzera, Steffen Neumann, Johannes Rainer
Other core peak detection functions:
do_findChromPeaks_centWave(),
do_findChromPeaks_centWaveWithPredIsoROIs(),
do_findChromPeaks_massifquant(),
do_findChromPeaks_matchedFilter()
The do_groupChromPeaks_density function performs chromatographic peak
grouping based on the density (distribution) of peaks, found in different
samples, along the retention time axis in slices of overlapping m/z ranges.
By default (with parameter ppm = 0) these m/z ranges all have the same
(constant) size (depending on parameter binSize). For values of ppm
larger than 0 the m/z bins (ranges or slices) will have increasing sizes
depending on the m/z value. This better models the m/z-dependent
measurement error/precision seen on some MS instruments.
do_groupChromPeaks_density( peaks, sampleGroups, bw = 30, minFraction = 0.5, minSamples = 1, binSize = 0.25, maxFeatures = 50, sleep = 0, index = seq_len(nrow(peaks)), ppm = 0, rtCenterFun = c("median", "mean", "wMean") )
peaks |
A |
sampleGroups |
For |
bw |
For |
minFraction |
For |
minSamples |
For |
binSize |
For |
maxFeatures |
For |
sleep |
|
index |
An optional |
ppm |
For |
rtCenterFun |
For |
For overlapping slices along the mz dimension, the function calculates the density distribution of identified peaks along the retention time axis and groups peaks from the same or different samples that are close to each other. See (Smith 2006) for more details.
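The core idea for a single m/z slice can be sketched in base R: estimate the density of the peaks' retention times with bandwidth bw, take the highest density apex, walk down to the nearest local minimum on either side and group all peaks whose retention time falls in between, then repeat. This is a simplified illustration in the spirit of (Smith 2006), not the xcms implementation: the minFraction/minSamples checks and the overlapping m/z slices are omitted, and group_by_rt_density is a hypothetical name.

```r
## Hypothetical, simplified sketch of density-based grouping of the peaks
## of ONE m/z slice (not the xcms implementation; no minFraction/minSamples
## checks and no overlapping slices).
group_by_rt_density <- function(rt, bw = 30, n = 512) {
    den <- stats::density(rt, bw = bw, n = n,
                          from = min(rt) - 3 * bw, to = max(rt) + 3 * bw)
    y <- den$y
    group <- rep(NA_integer_, length(rt))
    g <- 0L
    while (anyNA(group)) {
        top <- which.max(y)
        if (y[top] <= 0) break
        ## walk down from the apex to the nearest local minimum on each side
        lo <- top
        while (lo > 1 && y[lo - 1] < y[lo]) lo <- lo - 1
        hi <- top
        while (hi < length(y) && y[hi + 1] < y[hi]) hi <- hi + 1
        y[lo:hi] <- 0                       # remove this density region
        members <- which(is.na(group) & rt >= den$x[lo] & rt <= den$x[hi])
        if (length(members)) {
            g <- g + 1L
            group[members] <- g
        }
    }
    group
}

## Retention times of peaks (from different samples) in one m/z slice
grp <- group_by_rt_density(c(100, 102, 105, 300, 305), bw = 10)
grp
```

The bandwidth directly controls how far apart in retention time peaks can be and still end up in the same group, which is why bw needs to be adapted to the chromatographic setup.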
A data.frame, each row representing a (mz-rt) feature (i.e. a peak group)
with columns:
"mzmed": median of the peaks' apex mz values.
"mzmin": smallest mz value of all peaks' apex within the feature.
"mzmax": largest mz value of all peaks' apex within the feature.
"rtmed": the median of the peaks' retention times.
"rtmin": the smallest retention time of the peaks in the group.
"rtmax": the largest retention time of the peaks in the group.
"npeaks": the total number of peaks assigned to the feature. Note that this number can be larger than the total number of samples, since multiple peaks from the same sample could be assigned to a feature.
"peakidx": a list with the indices of all peaks in a feature in the
peaks input matrix.
The default settings might not be appropriate for all LC/GC-MS setups;
in particular, the bw and binSize parameters should be adjusted
accordingly.
Colin Smith, Johannes Rainer
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787. doi: 10.1021/ac051437y
Other core peak grouping algorithms:
do_groupChromPeaks_nearest(),
do_groupPeaks_mzClust()
## Load the test file
library(xcms)
library(MsExperiment)
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Extract the matrix with the identified peaks from the XcmsExperiment:
pks <- chromPeaks(faahko_sub)
## Perform the peak grouping with default settings:
res <- do_groupChromPeaks_density(pks, sampleGroups = rep(1, 3))
## The feature definitions:
head(res)
The do_groupChromPeaks_nearest function groups peaks across samples by
creating a master peak list and assigning corresponding peaks from all
samples to each peak group (i.e. feature). The method is inspired by the
correspondence algorithm of mzMine (Katajamaa 2006).
do_groupChromPeaks_nearest( peaks, sampleGroups, mzVsRtBalance = 10, absMz = 0.2, absRt = 15, kNN = 10 )
peaks |
A |
sampleGroups |
For |
mzVsRtBalance |
For |
absMz |
For |
absRt |
For |
kNN |
For |
A list with elements "featureDefinitions" and
"peakIndex". "featureDefinitions" is a matrix, each row
representing an (mz-rt) feature (i.e. peak group) with columns:
"mzmed": median of the peaks' apex mz values.
"mzmin": smallest mz value of all peaks' apex within the feature.
"mzmax": largest mz value of all peaks' apex within the feature.
"rtmed": the median of the peaks' retention times.
"rtmin": the smallest retention time of the peaks in the feature.
"rtmax": the largest retention time of the peaks in the feature.
"npeaks": the total number of peaks assigned to the feature.
"peakIndex" is a list with the indices of all peaks in a feature in the
peaks input matrix.
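The matching step between a master peak list and the peaks of one sample can be sketched as follows. This is a hypothetical simplification, not the xcms code: we assume, as the parameter names suggest, that m/z differences are multiplied by mzVsRtBalance before a Euclidean distance to the retention time difference is computed, and that absMz and absRt are the maximal tolerated differences. Construction of the master list and the kNN search are omitted; match_nearest is a hypothetical name.

```r
## Hypothetical sketch: match each master peak to the closest peak of one
## sample; m/z differences are scaled by mzVsRtBalance before computing the
## Euclidean distance (simplified; not the xcms implementation).
match_nearest <- function(master, sample, mzVsRtBalance = 10,
                          absMz = 0.2, absRt = 15) {
    vapply(seq_len(nrow(master)), function(i) {
        dmz <- abs(sample[, "mz"] - master[i, "mz"])
        drt <- abs(sample[, "rt"] - master[i, "rt"])
        ok <- which(dmz <= absMz & drt <= absRt)   # tolerance window
        if (!length(ok)) return(NA_integer_)
        d <- sqrt((dmz[ok] * mzVsRtBalance)^2 + drt[ok]^2)
        ok[which.min(d)]                           # index of the closest peak
    }, integer(1))
}

master <- cbind(mz = c(200.10, 300.20), rt = c(100, 250))
smpl <- cbind(mz = c(200.12, 200.25, 300.25, 500),
              rt = c(104, 100, 300, 100))
m <- match_nearest(master, smpl)
m
```

With mzVsRtBalance = 10, the m/z difference of 0.15 (scaled distance 1.5) weighs less than the 4 second retention time difference of the other candidate, so the second sample peak is selected for the first master peak; the second master peak has no partner within absRt.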
Katajamaa M, Miettinen J, Oresic M: MZmine: Toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics 2006, 22:634-636. doi: 10.1093/bioinformatics/btk039
Other core peak grouping algorithms:
do_groupChromPeaks_density(),
do_groupPeaks_mzClust()
The do_groupPeaks_mzClust function performs high-resolution
correspondence on single-spectrum samples.
do_groupPeaks_mzClust( peaks, sampleGroups, ppm = 20, absMz = 0, minFraction = 0.5, minSamples = 1 )
peaks |
A |
sampleGroups |
For |
ppm |
For |
absMz |
For |
minFraction |
For |
minSamples |
For |
A list with elements "featureDefinitions" and
"peakIndex". "featureDefinitions" is a matrix, each row
representing an (mz-rt) feature (i.e. peak group) with columns:
"mzmed": median of the peaks' apex mz values.
"mzmin": smallest mz value of all peaks' apex within the feature.
"mzmax": largest mz value of all peaks' apex within the feature.
"rtmed": always -1.
"rtmin": always -1.
"rtmax": always -1.
"npeaks": the total number of peaks assigned to the feature. Note that
this number can be larger than the total number of samples, since
multiple peaks from the same sample could be assigned to a group.
"peakIndex" is a list with the indices of all peaks in a peak group in
the peaks input matrix.
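The heuristic of Kazmi et al. is not reproduced here. As a much simpler stand-in that shows how ppm and absMz act as m/z tolerances, the sketch below clusters sorted m/z values whenever the gap to the previous value is within ppm * mz / 1e6 + absMz. Combining the two tolerances additively is our assumption, and cluster_mz is a hypothetical name.

```r
## Simplified stand-in (not the mzClust heuristic): single-linkage clustering
## of sorted m/z values with a tolerance of ppm * mz / 1e6 + absMz.
cluster_mz <- function(mz, ppm = 20, absMz = 0) {
    o <- order(mz)
    s <- mz[o]
    tol <- s[-length(s)] * ppm / 1e6 + absMz   # tolerance for each gap
    cl <- cumsum(c(TRUE, diff(s) > tol))       # new cluster when gap too large
    res <- integer(length(mz))
    res[o] <- cl                               # back to input order
    res
}

cl <- cluster_mz(c(250.004, 100.0000, 100.0015, 250.0, 100.0100))
cl
```

Because the ppm part of the tolerance grows with m/z, a gap of 0.004 is accepted at m/z 250 while a gap of 0.0085 is not accepted at m/z 100.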
Saira A. Kazmi, Samiran Ghosh, Dong-Guk Shin, Dennis W. Hill and David F. Grant. "Alignment of high resolution mass spectra: development of a heuristic approach for metabolomics" Metabolomics 2006, 2(2):75-83.
Other core peak grouping algorithms:
do_groupChromPeaks_density(),
do_groupChromPeaks_nearest()
The DratioFilter class and method enable users to filter features from an
XcmsExperiment or SummarizedExperiment object based on the D-ratio or
dispersion ratio. This is defined as the standard deviation for QC
samples divided by the standard deviation for biological test samples, for
each feature of the object (Broadhurst et al.).
This filter is part of the possible dispatch of the generic function
filterFeatures. Features above (>) the user-input threshold will be
removed from the entire dataset.
DratioFilter( threshold = 0.5, qcIndex = integer(), studyIndex = integer(), na.rm = TRUE, mad = FALSE )
## S4 method for signature 'XcmsResult,DratioFilter'
filterFeatures(object, filter, ...)
## S4 method for signature 'SummarizedExperiment,DratioFilter'
filterFeatures(object, filter, assay = 1)
threshold |
|
qcIndex |
|
studyIndex |
|
na.rm |
|
mad |
|
object |
|
filter |
The parameter object selecting and configuring the type of
filtering. It can be one of the following classes: |
... |
Optional parameters. For |
assay |
For filtering of |
For DratioFilter: a DratioFilter class object. filterFeatures returns
the input object without the features that did not meet the user-defined threshold.
Philippine Louail
Broadhurst D, Goodacre R, Reinke SN, Kuligowski J, Wilson ID, Lewis MR, Dunn WB. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics. 2018;14(6):72. doi: 10.1007/s11306-018-1367-3. Epub 2018 May 18. PMID: 29805336; PMCID: PMC5960010.
Other Filter features in xcms:
BlankFlag,
PercentMissingFilter,
RsdFilter
estimatePrecursorIntensity() determines the precursor intensity for an MS2
spectrum based on the intensity of the respective signal from the
neighboring MS1 spectra (i.e. based on the peak with the m/z matching the
precursor m/z of the MS2 spectrum). Depending on parameter method, either the
intensity of the peak from the previous MS1 scan is used
(method = "previous") or an interpolation between the intensities from the
previous and subsequent MS1 scans is used (method = "interpolation", which
also considers the retention times of the two MS1 scans and the retention
time of the MS2 spectrum).
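The two steps described above can be sketched in base R: first find the MS1 peak matching the precursor m/z, then combine the intensities from the previous and subsequent MS1 scans. This is a hypothetical illustration, not the xcms implementation: the function names are invented, the tolerance tolerance + ppm * precursorMz / 1e6 is an assumption, and "interpolation" is assumed to be linear in retention time.

```r
## Hypothetical sketch (not the xcms implementation).
## 1) Intensity of the MS1 peak matching the precursor m/z; the tolerance
##    tolerance + ppm * precursorMz / 1e6 is an assumption.
ms1_peak_intensity <- function(mz, int, precursorMz, ppm = 10, tolerance = 0) {
    d <- abs(mz - precursorMz)
    i <- which(d <= tolerance + ppm * precursorMz / 1e6)
    if (!length(i)) return(NA_real_)
    int[i[which.min(d[i])]]          # closest matching peak
}

## 2) Combine the previous and subsequent MS1 intensities.
precursor_intensity <- function(int_prev, int_next, rt_prev, rt_next, rt_ms2,
                                method = c("previous", "interpolation")) {
    method <- match.arg(method)
    if (method == "previous") return(int_prev)
    ## assumed linear interpolation at the retention time of the MS2 spectrum
    int_prev + (int_next - int_prev) * (rt_ms2 - rt_prev) / (rt_next - rt_prev)
}

prev <- ms1_peak_intensity(c(300.000, 300.002, 301), c(50, 80, 10), 300.0025)
nxt <- ms1_peak_intensity(c(299.999, 300.003), c(150, 200), 300.0025)
precursor_intensity(prev, nxt, rt_prev = 10, rt_next = 12, rt_ms2 = 10.5,
                    method = "interpolation")
```

If no MS1 peak lies within the tolerance, NA is returned, mirroring the NA described in the value section below.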
## S4 method for signature 'MsExperiment'
estimatePrecursorIntensity( object, ppm = 10, tolerance = 0, method = c("previous", "interpolation"), BPPARAM = bpparam() )
## S4 method for signature 'OnDiskMSnExp'
estimatePrecursorIntensity( object, ppm = 10, tolerance = 0, method = c("previous", "interpolation"), BPPARAM = bpparam() )
object |
|
ppm |
|
tolerance |
|
method |
|
BPPARAM |
parallel processing setup. See |
numeric with length equal to the number of spectra in object. NA is
returned for MS1 spectra or if no matching peak in an MS1 scan can be
found for an MS2 spectrum.
Johannes Rainer with feedback and suggestions from Corey Broeckling
A general function for asymmetric chromatographic peaks.
etg(x, H, t1, tt, k1, kt, lambda1, lambdat, alpha, beta)
x |
times to evaluate function at |
H |
peak height |
t1 |
time of leading edge inflection point |
tt |
time of trailing edge inflection point |
k1 |
leading edge parameter |
kt |
trailing edge parameter |
lambda1 |
leading edge parameter |
lambdat |
trailing edge parameter |
alpha |
leading edge parameter |
beta |
trailing edge parameter |
The function evaluated at times x.
Colin A. Smith
Jianwei Li. Development and Evaluation of Flexible Empirical Peak Functions for Processing Chromatographic Peaks. Anal. Chem., 69 (21), 4452-4462, 1997. http://dx.doi.org/10.1021/ac970481d
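To show how the leading- and trailing-edge parameters shape the peak, here is a sketch of an empirically transformed Gaussian (ETG) function. The formula is written from our understanding of the form given by Li (1997) and has not been checked against the xcms source of etg(); etg_sketch is a hypothetical name. Note that with this form H scales the curve but is not exactly its maximum.

```r
## Sketch of an ETG-type peak function, assuming the form given by Li (1997);
## compare with xcms' own etg() before relying on it.
etg_sketch <- function(x, H, t1, tt, k1, kt, lambda1, lambdat, alpha, beta) {
    lead  <- (1 + lambda1 * exp(k1 * (t1 - x)))^alpha   # leading edge term
    trail <- (1 + lambdat * exp(kt * (x - tt)))^beta    # trailing edge term
    2 * H * exp(0.5) / (lead + trail - 1)
}

rt <- seq(0, 100, by = 0.5)
## Symmetric peak: identical edge parameters around the midpoint 50
sym <- etg_sketch(rt, H = 1e4, t1 = 45, tt = 55, k1 = 0.5, kt = 0.5,
                  lambda1 = 1, lambdat = 1, alpha = 1, beta = 1)
## Tailing peak: a smaller trailing edge parameter kt widens the right flank
tailing <- etg_sketch(rt, H = 1e4, t1 = 45, tt = 55, k1 = 0.5, kt = 0.1,
                      lambda1 = 1, lambdat = 1, alpha = 1, beta = 1)
```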
Export the feature table for further analysis in the MetaboAnalyst
software (or the MetaboAnalystR R package).
exportMetaboAnalyst( x, file = NULL, label, value = "into", digits = NULL, groupnames = FALSE, ... )
x |
XCMSnExp object with identified chromatographic peaks grouped across samples. |
file |
|
label |
either |
value |
|
digits |
|
groupnames |
|
... |
additional parameters to be passed to the |
If file is not specified, the function returns the matrix in
the format supported by MetaboAnalyst.
Johannes Rainer
data.frame containing MS data
UPDATE: the extractMsData and plotMsData functions are deprecated
and as(x, "data.frame") and plot(x, type = "XIC") (x being an
OnDiskMSnExp or XCMSnExp object) should be used instead. See examples
below. Be aware that filtering the raw object might however drop the
adjusted retention times. In such cases it is advisable to use the
applyAdjustedRtime() function prior to filtering.
Extract a data.frame of retention time, mz and intensity
values from each file/sample in the provided rt-mz range (or for the full
data range if rt and mz are not defined).
## S4 method for signature 'OnDiskMSnExp'
extractMsData(object, rt, mz, msLevel = 1L)
## S4 method for signature 'XCMSnExp'
extractMsData( object, rt, mz, msLevel = 1L, adjustedRtime = hasAdjustedRtime(object) )
object |
A |
rt |
|
mz |
|
msLevel |
|
adjustedRtime |
(for |
A list of length equal to the number of samples/files in
object. Each element being a data.frame with columns
"rt", "mz" and "i" with the retention time, mz and
intensity tuples of a file. If no data is available for the mz-rt range
in a file a data.frame with 0 rows is returned for that file.
Johannes Rainer
XCMSnExp for the data object.
## Load a test data set with detected peaks
library(xcms)
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Disable parallel processing for this example
register(SerialParam())
## Extract the full MS data for a certain retention time range
## as a data.frame
tmp <- filterRt(faahko_sub, rt = c(2800, 2900))
ms_all <- as(tmp, "data.frame")
head(ms_all)
nrow(ms_all)
Feature compounding aims at identifying and grouping LC-MS features
representing different ions or adducts (including isotopes) of the same
originating compound.
The MsFeatures package
provides a general framework and functionality to group features based on
different properties. The groupFeatures methods for XcmsExperiment() or
XCMSnExp objects implemented in xcms extend these to enable
the compounding of LC-MS data, considering also, e.g., the features' peak shapes.
Note that these functions simply define feature groups but don't
actually aggregate or combine the features.
See MsFeatures::groupFeatures() for an overview on the general feature
grouping concept as well as details on the individual settings and
parameters.
The available options for groupFeatures on xcms preprocessing results
(i.e. on XcmsExperiment or XCMSnExp objects after correspondence
analysis with groupChromPeaks()) are:
Grouping by similar retention times: groupFeatures-similar-rtime().
Grouping by similar feature values across samples:
MsFeatures::AbundanceSimilarityParam().
Grouping by similar peak shape of extracted ion chromatograms:
EicSimilarityParam().
An ideal workflow grouping features should sequentially perform the above methods (in the listed order).
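The sequential idea (first group by retention time, then refine each group by abundance similarity) can be illustrated with a small base-R sketch. It is hypothetical and much simpler than MsFeatures or xcms: retention times are grouped by gaps larger than diffRt, and within each group features are greedily merged with a seed feature if the Pearson correlation of their abundances is at least threshold. The function names, the "FG.<rt group>.<subgroup>" labels and the default thresholds are illustrative assumptions.

```r
## Hypothetical two-step sketch of feature compounding (not the MsFeatures
## or xcms implementation): (1) group features with similar retention times,
## (2) split each group by the correlation of feature abundances.
rt_groups <- function(rt, diffRt = 2) {
    o <- order(rt)
    g <- cumsum(c(TRUE, diff(rt[o]) > diffRt))   # new group at large gaps
    res <- integer(length(rt))
    res[o] <- g
    res
}

abundance_split <- function(vals, groups, threshold = 0.7) {
    out <- character(length(groups))
    for (g in unique(groups)) {
        idx <- which(groups == g)
        sub <- integer(length(idx))              # 0 = not yet assigned
        k <- 0L
        for (j in seq_along(idx)) {
            if (sub[j] > 0L) next
            k <- k + 1L
            sub[j] <- k                          # feature j seeds a subgroup
            for (m in seq_along(idx)[-seq_len(j)]) {
                if (sub[m] == 0L && isTRUE(
                    stats::cor(vals[idx[j], ], vals[idx[m], ]) >= threshold))
                    sub[m] <- k
            }
        }
        out[idx] <- paste0("FG.", g, ".", sub)
    }
    out
}

## Feature retention times and abundances (rows: features, columns: samples)
rt <- c(100, 101, 100.5, 200)
vals <- rbind(c(1, 2, 3, 4), c(2, 4, 6, 8), c(4, 3, 2, 1), c(1, 1, 2, 2))
fg <- abundance_split(vals, rt_groups(rt))
fg
```

Features 1 to 3 co-elute, but only features 1 and 2 have correlated abundances across samples, so the third feature ends up in its own subgroup.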
Compounded feature groups can be accessed with the featureGroups function.
## S4 method for signature 'XcmsResult'
featureGroups(object)
## S4 replacement method for signature 'XcmsResult'
featureGroups(object) <- value
object |
an |
value |
for |
Johannes Rainer, Mar Garcia-Aloy, Vinicius Veri Hernandes
plotFeatureGroups() for visualization of grouped features.
Extract ion chromatograms for features in an XcmsExperiment or
XCMSnExp object. The function returns for each feature the
extracted ion chromatograms (along with all associated chromatographic
peaks) in each sample. The chromatogram is extracted from the m/z - rt
region that includes all chromatographic peaks of a feature. By default,
this region is defined using the range of the chromatographic peaks' m/z
and retention times (with mzmin = min, mzmax = max, rtmin = min and
rtmax = max). For some features, and depending on the data, the m/z and
rt range can thus be relatively large. The boundaries of the m/z - rt
region can also be restricted by setting parameters mzmin, mzmax,
rtmin and rtmax to a different function, such as median.
By default only chromatographic peaks associated with a feature are
included in the returned XChromatograms object. For object being an
XCMSnExp object, parameter include also allows returning all
chromatographic peaks with their apex position within the selected
region (include = "apex_within") or any chromatographic peak overlapping
the m/z and retention time range (include = "any").
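How the m/z - rt region of a feature follows from its chromatographic peaks, and how functions such as median narrow it, can be sketched in base R. The function feature_area_sketch is hypothetical and only mimics the behaviour described above for the mzmin, mzmax, rtmin and rtmax parameters; it is not xcms' featureArea().

```r
## Hypothetical sketch of how a feature's m/z - rt region could be derived
## from its chromatographic peaks (not xcms' featureArea() implementation).
## pks: matrix with columns "mzmin", "mzmax", "rtmin", "rtmax";
## peakidx: list with the row indices of the peaks of each feature.
feature_area_sketch <- function(pks, peakidx, mzmin = min, mzmax = max,
                                rtmin = min, rtmax = max) {
    t(vapply(peakidx, function(i) c(
        mzmin = mzmin(pks[i, "mzmin"]), mzmax = mzmax(pks[i, "mzmax"]),
        rtmin = rtmin(pks[i, "rtmin"]), rtmax = rtmax(pks[i, "rtmax"])),
        numeric(4)))
}

## Three chromatographic peaks (one per sample) assigned to one feature
pks <- cbind(mzmin = c(200.00, 200.01, 199.99),
             mzmax = c(200.02, 200.03, 200.01),
             rtmin = c(95, 97, 90), rtmax = c(110, 112, 120))
a <- feature_area_sketch(pks, list(1:3))             # full range (default)
b <- feature_area_sketch(pks, list(1:3), mzmin = median, mzmax = median,
                         rtmin = median, rtmax = median)  # narrower region
```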
featureChromatograms(object, ...)
## S4 method for signature 'XcmsExperiment'
featureChromatograms( object, expandRt = 0, expandMz = 0, aggregationFun = "max", features = character(), return.type = "XChromatograms", chunkSize = 2L, mzmin = min, mzmax = max, rtmin = min, rtmax = max, ..., progressbar = TRUE, BPPARAM = bpparam() )
## S4 method for signature 'XCMSnExp'
featureChromatograms( object, expandRt = 0, aggregationFun = "max", features, include = c("feature_only", "apex_within", "any", "all"), filled = FALSE, n = length(fileNames(object)), value = c("maxo", "into"), expandMz = 0, ... )
object |
|
... |
optional arguments to be passed along to the |
expandRt |
|
expandMz |
|
aggregationFun |
|
features |
|
return.type |
|
chunkSize |
For |
mzmin |
|
mzmax |
|
rtmin |
|
rtmax |
|
progressbar |
|
BPPARAM |
For |
include |
Only for |
filled |
Only for |
n |
Only for |
value |
Only for |
XChromatograms() object. In future, depending on parameter
return.type, the data might be returned as a different object.
The EIC data of a feature is extracted from every sample using the same
m/z - rt area. The EIC in a sample thus does not exactly represent the
signal of the actually identified chromatographic peak in that sample.
The chromPeakChromatograms() function can be used to extract the actual
EIC of the chromatographic peak in a specific sample. See also examples
below.
Parameters include, filled, n and value are only supported
for object being an XCMSnExp.
When extracting EICs from only the top n samples it can happen that one
or more of the features specified with features are dropped because they
have no detected peak in the top n samples. The chance for this to happen
is smaller if object also contains filled-in peaks (with fillChromPeaks).
Johannes Rainer
filterColumnsKeepTop() to filter the extracted EICs keeping only
the top n columns (samples) with the highest intensity.
chromPeakChromatograms() for a function to extract an EIC for each
chromatographic peak.
## Load a test data set with detected peaks
library(xcms)
library(MsExperiment)
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Perform correspondence analysis
xdata <- groupChromPeaks(faahko_sub,
    param = PeakDensityParam(minFraction = 0.8, sampleGroups = rep(1, 3)))
## Get the feature definitions
featureDefinitions(xdata)
## Extract ion chromatograms for the first 3 features. Parameter
## `features` can be either the feature IDs or feature indices.
chrs <- featureChromatograms(xdata,
    features = rownames(featureDefinitions(xdata))[1:3])
## Plot the EIC for the first feature using different colors for each file.
plot(chrs[1, ], col = c("red", "green", "blue"))
## The EICs for all 3 samples use the same m/z and retention time range,
## which was defined using the `featureArea` function:
featureArea(xdata, features = rownames(featureDefinitions(xdata))[1:3],
    mzmin = min, mzmax = max, rtmin = min, rtmax = max)
## To extract the actual (exact) EICs for each chromatographic peak of
## a feature in each sample, the `chromPeakChromatograms` function would
## need to be used instead. Below we extract the EICs for all
## chromatographic peaks of the first feature. We need to first get the
## IDs of all chromatographic peaks assigned to the first feature:
peak_ids <- rownames(chromPeaks(xdata))[featureDefinitions(xdata)$peakidx[[1L]]]
## We can now pass these to the `chromPeakChromatograms` function with
## parameter `peaks`:
eic_1 <- chromPeakChromatograms(xdata, peaks = peak_ids)
## To plot these into a single plot we need to use the
## `plotChromatogramsOverlay` function:
plotChromatogramsOverlay(eic_1)
This function returns spectra associated with the identified features in
the input object. By default, spectra are returned for all features (from
all MS levels), but parameter features can be used to select the features
for which spectra should be returned.
Parameter msLevel defines whether MS level 1 or 2 spectra
should be returned. For msLevel = 1L, MS1 spectra within the
retention time range of each chromatographic peak (in the respective
data file) associated with a feature are returned. For msLevel = 2L,
MS2 spectra with a retention time within the retention time range and a
precursor m/z within the m/z range of any chromatographic peak of a feature
are returned. Thus, only MS2 spectra that are associated with the feature's
chromatographic peaks and that were measured in the sample in which the
chromatographic peak was identified are reported. By default, all spectra
fulfilling the conditions described above are reported. This can be changed
with parameter method. See the description of method in the
chromPeakSpectra() documentation for more information. Internally,
featureSpectra() uses chromPeakSpectra() to extract the spectra of the
feature's chromatographic peaks, thus any other parameter of that function
can be passed through the ... argument.
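The MS2 selection rule above can be illustrated with a small base-R sketch. It does not use xcms and is not how featureSpectra() or chromPeakSpectra() are implemented: the data frames `peaks` and `ms2` and the helper `select_ms2_for_peaks()` are hypothetical stand-ins for the chromPeaks() matrix and the MS2 spectra metadata, and expandRt, expandMz and ppm are not modelled.

```r
## Hypothetical sketch (not xcms code) of the MS2 selection rule above.
## `peaks`: one row per chromatographic peak of a feature with the sample it
## was detected in and its m/z and retention time ranges.
## `ms2`: one row per MS2 spectrum with its sample, retention time and
## precursor m/z.
select_ms2_for_peaks <- function(peaks, ms2) {
    hits <- lapply(seq_len(nrow(peaks)), function(i) {
        p <- peaks[i, ]
        ## Same sample, retention time and precursor m/z within the peak area
        which(ms2$sample == p$sample &
              ms2$rtime >= p$rtmin & ms2$rtime <= p$rtmax &
              ms2$precursorMz >= p$mzmin & ms2$precursorMz <= p$mzmax)
    })
    ## Index of each matching spectrum and the ID of the peak it belongs to
    data.frame(spectrum = as.integer(unlist(hits)),
               chrom_peak_id = rep(rownames(peaks), lengths(hits)))
}

peaks <- data.frame(sample = c(1L, 2L),
                    mzmin = c(300.10, 300.11), mzmax = c(300.12, 300.13),
                    rtmin = c(2500, 2510), rtmax = c(2540, 2550),
                    row.names = c("CP01", "CP02"))
ms2 <- data.frame(sample = c(1L, 1L, 2L, 3L),
                  rtime = c(2520, 2600, 2530, 2530),
                  precursorMz = c(300.11, 300.11, 300.12, 300.12))
ms2_hits <- select_ms2_for_peaks(peaks, ms2)
ms2_hits
```

Spectrum 4 is not reported although its retention time and precursor m/z fit the range of the second peak, because it was measured in a sample (3) without a chromatographic peak for the feature.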
Note that with the default skipFilled = FALSE, gap-filled chromatographic
peaks are also considered. Use skipFilled = TRUE to report only spectra
for detected peaks.
The information from featureDefinitions for each feature can be included
in the returned Spectra::Spectra() object using the featureColumns
parameter.
This is useful for keeping details such as the median retention time
(rtmed) or median m/z (mzmed). The columns will retain their names
as specified in the featureDefinitions data, prefixed by "feature_"
(e.g., "feature_mzmed"). Additionally, the feature ID (i.e., the row
name of the feature in the featureDefinitions data frame) is always added
as a metadata column named "feature_id".
See also chromPeakSpectra(), as it supports a similar parameter for
including columns from the chromatographic peaks in the returned spectra
object.
These parameters can be used in combination to include information from both
the chromatographic peaks and the features in the returned
Spectra::Spectra().
The peak ID (i.e., the row name of the peak in the chromPeaks matrix)
is added as a metadata column named "chrom_peak_id".
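As a rough illustration of the naming scheme described above, the sketch below attaches feature annotations to per-spectrum metadata. It assumes a plain data.frame in place of the Spectra object and a featureDefinitions-like data.frame `fdef`; `add_feature_columns()` is a hypothetical helper, not part of xcms.

```r
## Hypothetical sketch (not xcms code): add selected feature columns, prefixed
## with "feature_", to per-spectrum metadata that already holds the
## "chrom_peak_id" and "feature_id" of each spectrum.
add_feature_columns <- function(spectra_meta, fdef,
                                featureColumns = c("rtmed", "mzmed")) {
    ## Row of `fdef` for each spectrum's feature
    idx <- match(spectra_meta$feature_id, rownames(fdef))
    extra <- fdef[idx, featureColumns, drop = FALSE]
    colnames(extra) <- paste0("feature_", featureColumns)
    rownames(extra) <- NULL
    cbind(spectra_meta, extra)
}

fdef <- data.frame(mzmed = c(300.11, 410.20), rtmed = c(2525, 3100),
                   row.names = c("FT1", "FT2"))
spectra_meta <- data.frame(chrom_peak_id = c("CP01", "CP02", "CP07"),
                           feature_id = c("FT1", "FT1", "FT2"))
annotated <- add_feature_columns(spectra_meta, fdef)
annotated
```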
featureSpectra(object, ...)

## S4 method for signature 'XcmsExperiment'
featureSpectra(
  object,
  msLevel = 2L,
  expandRt = 0,
  expandMz = 0,
  ppm = 0,
  skipFilled = FALSE,
  return.type = c("Spectra", "List"),
  features = character(),
  featureColumns = c("rtmed", "mzmed"),
  ...
)

## S4 method for signature 'XCMSnExp'
featureSpectra(
  object,
  msLevel = 2L,
  expandRt = 0,
  expandMz = 0,
  ppm = 0,
  skipFilled = FALSE,
  return.type = c("MSpectra", "Spectra", "list", "List"),
  features = character(),
  ...
)
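| Argument | Description |
|---|---|
| object | XcmsExperiment or XCMSnExp object with feature definitions. |
| ... | additional arguments to be passed along to |
| msLevel | MS level of the spectra to return: 1L for MS1 or 2L for MS2 spectra (see above). |
| expandRt | |
| expandMz | |
| ppm | |
| skipFilled | whether spectra of gap-filled chromatographic peaks should be skipped (default FALSE, see above). |
| return.type | type of the returned object (see the description of the return value below). |
| features | the features for which spectra should be returned. By default (features = character()), spectra for all features are returned. |
| featureColumns | columns of featureDefinitions() to add to the returned spectra, prefixed with "feature_" (see above). |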
The function returns either a Spectra::Spectra() object (for
return.type = "Spectra")
or a List of Spectra (for return.type = "List"). For the latter,
the order and the length match parameter features (or, if features is
not defined, the order of the features in featureDefinitions(object)).
Spectra variables "chrom_peak_id" and "feature_id" define the
chromatographic peak and the feature with which each individual
spectrum is associated.
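For return.type = "List", the relationship between the per-spectrum "feature_id" and the list order can be sketched in base R with split() and a factor whose levels are the requested features. The helper `split_by_feature()` and the IDs are hypothetical; in this sketch, features without any spectrum give an empty element, so the list length matches features.

```r
## Hypothetical sketch (not xcms code): one list element per requested
## feature, in the order of `features`; features without spectra give an
## empty element.
split_by_feature <- function(spectrum_ids, feature_id, features) {
    split(spectrum_ids, factor(feature_id, levels = features))
}

by_feature <- split_by_feature(spectrum_ids = c("sp1", "sp2", "sp3"),
                               feature_id = c("FT2", "FT1", "FT2"),
                               features = c("FT2", "FT3", "FT1"))
lengths(by_feature)
```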
Johannes Rainer
Simple function to calculate feature summaries. These include counts and
percentages of samples in which a chromatographic peak is present for each
feature, and counts and percentages of samples in which more than one
chromatographic peak was annotated to the feature. In addition, relative
standard deviations (RSD) of the integrated peak areas are calculated per
feature across samples. With perSampleCounts = TRUE, the individual
chromatographic peak counts per sample are also returned.
featureSummary(
  x,
  group,
  perSampleCounts = FALSE,
  method = "maxint",
  skipFilled = TRUE
)
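| Argument | Description |
|---|---|
| x | |
| group | optional vector defining sample groups; if provided, the summary columns are also calculated for each group (see below). |
| perSampleCounts | whether the chromatographic peak counts for each individual sample should also be returned. |
| method | |
| skipFilled | |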
matrix with one row per feature and columns:
"count": the total number of samples in which a peak was found.
"perc": the percentage of samples in which a peak was found.
"multi_count": the total number of samples in which more than one peak
was assigned to the feature.
"multi_perc": the percentage of those samples in which a peak was found,
that have also multiple peaks annotated to the feature. Example: for a
feature, at least one peak was detected in 50 samples. In 5 of them 2 peaks
were assigned to the feature. "multi_perc" is in this case 10%.
"rsd": relative standard deviation (coefficient of variation) of the
integrated peak area of the feature's peaks.
The same 4 columns are repeated for each unique element (level) in group
if group was provided.
If perSampleCounts = TRUE, one additional column per sample is returned,
containing the peak counts in that sample.
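The summary columns can be sketched in base R, assuming a hypothetical features-by-samples matrix `counts` with the number of chromatographic peaks per feature and sample, and a matrix `vals` with one integrated peak area per feature and sample (NA where no peak was found). This is not the xcms code: `feature_summary_sketch()` is hypothetical, grouping and per-sample columns are omitted, and the RSD is assumed to be sd / mean of the non-missing values.

```r
## Hypothetical sketch (not xcms code) of the per-feature summary columns.
## counts: features x samples matrix, number of peaks per feature and sample
## vals:   features x samples matrix, integrated peak area (NA if no peak)
feature_summary_sketch <- function(counts, vals) {
    count <- rowSums(counts > 0)               # samples with >= 1 peak
    perc <- 100 * count / ncol(counts)
    multi_count <- rowSums(counts > 1)         # samples with > 1 peak
    multi_perc <- 100 * multi_count / count    # relative to samples with a peak
    ## Assumed RSD definition: sd / mean of the non-missing peak areas
    rsd <- apply(vals, 1, function(z) sd(z, na.rm = TRUE) / mean(z, na.rm = TRUE))
    cbind(count = count, perc = perc, multi_count = multi_count,
          multi_perc = multi_perc, rsd = rsd)
}

## One feature in 100 samples: a peak in 50 samples, 2 peaks in 5 of them
counts <- matrix(c(rep(0, 50), rep(1, 45), rep(2, 5)), nrow = 1)
vals <- matrix(c(rep(NA, 50), rep(c(90, 110), 25)), nrow = 1)
fsum <- feature_summary_sketch(counts, vals)
fsum   # multi_perc is 10, as in the "multi_perc" example above
```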
Johannes Rainer
Gap filling integrates the signal in the m/z-rt area of a feature (i.e., a
chromatographic peak group) for samples in which no chromatographic
peak was identified for this feature, and adds it to the chromPeaks()
matrix. Such filled-in peaks are indicated by TRUE in column
"is_filled" of the result object's chromPeakData() data frame.
The method for gap filling along with its settings can be defined with
the param argument. Two different approaches are available:
param = FillChromPeaksParam(): the default of the original xcms
code. Signal is integrated from the m/z and retention time range as
defined in the featureDefinitions() data frame, i.e. from the
"rtmin", "rtmax", "mzmin" and "mzmax" columns. This method is not
recommended, as it underestimates the actual peak area, and it is also
not available if object is an XcmsExperiment. See the
details below for more information and settings for this method.
param = ChromPeakAreaParam(): the area from which the signal for a
feature is integrated is defined based on the feature's chromatographic
peak areas. The m/z range is by default defined as the lower quartile
of the chromatographic peaks' "mzmin" values to the upper quartile of the
chromatographic peaks' "mzmax" values.
The retention time range for the area is defined analogously.
Alternatively, by setting mzmin = median,
mzmax = median, rtmin = median and rtmax = median in
ChromPeakAreaParam, the median "mzmin", "mzmax", "rtmin" and
"rtmax" values from all detected chromatographic peaks of a feature
would be used instead.
Parameter minMzWidthPpm can in addition be used to define a minimal
guaranteed m/z width, expressed in ppm of the feature's m/z and centered
around it. The default is minMzWidthPpm = 0.0. With
minMzWidthPpm > 0, the lower m/z boundary of a feature is defined as
the smaller of the m/z derived from its chromatographic peaks'
"mzmin" values and the feature's m/z minus minMzWidthPpm / 2 ppm of its
m/z. The upper m/z boundary is determined in the same way (i.e., as the
larger of the two corresponding values).
In contrast to the FillChromPeaksParam approach this method uses (all)
identified chromatographic peaks of a feature to define the area
from which the signal should be integrated.
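A minimal base-R sketch of how the ChromPeakAreaParam boundaries described above could be computed for a single feature. The quartile functions mirror the defaults shown in the ChromPeakAreaParam() signature; `peak_area_sketch()` and its argument `feature_mz` (the feature's m/z, e.g. its "mzmed") are hypothetical, and this is not the xcms implementation.

```r
## Hypothetical sketch (not xcms code) of the ChromPeakAreaParam area for one
## feature. mzmin, mzmax, rtmin, rtmax: values of the feature's detected
## chromatographic peaks; feature_mz: the feature's m/z (e.g. its "mzmed").
peak_area_sketch <- function(mzmin, mzmax, rtmin, rtmax, feature_mz,
                             minMzWidthPpm = 0) {
    ## Default boundary functions, as in the ChromPeakAreaParam() signature
    q25 <- function(z) quantile(z, probs = 0.25, names = FALSE, na.rm = TRUE)
    q75 <- function(z) quantile(z, probs = 0.75, names = FALSE, na.rm = TRUE)
    ## Half of the guaranteed m/z width, in m/z units
    half_width <- feature_mz * minMzWidthPpm / 2 / 1e6
    c(mzmin = min(q25(mzmin), feature_mz - half_width),
      mzmax = max(q75(mzmax), feature_mz + half_width),
      rtmin = q25(rtmin),
      rtmax = q75(rtmax))
}

pk_mzmin <- c(299.995, 299.996, 299.997, 299.998)
pk_mzmax <- c(300.002, 300.003, 300.004, 300.005)
pk_rtmin <- c(10, 20, 30, 40)
pk_rtmax <- c(50, 60, 70, 80)
area_default <- peak_area_sketch(pk_mzmin, pk_mzmax, pk_rtmin, pk_rtmax,
                                 feature_mz = 300)
## With a guaranteed width of 100 ppm the m/z range widens to 300 +/- 0.015
area_wide <- peak_area_sketch(pk_mzmin, pk_mzmax, pk_rtmin, pk_rtmax,
                              feature_mz = 300, minMzWidthPpm = 100)
area_default
area_wide
```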
fillChromPeaks(object, param, ...)

## S4 method for signature 'XcmsExperiment,ChromPeakAreaParam'
fillChromPeaks(
  object,
  param,
  msLevel = 1L,
  chunkSize = 2L,
  BPPARAM = bpparam()
)

FillChromPeaksParam(
  expandMz = 0,
  expandRt = 0,
  ppm = 0,
  fixedMz = 0,
  fixedRt = 0
)

ChromPeakAreaParam(
  mzmin = function(z, na.rm = TRUE) quantile(z, probs = 0.25, names = FALSE,
    na.rm = na.rm),
  mzmax = function(z, na.rm = TRUE) quantile(z, probs = 0.75, names = FALSE,
    na.rm = na.rm),
  rtmin = function(z, na.rm = TRUE) quantile(z, probs = 0.25, names = FALSE,
    na.rm = na.rm),
  rtmax = function(z, na.rm = TRUE) quantile(z, probs = 0.75, names = FALSE,
    na.rm = na.rm),
  minMzWidthPpm = 0
)

## S4 method for signature 'XCMSnExp,FillChromPeaksParam'
fillChromPeaks(object, param, msLevel = 1L, BPPARAM = bpparam())

## S4 method for signature 'XCMSnExp,ChromPeakAreaParam'
fillChromPeaks(object, param, msLevel = 1L, BPPARAM = bpparam())

## S4 method for signature 'XCMSnExp,missing'
fillChromPeaks(object, param, BPPARAM = bpparam(), msLevel = 1L)
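| Argument | Description |
|---|---|
| object | XcmsExperiment or XCMSnExp object with correspondence results. |
| param | ChromPeakAreaParam or FillChromPeaksParam object defining the gap-filling approach and its settings (see above). |
| ... | currently ignored. |
| msLevel | |
| chunkSize | For |
| BPPARAM | Parallel processing settings. |
| expandMz | for |
| expandRt | for |
| ppm | for |
| fixedMz | for |
| fixedRt | for |
| mzmin | function to calculate the lower m/z boundary of the area from the chromatographic peaks' "mzmin" values (see above). |
| mzmax | function to calculate the upper m/z boundary of the area from the chromatographic peaks' "mzmax" values (see above). |
| rtmin | function to calculate the lower retention time boundary of the area from the chromatographic peaks' "rtmin" values (see above). |
| rtmax | function to calculate the upper retention time boundary of the area from the chromatographic peaks' "rtmax" values (see above). |
| minMzWidthPpm | For |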
After correspondence (i.e. grouping of chromatographic peaks across
samples) there will always be features (peak groups) that do not include
peaks from every sample. The fillChromPeaks method defines
intensity values for such features in the missing samples by integrating
the signal in the m/z-rt region of the feature. Two different approaches
to define this region are available: with ChromPeakAreaParam the region
is defined based on the detected chromatographic peaks of a feature,
while with FillChromPeaksParam the region is defined based on the m/z and
retention times of the feature (which represent the m/z and retention
times of the apex position of the associated chromatographic peaks). For the
latter approach various parameters are available to increase the area from
which signal is to be integrated, either by a constant value (fixedMz and
fixedRt) or by a feature-relative amount (expandMz and expandRt).
Adjusted retention times will be used if available.
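The two kinds of region expansion for FillChromPeaksParam can be sketched as follows. The exact semantics are an assumption made for illustration: expandMz/expandRt add that multiple of the region's width, split equally between both sides, and fixedMz/fixedRt are then subtracted from the lower and added to the upper boundary. ppm is not modelled, and `expand_region()` is a hypothetical helper.

```r
## Hypothetical sketch (not xcms code) of widening a feature's m/z or
## retention time range. Assumed semantics:
## - expand: the range grows by expand times its own width, split equally
##   between both sides (0 = no expansion);
## - fixed: afterwards subtracted from the lower and added to the upper bound.
expand_region <- function(lower, upper, expand = 0, fixed = 0) {
    half_extra <- (upper - lower) * expand / 2
    c(lower = lower - half_extra - fixed,
      upper = upper + half_extra + fixed)
}

rt_region <- expand_region(2500, 2540, expand = 1)           # 2480, 2560
mz_region <- expand_region(300.010, 300.014, fixed = 0.002)  # 300.008, 300.016
```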
Based on the peak finding algorithm that was used to identify the
(chromatographic) peaks, different internal functions are used to
guarantee that the integrated peak signal matches as much as possible
the peak signal integration used during the peak detection. For peaks
identified with the matchedFilter() method, signal
integration is performed on the profile matrix generated with
the same settings used also during peak finding (using the same
bin size, for example). For direct injection data and peaks
identified with the MSW algorithm, signal is integrated
only along the m/z dimension. For all other methods the complete (raw)
signal within the area is used.
An XcmsExperiment or XCMSnExp object with previously missing
chromatographic peaks for features filled into its chromPeaks() matrix.
The FillChromPeaksParam() function returns a
FillChromPeaksParam object.
The reported "mzmin", "mzmax", "rtmin" and
"rtmax" values of the filled peaks represent the actual MS area from
which the signal was integrated.
No peak is filled in if no signal was present in a file/sample
in the respective m/z-rt area. These samples will still show an NA
in the matrix returned by the featureValues() method.
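The filling logic, including the case where no signal is present, can be sketched in base R for a single feature. The raw data are represented by a hypothetical list with one data.frame (columns mz, rt, intensity) per sample, `fill_feature_gaps()` is hypothetical, and the signal is simply summed, which is a simplification of the method-specific integration described above.

```r
## Hypothetical sketch (not xcms code) of gap filling for one feature.
## values: integrated signal per sample (NA = no detected peak)
## raw:    list with one data.frame (mz, rt, intensity) per sample
## area:   named vector with the feature's mzmin, mzmax, rtmin and rtmax
fill_feature_gaps <- function(values, raw, area) {
    for (s in which(is.na(values))) {
        r <- raw[[s]]
        inside <- r$mz >= area[["mzmin"]] & r$mz <= area[["mzmax"]] &
            r$rt >= area[["rtmin"]] & r$rt <= area[["rtmax"]]
        ## Simplification: plain sum of the intensities inside the area.
        ## Without any signal in the area the value stays NA.
        if (any(inside))
            values[s] <- sum(r$intensity[inside])
    }
    values
}

values <- c(S1 = 1000, S2 = NA, S3 = NA)
raw <- list(
    data.frame(mz = 300.010, rt = 2520, intensity = 500),
    data.frame(mz = c(300.011, 300.012, 450.000), rt = c(2515, 2525, 2520),
               intensity = c(40, 60, 999)),
    data.frame(mz = 300.011, rt = 2700, intensity = 80))
area <- c(mzmin = 300.005, mzmax = 300.015, rtmin = 2500, rtmax = 2540)
filled <- fill_feature_gaps(values, raw, area)
filled   # S2 is filled with 100, S3 stays NA (no signal in the area)
```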
Johannes Rainer
groupChromPeaks() for methods to perform the correspondence.
featureArea for the function to define the m/z-retention time region for each feature.
## Load a test data set with identified chromatographic peaks
library(xcms)
library(MsExperiment)
res <- loadXcmsData("faahko_sub2")

## Disable parallel processing for this example
register(SerialParam())

## Perform the correspondence. We assign all samples to the same group.
res <- groupChromPeaks(res,
    param = PeakDensityParam(sampleGroups = rep(1, length(res))))

## For how many features do we lack an integrated peak signal?
sum(is.na(featureValues(res)))

## Filling missing peak data using the peak area from identified
## chromatographic peaks.
res <- fillChromPeaks(res, param = ChromPeakAreaParam())

## Alternatively, force a minimal guaranteed m/z width for the regions
## to integrate signal from.
res <- fillChromPeaks(res, param = ChromPeakAreaParam(minMzWidthPpm = 10))

## How many missing values do we have after peak filling?
sum(is.na(featureValues(res)))

## Get the peaks that have been filled in:
fp <- chromPeaks(res)[chromPeakData(res)$is_filled, ]
head(fp)

## Get the process history step along with the parameters used to perform
## the peak filling:
ph <- processHistory(res, type = "Missing peak filling")[[1]]
ph

## The parameter class:
ph@param

## It is also possible to remove filled-in peaks:
res <- dropFilledChromPeaks(res)
sum(is.na(featureValues(res)))
For each sample, identify peak groups where that sample is not represented. For each of those peak groups, integrate the signal in the region of that peak group and create a new peak.
| Argument | Description |
|---|---|
| object | the |
| method | the filling method |
After peak grouping, there will always be peak groups that do not include peaks from every sample. This method produces intensity values for those missing samples by integrating raw data in the peak group region. Depending on the type of raw data, two different methods are available: for filling GC-MS/LC-MS data, the method "chrom" integrates raw data in the chromatographic domain, whereas "MSW" is used for peak lists without retention time information, such as those from direct-infusion spectra.
An xcmsSet object with filled-in peak groups.
fillPeaks(object, method="")
For each sample, identify peak groups where that sample is not represented. For each of those peak groups, integrate the signal in the region of that peak group and create a new peak.
| Argument | Description |
|---|---|
| object | the |
| nSlaves | (DEPRECATED): number of slaves/cores to be used for parallel peak filling. MPI is used if installed, otherwise the snow package is employed for multicore support. If neither of the two packages is available, the parallel package is used for parallel processing on multiple CPUs of the current machine. Users are advised to use the |
| expand.mz | Expansion factor for the m/z range used for integration. |
| expand.rt | Expansion factor for the retention time range used for integration. |
| BPPARAM | allows to define a specific parallel processing setup for the current task (see |
After peak grouping, there will always be peak groups that do not include peaks from every sample. This method produces intensity values for those missing samples by integrating raw data in the peak group region. In a given group, the start and end retention time points for integration are defined by the median start and end points of the other detected peaks. The start and end m/z values are determined similarly. Intensities can still be zero, which is a rather unusual intensity for a peak. This is the case if, e.g., the raw data was thresholded and the integration area contains no actual raw intensities, or if one sample is miscalibrated, such that the raw data points are (just) outside the integration area.
Importantly, if retention time correction data is available, the alignment information is used to integrate the proper region of the raw data more precisely. If the corrected retention time is beyond the end of the raw data, the value will be not-a-number (NaN).
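A base-R sketch of the median-based integration boundaries of fillPeaks.chrom for one peak group (the same logic applies to the m/z range). How the expansion factors expand.mz and expand.rt are applied is an assumption here: the width is scaled around its centre, so the default of 1 leaves it unchanged. `group_bounds()` is a hypothetical helper, not xcms code.

```r
## Hypothetical sketch (not xcms code): integration boundaries for a peak
## group from the median start and end of its detected peaks. Assumption:
## `expand` scales the width around its centre, so expand = 1 (the default
## of expand.rt and expand.mz) leaves the range unchanged.
group_bounds <- function(start, end, expand = 1) {
    lo <- median(start)
    hi <- median(end)
    extra <- (hi - lo) * (expand - 1) / 2
    c(start = lo - extra, end = hi + extra)
}

rt_start <- c(2500, 2505, 2490)   # "rtmin" of the detected peaks
rt_end <- c(2540, 2548, 2552)     # "rtmax" of the detected peaks
group_bounds(rt_start, rt_end)              # 2500, 2548
group_bounds(rt_start, rt_end, expand = 2)  # 2476, 2572
```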
An xcmsSet object with filled-in peak groups (into and maxo).
fillPeaks.chrom(object, nSlaves=0,expand.mz=1,expand.rt=1,
BPPARAM = bpparam())
xcmsSet-class,
getPeaks
fillPeaks
For each sample, identify peak groups where that sample is not represented. For each of those peak groups, integrate the signal in the region of that peak group and create a new peak.
| Argument | Description |
|---|---|
| object | the |
After peak grouping, there will always be peak groups that do not include peaks from every sample. This method produces intensity values for those missing samples by integrating raw data in the peak group region. In a given group, the start and end m/z values for integration are defined by the median start and end points of the other detected peaks.
An xcmsSet object with filled-in peak groups.
fillPeaks.MSW(object)
In contrast to the fillPeaks.chrom method the maximum
intensity reported in column "maxo" is not the maximum
intensity measured in the expected peak area (defined by columns
"mzmin" and "mzmax"), but the largest intensity of mz
value(s) closest to the "mzmed" of the feature.
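The difference in the reported "maxo" can be sketched in base R. `msw_maxo()` is a hypothetical helper operating on plain m/z and intensity vectors, not xcms code; returning NA when no data point falls within the range is a choice made for this sketch only.

```r
## Hypothetical sketch (not xcms code) of the "maxo" rule described above:
## among the data points within [mzmin, mzmax], keep those whose m/z is
## closest to the feature's "mzmed" and report their largest intensity.
msw_maxo <- function(mz, intensity, mzmin, mzmax, mzmed) {
    inside <- mz >= mzmin & mz <= mzmax
    if (!any(inside))
        return(NA_real_)   # sketch-only choice for an empty range
    d <- abs(mz[inside] - mzmed)
    max(intensity[inside][d == min(d)])
}

mz <- c(199.990, 200.000, 200.000, 200.010, 200.020)
intensity <- c(900, 30, 50, 700, 10)
msw_maxo(mz, intensity, mzmin = 199.985, mzmax = 200.015, mzmed = 200.001)
## Returns 50: the largest intensity within the m/z range (900) is not reported.
```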
xcmsSet-class,
getPeaks
fillPeaks
These functions allow to filter (subset) MSnbase::MChromatograms() or
XChromatograms() objects, i.e. sets of chromatographic data, without
changing the data (intensity and retention times) within the individual
chromatograms (MSnbase::Chromatogram() objects).
filterColumnsIntensityAbove: subsets a MChromatograms object keeping
only columns (samples) in which value is larger than the provided
threshold, either in any or in all rows, as selected with which: with
which = "any", a column is kept if any of the chromatograms in that column
has a value larger than threshold; with which = "all", all
chromatograms in that column have to fulfill this criterion. Parameter value
defines on which value the comparison is performed: with
value = "bpi" the maximum intensity of each chromatogram is compared to
threshold, with value = "tic" the total sum of intensities of each chromatogram is compared to threshold. For XChromatograms objects, value = "maxo" and value = "into" are also supported, which compare the largest intensity of all identified chromatographic peaks in the chromatogram, or the integrated peak area, respectively, with threshold.
filterColumnsKeepTop: subsets a MChromatograms object keeping the top
n columns sorted by the value specified with sortBy. In detail, for
each column the value defined by sortBy is extracted from each
chromatogram and aggregated using aggregationFun. Thus, by default,
the maximum intensity of each chromatogram is determined
(sortBy = "bpi") and these values are summed up for chromatograms in the
same column (aggregationFun = sum). The columns are then sorted by these
values and the top n columns are retained in the returned
MChromatograms. Similar to the filterColumnsIntensityAbove function,
this function also allows sorting the columns of XChromatograms objects
by column "maxo" or "into" of the chromPeaks matrix (with sortBy = "maxo"
or sortBy = "into").
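Both column filters can be sketched in base R, using a list-matrix of intensity vectors (rows = chromatograms, columns = samples) as a hypothetical stand-in for an MChromatograms object; the intensities are those of the example further below. `columns_above()` and `keep_top_columns()` are hypothetical helpers that return column indices rather than filtered objects, NA intensities are ignored, and only "bpi" and "tic" are covered. Rounding a fractional n up follows the note in the example; returning the kept columns in their original order is a choice made for this sketch.

```r
## Hypothetical stand-in for an MChromatograms object: a list-matrix holding
## the intensity vectors of 2 chromatograms (rows) in 3 samples (columns).
int1 <- c(5, 29, 50, NA, 100, 12, 3, 4, 1, 3)
int2 <- c(80, 50, 20, 10, 9, 4, 3, 4, 1, 3)
int3 <- c(53, 80, 130, 15, 5, 3, 2)
chr_ints <- matrix(list(int1, int2, int1, int3, int2, int3), nrow = 2)

## "bpi" (maximum) or "tic" (sum) of each chromatogram, NAs ignored
chrom_values <- function(x, value) {
    fun <- if (value == "bpi") max else sum
    matrix(vapply(x, fun, numeric(1), na.rm = TRUE), nrow = nrow(x))
}

## Indices of the columns passing the threshold in any/all rows
columns_above <- function(x, threshold = 0, value = c("bpi", "tic"),
                          which = c("any", "all")) {
    value <- match.arg(value)
    which <- match.arg(which)
    above <- chrom_values(x, value) > threshold
    keep <- apply(above, 2, if (which == "any") any else all)
    base::which(keep)
}

## Indices of the top n columns after aggregating the per-chromatogram values
keep_top_columns <- function(x, n = 1L, sortBy = c("bpi", "tic"),
                             aggregationFun = sum) {
    sortBy <- match.arg(sortBy)
    col_vals <- apply(chrom_values(x, sortBy), 2, aggregationFun)
    n <- min(ceiling(n), ncol(x))   # fractional n is rounded up
    sort(order(col_vals, decreasing = TRUE)[seq_len(n)])
}

columns_above(chr_ints, threshold = 90)                  # columns 1, 2, 3
columns_above(chr_ints, threshold = 90, which = "all")   # column 2
keep_top_columns(chr_ints, n = 1)                        # column 2
keep_top_columns(chr_ints, n = 0.5 * ncol(chr_ints), sortBy = "tic")  # 2, 3
```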
## S4 method for signature 'MChromatograms'
filterColumnsIntensityAbove(
  object,
  threshold = 0,
  value = c("bpi", "tic"),
  which = c("any", "all")
)

## S4 method for signature 'MChromatograms'
filterColumnsKeepTop(
  object,
  n = 1L,
  sortBy = c("bpi", "tic"),
  aggregationFun = sum
)

## S4 method for signature 'XChromatograms'
filterColumnsIntensityAbove(
  object,
  threshold = 0,
  value = c("bpi", "tic", "maxo", "into"),
  which = c("any", "all")
)

## S4 method for signature 'XChromatograms'
filterColumnsKeepTop(
  object,
  n = 1L,
  sortBy = c("bpi", "tic", "maxo", "into"),
  aggregationFun = sum
)
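| Argument | Description |
|---|---|
| object | MChromatograms or XChromatograms object. |
| threshold | for |
| value | the value on which the comparison is performed (see above). |
| which | for |
| n | for |
| sortBy | for |
| aggregationFun | for |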
a filtered MChromatograms (or XChromatograms) object with the
same number of rows (EICs) but possibly a lower number of columns
(samples).
Johannes Rainer
library(xcms)
library(MSnbase)
chr1 <- Chromatogram(rtime = 1:10 + rnorm(n = 10, sd = 0.3),
    intensity = c(5, 29, 50, NA, 100, 12, 3, 4, 1, 3))
chr2 <- Chromatogram(rtime = 1:10 + rnorm(n = 10, sd = 0.3),
    intensity = c(80, 50, 20, 10, 9, 4, 3, 4, 1, 3))
chr3 <- Chromatogram(rtime = 3:9 + rnorm(7, sd = 0.3),
    intensity = c(53, 80, 130, 15, 5, 3, 2))
chrs <- MChromatograms(list(chr1, chr2, chr1, chr3, chr2, chr3),
    ncol = 3, byrow = FALSE)
chrs

#### filterColumnsIntensityAbove
##
## Keep all columns for which the maximum intensity of any of its
## chromatograms is larger than 90
filterColumnsIntensityAbove(chrs, threshold = 90)

## Require that ALL chromatograms in a column have a value larger than 90
filterColumnsIntensityAbove(chrs, threshold = 90, which = "all")

## If none of the columns fulfills the criteria no columns are returned
filterColumnsIntensityAbove(chrs, threshold = 900)

## Filtering XChromatograms allows in addition to filter on the columns
## "maxo" or "into" of the identified chromatographic peaks within each
## chromatogram.

#### filterColumnsKeepTop
##
## Keep the column with the highest sum of maximal intensities in its
## chromatograms
filterColumnsKeepTop(chrs, n = 1)

## Keep the 50 percent of columns with the highest total sum of signal. Note
## that n will be rounded to the next larger integer value
filterColumnsKeepTop(chrs, n = 0.5 * ncol(chrs), sortBy = "tic")
xcms Result Object
The XcmsExperiment is a data container for xcms preprocessing
results (i.e. results from chromatographic peak detection, alignment and
correspondence analysis). It is the preferred and default result object
since version 4 of xcms.
It provides the same functionality as the XCMSnExp object, but uses the more advanced and modern MS infrastructure provided by the MsExperiment and Spectra Bioconductor packages. This enables a much higher flexibility of data representation and storage and ensures future expandability.
Documentation of the various functions for XcmsExperiment objects is
grouped by topic and provided in the sections below.
The default xcms data analysis workflow consists of:
1. chromatographic peak detection using findChromPeaks()
2. optional refinement of the identified chromatographic peaks using
   refineChromPeaks() (this is highly recommended for centWave-based
   chromatographic peak detection)
3. retention time alignment (retention time adjustment) using adjustRtime().
   Depending on the method used, this may require running a correspondence
   analysis first
4. correspondence analysis to group chromatographic peaks across samples
   and define the LC-MS features using the groupChromPeaks() function
5. gap-filling to rescue signal in samples in which no chromatographic
   peak was identified and hence a missing value would be reported. This
   can be performed using the fillChromPeaks() function.
For very large LC-MS experiments (either with a very large number of samples
or very large data files, or both), the XcmsExperimentHdf5() object can
be used instead. See the respective help page for more information.
filterFeatureDefinitions(object, ...)

## S4 method for signature 'MsExperiment'
filterRt(object, rt = numeric(), ...)

## S4 method for signature 'MsExperiment'
filterMzRange(object, mz = numeric(), msLevel. = uniqueMsLevels(object))

## S4 method for signature 'MsExperiment'
filterMz(object, mz = numeric(), msLevel. = uniqueMsLevels(object))

## S4 method for signature 'MsExperiment'
filterMsLevel(object, msLevel. = uniqueMsLevels(object))

## S4 method for signature 'MsExperiment'
uniqueMsLevels(object)

## S4 method for signature 'MsExperiment'
filterFile(object, file = integer(), ...)

## S4 method for signature 'MsExperiment'
rtime(object)

## S4 method for signature 'MsExperiment'
fromFile(object)

## S4 method for signature 'MsExperiment'
fileNames(object)

## S4 method for signature 'MsExperiment'
polarity(object)

## S4 method for signature 'MsExperiment'
filterIsolationWindow(object, mz = numeric())

## S4 method for signature 'MsExperiment'
chromatogram(
  object,
  rt = matrix(nrow = 0, ncol = 2),
  mz = matrix(nrow = 0, ncol = 2),
  aggregationFun = "sum",
  msLevel = 1L,
  isolationWindowTargetMz = NULL,
  chunkSize = 2L,
  return.type = "MChromatograms",
  BPPARAM = bpparam()
)

## S4 method for signature 'MsExperiment,missing'
plot(x, y, msLevel = 1L, peakCol = "#ff000060", ...)

## S3 method for class 'XcmsExperiment'
c(...)

## S4 method for signature 'XcmsExperiment,ANY,ANY,ANY'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'XcmsExperiment'
filterIsolationWindow(object, mz = numeric())

## S4 method for signature 'XcmsExperiment'
filterRt(object, rt, msLevel.)

## S4 method for signature 'XcmsExperiment'
filterMzRange(object, mz = numeric(), msLevel. = uniqueMsLevels(object))

## S4 method for signature 'XcmsExperiment'
filterMsLevel(object, msLevel. = uniqueMsLevels(object))

## S4 method for signature 'XcmsExperiment'
hasChromPeaks(object, msLevel = integer())

## S4 method for signature 'XcmsExperiment'
dropChromPeaks(object, keepAdjustedRtime = FALSE)

## S4 replacement method for signature 'XcmsExperiment'
chromPeaks(object) <- value

## S4 method for signature 'XcmsExperiment'
chromPeaks(
  object,
  rt = numeric(),
  mz = numeric(),
  ppm = 0,
  msLevel = integer(),
  type = c("any", "within", "apex_within"),
  isFilledColumn = FALSE,
  columns = character()
)

## S4 replacement method for signature 'XcmsExperiment'
chromPeakData(object) <- value

## S4 method for signature 'XcmsExperiment'
chromPeakData(
  object,
  msLevel = integer(),
  columns = character(),
  return.type = c("DataFrame", "data.frame")
)

## S4 method for signature 'XcmsExperiment'
filterChromPeaks(
  object,
  keep = rep(TRUE, nrow(.chromPeaks(object))),
  method = "keep",
  ...
)

## S4 method for signature 'XcmsExperiment'
dropAdjustedRtime(object)

## S4 method for signature 'MsExperiment'
hasAdjustedRtime(object)

## S4 method for signature 'XcmsExperiment'
rtime(object, adjusted = hasAdjustedRtime(object))

## S4 method for signature 'XcmsExperiment'
adjustedRtime(object)

## S4 method for signature 'XcmsExperiment'
hasFeatures(object, msLevel = integer())

## S4 method for signature 'XcmsResult'
featureArea(
  object,
  mzmin = min,
  mzmax = max,
  rtmin = min,
  rtmax = max,
  features = character(),
  minMzWidthPpm = 0
)

## S4 replacement method for signature 'XcmsExperiment'
featureDefinitions(object) <- value

## S4 method for signature 'XcmsExperiment'
featureDefinitions(
  object,
  mz = numeric(),
  rt = numeric(),
  ppm = 0,
  type = c("any", "within", "apex_within"),
  msLevel = integer()
)

## S4 method for signature 'XcmsExperiment'
dropFeatureDefinitions(object, keepAdjustedRtime = FALSE)

## S4 method for signature 'XcmsExperiment'
filterFeatureDefinitions(object, features = integer())

## S4 method for signature 'XcmsExperiment'
hasFilledChromPeaks(object)

## S4 method for signature 'XcmsExperiment'
dropFilledChromPeaks(object)

## S4 method for signature 'XcmsExperiment'
quantify(object, ...)

## S4 method for signature 'XcmsExperiment'
featureValues(
  object,
  method = c("medret", "maxint", "sum"),
  value = "into",
  intensity = "into",
  filled = TRUE,
  missing = NA_real_,
  msLevel = integer()
)

## S4 method for signature 'XcmsExperiment'
chromatogram(
  object,
  rt = matrix(nrow = 0, ncol = 2),
  mz = matrix(nrow = 0, ncol = 2),
  aggregationFun = "sum",
  msLevel = 1L,
  chunkSize = 2L,
  isolationWindowTargetMz = NULL,
  return.type = c("XChromatograms", "MChromatograms"),
  include = character(),
  chromPeaks = c("apex_within", "any", "none"),
  BPPARAM = bpparam()
)

## S4 method for signature 'XcmsExperiment'
processHistory(object, type)

## S4 method for signature 'XcmsExperiment'
filterFile(
  object,
  file,
  keepAdjustedRtime = hasAdjustedRtime(object),
  keepFeatures = FALSE,
  ...
)
'XcmsExperiment' dropFilledChromPeaks(object) ## S4 method for signature 'XcmsExperiment' quantify(object, ...) ## S4 method for signature 'XcmsExperiment' featureValues( object, method = c("medret", "maxint", "sum"), value = "into", intensity = "into", filled = TRUE, missing = NA_real_, msLevel = integer() ) ## S4 method for signature 'XcmsExperiment' chromatogram( object, rt = matrix(nrow = 0, ncol = 2), mz = matrix(nrow = 0, ncol = 2), aggregationFun = "sum", msLevel = 1L, chunkSize = 2L, isolationWindowTargetMz = NULL, return.type = c("XChromatograms", "MChromatograms"), include = character(), chromPeaks = c("apex_within", "any", "none"), BPPARAM = bpparam() ) ## S4 method for signature 'XcmsExperiment' processHistory(object, type) ## S4 method for signature 'XcmsExperiment' filterFile( object, file, keepAdjustedRtime = hasAdjustedRtime(object), keepFeatures = FALSE, ... )
object |
An |
... |
Additional optional parameters. For |
rt |
For |
mz |
For |
msLevel. |
For |
file |
For |
aggregationFun |
For |
msLevel |
|
isolationWindowTargetMz |
For |
chunkSize |
For |
return.type |
For |
BPPARAM |
For |
x |
An |
y |
For |
peakCol |
For |
i |
For |
j |
For |
drop |
For |
keepAdjustedRtime |
|
value |
For |
ppm |
For |
type |
For |
isFilledColumn |
For |
columns |
For |
keep |
For |
method |
For |
adjusted |
For |
mzmin |
For |
mzmax |
For |
rtmin |
For |
rtmax |
For |
features |
For |
minMzWidthPpm |
For |
intensity |
For |
filled |
For |
missing |
For |
include |
For |
chromPeaks |
For |
keepFeatures |
for most subsetting functions ( |
[: subset an XcmsExperiment by sample (parameter i). By default,
subsetting drops correspondence results, because subsetting by samples
necessarily affects the feature definitions. Alignment results (adjusted
retention times) and identified chromatographic peaks (for the selected
samples) are retained. Which preprocessing results should be
kept or dropped can also be configured with the optional parameters
keepChromPeaks (default TRUE), keepAdjustedRtime (default
TRUE) and keepFeatures (default FALSE).
c(): multiple XcmsExperiment objects can be combined into one using the
c() function. However, this requires that the Spectra objects of all
XcmsExperiments use the same type of MsBackend and that their
processing queues are empty. Only combining of peak detection
results is supported. Any alignment or correspondence results that are
present are dropped before the XcmsExperiment objects are combined.
Finally, only the MS data of the individual XcmsExperiment objects is
currently combined. Any data present in the @qdata,
@otherData and @experimentFiles slots is ignored.
The function returns an XcmsExperiment object with the combined MS data
(Spectra objects) and chromatographic peak detection results.
filterChromPeaks(): filter chromatographic peaks of an XcmsExperiment,
keeping only those specified with parameter keep. Returns the
XcmsExperiment with the filtered data. Chromatographic peaks to
retain can be specified by their index in the
chromPeaks() matrix, by their ID (row name in chromPeaks()) or with a
logical vector with the same length as the number of rows of
chromPeaks(). After filtering, the assignment of chromatographic peaks
to any existing feature definitions is updated.
filterFeatureDefinitions(): filter feature definitions of an
XcmsExperiment keeping only those defined with parameter features,
which can be a logical of length equal to the number of features,
an integer with the index of the features in
featureDefinitions(object) to keep or a character with the feature
IDs (i.e. row names in featureDefinitions(object)).
filterFile(): filter an XcmsExperiment (or MsExperiment) by file
(sample). The indices of the samples to which the data should be subset
are specified with parameter file. The sole purpose of this function
is to provide backward compatibility with the MSnbase package. Wherever
possible, use the [ function instead for any sample-based
subsetting. Parameters keepChromPeaks, keepAdjustedRtime and
keepFeatures can be passed via the ... argument.
Note also that, in contrast to [, filterFile() does not support
subsetting in arbitrary order.
filterIsolationWindow(): filter the spectra within an MsExperiment
or XcmsExperiment object, keeping only those with an isolation window
containing the specified m/z (i.e., keeping spectra with an
"isolationWindowLowerMz" smaller than the user-provided mz and an
"isolationWindowUpperMz" larger than mz). For an XcmsExperiment, all
chromatographic peaks (and consequently also features) are also removed
if their "isolationWindowLowerMz" to "isolationWindowUpperMz" range
(columns in chromPeakData()) does not contain the user-provided mz.
filterMsLevel(): filter the data of the XcmsExperiment or
MsExperiment to keep only data of the MS level(s) specified with
parameter msLevel..
filterMz(), filterMzRange(): filter the spectra within an
XcmsExperiment or MsExperiment to the specified m/z range (parameter
mz). For an XcmsExperiment, identified chromatographic peaks and
features are also filtered, keeping only those within the specified
m/z range (i.e. for which the m/z of the peak apex is within the m/z
range). Parameter msLevel. restricts the filtering to
specific MS levels. By default, data from all MS levels is
filtered.
filterRt(): filter an XcmsExperiment, keeping only data within the
specified retention time range (parameter rt). This function keeps
all preprocessing results within the retention time range. All
identified chromatographic peaks with an apex retention time within
the range rt are retained, together with the associated features
(if present).
Parameter msLevel. is currently ignored, i.e. filtering is always
performed on all MS levels of the object.
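The selection rules in this list can be illustrated on plain R objects. The base-R sketch below is a minimal illustration, not the xcms implementation, and shows three of them:

- how a logical, integer or character keep (as accepted by filterChromPeaks()) can resolve to the same rows of a chromPeaks()-like matrix;
- the apex-based retention time check described for filterRt();
- the isolation window test described for filterIsolationWindow().

The helpers resolve_keep() and apex_in_range() and all values are hypothetical.

```r
## Toy peak matrix using column names of chromPeaks(); row names are peak IDs.
pks <- cbind(mz = c(200.1, 300.2, 400.3),
             rt = c(2950, 3100, 3400),
             sample = c(1, 2, 2))
rownames(pks) <- c("CP1", "CP2", "CP3")

## Hypothetical helper: convert a logical, integer or character `keep`
## into row indices of the peak matrix.
resolve_keep <- function(peaks, keep) {
    if (is.logical(keep)) {
        if (length(keep) != nrow(peaks))
            stop("a logical 'keep' needs one value per peak")
        which(keep)
    } else if (is.character(keep)) {
        idx <- match(keep, rownames(peaks))
        if (anyNA(idx))
            stop("'keep' contains unknown peak IDs")
        idx
    } else {
        as.integer(keep)
    }
}

## Hypothetical helper: TRUE for peaks whose apex value in column `col`
## lies within the closed range `rng`.
apex_in_range <- function(peaks, col, rng) {
    peaks[, col] >= rng[1] & peaks[, col] <= rng[2]
}

## All three forms of `keep` select peak "CP2" (row index 2).
resolve_keep(pks, c(FALSE, TRUE, FALSE))
resolve_keep(pks, 2L)
resolve_keep(pks, "CP2")

## Peaks with their apex between 3000 and 3500 seconds: CP2 and CP3.
pks[apex_in_range(pks, "rt", c(3000, 3500)), , drop = FALSE]

## Isolation windows of three MS2 spectra. Only the second one contains
## m/z 130 (lower bound smaller and upper bound larger than 130).
iso <- data.frame(lower = c(100, 125, 150), upper = c(125, 150, 175))
iso[iso$lower < 130 & iso$upper > 130, ]
```

Resolving every form of keep to integer indices first keeps the actual subsetting step identical for all input types.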
chromatogram(): extract chromatographic data from a data set. Parameters
mz and rt define specific m/z - retention time regions from which to
extract the data (e.g. for extracted ion chromatograms, EICs).
Both parameters are expected to be numeric two-column matrices, with
the first column defining the lower and the second the upper margin.
Each row defines a separate m/z - retention time region. Currently
the function returns an MSnbase::MChromatograms() object if object
is an MsExperiment. If object is an XcmsExperiment, it returns
either an MChromatograms or an XChromatograms() object, depending on
parameter return.type (either "MChromatograms" or "XChromatograms").
For the latter, chromatographic peaks detected within the provided
m/z and retention time ranges are also returned. Parameter chromPeaks
specifies which chromatographic peaks should be reported. See the
documentation of the chromPeaks parameter for more information.
If the XcmsExperiment contains correspondence results, the
associated feature definitions are also included in the returned
XChromatograms. By default the function returns chromatograms from MS1
data, but setting parameter msLevel = 2L makes it possible to
extract MS2 chromatograms as well. By default, with parameter
isolationWindowTargetMz = NULL or isolationWindowTargetMz = NA_real_,
data from all MS2 spectra is considered in the chromatogram
extraction. If MS2 data was generated within different m/z isolation
windows (such as with Sciex SWATH data), use parameter
isolationWindowTargetMz to ensure that signal is only extracted
from the respective isolation window. The isolationWindowTargetMz()
function on the Spectra object can be used to list the available
isolation windows of a data set. See also the xcms LC-MS/MS vignette for
examples and details.
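The role of the two-column mz matrix and of aggregationFun can be illustrated without any MS data files. The base-R sketch below takes three toy spectra of one sample. For each spectrum it computes one intensity from the mass peaks within an m/z range, aggregating with sum (as for a TIC or an EIC) or max (as for a base peak chromatogram). It is a simplified illustration, not how xcms extracts chromatograms. It ignores retention time limits, MS levels and isolation windows, and the helper toy_chromatogram() and the data are hypothetical.

```r
## Toy MS1 data: retention times and the mass peaks (m/z, intensity) per spectrum.
rts <- c(10, 11, 12)
spectra <- list(
    list(mz = c(100.00, 150.00, 200.00), intensity = c(5, 20, 1)),
    list(mz = c(100.01, 150.00),         intensity = c(8, 30)),
    list(mz = c(200.00),                 intensity = c(4))
)

## One m/z region per row: lower bound in column 1, upper bound in column 2,
## the same layout as the `mz` matrix passed to chromatogram().
mz_regions <- rbind(c(99.99, 100.02),
                    c(0, Inf))

## Hypothetical helper: aggregate the intensities of all mass peaks within
## the m/z range `mzr`, one value per spectrum. In this sketch, spectra
## without signal in the range yield NA.
toy_chromatogram <- function(spectra, mzr, aggregationFun = sum) {
    vapply(spectra, function(sp) {
        sel <- sp$mz >= mzr[1] & sp$mz <= mzr[2]
        if (any(sel)) aggregationFun(sp$intensity[sel]) else NA_real_
    }, numeric(1))
}

## EIC for the first region: 5, 8, NA.
eic <- toy_chromatogram(spectra, mz_regions[1, ])
## TIC-like (sum: 26, 38, 4) and BPC-like (max: 20, 30, 4) chromatograms.
tic <- toy_chromatogram(spectra, mz_regions[2, ], sum)
bpc <- toy_chromatogram(spectra, mz_regions[2, ], max)
data.frame(rt = rts, eic = eic, tic = tic, bpc = bpc)
```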
chromPeaks(): returns a numeric matrix with the identified
chromatographic peaks. Each row represents a chromatographic peak
identified in one sample (file). The number of columns depends on the
peak detection algorithm (see findChromPeaks()), but most methods return
the following columns:
"mz": intensity-weighted mean of the m/z values of all mass peaks included in the chromatographic peak.
"mzmin": smallest m/z value of any mass peak in the chromatographic peak.
"mzmax": largest m/z value of any mass peak in the chromatographic peak.
"rt": retention time of the peak apex.
"rtmin": retention time of the first scan/mass peak of the chromatographic peak.
"rtmax": retention time of the last scan/mass peak of the chromatographic peak.
"into": integrated intensity of the chromatographic peak.
"maxo": maximal intensity of any mass peak of the chromatographic peak.
"sample": index of the sample in object in which the peak was identified.
Parameters rt, mz, ppm, msLevel and type allow extracting subsets of
identified chromatographic peaks from the object. Parameter columns
optionally defines which columns to extract. See the parameter
descriptions for details.
chromPeakData(): returns a DataFrame with potential additional
annotations for the identified chromatographic peaks. Each row in this
DataFrame corresponds to a row (same index and row name) in the
chromPeaks() matrix. The default annotations are "ms_level" (the MS
level in which the peak was identified) and "is_filled" (whether the
chromatographic peak was detected by findChromPeaks() or filled in by
fillChromPeaks()). Parameter columns can be used to
restrict the returned data frame to selected columns. Parameter
return.type specifies the type of the returned object, either
a DataFrame (the default, return.type = "DataFrame") or a data.frame
(return.type = "data.frame").
chromPeakSpectra(): extract MS spectra for identified chromatographic
peaks. These can be any of the following:
- all (full scan) MS1 spectra with retention times within the retention time range of a chromatographic peak;
- all MS2 spectra (if present) with a retention time within the retention time range of an (MS1) chromatographic peak and a precursor m/z within the m/z range of the chromatographic peak;
- single, selected spectra, chosen by their total signal or highest signal.
Parameter msLevel defines from which MS level spectra are extracted, and
parameter method defines whether all or selected spectra are
returned. See chromPeakSpectra() for details.
dropChromPeaks(): removes (all) chromatographic peak detection results
from object. This also removes any correspondence results (i.e.
features). It also removes any adjusted retention times from the object
if the alignment was performed after the peak detection.
Alignment results (adjusted retention times) can be retained by setting
parameter keepAdjustedRtime to TRUE.
dropFilledChromPeaks(): removes chromatographic peaks added by gap
filling with fillChromPeaks().
fillChromPeaks(): perform gap filling, i.e. integrate the signal for
missing values in samples in which no chromatographic peak was found. This
depends on correspondence results, hence groupChromPeaks() needs to be
called first. For details and options see fillChromPeaks().
findChromPeaks(): perform chromatographic peak detection. See
findChromPeaks() for details.
hasChromPeaks(): whether the object contains peak detection results.
Parameter msLevel allows to check whether peak detection results are
available for the specified MS level(s).
hasFilledChromPeaks(): whether gap-filling results (i.e., filled-in
chromatographic peaks) are present.
manualChromPeaks(): manually add chromatographic peaks by defining
their m/z and retention time ranges. See manualChromPeaks() for
details and examples.
plotChromPeakImage(): show the density of identified chromatographic
peaks per file along the retention time. See plotChromPeakImage() for
details.
plotChromPeaks(): indicate identified chromatographic peaks from one
sample in the RT-m/z space. See plotChromPeaks() for details.
plotPrecursorIons(): general visualization of precursor ions of
LC-MS/MS data. See plotPrecursorIons() for details.
refineChromPeaks(): refines identified chromatographic peaks in
object. See refineChromPeaks() for details.
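The rt, mz, ppm and type arguments of chromPeaks() subset the peaks relative to a retention time range and an m/z range. Our reading of the three type options is:

- "any" keeps peaks that overlap the ranges;
- "within" keeps peaks that lie completely inside them;
- "apex_within" keeps peaks whose apex ("rt" and "mz" columns) lies inside them.

Parameter ppm widens the m/z range. The base-R sketch below implements that reading on a toy matrix. The helper select_peaks(), the exact ppm arithmetic and all values are assumptions, and boundary handling in xcms may differ.

```r
## Toy peak matrix with the chromPeaks() columns needed for the selection.
pks <- cbind(mz    = c(200.000, 200.050, 200.010, 350.0),
             mzmin = c(199.998, 200.040, 200.005, 349.9),
             mzmax = c(200.002, 200.060, 200.015, 350.1),
             rt    = c(100, 130, 140, 300),
             rtmin = c(90, 115, 120, 280),
             rtmax = c(110, 160, 150, 320))
rownames(pks) <- paste0("CP", 1:4)

## Hypothetical helper illustrating the three `type` options.
select_peaks <- function(peaks, rt, mz, ppm = 0,
                         type = c("any", "within", "apex_within")) {
    type <- match.arg(type)
    ## Assumption: ppm widens each m/z bound by ppm * bound / 1e6.
    mz <- mz + c(-1, 1) * mz * ppm / 1e6
    keep <- switch(type,
        ## peak range overlaps the requested ranges
        any = peaks[, "rtmax"] >= rt[1] & peaks[, "rtmin"] <= rt[2] &
              peaks[, "mzmax"] >= mz[1] & peaks[, "mzmin"] <= mz[2],
        ## peak range lies completely within the requested ranges
        within = peaks[, "rtmin"] >= rt[1] & peaks[, "rtmax"] <= rt[2] &
                 peaks[, "mzmin"] >= mz[1] & peaks[, "mzmax"] <= mz[2],
        ## peak apex lies within the requested ranges
        apex_within = peaks[, "rt"] >= rt[1] & peaks[, "rt"] <= rt[2] &
                      peaks[, "mz"] >= mz[1] & peaks[, "mz"] <= mz[2])
    peaks[keep, , drop = FALSE]
}

rownames(select_peaks(pks, c(105, 155), c(199.9, 200.1), type = "any"))          # CP1 CP2 CP3
rownames(select_peaks(pks, c(105, 155), c(199.9, 200.1), type = "within"))       # CP3
rownames(select_peaks(pks, c(105, 155), c(199.9, 200.1), type = "apex_within"))  # CP2 CP3
## A single m/z value only matches a peak apex once it is widened by `ppm`.
nrow(select_peaks(pks, c(0, 1000), c(200.049, 200.049), type = "apex_within"))   # 0
rownames(select_peaks(pks, c(0, 1000), c(200.049, 200.049), ppm = 10,
                      type = "apex_within"))                                     # CP2
```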
adjustedRtime(): extract adjusted retention times. This is just an
alias for rtime(object, adjusted = TRUE).
adjustRtime(): performs retention time adjustment (alignment) of the
data. See adjustRtime() for details.
applyAdjustedRtime(): replaces the original (raw) retention times with
the adjusted ones. See applyAdjustedRtime() for more information.
dropAdjustedRtime(): drops alignment results (adjusted retention time)
from the result object. This also reverts the retention times of
identified chromatographic peaks if present in the result object. Note
that any results from a correspondence analysis (i.e. feature definitions)
will be dropped too (if the correspondence analysis was performed
after the alignment). This can be overruled with
keepAdjustedRtime = TRUE.
hasAdjustedRtime(): whether alignment was performed on the object (i.e.,
the object contains alignment results).
plotAdjustedRtime(): plot the alignment results; see
plotAdjustedRtime() for more information.
dropFeatureDefinitions(): removes any correspondence analysis results
from object as well as any filled-in chromatographic peaks. By default
(with parameter keepAdjustedRtime = FALSE), all alignment results are
removed as well if the alignment was performed after the correspondence
analysis. This can be overruled with keepAdjustedRtime = TRUE.
featureArea(): returns a matrix with columns "mzmin", "mzmax",
"rtmin" and "rtmax" with the m/z and retention time range for each
feature (row) in object. By default these represent the minimal m/z
and retention times as well as maximal m/z and retention times for
all chromatographic peaks assigned to that feature. Parameter
minMzWidthPpm (default minMzWidthPpm = 0) can be used to define a minimal required (total) m/z width, expressed in ppm of the feature's m/z. With a minMzWidthPpm larger than 0, the reported "mzmin" is the minimum of the determined minimal m/z for a feature (based on parameter mzmin) and the m/z of the feature minus minMzWidthPpm / 2 ppm of the feature's m/z value. The reported "mzmax" is calculated analogously, as the maximum of the determined maximal m/z and the feature's m/z plus minMzWidthPpm / 2 ppm. Parameter features allows extracting these values for selected features only. Parameters mzmin, mzmax, rtmin and rtmax define the functions used to calculate the reported "mzmin", "mzmax", "rtmin" and "rtmax" values.
featureChromatograms(): extract ion chromatograms (EICs) for each
feature in object. See featureChromatograms() for more details.
featureDefinitions(): returns a data.frame with feature definitions or
an empty data.frame if no correspondence analysis results are present.
Parameters msLevel, mz, ppm and rt allow to define subsets of
feature definitions that should be returned with the parameter type
defining how these parameters should be used to subset the returned
data.frame. See parameter descriptions for details.
featureSpectra(): returns a Spectra::Spectra() or List of Spectra
with (MS1 or MS2) spectra associated to each feature's chromatographic
peaks. See featureSpectra() for more details and available parameters.
featureSummary(): calculate a simple summary on features. See
featureSummary() for details.
groupChromPeaks(): performs the correspondence analysis (i.e., grouping
of chromatographic peaks into LC-MS features). See groupChromPeaks()
for details.
hasFeatures(): whether correspondence analysis results are present in
object. The optional parameter msLevel defines the MS level(s) for
which the presence of feature definitions should be checked.
overlappingFeatures(): identify features that are overlapping or close in
the m/z - rt dimension. See overlappingFeatures() for more information.
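The minMzWidthPpm rule of featureArea() can be followed for a single feature in base R. This sketch reproduces the arithmetic described for featureArea() under two assumptions. First, the feature's m/z is taken to be its "mzmed" value from featureDefinitions(). Second, the default min/max aggregation is used. The helper feature_mz_range() and the numbers are hypothetical.

```r
## m/z ranges of the chromatographic peaks assigned to one feature.
peak_mzmin <- c(300.0010, 300.0012)
peak_mzmax <- c(300.0016, 300.0018)
feature_mz <- 300.0014   # assumed to be the feature's "mzmed"

## Hypothetical helper: widen the m/z range of a feature so that it spans
## at least `minMzWidthPpm` ppm (in total) around the feature's m/z.
feature_mz_range <- function(pmin, pmax, fmz, minMzWidthPpm = 0,
                             mzmin = min, mzmax = max) {
    half <- fmz * (minMzWidthPpm / 2) / 1e6   # half width in m/z units
    c(mzmin = min(mzmin(pmin), fmz - half),
      mzmax = max(mzmax(pmax), fmz + half))
}

## Without a minimal width the range spans the peaks: 300.0010 to 300.0018.
feature_mz_range(peak_mzmin, peak_mzmax, feature_mz)
## With minMzWidthPpm = 10 the range widens to about 299.9999 to 300.0029,
## i.e. a total width of 10 ppm of 300.0014.
r <- feature_mz_range(peak_mzmin, peak_mzmax, feature_mz, minMzWidthPpm = 10)
r
unname(diff(r)) / feature_mz * 1e6
```

Because the rule takes the minimum and maximum, it only ever widens a range. Features whose peaks already span more than minMzWidthPpm keep their original bounds.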
Preprocessing results can be extracted from an XcmsExperiment using the following functions:
chromPeaks(): extract identified chromatographic peaks. See section on
chromatographic peak detection for details.
featureDefinitions(): extract the definition of features
(chromatographic peaks grouped across samples). See section on
correspondence analysis for details.
featureValues(): extract a matrix of values for features from each
sample (file). Rows are features, columns samples. Which value should be
returned can be defined with parameter value, which can be any column of
the chromPeaks() matrix. By default (value = "into") the integrated
chromatographic peak intensities are returned. With parameter msLevel it
is possible to extract values for features from certain MS levels.
During correspondence analysis, more than one chromatographic peak per
sample can be assigned to the same feature (e.g. if they are very close in
retention time). Parameter method defines the strategy to deal
with such cases:
method = "medret": report the value from the chromatographic peak with the apex position closest to the feature's median retention time.
method = "maxint": report the value from the chromatographic peak with the largest signal (parameter intensity defines the column in chromPeaks() that should be used; defaults to intensity = "into").
method = "sum": sum the values for all chromatographic peaks assigned to the feature in the same sample.
quantify(): extract the correspondence analysis results as a
SummarizedExperiment::SummarizedExperiment(). The feature values
are used as assay in the returned SummarizedExperiment, rowData
contains the featureDefinitions (without column "peakidx") and
colData the sampleData of object. Additional parameters to the
featureValues function (that is used to extract the feature value
matrix) can be passed via ....
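The three method options of featureValues() only differ when a feature has more than one chromatographic peak from the same sample. The base-R sketch below applies each strategy to the peaks of one feature, following the description of featureValues() above. It is not the xcms code, and the helper pick_value() and the toy values are hypothetical. Parameter names value, intensity and missing mirror those of featureValues().

```r
## Chromatographic peaks assigned to one feature (columns as in chromPeaks()).
## Sample 1 contributes two peaks, sample 4 none.
fpks <- cbind(rt     = c(100, 104, 98, 101),
              into   = c(1000, 3000, 1500, 2500),
              maxo   = c(400, 350, 500, 800),
              sample = c(1, 1, 2, 3))
feature_rtmed <- 101   # median retention time of the feature

## Hypothetical helper: one value per sample for a single feature.
pick_value <- function(peaks, rtmed, n_samples,
                       method = c("medret", "maxint", "sum"),
                       value = "into", intensity = "into", missing = NA_real_) {
    method <- match.arg(method)
    vapply(seq_len(n_samples), function(s) {
        p <- peaks[peaks[, "sample"] == s, , drop = FALSE]
        if (nrow(p) == 0) return(missing)
        switch(method,
            ## peak with the apex closest to the feature's median rt
            medret = p[which.min(abs(p[, "rt"] - rtmed)), value],
            ## peak with the largest value in column `intensity`
            maxint = p[which.max(p[, intensity]), value],
            ## sum over all peaks of the sample
            sum = sum(p[, value]))
    }, numeric(1))
}

pick_value(fpks, feature_rtmed, 4, "medret")                      # 1000 1500 2500 NA
pick_value(fpks, feature_rtmed, 4, "maxint")                      # 3000 1500 2500 NA
pick_value(fpks, feature_rtmed, 4, "maxint", intensity = "maxo")  # 1000 1500 2500 NA
pick_value(fpks, feature_rtmed, 4, "sum")                         # 4000 1500 2500 NA
```

Only sample 1, with two peaks, is affected by the choice of method. For "maxint", the column used to rank the peaks (intensity) can differ from the column that is reported (value).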
plot(): plot for each file the position of individual peaks in the m/z -
retention time space (with color-coded intensity) and a base peak
chromatogram. This function should ideally be called only on a data subset
(i.e. after using filterRt() and filterMz() to restrict to a region of
interest). Parameter msLevel defines from which MS level the
plot should be created. If x is an XcmsExperiment with available
identified chromatographic peaks, the regions defining the peaks
are also indicated with rectangles. Parameter peakCol defines the
color of the border of these rectangles.
plotAdjustedRtime(): plot the alignment results; see
plotAdjustedRtime() for more information.
plotChromPeakImage(): show the density of identified chromatographic
peaks per file along the retention time. See plotChromPeakImage() for
details.
plotChromPeaks(): indicate identified chromatographic peaks from one
sample in the RT-m/z space. See plotChromPeaks() for details.
uniqueMsLevels(): returns the unique MS levels of the spectra in
object.
The functions listed below ensure compatibility with the older
XCMSnExp() xcms result object. An XcmsExperiment can also be coerced
to the older XCMSnExp class using as(object, "XCMSnExp"). Likewise, an
XCMSnExp object can be coerced to an XcmsExperiment using
as(object, "XcmsExperiment").
fileNames(): returns the original data file names for the spectra data.
Ideally, the dataOrigin or dataStorage spectra variables from the
object's spectra() should be used instead.
fromFile(): returns the file (sample) index for each spectrum within
object. Generally, subsetting by sample using [ is the preferred
way to get spectra from a specific sample.
polarity(): returns the polarity information for each spectrum in
object.
processHistory(): returns a list of ProcessHistory
objects, which also contain the parameter objects used
for the different processing steps. Optional parameter type allows
querying for specific processing steps.
rtime(): extract retention times of the spectra from the
MsExperiment or XcmsExperiment object. It is thus a shortcut for
rtime(spectra(object)) which would be the preferred way to extract
retention times from an MsExperiment. The rtime() method for
XcmsExperiment has an additional parameter adjusted which allows to
define whether adjusted retention times (if present - adjusted = TRUE)
or raw retention times (adjusted = FALSE) should be returned. By
default adjusted retention times are returned if available.
Compared to the XCMSnExp() object, subsetting by [ supports arbitrary ordering.
Johannes Rainer
## Create a MsExperiment object representing the data from an LC-MS
## experiment.
library(MsExperiment)
library(MsDataHub)
library(faahKO)

## Define the raw data files
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/KO/ko18.CDF', package = "faahKO"))

## Define a data frame with the sample characterization
df <- data.frame(mzML_file = basename(fls),
                 sample = c("ko15", "ko16", "ko18"))

## Import the data. This will initialize a `Spectra` object representing
## the raw data and assign these to the individual samples.
mse <- readMsExperiment(spectraFiles = fls, sampleData = df)

## Extract a total ion chromatogram and base peak chromatogram
## from the data
bpc <- chromatogram(mse, aggregationFun = "max")
tic <- chromatogram(mse)

## Plot them
par(mfrow = c(2, 1))
plot(bpc, main = "BPC")
plot(tic, main = "TIC")

## Extracting MS2 chromatographic data
##
## To show how MS2 chromatograms can be extracted, we first load a DIA
## (SWATH) data set (MsDataHub was already loaded above).
mse_dia <- readMsExperiment(PestMix1_SWATH.mzML())

## Extracting MS2 chromatograms also requires specifying the isolation
## window from which to extract the data. Without it, the chromatograms
## will be empty:
chr_ms2 <- chromatogram(mse_dia, msLevel = 2L)
intensity(chr_ms2[[1L]])

## First we list available isolation windows
table(isolationWindowTargetMz(spectra(mse_dia)))

## We can then extract the TIC of MS2 data for a specific isolation window
chr_ms2 <- chromatogram(mse_dia, msLevel = 2L,
                        isolationWindowTargetMz = 244.05)
plot(chr_ms2)

####
## Chromatographic peak detection

## Perform peak detection on the data using the centWave algorithm. Note
## that the parameters are chosen to reduce the run time of the example.
p <- CentWaveParam(noise = 10000, snthresh = 40, prefilter = c(3, 10000))
xmse <- findChromPeaks(mse, param = p)
xmse

## Have a quick look at the identified chromatographic peaks
head(chromPeaks(xmse))

## Extract chromatographic peaks identified between 3000 and 3300 seconds
chromPeaks(xmse, rt = c(3000, 3300), type = "within")

## Extract ion chromatograms (EIC) for the first two chromatographic
## peaks.
chrs <- chromatogram(xmse,
                     mz = chromPeaks(xmse)[1:2, c("mzmin", "mzmax")],
                     rt = chromPeaks(xmse)[1:2, c("rtmin", "rtmax")])

## An EIC for each sample and each of the two regions was extracted.
## Identified chromatographic peaks in the defined regions are extracted
## as well.
chrs

## Plot the EICs for the second defined region
plot(chrs[2, ])

## Subsetting the data to the results (and data) for the second sample
a <- xmse[2]
nrow(chromPeaks(xmse))
nrow(chromPeaks(a))

## Filtering the result by retention time: keeping all spectra and
## chromatographic peaks within 3000 and 3500 seconds.
xmse_sub <- filterRt(xmse, rt = c(3000, 3500))
xmse_sub
nrow(chromPeaks(xmse_sub))

## Perform an initial feature grouping to allow alignment using the
## peak groups method:
pdp <- PeakDensityParam(sampleGroups = rep(1, 3))
xmse <- groupChromPeaks(xmse, param = pdp)

## Perform alignment using the peak groups method.
pgp <- PeakGroupsParam(span = 0.4)
xmse <- adjustRtime(xmse, param = pgp)

## Visualizing the alignment results
plotAdjustedRtime(xmse)

## Performing the final correspondence analysis
xmse <- groupChromPeaks(xmse, param = pdp)

## Show the definition of the first 6 features
featureDefinitions(xmse) |> head()

## Extract the feature values; show the results for the first 6 rows.
featureValues(xmse) |> head()

## The full results can also be extracted as a `SummarizedExperiment`,
## which can simplify subsequent analyses with other packages.
## Any additional parameters passed to the function are passed to the
## `featureValues` function that is called to generate the feature value
## matrix.
se <- quantify(xmse, method = "sum")

## EICs for all features can be extracted with the `featureChromatograms`
## function. Note that, depending on the data set, extracting these for
## all features might take some time. Below we extract EICs for the
## first 10 features by providing the feature IDs.
chrs <- featureChromatograms(xmse,
    features = rownames(featureDefinitions(xmse))[1:10])
chrs
plot(chrs[3, ])
When dealing with metabolomics results, it is often necessary to filter
features based on certain criteria. These criteria are typically derived
from statistical formulas applied to full rows of data, where each row
represents a feature and its signal abundance in each sample.
The filterFeatures function filters features based on these conventional
quality assessment criteria. Multiple types of filtering are implemented and
can be defined by the filter argument.
Supported filter arguments are:
RsdFilter: Calculates the relative standard deviation
(i.e. coefficient of variation) in abundance for each feature in QC
(Quality Control) samples and filters them in the input object according to
a provided threshold.
DratioFilter: Computes the D-ratio or dispersion ratio, defined as
the standard deviation in abundance for QC samples divided by the standard
deviation for biological test samples, for each feature and filters them
according to a provided threshold.
PercentMissingFilter: Determines the percentage of missing values for
each feature in the various sample groups and filters them according to a
provided threshold.
BlankFlag: Identifies features where the mean abundance in test samples
is lower than a specified multiple of the mean abundance of blank samples.
This can be used to flag features that result from contamination in the
solvent of the samples. A new column possible_contaminants is added to the
featureDefinitions (XcmsExperiment object) or rowData
(SummarizedExperiment object) reflecting this.
For specific examples, see the help pages of the individual parameter classes listed above.
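The four criteria can be made concrete on a toy abundance matrix in base R. This is a minimal sketch of the definitions as stated above, not the xcms implementation: the matrix, the sample-group labels (QC, study, blank) and the helpers rsd() and blank_flag() are hypothetical, and missing values are simply skipped with na.rm = TRUE.

```r
## Hypothetical toy data: 3 features (rows) x 8 samples (columns).
## Assumed sample groups for illustration: 3 QC, 3 study and 2 blank samples.
grp <- c("QC", "QC", "QC", "study", "study", "study", "blank", "blank")
abund <- rbind(
    FT1 = c(100, 110,  90,  50, 200, 120, 60, 70),
    FT2 = c( 10,  30,  20,  18,  22,  20,  1,  1),
    FT3 = c( 50,  52,  48,  NA,  30,  40,  2,  4)
)
is_qc <- grp == "QC"
is_study <- grp == "study"
is_blank <- grp == "blank"

## RSD (coefficient of variation) of each feature across the QC samples
rsd <- function(x) sd(x, na.rm = TRUE) / abs(mean(x, na.rm = TRUE))
qc_rsd <- apply(abund[, is_qc], 1, rsd)

## D-ratio: SD across QC samples divided by SD across study samples
dratio <- apply(abund[, is_qc], 1, sd, na.rm = TRUE) /
    apply(abund[, is_study], 1, sd, na.rm = TRUE)

## Percentage of missing values per feature (rows) and sample group (columns)
pct_missing <- sapply(split(seq_along(grp), grp), function(idx)
    100 * rowSums(is.na(abund[, idx, drop = FALSE])) / length(idx))

## Blank flag: mean in QC (test) samples below threshold x mean in blanks
blank_flag <- function(x, threshold = 2)
    mean(x[is_qc], na.rm = TRUE) < threshold * mean(x[is_blank], na.rm = TRUE)
possible_contaminants <- apply(abund, 1, blank_flag)

## Features retained with thresholds of 0.3 (RSD), 0.5 (D-ratio) and 30%
## missing values in at least one sample group
keep_rsd <- qc_rsd <= 0.3
keep_dratio <- dratio <= 0.5
keep_missing <- apply(pct_missing <= 30, 1, any)
```

Here FT2 is removed by both the RSD and the D-ratio criteria because its QC variability is large relative to its mean and to the study-sample variability, while FT1 is flagged as a possible contaminant because its QC mean (100) is below twice its blank mean (65).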
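| Argument | Description |
|---|---|
| object | |
| filter | The parameter object selecting and configuring the type of filtering. It can be one of the following classes: RsdFilter, DratioFilter, PercentMissingFilter or BlankFlag. |
| assay | For filtering of |
| ... | Optional parameters. For |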
Philippine Louail
Broadhurst D, Goodacre R, Reinke SN, Kuligowski J, Wilson ID, Lewis MR, Dunn WB. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics. 2018;14(6):72. doi: 10.1007/s11306-018-1367-3. Epub 2018 May 18. PMID: 29805336; PMCID: PMC5960010.
## See the vignettes for more detailed examples
library(MsExperiment)

## Load a test data set with features defined.
test_xcms <- loadXcmsData()

## Set up parameter to filter based on coefficient of variation. By setting
## the filter such as below, features that have a coefficient of variation
## greater than 0.3 in QC samples will be removed from the object `test_xcms`
## when calling the `filterFeatures` function.
rsd_filter <- RsdFilter(threshold = 0.3,
    qcIndex = sampleData(test_xcms)$sample_type == "QC")
filtered_data_rsd <- filterFeatures(object = test_xcms, filter = rsd_filter)

## Set up parameter to filter based on D-ratio. By setting the filter such
## as below, features that have a D-ratio computed based on their abundance
## between QC and study samples greater than 0.5 will be removed from the
## object `test_xcms`.
dratio_filter <- DratioFilter(threshold = 0.5,
    qcIndex = sampleData(test_xcms)$sample_type == "QC",
    studyIndex = sampleData(test_xcms)$sample_type == "study")
filtered_data_dratio <- filterFeatures(object = test_xcms,
    filter = dratio_filter)

## Set up parameter to filter based on the percent of missing data.
## Parameter f should represent the sample group of samples, for which the
## percentage of missing values will be evaluated. With the setting defined
## below, a feature that has 30% or less missing values in at least one
## sample group will be kept in the `test_xcms` object.
missing_data_filter <- PercentMissingFilter(threshold = 30,
    f = sampleData(test_xcms)$sample_type)
filtered_data_missing <- filterFeatures(object = test_xcms,
    filter = missing_data_filter)

## Set up parameter to flag possible contaminants based on blank samples'
## abundance. With the filter defined below, features whose mean abundance
## in QC samples is less than 2 times their mean abundance in the blank
## samples (here the study samples are used as blanks, as an example only)
## will be marked as `TRUE` in an extra column named `possible_contaminants`
## in the `featureDefinitions` table of the object `test_xcms`.
filter <- BlankFlag(threshold = 2,
    qcIndex = sampleData(test_xcms)$sample_type == "QC",
    blankIndex = sampleData(test_xcms)$sample_type == "study")
filtered_xmse <- filterFeatures(test_xcms, filter)
The findChromPeaks method performs chromatographic peak detection on
LC/GC-MS data. The peak detection algorithm can be selected, and configured,
using the param argument.
Supported param objects are:
CentWaveParam(): chromatographic peak detection using the centWave
algorithm.
CentWavePredIsoParam(): centWave with predicted isotopes. Peak
detection uses a two-step centWave-based approach considering also feature
isotopes.
MatchedFilterParam(): peak detection using the matched filter
algorithm.
MassifquantParam(): peak detection using the Kalman filter-based
massifquant method.
MSWParam(): single-spectrum non-chromatography MS data peak detection.
For specific examples see the help pages of the individual parameter classes listed above.
findChromPeaks(object, param, ...)

## S4 method for signature 'MsExperiment,Param'
findChromPeaks(object, param, msLevel = 1L, chunkSize = 2L,
  hdf5File = character(), force.overwrite = FALSE, ..., BPPARAM = bpparam())

## S4 method for signature 'XcmsExperiment,Param'
findChromPeaks(object, param, msLevel = 1L, chunkSize = 2L, add = FALSE,
  ..., BPPARAM = bpparam())
| Argument | Description |
|---|---|
| object | The data object on which to perform the peak detection. Can be an MsExperiment or an XcmsExperiment object. |
| param | The parameter object selecting and configuring the algorithm. |
| ... | Optional parameters. |
| msLevel | |
| chunkSize | |
| hdf5File | For |
| force.overwrite | For |
| BPPARAM | Parallel processing setup. Uses by default the system-wide default setup. See |
| add | |
Johannes Rainer
plotChromPeaks() to plot identified chromatographic peaks for one file.
refineChromPeaks() for methods to refine or clean identified
chromatographic peaks.
manualChromPeaks() to manually add/define chromatographic peaks.
Other peak detection methods:
findChromPeaks-centWave,
findChromPeaks-centWaveWithPredIsoROIs,
findChromPeaks-massifquant,
findChromPeaks-matchedFilter,
findPeaks-MSW
The centWave algorithm performs peak density and wavelet based chromatographic peak detection for high resolution LC/MS data in centroid mode (Tautenhahn 2008).
The findChromPeaks,OnDiskMSnExp,CentWaveParam() method
performs chromatographic peak detection using the centWave
algorithm on all samples from an OnDiskMSnExp
object. OnDiskMSnExp objects encapsulate all
experiment-specific data and load the spectra data (m/z and intensity
values) on the fly from the original files, applying any pending
data manipulations.
CentWaveParam(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric(),
  extendLengthMSW = FALSE, verboseBetaColumns = FALSE)

## S4 method for signature 'OnDiskMSnExp,CentWaveParam'
findChromPeaks(object, param, BPPARAM = bpparam(),
  return.type = "XCMSnExp", msLevel = 1L, ...)

## S4 method for signature 'CentWaveParam'
as.list(x, ...)
| Argument | Description |
|---|---|
| ppm | Maximal tolerated m/z deviation, in parts per million, between consecutive scans for the definition of regions of interest (ROIs). |
| peakwidth | |
| snthresh | |
| prefilter | |
| mzCenterFun | Name of the function to calculate the m/z center of the chromatographic peak. Allowed are: |
| integrate | Integration method. For |
| mzdiff | |
| fitgauss | |
| noise | |
| verboseColumns | |
| roiList | An optional list of regions-of-interest (ROI) representing detected mass traces. If ROIs are submitted, the first analysis step is omitted and chromatographic peak detection is performed on the submitted ROIs. Each ROI is expected to have the following elements specified: |
| firstBaselineCheck | |
| roiScales | Optional numeric vector with length equal to |
| extendLengthMSW | Option to force centWave to use all scales when running centWave rather than truncating with the EIC length. Uses the "open" method to extend the EIC to an integer base-2 length prior to being passed to |
| verboseBetaColumns | Option to calculate two additional metrics of peak quality via comparison to an idealized bell curve. Adds |
| object | For findChromPeaks: an OnDiskMSnExp object. For all other methods: a parameter object. |
| param | A CentWaveParam object with the settings for the centWave algorithm. |
| BPPARAM | A parameter class specifying if and how parallel processing should be performed. It defaults to bpparam(). |
| return.type | Character specifying what type of object the method should return. Can be either "XCMSnExp", "list" or "xcmsSet". |
| msLevel | |
| ... | ignored. |
| x | The parameter object. |
The centWave algorithm is most suitable for high resolution
LC/{TOF,OrbiTrap,FTICR}-MS data in centroid mode. In the first phase
the method identifies regions of interest (ROIs) representing
mass traces that are characterized as regions with less than ppm
m/z deviation in consecutive scans in the LC/MS map. In detail, starting
with a single m/z, a ROI is extended if a m/z can be found in the next scan
(spectrum) for which the difference to the mean m/z of the ROI is smaller
than the user defined ppm of the m/z. The mean m/z of the ROI is then
updated considering also the newly included m/z value.
These ROIs are then, after some cleanup, analyzed using continuous wavelet
transform (CWT) to locate chromatographic peaks on different scales.
The first analysis step is skipped if regions of interest are passed
with the roiList setting of the param object.
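The ROI extension step can be illustrated with a small base R function. This is a minimal sketch under simplifying assumptions, not the centWave code: extend_roi() is a hypothetical helper that follows a single ROI from a seed m/z, takes the closest m/z in each following scan, accepts it if its deviation from the running mean m/z of the ROI is below ppm, and stops at the first scan without a match. The cleanup, prefilter and CWT steps are omitted.

```r
## Hypothetical toy data: centroided m/z values per scan (one vector per scan)
scans <- list(
    c(150.0010, 300.1000),
    c(150.0020, 310.2000),
    c(150.0015),
    c(250.0000)
)

## Follow one ROI from `seed_mz`: in each subsequent scan take the m/z
## closest to the current ROI mean and accept it if its deviation is below
## `ppm` of that mean; stop at the first scan without an acceptable m/z.
extend_roi <- function(scans, seed_mz, ppm = 25) {
    roi <- seed_mz
    for (i in seq_along(scans)[-1]) {
        mzs <- scans[[i]]
        ref <- mean(roi)                  # mean m/z of the ROI so far
        dev <- abs(mzs - ref)
        j <- which.min(dev)
        if (length(j) == 0 || dev[j] > ref * ppm / 1e6)
            break
        roi <- c(roi, mzs[j])             # the mean is updated next iteration
    }
    roi
}

roi <- extend_roi(scans, seed_mz = 150.0010, ppm = 25)
roi
mean(roi)
```

With ppm = 25 the tolerance around m/z 150 is about 0.00375, so the ROI picks up the m/z values of the second and third scan and ends at the fourth scan, whose only m/z (250) is far outside the tolerance.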
Parallel processing (one process per sample) is supported and can
be configured either by the BPPARAM parameter or by globally
defining the parallel processing mode using the
BiocParallel::register() method from the BiocParallel
package.
The CentWaveParam() function returns a CentWaveParam
class instance with all of the settings specified for chromatographic
peak detection by the centWave method.
For findChromPeaks(): if return.type = "XCMSnExp" an
XCMSnExp() object with the results of the peak detection.
If return.type = "list" a list of length equal to the number of
samples with matrices specifying the identified peaks.
If return.type = "xcmsSet" an xcmsSet object
with the results of the peak detection.
These methods and classes are part of the updated and modernized
xcms user interface which will eventually replace the
findPeaks() methods.
Ralf Tautenhahn, Johannes Rainer
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann "Highly sensitive feature detection for high resolution LC/MS" BMC Bioinformatics 2008, 9:504 doi: 10.1186/1471-2105-9-504
The do_findChromPeaks_centWave() core API function and
findPeaks.centWave() for the old user interface.
peaksWithCentWave() for functions to perform centWave peak
detection in purely chromatographic data.
XCMSnExp() for the object containing the results of
the peak detection.
Other peak detection methods:
findChromPeaks(),
findChromPeaks-centWaveWithPredIsoROIs,
findChromPeaks-massifquant,
findChromPeaks-matchedFilter,
findPeaks-MSW
## Create a CentWaveParam object. Note that the noise is set to 10000 to
## speed up the execution of the example; in a real use case the default
## value should be used, or it should be set to a reasonable value.
cwp <- CentWaveParam(ppm = 25, noise = 10000, prefilter = c(3, 10000))
cwp

## Perform the peak detection using centWave on the first file from the
## faahKO package. Files are read using the `readMsExperiment` function
## from the MsExperiment package
library(faahKO)
library(xcms)
library(MsExperiment)
fls <- dir(system.file("cdf/KO", package = "faahKO"), recursive = TRUE,
           full.names = TRUE)
raw_data <- readMsExperiment(fls[1])

## Perform the peak detection using the settings defined above.
res <- findChromPeaks(raw_data, param = cwp)
head(chromPeaks(res))
This method performs a two-step centWave-based chromatographic peak detection: in a first centWave run, peaks are identified and the location of their potential isotopes in the m/z-retention time space is predicted. A second centWave run is then performed on these regions of interest (ROIs). The final list of chromatographic peaks comprises all non-overlapping peaks from both centWave runs.
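The m/z side of such an isotope prediction can be sketched in base R. This is an illustration under a stated assumption, not how xcms builds its isotope ROIs: predict_isotope_mz() is a hypothetical helper that assumes isotope peaks are spaced by the 13C-12C mass difference (1.0033548 Da) divided by the charge, enumerating charges 1 to maxCharge and isotopes 1 to maxIso to mirror those two settings.

```r
## Candidate isotope m/z values for a monoisotopic peak at `mz`.
## Assumption: spacing = (13C - 12C mass difference) / charge.
predict_isotope_mz <- function(mz, maxCharge = 3, maxIso = 5) {
    c13_diff <- 1.0033548  # mass difference between 13C and 12C (Da)
    grid <- expand.grid(iso = seq_len(maxIso), charge = seq_len(maxCharge))
    grid$mz <- mz + grid$iso * c13_diff / grid$charge
    grid
}

iso <- predict_isotope_mz(300.1, maxCharge = 2, maxIso = 2)
iso
```

Each row gives the isotope number, the assumed charge and the predicted m/z; a narrow m/z window around each of these values, restricted to the retention time range of the monoisotopic peak, would form a candidate ROI for the second run.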
The findChromPeaks,OnDiskMSnExp,CentWavePredIsoParam()
method performs a two-step centWave-based chromatographic peak detection
on all samples from an OnDiskMSnExp object.
OnDiskMSnExp objects encapsulate all experiment-specific
data and load the spectra data (m/z and intensity values) on the
fly from the original files, applying any pending data
manipulations.
CentWavePredIsoParam(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric(),
  extendLengthMSW = FALSE, verboseBetaColumns = FALSE,
  snthreshIsoROIs = 6.25, maxCharge = 3, maxIso = 5,
  mzIntervalExtension = TRUE, polarity = "unknown")

## S4 method for signature 'OnDiskMSnExp,CentWavePredIsoParam'
findChromPeaks(object, param, BPPARAM = bpparam(),
  return.type = "XCMSnExp", msLevel = 1L, ...)
| Argument | Description |
|---|---|
| ppm | Maximal tolerated m/z deviation, in parts per million, between consecutive scans for the definition of regions of interest (ROIs). |
| peakwidth | |
| snthresh | |
| prefilter | |
| mzCenterFun | Name of the function to calculate the m/z center of the chromatographic peak. Allowed are: |
| integrate | Integration method. For |
| mzdiff | |
| fitgauss | |
| noise | |
| verboseColumns | |
| roiList | An optional list of regions-of-interest (ROI) representing detected mass traces. If ROIs are submitted, the first analysis step is omitted and chromatographic peak detection is performed on the submitted ROIs. Each ROI is expected to have the following elements specified: |
| firstBaselineCheck | |
| roiScales | Optional numeric vector with length equal to |
| extendLengthMSW | Option to force centWave to use all scales when running centWave rather than truncating with the EIC length. Uses the "open" method to extend the EIC to an integer base-2 length prior to being passed to |
| verboseBetaColumns | Option to calculate two additional metrics of peak quality via comparison to an idealized bell curve. Adds |
| snthreshIsoROIs | |
| maxCharge | |
| maxIso | |
| mzIntervalExtension | |
| polarity | |
| object | For findChromPeaks: an OnDiskMSnExp object. For all other methods: a parameter object. |
| param | A CentWavePredIsoParam object with the settings for the two-step centWave-based peak detection. |
| BPPARAM | A parameter class specifying if and how parallel processing should be performed. It defaults to bpparam(). |
| return.type | Character specifying what type of object the method should return. Can be either "XCMSnExp", "list" or "xcmsSet". |
| msLevel | |
| ... | ignored. |
See centWave() for details on the centWave method.
Parallel processing (one process per sample) is supported and can
be configured either by the BPPARAM parameter or by globally
defining the parallel processing mode using the
BiocParallel::register() method from the BiocParallel
package.
The CentWavePredIsoParam() function returns a
CentWavePredIsoParam class instance with all of the settings
specified for the two-step centWave-based peak detection considering also
isotopes.
For findChromPeaks(): if return.type = "XCMSnExp" an
XCMSnExp object with the results of the peak detection.
If return.type = "list" a list of length equal to the number of
samples with matrices specifying the identified peaks.
If return.type = "xcmsSet" an xcmsSet object
with the results of the peak detection.
Hendrik Treutler, Johannes Rainer
The do_findChromPeaks_centWaveWithPredIsoROIs() core
API function.
XCMSnExp() for the object containing the results of
the peak detection.
Other peak detection methods:
findChromPeaks(),
findChromPeaks-centWave,
findChromPeaks-massifquant,
findChromPeaks-matchedFilter,
findPeaks-MSW
## Create a param object
p <- CentWavePredIsoParam(maxCharge = 4, snthresh = 25)
p
Massifquant is a Kalman filter (KF)-based chromatographic peak
detection for XC-MS data in centroid mode. The identified peaks
can be further refined with the centWave method (see
findChromPeaks-centWave() for details on centWave)
by specifying withWave = TRUE.
The findChromPeaks,OnDiskMSnExp,MassifquantParam()
method performs chromatographic peak detection using the
massifquant algorithm on all samples from an
OnDiskMSnExp object.
OnDiskMSnExp objects encapsulate all experiment-specific
data and load the spectra data (m/z and intensity values) on the
fly from the original files, applying any pending data
manipulations.
MassifquantParam(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  criticalValue = 1.125, consecMissedLimit = 2, unions = 1, checkBack = 0,
  withWave = FALSE)

## S4 method for signature 'OnDiskMSnExp,MassifquantParam'
findChromPeaks(object, param, BPPARAM = bpparam(),
  return.type = "XCMSnExp", msLevel = 1L, ...)
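| Argument | Description |
|---|---|
| ppm | |
| peakwidth | |
| snthresh | |
| prefilter | |
| mzCenterFun | Name of the function to calculate the m/z center of the chromatographic peak. Allowed are: |
| integrate | Integration method. For |
| mzdiff | |
| fitgauss | |
| noise | |
| verboseColumns | |
| criticalValue | |
| consecMissedLimit | |
| unions | |
| checkBack | |
| withWave | Logical; if TRUE, the peaks identified by massifquant are further refined with the centWave method. |
| object | For findChromPeaks: an OnDiskMSnExp object. For all other methods: a parameter object. |
| param | A MassifquantParam object with the settings for the massifquant algorithm. |
| BPPARAM | A parameter class specifying if and how parallel processing should be performed. It defaults to bpparam(). |
| return.type | Character specifying what type of object the method should return. Can be either "XCMSnExp", "list" or "xcmsSet". |
| msLevel | |
| ... | ignored. |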
This algorithm's performance has been tested rigorously
on high resolution LC/(OrbiTrap, TOF)-MS data in centroid mode.
Simultaneous Kalman filters identify chromatographic peaks and calculate
their area under the curve. The default parameters are set to operate on
a complex LC-MS Orbitrap sample. Users will find it useful to do some
simple exploratory data analysis to find out where to set a minimum
intensity, and to identify how many scans an average peak spans. The
consecMissedLimit parameter has yielded good performance on
Orbitrap data when set to 2, and on TOF data when set to 1.
This may change, as the algorithm has yet to be
tested on many samples. The criticalValue parameter is perhaps the
most difficult to dial in appropriately, and visual inspection of peak
identification is the best suggested tool for quick optimization.
The ppm and checkBack parameters have shown less influence
than the other parameters and exist to give users flexibility and
better accuracy.
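The idea of Kalman-filter tracking can be illustrated with a single one-dimensional filter following the m/z of one mass trace across scans. This is a hypothetical sketch, not massifquant: track_mz() and its arguments q, r, gate and max_missed are illustrative names only. Centroids whose residual exceeds gate standard deviations of the innovation are treated as missed, and tracking stops after more than max_missed consecutive misses, which is loosely analogous (as an assumption) to the role described for consecMissedLimit. Massifquant itself runs many filters simultaneously and also models intensity.

```r
## Track the m/z of one trace with a 1D Kalman filter (constant m/z model).
## q: process noise variance; r: measurement noise variance (both assumed).
track_mz <- function(mz_obs, q = 1e-8, r = 1e-6, gate = 3, max_missed = 1) {
    x <- mz_obs[1]          # state: current estimate of the trace's m/z
    p <- r                  # variance of the state estimate
    accepted <- 1L          # scans assigned to the trace
    missed <- 0L            # consecutive scans without a matching centroid
    for (i in seq_along(mz_obs)[-1]) {
        p <- p + q                      # predict: m/z assumed constant
        s <- p + r                      # innovation variance
        innov <- mz_obs[i] - x          # measurement residual
        if (is.na(innov) || abs(innov) > gate * sqrt(s)) {
            missed <- missed + 1L
            if (missed > max_missed) break
            next
        }
        k <- p / s                      # Kalman gain
        x <- x + k * innov              # correct the state estimate
        p <- (1 - k) * p                # shrink its variance
        accepted <- c(accepted, i)
        missed <- 0L
    }
    list(mz = x, scans = accepted)
}

## Hypothetical m/z centroids per scan; NA means no centroid was found
mz_obs <- c(200.0000, 200.0005, 200.0010, 210.0000, 200.0008, NA, NA, 200.0007)
res <- track_mz(mz_obs)
res$scans
res$mz
```

The outlier in scan 4 is skipped as a single miss, scan 5 is accepted again, and the two consecutive empty scans 6 and 7 end the trace, so scan 8 is not included.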
Parallel processing (one process per sample) is supported and can
be configured either by the BPPARAM parameter or by globally
defining the parallel processing mode using the
BiocParallel::register() method from the BiocParallel
package.
The MassifquantParam() function returns a
MassifquantParam class instance with all of the settings
specified for chromatographic peak detection by the massifquant
method.
For findChromPeaks(): if return.type = "XCMSnExp" an
XCMSnExp object with the results of the peak detection.
If return.type = "list" a list of length equal to the number of
samples with matrices specifying the identified peaks.
If return.type = "xcmsSet" an xcmsSet object
with the results of the peak detection.
Christopher Conley, Johannes Rainer
Conley CJ, Smith R, Torgrip RJ, Taylor RM, Tautenhahn R and Prince JT "Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection" Bioinformatics 2014, 30(18):2636-43. doi: 10.1093/bioinformatics/btu359
The do_findChromPeaks_massifquant() core API function
and findPeaks.massifquant() for the old user interface.
XCMSnExp() for the object containing the results of
the peak detection.
Other peak detection methods:
findChromPeaks(),
findChromPeaks-centWave,
findChromPeaks-centWaveWithPredIsoROIs,
findChromPeaks-matchedFilter,
findPeaks-MSW
## Create a MassifquantParam object.
mqp <- MassifquantParam(snthresh = 30, prefilter = c(6, 10000))
mqp

## Perform the peak detection using massifquant on the first file from the
## faahKO package. Files are read using the readMSData from the MSnbase
## package
library(faahKO)
library(MSnbase)
fls <- dir(system.file("cdf/KO", package = "faahKO"), recursive = TRUE,
           full.names = TRUE)
raw_data <- readMSData(fls[1], mode = "onDisk")

## Perform the peak detection using the settings defined above.
res <- findChromPeaks(raw_data, param = mqp)
head(chromPeaks(res))
The matchedFilter algorithm identifies peaks in the
chromatographic time domain as described in Smith 2006. The intensity
values are binned by cutting the LC/MS data into slices (bins) that are
binSize m/z units wide. Within each bin the maximal intensity is
selected. The chromatographic peak detection is then performed in each
bin by extending it based on the steps parameter to generate
slices comprising bins current_bin - steps + 1 to
current_bin + steps - 1. Each of these slices is then filtered
with matched filtration using a second-derivative Gaussian as the model
peak shape. After filtration, peaks are detected using a signal-to-noise
ratio cut-off. For more details and illustrations see Smith 2006.
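Matched filtration with a second-derivative Gaussian model peak can be sketched in base R on a synthetic trace. This is a minimal illustration, not the xcms implementation: matched_filter() is a hypothetical helper, fwhm is assumed to be given in scans rather than seconds, and sigma follows the fwhm/2.3548 relation from the MatchedFilterParam() defaults (2.3548 being 2 * sqrt(2 * log(2))). The kernel is the negated second derivative of a Gaussian, so a peak matching the model shape gives a positive maximum at its apex.

```r
## Filter a chromatographic trace `y` with a negated second-derivative
## Gaussian ("Mexican hat") kernel. Assumption: fwhm is expressed in scans.
matched_filter <- function(y, fwhm = 30) {
    sigma <- fwhm / 2.3548
    half <- ceiling(4 * sigma)                 # truncate the kernel at 4 sigma
    x <- seq(-half, half)
    kern <- (1 - x^2 / sigma^2) * exp(-x^2 / (2 * sigma^2))
    kern <- kern / sqrt(sum(kern^2))           # unit-energy kernel
    ## zero-pad so that the filtered trace has the same length as `y`
    padded <- c(rep(0, half), y, rep(0, half))
    filtered <- stats::filter(padded, kern, sides = 2)
    as.numeric(filtered)[(half + 1):(half + length(y))]
}

## Synthetic Gaussian peak centered at scan 100 with a FWHM of 30 scans
y <- 1000 * dnorm(1:200, mean = 100, sd = 30 / 2.3548)
out <- matched_filter(y, fwhm = 30)
which.max(out)
```

A signal-to-noise cut-off would then be applied to the filtered trace rather than to the raw intensities; estimating the noise level is not part of this sketch.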
The findChromPeaks,OnDiskMSnExp,MatchedFilterParam()
method performs peak detection using the matchedFilter algorithm
on all samples from an MSnbase::OnDiskMSnExp() object.
MSnbase::OnDiskMSnExp() objects encapsulate all experiment-specific
data and load the spectra data (m/z and intensity values) on the
fly from the original files, applying any pending data
manipulations.
MatchedFilterParam(binSize = 0.1, impute = "none", baseValue = numeric(),
  distance = numeric(), fwhm = 30, sigma = fwhm/2.3548, max = 5,
  snthresh = 10, steps = 2, mzdiff = 0.8 - binSize * steps, index = FALSE)

## S4 method for signature 'OnDiskMSnExp,MatchedFilterParam'
findChromPeaks(object, param, BPPARAM = bpparam(),
  return.type = "XCMSnExp", msLevel = 1L, ...)
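| Argument | Description |
|---|---|
| binSize | Width of the m/z slices (bins) into which the data is cut, in m/z units. |
| impute | Character string specifying the method to be used for missing value imputation. Allowed values are |
| baseValue | The base value to which empty elements should be set. This is only considered for |
| distance | For |
| fwhm | Full width at half maximum of the Gaussian model peak used for matched filtration. |
| sigma | Standard deviation of the Gaussian model peak; defaults to fwhm/2.3548. |
| max | |
| snthresh | |
| steps | Number of neighboring bins combined into the slice analyzed for each bin (bins current_bin - steps + 1 to current_bin + steps - 1). |
| mzdiff | |
| index | |
| object | For findChromPeaks: an OnDiskMSnExp object. For all other methods: a parameter object. |
| param | A MatchedFilterParam object with the settings for the matchedFilter algorithm. |
| BPPARAM | A parameter class specifying if and how parallel processing should be performed. It defaults to bpparam(). |
| return.type | Character specifying what type of object the method should return. Can be either "XCMSnExp", "list" or "xcmsSet". |
| msLevel | |
| ... | ignored. |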
The intensities are binned by the provided m/z values within each
spectrum (scan). Binning is performed such that the bins are centered
around the m/z values (i.e. the first bin includes all m/z values between
min(mz) - bin_size/2 and min(mz) + bin_size/2).
For more details on binning and missing value imputation see the binYonX() and imputeLinInterpol() methods.
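The centered binning can be written out in a few lines of base R. This is a minimal sketch of the rule stated above, not binYonX(): bin_max() is a hypothetical helper that places bin k at min(mz) + (k - 1) * bin_size, assigns each m/z to the nearest bin center and keeps the maximal intensity per bin. Bins without any value stay NA; these are the gaps that the impute setting is meant to fill.

```r
## Bin intensities by m/z with bins centered on min(mz) + (k - 1) * bin_size,
## keeping the maximal intensity per bin; empty bins are NA.
bin_max <- function(mz, int, bin_size = 0.1) {
    idx <- floor((mz - min(mz)) / bin_size + 0.5) + 1   # nearest bin center
    n_bins <- max(idx)
    out <- rep(NA_real_, n_bins)
    agg <- tapply(int, idx, max)                        # max intensity per bin
    out[as.integer(names(agg))] <- agg
    data.frame(mz = min(mz) + (seq_len(n_bins) - 1) * bin_size,
               intensity = out)
}

## Hypothetical spectrum: four centroids
res <- bin_max(mz = c(100.00, 100.04, 100.12, 100.31),
               int = c(5, 8, 3, 10), bin_size = 0.1)
res
```

The first bin collects both 100.00 and 100.04 (they lie within 0.05 of its center) and keeps the larger intensity, 8; the bin centered at 100.2 receives no value and remains NA.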
Parallel processing (one process per sample) is supported and can
be configured either by the BPPARAM parameter or by globally
defining the parallel processing mode using the
BiocParallel::register() method from the BiocParallel
package.
The MatchedFilterParam() function returns a
MatchedFilterParam class instance with all of the settings
specified for chromatographic detection by the matchedFilter
method.
For findChromPeaks(): if return.type = "XCMSnExp" an
XCMSnExp() object with the results of the peak detection.
If return.type = "list" a list of length equal to the number of
samples with matrices specifying the identified peaks.
If return.type = "xcmsSet" an xcmsSet object
with the results of the peak detection.
Colin A Smith, Johannes Rainer
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787. doi: 10.1021/ac051437y
The do_findChromPeaks_matchedFilter() core API function
and findPeaks.matchedFilter() for the old user interface.
peaksWithMatchedFilter() for functions to perform matchedFilter
peak detection in purely chromatographic data.
XCMSnExp() for the object containing the results of
the chromatographic peak detection.
Other peak detection methods:
findChromPeaks(),
findChromPeaks-centWave,
findChromPeaks-centWaveWithPredIsoROIs,
findChromPeaks-massifquant,
findPeaks-MSW
## Create a MatchedFilterParam object. Note that we use an unnecessarily large
## binSize parameter to reduce the run-time of the example.
mfp <- MatchedFilterParam(binSize = 5, snthresh = 15)
mfp

## Perform the peak detection using matchedFilter on the first file from the
## faahKO package. Files are read using the readMSData from the MSnbase
## package
library(faahKO)
library(MSnbase)
fls <- dir(system.file("cdf/KO", package = "faahKO"), recursive = TRUE,
           full.names = TRUE)
raw_data <- readMSData(fls[1], mode = "onDisk")

## Perform the chromatographic peak detection using the settings defined
## above. Note that we are also disabling parallel processing in this
## example by registering a "SerialParam"
register(SerialParam())
res <- findChromPeaks(raw_data, param = mfp)
head(chromPeaks(res))
findChromPeaks on a MSnbase::Chromatogram or MSnbase::MChromatograms
object with a CentWaveParam parameter object performs centWave-based
peak detection on purely chromatographic data. See centWave for details
on the method and CentWaveParam for details on the parameter class.
Note that not all settings from the CentWaveParam will be used.
See peaksWithCentWave() for the arguments used for peak detection
on purely chromatographic data.
After chromatographic peak detection, identified peaks can also be refined
with the refineChromPeaks() method, which can help to reduce peak
detection artifacts.
## S4 method for signature 'Chromatogram,CentWaveParam'
findChromPeaks(object, param, ...)

## S4 method for signature 'MChromatograms,CentWaveParam'
findChromPeaks(object, param, BPPARAM = bpparam(), ...)

## S4 method for signature 'MChromatograms,MatchedFilterParam'
findChromPeaks(object, param, BPPARAM = BPPARAM, ...)
object |
an MSnbase::Chromatogram or MSnbase::MChromatograms object. |
param |
a CentWaveParam object specifying the settings for the
peak detection. See |
... |
currently ignored. |
BPPARAM |
a parameter class specifying if and how parallel processing
should be performed (only for |
If called on a Chromatogram object, the method returns an XChromatogram
object with the identified peaks. Columns "mz", "mzmin" and "mzmax" in
the chromPeaks() peak matrix provide the mean m/z and the minimum and
maximum m/z value of the Chromatogram object. See peaksWithCentWave()
for details on the remaining columns.
Johannes Rainer
peaksWithCentWave() for the downstream function and centWave
for details on the method.
```r
library(xcms)
library(MSnbase)
## Loading a test data set with identified chromatographic peaks
faahko_sub <- loadXcmsData("faahko_sub2")
faahko_sub <- filterRt(faahko_sub, c(2500, 3700))

## Use the data of the first file as an MsExperiment
od <- as(filterFile(faahko_sub, 1L), "MsExperiment")

## Extract chromatographic data for a small m/z range
chr <- chromatogram(od, mz = c(272.1, 272.3))[1, 1]

## Identify peaks with default settings
xchr <- findChromPeaks(chr, CentWaveParam())
xchr

## Plot data and identified peaks.
plot(xchr)
```

```r
library(MsExperiment)
library(xcms)
library(faahKO)

## Perform peak detection on an MChromatograms object
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/KO/ko18.CDF', package = "faahKO"))
od3 <- readMsExperiment(fls)

## Disable parallel processing for this example
register(SerialParam())

## Extract chromatograms for an m/z - retention time slice
chrs <- chromatogram(od3, mz = 344, rt = c(2500, 3500))

## Perform peak detection using CentWave
xchrs <- findChromPeaks(chrs, param = CentWaveParam())
xchrs

## Extract the identified chromatographic peaks
chromPeaks(xchrs)

## Plot the result
plot(xchrs)
```
findChromPeaks on an MSnbase::Chromatogram() or
MSnbase::MChromatograms() object with a
MatchedFilterParam parameter object performs matchedFilter-based peak
detection on purely chromatographic data. See matchedFilter for details
on the method and MatchedFilterParam for details on the parameter class.
Note that not all settings from the MatchedFilterParam will be used.
See peaksWithMatchedFilter() for the arguments used for peak detection
on purely chromatographic data.
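The general idea behind matched-filter peak detection on a single chromatogram can be illustrated in base R: the intensity trace is filtered with a peak-shaped kernel, and local maxima of the filtered signal that exceed a signal-to-noise cutoff are reported. This is a minimal sketch under stated assumptions, not the xcms implementation: the helper `toy_matched_filter()`, the Mexican-hat kernel with a width given in scans, and the median-intensity noise estimate are all illustrative choices.

```r
## Toy matched-filter peak detection on one chromatogram (base R only).
## toy_matched_filter() is a hypothetical helper, not part of xcms.
toy_matched_filter <- function(int, rt, sigma = 3, snthresh = 10) {
    ## Mexican-hat kernel (negative second derivative of a Gaussian),
    ## sampled at integer scan offsets within +/- 4 sigma.
    half <- ceiling(4 * sigma)
    x <- seq(-half, half)
    kernel <- (1 - x^2 / sigma^2) * exp(-x^2 / (2 * sigma^2))
    ## Filter the intensities; edges without a full window become 0.
    filt <- as.numeric(stats::filter(int, kernel, sides = 2))
    filt[is.na(filt)] <- 0
    ## Local maxima of the filtered signal.
    n <- length(filt)
    mid <- filt[-c(1, n)]
    is_max <- c(FALSE, mid > filt[-c(n - 1, n)] & mid >= filt[-c(1, 2)], FALSE)
    ## Crude noise estimate (assumption): the median raw intensity.
    sn <- int / median(int)
    keep <- which(is_max & filt > 0 & sn > snthresh)
    data.frame(rt = rt[keep], maxo = int[keep], sn = sn[keep])
}

## Synthetic chromatogram: Gaussian peak (sd = 4 scans) on a flat baseline.
rt <- seq(2500, by = 1.5, length.out = 100)
int <- 100 + 5000 * exp(-(seq_along(rt) - 50)^2 / (2 * 4^2))
pk <- toy_matched_filter(int, rt)
pk
```

The real method additionally bins data in the m/z dimension and determines peak boundaries and integrated areas, which this sketch omits.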
```r
## S4 method for signature 'Chromatogram,MatchedFilterParam'
findChromPeaks(object, param, ...)
```
object |
a |
param |
a MatchedFilterParam object specifying the settings for the
peak detection. See |
... |
currently ignored. |
If called on a Chromatogram object, the method returns a matrix with
the identified peaks. Columns "mz", "mzmin" and "mzmax" in
the chromPeaks() peak matrix provide the mean m/z and the minimum and
maximum m/z value of the Chromatogram object. See
peaksWithMatchedFilter() for details on the remaining columns.
Johannes Rainer
peaksWithMatchedFilter() for the downstream function and
matchedFilter for details on the method.
```r
## Loading a test data set with identified chromatographic peaks
faahko_sub <- loadXcmsData("faahko_sub2")
faahko_sub <- filterRt(faahko_sub, c(2500, 3700))

## Use the data of the first file as an MsExperiment
od <- as(filterFile(faahko_sub, 1L), "MsExperiment")

## Extract chromatographic data for a small m/z range
chr <- chromatogram(od, mz = c(272.1, 272.3))[1, 1]

## Identify peaks with default settings
xchr <- findChromPeaks(chr, MatchedFilterParam())

## Plot the identified peaks
plot(xchr)
```
The findChromPeaksIsolationWindow function performs chromatographic peak
detection in MS level > 1 spectra of specific isolation windows (e.g.
SWATH pockets). Peak detection is performed separately for all spectra
belonging to the same isolation window, and the identified peaks are added
to the chromPeaks() matrix of the result object. Information about the
isolation window in which they were detected is added to the
chromPeakData() data frame.
Note that peak detection with this method does not remove previously
identified chromatographic peaks (e.g. on MS1 level using the
findChromPeaks() function) but adds newly identified peaks to the existing
chromPeaks() matrix.
Isolation windows can be defined with the isolationWindow parameter, which
by default uses the definition of isolationWindowTargetMz(), i.e.
chromatographic peak detection is performed for all spectra with the same
isolation window target m/z (separately for each file). The parameter
param allows defining and configuring the peak detection algorithm (see
findChromPeaks() for more information).
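Conceptually, the per-window processing amounts to splitting the MS level > 1 spectra by file and by isolation window target m/z, then running peak detection once per group. The base-R sketch below only illustrates that grouping on a hypothetical spectrum table; the data frame, its column names and the helper `group_by_window()` are assumptions for illustration, not xcms internals.

```r
## Hypothetical spectrum metadata: 2 files, 2 cycles per file, each cycle
## with one MS1 scan and two MS2 (SWATH-like) isolation windows.
spectra_df <- data.frame(
    file = rep(c(1L, 2L), each = 6),
    msLevel = rep(c(1L, 2L, 2L), times = 4),
    isolationWindowTargetMz = rep(c(NA, 150.5, 250.5), times = 4),
    rtime = rep(c(1.0, 1.1, 1.2, 2.0, 2.1, 2.2), times = 2)
)

## Hypothetical helper: one group of spectra per file and isolation window.
group_by_window <- function(sps, msLevel = 2L) {
    sps <- sps[sps$msLevel == msLevel, , drop = FALSE]
    split(sps, list(sps$file, sps$isolationWindowTargetMz), drop = TRUE)
}

groups <- group_by_window(spectra_df)
sapply(groups, nrow)
```

In an actual analysis each group's data would then be handed to the configured detection algorithm (for example one set up with a CentWaveParam), and the window definition recorded next to the resulting peaks; that step is omitted here.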
```r
findChromPeaksIsolationWindow(object, ...)

## S4 method for signature 'MsExperiment'
findChromPeaksIsolationWindow(
  object,
  param,
  msLevel = 2L,
  isolationWindow = isolationWindowTargetMz(spectra(object)),
  chunkSize = 2L,
  ...,
  BPPARAM = bpparam()
)

## S4 method for signature 'OnDiskMSnExp'
findChromPeaksIsolationWindow(
  object,
  param,
  msLevel = 2L,
  isolationWindow = isolationWindowTargetMz(object),
  ...
)
```
object |
|
... |
currently not used. |
param |
Peak detection parameter object, such as a
CentWaveParam object defining and configuring the chromatographic
peak detection algorithm.
See also |
msLevel |
|
isolationWindow |
|
chunkSize |
if |
BPPARAM |
if |
An XcmsExperiment or XCMSnExp object with the chromatographic peaks
identified in spectra of each isolation window from each file added to the
chromPeaks matrix.
Isolation window definitions for each identified peak are stored as
additional columns in chromPeakData().
Johannes Rainer, Michael Witting
reconstructChromPeakSpectra() for the function to reconstruct
MS2 spectra for each MS1 chromatographic peak.
This method finds a fragment mass within a ppm window in an xcmsFragment object.
findMZ(object, find, ppmE=25, print=TRUE)
object |
xcmsFragment object type |
find |
The fragment ion to be found |
ppmE |
the ppm error window for searching |
print |
whether to print a short report |
The method searches for a given fragment ion in an xcmsFragment object within a given ppm error window.
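A ppm error window corresponds to an absolute m/z tolerance of `find * ppmE / 1e6` around the target mass. The base-R sketch below applies that tolerance to a hypothetical fragment table; the data frame and the helper `toy_find_mz()` are illustrative assumptions, not the xcmsFragment internals.

```r
## Hypothetical helper: keep fragments within 'ppmE' ppm of 'find'.
toy_find_mz <- function(fragments, find, ppmE = 25) {
    ## Absolute tolerance (in m/z units) for the requested ppm error.
    tol <- find * ppmE / 1e6
    fragments[abs(fragments$mz - find) <= tol, , drop = FALSE]
}

## Hypothetical fragment ions from two precursors.
fragments <- data.frame(
    PrecursorMz = c(300.1, 300.1, 300.1, 412.2),
    msLevel = c(2L, 2L, 2L, 2L),
    mz = c(145.0500, 145.0520, 145.0560, 198.0900),
    intensity = c(1200, 800, 300, 450)
)

## 25 ppm of 145.05 is about 0.0036, so 145.0560 falls outside the window.
hits <- toy_find_mz(fragments, find = 145.05, ppmE = 25)
hits
```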
A data frame with the following columns:
PrecursorMz |
The precursor m/z of the fragment |
MSnParentPeakID |
An index ID of the location of the precursor peak in the xcmsFragment object |
msLevel |
The level of the found fragment ion |
rt |
the retention time of the found ion |
mz |
the actual m/z of the found fragment ion |
intensity |
The intensity of the fragment ion |
sample |
Which sample the fragment ion came from |
GroupPeakMSn |
an ID if the peaks were grouped by an xcmsSet grouping |
CollisionEnergy |
The collision energy of the precursor scan |
H. Paul Benton
H. Paul Benton, D.M. Wong, S.A. Trauger, G. Siuzdak "XC"
Analytical Chemistry 2008
This method finds a neutral loss within a ppm window in an xcmsFragment object.
findneutral(object, find, ppmE=25, print=TRUE)
object |
xcmsFragment object type |
find |
The neutral loss to be found |
ppmE |
the ppm error window for searching |
print |
whether to print a short report |
The method searches for a given neutral loss in an xcmsFragment object within a given ppm error window. The neutral losses are generated between neighbouring ions. The resulting data frame shows the whole scan in which the neutral loss was found.
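Reading "neutral losses between neighbouring ions" as the m/z differences between adjacent peaks of one sorted fragment spectrum, the search can be sketched in base R as below. The spectrum and the helper `toy_neutral_losses()` are hypothetical, and the same ppm tolerance logic as for fragment ions is assumed.

```r
## Hypothetical helper: neutral losses between neighbouring ions of one
## spectrum that match 'find' within 'ppmE' ppm.
toy_neutral_losses <- function(mz, find, ppmE = 25) {
    mz <- sort(mz)
    ## Differences between neighbouring ions.
    losses <- diff(mz)
    tol <- find * ppmE / 1e6
    hit <- which(abs(losses - find) <= tol)
    data.frame(lower = mz[hit], upper = mz[hit + 1], loss = losses[hit])
}

## Hypothetical MS2 spectrum: only 120.08 -> 178.2255 differs by ~58.1455.
spectrum_mz <- c(120.0800, 178.2255, 250.1000, 308.2500)
nl <- toy_neutral_losses(spectrum_mz, find = 58.1455, ppmE = 50)
nl
```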
A data frame with the following columns:
PrecursorMz |
The precursor m/z of the neutral losses |
MSnParentPeakID |
An index ID of the location of the precursor peak in the xcmsFragment object |
msLevel |
The level of the found fragment ion |
rt |
the retention time of the found ion |
mz |
the actual m/z of the found fragment ion |
intensity |
The intensity of the fragment ion |
sample |
Which sample the fragment ion came from |
GroupPeakMSn |
an ID if the peaks were grouped by an xcmsSet grouping |
CollisionEnergy |
The collision energy of the precursor scan |
H. Paul Benton
H. Paul Benton, D.M. Wong, S.A. Trauger, G. Siuzdak "XC"
Analytical Chemistry 2008
```r
## Not run:
library(msdata)
mzMLpath <- system.file("iontrap", package = "msdata")
mzMLfiles <- list.files(mzMLpath, pattern = "extracted.mzML",
                        recursive = TRUE, full.names = TRUE)
xs <- xcmsSet(mzMLfiles, method = "MS1")
## takes only one file from the file set
xfrag <- xcmsFragments(xs)
found <- findneutral(xfrag, 58.1455, 50)
## End(Not run)
```
A number of peak pickers exist in XCMS. findPeaks
is the generic method.
object |
|
method |
Method to use for peak detection. See details. |
... |
Optional arguments to be passed along |
Different algorithms can be used by specifying them with the
method argument. For example, to use the matched filter
approach described by Smith et al. (2006), one would use:
findPeaks(object, method="matchedFilter"). This is also
the default.
Further arguments given by ... are
passed through to the function implementing
the method.
A character vector of nicknames for the
algorithms available is returned by
getOption("BioC")$xcms$findPeaks.methods.
For example, the help page for the method with the
nickname "centWave" can be accessed with
?findPeaks.centWave.
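The nickname convention (method "centWave" handled by a function named findPeaks.centWave, with further arguments passed through) can be illustrated with base R's `get()`. The two implementations and the generic below are hypothetical stand-ins, not the xcms functions.

```r
## Hypothetical stand-ins for two peak detection implementations.
toy_findPeaks.matchedFilter <- function(x, ...) "matchedFilter called"
toy_findPeaks.centWave <- function(x, ...) "centWave called"

## Hypothetical generic: maps a method nickname to "toy_findPeaks.<nickname>"
## and forwards all further arguments ('...') to that function.
toy_findPeaks <- function(x, method = "matchedFilter", ...) {
    impl <- get(paste0("toy_findPeaks.", method), mode = "function")
    impl(x, ...)
}

toy_findPeaks(NULL)
toy_findPeaks(NULL, method = "centWave")
```

With xcms itself, the available nicknames are listed by getOption("BioC")$xcms$findPeaks.methods.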
A matrix with columns:
mz |
weighted (by intensity) mean of peak m/z across scans |
mzmin |
m/z of minimum step |
mzmax |
m/z of maximum step |
rt |
retention time of peak midpoint |
rtmin |
leading edge of peak retention time |
rtmax |
trailing edge of peak retention time |
into |
integrated area of original (raw) peak |
maxo |
maximum intensity of original (raw) peak |
and additional columns depending on the chosen method.
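To build intuition for several of these columns, the sketch below derives them for one hypothetical peak from its per-scan centroids: `mz` as the intensity-weighted mean m/z, `mzmin`/`mzmax` as the m/z range, `rtmin`/`rtmax` as the retention time bounds and `maxo` as the maximum intensity. The `into` value is approximated with a trapezoidal integral over retention time, which is an assumption; the individual methods integrate differently. `summarize_peak()` is a hypothetical helper.

```r
## Hypothetical centroids of one chromatographic peak (one per scan).
peak <- data.frame(
    rt = c(100, 101, 102, 103, 104),
    mz = c(200.0010, 200.0005, 200.0000, 199.9995, 200.0002),
    intensity = c(100, 400, 1000, 400, 100)
)

summarize_peak <- function(p) {
    ## Trapezoidal area under the intensity-vs-rt curve (assumption).
    area <- sum(diff(p$rt) *
                (head(p$intensity, -1) + tail(p$intensity, -1)) / 2)
    c(mz = weighted.mean(p$mz, p$intensity),
      mzmin = min(p$mz), mzmax = max(p$mz),
      rtmin = min(p$rt), rtmax = max(p$rt),
      into = area, maxo = max(p$intensity))
}

summarize_peak(peak)
```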
findPeaks(object, ...)
findPeaks.matchedFilter
findPeaks.centWave
findPeaks.addPredictedIsotopeFeatures
findPeaks.centWaveWithPredictedIsotopeROIs
xcmsRaw-class
Perform peak detection in direct injection mass spectrometry spectra using a wavelet-based algorithm.
The findChromPeaks,OnDiskMSnExp,MSWParam()
method performs peak detection in single-spectrum, non-chromatographic MS
data using functionality from the MassSpecWavelet package on all
samples from an OnDiskMSnExp object.
OnDiskMSnExp objects encapsulate all experiment-specific
data and load the spectra data (mz and intensity values) on the
fly from the original files, applying any data manipulations along
the way.
```r
MSWParam(
  snthresh = 3,
  verboseColumns = FALSE,
  scales = c(1, seq(2, 30, 2), seq(32, 64, 4)),
  nearbyPeak = TRUE,
  peakScaleRange = 5,
  ampTh = 0.01,
  minNoiseLevel = ampTh/snthresh,
  ridgeLength = 24,
  peakThr = NULL,
  tuneIn = FALSE,
  ...
)

## S4 method for signature 'OnDiskMSnExp,MSWParam'
findChromPeaks(
  object,
  param,
  BPPARAM = bpparam(),
  return.type = "XCMSnExp",
  msLevel = 1L,
  ...
)
```
snthresh |
|
verboseColumns |
|
scales |
Numeric defining the scales of the continuous wavelet transform (CWT). |
nearbyPeak |
|
peakScaleRange |
|
ampTh |
|
minNoiseLevel |
|
ridgeLength |
|
peakThr |
|
tuneIn |
|
... |
Additional parameters to be passed to the
|
object |
For For all other methods: a parameter object. |
param |
An |
BPPARAM |
A parameter class specifying if and how parallel processing
should be performed. It defaults to |
return.type |
Character specifying what type of object the method should
return. Can be either |
msLevel |
|
This is a wrapper for the peak picker in Bioconductor's
MassSpecWavelet package, calling its
peakDetectionCWT and
tuneInPeakInfo functions. See the
xcmsDirect vignette for more information.
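The continuous wavelet transform (CWT) underlying this approach correlates the spectrum with a wavelet at each scale (compare the `scales` parameter), so that peaks give large coefficients across scales. The toy base-R sketch below computes Mexican-hat CWT coefficients for a synthetic spectrum and reports the position of the strongest response at each scale. It is not the MassSpecWavelet implementation (no ridge linking, no noise model); `mexican_hat()` and `toy_cwt()` are hypothetical helpers.

```r
## Mexican-hat (Ricker) wavelet at integer offsets for scale a.
mexican_hat <- function(a) {
    x <- seq(-ceiling(5 * a), ceiling(5 * a))
    (1 - (x / a)^2) * exp(-(x / a)^2 / 2) / sqrt(a)
}

## CWT coefficients: one column per scale; edges without a full window = 0.
toy_cwt <- function(y, scales) {
    sapply(scales, function(a) {
        coef <- as.numeric(stats::filter(y, mexican_hat(a), sides = 2))
        coef[is.na(coef)] <- 0
        coef
    })
}

## Synthetic spectrum: two Gaussian peaks on a flat baseline.
idx <- 1:200
y <- 50 + 1000 * exp(-(idx - 60)^2 / 18) + 400 * exp(-(idx - 140)^2 / 32)
scales <- c(2, 4, 6)
coefs <- toy_cwt(y, scales)

## Position of the strongest response at each scale.
apply(coefs, 2, which.max)
```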
Parallel processing (one process per sample) is supported and can
be configured either by the BPPARAM parameter or by globally
defining the parallel processing mode using the
BiocParallel::register() method from the BiocParallel
package.
The MSWParam() function returns an MSWParam
class instance with all of the settings specified for peak detection by
the MSW method.
For findChromPeaks(): if return.type = "XCMSnExp" an
XCMSnExp object with the results of the peak detection.
If return.type = "list" a list of length equal to the number of
samples with matrices specifying the identified peaks.
If return.type = "xcmsSet" an xcmsSet object
with the results of the detection.
Joachim Kutzera, Steffen Neumann, Johannes Rainer
The do_findPeaks_MSW() core API function
and findPeaks.MSW() for the old user interface.
XCMSnExp() for the object containing the results of
the peak detection.
Other peak detection methods:
findChromPeaks(),
findChromPeaks-centWave,
findChromPeaks-centWaveWithPredIsoROIs,
findChromPeaks-massifquant,
findChromPeaks-matchedFilter
```r
library(xcms)
library(MSnbase)
## Create an MSWParam object
mp <- MSWParam(snthresh = 15)
mp

## Loading a small subset of direct injection, single spectrum files
library(MsDataHub)
fl <- MsDataHub::HAM004_641fE_14.11.07..Exp1.extracted.mzML()
fticr <- readMSData(fl, msLevel. = 1, mode = "onDisk")

## Perform the MSW peak detection on these:
p <- MSWParam(scales = c(1, 7), peakThr = 80000, ampTh = 0.005,
              SNR.method = "data.mean", winSize.noise = 500)
fticr <- findChromPeaks(fticr, param = p)

head(chromPeaks(fticr))
```
Peak density and wavelet based feature detection aiming at isotope peaks for high resolution LC/MS data in centroid mode
object |
|
ppm |
maximal tolerated m/z deviation in consecutive scans, in ppm (parts per million) |
peakwidth |
Chromatographic peak width, given as range (min,max) in seconds |
prefilter |
|
mzCenterFun |
Function to calculate the m/z center of the feature: |
integrate |
Integration method. If |
mzdiff |
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap |
fitgauss |
logical, if TRUE a Gaussian is fitted to each peak |
scanrange |
scan range to process |
noise |
optional argument which is useful for data that was centroided without any intensity threshold,
centroids with intensity < |
sleep |
number of seconds to pause between plotting peak finding cycles |
verbose.columns |
logical, if TRUE additional peak meta data columns are returned |
xcmsPeaks |
peak list picked using the |
snthresh |
signal to noise ratio cutoff, definition see below. |
maxcharge |
max. number of the isotope charge. |
maxiso |
max. number of the isotope peaks to predict for each detected feature. |
mzIntervalExtension |
logical, if TRUE predicted isotope ROIs (regions of interest) are extended in the m/z dimension to increase the detection of low intensity and hence noisy peaks. |
This algorithm is most suitable for high resolution LC/{TOF,OrbiTrap,FTICR}-MS data in centroid mode. In the first phase of the method isotope ROIs (regions of interest) in the LC/MS map are predicted.
In the second phase these mass traces are further analysed.
Continuous wavelet transform (CWT) is used to locate chromatographic peaks on different scales.
The resulting peak list and the given peak list (xcmsPeaks) are merged and redundant peaks are removed.
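The isotope-ROI prediction step can be pictured as generating, for each detected feature, the m/z values at which its isotope peaks would appear for charges 1 to `maxcharge` and up to `maxiso` isotopes. The base-R sketch below assumes a fixed 13C-12C mass difference of 1.003355 divided by the charge; the actual prediction in xcms is likely more elaborate, and `predict_isotope_mz()` is a hypothetical helper.

```r
## Hypothetical helper: predicted isotope m/z values for one feature,
## assuming a constant isotope spacing 'delta' divided by the charge.
predict_isotope_mz <- function(mz, maxcharge = 3, maxiso = 5,
                               delta = 1.003355) {
    ## One row per (charge, isotope) combination.
    grid <- expand.grid(charge = seq_len(maxcharge), iso = seq_len(maxiso))
    grid$mz <- mz + grid$iso * delta / grid$charge
    grid[order(grid$charge, grid$iso), ]
}

iso <- predict_isotope_mz(300.1234, maxcharge = 2, maxiso = 3)
iso
```

Each predicted m/z would then define a narrow ROI in which the wavelet-based peak detection is repeated; note that different charge/isotope combinations can coincide (here charge 1, isotope 1 and charge 2, isotope 2).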
A matrix with columns:
mz |
weighted (by intensity) mean of peak m/z across scans |
mzmin |
m/z peak minimum |
mzmax |
m/z peak maximum |
rt |
retention time of peak midpoint |
rtmin |
leading edge of peak retention time |
rtmax |
trailing edge of peak retention time |
into |
integrated peak intensity |
intb |
baseline corrected integrated peak intensity |
maxo |
maximum peak intensity |
sn |
Signal/Noise ratio, defined as |
egauss |
RMSE of Gaussian fit |
if verbose.columns is TRUE additionally :
mu |
Gaussian parameter mu |
sigma |
Gaussian parameter sigma |
h |
Gaussian parameter h |
f |
Region number of m/z ROI where the peak was localised |
dppm |
m/z deviation of mass trace across scans in ppm |
scale |
Scale on which the peak was localised |
scpos |
Peak position found by wavelet analysis |
scmin |
Left peak limit found by wavelet analysis (scan number) |
scmax |
Right peak limit found by wavelet analysis (scan number) |
findPeaks.addPredictedIsotopeFeatures(object, ppm=25, peakwidth=c(20,50),
prefilter=c(3,100), mzCenterFun="wMean", integrate=1, mzdiff=-0.001, fitgauss=FALSE,
scanrange= numeric(), noise=0, sleep=0, verbose.columns=FALSE, xcmsPeaks, snthresh=6.25, maxcharge=3, maxiso=5, mzIntervalExtension=TRUE)
Ralf Tautenhahn
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann. "Highly sensitive feature detection for high resolution LC/MS". BMC Bioinformatics 2008, 9:504.

Hendrik Treutler and Steffen Neumann. "Prediction, detection, and validation of isotope clusters in mass spectrometry data". Submitted to Metabolites 2016, Special Issue "Bioinformatics and Data Analysis".
findPeaks.centWave
findPeaks-methods
xcmsRaw-class
Peak density and wavelet based feature detection for high resolution LC/MS data in centroid mode
object |
|
ppm |
maximal tolerated m/z deviation in consecutive scans, in ppm (parts per million) |
peakwidth |
Chromatographic peak width, given as range (min,max) in seconds |
snthresh |
signal to noise ratio cutoff, definition see below. |
prefilter |
|
mzCenterFun |
Function to calculate the m/z center of the feature: |
integrate |
Integration method. If |
mzdiff |
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap |
fitgauss |
logical, if TRUE a Gaussian is fitted to each peak |
scanrange |
scan range to process |
noise |
optional argument which is useful for data that was centroided without any intensity threshold,
centroids with intensity < |
sleep |
number of seconds to pause between plotting peak finding cycles |
verbose.columns |
logical, if TRUE additional peak meta data columns are returned |
ROI.list |
An optional list of ROIs that represents detected mass traces (ROIs). If this list is empty (default) then centWave detects the mass trace ROIs,
otherwise this step is skipped and the supplied ROIs are used in the peak detection phase. Each ROI object in the list has the following slots:
|
firstBaselineCheck |
logical, if TRUE continuous data within ROI is checked to be above 1st baseline |
roiScales |
numeric, optional vector of scales for each ROI in |
This algorithm is most suitable for high resolution LC/{TOF,OrbiTrap,FTICR}-MS data in centroid mode. In the first phase of the method mass traces (characterised as regions with less than ppm m/z deviation in consecutive scans) in the LC/MS map are located.
In the second phase these mass traces are further analysed.
Continuous wavelet transform (CWT) is used to locate chromatographic peaks on different scales.
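The first phase, collecting centroids of consecutive scans into mass traces whose m/z deviation stays within `ppm`, can be sketched in base R as a greedy scan-by-scan assignment. This simplified version is for illustration only: `toy_build_rois()` is hypothetical, it ignores `prefilter`, `noise` and gaps within a trace, and the actual centWave ROI detection differs in many details.

```r
## Hypothetical helper: greedy ROI (mass trace) building.
## 'scans' is a list of numeric m/z vectors, one element per scan.
toy_build_rois <- function(scans, ppm = 25) {
    rois <- list()
    for (s in seq_along(scans)) {
        for (mz in scans[[s]]) {
            ## Look for a ROI extended in the previous scan whose mean m/z
            ## is within 'ppm' of this centroid.
            hit <- 0L
            for (r in seq_along(rois)) {
                roi <- rois[[r]]
                if (tail(roi$scan, 1) == s - 1L &&
                    abs(mz - mean(roi$mz)) <= mean(roi$mz) * ppm / 1e6) {
                    hit <- r
                    break
                }
            }
            if (hit > 0L) {
                rois[[hit]]$mz <- c(rois[[hit]]$mz, mz)
                rois[[hit]]$scan <- c(rois[[hit]]$scan, s)
            } else {
                ## No matching trace: start a new ROI.
                rois[[length(rois) + 1L]] <- list(mz = mz, scan = s)
            }
        }
    }
    rois
}

## Four scans: a stable trace near m/z 200 and drifting signals near 350.
scans <- list(c(200.0000, 350.1000),
              c(200.0020, 350.1500),
              c(199.9990, 350.2000),
              c(200.0010))
rois <- toy_build_rois(scans, ppm = 25)

## Keep only traces spanning at least 3 scans.
long_rois <- Filter(function(r) length(r$scan) >= 3, rois)
sapply(long_rois, function(r) mean(r$mz))
```

The retained traces would then be passed to the second, wavelet-based phase.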
A matrix with columns:
mz |
weighted (by intensity) mean of peak m/z across scans |
mzmin |
m/z peak minimum |
mzmax |
m/z peak maximum |
rt |
retention time of peak midpoint |
rtmin |
leading edge of peak retention time |
rtmax |
trailing edge of peak retention time |
into |
integrated peak intensity |
intb |
baseline corrected integrated peak intensity |
maxo |
maximum peak intensity |
sn |
Signal/Noise ratio, defined as |
egauss |
RMSE of Gaussian fit |
if verbose.columns is TRUE additionally :
mu |
Gaussian parameter mu |
sigma |
Gaussian parameter sigma |
h |
Gaussian parameter h |
f |
Region number of m/z ROI where the peak was localised |
dppm |
m/z deviation of mass trace across scans in ppm |
scale |
Scale on which the peak was localised |
scpos |
Peak position found by wavelet analysis |
scmin |
Left peak limit found by wavelet analysis (scan number) |
scmax |
Right peak limit found by wavelet analysis (scan number) |
findPeaks.centWave(object, ppm=25, peakwidth=c(20,50), snthresh=10,
prefilter=c(3,100), mzCenterFun="wMean", integrate=1, mzdiff=-0.001, fitgauss=FALSE,
scanrange= numeric(), noise=0, sleep=0, verbose.columns=FALSE, ROI.list=list(),
firstBaselineCheck=TRUE, roiScales=NULL)
Ralf Tautenhahn
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann "Highly sensitive feature detection for high resolution LC/MS" BMC Bioinformatics 2008, 9:504
centWave for the new user interface.
findPeaks-methods
xcmsRaw-class
Peak density and wavelet based feature detection for high resolution LC/MS data in centroid mode with additional peak picking of isotope features on basis of isotope peak predictions
object |
|
ppm |
maximal tolerated m/z deviation in consecutive scans, in ppm (parts per million) |
peakwidth |
Chromatographic peak width, given as range (min,max) in seconds |
snthresh |
signal to noise ratio cutoff, definition see below. |
prefilter |
|
mzCenterFun |
Function to calculate the m/z center of the feature: |
integrate |
Integration method. If |
mzdiff |
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap |
fitgauss |
logical, if TRUE a Gaussian is fitted to each peak |
scanrange |
scan range to process |
noise |
optional argument which is useful for data that was centroided without any intensity threshold,
centroids with intensity < |
sleep |
number of seconds to pause between plotting peak finding cycles |
verbose.columns |
logical, if TRUE additional peak meta data columns are returned |
ROI.list |
An optional list of ROIs that represents detected mass traces (ROIs). If this list is empty (default) then centWave detects the mass trace ROIs,
otherwise this step is skipped and the supplied ROIs are used in the peak detection phase. Each ROI object in the list has the following slots:
|
firstBaselineCheck |
logical, if TRUE continuous data within ROI is checked to be above 1st baseline |
roiScales |
numeric, optional vector of scales for each ROI in |
snthreshIsoROIs |
signal to noise ratio cutoff for predicted isotope ROIs, definition see below. |
maxcharge |
max. number of the isotope charge. |
maxiso |
max. number of the isotope peaks to predict for each detected feature. |
mzIntervalExtension |
logical, if TRUE predicted isotope ROIs (regions of interest) are extended in the m/z dimension to increase the detection of low intensity and hence noisy peaks. |
This algorithm is most suitable for high resolution LC/{TOF,OrbiTrap,FTICR}-MS data in centroid mode.
The centWave algorithm is applied in two peak picking steps as follows. In the first peak picking step ROIs (regions of interest, characterised as regions with less than ppm m/z deviation in consecutive scans) in the LC/MS map are located and further analysed using continuous wavelet transform (CWT) for the localization of chromatographic peaks on different scales.
In the second peak picking step, isotope ROIs in the LC/MS map are predicted and further analysed using continuous wavelet transform (CWT) for the localization of chromatographic peaks on different scales.
The peak lists resulting from both peak picking steps are merged and redundant peaks are removed.
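Merging the two peak lists can be sketched as: keep all peaks from the first step and add a peak from the second step only if no existing peak overlaps it in both its m/z range and its retention time range. This overlap rule is an assumption chosen for illustration; `merge_peak_lists()` is hypothetical, and xcms may use a different redundancy criterion.

```r
## Hypothetical helper: append non-redundant peaks of 'second' to 'first'.
merge_peak_lists <- function(first, second) {
    ## TRUE for each row of q overlapping peak p in both m/z and rt.
    overlaps <- function(p, q) {
        p["mzmin"] <= q[, "mzmax"] & p["mzmax"] >= q[, "mzmin"] &
            p["rtmin"] <= q[, "rtmax"] & p["rtmax"] >= q[, "rtmin"]
    }
    keep <- vapply(seq_len(nrow(second)), function(i) {
        !any(overlaps(second[i, ], first))
    }, logical(1))
    rbind(first, second[keep, , drop = FALSE])
}

cols <- c("mz", "mzmin", "mzmax", "rt", "rtmin", "rtmax")
first <- matrix(c(200.000, 199.999, 200.001, 105, 100, 110),
                nrow = 1, dimnames = list(NULL, cols))
second <- matrix(c(200.0005, 199.9995, 200.0015, 106, 101, 111,  # redundant
                   201.0034, 201.0025, 201.0043, 105, 100, 110), # new
                 nrow = 2, byrow = TRUE, dimnames = list(NULL, cols))

merged <- merge_peak_lists(first, second)
merged
```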
A matrix with columns:
mz |
weighted (by intensity) mean of peak m/z across scans |
mzmin |
m/z peak minimum |
mzmax |
m/z peak maximum |
rt |
retention time of peak midpoint |
rtmin |
leading edge of peak retention time |
rtmax |
trailing edge of peak retention time |
into |
integrated peak intensity |
intb |
baseline corrected integrated peak intensity |
maxo |
maximum peak intensity |
sn |
Signal/Noise ratio, defined as |
egauss |
RMSE of Gaussian fit |
if verbose.columns is TRUE additionally :
mu |
Gaussian parameter mu |
sigma |
Gaussian parameter sigma |
h |
Gaussian parameter h |
f |
Region number of m/z ROI where the peak was localised |
dppm |
m/z deviation of mass trace across scans in ppm |
scale |
Scale on which the peak was localised |
scpos |
Peak position found by wavelet analysis |
scmin |
Left peak limit found by wavelet analysis (scan number) |
scmax |
Right peak limit found by wavelet analysis (scan number) |
findPeaks.centWaveWithPredictedIsotopeROIs(object, ppm=25, peakwidth=c(20,50), snthresh=10,
prefilter=c(3,100), mzCenterFun="wMean", integrate=1, mzdiff=-0.001, fitgauss=FALSE,
scanrange= numeric(), noise=0, sleep=0, verbose.columns=FALSE, ROI.list=list(),
firstBaselineCheck=TRUE, roiScales=NULL, snthreshIsoROIs=6.25, maxcharge=3, maxiso=5, mzIntervalExtension=TRUE)
Ralf Tautenhahn
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann. "Highly sensitive feature detection for high resolution LC/MS". BMC Bioinformatics 2008, 9:504.

Hendrik Treutler and Steffen Neumann. "Prediction, detection, and validation of isotope clusters in mass spectrometry data". Submitted to Metabolites 2016, Special Issue "Bioinformatics and Data Analysis".
do_findChromPeaks_centWaveWithPredIsoROIs for the
corresponding core API function.
findPeaks.addPredictedIsotopeFeatures
findPeaks.centWave
findPeaks-methods
xcmsRaw-class
Massifquant is a Kalman filter (KF) based feature detection for XC-MS data in centroid mode (currently in experimental stage). Optionally allows for calling the method "centWave" on features discovered by Massifquant to further refine the feature detection; to do so, supply any additional parameters specific to centWave (even more experimental). The method may be conveniently called through the xcmsSet(...) method.
The following arguments are specific to Massifquant. Any additional arguments supplied must correspond as specified by the method findPeaks.centWave.
object |
An xcmsRaw object. |
criticalValue |
Numeric: Suggested values: (0.1-3.0). This setting helps determine the Kalman Filter prediction margin of error. A real centroid belonging to a bona fide feature must fall within the KF prediction margin of error. Much like in the construction of a confidence interval, criticalValue loosely translates to a multiplier of the standard error of the prediction reported by the Kalman Filter. If the features in the XC-MS sample have a small mass deviance in ppm error, a smaller critical value might be better and vice versa. |
consecMissedLimit |
Integer: Suggested values: (1,2,3). While a feature is in the process of being detected by a Kalman Filter, the Kalman Filter may not find a predicted centroid in every scan. After 1 or more consecutive failed predictions, this setting informs Massifquant when to stop a Kalman Filter from following a candidate feature. |
prefilter |
Numeric Vector: (Positive Integer, Positive Numeric): The first argument is only used if (withWave = 1); see centWave for details. The second argument specifies the minimum threshold for the maximum intensity of a feature that must be met. |
peakwidth |
Integer Vector: (Positive Integer, Positive Integer): Only the first argument is used for Massifquant, which specifies the minimum feature length in time scans. If centWave is used, then the second argument is the maximum feature length, which must be greater than the minimum feature length. |
ppm |
The minimum estimated parts per million mass resolution a feature must possess. |
unions |
Integer: set to 1 to apply a t-test union on segmentation; set to 0 if no t-test is to be applied on chromatographically continuous features sharing the same m/z range. Explanation: with very few data points, a Kalman Filter sometimes stops tracking a feature prematurely. Another Kalman Filter is then instantiated and begins following the rest of the signal. Because tracking is done backwards to forwards, this algorithmic defect leaves a real feature divided into two or more segments. With this option turned on, the program identifies segmented features and merges them into one using a two-sample t-test. The potential danger of this option is that some truly distinct features may be merged. |
withWave |
Integer: set to 1 if turned on; set to 0 if turned off. Allows the user to find features first with Massifquant and then filter those features with the second phase of centWave, which includes wavelet estimation. |
checkBack |
Integer: set to 1 if turned on; set to 0 if turned off. The convergence of a Kalman Filter to a feature's precise m/z mapping is very fast, but sometimes it incorporates erroneous centroids as part of a feature (especially early on). The "checkBack" option is an attempt to remove the occasional outlier that lies beyond the converged bounds of the Kalman Filter. The option does not directly affect identification of a feature because it is a postprocessing measure; it has not been shown to be extremely useful thus far, and the default is set to being turned off. |
This algorithm's performance has been tested rigorously on high resolution LC/{OrbiTrap, TOF}-MS data in centroid mode. Simultaneous Kalman filters identify features and calculate their area under the curve. The default parameters are set to operate on a complex LC-MS Orbitrap sample. Users will find it useful to do some simple exploratory data analysis to determine a suitable minimum intensity and how many scans an average feature spans. The "consecMissedLimit" parameter has yielded good performance on Orbitrap data when set to 2, and on TOF data it was found to work best when set to 1. This may change, as the algorithm has yet to be tested on many samples. The "criticalValue" parameter is perhaps the most difficult to dial in appropriately, and visual inspection of the identified peaks is the best suggested tool for quick optimization. The "ppm" and "checkBack" parameters have shown less influence than the other parameters and exist to give users flexibility and better accuracy.
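To make the roles of `criticalValue` and `consecMissedLimit` more concrete, here is a toy one-dimensional Kalman filter that tracks the m/z of a single candidate feature across scans. A centroid is accepted only if it lies within `criticalValue` times the standard error of the prediction, and tracking stops once the number of consecutive misses exceeds `consecMissedLimit`. This is a deliberately simplified illustration (constant-m/z model, fixed variances `q` and `r`, at most one candidate centroid per scan), not Massifquant's implementation; `toy_kalman_track()` and its defaults are hypothetical.

```r
## Hypothetical toy tracker; obs holds one candidate m/z per scan (NA = none).
toy_kalman_track <- function(obs, criticalValue = 1.125, consecMissedLimit = 2,
                             q = 1e-8, r = 1e-6) {
    x <- obs[1]      # state estimate: current m/z of the feature
    p <- r           # variance of the state estimate
    accepted <- 1L   # scans assigned to the feature
    missed <- 0L     # consecutive scans without an accepted centroid
    for (i in seq_along(obs)[-1]) {
        ## Predict: constant-m/z model, so only the variance grows.
        p <- p + q
        s <- p + r   # variance of the predicted observation
        z <- obs[i]
        if (!is.na(z) && abs(z - x) <= criticalValue * sqrt(s)) {
            ## Update the estimate with the accepted centroid.
            k <- p / s
            x <- x + k * (z - x)
            p <- (1 - k) * p
            accepted <- c(accepted, i)
            missed <- 0L
        } else {
            missed <- missed + 1L
            if (missed > consecMissedLimit) break
        }
    }
    list(mz = x, scans = accepted)
}

## Scan 4 is an outlier, scans 6-8 have no centroid.
obs <- c(200.0000, 200.0008, 199.9995, 200.0500, 200.0003,
         NA, NA, NA, 200.0001)
toy_kalman_track(obs)$scans
toy_kalman_track(obs, consecMissedLimit = 3)$scans
```

With the stricter limit the track ends before scan 9; allowing one more consecutive miss lets the filter pick the feature up again, which mirrors why `consecMissedLimit` interacts with the scan rate of the instrument.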
If the method findPeaks.massifquant(...) is used, a matrix is returned with rows corresponding to features and the feature properties listed in the following columns. Otherwise, if centWave is also used (withWave = 1), or if Massifquant is called through the xcmsSet(...) method, the corresponding return values of those methods are used.
mz |
weighted m/z mean (weighted by intensity) of the feature |
mzmin |
m/z lower boundary of the feature |
mzmax |
m/z upper boundary of the feature |
rtmin |
starting scan time of the feature |
rtmax |
ending scan time of the feature |
into |
the raw quantitation (area under the curve) of the feature. |
area |
feature area that is not normalized by the scan rate. |
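The mz, mzmin, mzmax, rtmin and rtmax columns can be illustrated with a small base-R sketch. The helper summarize_feature() is hypothetical and the centroid values are made up. The into and area columns are not reproduced because their exact integration is not described here.

```r
## Hypothetical helper: summarize the centroids assigned to one feature into
## the mz, mzmin, mzmax, rtmin and rtmax columns described above.
summarize_feature <- function(mz, rt, intensity) {
    c(mz = sum(mz * intensity) / sum(intensity),  # intensity-weighted mean
      mzmin = min(mz), mzmax = max(mz),
      rtmin = min(rt), rtmax = max(rt))
}

f <- summarize_feature(mz = c(200.10, 200.11, 200.12),
                       rt = c(10, 11, 12),
                       intensity = c(1, 2, 1))
f  # mz = 200.11, mzmin = 200.10, mzmax = 200.12, rtmin = 10, rtmax = 12
```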
findPeaks.massifquant(object, ppm=10, peakwidth=c(20,50), snthresh=10,
prefilter=c(3,100), mzCenterFun="wMean", integrate=1, mzdiff=-0.001,
fitgauss=FALSE, scanrange= numeric(), noise=0,
sleep=0, verbose.columns=FALSE, criticalValue = 1.125, consecMissedLimit = 2,
unions = 1, checkBack = 0, withWave = 0)
Christopher Conley
Submitted for review. Christopher Conley, Ralf J. O. Torgrip, Ryan Taylor, and John T. Prince. "Massifquant: open-source Kalman filter based XC-MS feature detection". August 2013.
centWave for the new user interface.
findPeaks-methods
xcmsSet
xcmsRaw
xcmsRaw-class
library(faahKO)
library(xcms)
## Load all the wild-type and knock-out samples
cdfpath <- system.file("cdf", package = "faahKO")
## Subset to only the first 2 files.
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)[1:2]
## Run the massifquant analysis. The noise level is set to 10000 to speed up
## execution of the examples; in a real use case it should be set to a
## reasonable value.
xset <- xcmsSet(cdffiles, method = "massifquant", consecMissedLimit = 1,
                snthresh = 10, criticalValue = 1.73, ppm = 10,
                peakwidth = c(30, 60), prefilter = c(1, 3000),
                noise = 10000, withWave = 0)
Find peaks in the chromatographic time domain of the
profile matrix. For more details see
do_findChromPeaks_matchedFilter().
## S4 method for signature 'xcmsRaw'
findPeaks.matchedFilter(
  object,
  fwhm = 30,
  sigma = fwhm/2.3548,
  max = 5,
  snthresh = 10,
  step = 0.1,
  steps = 2,
  mzdiff = 0.8 - step * steps,
  index = FALSE,
  sleep = 0,
  scanrange = numeric()
)
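The defaults in this signature encode two small calculations. First, sigma is derived from fwhm using the Gaussian relation FWHM = 2 * sqrt(2 * log(2)) * sigma, which is about 2.3548 * sigma. Second, mzdiff defaults to 0.8 - step * steps. The sketch below evaluates both and then applies a simplified matched filter to a simulated chromatogram. The plain Gaussian kernel, the 1-second sampling and the simulated data are assumptions for illustration. See do_findChromPeaks_matchedFilter() for what xcms actually does.

```r
## sigma from fwhm: for a Gaussian, FWHM = 2 * sqrt(2 * log(2)) * sigma.
fwhm <- 30
sigma <- fwhm / 2.3548
sigma   # about 12.74 seconds

## Default mzdiff with step = 0.1 and steps = 2.
step <- 0.1
steps <- 2
mzdiff <- 0.8 - step * steps   # 0.6

## Simplified matched filter (assumption: plain, normalized Gaussian kernel
## sampled at 1-second spacing; the xcms kernel may differ).
rt <- seq(0, 300, by = 1)
half <- ceiling(3 * sigma)
kernel <- dnorm(-half:half, sd = sigma)
set.seed(42)
signal <- 1000 * exp(-(rt - 150)^2 / (2 * sigma^2)) +
    rnorm(length(rt), sd = 50)
filtered <- stats::filter(signal, kernel / sum(kernel), sides = 2)
rt[which.max(filtered)]   # close to the simulated apex at 150
```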
object |
The |
fwhm |
|
sigma |
|
max |
|
snthresh |
|
step |
numeric(1) specifying the width of the bins/slices in m/z dimension. |
steps |
|
mzdiff |
|
index |
|
sleep |
(DEPRECATED). The use of this parameter is highly discouraged, as it could cause problems in parallel processing mode. |
scanrange |
Numeric vector defining the range of scans to which the
original |
A matrix, each row representing an identified chromatographic peak.
Colin A. Smith
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787. doi: 10.1021/ac051437y
Collecting Tandem MS or MS$^n$ Mass Spectrometry precursor peaks as annotated in XML raw files
object |
|
Some mass spectrometers can acquire MS1 and MS2 (or MS$^n$) scans quasi-simultaneously, e.g. in data-dependent tandem MS or DDIT mode.
Since xcmsFragments attaches all MS$^n$ peaks to MS1 peaks in xcmsSet, it is important that findPeaks and xcmsSet do not miss any MS1 precursor peak.
To ensure that all MS1 precursor peaks are in an xcmsSet, findPeaks.MS1 does not perform actual peak picking, but simply uses the annotation stored in mzXML, mzData or mzML raw files.
This relies on the following XML tags:
mzData:
<spectrum id="463">
<spectrumInstrument msLevel="2">
<cvParam cvLabel="psi" accession="PSI:1000039" name="TimeInSeconds" value="92.7743"/>
</spectrumInstrument>
<precursor msLevel="1" spectrumRef="461">
<cvParam cvLabel="psi" accession="PSI:1000040" name="MassToChargeRatio" value="462.091"/>
<cvParam cvLabel="psi" accession="PSI:1000042" name="Intensity" value="366.674"/>
</precursor>
</spectrum>
mzXML:
<scan num="17" msLevel="2" retentionTime="PT1.5224S">
<precursorMz precursorIntensity="125245">220.1828003</precursorMz>
</scan>
Several mzXML and mzData converters are known to create incomplete files, either without intensities (these are then set to 0) or without the precursor retention time (in which case a reasonably close rt should be chosen; this is not yet implemented).
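To illustrate which annotations findPeaks.MS1 relies on, the following base-R sketch extracts the precursor m/z, the precursor intensity and the MS2 scan's retention time from the mzXML snippet shown above. It applies regular expressions to a single element only, and the helper get_attr() is hypothetical. Real files should be read with a proper XML or MS data reader.

```r
## The mzXML scan element from the example above.
scan_xml <- '<scan num="17" msLevel="2" retentionTime="PT1.5224S">
  <precursorMz precursorIntensity="125245">220.1828003</precursorMz>
</scan>'

## Hypothetical helper: value of attribute `name` (first occurrence).
get_attr <- function(x, name) {
    m <- regmatches(x, regexpr(paste0(name, '="[^"]*"'), x))
    sub(paste0(name, '="([^"]*)"'), "\\1", m)
}

## Precursor m/z is the text content of <precursorMz>.
tag <- regmatches(scan_xml,
                  regexpr("<precursorMz[^>]*>[^<]+</precursorMz>", scan_xml))
precursor_mz <- as.numeric(sub("^<precursorMz[^>]*>([^<]+)</precursorMz>$",
                               "\\1", tag))
precursor_int <- as.numeric(get_attr(scan_xml, "precursorIntensity"))
## retentionTime of the MS2 scan is a duration such as "PT1.5224S" (seconds).
rt_sec <- as.numeric(sub("^PT([0-9.]+)S$", "\\1",
                         get_attr(scan_xml, "retentionTime")))
c(mz = precursor_mz, intensity = precursor_int, rt = rt_sec)
```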
A matrix with columns:
mz, mzmin, mzmax
|
annotated MS1 precursor selection mass |
rt, rtmin, rtmax
|
annotated MS1 precursor retention time |
into, maxo, sn
|
annotated MS1 precursor intensity |
findPeaks.MS1(object)
Steffen Neumann
findPeaks-methods
xcmsRaw-class
This method performs peak detection in direct-injection mass spectrometry spectra using a wavelet-based algorithm.
## S4 method for signature 'xcmsRaw'
findPeaks.MSW(object, snthresh = 3, verbose.columns = FALSE, ...)
object |
The |
snthresh |
|
verbose.columns |
Logical whether additional peak meta data columns should be returned. |
... |
Additional parameters to be passed to the
|
This is a wrapper around the peak picker in Bioconductor's
MassSpecWavelet package, calling its
peakDetectionCWT and
tuneInPeakInfo functions.
A matrix, each row representing an identified peak.
Joachim Kutzera, Steffen Neumann, Johannes Rainer
The GenericParam class stores generic parameter
information such as the name of the function that was/has to be called
(slot fun) and its arguments (slot args). This object is
used to track the history of the processing steps applied to an
XCMSnExp object. This is in contrast to e.g. the
CentWaveParam() object that is passed to the actual
processing method.
GenericParam(fun = character(), args = list())
fun |
|
args |
|
The GenericParam() function returns a GenericParam
object.
fun: character specifying the function name.
args: list (ideally named) with the arguments to the
function.
Johannes Rainer
processHistory() for how to access the process history
of an XCMSnExp object.
prm <- GenericParam(fun = "mean")
prm <- GenericParam(fun = "mean", args = list(na.rm = TRUE))
Generate multiple extracted ion chromatograms for m/z values of
interest. For xcmsSet objects, reread original raw data
and apply precomputed retention time correction, if applicable.
Note that this method always returns profile data, not raw data (profile data being the data binned along m/z). See details for further information.
object |
the |
mzrange |
Either a two column matrix with minimum and maximum m/z or a
matrix of any dimensions containing columns For |
rtrange |
A two column matrix the same size as For |
step |
step (bin) size to use for profile generation. Note that a
value of |
groupidx |
either character vector with names or integer vector with indices of peak groups for which to get EICs |
sampleidx |
either character vector with names or integer vector with indices of samples for which to get EICs |
rt |
|
In contrast to the rawEIC method, which extracts the
actual raw values, this method extracts them from the object's profile
matrix. If the provided step argument does not match the
profStep of the object, the profile matrix is calculated
on the fly and the values are returned from it.
For xcmsSet and xcmsRaw objects, an xcmsEIC object.
getEIC(object, mzrange, rtrange = NULL, step = 0.1)
getEIC(object, mzrange, rtrange = 200, groupidx,
sampleidx = sampnames(object), rt = c("corrected", "raw"))
xcmsRaw-class,
xcmsSet-class,
xcmsEIC-class,
rawEIC
Integrate extracted ion chromatograms in pre-defined
regions. Return output similar to findPeaks.
object |
the |
peakrange |
matrix or data frame with 4 columns: |
step |
step size to use for profile generation |
A matrix with columns:
i |
rank of peak identified in merged EIC (<= |
mz |
weighted (by intensity) mean of peak m/z across scans |
mzmin |
m/z of minimum step |
mzmax |
m/z of maximum step |
ret |
retention time of peak midpoint |
retmin |
leading edge of peak retention time |
retmax |
trailing edge of peak retention time |
into |
integrated area of original (raw) peak |
intf |
integrated area of filtered peak, always |
maxo |
maximum intensity of original (raw) peak |
maxf |
maximum intensity of filtered peak, always |
getPeaks(object, peakrange, step = 0.1)
Return the data from a single mass scan using the numeric index of the scan as a reference.
object |
the |
scan |
integer index of the scan. If negative, the index is counted from the end |
mzrange |
limit data points returned to those between in the range,
|
A matrix with two columns:
mz |
m/z values |
intensity |
intensity values |
getScan(object, scan, mzrange = numeric())
getMsnScan(object, scan, mzrange = numeric())
Return full-resolution averaged data from multiple mass scans.
object |
the |
... |
arguments passed to |
Based on the mass points from the selected spectra, a master list of unique masses is generated. Every spectrum is interpolated at those masses and the results are then averaged.
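The averaging described above can be sketched in base R as follows. The helper average_spectra() is hypothetical. Linear interpolation with approx() and treating intensities outside a spectrum's own m/z range as 0 are assumptions, since the interpolation method is not specified here.

```r
## Hypothetical helper: each spectrum is a list with numeric vectors `mz`
## and `intensity` (at least two points each, as approx() requires).
average_spectra <- function(spectra) {
    ## Master list of unique masses across all selected spectra.
    mz_all <- sort(unique(unlist(lapply(spectra, `[[`, "mz"))))
    ## Interpolate every spectrum at the master masses (linear; 0 outside
    ## its own m/z range, an assumption).
    ints <- vapply(spectra, function(s)
        approx(s$mz, s$intensity, xout = mz_all, yleft = 0, yright = 0)$y,
        numeric(length(mz_all)))
    ## Average across spectra, one row per master mass.
    cbind(mz = mz_all, intensity = rowMeans(ints))
}

s1 <- list(mz = c(100, 101, 102), intensity = c(10, 20, 10))
s2 <- list(mz = c(100.5, 101.5), intensity = c(30, 30))
res <- average_spectra(list(s1, s2))
res  # mz 100, 100.5, 101, 101.5, 102; intensity 5, 22.5, 25, 22.5, 5
```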
A matrix with two columns:
mz |
m/z values |
intensity |
intensity values |
getSpec(object, ...)
xcmsRaw-class,
profRange,
getScan
Reads the raw data, applies any retention time correction and
Waters lock mass correction, and
returns it as an xcmsRaw object (or list of xcmsRaw
objects) for one or more files of the xcmsSet object.
object |
the |
sampleidx |
The index of the sample for which the raw data should be returned. Can be a single number or a numeric vector with the indices. Alternatively, the file name can be specified. |
profmethod |
The profile method. |
profstep |
The profile step. |
rt |
Whether corrected or raw retention times should be returned. |
... |
Additional arguments submitted to the |
A single xcmsRaw object or a list of xcmsRaw objects.
getXcmsRaw(object, sampleidx=1,
profmethod=profinfo(object)$method, profstep=profinfo(object)$step,
rt=c("corrected", "raw"), ...
)
Johannes Rainer
A number of grouping (or alignment) methods exist in XCMS. group
is the generic method.
object |
|
method |
Method to use for grouping. See details. |
... |
Optional arguments to be passed along |
Different algorithms can be used by specifying them with the
method argument. For example to use the density-based
approach described by Smith et al (2006) one would use:
group(object, method="density"). This is also the default.
Further arguments given by ... are
passed through to the function implementing
the method.
A character vector of nicknames for the
algorithms available is returned by
getOption("BioC")$xcms$group.methods.
If the nickname of a method is called "mzClust",
the help page for that specific method can
be accessed with ?group.mzClust.
An xcmsSet object with peak group assignments and statistics.
group(object, ...)
group.density
group.mzClust
group.nearest
xcmsSet-class,
Group peaks together across samples using overlapping m/z bins and calculation of smoothed peak distributions in chromatographic time.
object |
the |
minfrac |
minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group |
minsamp |
minimum number of samples necessary in at least one of the sample groups for it to be a valid group |
bw |
bandwidth (standard deviation or half width at half maximum) of the Gaussian smoothing kernel to apply to the peak density chromatogram |
mzwid |
width of overlapping m/z slices to use for creating peak density chromatograms and grouping peaks across samples |
max |
maximum number of groups to identify in a single m/z slice |
sleep |
seconds to pause between plotting successive steps of the peak grouping algorithm. Peaks are plotted as points showing relative intensity. Identified groups are flanked by dotted vertical lines. |
An xcmsSet object with peak group assignments and statistics.
group(object, bw = 30, minfrac = 0.5, minsamp = 1,
mzwid = 0.25, max = 50, sleep = 0)
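A simplified, single-slice version of the density grouping can be written with stats::density(). This is a sketch under assumptions rather than the xcms implementation (see do_groupChromPeaks_density). The helper group_slice_density() is hypothetical. Peaks are assigned to the nearest local maximum of the smoothed retention time density, and minfrac is checked over all samples as if they formed a single sample group.

```r
## Hypothetical helper: group the peaks of one m/z slice by retention time.
## Returns, for each peak, the apex rt of its group, or NA if the group is
## found in fewer than minfrac of the samples.
group_slice_density <- function(rt, sample, n_samples, bw = 30,
                                minfrac = 0.5) {
    d <- density(rt, bw = bw, from = min(rt) - 3 * bw,
                 to = max(rt) + 3 * bw, n = 1024)
    ## Local maxima of the smoothed peak density; tiny numerical ripples in
    ## the tails are ignored.
    top <- which(diff(sign(diff(d$y))) == -2) + 1
    top <- top[d$y[top] > 1e-3 * max(d$y)]
    apex <- d$x[top]
    ## Assign every peak to its closest density maximum.
    grp <- vapply(rt, function(x) which.min(abs(apex - x)), integer(1))
    ## A group is valid if its peaks come from >= minfrac of the samples.
    valid <- vapply(seq_along(apex), function(g)
        length(unique(sample[grp == g])) / n_samples >= minfrac,
        logical(1))
    ifelse(valid[grp], apex[grp], NA_real_)
}

rt     <- c(100, 102, 98, 101, 400, 398)
sample <- c(1, 2, 3, 4, 1, 1)
res <- group_slice_density(rt, sample, n_samples = 4, bw = 10)
res  # first four peaks grouped near rt 100; the last two (one sample) dropped
```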
do_groupChromPeaks_density for the core API function
performing the analysis.
xcmsSet-class,
density
Runs high resolution alignment on single spectra samples stored in a given xcmsSet.
object |
an xcmsSet with peaks |
mzppm |
the relative error used for clustering/grouping in ppm (parts per million) |
mzabs |
the absolute error used for clustering/grouping |
minsamp |
set the minimum number of samples in one bin |
minfrac |
set the minimum fraction of each class in one bin |
Returns an xcmsSet with slots groups and groupindex set.
group(object, method="mzClust", mzppm = 20, mzabs = 0, minsamp = 1, minfrac=0)
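The roles of mzppm, mzabs and minsamp can be illustrated with a simple gap-based clustering of sorted masses. This is explicitly not the heuristic of Kazmi et al. implemented by mzClust. The helper cluster_mz() is hypothetical, and the combined tolerance mzabs + mz * mzppm / 1e6 is an assumption.

```r
## Hypothetical helper: sort all peak masses, start a new bin whenever the
## gap to the previous mass exceeds mzabs + mz * mzppm / 1e6, then drop bins
## with peaks from fewer than `minsamp` samples (set to NA).
cluster_mz <- function(mz, sample, mzppm = 20, mzabs = 0, minsamp = 1) {
    o <- order(mz)
    mz_s <- mz[o]
    tol <- mzabs + mz_s[-1] * mzppm / 1e6
    bin_s <- cumsum(c(1, diff(mz_s) > tol))
    bin <- integer(length(mz))
    bin[o] <- bin_s
    ## Number of distinct samples contributing to each bin.
    n_samp <- tapply(sample, bin, function(s) length(unique(s)))
    keep <- as.integer(names(n_samp)[n_samp >= minsamp])
    bin[!(bin %in% keep)] <- NA
    bin
}

mz     <- c(200.0000, 200.0020, 300.0000, 300.0030, 300.0400)
sample <- c(1, 2, 1, 2, 1)
res <- cluster_mz(mz, sample, mzppm = 20, minsamp = 2)
res  # 1 1 2 2 NA
```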
Saira A. Kazmi, Samiran Ghosh, Dong-Guk Shin,
Dennis W. Hill and David F. Grant
Alignment of high resolution mass spectra: development of a heuristic
approach for metabolomics.
Metabolomics, Vol. 2, No. 2, 75-83 (2006)
## Not run:
library(MsDataHub)
mzMLfiles <- c(MsDataHub::HAM004_641fE_14.11.07..Exp1.extracted.mzML(),
               MsDataHub::HAM004_641fE_14.11.07..Exp2.extracted.mzML(),
               MsDataHub::HAM005_641fE_14.11.07..Exp1.extracted.mzML(),
               MsDataHub::HAM005_641fE_14.11.07..Exp2.extracted.mzML())
xs <- xcmsSet(method = "MSW", files = mzMLfiles, scales = c(1, 7),
              SNR.method = "data.mean", winSize.noise = 500,
              peakThr = 80000, amp.Th = 0.005)
xsg <- group(xs, method = "mzClust")
## End(Not run)
Group peaks together across samples by creating a master peak list and assigning corresponding peaks from all samples. The method is inspired by the alignment algorithm of mzMine. For further details see http://mzmine.sourceforge.net/ and
Katajamaa M, Miettinen J, Oresic M: MZmine: Toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics (Oxford, England) 2006, 22:634-636.
Currently, there is no equivalent to minfrac or minsamp.
object |
the |
mzVsRTbalance |
Multiplier applied to the m/z value before calculating the (Euclidean) distance between two peaks. |
mzCheck |
Maximum tolerated distance for mz. |
rtCheck |
Maximum tolerated distance for RT. |
kNN |
Number of nearest neighbours to check |
An xcmsSet object with peak group assignments and statistics.
group(object, method="nearest", mzVsRTbalance=10, mzCheck=0.2, rtCheck=15, kNN=10)
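The distance between two peaks can be sketched as follows. From the parameter descriptions above, the sketch assumes that the m/z difference is scaled by mzVsRTbalance before the Euclidean distance is computed, and that pairs outside mzCheck or rtCheck are never matched. The helper peak_distance() and the master list are hypothetical. The iteration over samples and the kNN search are omitted.

```r
## Hypothetical helper: scaled Euclidean distance between two peaks, or Inf
## if they are outside the tolerated m/z or retention time window.
peak_distance <- function(mz1, rt1, mz2, rt2, mzVsRTbalance = 10,
                          mzCheck = 0.2, rtCheck = 15) {
    dmz <- abs(mz1 - mz2)
    drt <- abs(rt1 - rt2)
    if (dmz > mzCheck || drt > rtCheck)
        return(Inf)
    sqrt((mzVsRTbalance * dmz)^2 + drt^2)
}

## Match a new peak (m/z 150.12, rt 203) to the closest master-list peak.
master <- data.frame(mz = c(150.10, 150.15, 320.40), rt = c(200, 260, 205))
d <- mapply(peak_distance, master$mz, master$rt,
            MoreArgs = list(mz2 = 150.12, rt2 = 203))
d             # about 3.007, Inf, Inf
which.min(d)  # 1: matched to the first master peak
```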
xcmsSet-class,
group.density and
group.mzClust
## Not run:
library(xcms)
library(faahKO)
## These files do not have this problem to correct for,
## but this serves as an example.
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xset <- xcmsSet(cdffiles)
gxset <- group(xset, method = "nearest")
nrow(gxset@groups) == 1096  ## the number of features before minFrac

post.minFrac <- function(object, minFrac = 0.5) {
    ix.minFrac <- sapply(1:length(unique(sampclass(object))),
                         function(x, object, mf) {
        meta <- groups(object)
        minFrac.idx <- numeric(length = nrow(meta))
        idx <- which(meta[, levels(sampclass(object))[x]] >=
                     mf * length(which(levels(sampclass(object))[x] ==
                                       sampclass(object))))
        minFrac.idx[idx] <- 1
        return(minFrac.idx)
    }, object, minFrac)
    ix.minFrac <- as.logical(apply(ix.minFrac, 1, sum))
    ix <- which(ix.minFrac == TRUE)
    return(ix)
}

## Using the above function we can get a post-processing minFrac
idx <- post.minFrac(gxset)
gxset.post <- gxset  ## copy the xcmsSet object
gxset.post@groupidx <- gxset@groupidx[idx]
gxset.post@groups <- gxset@groups[idx, ]
nrow(gxset.post@groups) == 465  ## number of features after minFrac
## End(Not run)
The groupChromPeaks method performs a correspondence analysis i.e., it
groups chromatographic peaks across samples to define the LC-MS features.
The correspondence algorithm can be selected, and configured, using the
param argument. See documentation of XcmsExperiment() and XCMSnExp()
for information on how to access and extract correspondence results.
The correspondence analysis can be performed on chromatographic peaks of
any MS level (if present and if chromatographic peak detection has been
performed for that MS level) defining features combining these peaks. The
MS level can be selected with the parameter msLevel. By default, calling
groupChromPeaks will remove any previous correspondence results. This can
be disabled with add = TRUE, which will add newly defined features to
already present feature definitions.
Supported param objects are:
PeakDensityParam: correspondence using the peak density method
(Smith 2006) that groups chromatographic peaks along the retention time
axis within slices of (partially overlapping) m/z ranges. By default,
these m/z ranges (bins) have a constant size. By setting ppm to a value
larger than 0, m/z dependent bin sizes can be used instead (better
representing the m/z dependent measurement error of some MS instruments).
All peaks (from the same or from different samples) with their apex
position being close on the retention time axis are grouped into a LC-MS
feature. Only samples with non-missing sample group assignment (i.e., for
which the value provided with parameter sampleGroups is different than
NA) are considered and counted for the feature definition. This allows
to exclude certain samples or groups (e.g. blanks) from the feature
definition avoiding thus features with only detected peaks in these. Note
that this affects only the definition of new features.
Chromatographic peaks in these samples will still be assigned to features
which were defined based on the other samples.
See in addition do_groupChromPeaks_density() for the core API
function.
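The effect of NA sample groups on the feature definition can be sketched with a simplified validity check. The helper feature_is_valid() is hypothetical and is not xcms code. It assumes that a candidate feature is kept if, in at least one sample group, the number of samples with a peak reaches minSamples and their proportion reaches minFraction. Samples with an NA group (e.g. blanks) are ignored.

```r
## Hypothetical helper: `peak_samples` are the sample indices of the peaks
## in one candidate feature; `sampleGroups` has one entry per sample.
feature_is_valid <- function(peak_samples, sampleGroups, minFraction = 0.5,
                             minSamples = 1) {
    ok <- !is.na(sampleGroups)
    ## Samples with a peak, ignoring samples without a group assignment.
    present <- unique(peak_samples[ok[peak_samples]])
    grp_sizes <- table(sampleGroups[ok])
    grp_counts <- table(factor(sampleGroups[present],
                               levels = names(grp_sizes)))
    any(grp_counts >= minSamples & grp_counts / grp_sizes >= minFraction)
}

## Samples 1-3 are "KO", 4-6 are "WT", sample 7 is a blank (NA).
sg <- c("KO", "KO", "KO", "WT", "WT", "WT", NA)
feature_is_valid(c(1, 2, 5), sg)  # TRUE: peaks in 2 of 3 KO samples
feature_is_valid(c(7, 7, 4), sg)  # FALSE: blank ignored, only 1 of 3 WT
```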
NearestPeaksParam: performs peak grouping based on the proximity of
chromatographic peaks from different samples in the m/z - rt space similar
to the correspondence method of mzMine (Katajamaa 2006). The method
first creates a master peak list consisting of all chromatographic peaks
from the sample with the most detected peaks and then iteratively calculates
distances to peaks from the sample with the next-highest number of peaks,
grouping peaks together if their distance is smaller than the provided
thresholds.
See in addition do_groupChromPeaks_nearest() for the core API function.
MzClustParam: performs high resolution peak grouping for
single spectrum metabolomics data (Kazmi 2006). This method should
only be used for such data as the retention time is not considered
in the correspondence analysis.
See in addition do_groupPeaks_mzClust() for the core API function.
For specific examples and description of the method and settings see the help pages of the individual parameter classes listed above.
groupChromPeaks(object, param, ...)

## S4 method for signature 'XcmsExperiment,Param'
groupChromPeaks(object, param, msLevel = 1L, add = FALSE)

PeakDensityParam(
  sampleGroups = numeric(),
  bw = 30,
  minFraction = 0.5,
  minSamples = 1,
  binSize = 0.25,
  ppm = 0,
  maxFeatures = 50,
  rtCenterFun = c("median", "mean", "wMean")
)

MzClustParam(
  sampleGroups = numeric(),
  ppm = 20,
  absMz = 0,
  minFraction = 0.5,
  minSamples = 1
)

NearestPeaksParam(
  sampleGroups = numeric(),
  mzVsRtBalance = 10,
  absMz = 0.2,
  absRt = 15,
  kNN = 10
)

## S4 method for signature 'PeakDensityParam'
as.list(x, ...)

## S4 method for signature 'XCMSnExp,PeakDensityParam'
groupChromPeaks(object, param, msLevel = 1L, add = FALSE)

## S4 method for signature 'XCMSnExp,MzClustParam'
groupChromPeaks(object, param, msLevel = 1L)

## S4 method for signature 'XCMSnExp,NearestPeaksParam'
groupChromPeaks(object, param, msLevel = 1L, add = FALSE)
object |
The data object on which the correspondence analysis should be
performed. Can be an |
param |
The parameter object selecting and configuring the algorithm. |
... |
Optional parameters. |
msLevel |
|
add |
|
sampleGroups |
For |
bw |
For |
minFraction |
For |
minSamples |
For |
binSize |
For |
ppm |
For |
maxFeatures |
For |
rtCenterFun |
For |
absMz |
For |
mzVsRtBalance |
For |
absRt |
For |
kNN |
For |
x |
The parameter object. |
For groupChromPeaks: either an XcmsExperiment() or XCMSnExp()
object with the correspondence result.
Colin Smith, Johannes Rainer
Smith, C.A., Want E.J., O'Maille G., Abagyan R., and Siuzdak G. (2006) "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 78:779-787. doi: 10.1021/ac051437y
Katajamaa, M., Miettinen, J., Oresic, M. (2006) "MZmine: Toolbox for processing and visualization of mass spectrometry based molecular profile data". Bioinformatics, 22:634-636. doi: 10.1093/bioinformatics/btk039
Kazmi S. A., Ghosh, S., Shin, D., Hill, D.W., and Grant, D.F. (2006) "Alignment of high resolution mass spectra: development of a heuristic approach for metabolomics". Metabolomics Vol. 2, No. 2, 75-83.
Features from the same originating compound are expected to have similar
intensities across samples. This method thus groups features based on
similarity of abundances (i.e. feature values) across samples in a
data set.
See also MsFeatures::AbundanceSimilarityParam() for additional
information and details.
This help page lists parameters specific for xcms result objects (i.e.
XcmsExperiment() and XCMSnExp() objects). Documentation of the
parameters for the similarity calculation is available in the
MsFeatures::AbundanceSimilarityParam() help page in the MsFeatures
package.
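At its core, abundance similarity is a correlation of feature values across samples. The base-R sketch below computes a Pearson correlation matrix on made-up feature values, handling missing values pairwise. The choice of cor() with use = "pairwise.complete.obs" is an assumption here. The settings actually used are described in MsFeatures::AbundanceSimilarityParam().

```r
## Rows are features, columns are samples; values are made up and f3 has a
## missing value in the last sample.
fvals <- rbind(
    f1 = c(100, 200, 400, 800),
    f2 = c(50, 100, 210, 390),
    f3 = c(300, 310, 290, NA)
)
## Feature-by-feature Pearson correlation, using all complete pairs.
sim <- cor(t(fvals), use = "pairwise.complete.obs")
round(sim, 2)
## Feature pairs whose similarity reaches a threshold of 0.8:
which(sim >= 0.8 & upper.tri(sim), arr.ind = TRUE)  # only f1 with f2
```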
## S4 method for signature 'XcmsResult,AbundanceSimilarityParam'
groupFeatures(
  object,
  param,
  msLevel = 1L,
  method = c("medret", "maxint", "sum"),
  value = "into",
  intensity = "into",
  filled = TRUE,
  ...
)
object |
|
param |
|
msLevel |
|
method |
|
value |
|
intensity |
|
filled |
|
... |
additional parameters passed to the |
input object with feature group definitions added to (or updated
in) a column "feature_group" in its featureDefinitions data frame.
Johannes Rainer
feature-grouping for a general overview.
Other feature grouping methods:
groupFeatures-eic-similarity,
groupFeatures-similar-rtime
library(MsFeatures)
library(MsExperiment)
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Group chromatographic peaks across samples
xodg <- groupChromPeaks(faahko_sub,
                        param = PeakDensityParam(sampleGroups = rep(1, 3)))
## Group features based on correlation of feature values (integrated
## peak area) across samples. Note that there are many missing values
## in the feature values, which influence the grouping of features in the
## present data set.
xodg_grp <- groupFeatures(xodg,
                          param = AbundanceSimilarityParam(threshold = 0.8))
table(featureDefinitions(xodg_grp)$feature_group)
## Group based on the maximal peak intensity per feature
xodg_grp <- groupFeatures(xodg,
    param = AbundanceSimilarityParam(threshold = 0.8, value = "maxo"))
table(featureDefinitions(xodg_grp)$feature_group)
Features originating from the same compound are expected to share their
elution pattern (i.e. chromatographic peak shape).
Thus, this method groups features based on the similarity of their
extracted ion chromatograms (EICs). The similarity calculation is performed
separately for each sample with the similarity score being aggregated across
samples for the final generation of the similarity matrix on which the
grouping (considering parameter threshold) will be performed.
The MSnbase::compareChromatograms() function is used for similarity
calculation which by default calculates the Pearson's correlation
coefficient. The
settings for compareChromatograms() can be specified with parameters
ALIGNFUN, ALIGNFUNARGS, FUN and FUNARGS. ALIGNFUN defaults to
alignRt and is the function used to align the chromatograms
before comparison. For information and parameters of alignRt() see the
documentation for MSnbase::Chromatogram().
ALIGNFUNARGS allows specifying additional arguments for the
ALIGNFUN function. It defaults to
ALIGNFUNARGS = list(tolerance = 0, method = "closest") which ensures that
data points from the same spectrum (scan, i.e. with the same retention time)
are compared between the EICs from the same sample. Parameter FUN defines
the function to calculate the similarity score and defaults to FUN = cor
and FUNARGS allows passing additional arguments to this function (defaults
to FUNARGS = list(use = "pairwise.complete.obs")). See also
MSnbase::compareChromatograms() for more information.
The grouping of features based on the EIC similarity matrix is performed
with the function specified with parameter groupFun which defaults to
groupFun = groupSimilarityMatrix which groups all rows (features) in the
similarity matrix with a similarity score larger than threshold into the
same cluster. This creates clusters of features in which all features
have a similarity score >= threshold with any other feature in that
cluster. See MsFeatures::groupSimilarityMatrix() for details.
Additional parameters to that function can be passed with the ... argument.
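The clustering property described above is that every feature in a cluster has a similarity >= threshold with every other feature in it. Complete-linkage hierarchical clustering also guarantees this property when the tree, built on the distance 1 - similarity, is cut at height 1 - threshold. The sketch below uses that base-R substitute. It is not MsFeatures::groupSimilarityMatrix() and may form different clusters in ambiguous cases. The helper group_by_similarity() is hypothetical.

```r
## Hypothetical helper: complete linkage on 1 - similarity, cut at
## 1 - threshold, so all pairs within a cluster have similarity >= threshold.
group_by_similarity <- function(sim, threshold = 0.9) {
    hc <- hclust(as.dist(1 - sim), method = "complete")
    cutree(hc, h = 1 - threshold)
}

ids <- paste0("FT", 1:4)
sim <- matrix(c(1.00, 0.95, 0.92, 0.10,
                0.95, 1.00, 0.85, 0.20,
                0.92, 0.85, 1.00, 0.15,
                0.10, 0.20, 0.15, 1.00), nrow = 4,
              dimnames = list(ids, ids))
res <- group_by_similarity(sim, threshold = 0.9)
res  # FT1 1, FT2 1, FT3 2, FT4 3
```

FT3 is not added to the cluster of FT1 and FT2 although its similarity with FT1 is 0.92, because its similarity with FT2 (0.85) is below the threshold.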
This feature grouping should be called after an initial feature
grouping by retention time (see MsFeatures::SimilarRtimeParam()).
The feature groups defined in column "feature_group" of
featureDefinitions(object) (for
features matching msLevel) will be used and refined by this method.
Features with a value of NA in featureDefinitions(object)$feature_group
will be skipped/not considered for feature grouping.
EicSimilarityParam(
  threshold = 0.9,
  n = 1,
  onlyPeak = TRUE,
  value = c("maxo", "into"),
  groupFun = groupSimilarityMatrix,
  ALIGNFUN = alignRt,
  ALIGNFUNARGS = list(tolerance = 0, method = "closest"),
  FUN = cor,
  FUNARGS = list(use = "pairwise.complete.obs"),
  ...
)

## S4 method for signature 'XcmsResult,EicSimilarityParam'
groupFeatures(object, param, msLevel = 1L)
threshold |
|
n |
|
onlyPeak |
|
value |
|
groupFun |
|
ALIGNFUN |
|
ALIGNFUNARGS |
named |
FUN |
|
FUNARGS |
named |
... |
for |
object |
|
param |
|
msLevel |
|
input object with feature groups added (i.e. in column
"feature_group" of its featureDefinitions data frame).
At present the featureChromatograms() function is used to extract the
EICs for each feature, which does however use one m/z and rt range for
each feature and the EICs do thus not exactly represent the identified
chromatographic peaks of each sample (i.e. their specific m/z and
retention time ranges).
Although this grouping can be performed on the full data set without prior
feature grouping, this is not recommended for the following reasons: I) the
selection of the top n samples with the highest signal for the
feature group will be biased by very abundant compounds, as this is
performed on the full data set (i.e. the samples with the highest overall
intensities are used for correlation of all features), and II) it is
computationally much more expensive because pairwise correlations between
all features have to be calculated.
It is also suggested to perform the correlation on a subset of samples
per feature with the highest intensities of the peaks (for that feature)
although it would also be possible to run the correlation on all samples by
setting n equal to the total number of samples in the data set. EIC
correlation should however be performed ideally on samples in which the
original compound is highly abundant to avoid correlation of missing values
or noisy peak shapes as much as possible.
By default also the signal which is outside identified chromatographic peaks is excluded from the correlation.
Johannes Rainer
feature-grouping for a general overview.
Other feature grouping methods:
groupFeatures-abundance-correlation,
groupFeatures-similar-rtime
library(MsFeatures)
library(MsExperiment)
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Group chromatographic peaks across samples
xodg <- groupChromPeaks(faahko_sub, param = PeakDensityParam(sampleGroups = rep(1, 3)))
## Performing a feature grouping based on EIC similarities on a single
## sample
xodg_grp <- groupFeatures(xodg, param = EicSimilarityParam(n = 1))
table(featureDefinitions(xodg_grp)$feature_group)
## Usually it is better to perform this correlation on pre-grouped features
## e.g. based on similar retention time.
xodg_grp <- groupFeatures(xodg, param = SimilarRtimeParam(diffRt = 4))
xodg_grp <- groupFeatures(xodg_grp, param = EicSimilarityParam(n = 1))
table(featureDefinitions(xodg_grp)$feature_group)
Group features based on similar retention time. This method is supposed to be
used as an initial crude grouping of features based on the median retention
time of all their chromatographic peaks. All features with a difference in
their retention time which is <= parameter diffRt of the parameter object
are grouped together. If a column "feature_group" is already present in
featureDefinitions(), the existing groups are further sub-grouped by this method.
See MsFeatures::SimilarRtimeParam() in MsFeatures for more details.
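As a rough illustration of grouping by retention time difference, the hypothetical helper below sorts the features' retention times and starts a new group whenever the gap to the previous feature exceeds diffRt. This chaining rule is an assumption chosen for illustration. It resembles the larger groups produced with groupFun = MsCoreUtils::group in this page's example, but it is not the default grouping function of SimilarRtimeParam, and it ignores any existing "feature_group" column.

```r
## Hypothetical helper, not part of xcms: chain-groups retention times whose
## consecutive (sorted) differences are <= diffRt.
groupRtSketch <- function(rt, diffRt) {
    ord <- order(rt)
    ## A gap larger than diffRt between neighboring sorted values opens a new group
    grpSorted <- cumsum(c(1L, as.integer(diff(rt[ord]) > diffRt)))
    ## Map the group ids back to the original order of the features
    grp <- integer(length(rt))
    grp[ord] <- grpSorted[seq_along(ord)]
    grp
}

groupRtSketch(c(100, 101.5, 103, 200, 201), diffRt = 2)
```

Features at 100 and 103 seconds differ by more than diffRt, yet they land in the same group through the feature at 101.5 seconds. This chaining is what makes such groups larger than those of a rule that compares each feature against a group's closest members.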
## S4 method for signature 'XcmsResult,SimilarRtimeParam'
groupFeatures(object, param, msLevel = 1L, ...)
object |
|
param |
|
msLevel |
|
... |
passed to the |
The input object with feature groups added (i.e. in column
"feature_group" of its featureDefinitions data frame).
Johannes Rainer
Other feature grouping methods:
groupFeatures-abundance-correlation,
groupFeatures-eic-similarity
library(MsFeatures)
library(MsExperiment)
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Group chromatographic peaks across samples
xodg <- groupChromPeaks(faahko_sub, param = PeakDensityParam(sampleGroups = rep(1, 3)))
## Group features based on similar retention time (i.e. difference <= 2 seconds)
xodg_grp <- groupFeatures(xodg, param = SimilarRtimeParam(diffRt = 2))
## Feature groups are added to the featureDefinitions in column "feature_group"
head(featureDefinitions(xodg_grp)$feature_group)
table(featureDefinitions(xodg_grp)$feature_group)
length(unique(featureDefinitions(xodg_grp)$feature_group))
## Using an alternative grouping method that creates larger groups
xodg_grp <- groupFeatures(xodg, param = SimilarRtimeParam(diffRt = 2, groupFun = MsCoreUtils::group))
length(unique(featureDefinitions(xodg_grp)$feature_group))
Allow linking of peak group data between classes using unique group names that remain the same as long as no re-grouping occurs.
object |
the |
mzdec |
number of decimal places to use for m/z |
rtdec |
number of decimal places to use for retention time |
template |
a character vector with existing group names whose format should be emulated |
A character vector with unique names for each peak group in the
object. The format is M[m/z]T[time in seconds].
(object, mzdec = 0, rtdec = 0, template = NULL)
(object)
groupnames generates names for the features identified in the
correspondence analysis based on their m/z and retention time. The
generated feature names are equivalent to the group names of the old
user interface (aka xcms1).
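The M(m/z)T(time in seconds) naming scheme can be sketched with base R string formatting. The helper below is hypothetical: it assumes the m/z and retention time values are rounded to mzdec and rtdec decimal places, and, unlike a real implementation, it does not make duplicated names unique.

```r
## Hypothetical helper, not part of xcms: builds "M<m/z>T<rt>" names with the
## given number of decimal places. Duplicated names are NOT made unique here.
groupNamesSketch <- function(mz, rt, mzdec = 0, rtdec = 0) {
    paste0("M", formatC(mz, format = "f", digits = mzdec),
           "T", formatC(rt, format = "f", digits = rtdec))
}

groupNamesSketch(c(335.1, 300.46), c(2785.6, 3000.2), mzdec = 1)
```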
## S4 method for signature 'XCMSnExp'
groupnames(object, mzdec = 0, rtdec = 0, template = NULL)
object |
|
mzdec |
|
rtdec |
|
template |
|
character with unique names for each feature in object. The
format is M(m/z)T(time in seconds).
groupOverlaps identifies overlapping ranges in the input data, defined by
xmin and xmax, and returns the indices of overlapping ranges as groups.
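The grouping of overlapping ranges can be sketched with a single pass over the ranges sorted by their start. The helper below is a hypothetical illustration, not the xcms code. It assumes that ranges touching at an endpoint count as overlapping, and it returns sorted indices within each group; the order used by groupOverlaps may differ.

```r
## Hypothetical helper, not part of xcms: groups ranges [xmin, xmax] that
## overlap directly or through a chain of other ranges.
groupOverlapsSketch <- function(xmin, xmax) {
    stopifnot(length(xmin) == length(xmax))
    if (!length(xmin))
        return(list())
    ord <- order(xmin)
    groups <- list()
    current <- ord[1L]
    currentMax <- xmax[ord[1L]]
    for (i in ord[-1L]) {
        if (xmin[i] <= currentMax) {
            ## Range i starts before the current group ends: extend the group
            current <- c(current, i)
            currentMax <- max(currentMax, xmax[i])
        } else {
            ## Gap found: close the current group and start a new one
            groups[[length(groups) + 1L]] <- sort(current)
            current <- i
            currentMax <- xmax[i]
        }
    }
    groups[[length(groups) + 1L]] <- sort(current)
    groups
}

groupOverlapsSketch(c(2, 12, 34.2, 12.4), c(3, 16, 35, 36))
```

With the values from this page's example, range 1 stays alone, while ranges 2, 3 and 4 form one group: range 4 (12.4 to 36) overlaps both range 2 and range 3.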
groupOverlaps(xmin, xmax)
xmin |
|
xmax |
|
list with the indices of grouped elements.
Johannes Rainer
x <- c(2, 12, 34.2, 12.4)
y <- c(3, 16, 35, 36)
groupOverlaps(x, y)
Generate a matrix of peak values with rows for every group and
columns for every sample. The value included in the matrix can
be any of the columns from the xcmsSet peaks slot
matrix. Collisions where more than one peak from a single sample
are in the same group get resolved with one of several user-selectable
methods.
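To make the "maxint" style of conflict resolution concrete, the hypothetical base R helper below builds a group-by-sample matrix from a toy peak table with columns group, sample, rt and into. For each group/sample pair, it keeps the value of the peak with the highest intensity. This is a simplified sketch under these assumptions, not the xcms implementation: the "medret" method is not covered, and samples without any peak get no column.

```r
## Hypothetical helper, not part of xcms: group x sample matrix where
## collisions are resolved by keeping the peak with the highest intensity.
groupvalMaxintSketch <- function(peaks, value = "into", intensity = "into") {
    groups <- sort(unique(peaks$group))
    samples <- sort(unique(peaks$sample))
    res <- matrix(NA_real_, nrow = length(groups), ncol = length(samples),
                  dimnames = list(as.character(groups), as.character(samples)))
    ## Row indices of the peaks of each non-empty group/sample combination
    idx <- split(seq_len(nrow(peaks)), list(peaks$group, peaks$sample), drop = TRUE)
    for (k in idx) {
        best <- k[which.max(peaks[[intensity]][k])]
        res[as.character(peaks$group[best]),
            as.character(peaks$sample[best])] <- peaks[[value]][best]
    }
    res
}

## Toy peak table: group 1 has two peaks in sample 1 (a collision)
pks <- data.frame(group = c(1, 1, 1, 2), sample = c(1, 1, 2, 2),
                  rt = c(10, 12, 11, 30), into = c(100, 250, 80, 40))
groupvalMaxintSketch(pks, value = "rt", intensity = "into")
```

For group 1 in sample 1, the peak with into = 250 wins, so its retention time (12) is reported. Group 2 has no peak in sample 1 and gets NA, matching the documented behavior for missing peaks.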
object |
the |
method |
conflict resolution method, |
value |
name of peak column to enter into returned matrix, or |
intensity |
if |
A matrix with rows for every group and columns for every
sample. Missing peaks have NA values.
groupval(object, method = c("medret", "maxint"),
value = "index", intensity = "into")
The highlightChromPeaks() function adds chromatographic
peak definitions to an existing plot, such as one created by the
plot() method on a MSnbase::Chromatogram() or
MSnbase::MChromatograms() object.
highlightChromPeaks(
  x,
  rt,
  mz,
  peakIds = character(),
  border = rep("00000040", length(fileNames(x))),
  lwd = 1,
  col = NA,
  type = c("rect", "point", "polygon"),
  whichPeaks = c("any", "within", "apex_within"),
  ...
)
x |
For |
rt |
For |
mz |
|
peakIds |
|
border |
colors to be used to color the border of the rectangles/peaks.
Has to be equal to the number of samples in |
lwd |
|
col |
For |
type |
the plotting type. See |
whichPeaks |
|
... |
additional parameters to the |
Johannes Rainer
## Load a test data set with detected peaks
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Disable parallel processing for this example
register(SerialParam())
## Extract the ion chromatogram for one chromatographic peak in the data.
chrs <- chromatogram(faahko_sub, rt = c(2700, 2900), mz = 335)
plot(chrs)
## Extract chromatographic peaks for the mz/rt range (if any).
chromPeaks(faahko_sub, rt = c(2700, 2900), mz = 335)
## Highlight the chromatographic peaks in the area
## Show the peak definition with a rectangle
highlightChromPeaks(faahko_sub, rt = c(2700, 2900), mz = 335)
## Color the actual peak
highlightChromPeaks(faahko_sub, rt = c(2700, 2900), mz = 335,
    col = c("#ff000020", "#00ff0020"), type = "polygon")
Create a log intensity false-color image of an xcmsRaw object plotted with m/z and retention time axes.
x |
xcmsRaw object |
col |
vector of colors to use for the image |
... |
arguments for |
image(x, col = rainbow(256), ...)
Colin A. Smith, [email protected]
This function provides missing value imputation based on linear
interpolation and resembles some of the functionality of the
profBinLin() and profBinLinBase() functions deprecated from
version 1.51 on.
imputeLinInterpol(
  x,
  baseValue,
  method = "lin",
  distance = 1L,
  noInterpolAtEnds = FALSE
)
x |
A numeric vector with eventual missing ( |
baseValue |
The base value to which empty elements should be set. This
is only considered for |
method |
One of |
distance |
For |
noInterpolAtEnds |
For |
Values for NAs in input vector x can be imputed using methods
"lin" and "linbase":
`method = "lin"` uses simple linear interpolation to derive a value for an empty element in the input vector `x` from its neighboring non-empty elements. This method is equivalent to the linear interpolation in the `profBinLin` method. Whether interpolation is performed for missing values at the beginning and end of `x` can be set with argument `noInterpolAtEnds`. By default, interpolation is also performed at the ends, interpolating from `0` at the beginning and towards `0` at the end. With `noInterpolAtEnds = TRUE`, no interpolation is performed at either end, and missing values at the beginning and/or the end of `x` are replaced with `0`.
`method = "linbase"` uses linear interpolation to impute values for empty elements within a user-definable proximity to non-empty elements and sets the element's value to `baseValue` otherwise. The default for `baseValue` is half of the smallest value in `x` (`NA`s being removed). Whether linear interpolation is performed for a missing value depends on the `distance` argument: interpolation is only performed if one of the `distance` closest neighbors of the current empty element has a value other than `NA`. No interpolation takes place for `distance = 0`, while `distance = 1` means that the value of an empty element is interpolated from directly adjacent non-empty elements; if the direct neighbors of the current empty element are also `NA`, its value is set to `baseValue`. This corresponds to the linear interpolation performed by the `profBinLinBase` method. For more details see the examples below.
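The `method = "lin"` behavior can be approximated with `stats::approx()`. In the hypothetical helper below, the placement of the virtual zeros used for interpolating at the ends (one position before the first and one after the last element) is an assumption, so results at the ends may differ from `imputeLinInterpol`. The `noInterpolAtEnds = TRUE` branch needs at least two non-missing values.

```r
## Hypothetical helper, not part of xcms: linear interpolation of NAs with
## stats::approx(), mimicking the behavior described for method = "lin".
imputeLinSketch <- function(x, noInterpolAtEnds = FALSE) {
    n <- length(x)
    if (noInterpolAtEnds) {
        pos <- seq_len(n)
        ok <- !is.na(x)
        ## rule = 1 returns NA outside the observed range; those leading and
        ## trailing NAs are then set to 0 (needs >= 2 non-missing values)
        res <- approx(pos[ok], x[ok], xout = pos, rule = 1)$y
        res[is.na(res)] <- 0
        return(res)
    }
    ## Assumption: virtual 0 values one position before the first and one
    ## after the last element, so the ends are interpolated from/towards 0
    xp <- c(0, x, 0)
    pos <- seq_along(xp)
    ok <- !is.na(xp)
    approx(pos[ok], xp[ok], xout = pos)$y[seq_len(n) + 1L]
}

imputeLinSketch(c(3, NA, 1, 2, NA, NA, 4))
imputeLinSketch(c(NA, 2, 1, 4, NA))
imputeLinSketch(c(NA, 2, 1, 4, NA), noInterpolAtEnds = TRUE)
```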
A numeric vector with empty values imputed based on the selected
method.
Johannes Rainer
#######
## Impute missing values by linearly interpolating from neighboring
## non-empty elements
x <- c(3, NA, 1, 2, NA, NA, 4, NA, NA, NA, 3, NA, NA, NA, NA, 2)
imputeLinInterpol(x, method = "lin")
## visualize the interpolation:
plot(x = 1:length(x), y = x)
points(x = 1:length(x), y = imputeLinInterpol(x, method = "lin"), type = "l", col = "grey")
## If the first or last elements are NA, interpolation is performed from 0
## to the first non-empty element.
x <- c(NA, 2, 1, 4, NA)
imputeLinInterpol(x, method = "lin")
## visualize the interpolation:
plot(x = 1:length(x), y = x)
points(x = 1:length(x), y = imputeLinInterpol(x, method = "lin"), type = "l", col = "grey")
## If noInterpolAtEnds is TRUE no interpolation is performed at both ends
imputeLinInterpol(x, method = "lin", noInterpolAtEnds = TRUE)
######
## method = "linbase"
## "linbase" performs imputation by interpolation for empty elements based on
## 'distance' adjacent non-empty elements, setting all remaining empty elements
## to the baseValue
x <- c(3, NA, 1, 2, NA, NA, 4, NA, NA, NA, 3, NA, NA, NA, NA, 2)
## Setting distance = 0 skips imputation by linear interpolation
imputeLinInterpol(x, method = "linbase", distance = 0)
## With distance = 1 for all empty elements next to a non-empty element the value
## is imputed by linear interpolation.
xInt <- imputeLinInterpol(x, method = "linbase", distance = 1L)
xInt
plot(x = 1:length(x), y = x, ylim = c(0, max(x, na.rm = TRUE)))
points(x = 1:length(x), y = xInt, type = "l", col = "grey")
## Setting distance = 2L would cause that for all empty elements for which the
## distance to the next non-empty element is <= 2 the value is imputed by
## linear interpolation:
xInt <- imputeLinInterpol(x, method = "linbase", distance = 2L)
xInt
plot(x = 1:length(x), y = x, ylim = c(0, max(x, na.rm = TRUE)))
points(x = 1:length(x), y = xInt, type = "l", col = "grey")
imputeRowMin imputes missing values in x by replacing NAs in each row
with a proportion of the minimal value for that row (i.e.
min_fraction * min(x[i, ])).
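The row-wise rule min_fraction * min(x[i, ]) can be written as a short loop. The helper below is a hypothetical sketch of that rule, not the xcms code. Leaving rows that contain only missing values unchanged is an assumption of this sketch.

```r
## Hypothetical helper, not part of xcms: replaces NAs in each row with
## min_fraction times the smallest observed value of that row.
imputeRowMinSketch <- function(x, min_fraction = 1/2) {
    for (i in seq_len(nrow(x))) {
        na <- is.na(x[i, ])
        ## Rows without missing values or without any observed value are left as-is
        if (any(na) && !all(na))
            x[i, na] <- min(x[i, ], na.rm = TRUE) * min_fraction
    }
    x
}

m <- rbind(c(4, NA, 8), c(2, 6, NA))
imputeRowMinSketch(m)
```

The missing value in the first row becomes 2 (half of 4) and the one in the second row becomes 1 (half of 2).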
imputeRowMin(x, min_fraction = 1/2)
x |
|
min_fraction |
|
Johannes Rainer
imputeLCMD package for more left censored imputation functions.
Other imputation functions:
imputeRowMinRand()
library(MSnbase)
library(faahKO)
data("faahko")
xset <- group(faahko)
mat <- groupval(xset, value = "into")
mat_imp <- imputeRowMin(mat)
head(mat)
head(mat_imp)
## Replace with 1/8 of the row minimum
head(imputeRowMin(mat, min_fraction = 1/8))
Replace missing values with random numbers.
When using the method = "mean_sd", random numbers will be generated
from a normal distribution based
on (a fraction of) the row min and a standard deviation estimated from the
linear relationship between row standard deviation and mean of the full data
set. Parameter sd_fraction allows to further reduce the estimated
standard deviation.
When using method = "from_to", random numbers between two specific values
are generated.
imputeRowMinRand(
  x,
  method = c("mean_sd", "from_to"),
  min_fraction = 1/2,
  min_fraction_from = 1/1000,
  sd_fraction = 1,
  abs = TRUE
)
x |
|
method |
method |
min_fraction |
|
min_fraction_from |
|
sd_fraction |
|
abs |
|
For method mean_sd, imputed
values are taken from a normal distribution with mean being a
user defined fraction of the row minimum and the standard deviation
estimated for that mean based on the linear relationship between row
standard deviations and row means in the full matrix x.
To largely avoid imputed values being negative or larger than the real
values, the standard deviation for the random number generation is estimated
ignoring the intercept of the linear model estimating the relationship
between standard deviation and mean. If abs = TRUE, NA values are
replaced with the absolute value of the random values.
For method from_to, imputed values are drawn at random between two user-defined fractions of the row minimum (min_fraction_from and min_fraction).
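The mean_sd procedure described above can be sketched in base R: fit the linear relationship between row standard deviations and row means, then draw replacement values around a fraction of each row minimum using the slope alone (the intercept is ignored). The helper below is a hypothetical illustration of that description only. Clamping a negative standard deviation to 0 is an added assumption, and the xcms implementation may differ in its details.

```r
## Hypothetical helper, not part of xcms: "mean_sd"-style random imputation.
imputeRowMinRandSketch <- function(x, min_fraction = 1/2, sd_fraction = 1,
                                   abs = TRUE) {
    ## Row means and standard deviations of the observed values
    mns <- rowMeans(x, na.rm = TRUE)
    sds <- apply(x, 1, sd, na.rm = TRUE)
    ## Slope of the sd ~ mean relationship; the intercept is ignored
    slope <- unname(coef(lm(sds ~ mns))["mns"])
    for (i in seq_len(nrow(x))) {
        na <- is.na(x[i, ])
        if (!any(na) || all(na))
            next
        mn <- min(x[i, ], na.rm = TRUE) * min_fraction
        ## sd estimated for this mean, clamped at 0 in case of a negative slope
        sdi <- max(slope * mn * sd_fraction, 0)
        vals <- rnorm(sum(na), mean = mn, sd = sdi)
        x[i, na] <- if (abs) base::abs(vals) else vals
    }
    x
}

set.seed(123)
m <- rbind(c(10, 12, NA, 11), c(100, 120, 110, NA), c(50, 55, 60, 52))
imputeRowMinRandSketch(m)
```

Because the standard deviation scales with the (reduced) row minimum, imputed values for low-abundance rows stay close to zero, while abs = TRUE prevents negative intensities.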
Johannes Rainer, Mar Garcia-Aloy
imputeLCMD package for more left censored imputation functions.
Other imputation functions:
imputeRowMin()
library(faahKO)
library(MSnbase)
data("faahko")
xset <- group(faahko)
mat <- groupval(xset, value = "into")
## Estimate the relationship between row sd and mean. The standard deviation
## of the random distribution is estimated on this relationship.
mns <- rowMeans(mat, na.rm = TRUE)
sds <- apply(mat, MARGIN = 1, sd, na.rm = TRUE)
plot(mns, sds)
abline(lm(sds ~ mns))
mat_imp_meansd <- imputeRowMinRand(mat, method = "mean_sd")
mat_imp_fromto <- imputeRowMinRand(mat, method = "from_to")
head(mat)
head(mat_imp_meansd)
head(mat_imp_fromto)
isolationWindowTargetMz extracts the isolation window target m/z definition
for each spectrum in object.
## S4 method for signature 'OnDiskMSnExp'
isolationWindowTargetMz(object)
object |
MSnbase::OnDiskMSnExp object. |
a numeric of length equal to the number of spectra in object with
the isolation window target m/z or NA if not specified/available.
Johannes Rainer
Create an image of the raw (profile) data m/z against retention time, with the intensity color coded.
x |
xcmsRaw object. |
log |
Whether the intensity should be log transformed. |
col.regions |
The color ramp that should be used for encoding of the intensity. |
rt |
whether the original ( |
... |
Arguments for |
levelplot(x, log=TRUE, col.regions=colorRampPalette(brewer.pal(9,
"YlOrRd"))(256), ...)
levelplot(x, log=TRUE, col.regions=colorRampPalette(brewer.pal(9,
"YlOrRd"))(256), rt="raw", ...)
Johannes Rainer, [email protected]
This function extracts the raw data that will be used to create an
xcmsRaw object. Further processing of the data is
done in the xcmsRaw constructor.
object |
Specification of a data source (such as a file name or database query) |
The implementing methods decide how to gather the data.
A list containing elements describing the data source. The rt,
scanindex, tic, and acquisitionNum components
each have one entry per scan. They are parallel in the sense that
rt[1], scanindex[1], and acquisitionNum[1] all
refer to the same scan. The list contains the following components:
rt |
Numeric vector with acquisition time (in seconds) for each scan |
tic |
Numeric vector with Total Ion Count for each scan |
scanindex |
Integer vector with starting positions of each scan in the |
mz |
Concatenated vector of m/z values for all scans |
intensity |
Concatenated vector of intensity values for all scans |
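Because mz and intensity are concatenated across scans, the values of a single scan are located through scanindex. The hypothetical helper below works on a toy list with the components listed above. It assumes, as is the case for xcmsRaw objects, that scanindex stores 0-based offsets into the concatenated vectors. If an implementation stores 1-based positions instead, the + 1L must be dropped.

```r
## Hypothetical helper, not part of xcms: extracts the m/z and intensity
## values of one scan from a loadRaw-style list. Assumes 0-based scanindex.
getScanSketch <- function(raw, scan) {
    ## 0-based offset -> 1-based index of the scan's first value
    from <- raw$scanindex[scan] + 1L
    ## The scan ends right before the next scan starts (or at the end of mz)
    to <- if (scan < length(raw$scanindex)) raw$scanindex[scan + 1L] else length(raw$mz)
    idx <- if (to >= from) from:to else integer(0)
    cbind(mz = raw$mz[idx], intensity = raw$intensity[idx])
}

## Toy data: 3 scans, the second one is empty
raw <- list(rt = c(1.0, 2.0, 3.0), tic = c(30, 0, 120),
            scanindex = c(0L, 2L, 2L),
            mz = c(100.1, 101.2, 200.3, 201.4, 202.5),
            intensity = c(10, 20, 30, 40, 50))
getScanSketch(raw, 3)
```

Summing the intensity column of a scan reproduces its tic entry in this toy example, which illustrates the parallel layout of the per-scan vectors.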
signature(object = "xcmsSource"): Uses loadRaw,xcmsSource-method to extract raw data.
Subclasses of xcmsSource can provide different
ways of fetching data.
Daniel Hackney, [email protected]
Data sets with xcms preprocessing results are provided within the xcms
package and can be loaded with the loadXcmsData function. The available
test data sets are:
xdata: an XCMSnExp() object with the results from an xcms-based
pre-processing of an LC-MS untargeted metabolomics data set. The raw data
files are provided in the faahKO R package.
xmse: an XcmsExperiment() object with the results from an xcms-based
pre-processing of an LC-MS untargeted metabolomics data set (same original
data set and pre-processing settings as for the xdata data set).
The pre-processing of this data set is described in detail in the xcms
vignette of the xcms package.
faahko_sub: an XCMSnExp() object with identified
chromatographic peaks in 3 samples from the data files in the faahKO
R package.
faahko_sub2: an XcmsExperiment() object with identified
chromatographic peaks in 3 samples from the data files in the faahKO
R package.
Data sets can also be loaded using data(), which would however require
updating the objects to point to the location of the raw data files. The
loadXcmsData function loads the data and ensures that all paths are updated
accordingly.
loadXcmsData(x = c("xmse", "xdata", "faahko_sub", "faahko_sub2"))
x |
For |
library(xcms)
xdata <- loadXcmsData()
The manualChromPeaks function allows to manually define chromatographic
peaks, integrate the intensities within the specified peak area and add
them to the object's chromPeaks matrix. A peak is not added for a sample
if no signal was found in the respective data file.
Because chromatographic peaks are added to any previously identified
peaks, it is suggested to run refineChromPeaks() with the
MergeNeighboringPeaksParam() approach to merge potentially overlapping
peaks.
The manualFeatures function allows to manually group identified
chromatographic peaks into features by providing their index in the
object's chromPeaks matrix.
manualChromPeaks(object, ...)
manualFeatures(object, ...)
## S4 method for signature 'MsExperiment'
manualChromPeaks(
  object,
  chromPeaks = matrix(numeric()),
  samples = seq_along(object),
  msLevel = 1L,
  chunkSize = 2L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XcmsExperiment'
manualChromPeaks(
  object,
  chromPeaks = matrix(numeric()),
  samples = seq_along(object),
  msLevel = 1L,
  chunkSize = 2L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XcmsExperiment'
manualFeatures(object, peakIdx = list(), msLevel = 1L)
## S4 method for signature 'OnDiskMSnExp'
manualChromPeaks(
  object,
  chromPeaks = matrix(),
  samples = seq_along(fileNames(object)),
  msLevel = 1L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XCMSnExp'
manualChromPeaks(
  object,
  chromPeaks = matrix(),
  samples = seq_along(fileNames(object)),
  msLevel = 1L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XCMSnExp'
manualFeatures(object, peakIdx = list(), msLevel = 1L)
object |
XcmsExperiment, XCMSnExp or MSnbase::OnDiskMSnExp object. |
... |
ignored. |
chromPeaks |
For |
samples |
For |
msLevel |
|
chunkSize |
|
BPPARAM |
parallel processing settings (see |
peakIdx |
For |
XcmsExperiment or XCMSnExp with the manually added
chromatographic peaks or features.
Johannes Rainer
## Read a test dataset.
library(MsDataHub)
fls <- MsDataHub::PestMix1_DDA.mzML()
## Define a data frame with some sample annotations
ann <- data.frame(
    injection_index = 1,
    sample_id = c("Pest_mix"))
## Import the data
library(MsExperiment)
mse <- readMsExperiment(fls)
## Define some arbitrary peak areas
pks <- cbind(
    mzmin = c(512, 234.3),
    mzmax = c(513, 235),
    rtmin = c(10, 33),
    rtmax = c(19, 50)
)
pks
res <- manualChromPeaks(mse, pks)
chromPeaks(res)
## Peaks were only found in the second file.
For each element in a matrix, replace it with the median of the values around it.
medianFilter(x, mrad, nrad)
x |
numeric matrix to median filter |
mrad |
number of rows on either side of the value to use for median calculation |
nrad |
number of columns on either side of the value to use for median calculation |
A matrix whose values have been median filtered
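A plain R version of the median filter helps to show what mrad and nrad control. The helper below is a hypothetical sketch, not the compiled xcms routine. It assumes that mrad spans rows, nrad spans columns, and that the window is simply truncated at the matrix borders; border handling in medianFilter may differ.

```r
## Hypothetical helper, not part of xcms: 2D median filter with a
## (2 * mrad + 1) x (2 * nrad + 1) window, truncated at the borders.
medianFilterSketch <- function(x, mrad, nrad) {
    res <- x
    storage.mode(res) <- "double"
    nr <- nrow(x)
    nc <- ncol(x)
    for (i in seq_len(nr)) {
        for (j in seq_len(nc)) {
            rows <- max(1, i - mrad):min(nr, i + mrad)
            cols <- max(1, j - nrad):min(nc, j + nrad)
            res[i, j] <- median(x[rows, cols])
        }
    }
    res
}

mat <- matrix(1:25, nrow = 5)
medianFilterSketch(mat, 1, 1)
```

For the central element, the 3 x 3 window holds 7, 8, 9, 12, 13, 14, 17, 18, 19, so its median is 13. In the top-left corner, only 1, 2, 6 and 7 fall inside the truncated window, giving 4.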
Colin A. Smith, [email protected]
mat <- matrix(1:25, nrow=5)
mat
medianFilter(mat, 1, 1)
The MS2 and MSn data are stored in separate slots
and cannot be used directly by e.g. findPeaks().
msn2xcmsRaw() will copy the MSn spectra
into the "normal" xcmsRaw slots.
msn2xcmsRaw(xmsn)
xmsn |
an object of class |
The default gap value is determined from the 90th percentile of the pair-wise differences between adjacent mass values.
An xcmsRaw object
Steffen Neumann [email protected]
library(MsDataHub)
msnfile <- MsDataHub::PestMix1_DDA.mzML()
xrmsn <- xcmsRaw(msnfile, includeMSn=TRUE)
xr <- msn2xcmsRaw(xrmsn)
overlappingFeatures identifies features that are overlapping or close in
the m/z - rt space.
overlappingFeatures(x, expandMz = 0, expandRt = 0, ppm = 0)
x |
|
expandMz |
|
expandRt |
|
ppm |
|
list with indices of features (in featureDefinitions()) that
are overlapping.
Johannes Rainer
## Load a test data set with detected peaks
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Disable parallel processing for this example
register(SerialParam())
## Correspondence analysis
xdata <- groupChromPeaks(faahko_sub, param = PeakDensityParam(sampleGroups = c(1, 1, 1)))
## Identify overlapping features
overlappingFeatures(xdata)
## Identify features that are separated on retention time by less than
## 2 minutes
overlappingFeatures(xdata, expandRt = 60)
Plot extracted ion chromatograms for many peaks simultaneously, indicating peak integration start and end points with vertical grey lines.
object |
the |
peaks |
matrix with peak information as produced by |
figs |
two-element vector describing the number of rows and the number of columns of peaks to plot, if missing then an approximately square grid that will fit the number of peaks supplied |
width |
width of chromatogram retention time to plot for each peak |
This function is intended to help graphically analyze the results of peak picking. It can help estimate the number of false positives and improper integration start and end points. Its output is very compact and tries to waste as little space as possible. Each plot is labeled with rounded m/z and retention time separated by a space.
signature(object = "xcmsSet")
plotPeaks(object, peaks, figs, width = 200)
xcmsRaw-class,
findPeaks,
split.screen
peaksWithCentWave identifies (chromatographic) peaks in purely
chromatographic data, i.e. based on intensity and retention time values
without m/z values.
peaksWithCentWave(
  int,
  rt,
  peakwidth = c(20, 50),
  snthresh = 10,
  prefilter = c(3, 100),
  integrate = 1,
  fitgauss = FALSE,
  noise = 0,
  verboseColumns = FALSE,
  firstBaselineCheck = TRUE,
  extendLengthMSW = FALSE,
  ...
)
int |
|
rt |
|
peakwidth |
|
snthresh |
|
prefilter |
|
integrate |
|
fitgauss |
|
noise |
|
verboseColumns |
|
firstBaselineCheck |
|
extendLengthMSW |
|
... |
currently ignored. |
The method uses the same peak detection algorithm as centWave, but
employs a different approach to identify the initial regions in
which the peak detection is performed (i.e. the regions of interest, ROIs).
The method first identifies all local maxima in the chromatographic data and
defines the corresponding positions +/- peakwidth[2] as the ROIs. Noise
estimation is also based on these ROIs and can thus differ from centWave,
resulting in different signal-to-noise ratios.
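The ROI definition can be illustrated with a few lines of base R: find the local maxima in the intensity vector and widen each apex by peakwidth[2] on both sides. The helper below is a hypothetical simplification. It assumes that the positions are retention times in seconds and that a local maximum is a value larger than its left neighbor and not smaller than its right neighbor. It does not reproduce the actual peaksWithCentWave code, which may, for example, merge or limit ROIs.

```r
## Hypothetical helper, not part of xcms: local maxima of a chromatogram and
## the retention time windows apex +/- peakwidth[2] around them.
localMaxRoisSketch <- function(int, rt, peakwidth = c(20, 50)) {
    n <- length(int)
    i <- seq_len(max(n - 2L, 0L)) + 1L  # interior positions 2..(n - 1)
    ## Local maximum: larger than the left neighbor, not smaller than the right one
    isMax <- int[i] > int[i - 1L] & int[i] >= int[i + 1L]
    apex <- i[which(isMax)]
    cbind(apex_rt = rt[apex],
          rtmin = rt[apex] - peakwidth[2],
          rtmax = rt[apex] + peakwidth[2])
}

int <- c(0, 5, 2, 1, 8, 3, 0)
rt <- seq(0, 60, by = 10)
localMaxRoisSketch(int, rt)
```

Both local maxima (at 10 and 40 seconds) yield ROIs that are 100 seconds wide and overlap. This shows why the noise estimate derived from such regions can differ from the one used by centWave.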
A matrix, each row representing an identified chromatographic peak, with columns:
"rt": retention time of the peak's midpoint (time of the maximum signal).
"rtmin": minimum retention time of the peak.
"rtmax": maximum retention time of the peak.
"into": integrated (original) intensity of the peak.
"intb": per-peak baseline corrected integrated peak intensity.
"maxo": maximum (original) intensity of the peak.
"sn": signal to noise ratio of the peak defined as
(maxo - baseline)/sd with sd being the standard deviation of the local
chromatographic noise.
Additional columns for verboseColumns = TRUE:
"mu": gaussian parameter mu.
"sigma": gaussian parameter sigma.
"h": gaussian parameter h.
"f": region number of the m/z ROI where the peak was localized.
"dppm": m/z deviation of mass trace across scans in ppm (always NA).
"scale": scale on which the peak was localized.
"scpos": peak position found by wavelet analysis (index in int).
"scmin": left peak limit found by wavelet analysis (index in int).
"scmax": right peak limit found by wavelet analysis (index in int).
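The `"sn"` column defined above is simple arithmetic. The following minimal sketch computes it from given numbers. How xcms estimates the baseline and the noise standard deviation is not reproduced here. Both values are passed in directly, and the numbers are made up.

```r
## Minimal sketch of the "sn" definition: (maxo - baseline) / sd.
## The baseline and noise sd estimation of xcms is NOT reproduced here.
snRatio <- function(maxo, baseline, noise_sd) {
    (maxo - baseline) / noise_sd
}

snRatio(maxo = 1200, baseline = 200, noise_sd = 50)  # 20

## Same ratio with baseline and sd taken from a (made-up) noise region
noise <- c(190, 210, 195, 205, 200)
snRatio(maxo = 1200, baseline = mean(noise), noise_sd = sd(noise))
```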
Johannes Rainer
centWave for a detailed description of the peak detection method.
Other peak detection functions for chromatographic data:
peaksWithMatchedFilter()
## Reading a file
library(MsExperiment)
library(xcms)
od <- readMsExperiment(system.file("cdf/KO/ko15.CDF", package = "faahKO"))
## Extract chromatographic data for a small m/z range
mzr <- c(272.1, 272.2)
chr <- chromatogram(od, mz = mzr, rt = c(3000, 3300))[1, 1]
int <- intensity(chr)
rt <- rtime(chr)
## Plot the region
plot(chr, type = "h")
## Identify peaks in the chromatographic data
pks <- peaksWithCentWave(intensity(chr), rtime(chr))
pks
## Highlight the peaks
rect(xleft = pks[, "rtmin"], xright = pks[, "rtmax"],
     ybottom = rep(0, nrow(pks)), ytop = pks[, "maxo"],
     col = "#ff000040", border = "#00000040")
The function performs peak detection using the matchedFilter algorithm on chromatographic data (i.e. with only intensities and retention time).
peaksWithMatchedFilter( int, rt, fwhm = 30, sigma = fwhm/2.3548, max = 20, snthresh = 10, ... )
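The default `sigma = fwhm/2.3548` comes from the relation between the full width at half maximum of a Gaussian and its standard deviation: fwhm = 2 * sqrt(2 * log(2)) * sigma. The sketch below shows this conversion and, for illustration only, smooths an intensity vector with a normalized Gaussian kernel. This smoothing is an assumption for demonstration and is *not* the exact filter used by matchedFilter. `gaussSmooth()` is a hypothetical helper whose `sigma` is expressed in scans rather than in retention time units.

```r
## fwhm = 2 * sqrt(2 * log(2)) * sigma, hence the default sigma = fwhm/2.3548
fwhmToSigma <- function(fwhm) fwhm / (2 * sqrt(2 * log(2)))

## Illustration only: smooth intensities with a normalized Gaussian kernel
## (sigma in scans). NOT the exact filter applied by matchedFilter.
gaussSmooth <- function(int, sigma) {
    half <- ceiling(3 * sigma)
    k <- dnorm(-half:half, sd = sigma)
    k <- k / sum(k)                               # kernel sums to 1
    as.numeric(stats::filter(int, k, sides = 2))  # NA at both ends
}

fwhmToSigma(30)
int <- c(rep(0, 20), 10, rep(0, 20))
sm <- gaussSmooth(int, sigma = 2)
which.max(sm)  # the apex stays at index 21
```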
int |
|
rt |
|
fwhm |
|
sigma |
|
max |
|
snthresh |
|
... |
currently ignored. |
A matrix, each row representing an identified chromatographic peak, with columns:
"rt": retention time of the peak's midpoint (time of the maximum signal).
"rtmin": minimum retention time of the peak.
"rtmax": maximum retention time of the peak.
"into": integrated (original) intensity of the peak.
"intf": integrated intensity of the filtered peak.
"maxo": maximum (original) intensity of the peak.
"maxf": maximum intensity of the filtered peak.
"sn": signal to noise ratio of the peak.
Johannes Rainer
matchedFilter for a detailed description of the peak detection method.
Other peak detection functions for chromatographic data:
peaksWithCentWave()
## Load the test file
faahko_sub <- loadXcmsData("faahko_sub")
## Subset to one file and drop identified chromatographic peaks
data <- dropChromPeaks(filterFile(faahko_sub, 1))
## Extract chromatographic data for a small m/z range
chr <- chromatogram(data, mz = c(272.1, 272.3), rt = c(3000, 3200))[1, 1]
pks <- peaksWithMatchedFilter(intensity(chr), rtime(chr))
pks
## Plotting the data
plot(rtime(chr), intensity(chr), type = "h")
## Draw one rectangle per detected peak
rect(xleft = pks[, "rtmin"], xright = pks[, "rtmax"],
     ybottom = rep(0, nrow(pks)), ytop = pks[, "maxo"], border = "red")
Create a report showing all aligned peaks.
object |
the |
filebase |
base file name to save report, |
... |
arguments passed down to |
This method handles creation of summary reports similar to
diffreport. It returns a summary report that can
optionally be written out to a tab-separated file.
If a base file name is provided, the report (see Value section) will be saved to a tab separated file.
A data frame with the following columns:
mz |
median m/z of peaks in the group |
mzmin |
minimum m/z of peaks in the group |
mzmax |
maximum m/z of peaks in the group |
rt |
median retention time of peaks in the group |
rtmin |
minimum retention time of peaks in the group |
rtmax |
maximum retention time of peaks in the group |
npeaks |
number of peaks assigned to the group |
Sample Classes |
number of samples from each sample class represented in the group |
... |
one column for every sample class |
Sample Names |
integrated intensity value for every sample |
... |
one column for every sample |
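As a rough illustration of how the group-level columns above relate to the peaks assigned to a group, here is a base R sketch built on made-up data. `summarizeGroup()` is hypothetical and is not the code used by `peakTable`. The per-sample intensity columns are omitted.

```r
## Hypothetical peaks assigned to one peak group (made-up values)
pks <- data.frame(
    mz     = c(300.10, 300.12, 300.11),
    rt     = c(3101, 3105, 3098),
    sample = c("ko15", "ko16", "wt15"),
    class  = c("KO", "KO", "WT")
)

## Sketch: derive the group-level columns described above from its peaks
summarizeGroup <- function(pks) {
    classes <- sort(unique(pks$class))
    ## number of distinct samples per sample class
    perClass <- vapply(classes, function(cl)
        length(unique(pks$sample[pks$class == cl])), integer(1))
    cbind(data.frame(mz = median(pks$mz), mzmin = min(pks$mz), mzmax = max(pks$mz),
                     rt = median(pks$rt), rtmin = min(pks$rt), rtmax = max(pks$rt),
                     npeaks = nrow(pks)),
          as.data.frame(as.list(perClass)))
}

res <- summarizeGroup(pks)
res
```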
peakTable(object, filebase = character(), ...)
## Not run: 
library(faahKO)
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xs <- xcmsSet(cdffiles)
xs <- group(xs)
peakTable(xs, filebase = "peakList")
## End(Not run)
The PercentMissingFilter class and method enable users to filter features
from an XcmsExperiment or SummarizedExperiment object based on the
percentage (values from 1 to 100) of missing values of each feature in
the different sample groups, using a user-provided threshold.
This filter is part of the possible dispatch of the generic function
filterFeatures. Features with a percentage of missing values higher (>)
than the threshold in all sample groups are removed (i.e.
features for which the percentage of missing values is lower than or equal to (<=) the
threshold in at least one sample group are retained).
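The retention rule above can be sketched in a few lines of base R. This is not the xcms implementation. `keepByPercentMissing()` is hypothetical, and it assumes that missing values are encoded as `NA` in a feature-by-sample matrix, with `f` giving the sample group of each column.

```r
## Sketch of the retention rule (not the xcms code).
## x: feature-by-sample matrix, NA = missing; f: sample group per column.
keepByPercentMissing <- function(x, f, threshold = 30) {
    f <- factor(f)
    ## percentage of missing values per feature (rows) and group (columns)
    pct <- vapply(levels(f), function(g) {
        cols <- which(f == g)
        100 * rowSums(is.na(x[, cols, drop = FALSE])) / length(cols)
    }, numeric(nrow(x)))
    pct <- matrix(pct, nrow = nrow(x))
    ## keep a feature if it passes (<= threshold) in at least one group
    rowSums(pct <= threshold) > 0
}

x <- rbind(FT1 = c(1, NA, NA, 4, 5, 6),
           FT2 = c(NA, NA, NA, 4, NA, NA),
           FT3 = c(1, 2, 3, 4, 5, 6))
f <- c("A", "A", "A", "B", "B", "B")
keepByPercentMissing(x, f, threshold = 30)  # keeps FT1 and FT3, drops FT2
```

FT1 is retained because group B has no missing values, even though 67% of its values are missing in group A.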
PercentMissingFilter(threshold = 30, f = factor())
## S4 method for signature 'XcmsResult,PercentMissingFilter'
filterFeatures(object, filter, ...)
## S4 method for signature 'SummarizedExperiment,PercentMissingFilter'
filterFeatures(object, filter, assay = 1)
threshold |
|
f |
|
object |
|
filter |
The parameter object selecting and configuring the type of
filtering. It can be one of the following classes: |
... |
Optional parameters. For |
assay |
For filtering of |
For PercentMissingFilter: a PercentMissingFilter object.
filterFeatures returns the input object without the features that did not meet
the user-provided threshold.
Philippine Louail
Other Filter features in xcms:
BlankFlag,
DratioFilter,
RsdFilter
The phenoDataFromPaths function builds a data.frame
representing the experimental design from the folder structure in which
the files of the experiment are located.
phenoDataFromPaths(paths)
paths |
|
This function is used by the old xcmsSet function to guess
the experimental design (i.e. group assignment of the files) from the
folders in which the files of the experiment can be found.
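The idea of deriving the design from the folder structure can be shown with a simplified base R sketch. `groupsFromPaths()` is hypothetical. It assumes that the folder directly containing each file names its sample group, and `phenoDataFromPaths()` may handle nested folder structures differently.

```r
## Simplified sketch (assumption): the folder directly containing a file is
## its sample group. groupsFromPaths() is hypothetical, not an xcms function.
groupsFromPaths <- function(paths) {
    data.frame(
        sample_name  = sub("\\.[^.]*$", "", basename(paths)),  # drop extension
        sample_group = basename(dirname(paths)),
        stringsAsFactors = FALSE
    )
}

paths <- c("/data/cdf/KO/ko15.CDF", "/data/cdf/KO/ko16.CDF",
           "/data/cdf/WT/wt15.CDF")
pd <- groupsFromPaths(paths)
pd
```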
## List the files available in the faahKO package
base_dir <- system.file("cdf", package = "faahKO")
cdf_files <- list.files(base_dir, recursive = TRUE, full.names = TRUE)
## Derive the experimental design from the folder structure
phenoDataFromPaths(cdf_files)
Batch plot a list of extracted ion chromatograms to the current graphics device.
x |
the |
y |
optional |
groupidx |
either character vector with names or integer vector with indices of peak groups for which to plot EICs |
sampleidx |
either character vector with names or integer vector with indices of samples for which to plot EICs |
rtrange |
a two-column matrix with minimum and maximum retention times between which to return EIC data points if it has the same number of rows as the number of groups in the
it may also be a single number specifying the time window around the peak for which to plot EIC data |
col |
color to use for plotting extracted ion chromatograms. if missing
and if it is the same length as the number of groups in the |
legtext |
text to use for legend. if |
peakint |
logical, plot integrated peak area with darkened lines (requires
that |
sleep |
seconds to pause between plotting EICs |
... |
other graphical parameters |
An xcmsSet object.
plot.xcmsEIC(x, y, groupidx = groupnames(x), sampleidx = sampnames(x), rtrange = x@rtrange,
col = rep(1, length(sampleidx)), legtext = NULL, peakint = TRUE, sleep = 0, ...)
Colin A. Smith
xcmsEIC-class,
png,
pdf,
postscript,
The plotAdjustedRtime function plots the difference between the adjusted
and raw retention times on the y-axis against the raw retention times on
the x-axis. Each line represents the results for one sample (file).
If alignment was performed using the peak groups method (see
adjustRtime() for more information), the peak groups used in the
alignment are also visualized.
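The quantity on the y-axis is simply the adjusted minus the raw retention time for each spectrum of a sample. The base R sketch below uses made-up retention times for two hypothetical samples and draws one line per sample, similar in spirit to plotAdjustedRtime. It is not that function's implementation, and `rtDifference()` is a hypothetical helper.

```r
## Sketch with made-up numbers: y = adjusted rt - raw rt, x = raw rt
rtDifference <- function(raw, adjusted) {
    stopifnot(length(raw) == length(adjusted))
    adjusted - raw
}

raw <- list(s1 = c(2500, 3000, 3500, 4000),
            s2 = c(2500, 3000, 3500, 4000))
adjusted <- list(s1 = c(2502, 3004, 3505, 4001),
                 s2 = c(2498, 2997, 3499, 4002))
diffs <- Map(rtDifference, raw, adjusted)

## One line per sample (only drawn in an interactive session)
if (interactive()) {
    plot(NA, xlim = range(unlist(raw)), ylim = range(unlist(diffs)),
         xlab = "raw rt", ylab = "adjusted rt - raw rt")
    for (i in seq_along(raw)) lines(raw[[i]], diffs[[i]], col = i)
}
```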
plotAdjustedRtime( object, col = "#00000080", lty = 1, lwd = 1, type = "l", adjustedRtime = TRUE, xlab = ifelse(adjustedRtime, yes = expression(rt[adj]), no = expression(rt[raw])), ylab = expression(rt[adj] - rt[raw]), peakGroupsCol = "#00000060", peakGroupsPch = 16, peakGroupsLty = 3, ylim, ... )
object |
A |
col |
color(s) for the individual lines. Has to be of length 1 or equal to the number of samples. |
lty |
line type for the lines of the individual samples. |
lwd |
line width for the lines of the individual samples. |
type |
plot type (see |
adjustedRtime |
|
xlab |
the label for the x-axis. |
ylab |
the label for the y-axis. |
peakGroupsCol |
color to be used for the peak groups (only if alignment was performed using the peak groups method). |
peakGroupsPch |
point character ( |
peakGroupsLty |
line type ( |
ylim |
optional |
... |
Additional arguments to be passed down to the |
Johannes Rainer
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Performing the peak grouping using the "peak density" method.
p <- PeakDensityParam(sampleGroups = c(1, 1, 1))
res <- groupChromPeaks(faahko_sub, param = p)
## Perform the retention time adjustment using peak groups found in all
## three files.
fgp <- PeakGroupsParam(minFraction = 1)
res <- adjustRtime(res, param = fgp)
## Visualize the impact of the alignment.
plotAdjustedRtime(res, adjustedRtime = FALSE)
grid()
Uses the pre-generated profile mode matrix to plot averaged or base peak extracted ion chromatograms over a specified mass range.
object |
the |
base |
logical, plot a base-peak chromatogram |
ident |
logical, use mouse to identify and label peaks |
fitgauss |
logical, fit a gaussian to the largest peak |
vline |
numeric vector with locations of vertical lines |
... |
arguments passed to |
If ident == TRUE, an integer vector with the indices of
the points that were identified. If fitgauss == TRUE, a
nls model with the fitted gaussian. Otherwise a two-column
matrix with the plotted points.
plotChrom(object, base = FALSE, ident = FALSE,
fitgauss = FALSE, vline = numeric(0), ...)
plotChromatogramsOverlay draws chromatographic peak data from multiple (different)
extracted ion chromatograms (EICs) into the same plot. This allows a direct
comparison of the peak shapes of these EICs within the same sample. In
contrast to the plot function for MSnbase::MChromatograms() objects,
which draws the data from the same EIC across multiple samples into the
same plot, this function draws the different EICs from the same sample
into the same plot.
If plotChromatogramsOverlay is called on an XChromatograms object, any
chromatographic peaks present are also highlighted/drawn depending on the
parameters peakType, peakCol, peakBg and peakPch (see also help on
the plot function for XChromatogram() objects for details).
## S4 method for signature 'MChromatograms'
plotChromatogramsOverlay( object, col = "#00000060", type = "l", main = NULL, xlab = "rtime", ylab = "intensity", xlim = numeric(), ylim = numeric(), stacked = 0, transform = identity, ... )
## S4 method for signature 'XChromatograms'
plotChromatogramsOverlay( object, col = "#00000060", type = "l", main = NULL, xlab = "rtime", ylab = "intensity", xlim = numeric(), ylim = numeric(), peakType = c("polygon", "point", "rectangle", "none"), peakBg = NULL, peakCol = NULL, peakPch = 1, stacked = 0, transform = identity, ... )
object |
|
col |
definition of the color in which the chromatograms should be
drawn. Can be of length 1 or equal to |
type |
|
main |
optional title of the plot. If not defined, the range of m/z values is used. |
xlab |
|
ylab |
|
xlim |
optional |
ylim |
optional |
stacked |
|
transform |
|
... |
optional arguments to be passed to the plotting functions (see
help on the base R |
peakType |
if |
peakBg |
if |
peakCol |
if |
peakPch |
if |
Silently returns a list (length equal to ncol(object)) of
numeric vectors (length equal to nrow(object)) with the y position of
each EIC.
Johannes Rainer
## Load preprocessed data and extract EICs for some features.
library(xcms)
library(MSnbase)
xdata <- loadXcmsData()
data(xdata)
## Update the path to the files for the local system
dirname(xdata) <- c(rep(system.file("cdf", "KO", package = "faahKO"), 4),
                    rep(system.file("cdf", "WT", package = "faahKO"), 4))
## Subset to the first 3 files.
xdata <- filterFile(xdata, 1:3, keepFeatures = TRUE)
## Define features for which to extract EICs
fts <- c("FT097", "FT163", "FT165")
chrs <- featureChromatograms(xdata, features = fts)
plotChromatogramsOverlay(chrs)
## plot the overlay of EICs in the first sample
plotChromatogramsOverlay(chrs[, 1])
## Define a different color for each feature (row in chrs). By default, all
## chromatographic peaks of a feature are also labeled in the same color.
plotChromatogramsOverlay(chrs[, 1],
    col = c("#ff000040", "#00ff0040", "#0000ff40"))
## Alternatively, we can define a color for each individual chromatographic
## peak and provide this with the `peakBg` and `peakCol` parameters.
chromPeaks(chrs[, 1])
## Use a color for each of the two identified peaks in that sample
plotChromatogramsOverlay(chrs[, 1],
    col = c("#ff000040", "#00ff0040", "#0000ff40"),
    peakBg = c("#ffff0020", "#00ffff20"))
## Plotting the data in all samples.
plotChromatogramsOverlay(chrs,
    col = c("#ff000040", "#00ff0040", "#0000ff40"))
## Creating a "stacked" EIC plot: the EICs are placed along the y-axis
## relative to their m/z value. With `stacked = 1` the y-axis is split in
## half, the lower half being used for the stacking of the EICs, the upper
## half being used for the *original* intensity axis.
res <- plotChromatogramsOverlay(chrs[, 1], stacked = 1,
    col = c("#ff000040", "#00ff0040", "#0000ff40"))
## add horizontal lines for the m/z values of each EIC
abline(h = res[[1]], col = "grey", lty = 2)
## Note that this type of visualization is different from the conventional
## plot function for chromatographic data, which draws the EICs for
## multiple samples into the same plot
plot(chrs)
## Converting the object to an MChromatograms without detected peaks
chrs <- as(chrs, "MChromatograms")
plotChromatogramsOverlay(chrs,
    col = c("#ff000040", "#00ff0040", "#0000ff40"))
Plot the density of chromatographic peaks along the retention
time axis and indicate which peaks would be (or were) grouped into the
same feature using the peak density correspondence method.
Settings for the peak density method can be passed with a
PeakDensityParam object to parameter param. If the object contains
correspondence results and the correspondence was performed with the
peak density method, the results from that correspondence can be
visualized by setting simulate = FALSE.
## S4 method for signature 'XCMSnExp'
plotChromPeakDensity( object, mz, rt, param, simulate = TRUE, col = "#00000080", xlab = "retention time", ylab = "sample", xlim = range(rt), main = NULL, type = c("any", "within", "apex_within"), ... )
object |
A XCMSnExp object with identified chromatographic peaks. |
mz |
|
rt |
|
param |
PeakDensityParam from which parameters for the
peak density correspondence algorithm can be extracted. If not provided
and if |
simulate |
|
col |
Color to be used for the individual samples. Length has to be 1
or equal to the number of samples in |
xlab |
|
ylab |
|
xlim |
|
main |
|
type |
|
... |
Additional parameters to be passed to the |
The plotChromPeakDensity function allows evaluating
different settings of the peak density method on an m/z slice of
interest (e.g. one containing chromatographic peaks corresponding to a known
metabolite).
The plot shows the individual peaks that were detected within the
specified m/z slice at their retention time (x-axis) and the sample in
which they were detected (y-axis). The density function is plotted as a
black line. Parameters for the density function are taken from the
param object. Grey rectangles indicate which chromatographic peaks
would be grouped into a feature by the peak density correspondence
method. Parameters for the algorithm are also taken from param.
See groupChromPeaks() for more information about the
algorithm and its supported settings.
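The core idea (a density estimate of peak retention times whose separate modes suggest separate features) can be sketched in base R. This is a deliberately simplified illustration that splits the density at its local minima. It is not the xcms peak density algorithm, which also takes sample groups and criteria such as minFraction into account (see groupChromPeaks()). `groupByDensity()` is hypothetical, and its `bw` argument is meant to mirror the bandwidth setting of PeakDensityParam.

```r
## Simplified sketch of density-based grouping along retention time.
## Not the xcms algorithm; groupByDensity() is hypothetical.
groupByDensity <- function(rt, bw = 30, n = 512) {
    d <- density(rt, bw = bw, n = n)
    y <- d$y
    m <- length(y)
    ## local minima of the density separate neighbouring groups
    mins <- which(y[2:(m - 1)] < y[1:(m - 2)] & y[2:(m - 1)] <= y[3:m]) + 1L
    ## assign each peak to the density segment it falls into
    findInterval(rt, d$x[mins]) + 1L
}

## Retention times of peaks from several samples in one m/z slice (made up)
rt <- c(3000, 3004, 3010, 3100, 3105, 3111)
groupByDensity(rt, bw = 30)
```

A larger bandwidth merges nearby modes into a single group, which is the kind of effect plotChromPeakDensity helps to assess visually.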
The function is called for its side effect, i.e. to create a plot.
Johannes Rainer
groupChromPeaks() for details on the
peak density correspondence method and supported settings.
## Load a test data set with detected peaks
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")
## Plot the chromatographic peak density for a specific mz range to evaluate
## different peak density correspondence settings.
mzr <- c(305.05, 305.15)
plotChromPeakDensity(faahko_sub, mz = mzr, pch = 16,
    param = PeakDensityParam(sampleGroups = rep(1, length(fileNames(faahko_sub)))))
plotChromPeaks plots the identified chromatographic
peaks from one file into the plane spanned by the retention time (x-axis)
and m/z (y-axis) dimension. Each chromatographic peak is plotted as a
rectangle representing its width in RT and m/z dimension.
plotChromPeakImage plots the number of detected peaks for
each sample along the retention time axis as an image plot, i.e.
with the number of peaks detected in each bin along the retention time
represented with the color of the respective cell.
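The values shown by such an image plot are counts of peaks per sample and retention time bin. The base R sketch below computes them for made-up peak retention times. `peakCountsPerBin()` is hypothetical, and the exact bin boundaries used by plotChromPeakImage may differ.

```r
## Sketch (not the xcms code): count peaks per sample in retention time
## bins of width binSize. peakCountsPerBin() is hypothetical.
peakCountsPerBin <- function(rt, sample, binSize = 30) {
    brks <- seq(floor(min(rt) / binSize) * binSize,
                ceiling(max(rt) / binSize) * binSize + binSize,
                by = binSize)
    bin <- findInterval(rt, brks)  # bin i covers [brks[i], brks[i + 1])
    table(sample = sample,
          bin = factor(bin, levels = seq_len(length(brks) - 1)))
}

rt <- c(3001, 3010, 3040, 3002, 3065)
sample <- c(1, 1, 1, 2, 2)
counts <- peakCountsPerBin(rt, sample, binSize = 30)
counts
## Bins along the x-axis, samples along the y-axis (interactive only)
if (interactive()) image(t(unclass(counts)), xlab = "rt bin", ylab = "sample")
```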
plotChromPeaks( x, file = 1, xlim = NULL, ylim = NULL, add = FALSE, border = "#00000060", col = NA, xlab = "retention time", ylab = "mz", main = NULL, msLevel = 1L, ... )
plotChromPeakImage( x, binSize = 30, xlim = NULL, log = FALSE, xlab = "retention time", yaxt = par("yaxt"), main = "Chromatographic peak counts", msLevel = 1L, ... )
x |
A |
file |
For |
xlim |
|
ylim |
For |
add |
For |
border |
For |
col |
For |
xlab |
|
ylab |
For |
main |
|
msLevel |
|
... |
Additional arguments passed to the |
binSize |
For |
log |
For |
yaxt |
For |
The width and line type of the rectangles indicating the detected
chromatographic peaks for the plotChromPeaks function can be
specified using the par function, i.e. with par(lwd = 3)
and par(lty = 2), respectively.
Johannes Rainer
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## plotChromPeakImage: plot an image for the identified peaks per file
plotChromPeakImage(faahko_sub)
## Show all detected chromatographic peaks from the first file
plotChromPeaks(faahko_sub)
## Plot all detected peaks from the second file and restrict the plot to a
## mz-rt slice
plotChromPeaks(faahko_sub, file = 2, xlim = c(3500, 3600), ylim = c(400, 600))
Plot an extracted ion chromatogram for m/z values of interest. In contrast to plotChrom, which uses data from the profile matrix, this function uses the raw data.
object |
|
mzrange |
m/z range for EIC. Uses the full m/z range by default. |
rtrange |
retention time range for EIC. Uses the full retention time range by default. |
scanrange |
scan range for EIC |
mzdec |
Number of decimal places of title m/z values in the eic plot. |
type |
Specifies how the data should be plotted (by default as a line). |
add |
If the EIC should be added to an existing plot. |
... |
Additional parameters passed to the plotting function
(e.g. |
A two-column matrix with the plotted points.
plotEIC(object, mzrange = numeric(), rtrange = numeric(),
scanrange = numeric(), mzdec=2, type="l", add=FALSE, ...)
Ralf Tautenhahn
plotFeatureGroups() visualizes defined feature groups in the m/z by
retention time space. Features are indicated by points with features from
the same feature group being connected by a line. See
MsFeatures::featureGroups() for details on and options for
feature grouping.
plotFeatureGroups( x, xlim = numeric(), ylim = numeric(), xlab = "retention time", ylab = "m/z", pch = 4, col = "#00000060", type = "o", main = "Feature groups", featureGroups = character(), ... )
x |
XcmsExperiment or |
xlim |
|
ylim |
|
xlab |
|
ylab |
|
pch |
the plotting character. Defaults to |
col |
color to be used to draw the features. At present only a single color is supported. |
type |
plotting type (see |
main |
|
featureGroups |
optional |
... |
additional parameters to be passed to the |
Johannes Rainer
UPDATE: please use plot() from the MsExperiment or
plot(x, type = "XIC") from the MSnbase package instead. See examples
in the vignette for more information.
plotMsData creates a plot that combines a (base peak)
extracted ion chromatogram on top (rt against intensity) with a plot of
rt against m/z values at the bottom.
plotMsData( x, main = "", cex = 1, mfrow = c(2, 1), grid.color = "lightgrey", colramp = colorRampPalette(rev(brewer.pal(9, "YlGnBu"))) )
x |
|
main |
|
cex |
|
mfrow |
|
grid.color |
a color definition for the grid line (or |
colramp |
a color ramp palette to be used to color the data points
based on their intensity. See argument |
Johannes Rainer
Plot extracted ion chromatograms for many peaks simultaneously, indicating peak integration start and end points with vertical grey lines.
object |
the |
peaks |
matrix with peak information as produced by |
figs |
two-element vector describing the number of rows and the number of columns of peaks to plot, if missing then an approximately square grid that will fit the number of peaks supplied |
width |
width of chromatogram retention time to plot for each peak |
This function is intended to help graphically analyze the results of peak picking. It can help estimate the number of false positives and improper integration start and end points. Its output is very compact and tries to waste as little space as possible. Each plot is labeled with rounded m/z and retention time separated by a space.
plotPeaks(object, peaks, figs, width = 200)
xcmsRaw-class,
findPeaks,
split.screen
Simple visualization of the positions of the fragment spectra's precursor ions in the MS1 retention time by m/z plane.
plotPrecursorIons( x, pch = 21, col = "#00000080", bg = "#00000020", xlab = "retention time", ylab = "m/z", main = character(), ... )
x |
|
pch |
|
col |
the color to be used for all data points. Defines the border
color if |
bg |
the background color (if |
xlab |
|
ylab |
|
main |
Optional |
... |
additional parameters to be passed to the |
Johannes Rainer
## Load a test data file with DDA LC-MS/MS data
library(MsExperiment)
library(MsDataHub)
fl <- MsDataHub::PestMix1_DDA.mzML()
pest_dda <- readMsExperiment(fl)
plotPrecursorIons(pest_dda)
grid()
## Subset the data object to plot the data specifically for one or
## selected file/sample:
plotPrecursorIons(pest_dda[1L])
Use "democracy" to determine the average m/z and RT deviations for a grouped xcmsSet, and their dependency on the sample or on the absolute m/z.
plotQC() is a wrapper to create a set of diagnostic plots.
For the m/z deviations, the median of all m/z values within one group is taken as the reference value.
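The "democracy" reference can be illustrated with base R. Each peak's m/z is compared with the median m/z of its group, both in absolute units and in ppm. This is a sketch on made-up values, not the plotQC code, and `mzDeviation()` is a hypothetical helper.

```r
## Sketch: deviation of each peak's m/z from the median m/z of its group.
## mzDeviation() is hypothetical; values below are made up.
mzDeviation <- function(mz, group) {
    ref <- ave(mz, group, FUN = median)  # group median, repeated per peak
    data.frame(group = group, mz = mz,
               dev = mz - ref,
               ppm = (mz - ref) / ref * 1e6)
}

mz <- c(300.10, 300.12, 300.11, 450.20, 450.26)
group <- c(1, 1, 1, 2, 2)
dev <- mzDeviation(mz, group)
dev
```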
plotQC(object, sampNames, sampColors, sampOrder, what)
object |
A grouped |
sampNames |
Override sample names (e.g. with simplified names) |
sampColors |
Provide a set of colors (default: monochrome ?) |
sampOrder |
Override the order of samples, e.g. to bring them in order of measurement to detect time drift |
what |
A vector of which QC plots to generate:
"mzdevhist": histogram of m/z deviations. Should be Gaussian-shaped; if it is multimodal, some peaks seem to have a systematically higher m/z deviation.
"rtdevhist": histogram of RT deviations. Should be Gaussian-shaped; if it is multimodal, some peaks seem to have a systematically higher RT deviation.
"mzdevmass": shows whether m/z deviations depend on the absolute m/z, which could indicate miscalibration.
"mzdevtime": shows whether m/z deviations depend on RT, which could indicate instrument drift.
"mzdevsample": median m/z deviation for each sample, indicates outliers.
"rtdevsample": median RT deviation for each sample, indicates outliers. |
List with four matrices, each of dimension features * samples:
"mz": median mz deviation for each sample
"mzdev": median mz deviation for each sample
"rt": median RT deviation for each sample
"rtdev": median RT deviation for each sample
Michael Wenk
library(faahKO)
xsg <- group(faahko)
plotQC(xsg, what = "mzdevhist")
plotQC(xsg, what = "rtdevhist")
plotQC(xsg, what = "mzdevmass")
plotQC(xsg, what = "mzdevtime")
plotQC(xsg, what = "mzdevsample")
plotQC(xsg, what = "rtdevsample")
Produce a scatterplot showing raw data point location in retention time and m/z. This plot is more useful for centroided data than continuum data.
object |
the |
mzrange |
numeric vector of length >= 2 whose range will be used to select the masses to plot |
rtrange |
numeric vector of length >= 2 whose range will be used to select the retention times to plot |
scanrange |
numeric vector of length >= 2 whose range will be used to select scans to plot |
log |
logical, log transform intensity |
title |
main title of the plot |
A matrix with the points plotted.
plotRaw(object, mzrange = numeric(), rtrange = numeric(),
scanrange = numeric(), log=FALSE, title='Raw Data')
Use corrected retention times for each sample to calculate retention time deviation profiles and plot each on the same graph.
object |
the |
col |
vector of colors for plotting each sample |
ty |
vector of line and point types for plotting each sample |
leg |
logical, plot legend with sample labels |
densplit |
logical, also plot the overall peak density |
plotrt(object, col = NULL, ty = NULL, leg = TRUE,
densplit = FALSE)
Plot a single mass scan using the impulse representation. Most useful for centroided data.
object |
the |
scan |
integer, number of the scan to plot |
mzrange |
numeric vector of length >= 2 whose range will be used to select masses to plot |
ident |
logical, use mouse to interactively identify and label individual masses |
plotScan(object, scan, mzrange = numeric(), ident = FALSE)
Uses the pre-generated profile mode matrix to plot mass spectra over a specified retention time range.
object |
the |
ident |
logical, use mouse to identify and label peaks |
vline |
numeric vector with locations of vertical lines |
... |
arguments passed to |
If ident == TRUE, an integer vector with the indices of
the points that were identified. Otherwise a two-column matrix
with the plotted points.
plotSpec(object, ident = FALSE, vline = numeric(0), ...)
This method uses the rgl package to create interactive three-dimensional representations of the profile matrix. It uses the terrain color scheme.
object |
the |
log |
logical, log transform intensity |
aspect |
numeric vector with aspect ratio of the m/z, retention time and intensity components of the plot |
... |
arguments passed to |
The rgl package is still in development and imposes some limitations on the output format. A bug in the axis label code means that the axis labels only go from 0 to the aspect ratio constant of that axis. Additionally, the axes are not labeled with the quantities they represent.
It is important to only plot a small portion of the profile matrix. Large portions can quickly overwhelm your CPU and memory.
plotSurf(object, log = FALSE, aspect = c(1, 1, .5), ...)
Plot chromatogram of total ion count. Optionally allow identification of target peaks and viewing/identification of individual spectra.
object |
the |
ident |
logical, use mouse to identify and label chromatographic peaks |
msident |
logical, use mouse to identify and label spectral peaks |
If ident == TRUE, an integer vector with the indices of
the points that were identified. Otherwise a two-column matrix
with the plotted points.
plotTIC(object, ident = FALSE, msident = FALSE)
Objects of the type ProcessHistory allow keeping track
of any data processing step in a metabolomics experiment. They are
created by the data processing methods, such as
findChromPeaks(), and added to the corresponding results
objects. Users therefore usually don't need to create them.
The XProcessHistory extends the ProcessHistory by
adding a slot param that allows to store the actual parameter
class of the processing step.
processParam(), processParam<-: get or set the
parameter class from an XProcessHistory object.
msLevel(): returns the MS level on which a certain analysis
has been performed, or NA if not defined.
The processType() method returns a character specifying the
processing step type.
The processDate() extracts the start date of the processing
step.
The processInfo() extracts optional additional information
on the processing step.
The fileIndex() extracts the indices of the files on which
the processing step was applied.
## S4 method for signature 'XProcessHistory'
processParam(object)
## S4 method for signature 'XProcessHistory'
msLevel(object)
## S4 method for signature 'ProcessHistory'
processType(object)
## S4 method for signature 'ProcessHistory'
processDate(object)
## S4 method for signature 'ProcessHistory'
processInfo(object)
## S4 method for signature 'ProcessHistory'
fileIndex(object)
object |
A |
For processParam: a parameter object extending the
Param class.
The processType() method returns a character string with the
processing step type.
The processDate() method returns a character string with the
time stamp of the processing step start.
The processInfo() method returns a character string with
optional additional information.
The fileIndex() method returns an integer vector with the index
of the files/samples on which the processing step was applied.
type (character(1)): string defining the type of the processing step.
This string has to match predefined values. Use
processHistoryTypes() to list them.
date (character(1)): date time stamp when the processing step
was started.
info (character(1)): optional additional information.
fileIndex (integer): of length 1 or > 1 to specify on which samples of the object the processing was performed.
error (ANY): used to store any errors that occurred during the calculation.
param (Param): an object of type Param (e.g.
CentWaveParam()) specifying the settings of the processing
step.
msLevel (integer): defining the MS level(s) on which the
analysis was performed.
Johannes Rainer
The profile matrix is an n x m matrix, n (rows) representing equally spaced m/z values (bins) and m (columns) the retention time of the corresponding scans. Each cell contains the maximum intensity measured for the specific scan and m/z values falling within the m/z bin.
The `profMat` method creates a new profile matrix or returns the profile matrix within the object's `@env` slot, if available. Settings for the profile matrix generation, such as `step` (the bin size), `method` or additional settings are extracted from the respective slots of the `xcmsRaw` object. Alternatively it is possible to specify all of the settings as additional parameters. For MsExperiment() or XcmsExperiment() objects, the method returns a `list` of profile matrices, one for each sample in `object`. Using parameter `fileIndex` it is also possible to create a profile matrix only for selected samples (files).
## S4 method for signature 'MsExperiment'
profMat(
  object,
  method = "bin",
  step = 0.1,
  baselevel = NULL,
  basespace = NULL,
  mzrange. = NULL,
  fileIndex = seq_along(object),
  chunkSize = 1L,
  msLevel = 1L,
  BPPARAM = bpparam(),
  ...
)
## S4 method for signature 'xcmsRaw'
profMat(object, method, step, baselevel, basespace, mzrange.)
object |
An |
method |
|
step |
|
baselevel |
|
basespace |
|
mzrange. |
Optional |
fileIndex |
For |
chunkSize |
For |
msLevel |
For |
BPPARAM |
For |
... |
ignored. |
Profile matrix generation methods:
"bin": The default profile matrix generation method that does a
simple binning, i.e. aggregating of intensity values falling within an
m/z bin.
"binlin": Binning followed by linear interpolation to impute missing
values. The value for m/z bins without a measured intensity is inferred
by a linear interpolation between neighboring bins with a measured
intensity.
"binlinbase": Binning followed by a linear interpolation to impute
values for empty elements (m/z bins) within a user-definable proximity to
non-empty elements while setting the element's value to the
baselevel otherwise. See impute = "linbase" parameter of
imputeLinInterpol() for more details.
"intlin": Set the elements' values to the integral of the linearly
interpolated data from plus to minus half the step size.
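As an illustration of the "bin" and "binlin" methods, the base R sketch below builds a tiny profile matrix from two hypothetical scans. It is not the xcms implementation: the half-open bins starting at an assumed mzmin, leaving empty bins NA, and using stats::approx() for the interpolation (with bins outside the measured range left NA) are all choices made for this example. As described for the profile matrix, each cell holds the maximum intensity within its m/z bin; rows are m/z bins and columns are scans.

```r
# Toy raw data (hypothetical): two scans with m/z and intensity values.
scans <- list(
    list(mz = c(100.02, 100.04, 100.31), int = c(10, 30, 5)),
    list(mz = c(100.11, 100.45), int = c(8, 20))
)
mzmin <- 100    # assumed lower m/z bound of the grid
mzmax <- 100.5  # assumed upper m/z bound of the grid
step  <- 0.1    # bin size; halving it doubles the number of rows
nbins <- round((mzmax - mzmin) / step)

# "bin": keep the maximum intensity per half-open bin [lower, lower + step).
# Empty bins are left NA in this sketch.
bin_scan <- function(mz, int) {
    idx <- floor((mz - mzmin) / step) + 1
    out <- rep(NA_real_, nbins)
    for (i in seq_along(idx)) {
        out[idx[i]] <- max(out[idx[i]], int[i], na.rm = TRUE)
    }
    out
}
# Rows are m/z bins, columns are scans.
prof <- sapply(scans, function(s) bin_scan(s$mz, s$int))

# "binlin": linearly interpolate empty bins lying between two filled bins of
# the same scan; bins outside the filled range stay NA (rule = 1).
interp_col <- function(col) {
    ok <- which(!is.na(col))
    if (length(ok) < 2) return(col)
    approx(x = ok, y = col[ok], xout = seq_along(col), rule = 1)$y
}
prof_lin <- apply(prof, 2, interp_col)
```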
profMat returns the profile matrix (rows representing equally spaced m/z
values, columns representing scans). For object being an MsExperiment
or XcmsExperiment, the method returns a list of profile matrices,
one for each file (sample).
Johannes Rainer
library(xcms)
library(MsDataHub)
file <- ko15.CDF()
## Load the data without generating the profile matrix (profstep = 0)
xraw <- xcmsRaw(file, profstep = 0)
## Extract the profile matrix
profmat <- profMat(xraw, step = 0.3)
dim(profmat)
## If not otherwise specified, the settings from the xraw object are used:
profinfo(xraw)
## To extract a profile matrix with linear interpolation use
profmat <- profMat(xraw, step = 0.3, method = "binlin")
## Alternatively, the profMethod of the xraw object could be changed
profMethod(xraw) <- "binlin"
profmat_2 <- profMat(xraw, step = 0.3)
all.equal(profmat, profmat_2)
Apply a median filter of given size to a profile matrix.
object |
the |
massrad |
number of m/z grid points on either side to use for median calculation |
scanrad |
number of scan grid points on either side to use for median calculation |
profMedFilt(object, massrad = 0, scanrad = 0)
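The effect of massrad and scanrad can be pictured with a plain running median. The helper med_filter() below is hypothetical and written only for illustration; in particular, shrinking the window at the matrix edges is an assumption, not necessarily what profMedFilt does.

```r
# Hypothetical helper, not the xcms implementation: running median over a
# profile matrix (rows = m/z bins, columns = scans). The window spans
# (2 * massrad + 1) x (2 * scanrad + 1) cells and is truncated at the edges.
med_filter <- function(prof, massrad = 0, scanrad = 0) {
    nr <- nrow(prof)
    nc <- ncol(prof)
    out <- prof
    for (i in seq_len(nr)) {
        rows <- max(1, i - massrad):min(nr, i + massrad)
        for (j in seq_len(nc)) {
            cols <- max(1, j - scanrad):min(nc, j + scanrad)
            out[i, j] <- median(prof[rows, cols])
        }
    }
    out
}

# A single spike (50) in an otherwise flat matrix is removed by a 3 x 3 window.
prof <- matrix(c(1, 1, 1, 1, 50, 1, 1, 1, 1), nrow = 3)
smoothed <- med_filter(prof, massrad = 1, scanrad = 1)
```

With both radii at 0 (the defaults in the usage above), each window contains a single cell and the matrix is returned unchanged.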
These methods get and set the method for generating profile
(matrix) data from raw mass spectral data. It can currently be
bin, binlin, binlinbase, or intlin.
profMethod(object)
xcmsRaw-class,
profMethod,
profBin,
plotSpec,
plotChrom,
findPeaks
Specify a subset of the profile mode matrix given a mass, time, or scan range. Allows flexible user entry for other functions.
object |
the |
mzrange |
single numeric mass or vector of masses |
rtrange |
single numeric time (in seconds) or vector of times |
scanrange |
single integer scan index or vector of indices |
... |
arguments to other functions |
This function handles selection of mass/time subsets of the profile matrix for other functions. It allows the user to specify such subsets in a variety of flexible ways with minimal typing.
Because R does partial argument matching, mzrange,
scanrange, and rtrange can be specified in short
form using m=, s=, and t=, respectively. If
both a scanrange and rtrange are specified, then
the rtrange specification takes precedence.
When specifying ranges, you may either enter a single number or
a numeric vector. If a single number is entered, then the closest
single scan or mass value is selected. If a vector is entered,
then the range is set to the range() of the values entered.
That allows specification of ranges using shortened, slightly
non-standard syntax. For example, one could specify 400 to 500
seconds using any of the following: t=c(400,500),
t=c(500,400), or t=400:500. Use of the sequence
operator (:) can save several keystrokes when specifying
ranges. However, while the sequence operator works well for
specifying integer ranges, fractional ranges do not always work
as well.
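The range rules above can be expressed as a small base R function. resolve_range() is a hypothetical helper, not part of xcms, and the scan times are made up; it only shows how an empty input, a single number, or a vector could be turned into a start/end pair.

```r
# Hypothetical helper mirroring the rules described above (not xcms code).
resolve_range <- function(x, available) {
    if (length(x) == 0)
        return(range(available))              # nothing given: full range
    if (length(x) == 1) {
        closest <- available[which.min(abs(available - x))]
        return(c(closest, closest))           # one number: closest value
    }
    range(x)                                  # vector: its min and max
}

rt <- seq(390, 510, by = 2.5)   # hypothetical scan times (seconds)
resolve_range(c(400, 500), rt)
resolve_range(c(500, 400), rt)
resolve_range(400:500, rt)
resolve_range(401.1, rt)

# 400.5:401.2 evaluates to the single value 400.5, so it is treated as one
# number rather than as a range.
length(400.5:401.2)
```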
A list with the following items:
mzrange |
numeric vector with start and end mass |
masslab |
textual label of mass range |
massidx |
integer vector of mass indices |
scanrange |
integer vector with start and end scans |
scanlab |
textual label of scan range |
scanidx |
integer vector of scan range |
rtrange |
numeric vector of start and end times |
timelab |
textual label of time range |
profRange(object, mzrange = numeric(),
rtrange = numeric(), scanrange = numeric(),
...)
These methods get and set the m/z step for generating profile (matrix) data from raw mass spectral data. Smaller steps yield more precision at the cost of greater memory usage.
profStep(object)
## Not run:
library(faahKO)
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xset <- xcmsRaw(cdffiles[1])
xset
plotSurf(xset, mzrange = c(200, 500))
## decrease the bin size to get better resolution
profStep(xset) <- 0.1
## works nicer on high resolution data
plotSurf(xset, mzrange = c(200, 500))
## End(Not run)
featureValues,XCMSnExp() : extract a matrix for
feature values with rows representing features and columns samples.
Parameter value allows to define which column from the
chromPeaks() matrix should be returned. Multiple
chromatographic peaks from the same sample can be assigned to a feature.
Parameter method defines, in such cases, from which of these peaks the
value should be returned.
Parameter msLevel allows to choose a specific MS level for which feature
values should be returned (given that features have been defined for that MS
level).
quantify,XCMSnExp(): return the preprocessing results as an
SummarizedExperiment::SummarizedExperiment() object containing the
feature abundances as assay matrix, the feature definitions (returned by
featureDefinitions()) as rowData and the phenotype
information as colData. This is an ideal container for further
processing of the data. Internally, the featureValues() method
is used to extract the feature abundances, parameters for that method can
be passed to quantify with ....
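How method, value and intensity interact when a sample has several peaks assigned to the same feature can be sketched with a toy data.frame. pick_value() and the peak values are hypothetical; the sketch assumes that "medret" takes the peak closest to the feature's median retention time, "maxint" the peak with the largest value in the intensity column, and "sum" the sum of the value column.

```r
# Two hypothetical peaks assigned to ONE feature in ONE sample.
peaks <- data.frame(rt = c(2501, 2530), into = c(1.2e6, 3.4e5),
                    maxo = c(5e4, 8e4))
feature_rtmed <- 2525   # hypothetical median retention time of the feature

# Hypothetical helper resolving the conflict for a single sample.
pick_value <- function(peaks, method, value = "into", intensity = "into",
                       rtmed = NA_real_) {
    switch(method,
        # peak closest to the feature's median retention time
        medret = peaks[[value]][which.min(abs(peaks$rt - rtmed))],
        # peak with the largest value in the `intensity` column
        maxint = peaks[[value]][which.max(peaks[[intensity]])],
        # sum over all peaks
        sum = sum(peaks[[value]]),
        stop("unknown method: ", method))
}

pick_value(peaks, "medret", rtmed = feature_rtmed)
pick_value(peaks, "maxint")
pick_value(peaks, "maxint", intensity = "maxo")
pick_value(peaks, "sum")
```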
## S4 method for signature 'XCMSnExp'
quantify(object, ...)
## S4 method for signature 'XCMSnExp'
featureValues(
  object,
  method = c("medret", "maxint", "sum"),
  value = "into",
  intensity = "into",
  filled = TRUE,
  missing = NA,
  msLevel = integer()
)
object |
A |
... |
For |
method |
|
value |
|
intensity |
|
filled |
|
missing |
how missing values should be reported. Allowed values are
|
msLevel |
for |
For featureValues(): a matrix with
feature values, columns representing samples, rows features. The order
of the features matches the order found in the
featureDefinitions(object) DataFrame. The rownames of the
matrix are the same as those of the featureDefinitions
DataFrame. NA is reported for features without
corresponding chromatographic peak in the respective sample(s).
For quantify(): a SummarizedExperiment::SummarizedExperiment()
representing the preprocessing results.
Johannes Rainer
XCMSnExp() for information on the data object.
featureDefinitions() to extract the DataFrame with the
feature definitions.
featureChromatograms() to extract ion chromatograms for each
feature.
hasFeatures() to evaluate whether the XCMSnExp provides feature
definitions.
Generate extracted ion chromatogram for m/z values of interest. The
raw data is used in contrast to getEIC which uses
data from the profile matrix (i.e. values binned along the m/z
dimension).
object |
|
mzrange |
m/z range for EIC |
rtrange |
retention time range for EIC |
scanrange |
scan range for EIC |
A list of :
scan |
scan number |
intensity |
summed intensity values |
rawEIC(object, mzrange = numeric(), rtrange = numeric(), scanrange = numeric())
Ralf Tautenhahn
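The summation performed by rawEIC can be sketched for one m/z range on hypothetical per-scan data. This is an illustration under assumptions, not the package code: the m/z bounds are treated as inclusive and scans without matching values contribute 0.

```r
# Hypothetical raw data: per-scan m/z and intensity vectors.
scans <- list(
    list(mz = c(200.01, 200.05, 300.10), int = c(100, 50, 999)),
    list(mz = c(199.90, 200.02), int = c(10, 200)),
    list(mz = numeric(0), int = numeric(0))
)
mzrange <- c(200, 200.1)

# Sum the raw intensities falling into the (inclusive) m/z range, per scan.
eic <- vapply(scans, function(s) {
    sum(s$int[s$mz >= mzrange[1] & s$mz <= mzrange[2]])
}, numeric(1))

# Same shape as the documented return value: scan numbers and intensities.
res <- list(scan = seq_along(scans), intensity = eic)
res
```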
Returns a matrix with columns for time, m/z, and intensity that represents the raw data from a chromatography mass spectrometry experiment.
object |
The container of the raw data |
mzrange |
Subset by m/z range |
rtrange |
Subset by retention time range |
scanrange |
Subset by scan index range |
log |
Whether to log transform the intensities |
A numeric matrix with three columns: time, mz and intensity.
rawMat(object, mzrange = numeric(), rtrange = numeric(),
scanrange = numeric(), log=FALSE)
Michael Lawrence
plotRaw for plotting the raw intensities
Reconstructs MS2 spectra for each MS1 chromatographic peak (if possible) for data independent acquisition (DIA) data (such as SWATH). See the LC-MS/MS analysis vignette for more details and examples.
reconstructChromPeakSpectra(object, ...)
## S4 method for signature 'XcmsExperiment'
reconstructChromPeakSpectra(
  object,
  expandRt = 0,
  diffRt = 2,
  minCor = 0.8,
  intensity = "maxo",
  peakId = rownames(chromPeaks(object, msLevel = 1L)),
  BPPARAM = bpparam()
)
## S4 method for signature 'XCMSnExp'
reconstructChromPeakSpectra(
  object,
  expandRt = 0,
  diffRt = 2,
  minCor = 0.8,
  intensity = "maxo",
  peakId = rownames(chromPeaks(object, msLevel = 1L)),
  BPPARAM = bpparam(),
  return.type = c("Spectra", "MSpectra")
)
object |
|
... |
ignored. |
expandRt |
|
diffRt |
|
minCor |
|
intensity |
|
peakId |
optional |
BPPARAM |
parallel processing setup. See |
return.type |
|
In detail, the function performs for each MS1 chromatographic peak:
Identify all MS2 chromatographic peaks from the isolation window
containing the m/z of the ion (i.e. the MS1 chromatographic peak) with
approximately the same retention time as the MS1 peak (accepted rt shift
can be specified with the diffRt parameter).
Correlate the peak shapes of the candidate MS2 chromatographic peaks with
the peak shape of the MS1 peak retaining only MS2 chromatographic peaks
for which the correlation is > minCor.
Reconstruct the MS2 spectrum using the m/z of all above selected MS2
chromatographic peaks and their intensity (either "maxo" or "into").
Each MS2 chromatographic peak selected for an MS1 peak will thus represent
one mass peak in the reconstructed spectrum.
The resulting Spectra::Spectra() object provides also the peak IDs of
the MS2 chromatographic peaks for each spectrum as well as their
correlation value with spectra variables ms2_peak_id and ms2_peak_cor.
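The correlation step of the reconstruction can be illustrated in base R. The sketch assumes that the MS1 and the candidate MS2 peak shapes have already been extracted on a common retention time grid and that candidates were preselected using diffRt; all values and names are hypothetical. Candidates whose shape correlates with the MS1 peak shape above minCor each contribute one mass peak (m/z plus "maxo" intensity) to the spectrum.

```r
# Hypothetical peak shapes sampled at the same retention times.
ms1 <- c(0, 10, 40, 90, 40, 10, 0)
ms2_shapes <- rbind(
    c(0, 5, 20, 45, 20, 5, 0),     # co-eluting fragment
    c(0, 2, 8, 18, 8, 2, 0),       # co-eluting fragment
    c(30, 30, 5, 0, 5, 30, 30)     # unrelated signal
)
ms2_mz <- c(85.03, 121.07, 150.00)       # m/z of the MS2 chromatographic peaks
ms2_maxo <- apply(ms2_shapes, 1, max)    # their maximal intensity ("maxo")
minCor <- 0.8

# Pearson correlation of each candidate's shape with the MS1 peak shape.
cors <- apply(ms2_shapes, 1, cor, y = ms1)
keep <- cors > minCor

# One mass peak per retained MS2 chromatographic peak, ordered by m/z.
spectrum <- data.frame(mz = ms2_mz[keep], intensity = ms2_maxo[keep],
                       cor = cors[keep])
spectrum <- spectrum[order(spectrum$mz), ]
spectrum
```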
Spectra::Spectra() object (defined in the Spectra package) with the
reconstructed MS2 spectra for all MS1 peaks in object. Contains
empty spectra (i.e. without m/z and intensity values) for MS1 peaks for
which reconstruction was not possible (either no MS2 signal was recorded
or the correlation of the MS2 chromatographic peaks with the MS1
chromatographic peak was below threshold minCor). Spectra variables
"ms2_peak_id" and "ms2_peak_cor" (of type IRanges::CharacterList()
and IRanges::NumericList() with length equal to the number of peaks per
reconstructed MS2 spectrum) provide the IDs and the correlation of the
MS2 chromatographic peaks from which the MS2 spectrum was reconstructed.
The median retention time of all MS2 chromatographic peaks used for the
spectrum reconstruction is reported as the spectrum's retention time. The MS1
chromatographic peak intensity is reported as the reconstructed
spectrum's precursorIntensity value (see parameter intensity above).
Johannes Rainer, Michael Witting
findChromPeaksIsolationWindow() for the function to perform MS2
peak detection in DIA isolation windows and for examples.
The refineChromPeaks method performs post-processing of the
chromatographic peak detection step to clean and improve its
results. The function can be applied to a XcmsExperiment() or XCMSnExp()
object after peak detection with findChromPeaks(). The type of peak
refinement and cleaning can be defined, along with all its settings, using
one of the following parameter objects:
CleanPeaksParam: remove chromatographic peaks with a retention time
range larger than the provided maximal acceptable width (maxPeakwidth).
FilterIntensityParam: remove chromatographic peaks with intensities
below the specified threshold. By default (with nValues = 1) values in
the chromPeaks matrix are evaluated: all peaks with a value in the
column defined with parameter value that are >= a threshold (defined
with parameter threshold) are retained. If nValues is larger than 1,
the individual peak intensities from the raw MS files are evaluated:
chromatographic peaks with at least nValues mass peaks >= threshold
are retained.
MergeNeighboringPeaksParam: peak detection sometimes fails to identify a
chromatographic peak correctly, especially for broad peaks and if the peak
shape is irregular (mostly for HILIC data). In such cases several smaller
peaks are reported. Also, peak detection with centWave can result in
partially or completely overlapping peaks. This method aims to reduce
such peak detection artifacts by merging chromatographic peaks that are
overlapping or close in RT and m/z dimension (considering also the measured
signal between them). See section Details for MergeNeighboringPeaksParam
for details and a comprehensive description of the approach.
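The selection rules of CleanPeaksParam and FilterIntensityParam boil down to simple comparisons, sketched here on a hypothetical chromPeaks-like matrix and hypothetical per-peak raw intensities. This mirrors the description above and is not the refineChromPeaks code.

```r
# Hypothetical chromPeaks-like matrix with three peaks.
pks <- cbind(rtmin = c(10, 50, 100), rtmax = c(18, 130, 106),
             maxo = c(8e4, 2e5, 3e4))

# CleanPeaksParam-style: drop peaks whose rt width exceeds maxPeakwidth.
maxPeakwidth <- 60
clean <- pks[(pks[, "rtmax"] - pks[, "rtmin"]) <= maxPeakwidth, , drop = FALSE]

# FilterIntensityParam-style, nValues = 1: keep peaks whose value column
# ("maxo" here) is >= threshold.
threshold <- 5e4
filtered <- pks[pks[, "maxo"] >= threshold, , drop = FALSE]

# nValues > 1: keep peaks with at least nValues raw intensities >= threshold.
# These per-peak raw intensities are made up for the example.
peak_ints <- list(c(6e4, 8e4, 2e4), c(1e5, 2e5, 1.5e5), c(3e4, 1e4))
nValues <- 2
keep_n <- vapply(peak_ints,
                 function(x) sum(x >= threshold, na.rm = TRUE) >= nValues,
                 logical(1))
```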
refineChromPeaks methods will always remove feature definitions, because
a call to this method can change or remove identified chromatographic peaks,
which may be part of features.
refineChromPeaks(object, param, ...)
## S4 method for signature 'XcmsExperiment,CleanPeaksParam'
refineChromPeaks(object, param = CleanPeaksParam(), msLevel = 1L)
## S4 method for signature 'XcmsExperiment,MergeNeighboringPeaksParam'
refineChromPeaks(
  object,
  param,
  msLevel = 1L,
  chunkSize = 2L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XcmsExperiment,FilterIntensityParam'
refineChromPeaks(
  object,
  param,
  msLevel = 1L,
  chunkSize = 2L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XcmsExperimentHdf5,FilterIntensityParam'
refineChromPeaks(object, param, msLevel = 1L, ...)
CleanPeaksParam(maxPeakwidth = 10)
MergeNeighboringPeaksParam(
  expandRt = 2,
  expandMz = 0,
  ppm = 10,
  minProp = 0.75
)
FilterIntensityParam(threshold = 0, nValues = 1L, value = "maxo")
## S4 method for signature 'XCMSnExp,CleanPeaksParam'
refineChromPeaks(object, param = CleanPeaksParam(), msLevel = 1L)
## S4 method for signature 'XCMSnExp,MergeNeighboringPeaksParam'
refineChromPeaks(
  object,
  param = MergeNeighboringPeaksParam(),
  msLevel = 1L,
  BPPARAM = bpparam()
)
## S4 method for signature 'XCMSnExp,FilterIntensityParam'
refineChromPeaks(
  object,
  param = FilterIntensityParam(),
  msLevel = 1L,
  BPPARAM = bpparam()
)
object |
XCMSnExp or XcmsExperiment object with identified chromatographic peaks. |
param |
Object defining the refinement method and its settings. |
... |
ignored. |
msLevel |
|
chunkSize |
For |
BPPARAM |
parameter object to set up parallel processing. Uses the
default parallel processing setup returned by |
maxPeakwidth |
For |
expandRt |
For |
expandMz |
For |
ppm |
For |
minProp |
For |
threshold |
For |
nValues |
For |
value |
For |
XCMSnExp or XcmsExperiment object with the refined
chromatographic peaks.
For peak refinement using the MergeNeighboringPeaksParam, chromatographic
peaks are first expanded in m/z and retention time dimension (based on
parameters expandMz, ppm and expandRt) and subsequently grouped into
sets of merge candidates if they are (after expansion) overlapping in both
m/z and rt (within the same sample). Note that each peak gets
expanded by expandRt and expandMz, thus peaks differing by less than
2 * expandMz (or 2 * expandRt) will be evaluated for merging.
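The candidate search can be sketched for two peaks at a time. expand_peak() and overlaps() are hypothetical helpers, and applying ppm relative to each m/z bound is an assumption of this sketch. It shows the consequence noted above: with expandRt = 4, peaks less than 8 seconds apart become merge candidates.

```r
# Hypothetical chromatographic peaks of the same sample.
p1 <- c(mzmin = 305.099, mzmax = 305.101, rtmin = 2800, rtmax = 2850)
p2 <- c(mzmin = 305.100, mzmax = 305.102, rtmin = 2855, rtmax = 2900)
p3 <- c(mzmin = 305.100, mzmax = 305.102, rtmin = 2860, rtmax = 2900)
expandRt <- 4
expandMz <- 0
ppm <- 10

# Widen a peak in both dimensions (ppm applied to each bound: an assumption).
expand_peak <- function(p) {
    c(mzmin = p[["mzmin"]] - expandMz - p[["mzmin"]] * ppm / 1e6,
      mzmax = p[["mzmax"]] + expandMz + p[["mzmax"]] * ppm / 1e6,
      rtmin = p[["rtmin"]] - expandRt,
      rtmax = p[["rtmax"]] + expandRt)
}

# Two expanded peaks are merge candidates if they overlap in m/z AND rt.
overlaps <- function(a, b) {
    a[["mzmin"]] <= b[["mzmax"]] && b[["mzmin"]] <= a[["mzmax"]] &&
        a[["rtmin"]] <= b[["rtmax"]] && b[["rtmin"]] <= a[["rtmax"]]
}

overlaps(expand_peak(p1), expand_peak(p2))  # 5 s gap, less than 2 * expandRt
overlaps(expand_peak(p1), expand_peak(p3))  # 10 s gap, more than 2 * expandRt
```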
Peak merging is performed along the retention time axis, i.e., the peaks are
first ordered by their "rtmin" and merge candidates are defined iteratively
starting with the first peak.
Candidate peaks are merged if the
average intensity of the 3 data points in the middle position between them
(i.e., at half the distance between "rtmax" of the first and "rtmin" of
the second peak) is larger than a certain proportion (minProp) of the
smaller ("maxo") intensity of both peaks. In cases in which this calculated
mid point is not located between the apexes of the two peaks (e.g., if the
peaks are largely overlapping) the average signal intensity at half way
between the apexes is used instead. Candidate peaks are not merged if all 3
data points between them have NA intensities.
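The merge decision for one pair of candidate peaks can be written as a small base R function. should_merge() is hypothetical: it interprets "the 3 data points in the middle position" as the 3 data points closest to the mid point, ignores NA values when averaging, and uses made-up chromatogram values.

```r
# Hypothetical chromatogram with two peaks (apexes at rt 5 and rt 15).
rt  <- 0:20
int <- c(0, 5, 20, 60, 90, 100, 90, 80, 75, 70, 68, 70, 72, 75, 78, 80,
         60, 30, 10, 2, 0)
p1 <- c(rtmin = 2, rtmax = 9, rt = 5, maxo = 100)
p2 <- c(rtmin = 11, rtmax = 18, rt = 15, maxo = 80)

should_merge <- function(rt, int, p1, p2, minProp = 0.75) {
    # Half way between the end of the first and the start of the second peak.
    mid <- (p1[["rtmax"]] + p2[["rtmin"]]) / 2
    # If that point is not between the apexes, use half way between apexes.
    if (mid <= p1[["rt"]] || mid >= p2[["rt"]])
        mid <- (p1[["rt"]] + p2[["rt"]]) / 2
    # The 3 data points closest to the mid point.
    vals <- int[order(abs(rt - mid))[1:3]]
    if (all(is.na(vals))) return(FALSE)
    mean(vals, na.rm = TRUE) > minProp * min(p1[["maxo"]], p2[["maxo"]])
}

should_merge(rt, int, p1, p2)     # shallow valley: merged
int2 <- int
int2[10:12] <- c(10, 5, 10)       # deep valley between the peaks
should_merge(rt, int2, p1, p2)    # not merged
```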
Merged peaks get the "mz", "rt", "sn" and "maxo" values from the
peak with the largest signal ("maxo") as well as its row in the metadata
of the peak (chromPeakData). The "rtmin" and "rtmax" of the merged
peaks are updated and "into" is recalculated based on all signal between
"rtmin" and "rtmax" and the newly defined "mzmin" and "mzmax" (which
is the range of "mzmin" and "mzmax" of the merged peaks after expanding
by expandMz and ppm). The reported "mzmin" and "mzmax" for the
merged peak represent the m/z range of all non-NA intensities used for the
calculation of the peak signal ("into").
Johannes Rainer, Mar Garcia-Aloy
## Load a test data set with detected peaks
library(xcms)
library(MsExperiment)
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())

####
## CleanPeaksParam:
## Distribution of chromatographic peak widths
quantile(chromPeaks(faahko_sub)[, "rtmax"] - chromPeaks(faahko_sub)[, "rtmin"])
## Remove all chromatographic peaks with a width larger than 60 seconds
data <- refineChromPeaks(faahko_sub, param = CleanPeaksParam(60))
quantile(chromPeaks(data)[, "rtmax"] - chromPeaks(data)[, "rtmin"])

####
## FilterIntensityParam:
## Remove all peaks with a maximal intensity below 50000
res <- refineChromPeaks(faahko_sub,
    param = FilterIntensityParam(threshold = 50000))
nrow(chromPeaks(faahko_sub))
nrow(chromPeaks(res))

####
## MergeNeighboringPeaksParam:
## Subset to a single file
xd <- filterFile(faahko_sub, file = 1)
## Example of a split peak that will be merged
mzr <- 305.1 + c(-0.01, 0.01)
chr <- chromatogram(xd, mz = mzr, rt = c(2700, 3700))
plot(chr)
## Combine the peaks
res <- refineChromPeaks(xd, param = MergeNeighboringPeaksParam(expandRt = 4))
chr_res <- chromatogram(res, mz = mzr, rt = c(2700, 3700))
plot(chr_res)
## Example of peaks that were not merged, because the signal between them
## is lower than the cut-off minProp
mzr <- 496.2 + c(-0.01, 0.01)
chr <- chromatogram(xd, mz = mzr, rt = c(3200, 3500))
plot(chr)
chr_res <- chromatogram(res, mz = mzr, rt = c(3200, 3500))
plot(chr_res)
removeIntensity allows removing intensities from chromatographic data
matching certain conditions (depending on parameter which). The
intensities are not actually removed but replaced with NA_real_. To
actually remove the intensities (and the associated retention times)
use MSnbase::clean() afterwards.
Parameter which allows specifying which intensities should be replaced by
NA_real_. By default (which = "below_threshold"), intensities below
threshold are removed. If object is an XChromatogram or XChromatograms
object (and hence provides also chromatographic peak definitions within the
object) which = "outside_chromPeak" can be selected which removes any
intensity which is outside the boundaries of identified chromatographic
peak(s) in the chromatographic data.
Note that filterIntensity() might be a better approach to
subset/filter chromatographic data.
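Both which options amount to replacing values with NA_real_, as sketched below on plain vectors. This is not the xcms method (which operates on Chromatogram objects); the intensities match those of the example on this page, while the retention times (1:10) and the peak boundaries are hypothetical.

```r
rtime <- 1:10
intensity <- c(5, 29, 50, NA, 100, 12, 3, 4, 1, 3)

# which = "below_threshold": values below the threshold become NA.
below <- intensity
below[which(below < 20)] <- NA_real_

# which = "outside_chromPeak": keep only values within [rtmin, rtmax] of at
# least one (hypothetical) chromatographic peak.
peaks <- cbind(rtmin = c(2, 8), rtmax = c(5, 9))
inside <- vapply(rtime, function(t) {
    any(t >= peaks[, "rtmin"] & t <= peaks[, "rtmax"])
}, logical(1))
outside <- intensity
outside[!inside] <- NA_real_
```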
## S4 method for signature 'Chromatogram'
removeIntensity(object, which = "below_threshold", threshold = 0)
## S4 method for signature 'MChromatograms'
removeIntensity(object, which = "below_threshold", threshold = 0)
## S4 method for signature 'XChromatogram'
removeIntensity(
  object,
  which = c("below_threshold", "outside_chromPeak"),
  threshold = 0
)
object |
an object representing chromatographic data. Can be a
|
which |
|
threshold |
|
the input object with matching intensities being replaced by NA.
Johannes Rainer
library(MSnbase)
library(xcms)
## Retention times are sorted to keep them increasing despite the added noise
chr <- Chromatogram(rtime = sort(1:10 + rnorm(n = 10, sd = 0.3)),
    intensity = c(5, 29, 50, NA, 100, 12, 3, 4, 1, 3))
## Remove all intensities below 20
res <- removeIntensity(chr, threshold = 20)
intensity(res)
To correct differences in retention times between
samples, a number of methods exist in XCMS. retcor
is the generic method.
object |
|
method |
Method to use for retention time correction. See details. |
... |
Optional arguments to be passed along |
Different algorithms can be used by specifying them with the
method argument. For example, to use the approach described by
Smith et al. (2006), one would use retcor(object,
method="loess"). This is also the default.
Further arguments given by ... are
passed through to the function implementing
the method.
A character vector of nicknames for the
algorithms available is returned by
getOption("BioC")$xcms$retcor.methods.
If the nickname of a method is called "loess",
the help page for that specific method can
be accessed with ?retcor.loess.
An xcmsSet object with corrected retention times.
retcor(object, ...)
retcor.loess
retcor.obiwarp
xcmsSet-class,
Calculates retention time deviations for each sample. It is based on the code at http://obi-warp.sourceforge.net/ but, in addition, can align multiple samples using a center-star strategy.
For the original publication see
Prince JT, Marcotte EM. Chromatographic Alignment of ESI-LC-MS Proteomics Data Sets by Ordered Bijective Interpolated Warping. Anal Chem 2006;78(17):6140-6152.
object |
the |
plottype |
if |
profStep |
step size (in m/z) to use for profile generation from the raw data files |
center |
the index of the sample all others will be aligned to. If center==NULL, the sample with the most peaks is chosen as default. |
col |
vector of colors for plotting each sample |
ty |
vector of line and point types for plotting each sample |
response |
Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors |
distFunc |
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance) |
gapInit |
Penalty for Gap opening, see below |
gapExtend |
Penalty for Gap enlargement, see below |
factorDiag |
Local weighting applied to diagonal moves in alignment. |
factorGap |
Local weighting applied to gap moves in alignment. |
localAlignment |
Local rather than global alignment |
initPenalty |
Penalty for initiating alignment (for local alignment only) Default: 0 |
Default gap penalties (gapInit, gapExtend) by distFunc type:
| distFunc | gapInit | gapExtend |
|---|---|---|
| cor | 0.3 | 2.4 |
| cov | 0 | 11.7 |
| prd | 0 | 7.8 |
| euc | 0.9 | 1.8 |
An xcmsSet object
retcor(object, method="obiwarp", plottype = c("none", "deviation"), profStep=1, center=NULL, col = NULL, ty = NULL, response=1, distFunc="cor_opt", gapInit=NULL, gapExtend=NULL, factorDiag=2, factorGap=1, localAlignment=0, initPenalty=0)
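The usage above leaves gapInit and gapExtend as NULL, which suggests they fall back to the distFunc-specific defaults listed above. The sketch below shows that lookup in plain R. The helpers default_gap_penalties() and resolve_gaps() are hypothetical, and because "cor_opt" (the default distFunc) is not listed in the defaults, the sketch ASSUMES it shares the "cor" penalties.

```r
# Hypothetical lookup of the default gap penalties listed above.
default_gap_penalties <- function(distFunc = "cor_opt") {
  tbl <- list(
    cor = c(gapInit = 0.3, gapExtend = 2.4),
    cov = c(gapInit = 0,   gapExtend = 11.7),
    prd = c(gapInit = 0,   gapExtend = 7.8),
    euc = c(gapInit = 0.9, gapExtend = 1.8)
  )
  # ASSUMPTION: "cor_opt" is not listed above; treat it like "cor".
  if (distFunc == "cor_opt") distFunc <- "cor"
  if (!distFunc %in% names(tbl))
    stop("unknown distFunc: ", distFunc)
  tbl[[distFunc]]
}

# Replace NULL penalties by the defaults; user-supplied values are kept.
resolve_gaps <- function(distFunc = "cor_opt", gapInit = NULL,
                         gapExtend = NULL) {
  def <- default_gap_penalties(distFunc)
  c(gapInit   = if (is.null(gapInit)) unname(def["gapInit"]) else gapInit,
    gapExtend = if (is.null(gapExtend)) unname(def["gapExtend"]) else gapExtend)
}

resolve_gaps("euc")                 # gapInit 0.9, gapExtend 1.8
resolve_gaps("cov", gapInit = 0.5)  # gapInit 0.5, gapExtend 11.7
```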
These two methods use “well behaved” peak groups to calculate retention time deviations for every time point of each sample. Use smoothed deviations to align retention times.
object |
the |
missing |
number of missing samples to allow in retention time correction groups |
extra |
number of extra peaks to allow in retention time correction groups |
smooth |
either |
span |
degree of smoothing for local polynomial regression fitting |
family |
if |
plottype |
if |
col |
vector of colors for plotting each sample |
ty |
vector of line and point types for plotting each sample |
An xcmsSet object
retcor(object, missing = 1, extra = 1,
smooth = c("loess", "linear"), span = .2,
family = c("gaussian", "symmetric"),
plottype = c("none", "deviation", "mdevden"),
col = NULL, ty = NULL)
xcmsSet-class,
loess
retcor.obiwarp
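The core idea of the peak groups approach (smooth the deviations of "well behaved" anchor peaks, then apply the smoothed deviation to every scan time) can be sketched with base R's loess(). The anchor retention times and the linear drift below are invented, and this is not the xcms implementation. The sketch only predicts within the range of the anchors, because predict() on a loess fit returns NA outside the range of the original data; smooth = "linear" would correspond to an lm() fit instead.

```r
# Made-up anchor peaks (well behaved peak groups) in one sample:
# their retention times and their deviation from the reference, in seconds.
rt_anchor  <- seq(100, 1000, by = 20)
dev_anchor <- 5 + 0.01 * rt_anchor        # invented, smooth drift

# Smooth the deviations by local polynomial regression
# (span = 0.2 and family = "gaussian" as in the retcor usage above).
fit <- loess(dev_anchor ~ rt_anchor, span = 0.2, degree = 2,
             family = "gaussian")

# Subtract the smoothed deviation from every scan time of the sample
scan_rt <- seq(100, 1000, by = 1.5)
rt_adj  <- scan_rt - predict(fit, newdata = data.frame(rt_anchor = scan_rt))

head(cbind(scan_rt, rt_adj))
```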
Expands (or contracts) the retention time window in each row of
a matrix as defined by the retmin and retmax columns.
retexp(peakrange, width = 200)
peakrange |
matrix with columns |
width |
new width for the window |
The altered matrix.
Colin A. Smith, [email protected]
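A small base-R sketch of the idea, assuming the window is kept centered on the midpoint of retmin and retmax while its width is set to width (which expands narrow windows and contracts wide ones). retexp_sketch() is a hypothetical stand-in, not the xcms function.

```r
# Hypothetical re-implementation for illustration only.
# ASSUMPTION: each window stays centred on (retmin + retmax) / 2.
retexp_sketch <- function(peakrange, width = 200) {
  center <- rowMeans(peakrange[, c("retmin", "retmax"), drop = FALSE])
  peakrange[, "retmin"] <- center - width / 2
  peakrange[, "retmax"] <- center + width / 2
  peakrange
}

pr  <- cbind(retmin = c(100, 400), retmax = c(150, 700))
res <- retexp_sketch(pr, width = 200)
res   # row 1 expanded to 25-225, row 2 contracted to 450-650
```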
rla calculates the relative log abundances (RLA, see reference) on a
numeric vector.
rla(x, group, log.transform = TRUE)
rowRla(x, group, log.transform = TRUE)
x |
|
group |
|
log.transform |
|
The RLA is defined as the (log) abundance of an analyte relative to the median across all abundances of the same group.
A numeric of the same length as x (for rla) or a matrix with
the same dimensions as x (for rowRla).
Johannes Rainer
De Livera AM, Dias DA, De Souza D, Rupasinghe T, Pyke J, Tull D, Roessner U, McConville M, Speed TP. Normalizing and integrating metabolomics data. Anal Chem 2012 Dec 18;84(24):10768-76. doi: 10.1021/ac302748b
library(xcms)
x <- c(3, 4, 5, 1, 2, 3, 7, 8, 9)
grp <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
rla(x, grp)
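The RLA definition above translates directly into base R: optionally log-transform, then subtract the median of each value's group. rla_sketch() and rowRla_sketch() are hypothetical helpers, and the use of log2 is an assumption; the log base used by xcms is not stated on this page.

```r
# Hypothetical RLA helper: (log) value minus the median of its group.
rla_sketch <- function(x, group, log.transform = TRUE) {
  if (length(x) != length(group))
    stop("'x' and 'group' must have the same length")
  if (log.transform) x <- log2(x)   # ASSUMPTION: log base 2
  x - ave(x, group, FUN = function(z) median(z, na.rm = TRUE))
}

# Row-wise version for a feature-by-sample matrix.
rowRla_sketch <- function(x, group, log.transform = TRUE) {
  t(apply(x, 1, rla_sketch, group = group, log.transform = log.transform))
}

x   <- c(3, 4, 5, 1, 2, 3, 7, 8, 9)
grp <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
rla_sketch(x, grp, log.transform = FALSE)  # -1 0 1 -1 0 1 -1 0 1
rla_sketch(x, grp)  # log2 ratios to the group medians 4, 2 and 8

m <- rbind(x, 2 * x)
rowRla_sketch(m, grp)  # both rows identical: RLA ignores a constant factor
```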
The RsdFilter class and methods enable users to filter features from an
XcmsExperiment or SummarizedExperiment object based on their relative
standard deviation (coefficient of variation) for a specified threshold.
This filter is part of the possible dispatch of the generic function
filterFeatures. Features above (>) the user-input threshold will be
removed from the entire dataset.
RsdFilter(threshold = 0.3, qcIndex = integer(), na.rm = TRUE, mad = FALSE)
## S4 method for signature 'XcmsResult,RsdFilter'
filterFeatures(object, filter, ...)
## S4 method for signature 'SummarizedExperiment,RsdFilter'
filterFeatures(object, filter, assay = 1)
threshold |
|
qcIndex |
|
na.rm |
|
mad |
|
object |
|
filter |
The parameter object selecting and configuring the type of
filtering. It can be one of the following classes: |
... |
Optional parameters. For |
assay |
For filtering of |
For RsdFilter: an RsdFilter object. filterFeatures returns the
input object without the features that did not meet the user-defined threshold.
It is assumed that the abundance values are in natural scale. Abundances in log scale should be first transformed to natural scale before calculating the RSD.
Philippine Louail
Other Filter features in xcms:
BlankFlag,
DratioFilter,
PercentMissingFilter
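The filtering rule can be sketched on a plain abundance matrix (features in rows, samples in columns). The sketch ASSUMES RSD = sd / mean over the QC columns, or mad / median when mad = TRUE, and it drops features whose RSD cannot be computed; rsd_keep() is a hypothetical helper, not the xcms filterFeatures() method. As stated above, abundances should be on the natural scale.

```r
# Hypothetical helper returning TRUE for features to keep (RSD <= threshold).
rsd_keep <- function(x, qcIndex = seq_len(ncol(x)), threshold = 0.3,
                     na.rm = TRUE, mad = FALSE) {
  qc <- x[, qcIndex, drop = FALSE]
  rsd <- if (mad) {
    # stats::mad is called explicitly because the argument `mad` shadows it
    apply(qc, 1, stats::mad, na.rm = na.rm) /
      apply(qc, 1, median, na.rm = na.rm)
  } else {
    apply(qc, 1, sd, na.rm = na.rm) / rowMeans(qc, na.rm = na.rm)
  }
  # ASSUMPTION: features with NA RSD are removed as well
  !is.na(rsd) & rsd <= threshold
}

# Two made-up features; samples 1 to 3 are QC samples
abund <- rbind(f1 = c(100, 102, 98, 500),
               f2 = c(10, 100, 1000, 5))
keep <- rsd_keep(abund, qcIndex = 1:3)
abund[keep, , drop = FALSE]   # only f1 (RSD 0.02) is retained
```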
Return sample names for an object
A character vector with sample names.
sampnames(object)
If peak detection is performed with findPeaks and argument
stopOnError = FALSE, errors during the process do not stop the
processing but are recorded in the resulting xcmsSet object. These
errors can be accessed with the showError method.
## S4 method for signature 'xcmsSet'
showError(object, message. = TRUE, ...)
object |
An |
message. |
Logical indicating whether only the error message, or the error itself should be returned. |
... |
Additional arguments. |
A list of error messages (if message. = TRUE) or errors or an
empty list if no errors are present.
Johannes Rainer
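The general pattern behind stopOnError = FALSE (keep processing, collect the errors, inspect them afterwards) can be illustrated with base R's tryCatch(). This is a generic sketch, not xcms code: process_all() and safe_log() are hypothetical, and extracting conditionMessage() mirrors what message. = TRUE returns.

```r
# Illustrative pattern only: record errors instead of stopping.
process_all <- function(inputs, FUN, stopOnError = TRUE) {
  results <- vector("list", length(inputs))
  errors  <- list()
  for (i in seq_along(inputs)) {
    # results[i] <- list(...) keeps a NULL entry instead of deleting it
    results[i] <- list(tryCatch(FUN(inputs[[i]]), error = function(e) {
      if (stopOnError) stop(e)
      errors[[length(errors) + 1]] <<- e
      NULL
    }))
  }
  list(results = results, errors = errors)
}

safe_log <- function(x) {
  if (x < 0) stop("negative input: ", x)
  log(x)
}

out <- process_all(list(1, -1, exp(1)), safe_log, stopOnError = FALSE)
vapply(out$errors, conditionMessage, character(1))  # "negative input: -1"
```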
There are several methods for calculating a distance between two sets of peaks in xcms. specDist
is the generic method.
object |
a xcmsSet or xcmsRaw. |
method |
Method to use for distance calculation. See details. |
... |
mzabs, mzppm and parameters for the distance function. |
Different algorithms can be used by specifying them with the
method argument. For example to use the "meanMZmatch"
approach with xcmsSet one would use:
specDist(object, peakIDs1, peakIDs2, method="meanMZmatch"). This is also
the default.
Further arguments given by ... are
passed through to the function implementing
the method.
A character vector of nicknames for the
algorithms available is returned by
getOption("BioC")$xcms$specDist.methods.
If the nickname of a method is called "meanMZmatch",
the help page for that specific method can
be accessed with ?specDist.meanMZmatch.
mzabs |
maximum absolute deviation for two matching peaks |
mzppm |
relative deviations in ppm for two matching peaks |
symmetric |
use symmetric pairwise m/z-matches only, or each match |
specDist(object, peakIDs1, peakIDs2,...)
specDist(object, PSpec1, PSpec2,...)
Joachim Kutzera, [email protected]
This method calculates the distance of two sets of peaks using the cosine-distance.
specDist.cosine(peakTable1, peakTable2, mzabs=0.001, mzppm=10, mzExp=0.6, intExp=3, nPdiff=2, nPmin=8, symmetric=FALSE)
peakTable1 |
a Matrix containing at least m/z-values, row must be called "mz" |
peakTable2 |
the matrix for the other mz-values |
mzabs |
maximum absolute deviation for two matching peaks |
mzppm |
relative deviations in ppm for two matching peaks |
symmetric |
use symmetric pairwise m/z-matches only, or each match |
mzExp |
the exponent used for mz |
intExp |
the exponent used for intensity |
nPdiff |
the maximum nrow-difference of the two peaktables |
nPmin |
the minimum absolute sum of peaks from both peaktables |
The result is the cosine distance of the products of weighted factors of m/z and intensity of the matching peaks in the two peaktables. The factors are calculated as wFact = mz^mzExp * int^intExp. If no distance can be calculated (for example because no matching peaks were found), the return value is NA.
specDist.cosine(peakTable1, peakTable2, mzabs = 0.001, mzppm = 10,
mzExp = 0.6, intExp = 3, nPdiff = 2, nPmin = 8,
symmetric = FALSE)
Joachim Kutzera, [email protected]
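The weighting wFact = mz^mzExp * int^intExp can be combined with a cosine measure as sketched below. The page does not spell out how peaks are matched or the exact final formula, so this sketch ASSUMES nearest-neighbour matching within mzabs + mz * mzppm * 1e-6 and returns the plain cosine of the angle between the weight vectors of matched pairs (1 means identical). nPdiff, nPmin and symmetric are not modelled, the column names mz and intensity are assumptions, and nearest_within() and weighted_cosine() are hypothetical helpers.

```r
# Index of the closest m/z in mz_to for each m/z in mz_from, or NA if the
# closest one is outside the (assumed) tolerance mzabs + mz * mzppm * 1e-6.
nearest_within <- function(mz_from, mz_to, mzabs = 0.001, mzppm = 10) {
  vapply(mz_from, function(m) {
    d <- abs(mz_to - m)
    i <- which.min(d)
    if (length(i) && d[i] <= mzabs + m * mzppm * 1e-6) i else NA_integer_
  }, integer(1))
}

weighted_cosine <- function(pt1, pt2, mzabs = 0.001, mzppm = 10,
                            mzExp = 0.6, intExp = 3) {
  idx <- nearest_within(pt1[, "mz"], pt2[, "mz"], mzabs, mzppm)
  ok  <- !is.na(idx)
  if (!any(ok)) return(NA_real_)   # no matching peaks
  w1 <- pt1[ok, "mz"]^mzExp * pt1[ok, "intensity"]^intExp
  w2 <- pt2[idx[ok], "mz"]^mzExp * pt2[idx[ok], "intensity"]^intExp
  sum(w1 * w2) / sqrt(sum(w1^2) * sum(w2^2))
}

pt1 <- cbind(mz = c(100, 200, 300), intensity = c(10, 50, 20))
pt2 <- cbind(mz = c(100.0005, 200.0005, 400), intensity = c(12, 45, 30))
weighted_cosine(pt1, pt1)   # 1
weighted_cosine(pt1, pt2)   # close to 1, two matched pairs
```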
This method calculates the distance of two sets of peaks.
specDist.meanMZmatch(peakTable1, peakTable2, matchdist=1, matchrate=1, mzabs=0.001, mzppm=10, symmetric=TRUE)
peakTable1 |
a Matrix containing at least m/z-values, row must be called "mz" |
peakTable2 |
the matrix for the other mz-values |
mzabs |
maximum absolute deviation for two matching peaks |
mzppm |
relative deviations in ppm for two matching peaks |
symmetric |
use symmetric pairwise m/z-matches only, or each match |
matchdist |
the weight for value one (see details) |
matchrate |
the weight for value two |
The result of the calculation is a weighted sum of two values. Value one is the mean absolute difference of the matching peaks; value two relates the number of matching peaks to the number of non-matching peaks. If no distance can be calculated (for example because no matching peaks were found), the return value is NA.
specDist.meanMZmatch(peakTable1, peakTable2,
matchdist=1, matchrate=1,
mzabs=0.001, mzppm=10, symmetric=TRUE)
Joachim Kutzera, [email protected]
This method calculates the distance of two sets of peaks by just returning the number of matching peaks (m/z-values).
specDist.peakCount(peakTable1, peakTable2, mzabs=0.001, mzppm=10, symmetric=FALSE)
peakTable1 |
a Matrix containing at least m/z-values, row must be called "mz" |
peakTable2 |
the matrix for the other mz-values |
mzabs |
maximum absolute deviation for two matching peaks |
mzppm |
relative deviations in ppm for two matching peaks |
symmetric |
use symmetric pairwise m/z-matches only, or each match |
specDist.peakCount(peakTable1, peakTable2, mzppm=10, symmetric=FALSE)
Joachim Kutzera, [email protected]
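Counting matching peaks, with and without the symmetric restriction, can be sketched as follows. The sketch ASSUMES nearest-neighbour matching within mzabs + mz * mzppm * 1e-6 and interprets symmetric = TRUE as counting only mutual matches (peak i's best match is j and j's best match is i). peak_count() and nearest_within() are hypothetical helpers.

```r
nearest_within <- function(mz_from, mz_to, mzabs = 0.001, mzppm = 10) {
  vapply(mz_from, function(m) {
    d <- abs(mz_to - m)
    i <- which.min(d)
    if (length(i) && d[i] <= mzabs + m * mzppm * 1e-6) i else NA_integer_
  }, integer(1))
}

peak_count <- function(pt1, pt2, mzabs = 0.001, mzppm = 10,
                       symmetric = FALSE) {
  m12 <- nearest_within(pt1[, "mz"], pt2[, "mz"], mzabs, mzppm)
  if (!symmetric) return(sum(!is.na(m12)))
  m21 <- nearest_within(pt2[, "mz"], pt1[, "mz"], mzabs, mzppm)
  ok  <- !is.na(m12)
  # keep only pairs that point at each other (mutual best matches)
  sum(m21[m12[ok]] == which(ok), na.rm = TRUE)
}

pt1 <- cbind(mz = c(100, 100.001, 200))
pt2 <- cbind(mz = c(100.0003, 300))
peak_count(pt1, pt2)                    # 2: both 100.x peaks match 100.0003
peak_count(pt1, pt2, symmetric = TRUE)  # 1: only the mutual pair counts
```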
Given a sparse continuum mass spectrum, determine regions where no signal is present, substituting half of the minimum intensity for those regions. Calculate the noise level as the weighted mean of the regions with signal and the regions without signal. If there is only one raw peak, return zero.
specNoise(spec, gap = quantile(diff(spec[, "mz"]), 0.9))
spec |
matrix with named columns |
gap |
threshold above which two data points are considered to be separated by a blank region and not bridged by an interpolating line |
The default gap value is determined from the 90th percentile of the pair-wise differences between adjacent mass values.
A numeric noise level
Colin A. Smith, [email protected]
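One way to read the description is to treat each interval between adjacent data points separately: intervals wider than gap are blank regions with intensity min(intensity) / 2, the others carry the mean of their two end-point intensities, and the noise is the interval-length-weighted mean. The sketch below implements that reading; the weighting by interval length is an ASSUMPTION, and spec_noise_sketch() is a hypothetical helper, not specNoise() itself.

```r
spec_noise_sketch <- function(spec, gap = quantile(diff(spec[, "mz"]), 0.9)) {
  if (nrow(spec) < 2) return(0)       # a single data point: no noise level
  mz  <- spec[, "mz"]
  int <- spec[, "intensity"]
  width   <- diff(mz)                             # length of each interval
  avg_int <- (head(int, -1) + tail(int, -1)) / 2  # mean intensity per interval
  blank   <- width > gap                          # not bridged: no signal
  avg_int[blank] <- min(int) / 2                  # half the minimum intensity
  sum(width * avg_int) / sum(width)               # length-weighted mean
}

spec <- cbind(mz = c(100, 100.1, 100.2, 105, 105.1),
              intensity = c(10, 20, 10, 10, 10))
spec_noise_sketch(spec, gap = 1)   # (1.5 + 1.5 + 4.8 * 5 + 1) / 5.1
```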
Given a spectrum, identify and list significant peaks as determined by several criteria.
specPeaks(spec, sn = 20, mzgap = 0.2)
spec |
matrix with named columns |
sn |
minimum signal to noise ratio |
mzgap |
minimal distance between adjacent peaks, with smaller peaks being excluded |
Peaks must meet two criteria to be considered peaks: 1) Their s/n ratio must exceed a certain threshold. 2) They must not be within a given distance of any greater intensity peaks.
A matrix with columns:
mz |
m/z at maximum peak intensity |
intensity |
maximum intensity of the peak |
fwhm |
full width at half max of the peak |
Colin A. Smith, [email protected]
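The two criteria can be applied greedily: consider candidates from the most to the least intense, keep one if its s/n exceeds sn and no already kept (hence more intense) peak lies within mzgap. The sketch works on pre-computed candidate maxima with a scalar noise level; both are assumptions for illustration, and select_peaks() is a hypothetical helper rather than specPeaks().

```r
select_peaks <- function(mz, intensity, noise, sn = 20, mzgap = 0.2) {
  cand <- which(intensity / noise > sn)                  # criterion 1: s/n
  cand <- cand[order(intensity[cand], decreasing = TRUE)]
  kept <- integer(0)
  for (i in cand) {
    # criterion 2: not within mzgap of a greater-intensity peak
    if (!any(abs(mz[kept] - mz[i]) < mzgap)) kept <- c(kept, i)
  }
  sort(kept)
}

mz  <- c(100, 100.1, 101, 102)
int <- c(500, 1000, 30, 800)
select_peaks(mz, int, noise = 10)
# 2 4: peak 3 fails s/n, peak 1 is too close to the larger peak 2
```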
Divides the scans from an xcmsRaw object into
a list of multiple objects. MSn data is discarded.
x |
|
f |
factor such that |
drop |
logical indicating if levels that do not occur should be dropped (if 'f' is a 'factor' or a list). |
... |
further potential arguments passed to methods. |
A list of xcmsRaw objects.
split(x, f, drop = TRUE, ...)
Steffen Neumann, [email protected]
Divides the samples and peaks from a xcmsSet object into
a list of multiple objects. Group data is discarded.
xs |
|
f |
factor such that |
drop |
logical indicating if levels that do not occur should be dropped (if 'f' is a 'factor' or a list). |
... |
further potential arguments passed to methods. |
A list of xcmsSet objects.
split(x, f, drop = TRUE, ...)
Colin A. Smith, [email protected]
This selfStart model evaluates the Gaussian model and its
gradient. It has an initial attribute that will evaluate
the initial estimates of the parameters mu, sigma,
and h.
SSgauss(x, mu, sigma, h)
x |
a numeric vector of values at which to evaluate the model |
mu |
mean of the distribution function |
sigma |
standard deviation of the distribution function |
h |
height of the distribution function |
Initial values for mu and h are chosen from the
maximal value of x. The initial value for sigma is
determined from the area under x divided by h*sqrt(2*pi).
A numeric vector of the same length as x. It is the value
of the expression h*exp(-(x-mu)^2/(2*sigma^2)), which is a
modified Gaussian function where the maximum height is treated
as a separate parameter not dependent on sigma. If arguments
mu, sigma, and h are names of objects, the
gradient matrix with respect to these names is attached as an
attribute named gradient.
Colin A. Smith, [email protected]
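The model and the initial-value rule described above can be reproduced in base R and passed to nls() as explicit start values. The sketch reads "the maximal value" as the position and height of the largest observed intensity, and estimates the area with the trapezoid rule; both readings are assumptions, the data are simulated, and gauss_model() is a hypothetical helper. With xcms attached, nls(y ~ SSgauss(x, mu, sigma, h)) should not need start values, since supplying them is the purpose of a selfStart model.

```r
gauss_model <- function(x, mu, sigma, h) h * exp(-(x - mu)^2 / (2 * sigma^2))

# Simulated peak with a little noise
set.seed(42)
x <- seq(0, 20, by = 0.1)
y <- gauss_model(x, mu = 10, sigma = 1.5, h = 200) + rnorm(length(x), sd = 1)

# Initial estimates following the rule described above
imax   <- which.max(y)
mu0    <- x[imax]                     # position of the maximum
h0     <- y[imax]                     # height of the maximum
area   <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid rule
sigma0 <- area / (h0 * sqrt(2 * pi))

fit <- nls(y ~ gauss_model(x, mu, sigma, h),
           start = list(mu = mu0, sigma = sigma0, h = h0))
coef(fit)   # close to mu = 10, sigma = 1.5, h = 200
```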
Fixes gaps in data due to calibration scans or lock mass. Automatically detects file type and calls the relevant method. The mzXML file keeps the data the same length in time but overwrites the lock mass scans. The netCDF version adds the scans back into the data thereby increasing the length of the data and correcting for the unseen gap.
object |
An |
lockMass |
A dataframe of locations of the gaps |
freq |
The intervals of the lock mass scans |
start |
The starting lock mass scan location, default is 1 |
makeacqNum locates the gaps using the starting lock mass scan and the interval between lock mass scans. The result is then used in
stitch to correct for the gap caused by the lock mass. Correction works by using scans from either side of the gap to fill it in.
stitch A corrected xcmsRaw-class object
makeacqNum A numeric vector of scan locations corresponding to lock Mass scans
stitch(object, lockMass=numeric())
makeacqNum(object, freq=numeric(), start=1)
Paul Benton, [email protected]
## Not run:
library(xcms)
library(faahKO)
## These files do not have this problem to correct for but just
## for an example
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xr <- xcmsRaw(cdffiles[1])
xr
## Let's assume that the lock mass scans start at 1 and occur every 100 scans
lockMass <- xcms:::makeacqNum(xr, freq = 100, start = 1)
## these are equivalent
lockmass <- AutoLockMass(xr)
ob <- stitch(xr, lockMass)
ob
## plot the old data before correction
foo <- rawEIC(xr, m = c(200, 210), scan = c(80, 140))
plot(foo$scan, foo$intensity, type = "h")
## plot the new corrected data to see what changed
foo <- rawEIC(ob, m = c(200, 210), scan = c(80, 140))
plot(foo$scan, foo$intensity, type = "h")
## End(Not run)
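The two steps described above can be sketched on a plain intensity trace: derive the lock mass scan indices from start and freq, then fill each of those scans from its neighbours. Both helpers are hypothetical, and the fill rule (mean of the scans directly before and after) is a simplification, not the actual stitch() algorithm, which operates on complete xcmsRaw scans.

```r
# Hypothetical: indices of lock mass scans, starting at `start`, every `freq`
lock_mass_scans <- function(nscans, freq, start = 1) {
  seq(start, nscans, by = freq)
}

# Hypothetical: replace each lock mass scan by the mean of its neighbours
fill_lock_scans <- function(y, lock) {
  n <- length(y)
  for (i in lock) {
    left  <- if (i > 1) y[i - 1] else NA
    right <- if (i < n) y[i + 1] else NA
    y[i]  <- mean(c(left, right), na.rm = TRUE)
  }
  y
}

lock_mass_scans(350, freq = 100, start = 1)   # 1 101 201 301

y <- c(10, 0, 30, 40, 0, 60)          # made-up trace with two empty scans
lock <- lock_mass_scans(length(y), freq = 3, start = 2)   # 2 5
y_fixed <- fill_lock_scans(y, lock)
y_fixed                                # 10 20 30 40 50 60
```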
The xcms result objects XcmsExperiment() and XCMSnExp() keep all
preprocessing results in memory and can thus (depending on the size of the
data set) require a large amount of memory. In contrast, the
XcmsExperimentHdf5 class, by using an on-disk data storage mechanism,
has a much lower memory footprint, which makes it possible to analyze very large data
sets on regular computer systems such as desktop or laptop computers. With
some exceptions, including additional parameters, the functionality and
usability of this object is identical to the default XcmsExperiment
object.
This help page lists functions that have additional or different parameters
or properties than the respective methods for XcmsExperiment() objects.
For all other functions not listed here the usability is identical to those
for the XcmsExperiment() object (see the respective help page for
information).
toXcmsExperimentHdf5(object, hdf5File = tempfile())
toXcmsExperiment(object, ...)
## S4 method for signature 'XcmsExperimentHdf5'
chromPeakData(
  object,
  msLevel = integer(),
  peaks = character(),
  columns = character(),
  return.type = c("DataFrame", "data.frame"),
  bySample = FALSE
)
## S4 method for signature 'XcmsExperimentHdf5'
filterChromPeaks(
  object,
  keep = rep(TRUE, nrow(chromPeaks(object))),
  method = "keep",
  ...
)
## S4 method for signature 'XcmsExperimentHdf5,PeakGroupsParam'
adjustRtimePeakGroups(object, param = PeakGroupsParam(), msLevel = 1L)
## S4 method for signature 'XcmsExperimentHdf5'
filterFeatureDefinitions(object, features = integer())
object |
|
hdf5File |
For |
... |
additional parameters eventually passed to downstream functions. |
msLevel |
For |
peaks |
For |
columns |
For |
return.type |
For |
bySample |
For |
keep |
For |
method |
For |
param |
parameter object defining and configuring the algorithm to be used. |
features |
For |
The XcmsExperimentHdf5 object stores all preprocessing results (except
adjusted retention times, which are stored as an additional spectra variable
in the object's Spectra::Spectra() object), in a file in HDF5 format.
XcmsExperimentHdf5 uses a different naming scheme for chromatographic
peaks: for efficiency reasons, chromatographic peak data is organized by
sample and MS level. The chrom peak IDs are hence in the format
"CP"<MS level>"S"<sample id><chrom peak index> with <MS level> being
the MS level in which the chromatographic peaks were detected and
<sample id> the ID of the sample (usually related to the index in the
original MsExperiment object) and the <chrom peak index> the index of the
chromatographic peak in the chrom peak matrix of that sample and
MS level.
HDF5 files do not support parallel processing, thus preprocessing results need to be stored or loaded sequentially.
All functionality for XcmsExperimentHdf5 objects is optimized to reduce
memory demand at the cost of eventually lower performance.
See description of the individual methods for information.
XcmsExperiment and XcmsExperimentHdf5
To use the XcmsExperimentHdf5 class for preprocessing results, the
hdf5File parameter of the findChromPeaks() function needs to be defined,
specifying the path and name of the HDF5 file to store the results. In
addition it is possible to convert a XcmsExperiment object to a
XcmsExperimentHdf5 object with the toXcmsExperimentHdf5() function. All
present preprocessing results will be stored to the specified HDF5 file.
To load all preprocessing results into memory and hence change from a
XcmsExperimentHdf5 to a XcmsExperiment object, the toXcmsExperiment()
function can be used.
Calling findChromPeaks() on an MsExperiment using the parameter
hdf5File will return an instance of the XcmsExperimentHdf5 class and
hence use the on-disk data storage mode described on this page. The results
are stored in the file specified with parameter hdf5File.
[: subset the XcmsExperimentHdf5 object to the specified samples.
Parameters keepChromPeaks (default TRUE), keepAdjustedRtime
(default TRUE) and keepFeatures (default FALSE) allow to configure
whether present chromatographic peaks, alignment or correspondence results
should be retained. This will only change information in the object (i.e.,
the reference to the respective entries in the HDF5 file), but will
not change the content of the HDF5 file. Thus, reverting the
retention times of detected chromatographic peaks is not supported and
keepChromPeaks = TRUE with keepAdjustedRtime = FALSE will throw an
error. Note that with keepChromPeaks = FALSE also keepFeatures is set
to FALSE.
filterChromPeaks() and filterFeatureDefinitions() to filter the
chromatographic peak and correspondence results, respectively. See
documentation below for details. Subset using unsorted or duplicated
indices is not supported.
chromPeaks() gains parameter bySample = FALSE which, if set to TRUE,
returns a list of chromPeaks matrices, one for each sample. Due to
the way data is organized in XcmsExperimentHdf5 objects, this is more
efficient than bySample = FALSE. Thus, in cases where chrom peak data
is subsequently evaluated or processed by sample, it is suggested to
use bySample = TRUE.
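For orientation, the shape of a bySample = TRUE result (a list with one peak matrix per sample) can be mimicked in memory by splitting a chromPeaks-like matrix on its sample column. The matrix below is invented and has only a subset of the real columns; this says nothing about how the HDF5 backend reads the data.

```r
# Made-up matrix shaped like a subset of chromPeaks() output
pks <- cbind(rt     = c(2500, 2600, 3100, 2700, 3300),
             mz     = c(205.1, 300.2, 410.0, 205.1, 520.3),
             into   = c(1e5, 2e5, 5e4, 1.2e5, 8e4),
             sample = c(1, 1, 2, 3, 3))

# One matrix per sample, analogous in shape to bySample = TRUE
by_sample <- split.data.frame(pks, pks[, "sample"])
names(by_sample)     # "1" "2" "3"
by_sample[["3"]]     # the two peaks of sample 3
```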
chromPeakData() gains a new parameter peaks = character() which allows
to specify from which chromatographic peaks data should be returned.
For these chromatographic peaks the ID (row name in chromPeaks())
should be provided with the peaks parameter. This can reduce the memory
requirement for cases in which only data of some selected chromatographic
peaks needs to be extracted. Also, chromPeakData() supports the
bySample parameter described for chromPeaks() above. All other
parameters also available for chromPeakData() of XcmsExperiment objects,
such as columns, are supported.
filterChromPeaks() allows filtering the chromatographic peaks by specifying
which should be retained with the keep parameter. This can be either
a logical, character or integer vector. Duplicated or unsorted
indices are not supported. Any existing feature definitions
are updated as well. The function returns the object with the
filtered chromatographic peaks.
adjustRtimePeakGroups() and adjustRtime() with PeakGroupsParam:
parameter extraPeaks of PeakGroupsParam is ignored. Anchor peaks
are thus only defined using the minFraction and the optional subset
parameter.
featureDefinitions(): similarly to featureDefinitions() for
XcmsExperiment objects, this method returns a data.frame with the
characteristics for the defined LC-MS features. The function for
XcmsExperimentHdf5 does however not return the "peakidx" column
with the indices of the chromatographic peaks per feature. Also, the
columns are returned in alphabetic order.
featureValues(): for parameter value, the option value = "index"
(i.e. returning the index of the chromatographic peaks within the
chromPeaks() matrix per feature) is not supported.
filterFeatureDefinitions(): filter the feature definitions keeping only
the specified features. Parameter features can be used to define the
features to retain. It supports a logical, integer indices or
character with the IDs of the features (i.e., their row names in
featureDefinitions()). The function returns the input
XcmsExperimentHdf5 with the filtered content.
Johannes Rainer, Philippine Louail
## Create a MsExperiment object representing the data from an LC-MS
## experiment.
library(xcms)
library(MsExperiment)
library(faahKO)
## Define the raw data files
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/KO/ko18.CDF', package = "faahKO"))
## Define a data frame with the sample characterization
df <- data.frame(mzML_file = basename(fls),
                 sample = c("ko15", "ko16", "ko18"))
## Import the data. This will initialize a `Spectra` object representing
## the raw data and assign these to the individual samples.
mse <- readMsExperiment(spectraFiles = fls, sampleData = df)
## Perform chromatographic peak detection storing the data in an HDF5 file
## Parameter `hdf5File` has to be provided and needs to be the path and
## name of a (not yet existing) file to which results are going to be
## stored. For the example below we use a temporary file.
xmse <- findChromPeaks(mse, param = CentWaveParam(prefilter = c(4, 100000)),
                       hdf5File = tempfile())
xmse
## Extract selected columns from the chromatographic peak detection
## results
chromPeaks(xmse, columns = c("rt", "mz", "into")) |> head()
## Extract the results per sample
res <- chromPeaks(xmse, columns = c("rt", "mz", "into"), bySample = TRUE)
## The chromatographic peaks of the second sample:
res[[2]] |> head()
## Convert the result object to the in-memory representation:
xmse_mem <- toXcmsExperiment(xmse)
xmse_mem
This method updates an old xcmsSet
object to the latest definition.
## S4 method for signature 'xcmsSet'
updateObject(object, ..., verbose = FALSE)
object |
The |
... |
Optional additional arguments. Currently ignored. |
verbose |
Currently ignored. |
An updated xcmsSet containing all data from
the input object.
Johannes Rainer
This function enables the use of old, partially deprecated code from xcms by setting a corresponding global option. See details for the functions affected.
useOriginalCode(x)
x |
|
The functions/methods that are affected by this option are:
do_findChromPeaks_matchedFilter: use the original code that iteratively creates a subset of the binned (profile) matrix. This is helpful for computers with limited memory or matchedFilter settings with a very small bin size.
logical(1) indicating whether old code is being used.
For parallel processing using the SOCK method (e.g. by
BiocParallel::SnowParam() on
Windows computers) this option might not be passed to the individual R
processes performing the calculations. In such cases it is suggested to
specify the option manually and system-wide by adding the line
options(XCMSuseOriginalCode = TRUE) in a file called .Rprofile in the
folder in which new R processes are started (usually the user's
home directory; to ensure that the option is correctly read add a new line
to the file too). See also Startup from the base R documentation on how to
specify system-wide options for R.
Usage of old code is strongly discouraged. This function is intended mainly for the transition phase from xcms to xcms version 3.
Johannes Rainer
Export in XML data formats: verify the written data
verify.mzQuantML(filename, xsdfilename)
filename |
filename (may include full path) for the output file. Pipes or URLs are not allowed. |
xsdfilename |
Filename of the XSD to verify against (may include full path) |
The verify.mzQuantML() function verifies a PSI standard format mzQuantML document against the XSD schema, see http://www.psidev.info/mzquantml
None.
Write the raw data to a (simple) CDF file.
object |
the |
filename |
filename (may include full path) for the CDF file. Pipes or URLs are not allowed. |
Currently the only application known to read the resulting file is XCMS. Others, especially those which build on the AndiMS library, will refuse to load the output.
None.
write.cdf(object, filename)
Write the raw data to a (simple) mzData file.
object |
the |
filename |
filename (may include full path) for the mzData file. Pipes or URLs are not allowed. |
This function will export a given xcmsRaw object to an mzData file. The mzData file will contain a <spectrumList> containing the <spectrum> with mass and intensity values in 32 bit precision. Other formats are currently not supported. Any header information (e.g. additional <software> information or <cvParams>) will be lost. Currently, also any MSn information will not be stored.
None.
write.mzdata(object, filename)
Export in XML data formats: Write the processed data in an xcmsSet to mzQuantML.
object |
the |
filename |
filename (may include full path) for the output file. Pipes or URLs are not allowed. |
The write.mzQuantML() function will write a (grouped) xcmsSet into the PSI standard format mzQuantML, see http://www.psidev.info/mzquantml
None.
write.mzQuantML(object, filename)
xcmsSet-class,
xcmsSet,
verify.mzQuantML,
writeMSData exports mass spectrometry data in mzML or mzXML format.
If adjusted retention times are present, these are used as retention time of
the exported spectra.
## S4 method for signature 'XCMSnExp,character'
writeMSData(
  object,
  file,
  outformat = c("mzml", "mzxml"),
  copy = FALSE,
  software_processing = NULL,
  ...
)
object |
XCMSnExp object with the mass spectrometry data. |
file |
|
outformat |
|
copy |
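logical(1) whether metadata (e.g. data processing history, original file names) should be copied from the original files. |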
software_processing |
optionally provide specific data processing steps.
See documentation of the software_processing parameter of MSnbase::writeMSData(). |
... |
Additional parameters to pass down to the MSnbase::writeMSData() function. |
Johannes Rainer
MSnbase::writeMSData() function in the MSnbase package.
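The sketch below assumes the faahKO example files and a CentWave setting commonly used for them (peakwidth = c(20, 80), noise = 5000); both are illustrative choices. Peak detection turns the on-disk data into an XCMSnExp object, which is then exported with one mzML file per input file (the output names are made up for this example).

```r
library(MSnbase)
library(xcms)
library(faahKO)

## Two CDF files from the faahKO package
fls <- system.file("cdf/KO", c("ko15.CDF", "ko16.CDF"), package = "faahKO")
od <- readMSData(fls, mode = "onDisk")

## Peak detection returns an XCMSnExp object
xdata <- findChromPeaks(od, param = CentWaveParam(peakwidth = c(20, 80),
                                                  noise = 5000))

## One output file per input file. Had adjustRtime() been run on xdata,
## the adjusted retention times would be written to the exported spectra.
out <- file.path(tempdir(), c("ko15.mzML", "ko16.mzML"))
writeMSData(xdata, file = out, outformat = "mzml")
```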
Write the grouped xcmsSet to an mzTab file.
object |
the |
filename |
filename (may include full path) for the mzTab file. Pipes or URLs are not allowed. |
The mzTab file format for MS-based metabolomics (and proteomics) is a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files from xcms contain small molecule sections together with experimental metadata and basic quantitative information. The format is intended to store a simple summary of the final results.
None.
writeMzTab(object, filename)
library(faahKO)
xs <- group(faahko)
mzt <- data.frame(character(0))
mzt <- xcms:::mzTabHeader(mzt, version="1.1.0", mode="Complete", type="Quantification", description="faahKO", xset=xs)
mzt <- xcms:::mzTabAddSME(mzt, xs)
xcms:::writeMzTab(mzt, "faahKO.mzTab")
The XChromatogram object stores chromatographic data (e.g. an extracted
ion chromatogram) along with the chromatographic peaks identified within
that data. The object inherits all methods of the
MSnbase::Chromatogram() object from the MSnbase package.
Multiple XChromatogram objects can be stored in an XChromatograms object.
This class extends MSnbase::MChromatograms() from the MSnbase package
and thus allows arranging chromatograms in a matrix-like structure, with columns
representing samples and rows m/z-retention time ranges.
All functions are described (grouped into topic-related sections) after the Arguments section.
XChromatograms(data, phenoData, featureData, chromPeaks, chromPeakData, ...)
XChromatogram(rtime = numeric(), intensity = numeric(), mz = c(NA_real_, NA_real_), filterMz = c(NA_real_, NA_real_), precursorMz = c(NA_real_, NA_real_), productMz = c(NA_real_, NA_real_), fromFile = integer(), aggregationFun = character(), msLevel = 1L, chromPeaks, chromPeakData)
## S4 method for signature 'XChromatogram'
chromPeaks(object, rt = numeric(), mz = numeric(), ppm = 0, type = c("any", "within", "apex_within"), msLevel)
## S4 replacement method for signature 'XChromatogram'
chromPeaks(object) <- value
## S4 method for signature 'XChromatogram,ANY'
plot(x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, peakType = c("polygon", "point", "rectangle", "none"), peakCol = "#00000060", peakBg = "#00000020", peakPch = 1, ...)
## S4 method for signature 'XChromatogram'
filterMz(object, mz, ...)
## S4 method for signature 'XChromatogram'
filterRt(object, rt, ...)
## S4 method for signature 'XChromatogram'
hasChromPeaks(object)
## S4 method for signature 'XChromatogram'
dropFilledChromPeaks(object)
## S4 method for signature 'XChromatogram'
chromPeakData(object)
## S4 replacement method for signature 'XChromatogram'
chromPeakData(object) <- value
## S4 method for signature 'XChromatogram,MergeNeighboringPeaksParam'
refineChromPeaks(object, param = MergeNeighboringPeaksParam())
## S4 method for signature 'XChromatogram'
filterChromPeaks(object, method = c("keepTop"), ...)
## S4 method for signature 'XChromatogram'
transformIntensity(object, FUN = identity)
## S4 method for signature 'XChromatograms'
hasChromPeaks(object)
## S4 method for signature 'XChromatograms'
hasFilledChromPeaks(object)
## S4 method for signature 'XChromatograms'
chromPeaks(object, rt = numeric(), mz = numeric(), ppm = 0, type = c("any", "within", "apex_within"), msLevel)
## S4 method for signature 'XChromatograms'
chromPeakData(object)
## S4 method for signature 'XChromatograms'
filterMz(object, mz, ...)
## S4 method for signature 'XChromatograms'
filterRt(object, rt, ...)
## S4 method for signature 'XChromatograms,ANY'
plot(x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, peakType = c("polygon", "point", "rectangle", "none"), peakCol = "#00000060", peakBg = "#00000020", peakPch = 1, ...)
## S4 method for signature 'XChromatograms'
processHistory(object, fileIndex, type)
## S4 method for signature 'XChromatograms'
hasFeatures(object, ...)
## S4 method for signature 'XChromatograms'
dropFeatureDefinitions(object, ...)
## S4 method for signature 'XChromatograms,PeakDensityParam'
groupChromPeaks(object, param)
## S4 method for signature 'XChromatograms'
featureDefinitions(object, mz = numeric(), rt = numeric(), ppm = 0, type = c("any", "within", "apex_within"))
## S4 method for signature 'XChromatograms,ANY,ANY,ANY'
x[i, j, drop = TRUE]
## S4 method for signature 'XChromatograms'
featureValues(object, method = c("medret", "maxint", "sum"), value = "into", intensity = "into", missing = NA, ...)
## S4 method for signature 'XChromatograms'
plotChromPeakDensity(object, param, col = "#00000060", xlab = "retention time", main = NULL, peakType = c("polygon", "point", "rectangle", "none"), peakCol = "#00000060", peakBg = "#00000020", peakPch = 1, simulate = TRUE, ...)
## S4 method for signature 'XChromatograms'
dropFilledChromPeaks(object)
## S4 method for signature 'XChromatograms,MergeNeighboringPeaksParam'
refineChromPeaks(object, param = MergeNeighboringPeaksParam())
## S4 method for signature 'XChromatograms'
filterChromPeaks(object, method = c("keepTop"), ...)
## S4 method for signature 'XChromatograms'
transformIntensity(object, FUN = identity)
data |
For |
phenoData |
For |
featureData |
For |
chromPeaks |
For |
chromPeakData |
For |
... |
For |
rtime |
For |
intensity |
For `featureValues`: `character(1)` specifying the name of the column in `chromPeaks(object)` containing the intensity value of the peak that should be used for conflict resolution if `method = "maxint"`. |
mz |
For |
filterMz |
For |
precursorMz |
For |
productMz |
For |
fromFile |
For |
aggregationFun |
For |
msLevel |
For |
object |
An |
rt |
For |
ppm |
For |
type |
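For `chromPeaks` and `featureDefinitions`: `character(1)` defining how the `rt` and `mz` ranges are matched: `"any"` (default) returns peaks that overlap the range even partially, `"within"` peaks that are completely within the range and `"apex_within"` peaks whose apex is within the range. For `plot`: what type of plot should be used for the chromatogram (such as `"l"` for lines, `"p"` for points etc.), see help of `plot()` in the `graphics` package for more details. For `processHistory`: restrict returned processing steps to specific types. Use `processHistoryTypes()` to list all supported values. |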
value |
For `featureValues`: `character(1)` specifying the name of the column in `chromPeaks(object)` that should be returned, or `"index"` to return the index of the peak associated with the feature in each sample. Use `value = "into"` (the default in the usage above) to return the integrated peak area. |
x |
For |
col |
For |
lty |
For |
xlab |
For |
ylab |
For |
main |
For |
peakType |
For |
peakCol |
For |
peakBg |
For |
peakPch |
For |
param |
For |
method |
For |
FUN |
For |
fileIndex |
For |
i |
For |
j |
For |
drop |
For |
missing |
For |
simulate |
For |
See help of the individual functions.
Objects can be created with the constructor functions XChromatogram and
XChromatograms, respectively. They can also be coerced from
MSnbase::Chromatogram() or MSnbase::MChromatograms() objects using
as(object, "XChromatogram") or as(object, "XChromatograms").
Besides classical subsetting with [, specific filter operations on
MSnbase::MChromatograms() and XChromatograms objects are available. See
filterColumnsIntensityAbove() for more details.
[ subsets an XChromatograms object by row (i) and column
(j), with i and j being of type integer. The featureDefinitions
are subset accordingly and the peakidx column is updated.
filterMz filters the chromatographic peaks within an XChromatogram or
XChromatograms object, provided a column "mz" is present in the chromPeaks matrix.
This is the case if the XChromatogram was extracted from an
XCMSnExp() object with the chromatogram() function. All
chromatographic peaks with their m/z within the m/z range defined by mz
are retained. Feature definitions (if present) are also subset
accordingly. The function returns a filtered XChromatogram or
XChromatograms object.
filterRt filters chromatogram(s) by the provided retention time range.
All chromatographic peaks (if any) with their apex within the
retention time range specified with rt are retained. Feature
definitions, if present, are also filtered accordingly. The function
returns a filtered XChromatogram or XChromatograms object.
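To make the retention time rules concrete, here is a small base-R sketch on a plain matrix with the "rt", "rtmin" and "rtmax" columns of a chromPeaks matrix. It is not xcms code: select_peaks() is a hypothetical helper that mimics the "apex within" rule used by filterRt and the three type options ("any", "within", "apex_within") that chromPeaks and featureDefinitions accept.

```r
## Toy peak table with hypothetical values
pks <- rbind(c(rt = 4, rtmin = 2, rtmax = 9),
             c(rt = 7, rtmin = 6, rtmax = 8),
             c(rt = 12, rtmin = 10, rtmax = 14))

## Hypothetical helper: select peaks relative to a retention time range
select_peaks <- function(pks, rt, type = c("any", "within", "apex_within")) {
    type <- match.arg(type)
    keep <- switch(type,
        ## peak overlaps the range at least partially
        any = pks[, "rtmin"] <= rt[2] & pks[, "rtmax"] >= rt[1],
        ## peak lies completely inside the range
        within = pks[, "rtmin"] >= rt[1] & pks[, "rtmax"] <= rt[2],
        ## only the apex has to be inside the range (as in filterRt)
        apex_within = pks[, "rt"] >= rt[1] & pks[, "rt"] <= rt[2])
    pks[keep, , drop = FALSE]
}

nrow(select_peaks(pks, rt = c(3, 10), type = "any"))          ## 3
nrow(select_peaks(pks, rt = c(3, 10), type = "within"))       ## 1
nrow(select_peaks(pks, rt = c(3, 10), type = "apex_within"))  ## 2
```

The first peak shows the difference between the rules: its apex (4) lies inside the range, but its lower boundary (2) does not, so it is kept by "any" and "apex_within" but dropped by "within".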
See also help of MSnbase::Chromatogram() in the MSnbase package for
general information and data access. The methods listed here are specific to
XChromatogram and XChromatograms objects.
chromPeaks, chromPeaks<-: extract or set the matrix with the
chromatographic peak definitions. Parameter rt specifies a
retention time range for which peaks should be returned, and
parameter type defines how overlap with that range is determined (see
the parameter description for details). For XChromatogram objects the
function returns a matrix with columns "mz" (mean m/z value), "mzmin"
(minimal m/z value), "mzmax" (maximal m/z value), "rt" (retention time
of the peak apex), "rtmin" (the lower peak boundary in retention time
dimension), "rtmax" (the upper peak boundary in retention time
dimension), "into" (the integrated peak signal/area of the peak),
"maxo" (the maximum intensity of the peak) and "sn" (the signal to
noise ratio).
Note that, depending on the peak detection algorithm, the matrix may
contain additional columns.
For XChromatograms objects the matrix also contains columns "row"
and "column" specifying in which chromatogram of object the peak was
identified. Chromatographic peaks are ordered by row.
chromPeakData, chromPeakData<-: extract or set the
S4Vectors::DataFrame() with optional chromatographic peak annotations.
hasChromPeaks: infer whether an XChromatogram (or XChromatograms)
has chromatographic peaks. For XChromatogram: returns a logical(1);
for XChromatograms: returns a logical matrix with the same dimensions as
object, with TRUE or FALSE depending on whether chromatographic peaks are
available in the chromatogram at the respective position.
hasFilledChromPeaks: whether an XChromatogram (or an XChromatogram in
an XChromatograms object) has filled-in chromatographic peaks.
For XChromatogram: returns a logical(1);
for XChromatograms: returns a logical matrix with the same dimensions as
object, with TRUE or FALSE depending on whether filled-in chromatographic
peaks are available in the chromatogram at the respective position.
dropFilledChromPeaks: removes filled-in chromatographic peaks. See
dropFilledChromPeaks() help for XCMSnExp() objects for more
information.
hasFeatures: for XChromatograms objects only: whether correspondence
analysis has been performed, i.e. whether m/z-rt feature definitions are
present. Returns a logical(1).
dropFeatureDefinitions: for XChromatograms objects only: delete any
correspondence analysis results (and related process history).
featureDefinitions: for XChromatograms objects only. Extract the
results from the correspondence analysis (performed with
groupChromPeaks). Returns a DataFrame with the properties of the
defined m/z-rt features: their m/z and retention time range. Column
peakidx contains the index of the chromatographic peaks in the
chromPeaks matrix associated with the feature. Column "row" contains
the row in the XChromatograms object in which the feature was defined.
Similar to the chromPeaks method, the returned feature definitions can be
filtered with the mz, rt and ppm parameters.
featureValues: for XChromatograms objects only. Extract the abundance
estimates for the individual features. With parameter value = "index",
a matrix of indices of the peaks in the chromPeaks matrix associated with
the feature is returned; with value = "into" (the default in the usage
above) the integrated peak area is returned. The function returns a matrix
with one row per feature (in featureDefinitions) and each column being
a sample (i.e. column of object). For features without an associated peak
in a certain sample, NA is returned. This can be changed with the
missing argument of the function.
filterChromPeaks: filters chromatographic peaks in object depending
on parameter method and method-specific parameters passed as additional
arguments with .... Available methods are:
method = "keepTop": keep the top n (default n = 1L) peaks in each
chromatogram, ranked by the values in the chromPeaks column specified with
parameter order (default order = "maxo").
Parameter decreasing (default decreasing = TRUE) defines whether peaks
are ordered in descending (decreasing = TRUE) or ascending
(decreasing = FALSE) order, i.e. whether the top n peaks with the largest
or the smallest values are kept.
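The keepTop rule can be illustrated with a short base-R sketch on a toy chromPeaks-like matrix with "maxo", "row" and "column" columns. keep_top() is a hypothetical helper, not the xcms implementation; its by argument plays the role of the order parameter described above. With xcms itself, the corresponding call would presumably be filterChromPeaks(xchrs, method = "keepTop", n = 1L, order = "maxo"), based on the parameters listed above.

```r
## Toy peak table: two chromatograms (row 1, columns 1 and 2)
pks <- cbind(maxo = c(10, 50, 30, 5, 40),
             row = c(1, 1, 1, 1, 1),
             column = c(1, 1, 1, 2, 2))

## Hypothetical helper: keep the n top-ranked peaks per chromatogram
keep_top <- function(pks, n = 1L, by = "maxo", decreasing = TRUE) {
    ## one group of peak indices per chromatogram (row/column combination)
    grp <- paste(pks[, "row"], pks[, "column"])
    keep <- unlist(lapply(split(seq_len(nrow(pks)), grp), function(idx) {
        idx <- idx[order(pks[idx, by], decreasing = decreasing)]
        head(idx, n)
    }), use.names = FALSE)
    ## preserve the original peak order
    pks[sort(keep), , drop = FALSE]
}

keep_top(pks)[, "maxo"]                      ## 50 40
keep_top(pks, decreasing = FALSE)[, "maxo"]  ## 10 5
keep_top(pks, n = 2L)[, "maxo"]              ## 50 30 5 40
```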
processHistory: returns a list of ProcessHistory objects representing
the individual processing steps that have been performed. Optional
parameters type and fileIndex allow further specifying which processing
steps to return.
transformIntensity: transforms the intensity values of the chromatograms
with provided function FUN. See MSnbase::transformIntensity() in the
MSnbase package for details. For XChromatogram and XChromatograms
in addition to the intensity values also columns "into" and "maxo"
in the object's chromPeaks matrix are transformed by the same function.
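A minimal sketch of a log2 transformation of a toy XChromatogram with one chromatographic peak (the intensity and peak values are made up). Per the description above, the peak's "into" and "maxo" values should be transformed together with the intensities.

```r
library(MSnbase)
library(xcms)

## Toy chromatogram with one chromatographic peak
pks <- cbind(rt = 4, rtmin = 2, rtmax = 9, into = 93, maxo = 19, sn = NA)
xchr <- XChromatogram(rtime = 1:10,
                      intensity = c(4, 8, 14, 19, 18, 12, 9, 8, 5, 2),
                      chromPeaks = pks)

## log2-transform the intensities; "into" and "maxo" of the peak
## are transformed with the same function
xchr_log <- transformIntensity(xchr, FUN = log2)
intensity(xchr_log)
chromPeaks(xchr_log)[, c("into", "maxo")]
```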
plot draws the chromatogram and additionally highlights any
chromatographic peaks present in the XChromatogram or XChromatograms
(unless peakType = "none" was specified). To draw peaks in different
colors, a vector of color definitions with length equal to
nrow(chromPeaks(x)) has to be provided with peakCol and/or peakBg,
defining one color for each peak (in the same order as the peaks in
chromPeaks(x)). For base peak chromatograms or total ion chromatograms
it might be better to set peakType = "none" to avoid generating busy
plots.
plotChromPeakDensity: visualize peak density-based correspondence
analysis results. See section Correspondence analysis for more details.
See findChromPeaks-Chromatogram-CentWaveParam for information.
After chromatographic peak detection it is also possible to refine
identified chromatographic peaks with the refineChromPeaks method (e.g. to
reduce peak detection artifacts). Currently, only peak refinement using the
merge neighboring peaks method is available (see
MergeNeighboringPeaksParam() for a detailed description of the approach).
Identified chromatographic peaks in an XChromatograms object can be grouped
into features with the groupChromPeaks function. Currently, such a
correspondence analysis can be performed with the peak density method
(see groupChromPeaks for more details) specifying the algorithm settings
with a PeakDensityParam() object. A correspondence analysis is performed
separately for each row in the XChromatograms object grouping
chromatographic peaks across samples (columns).
The analysis results are stored in the returned XChromatograms object
and can be accessed with the featureDefinitions method which returns a
DataFrame with one row for each feature. Column "row" specifies in
which row of the XChromatograms object the feature was identified.
The plotChromPeakDensity method can be used to visualize peak density
correspondence results, or to simulate a peak density correspondence
analysis on chromatographic data. The resulting plot consists of two panels,
the upper panel showing the chromatographic data as well as the identified
chromatographic peaks, the lower panel the distribution of peaks (the peak
density) along the retention time axis. This plot shows each peak as a point
with its retention time on the x-axis and the sample in which it
was found on the y-axis. The distribution of peaks along the retention time
axis is visualized with a density estimate. Grouped chromatographic peaks
are indicated with grey shaded rectangles. Parameter simulate defines
whether the correspondence analysis should be simulated
(simulate = TRUE, based on the available data and the provided
PeakDensityParam() parameter class) or not (simulate = FALSE). For the
latter it is assumed that a correspondence analysis has been performed with
the peak density method on the object.
See examples below.
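The effect of the bandwidth bw on the density estimate can be seen without xcms: the base-R sketch below applies stats::density() to a few hypothetical peak retention times and counts the local maxima of the estimate. It only illustrates the smoothing idea behind peak density grouping; count_modes() is a hypothetical helper and this is not the grouping code behind PeakDensityParam.

```r
## Hypothetical apex retention times of peaks from several samples
rts <- c(100, 102, 130, 131)

## Count local maxima of a Gaussian kernel density estimate
count_modes <- function(rts, bw) {
    d <- stats::density(rts, bw = bw)
    s <- sign(diff(d$y))
    s <- s[s != 0]          # ignore flat steps
    sum(diff(s) == -2)      # a rise followed by a fall is one maximum
}

count_modes(rts, bw = 5)   ## 2: two separate groups of peaks
count_modes(rts, bw = 60)  ## 1: both groups merge into one
```

With the small bandwidth the two clusters (around 101 and 130.5) stay separate, while the large bandwidth smooths them into a single density maximum, which is why a larger bw tends to group peaks that are further apart.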
Abundance estimates for each feature can be extracted with the
featureValues function using parameter value = "into" to extract the
integrated peak area for each feature. The result is a matrix, columns
being samples and rows features.
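How the peak indices in the peakidx column of featureDefinitions map onto such a feature-by-sample matrix can be sketched in base R. feature_values() is a hypothetical helper with made-up data; it assumes at most one peak per feature and sample, so the conflict resolution selected with method ("medret", "maxint" or "sum") in xcms is not modelled.

```r
## Toy peak table: integrated areas ("into") and the sample ("column")
pks <- cbind(into = c(100, 80, 55, 300), column = c(1, 2, 1, 2))
## Feature definitions: indices of the peaks (rows of pks) per feature
peakidx <- list(c(1, 2), 3, 4)

## Hypothetical helper: one row per feature, one column per sample
feature_values <- function(pks, peakidx, nsamples, value = "into",
                           missing = NA) {
    res <- matrix(missing, nrow = length(peakidx), ncol = nsamples)
    for (f in seq_along(peakidx)) {
        idx <- peakidx[[f]]
        vals <- if (value == "index") idx else pks[idx, value]
        res[f, pks[idx, "column"]] <- vals
    }
    res
}

## Feature 2 has no peak in sample 2, feature 3 none in sample 1
feature_values(pks, peakidx, nsamples = 2)
## Replace missing values with 0 instead of NA
feature_values(pks, peakidx, nsamples = 2, missing = 0)
```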
Highlighting the peak area(s) in an XChromatogram or XChromatograms
object (plot with peakType = "polygon") draws a polygon representing
the displayed chromatogram from the peak's minimal retention time to the
maximal retention time. If the XChromatograms object was extracted from an
XCMSnExp() object with the chromatogram() function this might not
represent the actual identified peak area if the m/z range that was
used to extract the chromatogram was larger than the peak's m/z.
Johannes Rainer
findChromPeaks-centWave for peak
detection on MSnbase::MChromatograms() objects.
## ---- Creation of XChromatograms ---- ##
## Create an XChromatograms from Chromatogram objects
library(MSnbase)
dta <- list(Chromatogram(rtime = 1:7, c(3, 4, 6, 12, 8, 3, 2)),
    Chromatogram(1:10, c(4, 6, 3, 4, 7, 13, 43, 34, 23, 9)))

## Create an XChromatograms without peak data
xchrs <- XChromatograms(dta)

## Create an XChromatograms with peaks data
pks <- list(matrix(c(4, 2, 5, 30, 12, NA), nrow = 1,
    dimnames = list(NULL, c("rt", "rtmin", "rtmax", "into", "maxo", "sn"))),
    NULL)
xchrs <- XChromatograms(dta, chromPeaks = pks)

## Create an XChromatograms from XChromatogram objects
dta <- lapply(dta, as, "XChromatogram")
chromPeaks(dta[[1]]) <- pks[[1]]

xchrs <- XChromatograms(dta, nrow = 1)

hasChromPeaks(xchrs)

## Loading a test data set with identified chromatographic peaks
faahko_sub <- loadXcmsData("faahko_sub2")

## Subset the dataset to the first and third file.
xod_sub <- filterFile(faahko_sub, file = c(1, 3))

od <- as(xod_sub, "MsExperiment")

## Extract chromatograms for a m/z - retention time slice
chrs <- chromatogram(od, mz = 344, rt = c(2500, 3500))
chrs

## --------------------------------------------------- ##
##       Chromatographic peak detection                ##
## --------------------------------------------------- ##
## Perform peak detection using CentWave
xchrs <- findChromPeaks(chrs, param = CentWaveParam())
xchrs

## Do we have chromatographic peaks?
hasChromPeaks(xchrs)

## Process history
processHistory(xchrs)

## The chromatographic peaks, columns "row" and "column" provide information
## in which sample the peak was identified.
chromPeaks(xchrs)

## Specifically extract chromatographic peaks for one sample/chromatogram
chromPeaks(xchrs[1, 2])

## Plot the results
plot(xchrs)

## Plot the results using a different color for each sample
sample_colors <- c("#ff000040", "#00ff0040", "#0000ff40")
cols <- sample_colors[chromPeaks(xchrs)[, "column"]]
plot(xchrs, col = sample_colors, peakBg = cols)

## Indicate the peaks with a rectangle
plot(xchrs, col = sample_colors, peakCol = cols, peakType = "rectangle",
    peakBg = NA)

## --------------------------------------------------- ##
##       Correspondence analysis                       ##
## --------------------------------------------------- ##
## Group chromatographic peaks across samples
prm <- PeakDensityParam(sampleGroup = rep(1, 2))
res <- groupChromPeaks(xchrs, param = prm)

hasFeatures(res)
featureDefinitions(res)

## Plot the correspondence results. Use simulate = FALSE to show the
## actual results. Grouped chromatographic peaks are indicated with
## grey shaded rectangles.
plotChromPeakDensity(res, simulate = FALSE)

## Simulate a correspondence analysis based on different settings. Larger
## bw will increase the smoothing of the density estimate hence grouping
## chromatographic peaks that are further apart on the retention time axis.
## The data set has two samples, hence sampleGroup has length 2.
prm <- PeakDensityParam(sampleGroup = rep(1, 2), bw = 60)
plotChromPeakDensity(res, param = prm)

## Delete the identified feature definitions
res <- dropFeatureDefinitions(res)
hasFeatures(res)

library(MSnbase)
## Create an XChromatogram object
pks <- matrix(nrow = 1, ncol = 6)
colnames(pks) <- c("rt", "rtmin", "rtmax", "into", "maxo", "sn")
pks[, "rtmin"] <- 2
pks[, "rtmax"] <- 9
pks[, "rt"] <- 4
pks[, "maxo"] <- 19
pks[, "into"] <- 93

xchr <- XChromatogram(rtime = 1:10,
    intensity = c(4, 8, 14, 19, 18, 12, 9, 8, 5, 2),
    chromPeaks = pks)
xchr

## Add arbitrary peak annotations
df <- DataFrame(peak_id = c("a"))
xchr <- XChromatogram(rtime = 1:10,
    intensity = c(4, 8, 14, 19, 18, 12, 9, 8, 5, 2),
    chromPeaks = pks, chromPeakData = df)
xchr
chromPeakData(xchr)

## Extract the chromatographic peaks
chromPeaks(xchr)

## Plotting of a single XChromatogram object
## o Don't highlight chromatographic peaks
plot(xchr, peakType = "none")

## o Indicate peaks with a polygon
plot(xchr)

## Add a second peak to the data.
pks <- rbind(chromPeaks(xchr), c(7, 7, 10, NA, 15, NA))
chromPeaks(xchr) <- pks

## Plot the peaks in different colors
plot(xchr, peakCol = c("#ff000080", "#0000ff80"),
    peakBg = c("#ff000020", "#0000ff20"))

## Indicate the peaks as rectangles
plot(xchr, peakCol = c("#ff000060", "#0000ff60"), peakBg = NA,
    peakType = "rectangle")

## Filter the XChromatogram by retention time
xchr_sub <- filterRt(xchr, rt = c(4, 6))
xchr_sub
plot(xchr_sub)
These functions are provided for compatibility with older versions of ‘xcms’ only, and will be defunct at the next release.
The following functions/methods are deprecated.
profBin, profBinM, profBinLin,
profBinLinM, profBinLinBase, profBinLinBaseM
have been deprecated and binYonX in combination
with imputeLinInterpol should be used instead.
extractMsData: replaced by as(x, "data.frame").
plotMsData: replaced by plot(x, type = "XIC").
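A minimal sketch of the suggested replacement for the profBin* functions, using made-up m/z and intensity values: binYonX() bins the intensities into bins of 1 m/z (reporting the maximum per bin), empty bins are set to NA because no baseValue is given, and imputeLinInterpol() then fills them by linear interpolation. The bin size and the values are illustrative only.

```r
library(xcms)

## Made-up m/z values and intensities of a single spectrum
mz <- c(100.1, 100.4, 101.2, 103.6, 103.9)
int <- c(10, 30, 20, 5, 15)

## Bin intensities into 1 m/z wide bins (maximum per bin); bins
## without values are NA because no baseValue is specified
b <- binYonX(mz, int, binSize = 1, method = "max")
b$x   # bin mid-points
b$y   # binned intensities, the empty bin is NA

## Fill empty bins by linear interpolation between neighbouring bins
int_imputed <- imputeLinInterpol(b$y, method = "lin")
int_imputed
```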
This class is used to store and plot parallel extracted ion
chromatograms from multiple sample files. It integrates with the
xcmsSet class to display peak area integrated during peak
identification or fill-in.
Objects can be created with the getEIC method of
the xcmsSet class. Objects can also be created by calls
of the form new("xcmsEIC", ...).
eic:list containing named entries for every sample. For each entry, a list of two-column EIC matrices with retention time and intensity
mzrange:two column matrix containing starting and ending m/z for each EIC
rtrange:two column matrix containing starting and ending time for each EIC
rt:either "raw" or "corrected" to specify retention
times contained in the object
groupnames:group names from xcmsSet object used to generate EICs
groupnames: signature(object = "xcmsEIC"): get the groupnames slot
mzrange: signature(object = "xcmsEIC"): get the mzrange slot
plot: signature(x = "xcmsEIC"): plot the extracted ion
chromatograms
rtrange: signature(object = "xcmsEIC"): get the rtrange slot
sampnames: signature(object = "xcmsEIC"): get the sample names
No notes yet.
Colin A. Smith, [email protected]
Data sources which read data from a file should inherit from this
class. The xcms package provides classes to read from
netCDF, mzData, mzXML, and mzML files
using xcmsFileSource.
This class should be considered virtual and will not work if passed to
loadRaw-methods. The reason it is not explicitly
virtual is that there does not appear to be a way for a class to be
both virtual and have a data part (which lets functions treat objects
as if they were character strings).
This class validates that a file exists at the path given.
xcmsFileSource objects should not be instantiated directly.
Instead, create subclasses and instantiate those.
.Data:Object of class "character": the path of the file from which
to read raw data, stored as the object's data part
Class "character", from data part.
Class "xcmsSource", directly.
xcmsSource: signature(object = "character"): Create an
xcmsFileSource object referencing the given file name.
Daniel Hackney [email protected]
EXPERIMENTAL FEATURE
xcmsFragments is an object similar to xcmsSet, which holds peaks picked (or collected) from one or several xcmsRaw objects.
There are still discussions going on about the exact API for MS$^n$ data, so this is likely to change in the future. The code is not yet pipeline-ified.
xcmsFragments(xs, ...)
xs |
A |
... |
further arguments to the |
After running collect(xFragments, xSet), the peak table of the xcmsFragments object includes the MS1 peaks from all experiments stored in an xcmsSet object. It also contains the relevant MSn peaks from the xcmsRaw objects, which are created temporarily from the file paths stored in the xcmsSet.
An xcmsFragments object.
Joachim Kutzera, Steffen Neumann
This class is similar to xcmsSet because it stores peaks
from a number of individual files. However, xcmsFragments keeps
Tandem MS and e.g. Ion Trap or Orbitrap MS$^n$ peaks, including the
parent ion relationships.
Objects can be created with the xcmsFragments
constructor and filled with peaks using the collect method.
peaks:matrix with columns peakID (MS1 parent in corresponding xcmsSet),
MSnParentPeakID (parent peak within this xcmsFragments), msLevel
(e.g. 2 for Tandem MS), rt (retention time in case of LC data), mz
(fragment mass-to-charge), intensity (peak intensity extracted
from the original xcmsSet), sample (the index of the rawData-file).
MS2spec:A list of matrices. Each matrix in the list is a single spectrum collected by collect. The columns are mz, intensity and full width at half maximum (fwhm). The fwhm column is only relevant if the spectra came from profile data.
specinfo:A matrix with reference data for the spectra in MS2spec. The columns are preMZ, AccMZ, rtmin, rtmax, ref and CollisionEnergy. preMZ is the precursor mass from the MS1 scan, as given in the XML file. Some instruments report this mass only as a nominal mass, therefore AccMZ additionally gives a weighted average mass from the MS1 scan of the collected spectra. The retention time range is given by rtmin and rtmax. The ref column is a pointer to the corresponding spectrum in MS2spec. The CollisionEnergy column holds the collision energy of the spectrum.
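The AccMZ value can be pictured with a short sketch. It assumes an intensity-weighted mean over MS1 signals within a fixed tolerance of the nominal precursor mass. acc_mz(), the 0.5 tolerance and the data are hypothetical, and the actual weighting used by collect may differ.

```r
## Hypothetical helper: intensity-weighted mean m/z of the MS1 signals that
## lie within +/- tol of a (possibly nominal) precursor m/z.
acc_mz <- function(mz, intensity, pre_mz, tol = 0.5) {
  keep <- abs(mz - pre_mz) <= tol
  if (!any(keep)) return(NA_real_)   # no MS1 signal near the precursor
  weighted.mean(mz[keep], intensity[keep])
}

## Invented MS1 scan around a nominal precursor mass of 301
ms1_mz  <- c(300.10, 301.02, 301.05, 302.20)
ms1_int <- c(50, 100, 300, 20)
acc_mz(ms1_mz, ms1_int, pre_mz = 301)
```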
signature(object = "xcmsFragments"): takes an xcmsSet object, collects the MS1 peaks from it and the MSn peaks from the corresponding xcmsRaw files.
signature(object = "xcmsFragments"): prints a (text based) pseudo-tree of the peaktable to display the dependencies of the peaks among each other.
signature(object = "xcmsFragments"): print a human-readable
description of this object to the console.
S. Neumann, J. Kutzera
The XCMSnExp object is a container for the results of G/LC-MS
data preprocessing, which comprises chromatographic peak detection, alignment
and correspondence. These results can be accessed with the chromPeaks(),
adjustedRtime() and featureDefinitions() functions (see below,
after the Usage, Arguments, Value and Slots sections, for more details).
Along with the results, the object contains the processing history that
allows each processing step to be tracked along with the settings used. This
can be extracted with the processHistory() function.
XCMSnExp objects, by directly extending the
MSnbase::OnDiskMSnExp object from the MSnbase package, inherit
all of its functionality and thus allow easy access to the full raw
data at any stage of an analysis.
To support interaction with packages requiring the old objects,
XCMSnExp objects can be coerced into xcmsSet
objects using the as() method (see examples below). All
preprocessing results will be passed along to the resulting
object.
General functions for XCMSnExp objects are (see further below for
specific functions to handle chromatographic peak data, alignment and
correspondence results):
processHistoryTypes() returns the available types of
process histories. These can be passed with argument type to the
processHistory method to extract specific process step(s).
hasFilledChromPeaks(): whether filled-in peaks are present or not.
profMat(): creates a profile matrix, which
is an n x m matrix, n (rows) representing equally spaced m/z values (bins)
and m (columns) the retention time of the corresponding scans. Each cell
contains the maximum intensity measured for the specific scan and m/z
bin. See profMat() for more details and description of
the various binning methods.
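The "bin" idea can be sketched in base R. prof_bin() is a hypothetical helper, not the xcms implementation. It anchors bins of width step at the lower m/z bound, keeps the maximum intensity per bin and scan, and leaves empty bins at 0. These details are assumptions; see profMat() for the real binning methods.

```r
## Hypothetical sketch of a "bin" profile matrix:
## rows = m/z bins of width `step`, columns = scans.
prof_bin <- function(mz_list, int_list, step = 0.1, mzrange = NULL) {
  if (is.null(mzrange)) mzrange <- range(unlist(mz_list))
  n_bins <- floor((mzrange[2] - mzrange[1]) / step) + 1
  mat <- matrix(0, nrow = n_bins, ncol = length(mz_list))  # empty bins stay 0
  for (j in seq_along(mz_list)) {
    mz  <- mz_list[[j]]
    int <- int_list[[j]]
    sel <- mz >= mzrange[1] & mz <= mzrange[2]
    idx <- floor((mz[sel] - mzrange[1]) / step) + 1   # bin index per peak
    mx  <- tapply(int[sel], idx, max)                 # max intensity per bin
    mat[as.integer(names(mx)), j] <- mx
  }
  mat
}

## Two invented scans
mz_list  <- list(c(100.0, 100.2, 101.1), c(100.6, 101.4))
int_list <- list(c(10, 30, 5), c(7, 9))
pm <- prof_bin(mz_list, int_list, step = 0.5)
pm
```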
hasAdjustedRtime(): whether the object provides adjusted
retention times.
hasFeatures(): whether the object contains correspondence
results (i.e. features).
hasChromPeaks(): whether the object contains peak
detection results.
hasFilledChromPeaks(): whether the object contains any filled-in
chromatographic peaks.
adjustedRtime(), adjustedRtime<-:
extract/set adjusted retention times. adjustedRtime<- should not
be called manually; it is called internally by the
adjustRtime() methods. For XCMSnExp objects,
adjustedRtime<- also applies the retention time adjustments to
any chromatographic peaks present. The bySample parameter
specifies whether the adjusted retention times should be grouped
by sample (file).
featureDefinitions(), featureDefinitions<-: extract
or set the correspondence results, i.e. the mz-rt features (peak groups).
Similar to chromPeaks(), it is possible to extract features for
specified m/z and/or rt ranges. The function also supports the parameter
type, which specifies which features should be returned if
rt or mz is specified. For details see the help of
chromPeaks().
See also featureSummary() for a function to calculate simple
feature summaries.
chromPeaks(), chromPeaks<-: extract or set
the matrix containing the information on identified chromatographic
peaks. Rownames of the matrix represent unique IDs of the respective peaks
within the experiment.
Parameter bySample specifies whether peaks should
be returned ungrouped (default bySample = FALSE) or grouped by
sample (bySample = TRUE). The chromPeaks<- method for
XCMSnExp objects also removes all correspondence (peak grouping)
and retention time correction (alignment) results. The optional
arguments rt, mz, ppm and type allow extracting
only chromatographic peaks overlapping the defined retention time and/or
m/z ranges. Argument type defines how overlapping is
determined: for type == "any" (the default), all peaks that are even
partially overlapping the region are returned (i.e. for which either
"mzmin" or "mzmax" of the chromPeaks or
featureDefinitions matrix are within the provided m/z range), for
type == "within" the full peak has to be within the region (i.e.
both "mzmin" and "mzmax" have to be within the m/z range) and
for type == "apex_within" the peak's apex position (highest signal
of the peak) has to be within the region (i.e. the peak's or feature's m/z
has to be within the m/z range).
See description of the return value for details on the returned matrix.
Users usually don't have to use the chromPeaks<- method directly
as detected chromatographic peaks are added to the object by the
findChromPeaks() method. Also, chromPeaks<- will replace
any existing chromPeakData.
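A base R sketch on a small peak matrix can illustrate the three type modes. select_mz() is hypothetical. It widens the m/z range by ppm on each side, which is an assumption about how ppm is applied. For "any", it uses a general interval-overlap test, so it also matches peaks wider than the query range. The same logic would apply to rt with the "rtmin", "rtmax" and "rt" columns.

```r
## Hypothetical helper: logical vector of peaks matching an m/z range
## under the three `type` modes described above.
select_mz <- function(pks, mz, ppm = 0,
                      type = c("any", "within", "apex_within")) {
  type <- match.arg(type)
  lo <- mz[1] - ppm * mz[1] / 1e6   # widen each bound by ppm (assumption)
  hi <- mz[2] + ppm * mz[2] / 1e6
  switch(type,
         any         = pks[, "mzmin"] <= hi & pks[, "mzmax"] >= lo,
         within      = pks[, "mzmin"] >= lo & pks[, "mzmax"] <= hi,
         apex_within = pks[, "mz"] >= lo & pks[, "mz"] <= hi)
}

## Invented peaks: partial overlap, fully inside, partial overlap, outside
pks <- cbind(mz    = c(200.00, 200.03, 200.07, 201.00),
             mzmin = c(199.99, 200.02, 200.05, 200.98),
             mzmax = c(200.01, 200.04, 200.09, 201.02))
rng <- c(199.995, 200.06)
select_mz(pks, rng, type = "any")          # TRUE  TRUE  TRUE FALSE
select_mz(pks, rng, type = "within")       # FALSE TRUE FALSE FALSE
select_mz(pks, rng, type = "apex_within")  # TRUE  TRUE FALSE FALSE
```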
chromPeakData() and chromPeakData<- allow getting or setting arbitrary
chromatographic peak annotations. These are returned or set as a
DataFrame. Note that the number of rows and the rownames of the
DataFrame have to match those of chromPeaks. Parameter columns allows
extracting only selected columns from the chromPeakData. By default
(columns = character()) all columns are returned.
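A few lines of base R can check the row-matching requirement. Here a plain data.frame stands in for the S4Vectors DataFrame. The peak IDs, annotation columns and the helper peak_data_matches() are hypothetical.

```r
## Invented peak matrix with hypothetical peak IDs as rownames
pks <- cbind(mz = c(200.01, 300.10), rt = c(2500, 3100))
rownames(pks) <- c("CP1", "CP2")

## Annotations: one row per peak, same rownames in the same order
pdata <- data.frame(ms_level = c(1L, 1L), is_filled = c(FALSE, TRUE),
                    row.names = rownames(pks))

## Hypothetical check mirroring the requirement stated above
peak_data_matches <- function(pks, pdata) {
  nrow(pdata) == nrow(pks) && identical(rownames(pdata), rownames(pks))
}
peak_data_matches(pks, pdata)          # TRUE
peak_data_matches(pks, pdata[2:1, ])   # FALSE: rows out of order
```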
rtime(): extracts the retention time for each
scan. The bySample parameter allows returning the values grouped
by sample/file, and the adjusted parameter defines whether adjusted or raw
retention times should be returned. By default the method returns adjusted
retention times, if they are available (i.e. if retention times were
adjusted using the adjustRtime() method).
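Grouping by sample amounts to splitting the per-scan values by the file they come from. The sketch below uses invented retention times and file indices; in MSnbase the file index of each spectrum is available via fromFile(). The adjusted flag only chooses which vector is split.

```r
## Invented per-scan retention times (raw and adjusted) for two files
rt_raw    <- c(2501.4, 2503.0, 2504.6, 2501.2, 2502.8)
rt_adj    <- c(2500.9, 2502.5, 2504.1, 2501.6, 2503.2)
from_file <- c(1L, 1L, 1L, 2L, 2L)   # file index of each scan

adjusted <- TRUE
rt <- if (adjusted) rt_adj else rt_raw

## bySample = FALSE corresponds to `rt`; bySample = TRUE to the split list
rt_by_sample <- split(rt, from_file)
lengths(rt_by_sample)   # 3 scans in file 1, 2 scans in file 2
```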
mz(): extracts the mz values from each scan of all files within an
XCMSnExp object. These values are extracted
from the original data files and any processing steps are applied
on the fly. Using the bySample parameter it is possible to
switch from the default grouping of mz values by spectrum/scan to a
grouping by sample/file.
intensity(): extracts the intensity values from
each scan of all files within an XCMSnExp object. These values are
extracted from the original data files and any processing steps are
applied on the fly. Using the bySample parameter it is
possible to switch from the default grouping of intensity values by
spectrum/scan to a grouping by sample/file.
spectra(): extracts the
Spectrum objects containing all data from
object. The values are extracted from the original data files and
any processing steps are applied on the fly. By setting
bySample = TRUE, the spectra are returned grouped by sample/file.
If the XCMSnExp object contains adjusted retention times, these
are returned by default in the Spectrum objects (this can be
overridden by setting adjusted = FALSE).
processHistory(): returns a list of
ProcessHistory() objects (or objects inheriting from this
base class) representing the individual processing steps that have been
performed, possibly along with their settings (Param parameter
class). Optional arguments fileIndex, type and
msLevel allow restricting the result to process steps of a certain type or
performed on a certain file or MS level.
dropChromPeaks(): drops any identified chromatographic
peaks and returns the object without that information. Note that for
XCMSnExp objects the method by default also drops the results from a
correspondence (peak grouping) analysis. Adjusted retention times are
removed if the alignment has been performed after peak detection.
This can be overruled with keepAdjustedRtime = TRUE.
dropFeatureDefinitions(): drops the results from a
correspondence (peak grouping) analysis, i.e. the definition of the mz-rt
features, and returns the object without that information. Note that for
XCMSnExp objects the method will by default also drop retention
time adjustment results if these were performed after the last peak
grouping (i.e. if they are based on the peak grouping results that are
going to be removed). All related process history steps are
removed too, as well as any filled-in peaks
(from fillChromPeaks()). The parameter keepAdjustedRtime
can be used to avoid removal of adjusted retention times.
dropAdjustedRtime(): drops any retention time
adjustment information and returns the object without adjusted retention
times. For XCMSnExp objects, this also reverts the retention times
reported for the chromatographic peaks in the peak matrix to the
original, raw ones (after chromatographic peak detection). Note that
for XCMSnExp objects the method also drops all peak grouping
results if these were performed after the retention time
adjustment. All related process history steps are removed too.
findChromPeaks() performs chromatographic peak detection
on the provided XCMSnExp objects. For more details see the method
for XCMSnExp().
Note that by default (with parameter add = FALSE) previous peak
detection results are removed. Use add = TRUE to perform a second
round of peak detection and add the newly identified peaks to the previous
peak detection results. Correspondence results (features) are always removed
prior to peak detection. Previous alignment (retention
time adjustment) results are kept, i.e. chromatographic peak detection
is performed using adjusted retention times if the data was first
aligned using e.g. obiwarp (adjustRtime()).
dropFilledChromPeaks(): drops any filled-in chromatographic
peaks (filled in by the fillChromPeaks() method) and all
related process history steps.
spectrapply() applies the provided function to each
Spectrum in the object and returns its
results. If no function is specified the function simply returns the
list of Spectrum objects.
XCMSnExp objects can be combined with the c() function. This
combines identified chromatographic peaks and the objects' pheno data but
discards alignment results or feature definitions.
plot() plots the spectrum data (see MSnbase::plot() for
MSnExp objects in the MSnbase package for more details).
For type = "XIC", identified chromatographic peaks will be indicated
as rectangles with border color peakCol.
processHistoryTypes()

## S4 method for signature 'XCMSnExp'
hasFilledChromPeaks(object)

## S4 method for signature 'OnDiskMSnExp'
profMat(object, method = "bin", step = 0.1, baselevel = NULL,
  basespace = NULL, mzrange. = NULL, fileIndex, ...)

## S4 method for signature 'XCMSnExp'
hasAdjustedRtime(object)

## S4 method for signature 'XCMSnExp'
hasFeatures(object, msLevel = integer())

## S4 method for signature 'XCMSnExp'
hasChromPeaks(object, msLevel = integer())

## S4 method for signature 'XCMSnExp'
hasFilledChromPeaks(object)

## S4 method for signature 'XCMSnExp'
adjustedRtime(object, bySample = FALSE)

## S4 replacement method for signature 'XCMSnExp'
adjustedRtime(object) <- value

## S4 method for signature 'XCMSnExp'
featureDefinitions(object, mz = numeric(), rt = numeric(), ppm = 0,
  type = c("any", "within", "apex_within"), msLevel = integer())

## S4 replacement method for signature 'XCMSnExp'
featureDefinitions(object) <- value

## S4 method for signature 'XCMSnExp'
chromPeaks(object, bySample = FALSE, rt = numeric(), mz = numeric(),
  ppm = 0, msLevel = integer(), type = c("any", "within", "apex_within"),
  isFilledColumn = FALSE)

## S4 replacement method for signature 'XCMSnExp'
chromPeaks(object) <- value

## S4 method for signature 'XCMSnExp'
rtime(object, bySample = FALSE, adjusted = hasAdjustedRtime(object))

## S4 method for signature 'XCMSnExp'
mz(object, bySample = FALSE, BPPARAM = bpparam())

## S4 method for signature 'XCMSnExp'
intensity(object, bySample = FALSE, BPPARAM = bpparam())

## S4 method for signature 'XCMSnExp'
spectra(object, bySample = FALSE, adjusted = hasAdjustedRtime(object),
  BPPARAM = bpparam())

## S4 method for signature 'XCMSnExp'
processHistory(object, fileIndex, type, msLevel)

## S4 method for signature 'XCMSnExp'
dropChromPeaks(object, keepAdjustedRtime = FALSE)

## S4 method for signature 'XCMSnExp'
dropFeatureDefinitions(object, keepAdjustedRtime = FALSE, dropLastN = -1)

## S4 method for signature 'XCMSnExp'
dropAdjustedRtime(object)

## S4 method for signature 'XCMSnExp'
profMat(object, method = "bin", step = 0.1, baselevel = NULL,
  basespace = NULL, mzrange. = NULL, fileIndex, ...)

## S4 method for signature 'XCMSnExp,Param'
findChromPeaks(object, param, BPPARAM = bpparam(),
  return.type = "XCMSnExp", msLevel = 1L, add = FALSE)

## S4 method for signature 'XCMSnExp'
dropFilledChromPeaks(object)

## S4 method for signature 'XCMSnExp'
spectrapply(object, FUN = NULL, BPPARAM = bpparam(), ...)

## S3 method for class 'XCMSnExp'
c(...)

## S4 method for signature 'XCMSnExp'
chromPeakData(object, columns = character(), ...)

## S4 replacement method for signature 'XCMSnExp'
chromPeakData(object) <- value

## S4 method for signature 'XCMSnExp,missing'
plot(x, y, type = c("spectra", "XIC"), peakCol = "#ff000060", ...)
object |
either a |
method |
|
step |
|
baselevel |
|
basespace |
|
mzrange. |
Optional |
fileIndex |
For |
... |
Additional parameters. |
msLevel |
|
bySample |
|
value |
For `featureDefinitions<-`: a `DataFrame` with peak grouping information. See return value for the `featureDefinitions` method for the expected format. For `chromPeaks<-`: a `matrix` with information on detected peaks. See return value for the `chromPeaks` method for the expected format. |
mz |
optional |
rt |
optional |
ppm |
optional |
type |
For |
isFilledColumn |
|
adjusted |
|
BPPARAM |
Parameter class for parallel processing. See
|
keepAdjustedRtime |
For |
dropLastN |
For |
param |
A |
return.type |
Character specifying what type of object the method should
return. Can be either |
add |
For |
FUN |
For |
columns |
For |
x |
For |
y |
For |
peakCol |
For |
For profMat(): a list with the profile matrix
(or matrices if fileIndex was not specified or if
length(fileIndex) > 1). See profile-matrix for
general help and information about the profile matrix.
For adjustedRtime(): if bySample = FALSE a numeric
vector with the adjusted retention time for each spectrum of all files/samples
within the object. If bySample = TRUE a list (length equal
to the number of samples) with adjusted retention times grouped by
sample. Returns NULL if no adjusted retention times are present.
For featureDefinitions(): a DataFrame with peak grouping
information, each row corresponding to one mz-rt feature (grouped peaks
within and across samples) and columns "mzmed" (median mz value),
"mzmin" (minimal mz value), "mzmax" (maximum mz value),
"rtmed" (median retention time), "rtmin" (minimal retention
time), "rtmax" (maximal retention time) and "peakidx".
Column "peakidx" contains a list with indices of
chromatographic peaks (rows) in the matrix returned by the
chromPeaks() method that belong to that feature group. The method
returns NULL if no feature definitions are present.
featureDefinitions() also supports the parameters mz, rt,
ppm and type to return only features within certain ranges (see
description of chromPeaks() for details).
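A base R sketch can show how "peakidx" links a feature to its peaks. summarize_features() is hypothetical and assumes the summary columns are derived from the peaks' "mz" and "rt" values. The grouping algorithm used by groupChromPeaks() determines how they are actually computed.

```r
## Invented chromatographic peaks (rows) and two features given as peakidx
pks <- cbind(mz     = c(200.01, 200.02, 300.10, 200.03),
             rt     = c(2500, 2510, 3100, 2505),
             sample = c(1, 2, 1, 3))
peakidx <- list(c(1L, 2L, 4L), 3L)   # row indices into pks per feature

## Hypothetical helper: per-feature summaries from the member peaks
summarize_features <- function(pks, peakidx) {
  do.call(rbind, lapply(peakidx, function(i) {
    p <- pks[i, , drop = FALSE]
    c(mzmed = median(p[, "mz"]), mzmin = min(p[, "mz"]),
      mzmax = max(p[, "mz"]),   rtmed = median(p[, "rt"]),
      rtmin = min(p[, "rt"]),   rtmax = max(p[, "rt"]),
      npeaks = nrow(p))
  }))
}
fd <- summarize_features(pks, peakidx)
fd
```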
For chromPeaks: if bySample = FALSE a matrix (each row
being a chromatographic peak, rownames representing unique IDs of the peaks)
with at least the following columns:
"mz" (intensity-weighted mean of mz values of the peak across
scans/retention times),
"mzmin" (minimal mz value),
"mzmax" (maximal mz value),
"rt" (retention time of the peak apex),
"rtmin" (minimal retention time),
"rtmax" (maximal retention time),
"into" (integrated, original, intensity of the peak),
"maxo" (maximum intensity of the peak) and
"sample" (sample index in which the peak was identified).
Depending on the employed peak detection algorithm and the
verboseColumns parameter of it, additional columns might be
returned. If parameter isFilledColumn was set to TRUE a column
named "is_filled" is also returned.
For bySample = TRUE the chromatographic peaks are
returned as a list of matrices, each containing the
chromatographic peaks of a specific sample. For samples in which no
peaks were detected a matrix with 0 rows is returned.
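The bySample = TRUE layout can be reproduced from the "sample" column, including 0-row matrices for samples without peaks. peaks_by_sample() and the peak data are hypothetical.

```r
## Invented peaks from samples 1 and 3; sample 2 has no peaks
pks <- cbind(mz = c(200.01, 300.10, 200.03),
             rt = c(2500, 3100, 2505),
             sample = c(1, 1, 3))

## Hypothetical helper: one matrix per sample, keeping all columns;
## drop = FALSE keeps a 0-row matrix for samples without peaks
peaks_by_sample <- function(pks, n_samples) {
  lapply(seq_len(n_samples), function(s)
    pks[pks[, "sample"] == s, , drop = FALSE])
}
res <- peaks_by_sample(pks, 3)
vapply(res, nrow, integer(1))   # 2 0 1
```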
For rtime(): if bySample = FALSE a numeric vector with
the retention times of each scan, if bySample = TRUE a
list of numeric vectors with the retention times per sample.
For mz(): if bySample = FALSE a list with the mz
values (numeric vectors) of each scan. If bySample = TRUE a
list with the mz values per sample.
For intensity(): if bySample = FALSE a list with
the intensity values (numeric vectors) of each scan. If
bySample = TRUE a list with the intensity values per
sample.
For spectra(): if bySample = FALSE a list with
Spectrum objects. If bySample = TRUE the
result is grouped by sample, i.e. as a list of lists, each
element in the outer list being the list of spectra
of the specific file.
For processHistory(): a list of
ProcessHistory() objects providing the details of the
individual data processing steps that have been performed.
.processHistory: list with XProcessHistory objects
tracking all individual analysis steps that have been performed.
msFeatureData: MsFeatureData class extending environment
and containing the results from a chromatographic peak detection (element
"chromPeaks"), peak grouping (element "featureDefinitions")
and retention time correction (element "adjustedRtime") steps.
This object should not be manipulated directly.
Chromatographic peak data is added to an XCMSnExp object by the
findChromPeaks() function. Functions to access chromatographic
peak data are:
hasChromPeaks() whether chromatographic peak data is available,
see below for help of the function.
chromPeaks() access chromatographic peaks (see below for help).
dropChromPeaks() remove chromatographic peaks (see below for
help).
dropFilledChromPeaks() remove filled-in peaks (see below for
help).
fillChromPeaks() fill in missing peaks (see respective
help page).
plotChromPeaks() plot identified peaks for a file (see
respective help page).
plotChromPeakImage() plot distribution of peaks along the
retention time axis (see respective help page).
Adjusted retention times are stored in an XCMSnExp object besides the
original, raw, retention times, allowing to switch between raw and adjusted
times. It is also possible to replace the raw retention times with the
adjusted ones with the applyAdjustedRtime() function. The adjusted
retention times are added to an XCMSnExp by the
adjustRtime() function. All functions related to the access of
adjusted retention times are:
hasAdjustedRtime() whether adjusted retention times are available
(see below for help).
dropAdjustedRtime() remove adjusted retention times (see below
for help).
applyAdjustedRtime() replace the raw retention times with
the adjusted ones (see respective help page).
plotAdjustedRtime() plot differences between adjusted and
raw retention times (see respective help page).
The correspondence analysis groupChromPeaks() adds the definition of
LC-MS features to an XCMSnExp object. All functions related to these are
listed below:
hasFeatures() whether correspondence results are available (see
below for help).
featureDefinitions() access the definitions of the features (see
below for help).
dropFeatureDefinitions() remove correspondence results (see below
for help).
featureValues() access values for features (see respective
help page).
featureSummary() perform a simple summary of the defined
features (see respective help page).
overlappingFeatures() identify features that are overlapping or close
in the m/z - rt space (see respective help page).
quantify(): extract feature intensities and put them, along
with feature definitions and phenodata information, into a
SummarizedExperiment::SummarizedExperiment(). See help page for details.
The "chromPeaks" element in the msFeatureData slot is
equivalent to the @peaks slot of the xcmsSet object, the
"featureDefinitions" contains information from the @groups
and @groupidx slots from an xcmsSet object.
Johannes Rainer
MSnbase::OnDiskMSnExp, and MSnbase::pSet for a complete list of inherited methods.
findChromPeaks() for available peak detection methods returning an XCMSnExp object as a result. groupChromPeaks() for available peak grouping methods and featureDefinitions for the method to extract the feature definitions representing the peak grouping results. adjustRtime() for retention time adjustment methods. chromatogram() to extract chromatographic MS data. featureChromatograms() to extract chromatograms for each feature. chromPeakSpectra() to extract MS1 or MS2 spectra for each chromatographic peak. featureSpectra() to extract MS1 or MS2 spectra for features.
fillChromPeaks() for the method to fill in possibly
missing chromatographic peaks for a feature in some samples.
## Load a test data set with detected peaks
library(xcms)
library(MSnbase)
data(faahko_sub)
## Update the path to the files for the local system
dirname(faahko_sub) <- system.file("cdf/KO", package = "faahKO")

## Disable parallel processing for this example
register(SerialParam())

## The results from the peak detection are now stored in the XCMSnExp
## object faahko_sub
## The detected peaks can be accessed with the chromPeaks method.
head(chromPeaks(faahko_sub))

## The settings of the chromatographic peak detection can be accessed with
## the processHistory method
processHistory(faahko_sub)

## Also the parameter class for the peak detection can be accessed
processParam(processHistory(faahko_sub)[[1]])

## The XCMSnExp class inherits all methods from the pSet and OnDiskMSnExp
## classes defined in Bioconductor's MSnbase package. To access the (raw)
## retention time for each spectrum we can use the rtime method. Setting
## bySample = TRUE would cause the retention times to be grouped by sample
head(rtime(faahko_sub))

## Similarly it is possible to extract the mz values or the intensity values
## using the mz and intensity method, respectively, also with the option to
## return the results grouped by sample instead of the default, which is
## grouped by spectrum. Finally, to extract all of the data we can use the
## spectra method which returns Spectrum objects containing all raw data.
## Note that all these methods read the information from the original input
## files and subsequently apply any data processing steps to them.
mzs <- mz(faahko_sub, bySample = TRUE)
length(mzs)
lengths(mzs)

## The full data could also be read using the spectra method, which returns
## a list of Spectrum objects containing the mz, intensity and rt values.
## spctr <- spectra(faahko_sub)
## To get all spectra of the first file we can split them by file
## head(split(spctr, fromFile(faahko_sub))[[1]])

############
## Filtering
##
## XCMSnExp objects can be filtered by file, retention time, mz values or
## MS level. For some of these filters, preprocessing results (mostly
## retention time correction and peak grouping results) will be dropped.
## Below we filter the XCMSnExp object by file to extract the results for
## only the second file.
xod_2 <- filterFile(faahko_sub, file = 2)
xod_2

## Now the object contains only the identified peaks for the second file
head(chromPeaks(xod_2))

##########
## Coercing to an xcmsSet object
##
## We can also coerce the XCMSnExp object into an xcmsSet object:
xs <- as(faahko_sub, "xcmsSet")
head(peaks(xs))
A matrix of peak information. The actual columns depend on
how it is generated (i.e. the findPeaks method).
Objects can be created by calls of the form new("xcmsPeaks", ...).
.Data:The matrix holding the peak information
Class "matrix", from data part.
Class "array", by class "matrix", distance 2.
Class "structure", by class "matrix", distance 3.
Class "vector", by class "matrix", distance 4, with explicit coerce.
None yet. Some utilities for working with peak data would be nice.
Michael Lawrence
findPeaks for detecting peaks in an
xcmsRaw.
This function handles the task of reading a NetCDF/mzXML file containing
LC/MS or GC/MS data into a new xcmsRaw object. It also
transforms the data into profile (matrix) mode for efficient
plotting and data exploration.
xcmsRaw(filename, profstep = 1, profmethod = "bin", profparam = list(),
  includeMSn = FALSE, mslevel = NULL, scanrange = NULL)
deepCopy(object)
| Argument | Description |
|---|---|
| filename | path name of the NetCDF or mzXML file to read |
| profstep | step size (in m/z) to use for profile generation |
| profmethod | method to use for profile generation. See |
| profparam | extra parameters to use for profile generation |
| includeMSn | only for XML file formats: also read MS^n (tandem MS spectra from ion trap or Orbitrap instruments) |
| mslevel | move data from mslevel into normal MS1 slots, e.g. for peak picking and visualisation |
| scanrange | scan range to read |
| object | An xcmsRaw object |
See profile-matrix for details on profile matrix
generation methods and settings.
The scanrange to import can be restricted; otherwise all MS1 data
is read. If profstep is set to 0, no profile matrix is generated.
Unless includeMSn = TRUE, only MS level 1 data is read, not MS/MS,
etc.
deepCopy(xraw) will create a copy of the xcmsRaw object with its own
copy of mz and intensity data in xraw@env.
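deepCopy is needed because R environments have reference semantics: copying an object that keeps its data in an environment copies only the reference. The base R sketch below illustrates this with a plain list standing in for an object with an `env` slot. `deepCopyEnv` is a hypothetical helper showing the idea and is not how deepCopy is implemented.

```r
## Environments are mutable and shared on copy
e <- new.env()
e$mz <- c(100.1, 200.2)
obj <- list(env = e)   # stand-in for an object with an env slot
shallow <- obj         # copies the list, but not the environment
shallow$env$mz[1] <- 999
obj$env$mz[1]          # 999: both objects share one environment

## Hypothetical deep copy: a fresh environment holding copies of the data
deepCopyEnv <- function(x) {
  x$env <- list2env(as.list(x$env, all.names = TRUE), envir = new.env())
  x
}
deep <- deepCopyEnv(obj)
deep$env$mz[1] <- 1
obj$env$mz[1]          # still 999: the deep copy is independent
```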
An xcmsRaw object.
Colin A. Smith
NetCDF file format: https://www.unidata.ucar.edu/software/netcdf/ http://www.astm.org/Standards/E2077.htm http://www.astm.org/Standards/E2078.htm
mzXML file format: http://sashimi.sourceforge.net/software_glossolalia.html
PSI-MS working group who developed mzData and mzML file formats: http://www.psidev.info/index.php?q=node/80
Parser used for XML file formats: http://tools.proteomecenter.org/wiki/index.php?title=Software:RAMP
xcmsRaw-class,
profStep,
profMethod
xcmsFragments
## Not run:
library(xcms)
library(faahKO)
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xr <- xcmsRaw(cdffiles[1])
xr
## This gives some information about the file
names(attributes(xr))
## Let's have a look at the structure of the object
str(xr)
## same, but with a preview of each slot in the object
## So... let's have a look at how this works
head(xr@scanindex)
## [1] 0 429 860 1291 1718 2140
xr@env$mz[425:430]
## [1] 596.3 597.0 597.3 598.1 599.3 200.1
## We can see that position 429 holds the last m/z of scan 1, therefore...
mz.scan1 <- xr@env$mz[(1 + xr@scanindex[1]):xr@scanindex[2]]
intensity.scan1 <- xr@env$intensity[(1 + xr@scanindex[1]):xr@scanindex[2]]
plot(mz.scan1, intensity.scan1, type = "h",
     main = paste0("Scan 1 of file ", basename(cdffiles[1])))
## The easier way
scan1 <- getScan(xr, 1)
head(scan1)
plotScan(xr, 1)
## End(Not run)
This class handles processing and visualization of the raw data from a single LC/MS or GC/MS run. It includes methods for producing a standard suite of plots, including individual spectra, multi-scan average spectra, TIC, and EIC. It will also produce a feature list of significant peaks using matched filtration.
Objects can be created with the xcmsRaw constructor
which reads data from a NetCDF file into a new object.
acquisitionNum: Numeric representing the acquisition
number of the individual scans/spectra. The length of
acquisitionNum is equal to the number of spectra/scans in the
object and hence equal to the length of the scantime slot. Note, however, that
this information is only available in mzML files.
env: environment with three variables: mz - concatenated
m/z values for all scans, intensity - corresponding
signal intensity for each m/z value, and profile -
matrix representation of the intensity values with columns
representing scans and rows representing equally spaced m/z
values. The profile matrix should be extracted with the
profMat method.
filepath: Path to the raw data file
gradient: matrix with first row, time, containing the time point
for interpolation and successive columns representing solvent
fractions at each point
msnAcquisitionNum: for each scan a unique acquisition number as reported via "spectrum id" (mzData) or "<scan num=...>" and "<scanOrigin num=...>" (mzXML)
msnCollisionEnergy: "CollisionEnergy" (mzData) or "collisionEnergy" (mzXML)
msnLevel: for each scan the "msLevel" (both mzData and mzXML)
msnPrecursorCharge: "ChargeState" (mzData) and "precursorCharge" (mzXML)
msnPrecursorIntensity: "Intensity" (mzData) or "precursorIntensity" (mzXML)
msnPrecursorMz: "MassToChargeRatio" (mzData) or "precursorMz" (mzXML)
msnPrecursorScan: "spectrumRef" (both mzData and mzXML)
msnRt: Retention time of the scan
msnScanindex: msnScanindex
mzrange: numeric vector of length 2 with minimum and maximum m/z values represented in the profile matrix
polarity: polarity
profmethod: character value with the name of the method used for generating the profile matrix.
profparam: list to store additional profile matrix
generation settings. Use the profinfo method to
extract all information relevant to profile matrix creation.
scanindex: integer vector with the starting position of each scan in the
mz and intensity variables (note that index
values are 0-based, i.e. the first position is 0, not 1).
scantime: numeric vector with acquisition time (in seconds) for each scan.
tic: numeric vector with total ion count (intensity) for each scan
mslevel: Numeric representing the MS level that is present in the MS1
slot. This slot should be accessed through its getter method
mslevel.
scanrange: Numeric of length 2 specifying the scan range (or NULL for
the full range). This slot should be accessed through its getter
method scanrange. Note that the scanrange will
always be 1 to the number of scans within the xcmsRaw
object, which does not necessarily match the scan index in
the original mzML file (e.g. if the original data was subsetted). The
acquisitionNum information can be used to track the
original position of each scan in the mzML file.
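Because scanindex stores 0-based start offsets into the concatenated mz and intensity vectors, the values of scan i lie at positions scanindex[i] + 1 through scanindex[i + 1] (or through the end of the vectors for the last scan), which is the same slicing the xcmsRaw example performs on xr@env$mz. The base R sketch below uses a hypothetical helper `scanData` on toy vectors; in practice the getScan method returns this data directly.

```r
## Hypothetical helper (not part of xcms): m/z and intensity values of
## scan i, given concatenated vectors and 0-based start offsets
scanData <- function(mz, intensity, scanindex, i) {
  first <- scanindex[i] + 1L
  last <- if (i < length(scanindex)) scanindex[i + 1L] else length(mz)
  ## an empty scan yields a zero-length index
  idx <- seq.int(first, length.out = max(0L, last - first + 1L))
  cbind(mz = mz[idx], intensity = intensity[idx])
}

mz <- c(100.1, 150.2, 200.3, 120.4, 180.5)
intensity <- c(5, 10, 15, 20, 25)
scanindex <- c(0L, 3L)  # scan 2 starts after the first 3 values
s1 <- scanData(mz, intensity, scanindex, 1)  # rows 1-3
s2 <- scanData(mz, intensity, scanindex, 2)  # rows 4-5
```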
signature(object = "xcmsRaw"): feature detection using
matched filtration in the chromatographic time domain
signature(object = "xcmsRaw"): get extracted ion
chromatograms in specified m/z ranges. This will return the total
ion chromatogram (TIC) if the m/z range corresponds to the full m/z
range (i.e. sum of all signals per retention time across all m/z).
signature(object = "xcmsRaw"): get data for peaks in
specified m/z and time ranges
signature(object = "xcmsRaw"): get m/z and intensity
values for a single mass scan
signature(object = "xcmsRaw"): get average m/z and
intensity values for multiple mass scans
signature(x = "xcmsRaw"): get data for peaks in
specified m/z and time ranges
Create an image of the raw (profile) data m/z against retention time, with the intensity color coded.
Getter method for the mslevel slot.
signature(object = "xcmsRaw"): plot a chromatogram
from profile data
signature(object = "xcmsRaw"): plot locations of raw
intensity data points
signature(object = "xcmsRaw"): plot a mass spectrum
of an individual scan from the raw data
signature(object = "xcmsRaw"): plot a mass spectrum
from profile data
signature(object = "xcmsRaw"): experimental method for
plotting 3D surface of profile data with rgl.
signature(object = "xcmsRaw"): plot total ion count
chromatogram
signature(object = "xcmsRaw"): returns a list containing
the profile generation method and step (profile m/z step size) and
any additional parameters to the profile function.
signature(object = "xcmsRaw"): median filter profile
data in time and m/z dimensions
signature(object = "xcmsRaw"): change the method of
generating the profile matrix
signature(object = "xcmsRaw"): get the method of
generating the profile matrix
signature(object = "xcmsRaw"): get vector of m/z values
for each row of the profile matrix
signature(object = "xcmsRaw"): interpret flexible ways
of specifying subsets of the profile matrix
signature(object = "xcmsRaw"): change the m/z step
used for generating the profile matrix
signature(object = "xcmsRaw"): get the m/z step used
for generating the profile matrix
signature(object = "xcmsRaw"): reverse the order of the
data points for each scan
Getter method for the scanrange slot. See slot description
above for more information.
signature(object = "xcmsRaw"): sort the data points
by increasing m/z for each scan
signature(object = "xcmsRaw"): Raw data correction for
lock mass calibration gaps.
signature(object = "xcmsRaw"):
internal function to identify regions of interest in the raw
data as part of the first step of centWave-based peak detection.
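The EIC/TIC relationship described for the extracted ion chromatogram method can be illustrated with a base R sketch: sum the intensities per scan that fall inside an m/z window, and a window spanning all m/z values yields the total ion count per scan. `sumInRange` is a hypothetical helper working on raw concatenated vectors; the xcms methods may instead work on the profile matrix and aggregate differently.

```r
## Hypothetical helper: per-scan sum of intensities within an m/z window
sumInRange <- function(mz, intensity, scanindex, mzrange) {
  ## scan number of every value: the last start offset <= its 0-based position
  scan <- findInterval(seq_along(mz) - 1, scanindex)
  keep <- mz >= mzrange[1] & mz <= mzrange[2]
  vapply(seq_along(scanindex),
         function(j) sum(intensity[keep & scan == j]),
         numeric(1))
}

mz <- c(100.1, 150.2, 200.3, 120.4, 180.5)
intensity <- c(5, 10, 15, 20, 25)
scanindex <- c(0L, 3L)
eic <- sumInRange(mz, intensity, scanindex, c(140, 190))  # 10, 25
tic <- sumInRange(mz, intensity, scanindex, range(mz))    # 30, 45
```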
Colin A. Smith, Johannes Rainer
xcmsRaw, subset-xcmsRaw for subsetting by spectra.
This function handles the construction of xcmsSet objects. It finds peaks in batch mode and pre-sorts files from subdirectories into different classes suitable for grouping.
xcmsSet(files = NULL, snames = NULL, sclass = NULL, phenoData = NULL, profmethod = "bin", profparam = list(), polarity = NULL, lockMassFreq = FALSE, mslevel = NULL, nSlaves = 0, progressCallback = NULL, scanrange = NULL, BPPARAM = bpparam(), stopOnError = TRUE, ...)
| Argument | Description |
|---|---|
| files | path names of the NetCDF/mzXML files to read |
| snames | sample names. By default the file name without extension is used. |
| sclass | sample classes. |
| phenoData | |
| profmethod | Method to use for profile generation. Supported values are |
| profparam | parameters to use for profile generation. |
| polarity | filter raw data for positive/negative scans |
| lockMassFreq | Performs correction for Waters LockMass function |
| mslevel | perform peak picking on data of given mslevel |
| nSlaves | DEPRECATED, use |
| progressCallback | function to be called when progressInfo changes (useful for GUIs) |
| scanrange | scan range to read |
| BPPARAM | a |
| stopOnError | Logical specifying whether the feature detection call should stop on the first encountered error (the default), or whether feature detection is performed in all files regardless of failures for individual files, in which case all errors are reported as |
| ... | further arguments to the |
The default values of the files, snames, sclass, and
phenoData arguments cause the function to recursively search
for readable files. The file name without extension is used as the
sample name. The subdirectory path is used as the sample class.
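The default naming rules can be mirrored in base R, as in the sketch below: strip the extension from the file name to get a sample name, and take the subdirectory path below the search root as the class. The file paths and the root directory "cdf" are hypothetical, and xcmsSet's own implementation may differ in detail.

```r
## Hypothetical file layout below a search root "cdf"
files <- c("cdf/KO/ko15.CDF", "cdf/KO/ko16.CDF", "cdf/WT/wt15.CDF")
root <- "cdf"

## sample name: file name without extension
snames <- tools::file_path_sans_ext(basename(files))
## sample class: subdirectory path relative to the search root
sclass <- dirname(substring(files, nchar(root) + 2L))
data.frame(snames, sclass)
```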
If the files contain both positive and negative spectra, the polarity
can be selected explicitly. The default (NULL) is to read all scans.
If phenoData is provided, it is stored to the phenoData
slot of the returned xcmsSet class. If that data.frame
contains a column named “class”, its content will be returned
by the sampclass method and thus be used for the
group/class assignment of the individual files (e.g. for peak grouping
etc.). For more details see the help of the xcmsSet-class.
The step size (in m/z) to use for profile generation can be submitted
either using the profparam argument
(e.g. profparam=list(step=0.1)) or by submitting
step=0.1. By specifying a value of 0 the profile matrix
generation can be skipped.
The feature/peak detection algorithm can be specified with the
method argument, which defaults to the "matchedFilter"
method (findPeaks.matchedFilter). Possible values are
returned by getOption("BioC")$xcms$findPeaks.methods.
The lock mass correction allows for the lock mass scan to be added back in with the last working scan. This correction gives better reproducibility between sample sets.
An xcmsSet object.
The arguments profmethod and profparam have no influence
on the feature/peak detection. The step size parameter step for
the profile generation in the findPeaks.matchedFilter
peak detection algorithm can be passed using the ... argument.
Colin A. Smith
xcmsSet-class,
findPeaks,
profStep,
profMethod,
profBin
This class transforms a set of peaks from multiple LC/MS or GC/MS samples into a matrix of preprocessed data. It groups the peaks and does nonlinear retention time correction without internal standards. It fills in missing peak values from raw data. Lastly, it generates extracted ion chromatograms for ions of interest.
The phenoData slot (and phenoData parameter in the
xcmsSet function) is intended to contain a data.frame describing
all experimental factors, i.e. the samples along with their
properties. If this data.frame contains a column named
“class”, this will be returned by the sampclass method
and will thus be used by all methods to determine the sample
grouping/class assignment (e.g. to define the colors in various plots
or for the group method).
The sampclass<- method adds or replaces the “class”
column in the phenoData slot. If a data.frame is
submitted to this method, the interaction of its columns will be
stored into the “class” column.
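The "interaction of its columns" is a combined factor with one level per observed combination of design factors. The base R sketch below shows such an interaction for a hypothetical phenoData-like data.frame; that sampclass<- produces exactly these labels (including the "." separator, which is the base R default) is an assumption.

```r
## Hypothetical experimental design for four samples
pd <- data.frame(genotype = c("KO", "KO", "WT", "WT"),
                 batch = c("a", "b", "a", "b"))
## one combined class label per sample; unused combinations are dropped
cls <- interaction(pd, drop = TRUE)
as.character(cls)  # "KO.a" "KO.b" "WT.a" "WT.b"
```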
Also, similar to other classes in Bioconductor, the $ method
can be used to directly access all columns in the phenoData
slot (e.g. use xset$name on an xcmsSet object called
“xset” to extract the values from a column named “name” in the phenoData slot).
Objects can be created with the xcmsSet constructor,
which gathers peaks from a set of NetCDF files. Objects can also be
created by calls of the form new("xcmsSet", ...).
matrix containing peak data.
A vector with peak indices of peaks which have been added by a
fillPeaks method.
Matrix containing statistics about peak groups.
List containing indices of peaks in each group.
A data.frame containing the experimental design factors.
list containing two lists, raw and corrected,
each containing retention times for every scan of every sample.
Character vector with absolute path name of each NetCDF file.
list containing the values method - profile generation
method, and step - profile m/z step size, and any
additional parameters to the profile function.
logical vector filled if the Waters lock mass correction
parameter is used.
A string ("positive" or "negative" or NULL) describing whether only positive or negative scans have been used when reading the raw data.
Progress information for some xcms functions (for GUIs).
Function to be called when progressInfo changes (for GUIs).
Numeric representing the MS level on which the peak picking was
performed (by default on MS1). This slot should be accessed
through its getter method mslevel.
Numeric of length 2 specifying the scan range (or NULL for
the full range). This slot should be accessed through its getter
method scanrange. The scan range provided in this slot
represents the scans to which the whole raw data is subsetted.
Internal slot to be used to keep track of performed processing steps. This slot should not be directly accessed by the user.
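The retention time slot described above holds two parallel lists, raw and corrected, with one vector of per-scan retention times per sample. The base R sketch below builds a hypothetical value of that shape and computes, for each sample, how far the corrected times deviate from the raw ones. The numbers are made up, and computing corrected minus raw is one possible convention, not necessarily the one used by the xcms plotting method.

```r
## Hypothetical contents: two samples with three scans each
rt <- list(raw = list(c(10, 20, 30), c(11, 21, 31)),
           corrected = list(c(10.5, 20.5, 30.5), c(10, 20, 30)))
## per-scan deviation of corrected from raw retention times, per sample
dev <- Map(`-`, rt$corrected, rt$raw)
dev  # sample 1: +0.5 s per scan, sample 2: -1 s per scan
```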
signature("xcmsSet"): combine objects together
signature(object = "xcmsSet"): set filepaths slot
signature(object = "xcmsSet"): get filepaths slot
signature(object = "xcmsSet"): create report of
differentially regulated ions including EICs
signature(object = "xcmsSet"): fill in peak data for
groups with missing peaks
signature(object = "xcmsSet"): get list of EICs for
each sample in the set
signature(object = "xcmsSet", sampleidx = 1,
profmethod = profMethod(object), profstep = profStep(object),
profparam=profinfo(object), mslevel = NULL, scanrange = NULL,
rt=c("corrected", "raw"), BPPARAM = bpparam()): read the raw
data for one or more files in the xcmsSet and return
it. By default, all settings used in the original xcmsSet
call to generate the xcmsSet object are applied to the
raw data as well. Parameter sampleidx specifies which raw
file(s) should be loaded. Argument BPPARAM allows setting up
parallel processing.
signature(object = "xcmsSet"): set groupidx slot
signature(object = "xcmsSet"): get groupidx slot
signature(object = "xcmsSet"): get textual names for
peak groups
signature(object = "xcmsSet"): set groups slot
signature(object = "xcmsSet"): get groups slot
signature(object = "xcmsSet"): get matrix of values
from peak data with a row for each peak group
signature(object = "xcmsSet"): find groups of peaks
across samples that share similar m/z and retention times
Getter method for the mslevel slot.
signature(object = "xcmsSet"): set peaks slot
signature(object = "xcmsSet"): get peaks slot
signature(object = "xcmsSet"): plot retention time
deviation profiles
signature(object = "xcmsSet"): set profinfo slot
signature(object = "xcmsSet"): get profinfo slot
signature(object = "xcmsSet"): extract the method used to
generate the profile matrix.
signature(object = "xcmsSet"): extract the profile step
used for the generation of the profile matrix.
signature(object = "xcmsSet"): use initial grouping
of peaks to do nonlinear loess retention time correction
signature(object = "xcmsSet"): Replaces the column
“class” in the phenoData slot. See details for more information.
signature(object = "xcmsSet"): Returns the content of the
column “class” from the phenoData slot or, if not
present, the interaction of the experimental design factors
(i.e. of the phenoData data.frame). See details for
more information.
signature(object = "xcmsSet"): set the phenoData slot
signature(object = "xcmsSet"): get the phenoData slot
signature(object = "xcmsSet"): set the progressCallback slot
signature(object = "xcmsSet"): get the progressCallback slot
Getter method for the scanrange slot. See scanrange slot
description above for more details.
signature(object = "xcmsSet"): set rownames in the
phenoData slot
signature(object = "xcmsSet"): get rownames in the
phenoData slot
signature("xcmsSet"): divide the xcmsSet into a list of
xcmsSet objects depending on the provided factor. Note that only
peak data will be preserved, i.e. any peak grouping information
will be lost.
object$name, object$name<-value
Access and set name column in phenoData
object[, i]: Subsets an xcmsSet instance. Only subsetting
on columns, i.e. samples, is supported. Subsetting is performed on
all slots, including groups and groupidx. Parameter
i can be an integer vector, a logical vector or a character
vector of sample names (matching sampnames).
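The three accepted forms of i (integer, logical, or sample names) all reduce to integer sample positions. The base R sketch below shows one way to resolve them; `resolveSamples` is a hypothetical helper, not the xcms code, and it deliberately rejects negative indices to stay short.

```r
## Hypothetical helper: turn integer, logical or character i into
## integer sample positions, validating against the sample names
resolveSamples <- function(i, sampnames) {
  if (is.character(i)) {
    idx <- match(i, sampnames)
    if (anyNA(idx))
      stop("unknown sample name(s): ", paste(i[is.na(idx)], collapse = ", "))
  } else if (is.logical(i)) {
    if (length(i) != length(sampnames)) stop("logical index has wrong length")
    idx <- which(i)
  } else {
    idx <- as.integer(i)
    if (any(idx < 1L | idx > length(sampnames))) stop("index out of bounds")
  }
  idx
}

sn <- c("ko15", "ko16", "wt15")
resolveSamples("wt15", sn)               # 3
resolveSamples(c(TRUE, FALSE, TRUE), sn) # 1 3
resolveSamples(2, sn)                    # 2
```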
Colin A. Smith, Johannes Rainer
This virtual class provides an implementation-independent way to load
mass spectrometer data from various sources for use in an
xcmsRaw object. Subclasses can be defined to
enable data to be loaded from user-specified sources. The virtual
class xcmsFileSource is included out of the box; it contains
a file name as a character string.
When implementing child classes of xcmsSource, a corresponding
loadRaw-methods method must be provided which accepts
the xcmsSource child class and returns a list in the format
described in loadRaw-methods.
A virtual Class: No objects may be created from it.
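The virtual-class pattern described above (a virtual base class, concrete subclasses, and a loader method per subclass) can be sketched with the base R methods package. All class and generic names below (DemoSource, DemoMemorySource, loadRawDemo) are hypothetical stand-ins so the example runs without xcms, and the returned list fields are illustrative only; the format actually required is the one described in loadRaw-methods.

```r
library(methods)

## Hypothetical stand-ins for xcmsSource and a concrete subclass
setClass("DemoSource", representation("VIRTUAL"))
setClass("DemoMemorySource", contains = "DemoSource",
         slots = c(mz = "numeric", intensity = "numeric",
                   scanindex = "integer", rt = "numeric"))

## Hypothetical loader generic with one method per concrete source
setGeneric("loadRawDemo", function(object, ...) standardGeneric("loadRawDemo"))
setMethod("loadRawDemo", "DemoMemorySource", function(object, ...) {
  ## illustrative fields only; see loadRaw-methods for the real format
  list(rt = object@rt, scanindex = object@scanindex,
       mz = object@mz, intensity = object@intensity)
})

src <- new("DemoMemorySource", mz = c(100.1, 150.2), intensity = c(5, 10),
           scanindex = 0L, rt = 1.5)
raw <- loadRawDemo(src)
## new("DemoSource") would fail: virtual classes cannot be instantiated
```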
Daniel Hackney
xcmsSource-methods for creating xcmsSource
objects in various ways.
xcmsSource object in a flexible way
Users can define alternate means of reading data for
xcmsRaw objects by creating new implementations
of this method.
signature(object = "xcmsSource"): Pass the object through unmodified.
Daniel Hackney