Title: | Base Functions and Classes for Mass Spectrometry and Proteomics |
---|---|
Description: | MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data. |
Authors: | Laurent Gatto, Johannes Rainer and Sebastian Gibb with contributions from Guangchuang Yu, Samuel Wieczorek, Vasile-Cosmin Lazar, Vladislav Petyuk, Thomas Naake, Richie Cotton, Arne Smits, Martina Fisher, Ludger Goeminne, Adriaan Sticker, Lieven Clement and Pascal Maas. |
Maintainer: | Laurent Gatto <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.33.2 |
Built: | 2024-11-13 03:27:48 UTC |
Source: | https://github.com/bioc/MSnbase |
These methods add identification data to a raw MS experiment (an
"MSnExp"
object) or to quantitative data (an
"MSnSet"
object). The identification data needs
to be available as a mzIdentML
file (and passed as filenames,
or directly as identification object) or, alternatively, can be passed
as an arbitrary data.frame
. See details in the Methods
section.
The featureData
slots in a "MSnExp"
or a
"MSnSet"
instance provides only one row per MS2
spectrum but the identification is not always bijective. Prior to
addition, the identification data is filtered as documented in the
filterIdentificationDataFrame
function: (1) only PSMs
matching the regular (non-decoy) database are retained; (2) PSMs of
rank greater than 1 are discarded; and (3) only proteotypic peptides
are kept.
If after filtering, more then one PSM per spectrum are still present,
these are combined (reduced, see
reduce,data.frame-method
) into a single row and
separated by a semi-colon. This has as side-effect that feature
variables that are being reduced are converted to characters. See the
reduce
manual page for examples.
See also the section about identification data in the MSnbase-demo vignette for details and additional examples.
After addition of the identification data, new feature variables are
created. The column nprot
contains the number of members in the
protein group; the columns accession
and description
contain a semicolon separated list of all matches. The columns
npsm.prot
and npep.prot
represent the number of PSMs and
peptides that were matched to a particular protein group. The column
npsm.pep
indicates how many PSMs were attributed to a peptide
(as defined by its sequence pepseq
). All these values are
re-calculated after filtering and reduction.
signature(object = "MSnExp", id = "character", ...
Adds the identification data stored in mzIdentML files to a
"MSnExp"
instance. The method handles one or
multiple mzIdentML files provided via id
. id
has to
be a character
vector of valid filenames. See below for
additional arguments.
signature(object = "MSnExp", id = "mzID", ...)
Same
as above but id
is a mzID
object generated by
mzID::mzID
. See below for additional arguments.
signature(object = "MSnExp", id = "mzIDCollection",
...)
Same as above but id
is a mzIDCollection
object. See below for additional arguments.
signature(object = "MSnExp", id = "mzRident", ...
Same as above but id
is a mzRident
object generated
by mzR::openIdfile
. See below for additional arguments.
signature(object = "MSnExp", id = "data.frame", ...
Same as above but id
could be a data.frame
. See
below for additional arguments.
signature(object = "MSnSet", id = "character", ...)
Adds the identification data stored in mzIdentML files to an
"MSnSet"
instance. The method handles one or
multiple mzIdentML files provided via id
. id
has to
be a character
vector of valid filenames. See below for
additional arguments.
signature(object = "MSnSet", id = "mzID", ...)
Same
as above but id
is a mzID
object. See below for
additional arguments.
signature(object = "MSnSet", id = "mzIDCollection",
...)
Same as above but id
is a mzIDCollection
object. See below for additional arguments.
signature(object = "MSnSet", id = "data.frame", ...)
Same as above but id
is a data.frame
. See below for
additional arguments.
The methods above take the following additional argument. These need
to be set when adding identification data as a data.frame
. In
all other cases, the defaults are set automatically.
The matching between the features (raw spectra or quantiative
features) and identification results is done by matching columns
in the featue data (the featureData
slot) and the
identification data. These values are the spectrum file index and
the acquisition number, passed as a character
of length
2. The default values for these variables in the object
's
feature data are "spectrum.file"
and
"acquisition.num"
. Values need to be provided when
id
is a data.frame
.
The default values for the spectrum file and acquisition numbers
in the identification data (the id
argument) are
"spectrumFile"
and "acquisitionNum"
. Values need to
be provided when id
is a data.frame
.
The protein (group) accession number or identifier. Defaults are
"DatabaseAccess"
when passing filenames or mzRident
objects and "accession"
when passing mzID
or
mzIDCollection
objects. A value needs to be provided when
id
is a data.frame
.
The protein (group) description. Defaults are
"DatabaseDescription"
when passing filenames or
mzRident
objects and "description"
when passing
mzID
or mzIDCollection
objects. A value needs to be
provided when id
is a data.frame
.
The peptide sequence variable name. Defaults are "sequence"
when passing filenames or mzRident
objects and
"pepseq"
when passing mzID
or mzIDCollection
objects. A value needs to be provided when id
is a
data.frame
.
The key to be used when the identification data need to be reduced
(see details section). Defaults are "spectrumID"
when
passing filenames or mzRident
objects and
"spectrumid"
when passing mzID
or
mzIDCollection
objects. A value needs to be provided when
id
is a data.frame
.
The feature variable used to define whether the PSM was matched in
the decoy of regular fasta database for PSM filtering. Defaults
are "isDecoy"
when passing filenames or mzRident
objects and "isdecoy"
when passing mzID
or
mzIDCollection
objects. A value needs to be provided when
id
is a data.frame
. See
filterIdentificationDataFrame
for details.
The feature variable used to defined the rank of the PSM for
filtering. Defaults is "rank"
. A value needs to be provided
when id
is a data.frame
. See
filterIdentificationDataFrame
for details.
The feature variable used to defined the protein (groupo)
accession or identifier for PSM filterin. Defaults is to use the
same value as acc
. A value needs to be provided when
id
is a data.frame
. See
filterIdentificationDataFrame
for details.
A logical
defining whether to print out
messages or not. Default is to use the session-wide open from
isMSnbaseVerbose
.
Sebastian Gibb <[email protected]> and Laurent Gatto
filterIdentificationDataFrame
for the function that
filters identification data, readMzIdData
to read the
identification data as a unfiltered data.frame
and
reduce,data.frame-method
to reduce it to a
data.frame
that contains only unique PSMs per row.
## find path to a mzXML file quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## find path to a mzIdentML file identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") ## create basic MSnExp msexp <- readMSData(quantFile) ## add identification information msexp <- addIdentificationData(msexp, identFile) ## access featureData fData(msexp) idSummary(msexp)
## find path to a mzXML file quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## find path to a mzIdentML file identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") ## create basic MSnExp msexp <- readMSData(quantFile) ## add identification information msexp <- addIdentificationData(msexp, identFile) ## access featureData fData(msexp) idSummary(msexp)
This function evaluates the variability within all protein group
of an MSnSet
. If a protein group is composed only of a
single feature, NA
is returned.
aggvar(object, groupBy, fun)
aggvar(object, groupBy, fun)
object |
An object of class |
groupBy |
A |
fun |
A function the summarise the distance between features
within protein groups, typically |
This function can be used to identify protein groups with
incoherent feature (petides or PSMs) expression patterns. Using
max
as a function, one can identify protein groups with
single extreme outliers, such as, for example, a mis-identified
peptide that was erroneously assigned to that protein group. Using
mean
identifies more systematic inconsistencies where, for
example, the subsets of peptide (or PSM) feautres correspond to
proteins with different expression patterns.
A matrix
providing the number of features per
protein group (nb_feats
column) and the aggregation
summarising distance (agg_dist
column).
Laurent Gatto
combineFeatures
to combine PSMs
quantitation into peptides and/or into proteins.
library("pRolocdata") data(hyperLOPIT2015ms3r1psm) groupBy <- "Protein.Group.Accessions" res1 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = max) res2 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = mean) par(mfrow = c(1, 3)) plot(res1, log = "y", main = "Single outliers (max)") plot(res2, log = "y", main = "Overall inconsistency (mean)") plot(res1[, "agg_dist"], res2[, "agg_dist"], xlab = "max", ylab = "mean")
library("pRolocdata") data(hyperLOPIT2015ms3r1psm) groupBy <- "Protein.Group.Accessions" res1 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = max) res2 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = mean) par(mfrow = c(1, 3)) plot(res1, log = "y", main = "Single outliers (max)") plot(res2, log = "y", main = "Overall inconsistency (mean)") plot(res1[, "agg_dist"], res2[, "agg_dist"], xlab = "max", ylab = "mean")
data.frame
A function to convert the identification data contained in an
mzRident
object to a data.frame
. Each row represents
a scan, which can however be repeated several times if the PSM
matches multiple proteins and/or contains two or more
modifications. To reduce the data.frame
so that rows/scans
are unique and use semicolon-separated values to combine
information pertaining a scan, use
reduce
.
from |
An object of class |
See also the Tandem MS identification data section in the MSnbase-demo vignette.
A data.frame
Laurent Gatto
## find path to a mzIdentML file identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") library("mzR") x <- openIDfile(identFile) x as(x, "data.frame")
## find path to a mzIdentML file identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") library("mzR") x <- openIDfile(identFile) x as(x, "data.frame")
MSnSet
Given a list of MSnSet
instances, typically representing
replicated experiments, the function returns an average
MSnSet
.
averageMSnSet(x, avg = function(x) mean(x, na.rm = TRUE), disp = npcv)
averageMSnSet(x, avg = function(x) mean(x, na.rm = TRUE), disp = npcv)
x |
A |
avg |
The averaging function. Default is the mean after
removing missing values, as computed by |
disp |
The disperion function. Default is an non-parametric
coefficient of variation that replaces the standard deviation by
the median absolute deviation as computed by
|
This function is aimed at facilitating the visualisation of replicated experiments and should not be used as a replacement for a statistical analysis.
The samples of the instances to be averaged must be identical but
can be in a different order (they will be reordered by
default). The features names of the result will correspond to the
union of the feature names of the input MSnSet
instances. Each average value will be computed by the avg
function and the dispersion of the replicated measurements will be
estimated by the disp
function. These dispersions will be
stored as a data.frame
in the feature metadata that can be
accessed with fData(.)$disp
. Similarly, the number of
missing values that were present when average (and dispersion)
were computed are available in fData(.)$disp
.
Currently, the feature metadata of the returned object corresponds
the the feature metadata of the first object in the list
(augmented with the missing value and dispersion values); the
metadata of the features that were missing in this first input are
missing (i.e. populated with NA
s). This may change in the
future.
A new average MSnSet
.
Laurent Gatto
compfnames
to compare MSnSet feature names.
library("pRolocdata") ## 3 replicates from Tan et al. 2009 data(tan2009r1) data(tan2009r2) data(tan2009r3) x <- MSnSetList(list(tan2009r1, tan2009r2, tan2009r3)) avg <- averageMSnSet(x) dim(avg) head(exprs(avg)) head(fData(avg)$nNA) head(fData(avg)$disp) ## using the standard deviation as measure of dispersion avg2 <-averageMSnSet(x, disp = sd) head(fData(avg2)$disp) ## keep only complete observations, i.e proteins ## that had 0 missing values for all samples sel <- apply(fData(avg)$nNA, 1 , function(x) all(x == 0)) avg <- avg[sel, ] disp <- rowMax(fData(avg)$disp) library("pRoloc") setStockcol(paste0(getStockcol(), "AA")) plot2D(avg, cex = 7.7 * disp) title(main = paste("Dispersion: non-parametric CV", paste(round(range(disp), 3), collapse = " - ")))
library("pRolocdata") ## 3 replicates from Tan et al. 2009 data(tan2009r1) data(tan2009r2) data(tan2009r3) x <- MSnSetList(list(tan2009r1, tan2009r2, tan2009r3)) avg <- averageMSnSet(x) dim(avg) head(exprs(avg)) head(fData(avg)$nNA) head(fData(avg)$disp) ## using the standard deviation as measure of dispersion avg2 <-averageMSnSet(x, disp = sd) head(fData(avg2)$disp) ## keep only complete observations, i.e proteins ## that had 0 missing values for all samples sel <- apply(fData(avg)$nNA, 1 , function(x) all(x == 0)) avg <- avg[sel, ] disp <- rowMax(fData(avg)$disp) library("pRoloc") setStockcol(paste0(getStockcol(), "AA")) plot2D(avg, cex = 7.7 * disp) title(main = paste("Dispersion: non-parametric CV", paste(round(range(disp), 3), collapse = " - ")))
This method aggregates individual spectra (Spectrum
instances)
or whole experiments (MSnExp
instances) into discrete bins. All
intensity values which belong to the same bin are summed together.
signature(object = "MSnExp", binSize = "numeric", verbose =
"logical")
Bins all spectra in an MSnExp
object. Use binSize
to control the size of a bin
(in Dalton, default is 1
).
Displays a control bar if verbose set to TRUE
(default). Returns a binned MSnExp
instance.
signature(object = "Spectrum", binSize = "numeric",
breaks = "numeric", msLevel. = "numeric")
Bin the
Spectrum
object. Use binSize
to control the size
of a bin (in Dalton, default is 1
). Similar to
hist
you could use breaks
to
specify the breakpoints between m/z bins. msLevel.
defines the level of the spectrum, and if msLevel(object)
!= msLevel.
, cleaning is ignored. Only relevant when called
from OnDiskMSnExp
and is only relevant for developers.
Returns a binned Spectrum
instance.
Sebastian Gibb <[email protected]>
clean
, pickPeaks
, smooth
,
removePeaks
and trimMz
for other spectra processing methods.
s <- new("Spectrum2", mz=1:10, intensity=1:10) intensity(s) intensity(bin(s, binSize=2)) data(itraqdata) sum(peaksCount(itraqdata)) itraqdata2 <- bin(itraqdata, binSize=2) sum(peaksCount(itraqdata2)) processingData(itraqdata2)
s <- new("Spectrum2", mz=1:10, intensity=1:10) intensity(s) intensity(bin(s, binSize=2)) data(itraqdata) sum(peaksCount(itraqdata)) itraqdata2 <- bin(itraqdata, binSize=2) sum(peaksCount(itraqdata2)) processingData(itraqdata2)
These method calculates a-, b-, c-, x-, y- and z-ions produced by fragmentation.
sequence |
|
object |
Object of class |
tolerance |
|
method |
|
type |
|
z |
|
modifications |
named |
neutralLoss |
|
verbose |
|
signature(sequence = "character", object = "missing", ...)
Calculates the theoretical fragments for a peptide sequence
.
Returns a data.frame
with the columns c("mz", "ion", "type",
"pos", "z", "seq")
.
signature(sequence = "character", object = "Spectrum2", ...)
Calculates and matches the theoretical fragments for a peptide
sequence
and a "Spectrum2"
object
.
The ...
arguments are passed to the internal functions.
Currently tolerance
, method
and relative
are
supported.
You could change the tolerance
(default 0.1
) and
decide whether this tolerance should be applied relative to the target m/z
(relative = TRUE
) or absolute (default relative = FALSE
)
to match the theoretical fragment MZ with the MZ of the spectrum. When
(relative = TRUE
) the mass tolerance window is set to target mz
+/- (target mz * tolerance)
and target mz +/- tolerance
otherwise.
In cases of multiple matches use method
to select the peak with
the highest intensity (method = "highest"
, default) respectively
closest MZ (method = "closes"
). If method = "all"
is set
all possible matches in the current tolerance range are reported.
Returns the same data.frame
as above but the mz
column
represents the matched MZ values of the spectrum. Additionally there
is a column error
that contains the difference between the observed
MZ (from the spectrum) to the theoretical fragment MZ.
Sebastian Gibb <[email protected]>
## find path to a mzXML file file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## create basic MSnExp msexp <- readMSData(file, centroided = FALSE) ## centroid them msexp <- pickPeaks(msexp) ## calculate fragments for ACE with default modification calculateFragments("ACE", modifications=c(C=57.02146)) ## calculate fragments for ACE with an addition N-terminal modification calculateFragments("ACE", modifications=c(C=57.02146, Nterm=229.1629)) ## calculate fragments for ACE without any modifications calculateFragments("ACE", modifications=NULL) calculateFragments("VESITARHGEVLQLRPK", type=c("a", "b", "c", "x", "y", "z"), z=1:2) calculateFragments("VESITARHGEVLQLRPK", msexp[[1]]) ## neutral loss PSMatch::defaultNeutralLoss() ## disable water loss on the C terminal PSMatch::defaultNeutralLoss(disableWaterLoss="Cterm") ## real example calculateFragments("PQR") calculateFragments("PQR", neutralLoss=PSMatch::defaultNeutralLoss(disableWaterLoss="Cterm")) calculateFragments("PQR", neutralLoss=PSMatch::defaultNeutralLoss(disableAmmoniaLoss="Q")) ## disable neutral loss completely calculateFragments("PQR", neutralLoss=NULL)
## find path to a mzXML file file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## create basic MSnExp msexp <- readMSData(file, centroided = FALSE) ## centroid them msexp <- pickPeaks(msexp) ## calculate fragments for ACE with default modification calculateFragments("ACE", modifications=c(C=57.02146)) ## calculate fragments for ACE with an addition N-terminal modification calculateFragments("ACE", modifications=c(C=57.02146, Nterm=229.1629)) ## calculate fragments for ACE without any modifications calculateFragments("ACE", modifications=NULL) calculateFragments("VESITARHGEVLQLRPK", type=c("a", "b", "c", "x", "y", "z"), z=1:2) calculateFragments("VESITARHGEVLQLRPK", msexp[[1]]) ## neutral loss PSMatch::defaultNeutralLoss() ## disable water loss on the C terminal PSMatch::defaultNeutralLoss(disableWaterLoss="Cterm") ## real example calculateFragments("PQR") calculateFragments("PQR", neutralLoss=PSMatch::defaultNeutralLoss(disableWaterLoss="Cterm")) calculateFragments("PQR", neutralLoss=PSMatch::defaultNeutralLoss(disableAmmoniaLoss="Q")) ## disable neutral loss completely calculateFragments("PQR", neutralLoss=NULL)
The Chromatogram
class is designed to store
chromatographic MS data, i.e. pairs of retention time and intensity
values. Instances of the class can be created with the
Chromatogram
constructor function but in most cases the dedicated
methods for OnDiskMSnExp and
MSnExp objects extracting chromatograms should be
used instead (i.e. the chromatogram()
method).
Chromatogram( rtime = numeric(), intensity = numeric(), mz = c(NA_real_, NA_real_), filterMz = c(NA_real_, NA_real_), precursorMz = c(NA_real_, NA_real_), productMz = c(NA_real_, NA_real_), fromFile = integer(), aggregationFun = character(), msLevel = 1L ) aggregationFun(object) ## S4 method for signature 'Chromatogram' show(object) ## S4 method for signature 'Chromatogram' rtime(object) ## S4 method for signature 'Chromatogram' intensity(object) ## S4 method for signature 'Chromatogram' mz(object, filter = FALSE) ## S4 method for signature 'Chromatogram' precursorMz(object) ## S4 method for signature 'Chromatogram' fromFile(object) ## S4 method for signature 'Chromatogram' length(x) ## S4 method for signature 'Chromatogram' as.data.frame(x) ## S4 method for signature 'Chromatogram' filterRt(object, rt) ## S4 method for signature 'Chromatogram' clean(object, all = FALSE, na.rm = FALSE) ## S4 method for signature 'Chromatogram,ANY' plot( x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, ... ) ## S4 method for signature 'Chromatogram' msLevel(object) ## S4 method for signature 'Chromatogram' isEmpty(x) ## S4 method for signature 'Chromatogram' productMz(object) ## S4 method for signature 'Chromatogram' bin( x, binSize = 0.5, breaks = seq(floor(min(rtime(x))), ceiling(max(rtime(x))), by = binSize), fun = max ) ## S4 method for signature 'Chromatogram' normalize(object, method = c("max", "sum")) ## S4 method for signature 'Chromatogram' filterIntensity(object, intensity = 0, ...) ## S4 method for signature 'Chromatogram,Chromatogram' alignRt(x, y, method = c("closest", "approx"), ...) ## S4 method for signature 'Chromatogram,Chromatogram' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'Chromatogram' transformIntensity(object, FUN = identity)
Chromatogram( rtime = numeric(), intensity = numeric(), mz = c(NA_real_, NA_real_), filterMz = c(NA_real_, NA_real_), precursorMz = c(NA_real_, NA_real_), productMz = c(NA_real_, NA_real_), fromFile = integer(), aggregationFun = character(), msLevel = 1L ) aggregationFun(object) ## S4 method for signature 'Chromatogram' show(object) ## S4 method for signature 'Chromatogram' rtime(object) ## S4 method for signature 'Chromatogram' intensity(object) ## S4 method for signature 'Chromatogram' mz(object, filter = FALSE) ## S4 method for signature 'Chromatogram' precursorMz(object) ## S4 method for signature 'Chromatogram' fromFile(object) ## S4 method for signature 'Chromatogram' length(x) ## S4 method for signature 'Chromatogram' as.data.frame(x) ## S4 method for signature 'Chromatogram' filterRt(object, rt) ## S4 method for signature 'Chromatogram' clean(object, all = FALSE, na.rm = FALSE) ## S4 method for signature 'Chromatogram,ANY' plot( x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, ... ) ## S4 method for signature 'Chromatogram' msLevel(object) ## S4 method for signature 'Chromatogram' isEmpty(x) ## S4 method for signature 'Chromatogram' productMz(object) ## S4 method for signature 'Chromatogram' bin( x, binSize = 0.5, breaks = seq(floor(min(rtime(x))), ceiling(max(rtime(x))), by = binSize), fun = max ) ## S4 method for signature 'Chromatogram' normalize(object, method = c("max", "sum")) ## S4 method for signature 'Chromatogram' filterIntensity(object, intensity = 0, ...) ## S4 method for signature 'Chromatogram,Chromatogram' alignRt(x, y, method = c("closest", "approx"), ...) ## S4 method for signature 'Chromatogram,Chromatogram' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'Chromatogram' transformIntensity(object, FUN = identity)
rtime |
for |
intensity |
for |
mz |
for |
filterMz |
for |
precursorMz |
for |
productMz |
for |
fromFile |
for |
aggregationFun |
for |
msLevel |
for |
object |
|
filter |
for |
x |
|
rt |
for |
all |
for |
na.rm |
for |
col |
for |
lty |
for |
type |
for |
xlab |
for |
ylab |
for |
main |
for |
... |
for |
binSize |
for |
breaks |
for |
fun |
for |
method |
|
y |
for |
ALIGNFUN |
for |
ALIGNFUNARGS |
|
FUN |
for |
FUNARGS |
for |
The mz
, filterMz
, precursorMz
and
productMz
are stored as a numeric(2)
representing a range
even if the chromatogram was generated for only a single ion (i.e. a
single mz value). Using ranges for mz
values allow this class to
be used also for e.g. total ion chromatograms or base peak chromatograms.
The slots `precursorMz` and `productMz` allow to represent SRM (single reaction monitoring) and MRM (multiple SRM) chromatograms. As example, a `Chromatogram` for a SRM transition 273 -> 153 will have a `@precursorMz = c(273, 273)` and a `@productMz = c(153, 153)`.
Chromatogram
objects can be extracted from an MSnExp
or OnDiskMSnExp
object with the chromatogram()
function.
Alternatively, the constructor function Chromatogram
can be used, which
takes arguments rtime
, intensity
, mz
, filterMz
, precursorMz
,
productMz
, fromFile
, aggregationFun
and msLevel
.
aggregationFun
: gets the aggregation function used to create the
Chromatogram
.
as.data.frame
: returns a data.frame
with columns "rtime"
and
"intensity"
.
fromFile
: returns an integer(1)
with the index of the originating file.
intensity
: returns the intensities from the Chromatogram
.
isEmpty
: returns TRUE
if the chromatogram is empty or has only NA
intensities.
length
: returns the length (i.e. number of data points) of the
Chromatogram
.
msLevel
: returns an integer(1)
with the MS level of the chromatogram.
mz
: get the m/z (range) from the Chromatogram
. The function returns
a numeric(2)
with the lower and upper boundaries. Parameter filter
allows to specify whether the m/z range used to filter the originating
object should be returned or the m/z range of the actual data.
precursorMz
: get the m/z of the precursor ion. The function returns a
numeric(2)
with the lower and upper boundary.
productMz
: get the m/z of the producto chromatogram/ion. The function
returns a numeric(2)
with the lower and upper m/z value.
rtime
: returns the retention times from the Chromatogram
.
filterRt
: filter/subset the Chromatogram
to the specified retention
time range (defined with parameter rt
).
filterIntensity
: filter a Chromatogram()
object removing data
points with intensities below a user provided threshold. If intensity
is a numeric
value, the returned chromatogram will only contain data
points with intensities > intensity
. In addition it is possible to
provide a function to perform the filtering.
This function is expected to take the input Chromatogram
(object
) and
to return a logical vector with the same length then there are data points
in object
with TRUE
for data points that should be kept and FALSE
for data points that should be removed. See examples below.
alignRt
: Aligns chromatogram x
against chromatogram y
. The resulting
chromatogram has the same length (number of data points) than y
and the
same retention times thus allowing to perform any pair-wise comparisons
between the chromatograms. If x
is a MChromatograms()
object, each
Chromatogram
in it is aligned against y
. Additional parameters (...
)
are passed along to the alignment functions (e.g. closest()
).
Parameter method
allows to specify which alignment method
should be used. Currently there are the following options:
method = "closest"
(the default): match data points in the first
chromatogram (x
) to those of the second (y
) based on the difference
between their retention times: each data point in x
is assigned to the
data point in y
with the smallest difference in their retention times
if their difference is smaller than the minimum average difference
between retention times in x
or y
(parameter tolerance
for the
call to the closest()
function).
By setting tolerance = 0
only exact retention times are matched against
each other (i.e. only values are kept with exactly the same retention
times between both chromatograms).
method = "approx"
: uses the base R approx
function to approximate
intensities in x
to the retention times in y
(using linear
interpolation). This should only be used for chromatograms that were
measured in the same measurement run (e.g. MS1 and corresponding MS2
chromatograms from SWATH experiments).
bin
: aggregates intensity values from a chromatogram in discrete bins
along the retention time axis and returns a Chromatogram
object with
the retention time representing the mid-point of the bins and the
intensity the binned signal. Parameters binSize
and breaks
allow to
define the binning, fun
the function which should be used to aggregate
the intensities within a bin.
compareChromatograms
: calculates a similarity score between 2
chromatograms after aligning them. Parameter ALIGNFUN
allows to define
a function that can be used to align x
against y
(defaults to
ALIGNFUN = alignRt
). Subsequently, the similarity is calculated on the
aligned intensities with the function provided with parameter FUN
which
defaults to cor
(hence by default the Pearson correlation is calculated
between the aligned intensities of the two compared chromatograms).
Additional parameters can be passed to the ALIGNFUN
and FUN
with the
parameter ALIGNFUNARGS
and FUNARGS
, respectively.
clean
: removes 0-intensity data points (and NA
values). See clean()
for details.
normalize
, normalise
: normalises the intensities of a chromatogram by
dividing them either by the maximum intensity (method = "max"
) or total
intensity (method = "sum"
) of the chromatogram.
transformIntensity
: allows to manipulate the intensity values of a
chromatogram using a user provided function. See below for examples.
plot
: plots a Chromatogram
object.
Johannes Rainer
MChromatograms for combining Chromatogram
in
a two-dimensional matrix (rows being mz-rt ranges, columns samples).
chromatogram()] for the method to extract chromatogram data from an
MSnExpor
OnDiskMSnExpobject. [clean()] for the method to *clean* a
Chromatogram' object.
## Create a simple Chromatogram object. ints <- abs(rnorm(100, sd = 100)) rts <- seq_len(length(ints)) chr <- Chromatogram(rtime = rts, intensity = ints) chr ## Extract intensities intensity(chr) ## Extract retention times rtime(chr) ## Extract the mz range - is NA for the present example mz(chr) ## plot the Chromatogram plot(chr) ## Create a simple Chromatogram object based on random values. chr <- Chromatogram(intensity = abs(rnorm(1000, mean = 2000, sd = 200)), rtime = sort(abs(rnorm(1000, mean = 10, sd = 5)))) chr ## Get the intensities head(intensity(chr)) ## Get the retention time head(rtime(chr)) ## What is the retention time range of the object? range(rtime(chr)) ## Filter the chromatogram to keep only values between 4 and 10 seconds chr2 <- filterRt(chr, rt = c(4, 10)) range(rtime(chr2)) ## Data manipulations: ## normalize a chromatogram par(mfrow = c(1, 2)) plot(chr) plot(normalize(chr, method = "max")) ## Align chromatograms against each other chr1 <- Chromatogram(rtime = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), intensity = c(3, 5, 14, 30, 24, 6, 2, 1, 1, 0)) chr2 <- Chromatogram(rtime = c(2.5, 3.42, 4.5, 5.43, 6.5), intensity = c(5, 12, 15, 11, 5)) plot(chr1, col = "black") points(rtime(chr2), intensity(chr2), col = "blue", type = "l") ## Align chr2 to chr1 without interpolation res <- alignRt(chr2, chr1) rtime(res) intensity(res) points(rtime(res), intensity(res), col = "#00ff0080", type = "l") ## Align chr2 to chr1 with interpolation res <- alignRt(chr2, chr1, method = "approx") points(rtime(res), intensity(res), col = "#ff000080", type = "l") legend("topright", col = c("black", "blue", "#00ff0080","#ff000080"),lty = 1, legend = c("chr1", "chr2", "chr2 matchRtime", "chr2 approx")) ## Compare Chromatograms. Align chromatograms with `alignRt` and ## method `"approx"` compareChromatograms(chr2, chr1, ALIGNFUNARGS = list(method = "approx")) ## Data filtering chr1 <- Chromatogram(rtime = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), intensity = c(3, 5, 14, 30, 24, 6, 2, 1, 1, 0)) ## Remove data points with intensities below 10 res <- filterIntensity(chr1, 10) intensity(res) ## Remove data points with an intensity lower than 10% of the maximum ## intensity in the Chromatogram filt_fun <- function(x, prop = 0.1) { x@intensity >= max(x@intensity, na.rm = TRUE) * prop } res <- filterIntensity(chr1, filt_fun) intensity(res) ## Remove data points with an intensity lower than half of the maximum res <- filterIntensity(chr1, filt_fun, prop = 0.5) intensity(res) ## log2 transform intensity values res <- transformIntensity(chr1, log2) intensity(res) log2(intensity(chr1))
## Create a simple Chromatogram object. ints <- abs(rnorm(100, sd = 100)) rts <- seq_len(length(ints)) chr <- Chromatogram(rtime = rts, intensity = ints) chr ## Extract intensities intensity(chr) ## Extract retention times rtime(chr) ## Extract the mz range - is NA for the present example mz(chr) ## plot the Chromatogram plot(chr) ## Create a simple Chromatogram object based on random values. chr <- Chromatogram(intensity = abs(rnorm(1000, mean = 2000, sd = 200)), rtime = sort(abs(rnorm(1000, mean = 10, sd = 5)))) chr ## Get the intensities head(intensity(chr)) ## Get the retention time head(rtime(chr)) ## What is the retention time range of the object? range(rtime(chr)) ## Filter the chromatogram to keep only values between 4 and 10 seconds chr2 <- filterRt(chr, rt = c(4, 10)) range(rtime(chr2)) ## Data manipulations: ## normalize a chromatogram par(mfrow = c(1, 2)) plot(chr) plot(normalize(chr, method = "max")) ## Align chromatograms against each other chr1 <- Chromatogram(rtime = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), intensity = c(3, 5, 14, 30, 24, 6, 2, 1, 1, 0)) chr2 <- Chromatogram(rtime = c(2.5, 3.42, 4.5, 5.43, 6.5), intensity = c(5, 12, 15, 11, 5)) plot(chr1, col = "black") points(rtime(chr2), intensity(chr2), col = "blue", type = "l") ## Align chr2 to chr1 without interpolation res <- alignRt(chr2, chr1) rtime(res) intensity(res) points(rtime(res), intensity(res), col = "#00ff0080", type = "l") ## Align chr2 to chr1 with interpolation res <- alignRt(chr2, chr1, method = "approx") points(rtime(res), intensity(res), col = "#ff000080", type = "l") legend("topright", col = c("black", "blue", "#00ff0080","#ff000080"),lty = 1, legend = c("chr1", "chr2", "chr2 matchRtime", "chr2 approx")) ## Compare Chromatograms. Align chromatograms with `alignRt` and ## method `"approx"` compareChromatograms(chr2, chr1, ALIGNFUNARGS = list(method = "approx")) ## Data filtering chr1 <- Chromatogram(rtime = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), intensity = c(3, 5, 14, 30, 24, 6, 2, 1, 1, 0)) ## Remove data points with intensities below 10 res <- filterIntensity(chr1, 10) intensity(res) ## Remove data points with an intensity lower than 10% of the maximum ## intensity in the Chromatogram filt_fun <- function(x, prop = 0.1) { x@intensity >= max(x@intensity, na.rm = TRUE) * prop } res <- filterIntensity(chr1, filt_fun) intensity(res) ## Remove data points with an intensity lower than half of the maximum res <- filterIntensity(chr1, filt_fun, prop = 0.5) intensity(res) ## log2 transform intensity values res <- transformIntensity(chr1, log2) intensity(res) log2(intensity(chr1))
The chromatogram
method extracts chromatogram(s) from an
MSnExp
or OnDiskMSnExp
object.
Depending on the provided parameters this can be a total ion chromatogram
(TIC), a base peak chromatogram (BPC) or an extracted ion chromatogram
(XIC) extracted from each sample/file.
## S4 method for signature 'MSnExp' chromatogram( object, rt, mz, aggregationFun = "sum", missing = NA_real_, msLevel = 1L, BPPARAM = bpparam() )
## S4 method for signature 'MSnExp' chromatogram( object, rt, mz, aggregationFun = "sum", missing = NA_real_, msLevel = 1L, BPPARAM = bpparam() )
object |
For |
rt |
A |
mz |
A |
aggregationFun |
|
missing |
|
msLevel |
|
BPPARAM |
Parallelisation backend to be used, which will
depend on the architecture. Default is
|
Arguments rt
and mz
allow to specify the MS
data slice from which the chromatogram should be extracted.
The parameter aggregationSum
allows to specify the function to be
used to aggregate the intensities across the mz range for the same
retention time. Setting aggregationFun = "sum"
would e.g. allow
to calculate the total ion chromatogram (TIC),
aggregationFun = "max"
the base peak chromatogram (BPC).
The length of the extracted Chromatogram
object,
i.e. the number of available data points, corresponds to the number of
scans/spectra measured in the specified retention time range. If in a
specific scan (for a give retention time) no signal was measured in the
specified mz range, a NA_real_
is reported as intensity for the
retention time (see Notes for more information). This can be changed
using the missing
parameter.
By default or if \code{mz} and/or \code{rt} are numeric vectors, the function extracts one \code{\link{Chromatogram}} object for each file in the \code{\linkS4class{MSnExp}} or \code{\linkS4class{OnDiskMSnExp}} object. Providing a numeric matrix with argument \code{mz} or \code{rt} enables to extract multiple chromatograms per file, one for each row in the matrix. If the number of columns of \code{mz} or \code{rt} are not equal to 2, \code{range} is called on each row of the matrix.
chromatogram
returns a MChromatograms
object with
the number of columns corresponding to the number of files in
object
and number of rows the number of specified ranges (i.e.
number of rows of matrices provided with arguments mz
and/or
rt
). The featureData
of the returned object contains columns
"mzmin"
and "mzmax"
with the values from input argument
mz
(if used) and "rtmin"
and "rtmax"
if the input
argument rt
was used.
Johannes Rainer
Chromatogram
and MChromatograms
for the
classes that represent single and multiple chromatograms.
## Read a test data file. library(BiocParallel) register(SerialParam()) library(msdata) f <- c(system.file("microtofq/MM14.mzML", package = "msdata"), system.file("microtofq/MM8.mzML", package = "msdata")) ## Read the data as an MSnExp msd <- readMSData(f, msLevel = 1) ## Extract the total ion chromatogram for each file: tic <- chromatogram(msd) tic ## Extract the TIC for the second file: tic[1, 2] ## Plot the TIC for the first file plot(rtime(tic[1, 1]), intensity(tic[1, 1]), type = "l", xlab = "rtime", ylab = "intensity", main = "TIC") ## Extract chromatograms for a MS data slices defined by retention time ## and mz ranges. rtr <- rbind(c(10, 60), c(280, 300)) mzr <- rbind(c(140, 160), c(300, 320)) chrs <- chromatogram(msd, rt = rtr, mz = mzr) ## Each row of the returned MChromatograms object corresponds to one mz-rt ## range. The Chromatogram for the first range in the first file is empty, ## because the retention time range is outside of the file's rt range: chrs[1, 1] ## The mz and/or rt ranges used are provided as featureData of the object fData(chrs) ## The mz method can be used to extract the m/z ranges directly mz(chrs) ## Also the Chromatogram for the second range in the second file is empty chrs[2, 2] ## Get the extracted chromatogram for the first range in the second file chr <- chrs[1, 2] chr plot(rtime(chr), intensity(chr), xlab = "rtime", ylab = "intensity")
## Read a test data file. library(BiocParallel) register(SerialParam()) library(msdata) f <- c(system.file("microtofq/MM14.mzML", package = "msdata"), system.file("microtofq/MM8.mzML", package = "msdata")) ## Read the data as an MSnExp msd <- readMSData(f, msLevel = 1) ## Extract the total ion chromatogram for each file: tic <- chromatogram(msd) tic ## Extract the TIC for the second file: tic[1, 2] ## Plot the TIC for the first file plot(rtime(tic[1, 1]), intensity(tic[1, 1]), type = "l", xlab = "rtime", ylab = "intensity", main = "TIC") ## Extract chromatograms for a MS data slices defined by retention time ## and mz ranges. rtr <- rbind(c(10, 60), c(280, 300)) mzr <- rbind(c(140, 160), c(300, 320)) chrs <- chromatogram(msd, rt = rtr, mz = mzr) ## Each row of the returned MChromatograms object corresponds to one mz-rt ## range. The Chromatogram for the first range in the first file is empty, ## because the retention time range is outside of the file's rt range: chrs[1, 1] ## The mz and/or rt ranges used are provided as featureData of the object fData(chrs) ## The mz method can be used to extract the m/z ranges directly mz(chrs) ## Also the Chromatogram for the second range in the second file is empty chrs[2, 2] ## Get the extracted chromatogram for the first range in the second file chr <- chrs[1, 2] chr plot(rtime(chr), intensity(chr), xlab = "rtime", ylab = "intensity")
This method cleans out individual spectra (Spectrum
instances),
chromatograms (Chromatogram
instances)
or whole experiments (MSnExp
instances) of 0-intensity
peaks. Unless all
is set to FALSE
, original 0-intensity
values are retained only around peaks. If more than two 0's were
separating two peaks, only the first and last ones, those directly
adjacent to the peak ranges are kept. If two peaks are separated by
only one 0-intensity value, it is retained. An illustrative example is
shown below.
signature(object = "MSnExp", all = "logical", verbose =
"logical")
Cleans all spectra in MSnExp
object. Displays a control bar if verbose set to TRUE
(default). Returns a cleaned MSnExp
instance.
signature(object = "Spectrum", all = "logical",
msLevel. = "numeric")
Cleans the Spectrum
object. Returns a cleaned Spectrum
instance. If all
= TRUE
, then all zeros are removed. msLevel.
defines the
level of the spectrum, and if msLevel(object) !=
msLevel.
, cleaning is ignored. Only relevant when called from
OnDiskMSnExp
and is only relevant for developers.
signature(object = "Chromatogram", all = "logical",
na.rm = "logical")
Cleans the Chromatogram
instance and returns
a cleaned Chromatogram
object. If
na.rm
is TRUE
(default is FALSE
) all
NA
intensities are removed before cleaning the chromatogram.
Laurent Gatto
removePeaks
and trimMz
for other spectra
processing methods.
int <- c(1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0) sp1 <- new("Spectrum2", intensity=int, mz=1:length(int)) sp2 <- clean(sp1) ## default is all=FALSE intensity(sp1) intensity(sp2) intensity(clean(sp1, all = TRUE)) mz(sp1) mz(sp2) mz(clean(sp1, all = TRUE)) data(itraqdata) itraqdata2 <- clean(itraqdata) sum(peaksCount(itraqdata)) sum(peaksCount(itraqdata2)) processingData(itraqdata2) ## Create a simple Chromatogram object chr <- Chromatogram(rtime = 1:12, intensity = c(0, 0, 20, 0, 0, 0, 123, 124343, 3432, 0, 0, 0)) ## Remove 0-intensity values keeping those adjacent to peaks chr <- clean(chr) intensity(chr) ## Remove all 0-intensity values chr <- clean(chr, all = TRUE) intensity(chr) ## Clean a Chromatogram with NAs. chr <- Chromatogram(rtime = 1:12, intensity = c(0, 0, 20, NA, NA, 0, 123, 124343, 3432, 0, 0, 0)) chr <- clean(chr, all = FALSE, na.rm = TRUE) intensity(chr)
int <- c(1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0) sp1 <- new("Spectrum2", intensity=int, mz=1:length(int)) sp2 <- clean(sp1) ## default is all=FALSE intensity(sp1) intensity(sp2) intensity(clean(sp1, all = TRUE)) mz(sp1) mz(sp2) mz(clean(sp1, all = TRUE)) data(itraqdata) itraqdata2 <- clean(itraqdata) sum(peaksCount(itraqdata)) sum(peaksCount(itraqdata2)) processingData(itraqdata2) ## Create a simple Chromatogram object chr <- Chromatogram(rtime = 1:12, intensity = c(0, 0, 20, 0, 0, 0, 123, 124343, 3432, 0, 0, 0)) ## Remove 0-intensity values keeping those adjacent to peaks chr <- clean(chr) intensity(chr) ## Remove all 0-intensity values chr <- clean(chr, all = TRUE) intensity(chr) ## Clean a Chromatogram with NAs. chr <- Chromatogram(rtime = 1:12, intensity = c(0, 0, 20, NA, NA, 0, 123, 124343, 3432, 0, 0, 0)) chr <- clean(chr, all = FALSE, na.rm = TRUE) intensity(chr)
MSnSet
object This function combines the features in an
"MSnSet"
instance applying a summarisation
function (see fun
argument) to sets of features as defined by a
factor (see fcol
argument). Note that the feature names are
automatically updated based on the groupBy
parameter.
The coefficient of variations are automatically computed and collated
to the featureData slot. See cv
and cv.norm
arguments
for details.
If NA values are present, a message will be shown. Details on how missing value impact on the data aggregation are provided below.
object |
An instance of class |
groupBy |
A |
fun |
Deprecated; use |
method |
The summerising function. Currently, mean, median,
weighted mean, sum, median polish, robust summarisation (using
|
fcol |
Feature meta-data label (fData column name) defining how
to summerise the features. It must be present in
|
redundancy.handler |
If |
cv |
A |
cv.norm |
A |
verbose |
A |
... |
Additional arguments for the |
Missing values have different effect based on the aggregation method employed, as detailed below. See also examples below.
When using either "sum"
, "mean"
,
"weighted.mean"
or "median"
, any missing value will be
propagated at the higher level. If na.rm = TRUE
is used, then
the missing value will be ignored.
Missing values will result in an error when using
"medpolish"
, unless na.rm = TRUE
is used.
When using robust summarisation ("robust"
), individual
missing values are excluded prior to fitting the linear model by
robust regression. To remove all values in the feature containing
the missing values, use filterNA
.
The "iPQF"
method will fail with an error if missing
value are present, which will have to be handled explicitly. See
below.
More generally, missing values often need dedicated handling such as
filtering (see filterNA
) or imputation (see
impute
).
A new "MSnSet"
instance is returned with
ncol
(i.e. number of samples) is unchanged, but nrow
(i.e. the number od features) is now equals to the number of levels in
groupBy
. The feature metadata (featureData
slot) is
updated accordingly and only the first occurrence of a feature in the
original feature meta-data is kept.
Laurent Gatto with contributions from Martina Fischer for iPQF and Ludger Goeminne, Adriaan Sticker and Lieven Clement for robust.
iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.
featureCV
to calculate coefficient of variation,
nFeatures
to document the number of features per group
in the feature data, and the aggvar
to explore
variability within protein groups.
iPQF
for iPQF summarisation.
NTR
for normalisation to reference summarisation.
data(msnset) msnset <- msnset[11:15, ] exprs(msnset) ## arbitrary grouping into two groups grp <- as.factor(c(1, 1, 2, 2, 2)) msnset.comb <- combineFeatures(msnset, groupBy = grp, method = "sum") dim(msnset.comb) exprs(msnset.comb) fvarLabels(msnset.comb) ## grouping with a list grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B")) ## optional naming names(grpl) <- featureNames(msnset) exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "unique")) exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "multiple")) ## missing data exprs(msnset)[4, 4] <- exprs(msnset)[2, 2] <- NA exprs(msnset) ## NAs propagate in the 115 and 117 channels exprs(combineFeatures(msnset, grp, "sum")) ## NAs are removed before summing exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE)) ## using iPQF data(msnset2) anyNA(msnset2) res <- combineFeatures(msnset2, groupBy = fData(msnset2)$accession, redundancy.handler = "unique", method = "iPQF", low.support.filter = FALSE, ratio.calc = "sum", method.combine = FALSE) head(exprs(res)) ## using robust summarisation data(msnset) ## reset data msnset <- log(msnset, 2) ## log2 transform ## Feature X46, in the ENO protein has one missig value which(is.na(msnset), arr.ind = TRUE) exprs(msnset["X46", ]) ## Only the missing value in X46 and iTRAQ4.116 will be ignored res <- combineFeatures(msnset, fcol = "ProteinAccession", method = "robust") tail(exprs(res)) msnset2 <- filterNA(msnset) ## remove features with missing value(s) res2 <- combineFeatures(msnset2, fcol = "ProteinAccession", method = "robust") ## Here, the values for ENO are different because the whole feature ## X46 that contained the missing value was removed prior to fitting. tail(exprs(res2))
data(msnset) msnset <- msnset[11:15, ] exprs(msnset) ## arbitrary grouping into two groups grp <- as.factor(c(1, 1, 2, 2, 2)) msnset.comb <- combineFeatures(msnset, groupBy = grp, method = "sum") dim(msnset.comb) exprs(msnset.comb) fvarLabels(msnset.comb) ## grouping with a list grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B")) ## optional naming names(grpl) <- featureNames(msnset) exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "unique")) exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "multiple")) ## missing data exprs(msnset)[4, 4] <- exprs(msnset)[2, 2] <- NA exprs(msnset) ## NAs propagate in the 115 and 117 channels exprs(combineFeatures(msnset, grp, "sum")) ## NAs are removed before summing exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE)) ## using iPQF data(msnset2) anyNA(msnset2) res <- combineFeatures(msnset2, groupBy = fData(msnset2)$accession, redundancy.handler = "unique", method = "iPQF", low.support.filter = FALSE, ratio.calc = "sum", method.combine = FALSE) head(exprs(res)) ## using robust summarisation data(msnset) ## reset data msnset <- log(msnset, 2) ## log2 transform ## Feature X46, in the ENO protein has one missig value which(is.na(msnset), arr.ind = TRUE) exprs(msnset["X46", ]) ## Only the missing value in X46 and iTRAQ4.116 will be ignored res <- combineFeatures(msnset, fcol = "ProteinAccession", method = "robust") tail(exprs(res)) msnset2 <- filterNA(msnset) ## remove features with missing value(s) res2 <- combineFeatures(msnset2, fcol = "ProteinAccession", method = "robust") ## Here, the values for ENO are different because the whole feature ## X46 that contained the missing value was removed prior to fitting. tail(exprs(res2))
combineSpectra
combines spectra in a MSnExp, OnDiskMSnExp
or MSpectra object applying the summarization function fun
to sets
of spectra defined by a factor (fcol
parameter). The resulting combined
spectrum for each set contains metadata information (present in mcols
and
all spectrum information other than mz
and intensity
) from the first
spectrum in each set.
Combining of spectra for MSnExp or OnDiskMSnExp objects is performed by default for each file separately, combining of spectra across files is thus not possible. See examples for details.
## S4 method for signature 'MSnExp' combineSpectra( object, fcol = "fileIdx", method = meanMzInts, ..., BPPARAM = bpparam() ) ## S4 method for signature 'MSpectra' combineSpectra(object, fcol, method = meanMzInts, fun, ...)
## S4 method for signature 'MSnExp' combineSpectra( object, fcol = "fileIdx", method = meanMzInts, ..., BPPARAM = bpparam() ) ## S4 method for signature 'MSpectra' combineSpectra(object, fcol, method = meanMzInts, fun, ...)
object |
|
fcol |
For |
method |
|
... |
additional arguments for |
BPPARAM |
For |
fun |
Deprecated use |
A MSpectra
or MSnExp
object with combined spectra. Metadata
(mcols
) and all spectrum attributes other than mz
and intensity
are taken from the first Spectrum
in each set.
Johannes Rainer, Laurent Gatto
meanMzInts()
for a function to combine spectra.
set.seed(123) mzs <- seq(1, 20, 0.1) ints1 <- abs(rnorm(length(mzs), 10)) ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak ints2 <- abs(rnorm(length(mzs), 10)) ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23) ints3 <- abs(rnorm(length(mzs), 10)) ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20) ## Create the spectra. sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints1, rt = 1) sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints2, rt = 2) sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009), intensity = ints3, rt = 3) spctra <- MSpectra(sp1, sp2, sp3, elementMetadata = DataFrame(idx = 1:3, group = c("b", "a", "a"))) ## Combine the spectra reporting the maximym signal res <- combineSpectra(spctra, mzd = 0.05, intensityFun = max) res ## All values other than m/z and intensity are kept from the first spectrum rtime(res) ## Plot the individual and the merged spectrum par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(res[[1]]), intensity(res[[1]]), type = "h", col = "black", xlim = range(mzs[5:25])) ## Combine spectra in two sets. res <- combineSpectra(spctra, fcol = "group", mzd = 0.05) res rtime(res) ## Plot the individual and the merged spectra par(mfrow = c(3, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(res[[1]]), intensity(res[[1]]), xlim = range(mzs[5:25]), type = "h", col = "black") plot(mz(res[[2]]), intensity(res[[2]]), xlim = range(mzs[5:25]), type = "h", col = "black") ## Combining spectra of an MSnExp/OnDiskMSnExp objects ## Reading data from 2 mzML files sciex <- readMSData(dir(system.file("sciex", package = "msdata"), full.names = TRUE), mode = "onDisk") ## Filter the file to a retention time range from 2 to 20 seconds (to reduce ## execution time of the example) sciex <- filterRt(sciex, rt = c(2, 20)) table(fromFile(sciex)) ## We have thus 64 spectra per file. ## In the example below we combine spectra measured in one second to a ## single spectrum. We thus first define the grouping variable and add that ## to the `fData` of the object. For combining, we use the ## `consensusSpectrum` function that combines the spectra keeping only peaks ## that were found in 50% of the spectra; by defining `mzd = 0.01` all peaks ## within an m/z of 0.01 are evaluated for combining. seconds <- round(rtime(sciex)) head(seconds) fData(sciex)$second <- seconds res <- combineSpectra(sciex, fcol = "second", mzd = 0.01, minProp = 0.1, method = consensusSpectrum) table(fromFile(res)) ## The data was reduced to 19 spectra for each file.
set.seed(123) mzs <- seq(1, 20, 0.1) ints1 <- abs(rnorm(length(mzs), 10)) ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak ints2 <- abs(rnorm(length(mzs), 10)) ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23) ints3 <- abs(rnorm(length(mzs), 10)) ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20) ## Create the spectra. sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints1, rt = 1) sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints2, rt = 2) sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009), intensity = ints3, rt = 3) spctra <- MSpectra(sp1, sp2, sp3, elementMetadata = DataFrame(idx = 1:3, group = c("b", "a", "a"))) ## Combine the spectra reporting the maximym signal res <- combineSpectra(spctra, mzd = 0.05, intensityFun = max) res ## All values other than m/z and intensity are kept from the first spectrum rtime(res) ## Plot the individual and the merged spectrum par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(res[[1]]), intensity(res[[1]]), type = "h", col = "black", xlim = range(mzs[5:25])) ## Combine spectra in two sets. res <- combineSpectra(spctra, fcol = "group", mzd = 0.05) res rtime(res) ## Plot the individual and the merged spectra par(mfrow = c(3, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(res[[1]]), intensity(res[[1]]), xlim = range(mzs[5:25]), type = "h", col = "black") plot(mz(res[[2]]), intensity(res[[2]]), xlim = range(mzs[5:25]), type = "h", col = "black") ## Combining spectra of an MSnExp/OnDiskMSnExp objects ## Reading data from 2 mzML files sciex <- readMSData(dir(system.file("sciex", package = "msdata"), full.names = TRUE), mode = "onDisk") ## Filter the file to a retention time range from 2 to 20 seconds (to reduce ## execution time of the example) sciex <- filterRt(sciex, rt = c(2, 20)) table(fromFile(sciex)) ## We have thus 64 spectra per file. ## In the example below we combine spectra measured in one second to a ## single spectrum. We thus first define the grouping variable and add that ## to the `fData` of the object. For combining, we use the ## `consensusSpectrum` function that combines the spectra keeping only peaks ## that were found in 50% of the spectra; by defining `mzd = 0.01` all peaks ## within an m/z of 0.01 are evaluated for combining. seconds <- round(rtime(sciex)) head(seconds) fData(sciex)$second <- seconds res <- combineSpectra(sciex, fcol = "second", mzd = 0.01, minProp = 0.1, method = consensusSpectrum) table(fromFile(res)) ## The data was reduced to 19 spectra for each file.
combineSpectraMovingWindow
combines signal from consecutive spectra within
a file. The resulting MSnExp
has the same total number of spectra than the
original object, but with each individual's spectrum information
representing aggregated data from the original spectrum and its neighboring
spectra. This is thus equivalent with a smoothing of the data in retention
time dimension.
Note that the function returns always a MSnExp
object, even if x
was an
OnDiskMSnExp
object.
combineSpectraMovingWindow( x, halfWindowSize = 1L, intensityFun = base::mean, mzd = NULL, timeDomain = FALSE, weighted = FALSE, ppm = 0, BPPARAM = bpparam() )
combineSpectraMovingWindow( x, halfWindowSize = 1L, intensityFun = base::mean, mzd = NULL, timeDomain = FALSE, weighted = FALSE, ppm = 0, BPPARAM = bpparam() )
x |
|
halfWindowSize |
|
intensityFun |
|
mzd |
|
timeDomain |
|
weighted |
|
ppm |
|
BPPARAM |
parallel processing settings. |
The method assumes same ions being measured in consecutive scans (i.e. LCMS data) and thus combines their signal which can increase the increase the signal to noise ratio.
Intensities (and m/z values) for signals with the same m/z value in
consecutive scans are aggregated using the intensityFun
.
m/z values of intensities from consecutive scans will never be exactly
identical, even if they represent signal from the same ion. The function
determines thus internally a similarity threshold based on differences
between m/z values within and between spectra below which m/z values are
considered to derive from the same ion. For robustness reasons, this
threshold is estimated on the 100 spectra with the largest number of
m/z - intensity pairs (i.e. mass peaks).
See meanMzInts()
for details.
Parameter timeDomain
: by default, m/z-intensity pairs from consecutive
scans to be aggregated are defined based on the square root of the m/z
values. This is because it is highly likely that in all QTOF MS instruments
data is collected based on a timing circuit (with a certain variance) and
m/z values are later derived based on the relationship t = k * sqrt(m/z)
.
Differences between individual m/z values will thus be dependent on the
actual m/z value causing both the difference between m/z values and their
scattering being different in the lower and upper m/z range. Determining
m/z values to be combined on the sqrt(mz)
reduces this dependency. For
non-QTOF MS data timeDomain = FALSE
might be used instead.
MSnExp
with the same number of spectra than x
.
The function has to read all data into memory for the spectra combining
and thus the memory requirements of this function are high, possibly
preventing its usage on large experimental data. In these cases it is
suggested to perform the combination on a per-file basis and save the
results using the writeMSData()
function afterwards.
Johannes Rainer, Sigurdur Smarason
meanMzInts()
for the function combining spectra provided in
a list
.
estimateMzScattering()
for a function to estimate m/z value scattering in
consecutive spectra.
library(MSnbase) library(msdata) ## Read a profile-mode LC-MS data file. fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1] od <- readMSData(fl, mode = "onDisk") ## Subset the object to the retention time range that includes the signal ## for proline. This is done for performance reasons. rtr <- c(165, 175) od <- filterRt(od, rtr) ## Combine signal from neighboring spectra. od_comb <- combineSpectraMovingWindow(od) ## The combined spectra have the same number of spectra, same number of ## mass peaks per spectra, but the signal is larger in the combined object. length(od) length(od_comb) peaksCount(od) peaksCount(od_comb) ## Comparing the chromatographic signal for proline (m/z ~ 116.0706) ## before and after spectra data combination. mzr <- c(116.065, 116.075) chr <- chromatogram(od, rt = rtr, mz = mzr) chr_comb <- chromatogram(od_comb, rt = rtr, mz = mzr) par(mfrow = c(1, 2)) plot(chr) plot(chr_comb) ## Chromatographic data is "smoother" after combining.
library(MSnbase) library(msdata) ## Read a profile-mode LC-MS data file. fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1] od <- readMSData(fl, mode = "onDisk") ## Subset the object to the retention time range that includes the signal ## for proline. This is done for performance reasons. rtr <- c(165, 175) od <- filterRt(od, rtr) ## Combine signal from neighboring spectra. od_comb <- combineSpectraMovingWindow(od) ## The combined spectra have the same number of spectra, same number of ## mass peaks per spectra, but the signal is larger in the combined object. length(od) length(od_comb) peaksCount(od) peaksCount(od_comb) ## Comparing the chromatographic signal for proline (m/z ~ 116.0706) ## before and after spectra data combination. mzr <- c(116.065, 116.075) chr <- chromatogram(od, rt = rtr, mz = mzr) chr_comb <- chromatogram(od_comb, rt = rtr, mz = mzr) par(mfrow = c(1, 2)) plot(chr) plot(chr_comb) ## Chromatographic data is "smoother" after combining.
Subsets MSnSet
instances to their common feature names.
commonFeatureNames(x, y)
commonFeatureNames(x, y)
x |
An instance of class |
y |
An instance of class |
An linkS4class{MSnSetList}
composed of the input
MSnSet
containing only common features in the same
order. The names of the output are either the names of the
x
and y
input variables or the names of x
if a list is provided.
Laurent Gatto
library("pRolocdata") data(tan2009r1) data(tan2009r2) cmn <- commonFeatureNames(tan2009r1, tan2009r2) names(cmn) ## as a named list names(commonFeatureNames(list(a = tan2009r1, b = tan2009r2))) ## without message suppressMessages(cmn <- commonFeatureNames(tan2009r1, tan2009r2)) ## more than 2 instance data(tan2009r3) cmn <- commonFeatureNames(list(tan2009r1, tan2009r2, tan2009r3)) length(cmn)
library("pRolocdata") data(tan2009r1) data(tan2009r2) cmn <- commonFeatureNames(tan2009r1, tan2009r2) names(cmn) ## as a named list names(commonFeatureNames(list(a = tan2009r1, b = tan2009r2))) ## without message suppressMessages(cmn <- commonFeatureNames(tan2009r1, tan2009r2)) ## more than 2 instance data(tan2009r3) cmn <- commonFeatureNames(list(tan2009r1, tan2009r2, tan2009r3)) length(cmn)
Compares two MSnSet
instances. The
qual
and processingData
slots are generally omitted.
compareMSnSets(x, y, qual = FALSE, proc = FALSE)
compareMSnSets(x, y, qual = FALSE, proc = FALSE)
x |
First MSnSet |
y |
Second MSnSet |
qual |
Should the |
proc |
Should the |
A logical
Laurent Gatto
This method compares spectra (Spectrum
instances) pairwise
or all spectra of an experiment (MSnExp
instances). Currently
the comparison is based on the number of common peaks fun = "common"
,
the Pearson correlation fun = "cor"
, the dot product
fun = "dotproduct"
or a user-defined function.
For fun = "common"
the tolerance
(default 25e-6
)
can be set and the tolerance can be defined to be relative (default
relative = TRUE
) or absolute (relative = FALSE
). To
compare spectra with fun = "cor"
and fun = "dotproduct"
,
the spectra need to be binned. The binSize
argument (in Dalton)
controls the binning precision. Please see bin
for
details.
Instead of these three predefined functions for fun
a
user-defined comparison function can be supplied. This function takes
two Spectrum
objects as the first two arguments
and ...
as third argument. The function must return a single
numeric
value. See the example section.
signature(x = "MSnExp", y = "missing", fun =
"character", ...)
Compares all spectra in an MSnExp
object. The ...
arguments are passed to the internal
functions. Returns a matrix
of dimension
length(x)
by length(x)
.
signature(x = "Spectrum", y = "Spectrum", fun =
"character", ...)
Compares two Spectrum
objects. See the
above explanation for fun
and ...
. Returns a single
numeric
value.
Sebastian Gibb <[email protected]>
Stein, S. E., & Scott, D. R. (1994). Optimization and testing of mass spectral library search algorithms for compound identification. Journal of the American Society for Mass Spectrometry, 5(9), 859-866. doi: https://doi.org/10.1016/1044-0305(94)87009-8
Lam, H., Deutsch, E. W., Eddes, J. S., Eng, J. K., King, N., Stein, S. E. and Aebersold, R. (2007) Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics, 7: 655-667. doi: https://doi.org/10.1002/pmic.200600625
bin
, clean
, pickPeaks
,
smooth
, removePeaks
and trimMz
for other spectra processing methods.
s1 <- new("Spectrum2", mz=1:10, intensity=1:10) s2 <- new("Spectrum2", mz=1:10, intensity=10:1) compareSpectra(s1, s2) compareSpectra(s1, s2, fun="cor", binSize=2) compareSpectra(s1, s2, fun="dotproduct") ## define our own (useless) comparison function (it is just a basic example) equalLength <- function(x, y, ...) { return(peaksCount(x)/(peaksCount(y)+.Machine$double.eps)) } compareSpectra(s1, s2, fun=equalLength) compareSpectra(s1, new("Spectrum2", mz=1:5, intensity=1:5), fun=equalLength) compareSpectra(s1, new("Spectrum2"), fun=equalLength) data(itraqdata) compareSpectra(itraqdata[1:5], fun="cor")
s1 <- new("Spectrum2", mz=1:10, intensity=1:10) s2 <- new("Spectrum2", mz=1:10, intensity=10:1) compareSpectra(s1, s2) compareSpectra(s1, s2, fun="cor", binSize=2) compareSpectra(s1, s2, fun="dotproduct") ## define our own (useless) comparison function (it is just a basic example) equalLength <- function(x, y, ...) { return(peaksCount(x)/(peaksCount(y)+.Machine$double.eps)) } compareSpectra(s1, s2, fun=equalLength) compareSpectra(s1, new("Spectrum2", mz=1:5, intensity=1:5), fun=equalLength) compareSpectra(s1, new("Spectrum2"), fun=equalLength) data(itraqdata) compareSpectra(itraqdata[1:5], fun="cor")
consensusSpectrum
takes a list of spectra and combines them to a
consensus spectrum containing mass peaks that are present in a user
definable proportion of spectra.
consensusSpectrum( x, mzd = 0, minProp = 0.5, intensityFun = stats::median, mzFun = stats::median, ppm = 0, weighted = FALSE, ... )
consensusSpectrum( x, mzd = 0, minProp = 0.5, intensityFun = stats::median, mzFun = stats::median, ppm = 0, weighted = FALSE, ... )
x |
|
mzd |
|
minProp |
|
intensityFun |
|
mzFun |
|
ppm |
|
weighted |
|
... |
additional arguments to be passed to |
Peaks from spectra with a difference of their m/z being smaller than mzd
are grouped into the same final mass peak with their intensities being
aggregated with intensityFun
. Alternatively (or in addition) it is
possible to perform an m/z dependent grouping of mass peaks with parameter
ppm
: mass peaks from different spectra with a difference in their m/z
smaller than ppm
of their m/z are grouped into the same final peak.
The m/z of the final mass peaks is calculated with mzFun
. By setting
weighted = TRUE
the parameter mzFun
is ignored and an intensity-weighted
mean of the m/z values from the individual mass peaks is returned as the
peak's m/z.
Johannes Rainer
Other spectra combination functions:
meanMzInts()
library(MSnbase) ## Create 3 example spectra. sp1 <- new("Spectrum2", rt = 1, precursorMz = 1.41, mz = c(1.2, 1.5, 1.8, 3.6, 4.9, 5.0, 7.8, 8.4), intensity = c(10, 3, 140, 14, 299, 12, 49, 20)) sp2 <- new("Spectrum2", rt = 1.1, precursorMz = 1.4102, mz = c(1.4, 1.81, 2.4, 4.91, 6.0, 7.2, 9), intensity = c(3, 184, 8, 156, 12, 23, 10)) sp3 <- new("Spectrum2", rt = 1.2, precursorMz = 1.409, mz = c(1, 1.82, 2.2, 3, 7.0, 8), intensity = c(8, 210, 7, 101, 17, 8)) spl <- MSpectra(sp1, sp2, sp3) ## Plot the spectra, each in a different color par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), type = "h", col = "#ff000080", lwd = 2, xlab = "m/z", ylab = "intensity", xlim = range(mz(spl)), ylim = range(intensity(spl))) points(mz(sp2), intensity(sp2), type = "h", col = "#00ff0080", lwd = 2) points(mz(sp3), intensity(sp3), type = "h", col = "#0000ff80", lwd = 2) cons <- consensusSpectrum(spl, mzd = 0.02, minProp = 2/3) ## Peaks of the consensus spectrum mz(cons) intensity(cons) ## Other Spectrum data is taken from the first Spectrum in the list rtime(cons) precursorMz(cons) plot(mz(cons), intensity(cons), type = "h", xlab = "m/z", ylab = "intensity", xlim = range(mz(spl)), ylim = range(intensity(spl)), lwd = 2)
library(MSnbase) ## Create 3 example spectra. sp1 <- new("Spectrum2", rt = 1, precursorMz = 1.41, mz = c(1.2, 1.5, 1.8, 3.6, 4.9, 5.0, 7.8, 8.4), intensity = c(10, 3, 140, 14, 299, 12, 49, 20)) sp2 <- new("Spectrum2", rt = 1.1, precursorMz = 1.4102, mz = c(1.4, 1.81, 2.4, 4.91, 6.0, 7.2, 9), intensity = c(3, 184, 8, 156, 12, 23, 10)) sp3 <- new("Spectrum2", rt = 1.2, precursorMz = 1.409, mz = c(1, 1.82, 2.2, 3, 7.0, 8), intensity = c(8, 210, 7, 101, 17, 8)) spl <- MSpectra(sp1, sp2, sp3) ## Plot the spectra, each in a different color par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), type = "h", col = "#ff000080", lwd = 2, xlab = "m/z", ylab = "intensity", xlim = range(mz(spl)), ylim = range(intensity(spl))) points(mz(sp2), intensity(sp2), type = "h", col = "#00ff0080", lwd = 2) points(mz(sp3), intensity(sp3), type = "h", col = "#0000ff80", lwd = 2) cons <- consensusSpectrum(spl, mzd = 0.02, minProp = 2/3) ## Peaks of the consensus spectrum mz(cons) intensity(cons) ## Other Spectrum data is taken from the first Spectrum in the list rtime(cons) precursorMz(cons) plot(mz(cons), intensity(cons), type = "h", xlab = "m/z", ylab = "intensity", xlim = range(mz(spl)), ylim = range(intensity(spl)), lwd = 2)
estimateMzResolution
estimates the m/z resolution of a profile-mode
Spectrum
(or of all spectra in an MSnExp or OnDiskMSnExp object.
The m/z resolution is defined as the most frequent difference between a
spectrum's m/z values.
## S4 method for signature 'MSnExp' estimateMzResolution(object, ...) ## S4 method for signature 'Spectrum' estimateMzResolution(object, ...)
## S4 method for signature 'MSnExp' estimateMzResolution(object, ...) ## S4 method for signature 'Spectrum' estimateMzResolution(object, ...)
object |
either a |
... |
currently not used. |
numeric(1)
with the m/z resolution. If called on a MSnExp
or
OnDiskMSnExp
a list
of m/z resolutions are returned (one for
each spectrum).
This assumes the data to be in profile mode and does not return meaningful results for centroided data.
The estimated m/z resolution depends on the number of ions detected in a spectrum, as some instrument don't measure (or report) signal if below a certain threshold.
Johannes Rainer
## Load a profile mode example file library(BiocParallel) register(SerialParam()) library(msdata) f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") od <- readMSData(f, mode = "onDisk") ## Estimate the m/z resolution on the 3rd spectrum. estimateMzResolution(od[[3]]) ## Estimate the m/z resolution for each spectrum mzr <- estimateMzResolution(od) ## plot the distribution of estimated m/z resolutions. The bimodal ## distribution represents the m/z resolution of the MS1 (first peak) and ## MS2 spectra (second peak). plot(density(unlist(mzr)))
## Load a profile mode example file library(BiocParallel) register(SerialParam()) library(msdata) f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") od <- readMSData(f, mode = "onDisk") ## Estimate the m/z resolution on the 3rd spectrum. estimateMzResolution(od[[3]]) ## Estimate the m/z resolution for each spectrum mzr <- estimateMzResolution(od) ## plot the distribution of estimated m/z resolutions. The bimodal ## distribution represents the m/z resolution of the MS1 (first peak) and ## MS2 spectra (second peak). plot(density(unlist(mzr)))
Estimate scattering of m/z values (due to technical, instrument specific noise) for the same ion in consecutive scans of a LCMS experiment.
estimateMzScattering(x, halfWindowSize = 1L, timeDomain = FALSE)
estimateMzScattering(x, halfWindowSize = 1L, timeDomain = FALSE)
x |
|
halfWindowSize |
|
timeDomain |
|
The m/z values of the same ions in consecutive scans (spectra) of a LCMS run will not be identical. This random noise is expected to be smaller than the resolution of the MS instrument. The distribution of differences of m/z values from neighboring spectra is thus expected to be (at least) bi-modal with the first peak representing the above described random variation and the second (or largest) peak the m/z resolution. The m/z value of the first local minimum between these first two peaks in the distribution is returned as the m/z scattering.
For timeDomain = TRUE
the function does not return the estimated
scattering of m/z values, but the scattering of sqrt(mz)
values.
Johannes Rainer
estimateMzResolution()
for the function to estimate a
profile-mode spectrum's m/z resolution from it's data.
library(MSnbase) library(msdata) ## Load a profile-mode LC-MS data file f <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1] od <- readMSData(f, mode = "onDisk") im <- as(filterRt(od, c(10, 20)), "MSnExp") res <- estimateMzScattering(im) ## Plot the distribution of estimated m/z scattering plot(density(unlist(res))) ## Compare the m/z resolution and m/z scattering of the spectrum with the ## most peaks idx <- which.max(unlist(spectrapply(im, peaksCount))) res[[idx]] abline(v = res[[idx]], lty = 2) estimateMzResolution(im[[idx]]) ## As expected, the m/z scattering is much lower than the m/z resolution.
library(MSnbase) library(msdata) ## Load a profile-mode LC-MS data file f <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1] od <- readMSData(f, mode = "onDisk") im <- as(filterRt(od, c(10, 20)), "MSnExp") res <- estimateMzScattering(im) ## Plot the distribution of estimated m/z scattering plot(density(unlist(res))) ## Compare the m/z resolution and m/z scattering of the spectrum with the ## most peaks idx <- which.max(unlist(spectrapply(im, peaksCount))) res[[idx]] abline(v = res[[idx]], lty = 2) estimateMzResolution(im[[idx]]) ## As expected, the m/z scattering is much lower than the m/z resolution.
This method performs a noise estimation on individual spectra
(Spectrum
instances).
There are currently two different noise estimators, the
Median Absolute Deviation (method = "MAD"
) and
Friedman's Super Smoother (method = "SuperSmoother"
),
as implemented in the MALDIquant::detectPeaks
and
MALDIquant::estimateNoise
functions respectively.
signature(object = "Spectrum", method = "character", ...)
Estiamtes the noise in a non-centroided spectrum (Spectrum
instance).
method
could be "MAD"
or "SuperSmoother"
.
The arguments ...
are passed to the noise estimator functions
implemented in MALDIquant::estimateNoise
.
Currenlty only the method = "SuperSmoother"
accepts additional
arguments, e.g. span
. Please see supsmu
for
details.
This method returns a two-column matrix with the m/z and intensity values
in the first and the second column.
signature(object = "MSnExp", method = "character", ...)
Estimates noise for all spectra in object
.
Sebastian Gibb <[email protected]>
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28: 2270-2271. http://strimmerlab.org/software/maldiquant/
pickPeaks
, and the underlying method in MALDIquant
:
estimateNoise
.
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11, centroided = FALSE) estimateNoise(sp1, method = "SuperSmoother")
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11, centroided = FALSE) estimateNoise(sp1, method = "SuperSmoother")
The expandFeatureVars
and mergeFeatureVars
respectively expand
and merge groups of feature variables. Using these functions, a
set of columns in a feature data can be merged into a single new
data.frame-column variables and a data.frame-column can be
expanded into single feature columns. The original feature
variables are removed.
expandFeatureVars(x, fcol, prefix) mergeFeatureVars(x, fcol, fcol2)
expandFeatureVars(x, fcol, prefix) mergeFeatureVars(x, fcol, fcol2)
x |
An object of class |
fcol |
A |
prefix |
A |
fcol2 |
A |
An MSnSet
for expanded (merged) feature variables.
Laurent Gatto
library("pRolocdata") data(hyperLOPIT2015) fvarLabels(hyperLOPIT2015) ## Let's merge all svm prediction feature variables (k <- grep("^svm", fvarLabels(hyperLOPIT2015), value = TRUE)) hl <- mergeFeatureVars(hyperLOPIT2015, fcol = k, fcol2 = "SVM") fvarLabels(hl) head(fData(hl)$SVM) ## Let's expand the new SVM into individual columns hl2 <- expandFeatureVars(hl, "SVM") fvarLabels(hl2) ## We can set the prefix manually hl2 <- expandFeatureVars(hl, "SVM", prefix = "Expanded") fvarLabels(hl2) ## If we don't want any prefix hl2 <- expandFeatureVars(hl, "SVM", prefix = NULL) fvarLabels(hl2)
library("pRolocdata") data(hyperLOPIT2015) fvarLabels(hyperLOPIT2015) ## Let's merge all svm prediction feature variables (k <- grep("^svm", fvarLabels(hyperLOPIT2015), value = TRUE)) hl <- mergeFeatureVars(hyperLOPIT2015, fcol = k, fcol2 = "SVM") fvarLabels(hl) head(fData(hl)$SVM) ## Let's expand the new SVM into individual columns hl2 <- expandFeatureVars(hl, "SVM") fvarLabels(hl2) ## We can set the prefix manually hl2 <- expandFeatureVars(hl, "SVM", prefix = "Expanded") fvarLabels(hl2) ## If we don't want any prefix hl2 <- expandFeatureVars(hl, "SVM", prefix = NULL) fvarLabels(hl2)
Extracts the MSMS spectra that originate from the precursor(s) having
the same MZ value as defined in theprec
argument.
A warning will be issued of one or several of the precursor MZ values
in prec
are absent in the experiment precursor MZ values (i.e
in precursorMz(object)
).
signature(object = "MSnExp", prec = "numeric")
Returns an "MSnExp"
containing MSMS spectra
whose precursor MZ values are in prec
.
Laurent Gatto
file <- dir(system.file(package="MSnbase",dir="extdata"), full.name=TRUE,pattern="mzXML$") aa <- readMSData(file,verbose=FALSE) my.prec <- precursorMz(aa)[1] my.prec bb <- extractPrecSpectra(aa,my.prec) precursorMz(bb) processingData(bb)
file <- dir(system.file(package="MSnbase",dir="extdata"), full.name=TRUE,pattern="mzXML$") aa <- readMSData(file,verbose=FALSE) my.prec <- precursorMz(aa)[1] my.prec bb <- extractPrecSpectra(aa,my.prec) precursorMz(bb) processingData(bb)
The Spectra package provides a more robust and efficient infrastructure for mass spectrometry data handling and analysis. So, wherever possible, the newer Spectra package should be used instead of the MSnbase. The functions listed here allow to convert between objects from the MSnbase and Spectra packages.
extractSpectraData
extracts the spectra data (m/z and intensity values
including metadata) from MSnExp, OnDiskMSnExp,
Spectrum1, Spectrum2 objects (or list
of such objects) and
returns these as a DataFrame
that can be used to create a
Spectra::Spectra object.This function enables thus
to convert data from the old MSnbase
package to the newer Spectra
package.
To convert a Spectra
object to a MSpectra
object use
as(sps, "MSpectra")
where sps
is a Spectra
object.
extractSpectraData(x)
extractSpectraData(x)
x |
a |
extracSpectraData()
returns a DataFrame()
with the full spectrum data
that can be passed to the Spectra::Spectra()
function to create a
Spectra
object.
as(x, "MSpectra")
returns a MSpectra
object with the content of the
Spectra
object x
.
Coercion from Spectra
to a MSpectra
will only assign values to the
contained Spectrum1
and Spectrum2
objects, but will not add all
eventually spectra variables present in Spectra
.
Johannes Rainer
## Read an mzML file with MSnbase fl <- system.file("TripleTOF-SWATH", "PestMix1_SWATH.mzML", package = "msdata") data <- filterRt(readMSData(fl, mode = "onDisk"), rt = c(1, 6)) ## Extract the data as a DataFrame res <- extractSpectraData(data) res library(Spectra) ## This can be used as an input for the Spectra constructor of the ## Spectra package: sps <- Spectra::Spectra(res) sps ## A Spectra object can be coerced to a MSnbase MSpectra object using msps <- as(sps, "MSpectra")
## Read an mzML file with MSnbase fl <- system.file("TripleTOF-SWATH", "PestMix1_SWATH.mzML", package = "msdata") data <- filterRt(readMSData(fl, mode = "onDisk"), rt = c(1, 6)) ## Extract the data as a DataFrame res <- extractSpectraData(data) res library(Spectra) ## This can be used as an input for the Spectra constructor of the ## Spectra package: sps <- Spectra::Spectra(res) sps ## A Spectra object can be coerced to a MSnbase MSpectra object using msps <- as(sps, "MSpectra")
This function produces the opposite as the stringsAsFactors
argument in the data.frame
or read.table
functions;
it converts factors
columns to characters
.
factorsAsStrings(x)
factorsAsStrings(x)
x |
A |
A data.frame
where factors
are converted to
characters
.
Laurent Gatto
data(iris) str(iris) str(factorsAsStrings(iris))
data(iris) str(iris) str(factorsAsStrings(iris))
"FeatComp"
Comparing feature names of two comparable MSnSet
instances.
Objects can be created with compfnames
. The method compares the
feature names of two objects of class "MSnSet"
. It prints a
summary matrix of common and unique feature names and invisibly
returns a list of FeatComp
instances.
The function will compute the common and unique features for all
feature names of the two input objects (featureNames(x)
and
feautreNames(y)
) as well as distinct subsets as defined in the
fcol1
and fcol2
feautre variables.
name
:Object of class "character"
defining the
name of the compared features. By convention, "all"
is used
when all feature names are used; otherwise, the respective levels of
the feature variables fcol1
and fcol2
.
common
:Object of class "character"
with the common
feature names.
unique1
:Object of class "character"
with the
features unique to the first MSnSet
(x
in
compfname
).
unique2
:Object of class "character"
with the
features unique to the seconn MSnSet
(y
in
compfname
).
all
:Object of class "logical"
defining if all
features of only a subset were compared. One expects that
name == "all"
when all
is TRUE
.
Accessors names
, common
, unique1
and
unique2
can be used to access the respective FeatComp
slots.
signature(x = "MSnSet", y = "MSnSet", fcol1
= "character", fcol2 = "character", simplify = "logical",
verbose = "logical")
: creates the FeatComp
comparison
object for instances x
and y
. The feature
variables to be considered to details feature comparison can be
defined by fcol1
(default is "markers"
and
fcol2
for x
and y
respectively). Setting
either to NULL
will only consider all feature names; in
such case, of simplify
is TRUE
(default), an
FeatComp
object is returned instead of a list of length
1. The verbose
logical controls if a summary table needs
to be printed (default is TRUE
).
signature(x = "list", y = "missing", ...)
:
when x
is a list of MSnSet
instances,
compfnames
is applied to all element pairs of
x
. Additional parameters fcol1
, fcol2
,
simplify
and verbose
are passed to the pairwise
comparison method.
signature(object = "FeatComp")
: prints a summary
of the object
.
Laurent Gatto and Thomas Naake
averageMSnSet
to compuate an average MSnSet
.
library("pRolocdata") data(tan2009r1) data(tan2009r2) x <- compfnames(tan2009r1, tan2009r2) x[[1]] x[2:3] head(common(x[[1]])) data(tan2009r3) tanl <- list(tan2009r1, tan2009r2, tan2009r3) xx <- compfnames(tanl, fcol1 = NULL) length(xx) tail(xx) all.equal(xx[[15]], compfnames(tan2009r2, tan2009r3, fcol1 = NULL)) str(sapply(xx, common))
library("pRolocdata") data(tan2009r1) data(tan2009r2) x <- compfnames(tan2009r1, tan2009r2) x[[1]] x[2:3] head(common(x[[1]])) data(tan2009r3) tanl <- list(tan2009r1, tan2009r2, tan2009r3) xx <- compfnames(tanl, fcol1 = NULL) length(xx) tail(xx) all.equal(xx[[15]], compfnames(tan2009r2, tan2009r3, fcol1 = NULL)) str(sapply(xx, common))
This function calculates the column-wise coefficient of variation
(CV), i.e. the ration between the standard deviation and the
mean, for the features in an MSnSet
. The CVs are calculated
for the groups of features defined by groupBy
. For groups
defined by single features, NA
is returned.
featureCV(x, groupBy, na.rm = TRUE, norm = "none", suffix = NULL)
featureCV(x, groupBy, na.rm = TRUE, norm = "none", suffix = NULL)
x |
An instance of class |
groupBy |
An object of class |
na.rm |
A |
norm |
One of normalisation methods applied prior to CV
calculation. See |
suffix |
A |
A matrix
of dimensions length(levels(groupBy))
by
ncol(x)
with the respecive CVs. The column names are formed
by pasting CV.
and the sample names of object x
, possibly
suffixed by .suffix
.
Laurent Gatto and Sebastian Gibb
data(msnset) msnset <- msnset[1:4] gb <- factor(rep(1:2, each = 2)) featureCV(msnset, gb) featureCV(msnset, gb, suffix = "2")
data(msnset) msnset <- msnset[1:4] gb <- factor(rep(1:2, each = 2)) featureCV(msnset, gb) featureCV(msnset, gb, suffix = "2")
The Features of Interest infrastructure allows to define a set
of features of particular interest to be used/matched against existing
data sets contained in "MSnSet"
. A specific set
of features is stored as an FeaturesOfInterest
object and a
collection of such non-redundant instances (for example for a specific
organism, project, ...) can be collected in a FoICollection
.
Objects can be created with the respective FeaturesOfInterest
and FoICollection
constructors.
FeaturesOfInterest
instances can be generated in two different
ways: the constructor takes either (1) a set of features names (a
character
vector) and a description (character
of length
1 - any subsequent elements are silently ignored) or (2) feature
names, a description and an instance of class
"MSnSet"
. In the latter case, we call such
FeaturesOfInterest
objects traceable, because we can identify
the origin of the feature names and thus their validity. This is done
by inspecting the MSnSet
instance and recording its dimensions,
its name and a unique md5 hash tag (these are stores as part of the
optional objpar
slot). In such cases, the feature names passed
to the FeaturesOfInterest
constructor must also be present in
the MSnSet
; if one or more are not, an error will be thrown. If
your features of interest to be recorded stem for an existing
experiment and have all been observed, it is advised to pass the 3
arguments to the constructor to ensure that the feature names as
valid. Otherwise, only the third argument should be omitted.
FoICollection
instances can be constructed by creating an empty
collection and serial additions of FeaturesOfInterest
using
addFeaturesOfInterest
or by passing a list of
FeaturesOfInterest
instance.
FeaturesOfInterest
class:
description
:Object of class "character"
describing the instance.
objpar
:Optional object of class "list"
providing details about the MSnSet
instance originally used
to create the instance. See details section.
fnames
:Object of class "character"
with the
feature of interest names.
date
:Object of class "character"
with the date
the instance was first generated.
.__classVersion__
: Object of class "Versions"
with the FeaturesOfInterest
class version. Only relevant for
development.
FoICollection
class:
foic
:Object of class "list"
with the
FeaturesOfInterest
.
.__classVersion__
:Object of class "Versions"
with the FoICollection
class version. Only relevant for
development.
Class "Versioned"
, directly.
FeaturesOfInterest
class:
signature(object = "FeaturesOfInterest")
:
returns the description of object
.
signature(object = "FeaturesOfInterest")
: returns
the features of interests.
signature(x = "FeaturesOfInterest")
: returns
the number of features of interest in x
.
signature(object = "FeaturesOfInterest")
:
displays object
.
signature(x = "FeaturesOfInterst", y =
"MSnSet", count = "logical")
: if count
is FALSE
(default), return a logical indicating whether there is at least
one feautre of interest present in x
? Otherwise, returns
the number of such features. Works also with matrices and
data.frames.
Subsetting works like lists. Returns a new
FoICollection
.
Subsetting works like lists. Returns a new
FeatureOfInterest
.
FoICollection
class:
signature(object = "FoICollection")
:
returns the description of object
.
signature(object = "FoICollection")
: returns a
list of FeaturesOfInterest
.
signature(x = "FoICollection")
: returns the
number of FeaturesOfInterest
in the collection.
signature(x = "FoICollection")
: returns the
number of features of interest in each FeaturesOfInterest
in the collection x
.
signature(x =
"FeaturesOfInterest", y = "FoICollection")
: add the
FeaturesOfInterest
instance x
to
FoICollection
y
. If x
is already present, a
message is printed and y
is returned unchanged.
signature(object =
"FoICollection", i = "numeric")
: removes the i
th
FeatureOfInterest
in the collection object
.
signature(object = "FoICollection")
: displays
object
.
Laurent Gatto
library("pRolocdata") data(tan2009r1) x <- FeaturesOfInterest(description = "A traceable test set of features of interest", fnames = featureNames(tan2009r1)[1:10], object = tan2009r1) x description(x) foi(x) y <- FeaturesOfInterest(description = "Non-traceable features of interest", fnames = featureNames(tan2009r1)[111:113]) y ## an illegal FeaturesOfInterest try(FeaturesOfInterest(description = "Won't work", fnames = c("A", "Z", featureNames(tan2009r1)), object = tan2009r1)) FeaturesOfInterest(description = "This work, but not traceable", fnames = c("A", "Z", featureNames(tan2009r1))) xx <- FoICollection() xx xx <- addFeaturesOfInterest(x, xx) xx <- addFeaturesOfInterest(y, xx) names(xx) <- LETTERS[1:2] xx ## Sub-setting xx[1] xx[[1]] xx[["A"]] description(xx) foi(xx) fnamesIn(x, tan2009r1) fnamesIn(x, tan2009r1, count = TRUE) rmFeaturesOfInterest(xx, 1)
library("pRolocdata") data(tan2009r1) x <- FeaturesOfInterest(description = "A traceable test set of features of interest", fnames = featureNames(tan2009r1)[1:10], object = tan2009r1) x description(x) foi(x) y <- FeaturesOfInterest(description = "Non-traceable features of interest", fnames = featureNames(tan2009r1)[111:113]) y ## an illegal FeaturesOfInterest try(FeaturesOfInterest(description = "Won't work", fnames = c("A", "Z", featureNames(tan2009r1)), object = tan2009r1)) FeaturesOfInterest(description = "This work, but not traceable", fnames = c("A", "Z", featureNames(tan2009r1))) xx <- FoICollection() xx xx <- addFeaturesOfInterest(x, xx) xx <- addFeaturesOfInterest(y, xx) names(xx) <- LETTERS[1:2] xx ## Sub-setting xx[1] xx[[1]] xx[["A"]] description(xx) foi(xx) fnamesIn(x, tan2009r1) fnamesIn(x, tan2009r1, count = TRUE) rmFeaturesOfInterest(xx, 1)
This function replaces all the empty characters ""
and/or
NA
s with the value of the closest preceding the preceding
non-NA
/""
element. The function is used to populate
dataframe or matrice columns where only the cells of the first row in
a set of partially identical rows are explicitly populated and the
following are empty.
fillUp(x)
fillUp(x)
x |
a vector. |
A vector as x
with all empty characters ""
and NA
values replaced by the preceding non-NA
/""
value.
Laurent Gatto
d <- data.frame(protein=c("Prot1","","","Prot2","",""), peptide=c("pep11","","pep12","pep21","pep22",""), score=c(1:2,NA,1:3)) d e <- apply(d,2,fillUp) e data.frame(e) fillUp(d[,1])
d <- data.frame(protein=c("Prot1","","","Prot2","",""), peptide=c("pep11","","pep12","pep21","pep22",""), score=c(1:2,NA,1:3)) d e <- apply(d,2,fillUp) e data.frame(e) fillUp(d[,1])
A function to filter out PSMs matching to the decoy database, of rank greater than one and matching non-proteotypic peptides.
filterIdentificationDataFrame( x, decoy = "isDecoy", rank = "rank", accession = "DatabaseAccess", spectrumID = "spectrumID", verbose = isMSnbaseVerbose() )
filterIdentificationDataFrame( x, decoy = "isDecoy", rank = "rank", accession = "DatabaseAccess", spectrumID = "spectrumID", verbose = isMSnbaseVerbose() )
x |
A |
decoy |
The column name defining whether entries match the
decoy database. Default is |
rank |
The column name holding the rank of the PSM. Default
is |
accession |
The column name holding the protein (groups)
accession. Default is |
spectrumID |
The name of the spectrum identifier
column. Default is |
verbose |
A |
The PSMs should be stored in a data.frame
such as those produced
by readMzIdData()
. Note that this function should be called
before calling the reduce method on a
PSM data.frame
.
A new data.frame
with filtered out peptides and with the
same columns as the input x
.
Laurent Gatto
This function is used to convert retention times. Conversion is
seconds to/from the more human friendly format "mm:sec". The
implementation is from MsCoreUtils::formatRt()
.
formatRt(rt)
formatRt(rt)
rt |
retention time in seconds ( |
A vector of same length as rt
.
Laurent Gatto and Sebastian Gibb
formatRt(1524) formatRt("25:24")
formatRt(1524) formatRt("25:24")
Return the name of variable varname
in call match_call
.
getVariableName(match_call, varname)
getVariableName(match_call, varname)
match_call |
An object of class |
varname |
An |
A character
with the name of the variable passed as parameter
varname
in parent close of match_call
.
Laurent Gatto
a <- 1 f <- function(x, y) MSnbase:::getVariableName(match.call(), "x") f(x = a) f(y = a)
a <- 1 f <- function(x, y) MSnbase:::getVariableName(match.call(), "x") f(x = a) f(y = a)
Given a text spread sheet f
and a pattern
to
be matched to its header (first line in the file), the function
returns the matching columns names or indices of the
corresponding data.frame
.
The function starts by reading the first line of the file (or connection)
f
with readLines
, then splits it
according to the optional ...
arguments (it is important to
correctly specify strsplit
's split
character vector here)
and then matches pattern
to the individual column names using
grep
.
Similarly, getEcols
can be used to explore the column names and
decide for the appropriate pattern
value.
These functions are useful to check the parameters to be provided to
readMSnSet2
.
grepEcols(f, pattern, ..., n = 1) getEcols(f, ..., n = 1)
grepEcols(f, pattern, ..., n = 1) getEcols(f, ..., n = 1)
f |
A connection object or a |
pattern |
A |
... |
Additional parameters passed to |
n |
An |
Depending on value
, the matching column names of
indices. In case of getEcols
, a character
of
column names.
Laurent Gatto
Helper functions to check whether raw files contain spectra or chromatograms.
hasSpectra(files) hasChromatograms(files)
hasSpectra(files) hasChromatograms(files)
files |
A |
A logical(n)
where n == length(x)
with TRUE
if that
files contains at least one spectrum, FALSE
otherwise.
Laurent Gatto
f <- msdata::proteomics(full.names = TRUE)[1:2] hasSpectra(f) hasChromatograms(f)
f <- msdata::proteomics(full.names = TRUE)[1:2] hasSpectra(f) hasChromatograms(f)
Produces a heatmap after reordring rows and columsn to highlight missing value patterns.
imageNA2( object, pcol, Rowv, Colv = TRUE, useGroupMean = FALSE, plot = TRUE, ... )
imageNA2( object, pcol, Rowv, Colv = TRUE, useGroupMean = FALSE, plot = TRUE, ... )
object |
An instance of class MSnSet |
pcol |
Either the name of a phenoData variable to be used to
determine the group structure or a factor or any object that can
be coerced as a factor of length equal to nrow(object). The
resulting factor must have 2 levels. If missing (default)
|
Rowv |
Determines if and how the rows/features are
reordered. If missing (default), rows are reordered according to
|
Colv |
A |
useGroupMean |
Replace individual feature intensities by the group mean intensity. Default is FALSE. |
plot |
A |
... |
Additional arguments passed to |
Used for its side effect of plotting. Invisibly returns Rovw and Colv.
Laurent Gatto, Samuel Wieczorek and Thomas Burger
library("pRolocdata") library("pRoloc") data(dunkley2006) pcol <- ifelse(dunkley2006$fraction <= 5, "A", "B") nax <- makeNaData(dunkley2006, pNA = 0.10) exprs(nax)[sample(nrow(nax), 30), pcol == "A"] <- NA exprs(nax)[sample(nrow(nax), 50), pcol == "B"] <- NA MSnbase:::imageNA2(nax, pcol) MSnbase:::imageNA2(nax, pcol, useGroupMean = TRUE) MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = FALSE) MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = TRUE)
library("pRolocdata") library("pRoloc") data(dunkley2006) pcol <- ifelse(dunkley2006$fraction <= 5, "A", "B") nax <- makeNaData(dunkley2006, pNA = 0.10) exprs(nax)[sample(nrow(nax), 30), pcol == "A"] <- NA exprs(nax)[sample(nrow(nax), 50), pcol == "B"] <- NA MSnbase:::imageNA2(nax, pcol) MSnbase:::imageNA2(nax, pcol, useGroupMean = TRUE) MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = FALSE) MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = TRUE)
The impute
method performs data imputation on MSnSet
instances
using a variety of methods.
Users should proceed with care when imputing data and take precautions to assure that the imputation produce valid results, in particular with naive imputations such as replacing missing values with 0.
See MsCoreUtils::impute_matrix()
for details on the different
imputation methods available and strategies.
## S4 method for signature 'MSnSet' impute(object, method, ...)
## S4 method for signature 'MSnSet' impute(object, method, ...)
object |
An |
method |
|
... |
Additional parameters passed to the inner imputation
function. See |
data(naset) ## table of missing values along the rows table(fData(naset)$nNA) ## table of missing values along the columns pData(naset)$nNA ## non-random missing values notna <- which(!fData(naset)$randna) length(notna) notna impute(naset, method = "min") if (require("imputeLCMD")) { impute(naset, method = "QRILC") impute(naset, method = "MinDet") } if (require("norm")) impute(naset, method = "MLE") impute(naset, "mixed", randna = fData(naset)$randna, mar = "knn", mnar = "QRILC") ## neighbour averaging x <- naset[1:4, 1:6] exprs(x)[1, 1] <- NA ## min value exprs(x)[2, 3] <- NA ## average exprs(x)[3, 1:2] <- NA ## min value and average ## 4th row: no imputation exprs(x) exprs(impute(x, "nbavg"))
data(naset) ## table of missing values along the rows table(fData(naset)$nNA) ## table of missing values along the columns pData(naset)$nNA ## non-random missing values notna <- which(!fData(naset)$randna) length(notna) notna impute(naset, method = "min") if (require("imputeLCMD")) { impute(naset, method = "QRILC") impute(naset, method = "MinDet") } if (require("norm")) impute(naset, method = "MLE") impute(naset, "mixed", randna = fData(naset)$randna, mar = "knn", mnar = "QRILC") ## neighbour averaging x <- naset[1:4, 1:6] exprs(x)[1, 1] <- NA ## min value exprs(x)[2, 3] <- NA ## average exprs(x)[3, 1:2] <- NA ## min value and average ## 4th row: no imputation exprs(x) exprs(impute(x, "nbavg"))
The iPQF spectra-to-protein summarisation method integrates
peptide spectra characteristics and quantitative values for protein
quantitation estimation. Spectra features, such as charge state,
sequence length, identification score and others, contain valuable
information concerning quantification accuracy. The iPQF algorithm
assigns weights to spectra according to their overall feature reliability
and computes a weighted mean to estimate protein quantities.
See also combineFeatures
for a more
general overview of feature aggregation and examples.
iPQF( object, groupBy, low.support.filter = FALSE, ratio.calc = "sum", method.combine = FALSE, feature.weight = c(7, 6, 4, 3, 2, 1, 5)^2 )
iPQF( object, groupBy, low.support.filter = FALSE, ratio.calc = "sum", method.combine = FALSE, feature.weight = c(7, 6, 4, 3, 2, 1, 5)^2 )
object |
An instance of class |
groupBy |
Vector defining spectra to protein
matching. Generally, this is a feature variable such as
|
low.support.filter |
A |
ratio.calc |
Either |
method.combine |
A |
feature.weight |
Vector |
A matrix
with estimated protein ratios.
Martina Fischer
iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.
data(msnset2) head(exprs(msnset2)) prot <- combineFeatures(msnset2, groupBy = fData(msnset2)$accession, method = "iPQF") head(exprs(prot))
data(msnset2) head(exprs(msnset2)) prot <- combineFeatures(msnset2, groupBy = fData(msnset2)$accession, method = "iPQF") head(exprs(prot))
The function extracts the mode (profile or centroided) from the
raw mass spectrometry file by parsing the mzML file directly. If
the object x
stems from any other type of file, NA
s are
returned.
isCentroidedFromFile(x)
isCentroidedFromFile(x)
x |
An object of class OnDiskMSnExp. |
This function is much faster than isCentroided()
, which
estimates mode from the data, but is limited to data stemming from
mzML files which are still available in their original location
(and accessed with fileNames(x)
).
A named logical
vector of the same length as x
.
Laurent Gatto
library("msdata") f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") x <- readMSData(f, mode = "onDisk") table(isCentroidedFromFile(x), msLevel(x))
library("msdata") f <- proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz") x <- readMSData(f, mode = "onDisk") table(isCentroidedFromFile(x), msLevel(x))
This instance of class "ReporterIons"
corresponds
to the iTRAQ 4-plex set, i.e the 114, 115, 116 and 117 isobaric
tags. In the iTRAQ5 data set, an unfragmented tag, i.e reporter and
attached isobaric tag, is also included at MZ 145.
These objects are used to plot the reporter ions of interest in an
MSMS spectra (see "Spectrum2"
) as well as for
quantification (see quantify
).
iTRAQ4 iTRAQ5 iTRAQ8 iTRAQ9
iTRAQ4 iTRAQ5 iTRAQ8 iTRAQ9
Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ. "Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents." Mol Cell Proteomics, 2004 Dec;3(12):1154-69. Epub 2004 Sep 22. PubMed PMID: 15385600.
TMT6
.
iTRAQ4 iTRAQ4[1:2] newReporter <- new("ReporterIons", description="an example", name="my reporter ions", reporterNames=c("myrep1","myrep2"), mz=c(121,122), col=c("red","blue"), width=0.05) newReporter
iTRAQ4 iTRAQ4[1:2] newReporter <- new("ReporterIons", description="an example", name="my reporter ions", reporterNames=c("myrep1","myrep2"), mz=c(121,122), col=c("red","blue"), width=0.05) newReporter
MSnExp
and MSnSet
data sets
itraqdata
is and example data sets is an iTRAQ 4-plex
experiment that has been run on an Orbitrap Velos instrument. It
includes identification data in the feature data slot obtain from the
Mascot search engine. It is a subset of an spike-in experiment where
proteins have spiked in an Erwinia background, as described in
Karp et al. (2010), Addressing accuracy and precision issues in iTRAQ quantitation, Mol Cell Proteomics. 2010 Sep;9(9):1885-97. Epub 2010 Apr 10. (PMID 20382981).
The spiked-in proteins in itradata
are BSA and ENO and are
present in relative abundances 1, 2.5, 5, 10 and 10, 5, 2.5, 1 in the
114, 115, 116 and 117 reporter tags.
The msnset
object is produced by running the quantify
method on the itraqdata
experimental data, as detailed in the
quantify
example. This example data set is used in the
MSnbase-demo vignette, available with vignette("MSnbase-demo",
package="MSnbase")
.
The msnset2
object is another example iTRAQ4 data that is used
to demonstrate features of the package, in particular the iPQF
feature aggregation method, described in iPQF
. It
corresponds to 11 proteins with spectra measurements from the original
data set described by Breitwieser et al. (2011) General
statistical modeling of data from protein relative expression isobaric
tags. J. Proteome Res., 10, 2758-2766.
itraqdata
itraqdata
data(itraqdata) itraqdata ## created by ## msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4) data(msnset) msnset data(msnset2) msnset2
data(itraqdata) itraqdata ## created by ## msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4) data(msnset) msnset data(msnset2) msnset2
Compares equality of all members of a list.
listOf(x, class, valid = TRUE)
listOf(x, class, valid = TRUE)
x |
A |
class |
A |
valid |
A |
TRUE
is all elements of x
inherit from
class
.
Laurent Gatto
listOf(list(), "foo") listOf(list("a", "b"), "character") listOf(list("a", 1), "character")
listOf(list(), "foo") listOf(list("a", "b"), "character") listOf(list("a", 1), "character")
Convert a vector
of characters to camel case by replacing
dots by captial letters.
makeCamelCase(x, prefix)
makeCamelCase(x, prefix)
x |
A |
prefix |
An optional |
A character
of same length as x
.
Laurent Gatto
nms <- c("aa.foo", "ab.bar") makeCamelCase(nms) makeCamelCase(nms, prefix = "x")
nms <- c("aa.foo", "ab.bar") makeCamelCase(nms) makeCamelCase(nms, prefix = "x")
These functions take an instance of class
"MSnSet"
and sets randomly selected values to
NA
.
makeNaData(object, nNA, pNA, exclude) makeNaData2(object, nRows, nNAs, exclude) whichNA(x)
makeNaData(object, nNA, pNA, exclude) makeNaData2(object, nRows, nNAs, exclude) whichNA(x)
object |
An instance of class |
nNA |
The absolute number of missing values to be assigned. |
pNA |
The proportion of missing values to be assignmed. |
exclude |
A |
nRows |
The number of rows for each set. |
nNAs |
The number of missing values for each set. |
x |
A |
makeNaData
randomly selects a number nNA
(or a
proportion pNA
) of cells in the expression matrix to be set
to NA
.
makeNaData2
will select length(nRows)
sets of rows
from object
, each with nRows[i]
rows respectively.
The first set will be assigned nNAs[1]
missing values, the
second nNAs[2]
, ... As opposed to makeNaData
, this
permits to control the number of NAs
per rows.
The whichNA
can be used to extract the indices
of the missing values, as illustrated in the example.
An instance of class MSnSet
, as object
, but
with the appropriate number/proportion of missing values. The
returned object has an additional feature meta-data columns,
nNA
Laurent Gatto
## Example 1 library(pRolocdata) data(dunkley2006) sum(is.na(dunkley2006)) dunkleyNA <- makeNaData(dunkley2006, nNA = 150) processingData(dunkleyNA) sum(is.na(dunkleyNA)) table(fData(dunkleyNA)$nNA) naIdx <- whichNA(dunkleyNA) head(naIdx) ## Example 2 dunkleyNA <- makeNaData(dunkley2006, nNA = 150, exclude = 1:10) processingData(dunkleyNA) table(fData(dunkleyNA)$nNA[1:10]) table(fData(dunkleyNA)$nNA) ## Example 3 nr <- rep(10, 5) na <- 1:5 x <- makeNaData2(dunkley2006[1:100, 1:5], nRows = nr, nNAs = na) processingData(x) (res <- table(fData(x)$nNA)) stopifnot(as.numeric(names(res)[-1]) == na) stopifnot(res[-1] == nr) ## Example 3 nr2 <- c(5, 12, 11, 8) na2 <- c(3, 8, 1, 4) x2 <- makeNaData2(dunkley2006[1:100, 1:10], nRows = nr2, nNAs = na2) processingData(x2) (res2 <- table(fData(x2)$nNA)) stopifnot(as.numeric(names(res2)[-1]) == sort(na2)) stopifnot(res2[-1] == nr2[order(na2)]) ## Example 5 nr3 <- c(5, 12, 11, 8) na3 <- c(3, 8, 1, 3) x3 <- makeNaData2(dunkley2006[1:100, 1:10], nRows = nr3, nNAs = na3) processingData(x3) (res3 <- table(fData(x3)$nNA))
## Example 1 library(pRolocdata) data(dunkley2006) sum(is.na(dunkley2006)) dunkleyNA <- makeNaData(dunkley2006, nNA = 150) processingData(dunkleyNA) sum(is.na(dunkleyNA)) table(fData(dunkleyNA)$nNA) naIdx <- whichNA(dunkleyNA) head(naIdx) ## Example 2 dunkleyNA <- makeNaData(dunkley2006, nNA = 150, exclude = 1:10) processingData(dunkleyNA) table(fData(dunkleyNA)$nNA[1:10]) table(fData(dunkleyNA)$nNA) ## Example 3 nr <- rep(10, 5) na <- 1:5 x <- makeNaData2(dunkley2006[1:100, 1:5], nRows = nr, nNAs = na) processingData(x) (res <- table(fData(x)$nNA)) stopifnot(as.numeric(names(res)[-1]) == na) stopifnot(res[-1] == nr) ## Example 3 nr2 <- c(5, 12, 11, 8) na2 <- c(3, 8, 1, 4) x2 <- makeNaData2(dunkley2006[1:100, 1:10], nRows = nr2, nNAs = na2) processingData(x2) (res2 <- table(fData(x2)$nNA)) stopifnot(as.numeric(names(res2)[-1]) == sort(na2)) stopifnot(res2[-1] == nr2[order(na2)]) ## Example 5 nr3 <- c(5, 12, 11, 8) na3 <- c(3, 8, 1, 3) x3 <- makeNaData2(dunkley2006[1:100, 1:10], nRows = nr3, nNAs = na3) processingData(x3) (res3 <- table(fData(x3)$nNA))
The MChromatograms
class allows to store
Chromatogram()
objects in a matrix
-like
two-dimensional structure.
MChromatograms(data, phenoData, featureData, ...) ## S4 method for signature 'MChromatograms' show(object) ## S4 method for signature 'MChromatograms,ANY,ANY,ANY' x[i, j, drop = FALSE] ## S4 replacement method for signature 'MChromatograms,ANY,ANY,ANY' x[i, j] <- value ## S4 method for signature 'MChromatograms,ANY' plot( x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, ... ) ## S4 method for signature 'MChromatograms' phenoData(object) ## S4 method for signature 'MChromatograms' pData(object) ## S4 replacement method for signature 'MChromatograms,data.frame' pData(object) <- value ## S4 method for signature 'MChromatograms' x$name ## S4 replacement method for signature 'MChromatograms' x$name <- value ## S4 replacement method for signature 'MChromatograms,ANY' colnames(x) <- value ## S4 method for signature 'MChromatograms' sampleNames(object) ## S4 replacement method for signature 'MChromatograms,ANY' sampleNames(object) <- value ## S4 method for signature 'MChromatograms' isEmpty(x) ## S4 method for signature 'MChromatograms' featureNames(object) ## S4 replacement method for signature 'MChromatograms' featureNames(object) <- value ## S4 method for signature 'MChromatograms' featureData(object) ## S4 replacement method for signature 'MChromatograms,ANY' featureData(object) <- value ## S4 method for signature 'MChromatograms' fData(object) ## S4 replacement method for signature 'MChromatograms,ANY' fData(object) <- value ## S4 method for signature 'MChromatograms' fvarLabels(object) ## S4 replacement method for signature 'MChromatograms' rownames(x) <- value ## S4 method for signature 'MChromatograms' precursorMz(object) ## S4 method for signature 'MChromatograms' productMz(object) ## S4 method for signature 'MChromatograms' mz(object) ## S4 method for signature 'MChromatograms' polarity(object) ## S4 method for signature 'MChromatograms' bin(x, binSize = 0.5, breaks = numeric(), fun = max) ## S4 method for signature 'MChromatograms' clean(object, all = FALSE, na.rm = FALSE) ## S4 method for signature 'MChromatograms' normalize(object, method = c("max", "sum")) ## S4 method for signature 'MChromatograms' filterIntensity(object, intensity = 0, ...) ## S4 method for signature 'MChromatograms,Chromatogram' alignRt(x, y, method = c("closest", "approx"), ...) ## S4 method for signature 'MChromatograms' c(x, ...) ## S4 method for signature 'MChromatograms,missing' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'MChromatograms,MChromatograms' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'MChromatograms' transformIntensity(object, FUN = identity)
MChromatograms(data, phenoData, featureData, ...) ## S4 method for signature 'MChromatograms' show(object) ## S4 method for signature 'MChromatograms,ANY,ANY,ANY' x[i, j, drop = FALSE] ## S4 replacement method for signature 'MChromatograms,ANY,ANY,ANY' x[i, j] <- value ## S4 method for signature 'MChromatograms,ANY' plot( x, col = "#00000060", lty = 1, type = "l", xlab = "retention time", ylab = "intensity", main = NULL, ... ) ## S4 method for signature 'MChromatograms' phenoData(object) ## S4 method for signature 'MChromatograms' pData(object) ## S4 replacement method for signature 'MChromatograms,data.frame' pData(object) <- value ## S4 method for signature 'MChromatograms' x$name ## S4 replacement method for signature 'MChromatograms' x$name <- value ## S4 replacement method for signature 'MChromatograms,ANY' colnames(x) <- value ## S4 method for signature 'MChromatograms' sampleNames(object) ## S4 replacement method for signature 'MChromatograms,ANY' sampleNames(object) <- value ## S4 method for signature 'MChromatograms' isEmpty(x) ## S4 method for signature 'MChromatograms' featureNames(object) ## S4 replacement method for signature 'MChromatograms' featureNames(object) <- value ## S4 method for signature 'MChromatograms' featureData(object) ## S4 replacement method for signature 'MChromatograms,ANY' featureData(object) <- value ## S4 method for signature 'MChromatograms' fData(object) ## S4 replacement method for signature 'MChromatograms,ANY' fData(object) <- value ## S4 method for signature 'MChromatograms' fvarLabels(object) ## S4 replacement method for signature 'MChromatograms' rownames(x) <- value ## S4 method for signature 'MChromatograms' precursorMz(object) ## S4 method for signature 'MChromatograms' productMz(object) ## S4 method for signature 'MChromatograms' mz(object) ## S4 method for signature 'MChromatograms' polarity(object) ## S4 method for signature 'MChromatograms' bin(x, binSize = 0.5, breaks = numeric(), fun = max) ## S4 method for signature 'MChromatograms' clean(object, all = FALSE, na.rm = FALSE) ## S4 method for signature 'MChromatograms' normalize(object, method = c("max", "sum")) ## S4 method for signature 'MChromatograms' filterIntensity(object, intensity = 0, ...) ## S4 method for signature 'MChromatograms,Chromatogram' alignRt(x, y, method = c("closest", "approx"), ...) ## S4 method for signature 'MChromatograms' c(x, ...) ## S4 method for signature 'MChromatograms,missing' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'MChromatograms,MChromatograms' compareChromatograms( x, y, ALIGNFUN = alignRt, ALIGNFUNARGS = list(), FUN = cor, FUNARGS = list(use = "pairwise.complete.obs"), ... ) ## S4 method for signature 'MChromatograms' transformIntensity(object, FUN = identity)
data |
for |
phenoData |
for |
featureData |
for |
... |
for |
object |
a |
x |
for all methods: a |
i |
for |
j |
for |
drop |
for |
value |
for For `pData<-`: a `data.frame` with the number of rows matching the number of columns of `object`. For `colnames`: a `character` with the new column names. |
col |
for |
lty |
for |
type |
for |
xlab |
for |
ylab |
for |
main |
for |
name |
for |
binSize |
for |
breaks |
For |
fun |
for |
all |
for |
na.rm |
for |
method |
|
intensity |
for |
y |
for |
ALIGNFUN |
for |
ALIGNFUNARGS |
|
FUN |
for |
FUNARGS |
for |
The MChromatograms
class extends the base matrix
class
and hence allows to store Chromatogram()
objects in a
two-dimensional array. Each row is supposed to contain
Chromatogram
objects for one MS data slice with a common
m/z and rt range. Columns contain Chromatogram
objects from the
same sample.
For [
: the subset of the MChromatograms
object. If a
single element is extracted (e.g. if i
and j
are of length
1) a Chromatogram()
object is returned. Otherwise (if
drop = FALSE
, the default, is specified) a MChromatograms
object is returned. If drop = TRUE
is specified, the method
returns a list
of Chromatogram
objects.
For `phenoData`: an `AnnotatedDataFrame` representing the pheno data of the object. For `pData`: a `data.frame` representing the pheno data of the object. For `$`: the value of the corresponding column in the pheno data table of the object. For all other methods see function description.
MChromatograms
are returned by a chromatogram()
function from an MSnExp
or OnDiskMSnExp
. Alternatively, the MChromatograms
constructor function
can be used.
$
and $<-
: get or replace individual columns of the object's phenodata.
colnames
and colnames<-
: replace or set the column names of the
MChromatograms
object. Does also set the rownames
of the phenoData
.
fData
: return the feature data as a data.frame
.
fData<-
: replace the object's feature data by passing a data.frame
.
featureData
: return the feature data.
featureData<-
: replace the object's feature data.
featureNames
: returns the feature names of the MChromatograms
object.
featureNames<-
: set the feature names.
fvarLabels
: return the feature data variable names (i.e. column names).
isEmpty
: returns TRUE
if the MChromatograms
object or all of its
Chromatogram
objects is/are empty or contain only NA
intensities.
mz
: returns the m/z for each row of the MChromatograms
object
as a two-column matrix
(with columns "mzmin"
and "mzmax"
).
pData
: accesses the phenotypical description of the samples. Returns a
data.frame
.
pData<-
: replace the phenotype data.
phenoData
: accesses the phenotypical description of the samples. Returns
an AnnotatedDataFrame
object.
polarity
: returns the polarity of the scans/chromatograms: 1
, 0
or
-1
for positive, negative or unknown polarity.
precursorMz
: return the precursor m/z from the chromatograms. The
method returns a matrix
with 2 columns ("mzmin"
and "mzmax"
) and as
many rows as there are rows in the MChromatograms
object. Each row
contains the precursor m/z of the chromatograms in that row. An error is
thrown if the chromatograms within one row have different precursor m/z
values.
productMz
: return the product m/z from the chromatograms. The method
returns a matrix
with 2 columns ("mzmin"
and "mzmax"
) and as many
rows as there are rows in the MChromatograms
object. Each row contains
the product m/z of the chromatograms in that row. An error is thrown if
the chromatograms within one row have different product m/z values.
rownames<-
: replace the rownames (and featureNames) of the object.
[
subset (similar to a matrix
) by row and column (with parameters i
and j
).
[<-
replace individual or multiple elements. value
has to be either a
single Chromatogram
obhect or a list
of Chromatogram
objects.
c
concatenate (row-wise) MChromatogram
objects with the
same number of samples (columns).
filterIntensity
: filter each Chromatogram()
object within the
MChromatograms
removing data points with intensities below the user
provided threshold. If intensity
is a numeric
value, the returned
chromatogram will only contain data points with intensities > intensity
.
In addition it is possible to provide a function to perform the filtering.
This function is expected to take the input Chromatogram
(object
) and
to return a logical vector with the same length then there are data points
in object
with TRUE
for data points that should be kept and FALSE
for data points that should be removed. See the filterIntensity
documentation in the Chromatogram()
help page for details and examples.
alignRt
: align all chromatograms in an MChromatograms
object against
the chromatogram specified with y
. See documentation on alignRt
in the
Chromatogram()
help page.
bin
: aggregates intensity values of chromatograms in discrete bins
along the retention time axis. By default, individual Chromatogram
objects of one row are binned into the same bins. The function returns a
MChromatograms
object with binned chromatograms.
clean
: removes 0-intensity data points. Either all of them
(with all = TRUE
) or all except those adjacent to non-zero
intensities (all = FALSE
; default). See clean()
documentation for more
details and examples.
compareChromatograms
: calculates pairwise similarity score between
chromatograms in x
and y
and returns a similarity matrix with the
number of rows corresponding to the number of chromatograms in x
and
the number of columns to the number of chromatograms in y
.
If y
is missing, a pairwise comparison
is performed between all chromatograms in x
. See documentation on
compareChromatograms
in the Chromatogram()
help page for details.
normalize
, normalise
: normalises the intensities of a chromatogram by
dividing them either by the maximum intensity (method = "max"
) or total
intensity (method = "sum"
) of the chromatogram.
transformIntensity
: allows to manipulate the intensity values of all
chromatograms using a user provided function. See below for examples.
plot
: plots a MChromatograms
object. For each row in the object one
plot is created, i.e. all Chromatogram()
objects in the same row are
added to the same plot. If nrow(x) > 1
the plot area is split into
nrow(x)
sub-plots and the chromatograms of one row are plotted in
each.
Johannes Rainer
Chromatogram()] for the class representing chromatogram data. [chromatogram()] for the method to extract a
MChromatogramsobject from a
MSnExpor
OnDiskMSnExp object. [readSRMData()
for the function to read chromatographic data
of an SRM/MRM experiment.
## Creating some chromatogram objects to put them into a MChromatograms object ints <- abs(rnorm(25, sd = 200)) ch1 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(32, sd = 90)) ch2 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(19, sd = 120)) ch3 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(21, sd = 40)) ch4 <- Chromatogram(rtime = 1:length(ints), ints) ## Create a MChromatograms object with 2 rows and 2 columns chrs <- MChromatograms(list(ch1, ch2, ch3, ch4), nrow = 2) chrs ## Extract the first element from the second column. Extracting a single ## element always returns a Chromatogram object. chrs[1, 2] ## Extract the second row. Extracting a row or column (i.e. multiple elements ## returns by default a list of Chromatogram objects. chrs[2, ] ## Extract the second row with drop = FALSE, i.e. return a MChromatograms ## object. chrs[2, , drop = FALSE] ## Replace the first element. chrs[1, 1] <- ch3 chrs ## Add a pheno data. pd <- data.frame(name = c("first sample", "second sample"), idx = 1:2) pData(chrs) <- pd ## Column names correspond to the row names of the pheno data chrs ## Access a column within the pheno data chrs$name ## Access the m/z ratio for each row; this will be NA for the present ## object mz(chrs) ## Data visualization ## Create some random Chromatogram objects ints <- abs(rnorm(123, mean = 200, sd = 32)) ch1 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231) ints <- abs(rnorm(122, mean = 250, sd = 43)) ch2 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231) ints <- abs(rnorm(125, mean = 590, sd = 120)) ch3 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542) ints <- abs(rnorm(124, mean = 1200, sd = 509)) ch4 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542) ## Combine into a 2x2 MChromatograms object chrs <- MChromatograms(list(ch1, ch2, ch3, ch4), byrow = TRUE, ncol = 2) ## Plot the second row plot(chrs[2, , drop = FALSE]) ## Plot all chromatograms plot(chrs, col = c("#ff000080", "#00ff0080")) ## log2 transform intensities res <- transformIntensity(chrs, log2) plot(res)
## Creating some chromatogram objects to put them into a MChromatograms object ints <- abs(rnorm(25, sd = 200)) ch1 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(32, sd = 90)) ch2 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(19, sd = 120)) ch3 <- Chromatogram(rtime = 1:length(ints), ints) ints <- abs(rnorm(21, sd = 40)) ch4 <- Chromatogram(rtime = 1:length(ints), ints) ## Create a MChromatograms object with 2 rows and 2 columns chrs <- MChromatograms(list(ch1, ch2, ch3, ch4), nrow = 2) chrs ## Extract the first element from the second column. Extracting a single ## element always returns a Chromatogram object. chrs[1, 2] ## Extract the second row. Extracting a row or column (i.e. multiple elements ## returns by default a list of Chromatogram objects. chrs[2, ] ## Extract the second row with drop = FALSE, i.e. return a MChromatograms ## object. chrs[2, , drop = FALSE] ## Replace the first element. chrs[1, 1] <- ch3 chrs ## Add a pheno data. pd <- data.frame(name = c("first sample", "second sample"), idx = 1:2) pData(chrs) <- pd ## Column names correspond to the row names of the pheno data chrs ## Access a column within the pheno data chrs$name ## Access the m/z ratio for each row; this will be NA for the present ## object mz(chrs) ## Data visualization ## Create some random Chromatogram objects ints <- abs(rnorm(123, mean = 200, sd = 32)) ch1 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231) ints <- abs(rnorm(122, mean = 250, sd = 43)) ch2 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231) ints <- abs(rnorm(125, mean = 590, sd = 120)) ch3 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542) ints <- abs(rnorm(124, mean = 1200, sd = 509)) ch4 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542) ## Combine into a 2x2 MChromatograms object chrs <- MChromatograms(list(ch1, ch2, ch3, ch4), byrow = TRUE, ncol = 2) ## Plot the second row plot(chrs[2, , drop = FALSE]) ## Plot all chromatograms plot(chrs, col = c("#ff000080", "#00ff0080")) ## log2 transform intensities res <- transformIntensity(chrs, log2) plot(res)
Combine peaks from several spectra into a single spectrum. Intensity and
m/z values from the input spectra are aggregated into a single peak if
the difference between their m/z values is smaller than mzd
or smaller than
ppm
of their m/z. While mzd
can be used to group mass peaks with a single
fixed value, ppm
allows a m/z dependent mass peak grouping. Intensity
values of grouped mass peaks are aggregated with the intensityFun
, m/z
values by the mean, or intensity weighted mean if weighted = TRUE
.
meanMzInts( x, ..., intensityFun = base::mean, weighted = FALSE, main = 1L, mzd, ppm = 0, timeDomain = FALSE, unionPeaks = TRUE )
meanMzInts( x, ..., intensityFun = base::mean, weighted = FALSE, main = 1L, mzd, ppm = 0, timeDomain = FALSE, unionPeaks = TRUE )
x |
|
... |
additional parameters that are passed to |
intensityFun |
|
weighted |
|
main |
|
mzd |
|
ppm |
|
timeDomain |
|
unionPeaks |
|
For general merging of spectra, the mzd
and/or ppm
should be manually
specified based on the precision of the MS instrument. Peaks from spectra
with a difference in their m/z being smaller than mzd
or smaller than
ppm
of their m/z are grouped into the same final peak.
Some details for the combination of consecutive spectra of an LCMS run:
The m/z values of the same ion in consecutive scans (spectra) of a LCMS run
will not be identical. Assuming that this random variation is much smaller
than the resolution of the MS instrument (i.e. the difference between
m/z values within each single spectrum), m/z value groups are defined
across the spectra and those containing m/z values of the main
spectrum
are retained. The maximum allowed difference between m/z values for the
same ion is estimated as in estimateMzScattering()
. Alternatively it is
possible to define this maximal m/z difference with the mzd
parameter.
All m/z values with a difference smaller than this value are combined to
a m/z group.
Intensities and m/z values falling within each of these m/z groups are
aggregated using the intensity_fun
and mz_fun
, respectively. It is
highly likely that all QTOF profile data is collected with a timing circuit
that collects data points with regular intervals of time that are then later
converted into m/z values based on the relationship t = k * sqrt(m/z)
. The
m/z scale is thus non-linear and the m/z scattering (which is in fact caused
by small variations in the time circuit) will thus be different in the lower
and upper m/z scale. m/z-intensity pairs from consecutive scans to be
combined are therefore defined by default on the square root of the m/z
values. With timeDomain = FALSE
, the actual m/z values will be used.
Spectrum
with m/z and intensity values representing the aggregated values
across the provided spectra. The returned spectrum contains the union of
all peaks from all spectra (if unionPeaks = TRUE
), or the same number of
m/z and intensity pairs than the spectrum with index main
in x
(if
unionPeaks = FALSE
. All other spectrum data (such as retention time etc)
is taken from the main spectrum.
This allows e.g. to combine profile-mode spectra of consecutive scans into
the values for the main spectrum. This can improve centroiding of
profile-mode data by increasing the signal-to-noise ratio and is used in the
combineSpectraMovingWindow()
function.
Johannes Rainer, Sigurdur Smarason
estimateMzScattering()
for a function to estimate m/z scattering
in consecutive scans.
estimateMzResolution()
for a function estimating the m/z resolution of
a spectrum.
combineSpectraMovingWindow()
for the function to combine consecutive
spectra of an MSnExp
object using a moving window approach.
Other spectra combination functions:
consensusSpectrum()
library(MSnbase) ## Create 3 example profile-mode spectra with a resolution of 0.1 and small ## random variations on these m/z values on consecutive scans. set.seed(123) mzs <- seq(1, 20, 0.1) ints1 <- abs(rnorm(length(mzs), 10)) ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak ints2 <- abs(rnorm(length(mzs), 10)) ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23) ints3 <- abs(rnorm(length(mzs), 10)) ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20) ## Create the spectra. sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints1) sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints2) sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009), intensity = ints3) ## Combine the spectra sp_agg <- meanMzInts(list(sp1, sp2, sp3)) ## Plot the spectra before and after combining par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(sp_agg), intensity(sp_agg), xlim = range(mzs[5:25]), type = "h", col = "black")
library(MSnbase) ## Create 3 example profile-mode spectra with a resolution of 0.1 and small ## random variations on these m/z values on consecutive scans. set.seed(123) mzs <- seq(1, 20, 0.1) ints1 <- abs(rnorm(length(mzs), 10)) ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak ints2 <- abs(rnorm(length(mzs), 10)) ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23) ints3 <- abs(rnorm(length(mzs), 10)) ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20) ## Create the spectra. sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints1) sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01), intensity = ints2) sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009), intensity = ints3) ## Combine the spectra sp_agg <- meanMzInts(list(sp1, sp2, sp3)) ## Plot the spectra before and after combining par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1)) plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red") points(mz(sp2), intensity(sp2), type = "h", col = "green") points(mz(sp3), intensity(sp3), type = "h", col = "blue") plot(mz(sp_agg), intensity(sp_agg), xlim = range(mzs[5:25]), type = "h", col = "black")
The Minimum Information About a Proteomics Experiment. The current implementation is based on the MIAPE-MS 2.4 document.
title
:Object of class character
containing a
single-sentence experiment title.
abstract
:Object of class character
containing an abstract describing the experiment.
url
:Object of class character
containing a
URL for the experiment.
pubMedIds
:Object of class character
listing
strings of PubMed identifiers of papers relevant to the dataset.
samples
:Object of class list
containing
information about the samples.
preprocessing
:Object of class list
containing
information about the pre-processing steps used on the raw data from
this experiment.
other
:Object of class list
containing other
information for which none of the above slots applies.
dateStamp
:Object of class character
, giving
the date on which the work described was initiated; given in the
standard 'YYYY-MM-DD' format (with hyphens).
name
:Object of class character
containing the
name of the (stable) primary contact person for this data set;
this could be the experimenter, lab head, line manager, ...
lab
:Object of class character
containing the
laboratory where the experiment was conducted.
contact
:Object of class character
containing
contact information for lab and/or experimenter.
email
:Object of class character
containing
tmail contact information for the primary contact person (see
name
above).
instrumentModel
:Object of class character
indicating the model of the mass spectrometer used to generate the
data.
instrumentManufacturer
:Object of class
character
indicating the manufacturing company of the mass
spectrometer.
instrumentCustomisations
:Object of class
character
describing any significant (i.e. affecting
behaviour) deviations from the manufacturer's specification for the
mass spectrometer.
softwareName
:Object of class character
with
the instrument management and data analysis package(s) name(s).
softwareVersion
:Object of class character
with
the instrument management and data analysis package(s) version(s).
switchingCriteria
:Object of class character
describing the list of conditions that cause the switch from survey or
zoom mode (MS1) to or tandem mode (MSn where n > 1); e.g. 'parent
ion” mass lists, neutral loss criteria and so on [applied for
tandem MS only].
isolationWidth
:Object of class numeric
describing, for tandem instruments, the total width (i.e. not half
for plus-or-minus) of the gate applied around a selected precursor ion
m/z, provided for all levels or by MS level.
parameterFile
:Object of class character
giving
the location and name under which the mass spectrometer's parameter
settings file for the run is stored, if available. Ideally this should
be a URI+filename, or most preferably an LSID, where feasible.
ionSource
:Object of class character
describing
the ion source (ESI, MALDI, ...).
ionSourceDetails
:Object of class character
describing
the relevant details about the ion source. See MIAPE-MI docuement
for more details.
analyser
:Object of class character
describing
the analyzer type (Quadrupole, time-of-flight, ion trap, ...).
analyserDetails
:Object of class character
describing
the relevant details about the analyzer. See MIAPE-MI document
for more details.
collisionGas
:Object of class character
describing the composition of the gas used to fragment ions in the
collision cell.
collisionPressure
:Object of class numeric
providing the pressure (in bars) of the collision gas.
collisionEnergy
:Object of class character
specifying for the process of imparting a particular impetus to
ions with a given m/z value, as they travel into the collision
cell for fragmentation. This could be a global figure (e.g. for tandem
TOF's), or a complex function; for example a gradient (stepped or
continuous) of m/z values (for quads) or activation frequencies (for
traps) with associated collision energies (given in eV). Note that
collision energies are also provided for individual
"Spectrum2"
instances, and is the preferred
way of accessing this data.
detectorType
:Object of class character
describing the type of detector used in the machine (microchannel
plate, channeltron, ...).
detectorSensitivity
:Object of class character
giving and appropriate measure of the sensitivity of the described
detector (e.g. applied voltage).
The following methods as in "MIAME"
:
abstract(MIAPE)
:An accessor function for
abstract
.
expinfo(MIAPE)
:An accessor function for name
,
lab
, contact
, title
, and url
.
notes(MIAPE), notes(MIAPE) <- value
:Accessor
functions for other
. notes(MIAME) <- character
appends character to notes; use notes(MIAPE) <- list
to replace the notes entirely.
otherInfo(MIAPE)
:An accessor function for
other
.
preproc(MIAPE)
:An accessor function for
preprocessing
.
pubMedIds(MIAPE), pubMedIds(MIAME) <- value
:Accessor
function for pubMedIds
.
expemail(MIAPE)
:An accessor function for
email
slot.
exptitle(MIAPE)
:An accessor function for
title
slot.
analyzer(MIAPE)
: An accessor function for
analyser
slot. analyser(MIAPE)
is also available.
analyzerDetails(MIAPE)
: An accessor function for
analyserDetails
slot. analyserDetails
is also
available.
detectorType(MIAPE)
: An accessor function for
detectorType
slot.
ionSource(MIAPE)
: An accessor function for
ionSource
slot.
ionSourceDetails(MIAPE)
: An accessor function for
ionSourceDetails
slot.
instrumentModel(MIAPE)
: An accessor function for
instrumentModel
slot.
instrumentManufacturer(MIAPE)
: An accessor function for
instrumentManufacturer
slot.
instrumentCustomisations(MIAPE)
: An accessor function for
instrumentCustomisations
slot.
as(,"MIAME")
:Coerce the object from MIAPE
to
MIAME
class. Used when converting an MSnSet
into an
ExpressionSet
.
MIAPE-specific methods, including MIAPE-MS meta-data:
show(MIAPE)
:Displays the experiment data.
msInfo(MIAPE)
:Displays 'MIAPE-MS' information.
Class "MIAxE"
, directly.
Class "Versioned"
, by class "MIAxE", distance 2.
Laurent Gatto
About MIAPE: http://www.psidev.info/index.php?q=node/91, and references therein, especially 'Guidelines for reporting the use of mass spectrometry in proteomics', Nature Biotechnology 26, 860-861 (2008).
There is a need for adequate handling of missing value impuation in
quantitative proteomics. Before developing a framework to handle
missing data imputation optimally, we propose a set of visualisation
tools. This document serves as an internal notebook for current
progress and ideas that will eventually materialise in exported
functionality in the MSnbase
package.
The explore the structure of missing values, we propose to
1. Explore missing values in the frame of the experimental design. The
imageNA2
function offers such a simple visualisation. It
is currently limited to 2-group designs/comparisons. In case of time
course experiments or sub-cellular fractionation along a density
gradient, we propose to split the time/gradient into 2 groups
(early/late, top/bottom) as a first approximation.
2. Explore the proportion of missing values in each group.
3. Explore the total and group-wise feature intensity distributions.
The existing plotNA
function illustrates the
completeness/missingness of the data.
Laurent Gatto Samuel Wieczorek and Thomas Burger
## Other suggestions library("pRolocdata") library("pRoloc") data(dunkley2006) set.seed(1) nax <- makeNaData(dunkley2006, pNA = 0.10) pcol <- factor(ifelse(dunkley2006$fraction <= 5, "A", "B")) sel1 <- pcol == "A" ## missing values in each sample barplot(colSums(is.na(nax)), col = pcol) ## table of missing values in proteins par(mfrow = c(3, 1)) barplot(table(rowSums(is.na(nax))), main = "All") barplot(table(rowSums(is.na(nax)[sel1,])), main = "Group A") barplot(table(rowSums(is.na(nax)[!sel1,])), main = "Group B") fData(nax)$nNA1 <- rowSums(is.na(nax)[, sel1]) fData(nax)$nNA2 <- rowSums(is.na(nax)[, !sel1]) fData(nax)$nNA <- rowSums(is.na(nax)) o <- MSnbase:::imageNA2(nax, pcol) plot((fData(nax)$nNA1 - fData(nax)$nNA2)[o], type = "l") grid() plot(sort(fData(nax)$nNA1 - fData(nax)$nNA2), type = "l") grid() o2 <- order(fData(nax)$nNA1 - fData(nax)$nNA2) MSnbase:::imageNA2(nax, pcol, Rowv=o2) layout(matrix(c(rep(1, 10), rep(2, 5)), nc = 3)) MSnbase:::imageNA2(nax, pcol, Rowv=o2) plot((fData(nax)$nNA1 - fData(nax)$nNA)[o2], type = "l", col = "red", ylim = c(-9, 9), ylab = "") lines((fData(nax)$nNA - fData(nax)$nNA2)[o2], col = "steelblue") lines((fData(nax)$nNA1 - fData(nax)$nNA2)[o2], type = "l", lwd = 2)
## Other suggestions library("pRolocdata") library("pRoloc") data(dunkley2006) set.seed(1) nax <- makeNaData(dunkley2006, pNA = 0.10) pcol <- factor(ifelse(dunkley2006$fraction <= 5, "A", "B")) sel1 <- pcol == "A" ## missing values in each sample barplot(colSums(is.na(nax)), col = pcol) ## table of missing values in proteins par(mfrow = c(3, 1)) barplot(table(rowSums(is.na(nax))), main = "All") barplot(table(rowSums(is.na(nax)[sel1,])), main = "Group A") barplot(table(rowSums(is.na(nax)[!sel1,])), main = "Group B") fData(nax)$nNA1 <- rowSums(is.na(nax)[, sel1]) fData(nax)$nNA2 <- rowSums(is.na(nax)[, !sel1]) fData(nax)$nNA <- rowSums(is.na(nax)) o <- MSnbase:::imageNA2(nax, pcol) plot((fData(nax)$nNA1 - fData(nax)$nNA2)[o], type = "l") grid() plot(sort(fData(nax)$nNA1 - fData(nax)$nNA2), type = "l") grid() o2 <- order(fData(nax)$nNA1 - fData(nax)$nNA2) MSnbase:::imageNA2(nax, pcol, Rowv=o2) layout(matrix(c(rep(1, 10), rep(2, 5)), nc = 3)) MSnbase:::imageNA2(nax, pcol, Rowv=o2) plot((fData(nax)$nNA1 - fData(nax)$nNA)[o2], type = "l", col = "red", ylim = c(-9, 9), ylab = "") lines((fData(nax)$nNA - fData(nax)$nNA2)[o2], col = "steelblue") lines((fData(nax)$nNA1 - fData(nax)$nNA2)[o2], type = "l", lwd = 2)
MSmap
A class to store mass spectrometry data maps, i.e intensities collected along the M/Z and retention time space during a mass spectrometry acquisition.
Objects can be created with the MSmap
constructor. The
constructor has the following arguments:
An object created by mzR::openMSfile
or an
instance of class OnDiskMSnExp
. If the latter
contains data from multiple files, a warning will be issued and
the first one will be used.
A numeric
of length 1 defining the lower bound
of the M/Z range of the MS map.
A numeric
of length 1 defining the upper bound
of the M/Z range of the MS map.
The resolution along the M/Z range.
An optional data.frame
as produced by
mzR::header(object)
. If missing, will be computer within
the function. Ignored when object
is an
OnDiskMSnExp
.
Set 0 intensities to NA
. This can be used
to clarify the 3 dimensional plot produce by plot3D
.
call
:Object of class "call"
- the call used to
generate the instance.
map
:Object of class "matrix"
containing the
actual MS map.
mz
:Object of class "numeric"
with the M/Z
sampling bins.
res
:Object of class "numeric"
storing the the M/Z
resolution used to create the map.
rt
:Object of class "numeric"
with the
retention times of the map spectra.
ms
:Object of class "numeric"
with the MS
levels of the spectra.
t
:Object of class "logical"
indicating if the
instance has been transposed.
filename
:Object of class "character"
specifying the filename of the original raw MS data.
signature(from = "MSmap", to = "data.frame")
:
convert the MSmap
instance in a data.frame
. Useful
for plotting with lattice
or ggplot2
.
signature(object = "MSmap")
: returns the raw
data filename.
signature(object = "MSmap")
: returns the MS
level of the map spectra.
signature(object = "MSmap")
: returns the actual
map matrix
.
signature(object = "MSmap", ...)
: returns the M/Z values
of the map. Additional arguments are currently ignored.
signature(object = "MSmap", ...)
: returns retention
time values of the map. Additional arguments are currently ignored.
signature(object = "MSmap")
: returns the
resolution with which the sample along the M/Z range was done.
signature(x = "MSmap")
: returns the dimensions of
the map. ncol
and nrow
return the number of columns
and rows respectively.
signature(x = "MSmap")
: transposes the map.
signature(object = "MSmap")
: prints a summary of
the map.
signature(x = "MSmap", allTicks = "logical")
:
produces an image of the map using lattice::levelplot
. By
default, allTicks
is TRUE
and all M/Z and retention
times ticks of drawn. If set to FALSE
, only 10 ticks in
each dimension are plotted.
signature(object = "MSmap", rgl = "logical")
:
produces an three dimensional view of the map using
lattice::cloude(..., type = "h")
. If rgl
is
TRUE
, the map is visualised on a rgl
device and can
be rotated with the mouse.
Laurent Gatto
## Not run: ## downloads the data library("rpx") px1 <- PXDataset("PXD000001") (i <- grep("TMT.+mzML", pxfiles(px1), value = TRUE)) mzf <- pxget(px1, i) ## Using an mzRpwiz object ## reads the data ms <- openMSfile(mzf) hd <- header(ms) ## a set of spectra of interest: MS1 spectra eluted ## between 30 and 35 minutes retention time ms1 <- which(hd$msLevel == 1) rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35 ## the map M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd) plot(M, aspect = 1, allTicks = FALSE) plot3D(M) if (require("rgl") & interactive()) plot3D(M, rgl = TRUE) ## With some MS2 spectra i <- ms1[which(rtsel)][1] j <- ms1[which(rtsel)][2] M2 <- MSmap(ms, i:j, 100, 1000, 1, hd) plot3D(M2) ## Using an OnDiskMSnExp object and accessors msn <- readMSData(mzf, mode = "onDisk") ## a set of spectra of interest: MS1 spectra eluted ## between 30 and 35 minutes retention time ms1 <- which(msLevel(msn) == 1) rtsel <- rtime(msn)[ms1] / 60 > 30 & rtime(msn)[ms1] / 60 < 35 ## the map M3 <- MSmap(msn, ms1[rtsel], 521, 523, .005) plot(M3, aspect = 1, allTicks = FALSE) ## With some MS2 spectra i <- ms1[which(rtsel)][1] j <- ms1[which(rtsel)][2] M4 <- MSmap(msn, i:j, 100, 1000, 1) plot3D(M4) ## End(Not run)
## Not run: ## downloads the data library("rpx") px1 <- PXDataset("PXD000001") (i <- grep("TMT.+mzML", pxfiles(px1), value = TRUE)) mzf <- pxget(px1, i) ## Using an mzRpwiz object ## reads the data ms <- openMSfile(mzf) hd <- header(ms) ## a set of spectra of interest: MS1 spectra eluted ## between 30 and 35 minutes retention time ms1 <- which(hd$msLevel == 1) rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35 ## the map M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd) plot(M, aspect = 1, allTicks = FALSE) plot3D(M) if (require("rgl") & interactive()) plot3D(M, rgl = TRUE) ## With some MS2 spectra i <- ms1[which(rtsel)][1] j <- ms1[which(rtsel)][2] M2 <- MSmap(ms, i:j, 100, 1000, 1, hd) plot3D(M2) ## Using an OnDiskMSnExp object and accessors msn <- readMSData(mzf, mode = "onDisk") ## a set of spectra of interest: MS1 spectra eluted ## between 30 and 35 minutes retention time ms1 <- which(msLevel(msn) == 1) rtsel <- rtime(msn)[ms1] / 60 > 30 & rtime(msn)[ms1] / 60 < 35 ## the map M3 <- MSmap(msn, ms1[rtsel], 521, 523, .005) plot(M3, aspect = 1, allTicks = FALSE) ## With some MS2 spectra i <- ms1[which(rtsel)][1] j <- ms1[which(rtsel)][2] M4 <- MSmap(msn, i:j, 100, 1000, 1) plot3D(M4) ## End(Not run)
MSnbase defined a few options globally using the standard R
options mechanism. The current values of these options can be
queried with MSnbaseOptions
. The options are:
verbose
: defines a session-wide verbosity flag, that
is used if the verbose
argument in individual functions is
not set.
PARALLEL_THRESH
: defines the minimum number of spectra per file
necessary before using parallel processing.
fastLoad
: logical(1)
. If TRUE
performs faster data loading for all
methods of OnDiskMSnExp that load data from the original files (such as
spectrapply()
). Users experiencing data I/O errors (observed mostly
on macOS systems) should set this option to FALSE
.
MSnbaseOptions() isMSnbaseVerbose() setMSnbaseVerbose(opt) setMSnbaseParallelThresh(opt = 1000) setMSnbaseFastLoad(opt = TRUE) isMSnbaseFastLoad()
MSnbaseOptions() isMSnbaseVerbose() setMSnbaseVerbose(opt) setMSnbaseParallelThresh(opt = 1000) setMSnbaseFastLoad(opt = TRUE) isMSnbaseFastLoad()
opt |
The value of the new option |
isMSnbaseVerbose
is one wrapper for the verbosity flag,
also available through options("MSnbase")$verbose
.
There are also setters to set options individually. When run without argument, the verbosity setter inverts the current value of the option.
A list
of MSnbase options and the single option
values for the individual accessors.
The MSnExp
class encapsulates data and meta-data for mass
spectrometry experiments, as described in the slots
section. Several data files (currently in mzXML
) can be loaded
together with the function readMSData
.
This class extends the virtual "pSet"
class.
In version 1.19.12, the polarity
slot had been added to the
"Spectrum"
class (previously in
"Spectrum1"
). Hence, "MSnExp"
objects
created prior to this change will not be valid anymore, since all MS2
spectra will be missing the polarity
slot. Object can be
appropriately updated using the updateObject
method.
The feature variables in the feature data slot will depend on the
file. See also the documentation in the mzR
package that parses
the raw data files and produces these data.
Objects can be created by calls of the form
new("MSnExp",...)
. However, it is preferred to use the
readMSData
function that will read raw mass
spectrometry data to generate a valid "MSnExp"
instance.
assayData
:Object of class "environment"
containing the MS spectra (see "Spectrum1"
and "Spectrum2"
).
Slot is inherited from "pSet"
.
phenoData
:Object of class
"AnnotatedDataFrame"
containing
experimenter-supplied variables describing sample (i.e the
individual tags for an labelled MS experiment)
See phenoData
for more details.
Slot is inherited from "pSet"
.
featureData
:Object of class
"AnnotatedDataFrame"
containing variables
describing features (spectra in our case), e.g. identificaiton data,
peptide sequence, identification score,... (inherited from
"eSet"
). See featureData
for
more details.
Slot is inherited from "pSet"
.
experimentData
:Object of class
"MIAPE"
, containing details of experimental
methods. See experimentData
for more details.
Slot is inherited from "pSet"
.
protocolData
:Object of class
"AnnotatedDataFrame"
containing
equipment-generated variables (inherited from
"eSet"
). See protocolData
for
more details.
Slot is inherited from "pSet"
.
processingData
:Object of class
"MSnProcess"
that records all processing.
Slot is inherited from "pSet"
.
.__classVersion__
:Object of class
"Versions"
describing the versions of R,
the Biobase package, "pSet"
and
MSnExp
of the current instance.
Slot is inherited from "pSet"
.
Intended for developer use and debugging (inherited from
"eSet"
).
Class "pSet"
, directly.
Class "VersionedBiobase"
, by class "pSet", distance 2.
Class "Versioned"
, by class "pSet", distance 3.
See the "pSet"
class for documentation on
accessors inherited from pSet
, subsetting and general attribute
accession.
signature(object = "MSnExp")
: Bins spectra.
See bin
documentation for more details and examples.
signature(object = "MSnExp")
: Removes unused 0
intensity data points. See clean
documentation
for more details and examples.
signature(x = "Spectrum",
y = "missing")
: Compares spectra. See
compareSpectra
documentation for more details and
examples.
signature(object = "MSnExp", prec =
"numeric")
: extracts spectra with precursor MZ value equal to
prec
and returns an object of class 'MSnExp'.
See extractPrecSpectra
documentation for more
details and examples.
signature(object = "MSnExp")
: Performs the peak
picking to generate centroided spectra. Parameter msLevel.
allows to restrict peak picking to spectra of certain MS level(s).
See pickPeaks
documentation for more
details and examples.
signature(object = "MSnExp")
: Estimates
the noise in all profile spectra of object
. See
estimateNoise
documentation for more details and
examples.
signature(x = "MSnExp", y = "missing")
: Plots
the MSnExp
instance. See plot.MSnExp
documentation for more details.
signature(object = "MSnExp", ...)
:
Plots retention time against precursor MZ for MSnExp
instances. See plot2d
documentation for more
details.
signature(object = "MSnExp", ...)
:
Plots the density of parameters of interest.
instances. See plotDensity
documentation for more
details.
signature(object = "MSnExp", ...)
:
Plots a histogram of the m/z difference betwee all of the highest
peaks of all MS2 spectra of an experiment.
See plotMzDelta
documentation for more details.
signature(object = "MSnExp")
: Performs
quantification for all the MS2 spectra of the MSnExp
instance. See quantify
documentation for more
details. Also for OnDiskMSnExp
objects.
signature(object = "MSnExp")
: Removes
peaks lower that a threshold t
. See
removePeaks
documentation for more details and
examples.
signature(object = "MSnExp", ...)
:
Removes reporter ion peaks from all MS2 spectra of an
experiment. See removeReporters
documentation for
more details and examples.
signature(x = "MSnExp")
: Smooths spectra.
See smooth
documentation for more details and examples.
signature(object = "MSnExp", ...)
:
Adds identification data to an experiment.
See addIdentificationData
documentation for
more details and examples.
signature(object = "MSnExp", fcol =
"pepseq", keep = NULL)
: Removes non-identified features. See
removeNoId
documentation for more details and
examples.
signature(object = "MSnExp",
fcol = "nprot")
: Removes protein groups (or feature belong to
protein groups) with more than one member. The latter is defined
by extracting a feature variable (default is
"nprot"
). Also removes non-identified features.
signature(object = "MSnExp", ...)
:
Prints a summary that lists the percentage of identified features per
file (called coverage
).
signature(object = "MSnExp")
: Displays object
content as text.
signature(object = "MSnExp", ...)
:
Returns the isolation window offsets for the MS2 spectra. See
isolationWindow
in the mzR
package for details.
signature(object = "MSnExp")
: Trims the MZ
range of all the spectra of the MSnExp
instance. See
trimMz
documentation for more details and
examples.
isCentroided(object, k = 0.025, qtl = 0.9, verbose =
TRUE)
A heuristic assessing if the spectra in the
object
are in profile or centroided mode. The function
takes the qtl
th quantile top peaks, then calculates the
difference between adjacent M/Z value and returns TRUE
if
the first quartile is greater than k
. (See
MSnbase:::.isCentroided
for the code.) If verbose
(default), a table indicating mode for all MS levels is printed.
The function has been tuned to work for MS1 and MS2 spectra and data centroided using different peak picking algorithms, but false positives can occur. See https://github.com/lgatto/MSnbase/issues/131 for details. For whole experiments, where all MS1 and MS2 spectra are expected to be in the same, albeit possibly different modes, it is advised to assign the majority result for MS1 and MS2 spectra, rather than results for individual spectra. See an example below.
signature(object = "MSnExp", "data.frame")
:
Coerces the MSnExp
object to a four-column
data.frame
with columns "file"
(file index in
object
), "rt"
(retention time), "mz"
(m/z
values) and "i"
(intensity values).
signature(object = "MSnExp", "MSpectra")
:
Coerces the MSnExp
object to a MSpectra
object with all feature annotations added as metadata columns
(mcols
).
Clarifications regarding scan/acquisition numbers and indices:
A spectrumId
(or spectrumID
) is a vendor specific
field in the mzML file that contains some information about the
run/spectrum, e.g.: controllerType=0 controllerNumber=1
scan=5281 file=2
.
acquisitionNum
is a more a less sanitize spectrum id
generated from the spectrumId
field by mzR
(see
https://github.com/sneumann/mzR/blob/master/src/pwiz/data/msdata/MSData.cpp#L552-L580).
scanIndex
is the mzR
generated sequence number of the
spectrum in the raw file (which doesn't have to be the same as the
acquisitionNum
).
See also this issue: https://github.com/lgatto/MSnbase/issues/525.
Filtering and subsetting functions:
signature(object = "MSnExp", rt = "numeric",
msLevel. = "numeric")
: Retains MS spectra of level
msLevel.
with a retention times within rt[1]
and
rt[2]
.
signature(object = "MSnExp", msLevel. =
"numeric")
: Retains MS spectra of level msLevel.
.
signature(object = "MSnExp", polarity. =
"numeric")
: Retains MS spectra of polarity polarity.
.
signature(object = "MSnExp", mz = "numeric",
msLevel. = "numeric")
. See filterMz
for
details.
signature(object = "MSnExp", file)
: Retains
MS data of files matching the file index or file name provided
with parameter file
.
signature(object = "MSnExp")
:
Remove empty spectra from object
(see isEmpty
).
signature(object = "MSnExp",
acquisitionNum = "numeric")
: Retain parent (e.g. MS1) and
children scans (e.g. MS2) of acquisitionNum
. See
OnDiskMSnExp
for an example.
signature(object = "MSnExp", f =
"factor")
: split a MSnExp
object by file into a
list
of MSnExp
objects given the grouping in
factor
f
.
signature(object = "MSnExp", mz, ppm
= 10)
: retain spectra with a precursor m/z equal or similar to
the one defined with parameter mz
. Parameter ppm
allows to define an accepted difference between the provided m/z
and the spectrum's m/z.
signature(object = "MSnExp",
mz)
: retain spectra with isolation windows that contain
(which m/z range contain) the specified m/z.
Laurent Gatto
Information about the mzXML format as well converters from vendor specific formats to mzXML: http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML.
"pSet"
and readMSData
for loading
mzXML
, mzData
or mzML
files to generate an
instance of MSnExp
.
The "OnDiskMSnExp"
manual page contains further
details and examples.
chromatogram
to extract chromatographic data from a
MSnExp
or OnDiskMSnExp
object.
write
for the function to write the data to mzML or
mzXML file(s).
mzxmlfile <- dir(system.file("extdata",package="MSnbase"), pattern="mzXML",full.names=TRUE) msnexp <- readMSData(mzxmlfile) msnexp
mzxmlfile <- dir(system.file("extdata",package="MSnbase"), pattern="mzXML",full.names=TRUE) msnexp <- readMSData(mzxmlfile) msnexp
MSnProcess
is a container for MSnExp and MSnSet processing
information. It records data files, processing steps, thresholds,
analysis methods and times that have been applied to MSnExp or MSnSet
instances.
files
:Object of class "character"
storing the
raw data files used in experiment described by the
"MSnProcess"
instance.
processing
:Object of class "character"
storing
all the processing steps and times.
merged
:Object of class "logical"
indicating
whether spectra have been merged.
cleaned
:Object of class "logical"
indicating
whether spectra have been cleaned. See clean
for
more details and examples.
removedPeaks
:Object of class "character"
describing whether peaks have been removed and which threshold was
used. See removePeaks
for more details and examples.
smoothed
:Object of class "logical"
indicating
whether spectra have been smoothed.
trimmed
:Object of class "numeric"
documenting
if/how the data has been trimmed.
normalised
:Object of class "logical"
describing whether and how data have been normalised.
MSnbaseVersion
:Object of class "character"
indicating the version of MSnbase.
.__classVersion__
:Object of class "Versions"
indicating the version of the MSnProcess
instance. Intended for
developer use and debugging.
Class "Versioned"
, directly.
signature(object = "MSnProcess")
: Returns the
file names used in experiment described by the "MSnProcess"
instance.
signature(object = "MSnProcess")
: Displays object
content as text.
signature(x = "MSnProcess", y = "MSnProcess")
:
Combines multiple MSnProcess
instances.
This class is likely to be updated using an AnnotatedDataFrame
.
Laurent Gatto
See the "MSnExp"
and "MSnSet"
classes that actually use MSnProcess
as a slot.
showClass("MSnProcess")
showClass("MSnProcess")
The MSnSet
holds quantified expression data for MS proteomics
data and the experimental meta-data.
The MSnSet
class is derived from the
"eSet"
class and mimics the
"ExpressionSet"
class classically used for
microarray data.
The constructor MSnSet(exprs, fData, pData)
can be used to
create MSnSet
instances. Argument exprs
is a
matrix
and fData
and pData
must be of class
data.frame
or "AnnotatedDataFrame"
and all
must meet the dimensions and name validity constrains.
Objects can also be created by calls of the form new("MSnSet",
exprs, ...)
. See also "ExpressionSet"
for
helpful information. Expression data produced from other softwares
can thus make use of this standardized data container to benefit
R
and Bioconductor
packages. Proteomics expression data
available as spreadsheets, as produced by third-party software such as
Proteome Discoverer, MaxQuant, ... can be imported using the
readMSnSet
and readMSnSet2
functions.
Coercion methods are also available to transform MSnSet
objects
to IBSpectra
, to data.frame
and to/from
ExpressionSet
and SummarizedExperiment
objects. In the
latter case, the metadata available in the protocolData
,
experimentData
are completely dropped, and only the logging
information of the processingData
slot is retained. All these
metadata can be subsequently be added using the
addMSnSetMetadata
(see examples below). When converting a
SummarizedExperiment
to an MSnSet
, the respective
metadata slots will be populated if available in the
SummarizedExperiment
metadata.
In the frame of the MSnbase
package, MSnSet
instances
can be generated from "MSnExp"
experiments using
the quantify
method).
qual
:Object of class "data.frame"
that records
peaks data for each of the reporter ions to be used as quality
metrics.
processingData
:Object of class
"MSnProcess"
that records all processing.
assayData
:Object of class "assayData"
containing a matrix with equal with column number equal to
nrow(phenoData)
. assayData
must contain a matrix
exprs
with rows represening features (e.g., reporters ions)
and columns representing samples. See the "AssayData"
class, exprs
and assayData
accessor
for more details. This slot in indirectly inherited from
"eSet"
.
phenoData
:Object of class "AnnotatedDataFrame"
containing experimenter-supplied variables describing sample (i.e
the individual tags for an labelled MS experiment) (indireclty
inherited from "eSet"
). See
phenoData
and the "eSet"
class
for more details. This slot can be accessed as a data.frame
with pData
and be replaced by a new valid (i.e. of
compatible dimensions and row names) data.frame
with
pData()<-
.
featureData
:Object of class
"AnnotatedDataFrame"
containing variables describing
features (spectra in our case), e.g. identificaiton data, peptide
sequence, identification score,... (inherited indirectly from
"eSet"
). See featureData
and
the "eSet"
class for more details. This slot
can be accessed as a data.frame
with fData
and be
replaced by a new valid (i.e. of compatible dimensions and row
names) data.frame
with fData()<-
.
experimentData
:Object of class
"MIAPE"
, containing details of experimental
methods (inherited from "eSet"
). See
experimentData
and the "eSet"
class for more details.
annotation
:not used here.
protocolData
:Object of class
"AnnotatedDataFrame"
containing
equipment-generated variables (inherited indirectly from
"eSet"
). See
protocolData
and the "eSet"
class for more details.
.__classVersion__
:Object of class
"Versions"
describing the versions of R,
the Biobase package, "eSet"
,
"pSet"
and MSnSet
of the
current instance. Intended for developer use and debugging (inherited
indirectly from "eSet"
).
Class "eSet"
, directly.
Class "VersionedBiobase"
, by class "eSet", distance 2.
Class "Versioned"
, by class "eSet", distance 3.
MSnSet specific methods or over-riding it's super-class are described
below. See also more "eSet"
for
inherited methods.
acquisitionNum(signature(object = "MSnSet"))
: Returns the
a numeric vector with acquisition number of each spectrum. The vector
names are the corresponding spectrum names.
The information is extracted from the object's featureData
slot.
fromFile(signature(object = "MSnSet"))
: get the index of
the file (in fileNames(object)
) from which the raw
spectra from which the corresponding feature were originally
read. The relevant information is extracted from the object's
featureData
slot.
Returns a numeric vector with names corresponding to the spectrum names.
signature(x = "MSnSet")
: Returns the dimensions of
object's assay data, i.e the number of samples and the number of
features.
signature(object = "MSnSet")
: Access file
names in the processingData
slot.
signature(object = "MSnSet")
: Prints the MIAPE-MS
meta-data stored in the experimentData
slot.
signature(object = "MSnSet")
: Access the
processingData
slot.
signature(object = "MSnSet")
: Displays object
content as text.
signature(object = "MSnSet")
: Access the reporter
ion peaks description.
signature(object = "MSnSet", impurities =
"matrix")
: performs reporter ions purity correction. See
purityCorrect
documentation for more details.
signature(object = "MSnSet")
: Performs
MSnSet
normalisation. See normalise
for more
details.
signature(x = "MSnSet")
: Returns a transposed
MSnSet
object where features are now aligned along columns
and samples along rows and the phenoData
and
featureData
slots have been swapped. The
protocolData
slot is always dropped.
signature(x = "MSnSet")
: Coerce
object from MSnSet
to
ExpressionSet-class
. The experimentData
slot
is converted to a MIAME
instance. It is also possible to
coerce an ExpressionSet
to and MSnSet
, in which case
the experimentData
slot is newly initialised.
signature(x = "MSnSet")
: Coerce
object from MSnSet
to IBSpectra
from the
isobar
package.
signature(x = "MSnSet")
: Coerce
object from MSnSet
to data.frame
. The MSnSet
is transposed and the PhenoData
slot is appended.
signature(x = "MSnSet")
:
Coerce object from MSnSet
to
SummarizedExperiment
. Only part of the metadata is
retained. See addMSnSetMetadata
and the example below for
details.
signature(x = "MSnSet")
: Writes expression values
to a tab-separated file (default is tmp.txt
). The
fDataCols
parameter can be used to specify which
featureData
columns (as column names, column number or
logical
) to append on the right of the expression matrix.
The following arguments are the same as write.table
.
signature(x = "MSnSet", y = "MSnSet", ...)
: Combines
2 or more MSnSet
instances according to their feature names.
Note that the qual
slot and the processing information are
silently dropped.
signature(object = "MSnSet", groupBy, n = 3, fun,
..., verbose = isMSnbaseVerbose())
:
Selects the n
most intense features (typically peptides or
spectra) out of all available for each set defined by
groupBy
(typically proteins) and creates a new instance of
class MSnSet
. If less than n
features are available,
all are selected. The ncol(object)
features are summerised
using fun
(default is sum
) prior to be ordered in
decreasing order. Additional parameters can be passed to
fun
through ...
, for instance to control the
behaviour of topN
in case of NA
values.
(Works also with matrix
instances.)
See also the nQuants
function to retrieve the
actual number of retained peptides out of n
.
A complete use case using topN
and nQuants
is
detailed in the synapter
package vignette.
signature(object = "MSnSet", pNA = "numeric",
pattern = "character", droplevels = "logical")
: This method
subsets object
by removing features that have (strictly)
more than pNA
percent of NA values. Default pNA
is
0, which removes any feature that exhibits missing data.
The method can also be used with a character pattern composed of
0
or 1
characters only. A 0
represent a
column/sample that is allowed a missing values, while
columns/samples with and 1
must not have NA
s.
This method also accepts matrix
instances. droplevels
defines whether unused levels in the
feature meta-data ought to be lost. Default is TRUE
.
See the droplevels
method below.
See also the is.na.MSnSet
and plotNA
methods for missing data exploration.
signature(object = "MSnSet", pNA = "numeric",
pattern = "character", droplevels = "logical")
: As
filterNA
, but for zeros.
signature(object = "MSnSet", msLevel. =
"numeric", fcol = "character")
Keeps only spectra with level
msLevel.
, as defined by the fcol
feature variable
(default is "msLevel"
).
signature(object = "MSnSet", base = "numeric")
Log
transforms exprs(object)
using
base::log
. base
(defaults is e='exp(1)'
) must
be a positive or complex number, the base with respect to which
logarithms are computed.
signature(x = "MSnSet", ...)
Drops the unused
factor levels in the featureData
slot. See
droplevels
for details.
signature(object = "MSnSet", ...)
Performs data imputation on the MSnSet
object.
See impute
for more details.
signature(object = "MSnSet", ...)
Trim leading and/or
trailing white spaces in the feature data slot. Also available for
data.frame
objects. See ?base::trimws
for details.
Additional accessors for the experimental metadata
(experimentData
slot) are defined. See
"MIAPE"
for details.
signature(object = "MSnSet")
Plots row
standard deviations versus row means. See
meanSdPlot
(vsn
package) for more details.
signature(x = "MSnSet", facetBy = "character",
sOrderBy = "character", legend = "character", low = "character",
high = "character", fnames = "logical", nmax =
"numeric")
Produces an heatmap of expression values in the
x
object. Simple horizontal facetting is enabled by
passing a single character as facetBy
. Arbitrary
facetting can be performed manually by saving the return value
of the method (see example below). Re-ordering of the samples is
possible by providing the name of a phenotypic variable to
sOrderBy
. The title of the legend can be set with
legend
and the colours with the low
and
high
arguments. If any negative value is detected in the
data, the values are considered as log fold-changes and a
divergent colour scale is used. Otherwise, a gradient from low
to high is used. To scale the quantitative data in x
prior to plotting, please see the scale
method.
When there are more than nmax
(default is 50)
features/rows, these are not printed. This behaviour can be
controlled by setting fnames
to TRUE
(always
print) or FALSE
(never print). See examples below.
The code is based on Vlad Petyuk's
vp.misc::image_msnset
. The previous version of this
method is still available through the image2
function.
signature(object = "MSnSet", pNA =
"numeric")
Plots missing data for an MSnSet
instance. pNA
is a
numeric
of length 1 that specifies the percentage
of accepted missing data values per features. This value will be
highlighted with a point on the figure, illustrating the overall
percentage of NA values in the full data set and the number of
proteins retained. Default is 1/2. See also
plotNA
.
signature(object = "MSnSet", log.it = "logical",
base = "numeric", ...)
Produces MA plots (Ratio as a function
of average intensity) for the samples in object
. If
ncol(object) == 2
, then one MA plot is produced using the
ma.plot
function from the affy
package. If
object
has more than 2 columns, then
mva.pairs
. log.it
specifies is the data
should be log-transformed (default is TRUE
) using
base
. Further ...
arguments will be passed to the
respective functions.
signature(object = "MSnSet", ...)
:
Adds identification data to a MSnSet
instance.
See addIdentificationData
documentation for
more details and examples.
signature(object = "MSnSet", fcol =
"pepseq", keep = NULL)
: Removes non-identified features. See
removeNoId
documentation for more details and
examples.
signature(object = "MSnSet",
fcol = "nprot")
: Removes protein groups (or feature belong to
protein groups) with more than one member. The latter is defined
by extracting a feature variable (default is
"nprot"
). Also removes non-identified features.
signature(object = "MSnSet", ...)
: Prints a
summary that lists the percentage of identified features per file
(called coverage
).
signature(object, label, sep)
This
function updates object
's featureData variable labels by
appending label
. By default, label
is the variable
name and the separator sep
is .
.
signature(object, label, sep)
This
function updates object
's sample names by appending
label
. By default, label
is the variable name and
the separator sep
is .
.
signature(object, label, sep)
This
function updates object
's feature names by appending
label
. By default, label
is the variable name and
the separator sep
is .
.
signature(x, fcols)
Coerces the MSnSet
instance
to a data.frame
. The direction of the data is retained and
the feature variable labels that match fcol
are appended to
the expression values. See also as(x, "data.frame")
above.
signature(x, y)
When coercing an
MSnSet
y
to a SummarizedExperiment
x
with x <- as(y, "SummarizedExperiment")
, most of y
's
metadata is lost. Only the file names, the processing log and the
MSnbase version from the processingData
slots are passed
along. The addMSnSetMetadata
function can be used to add
the complete processingData
, experimentData
and
protocolData
slots. The downside of this is that MSnbase is
now required to use the SummarizedExperiment
object.
Laurent Gatto
"eSet"
, "ExpressionSet"
and
quantify
. MSnSet
quantitation values and
annotation can be exported to a file with
write.exprs
. See readMSnSet
to
create and MSnSet
using data available in a spreadsheet or
data.frame
.
data(msnset) msnset <- msnset[10:15] exprs(msnset)[1, c(1, 4)] <- NA exprs(msnset)[2, c(1, 2)] <- NA is.na(msnset) featureNames(filterNA(msnset, pNA = 1/4)) featureNames(filterNA(msnset, pattern = "0110")) M <- matrix(rnorm(12), 4) pd <- data.frame(otherpdata = letters[1:3]) fd <- data.frame(otherfdata = letters[1:4]) x0 <- MSnSet(M, fd, pd) sampleNames(x0) M <- matrix(rnorm(12), 4) colnames(M) <- LETTERS[1:3] rownames(M) <- paste0("id", LETTERS[1:4]) pd <- data.frame(otherpdata = letters[1:3]) rownames(pd) <- colnames(M) fd <- data.frame(otherfdata = letters[1:4]) rownames(fd) <- rownames(M) x <- MSnSet(M, fd, pd) sampleNames(x) ## Visualisation library("pRolocdata") data(dunkley2006) image(dunkley2006) ## Changing colours image(dunkley2006, high = "darkgreen") image(dunkley2006, high = "darkgreen", low = "yellow") ## Forcing feature names image(dunkley2006, fnames = TRUE) ## Facetting image(dunkley2006, facetBy = "replicate") p <- image(dunkley2006) library("ggplot2") ## for facet_grid p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free') p + facet_grid(markers ~ replicate) ## Fold-changes dd <- dunkley2006 exprs(dd) <- exprs(dd) - 0.25 image(dd) image(dd, low = "green", high = "red") ## Feature names are displayed by default for smaller data dunkley2006 <- dunkley2006[1:25, ] image(dunkley2006) image(dunkley2006, legend = "hello") ## Coercion if (require("SummarizedExperiment")) { data(msnset) se <- as(msnset, "SummarizedExperiment") metadata(se) ## only logging se <- addMSnSetMetadata(se, msnset) metadata(se) ## all metadata msnset2 <- as(se, "MSnSet") processingData(msnset2) } as(msnset, "ExpressionSet")
data(msnset) msnset <- msnset[10:15] exprs(msnset)[1, c(1, 4)] <- NA exprs(msnset)[2, c(1, 2)] <- NA is.na(msnset) featureNames(filterNA(msnset, pNA = 1/4)) featureNames(filterNA(msnset, pattern = "0110")) M <- matrix(rnorm(12), 4) pd <- data.frame(otherpdata = letters[1:3]) fd <- data.frame(otherfdata = letters[1:4]) x0 <- MSnSet(M, fd, pd) sampleNames(x0) M <- matrix(rnorm(12), 4) colnames(M) <- LETTERS[1:3] rownames(M) <- paste0("id", LETTERS[1:4]) pd <- data.frame(otherpdata = letters[1:3]) rownames(pd) <- colnames(M) fd <- data.frame(otherfdata = letters[1:4]) rownames(fd) <- rownames(M) x <- MSnSet(M, fd, pd) sampleNames(x) ## Visualisation library("pRolocdata") data(dunkley2006) image(dunkley2006) ## Changing colours image(dunkley2006, high = "darkgreen") image(dunkley2006, high = "darkgreen", low = "yellow") ## Forcing feature names image(dunkley2006, fnames = TRUE) ## Facetting image(dunkley2006, facetBy = "replicate") p <- image(dunkley2006) library("ggplot2") ## for facet_grid p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free') p + facet_grid(markers ~ replicate) ## Fold-changes dd <- dunkley2006 exprs(dd) <- exprs(dd) - 0.25 image(dd) image(dd, low = "green", high = "red") ## Feature names are displayed by default for smaller data dunkley2006 <- dunkley2006[1:25, ] image(dunkley2006) image(dunkley2006, legend = "hello") ## Coercion if (require("SummarizedExperiment")) { data(msnset) se <- as(msnset, "SummarizedExperiment") metadata(se) ## only logging se <- addMSnSetMetadata(se, msnset) metadata(se) ## all metadata msnset2 <- as(se, "MSnSet") processingData(msnset2) } as(msnset, "ExpressionSet")
A class for storing lists of MSnSet
instances.
There are two ways to store different sets of measurements pertaining an experimental unit, such as replicated measures of different conditions that were recorded over more than one MS acquisition. Without focusing on any proteomics technology in particular, these multiple assays can be recorded as
A single combined MSnSet
(see the section
Combining MSnSet instances in the MSnbase-demo
section). In such cases, the different experimental (phenotypical)
conditions are recorded as an
AnnotatedDataFrame
in the phenoData
slots.
Quantitative data for features that were missing in an assay are
generally encode as missing with NA
values. Alternatively,
only features observed in all assays could be selected. See the
commonFeatureNames
functions to select only common
features among two or more MSnSet
instance.
Each set of measurements is stored in an MSnSet
which
are combined into one MSnSetList
. Each MSnSet
elements
can have identical or different samples and features. Unless
compiled directly manually by the user, one would expect at least
one of these dimensions (features/rows or samples/columns) are
conserved (i.e. all feature or samples names are identical). See
split
/unsplit
below.
Objects can be created and manipluated with:
MSnSetList(x, log, featureDAta)
The class constructor
that takes a list of valid MSnSet
instances as input
x
, an optional logging list
, and an optional feature
metadata data.frame
.
split(x, f)
An MSnSetList
can be created from
an MSnSet
instance. x
is a single
MSnSet
and f
is a factor
or a
character
of length 1. In the latter case, f
will be
matched to the feature- and phenodata variable names (in that
order). If a match is found, the respective variable is extracted,
converted to a factor if it is not one already, and used to split
x
along the features/rows (f
was a feature variable
name) or samples/columns (f
was a phenotypic variable
name). If f
is passed as a factor, its length will be
matched to nrow(x)
or ncol(x)
(in that order) to
determine if x
will be split along the features (rows) or
sample (columns). Hence, the length of f
must match exactly
to either dimension.
unsplit(value, f)
The unsplit
method reverses
the effect of splitting the value
MSnSet
along the
groups f
.
as(x, "MSnSetList")
Where x
is an instance of
class MzTab. See the class documentation for
details.
x
:Object of class list
containing valid
MSnSet
instances. Can be extracted with the
msnsets()
accessor.
log
:Object of class list
containing an object
creation log, containing among other elements the call that
generated the object. Can be accessed with objlog()
.
featureData
:Object of class DataFrame
that
stores metadata for each object in the x
slot. The number of
rows of this data.frame
must be equal to the number of items
in the x
slot and their respective (row)names must be
identical.
.__classVersion__
:The version of the instance. For development purposes only.
"[["
Extracts a single MSnSet
at position.
"["
Extracts one of more MSnSets
as
MSnSetList
.
length
Returns the number of MSnSets
.
names
Returns the names of MSnSets
, if
available. The replacement method is also available.
show
Display the object by printing a short summary.
lapply(x, FUN, ...)
Apply function FUN
to each
element of the input x
. If the application of FUN
returns and MSnSet
, then the return value is an
MSnSetList
, otherwise a list
.
sapply(x, FUN, ..., simplify = TRUE, USE.NAMES =
TRUE)
A lapply
wrapper that simplifies the ouptut to a
vector, matric or array is possible. See ?base::sapply
for details.
.
fData
Returns the features metadata featureData
slot.
fData<-
Features metadata featureData
replacement method.
Laurent Gatto
The commonFeatureNames
function to select common
features among MSnSet
instances.
library("pRolocdata") data(tan2009r1) data(tan2009r2) ## The MSnSetList class ## for an unnamed list, names are set to indices msnl <- MSnSetList(list(tan2009r1, tan2009r2)) names(msnl) ## a named example msnl <- MSnSetList(list(A = tan2009r1, B = tan2009r2)) names(msnl) msnsets(msnl) length(msnl) objlog(msnl) msnl[[1]] ## an MSnSet msnl[1] ## an MSnSetList of length 1 ## Iterating over the elements lapply(msnl, dim) ## a list lapply(msnl, normalise, method = "quantiles") ## an MSnSetList fData(msnl) fData(msnl)$X <- sapply(msnl, nrow) fData(msnl) ## Splitting and unsplitting ## splitting along the columns/samples data(dunkley2006) head(pData(dunkley2006)) (splt <- split(dunkley2006, "replicate")) lapply(splt, dim) ## the number of rows and columns of the split elements unsplt <- unsplit(splt, dunkley2006$replicate) stopifnot(compareMSnSets(dunkley2006, unsplt)) ## splitting along the rows/features head(fData(dunkley2006)) (splt <- split(dunkley2006, "markers")) unsplt <- unsplit(splt, factor(fData(dunkley2006)$markers)) simplify2array(lapply(splt, dim)) stopifnot(compareMSnSets(dunkley2006, unsplt))
library("pRolocdata") data(tan2009r1) data(tan2009r2) ## The MSnSetList class ## for an unnamed list, names are set to indices msnl <- MSnSetList(list(tan2009r1, tan2009r2)) names(msnl) ## a named example msnl <- MSnSetList(list(A = tan2009r1, B = tan2009r2)) names(msnl) msnsets(msnl) length(msnl) objlog(msnl) msnl[[1]] ## an MSnSet msnl[1] ## an MSnSetList of length 1 ## Iterating over the elements lapply(msnl, dim) ## a list lapply(msnl, normalise, method = "quantiles") ## an MSnSetList fData(msnl) fData(msnl)$X <- sapply(msnl, nrow) fData(msnl) ## Splitting and unsplitting ## splitting along the columns/samples data(dunkley2006) head(pData(dunkley2006)) (splt <- split(dunkley2006, "replicate")) lapply(splt, dim) ## the number of rows and columns of the split elements unsplt <- unsplit(splt, dunkley2006$replicate) stopifnot(compareMSnSets(dunkley2006, unsplt)) ## splitting along the rows/features head(fData(dunkley2006)) (splt <- split(dunkley2006, "markers")) unsplt <- unsplit(splt, factor(fData(dunkley2006)$markers)) simplify2array(lapply(splt, dim)) stopifnot(compareMSnSets(dunkley2006, unsplt))
MSpectra
(Mass Spectra) objects allow to collect one or more
Spectrum object(s) (Spectrum1 or Spectrum2) in
a list
-like structure with the possibility to add arbitrary annotations
to each individual Spectrum
object. These can be accessed/set with
the mcols()
method.
MSpectra
objects can be created with the MSpectra
function.
Functions to access the individual spectra's attributes are available (listed below).
writeMgfData
exports a MSpectra
object to a file in MGF format. All
metadata columns present in mcols
are exported as additional fields with
the capitalized column names used as field names (see examples below).
MSpectra(..., elementMetadata = NULL) ## S4 method for signature 'MSpectra' mz(object) ## S4 method for signature 'MSpectra' intensity(object) ## S4 method for signature 'MSpectra' rtime(object) ## S4 method for signature 'MSpectra' precursorMz(object) ## S4 method for signature 'MSpectra' precursorCharge(object) ## S4 method for signature 'MSpectra' precScanNum(object) ## S4 method for signature 'MSpectra' precursorIntensity(object) ## S4 method for signature 'MSpectra' acquisitionNum(object) ## S4 method for signature 'MSpectra' scanIndex(object) ## S4 method for signature 'MSpectra,ANY' peaksCount(object) ## S4 method for signature 'MSpectra' msLevel(object) ## S4 method for signature 'MSpectra' tic(object) ## S4 method for signature 'MSpectra' ionCount(object) ## S4 method for signature 'MSpectra' collisionEnergy(object) ## S4 method for signature 'MSpectra' fromFile(object) ## S4 method for signature 'MSpectra' polarity(object) ## S4 method for signature 'MSpectra' smoothed(object) ## S4 method for signature 'MSpectra' isEmpty(x) ## S4 method for signature 'MSpectra' centroided(object) ## S4 method for signature 'MSpectra' isCentroided(object) ## S4 method for signature 'MSpectra' writeMgfData(object, con = "spectra.mgf", COM = NULL, TITLE = NULL) ## S4 method for signature 'MSpectra' clean(object, all = FALSE, msLevel. = msLevel., ...) ## S4 method for signature 'MSpectra' removePeaks(object, t, msLevel., ...) ## S4 method for signature 'MSpectra' filterMz(object, mz, msLevel., ...) ## S4 method for signature 'MSpectra' pickPeaks( object, halfWindowSize = 3L, method = c("MAD", "SuperSmoother"), SNR = 0L, refineMz = c("none", "kNeighbors", "kNeighbours", "descendPeak"), msLevel. = unique(msLevel(object)), ... ) ## S4 method for signature 'MSpectra' smooth( x, method = c("SavitzkyGolay", "MovingAverage"), halfWindowSize = 2L, ... ) ## S4 method for signature 'MSpectra' filterMsLevel(object, msLevel.)
MSpectra(..., elementMetadata = NULL) ## S4 method for signature 'MSpectra' mz(object) ## S4 method for signature 'MSpectra' intensity(object) ## S4 method for signature 'MSpectra' rtime(object) ## S4 method for signature 'MSpectra' precursorMz(object) ## S4 method for signature 'MSpectra' precursorCharge(object) ## S4 method for signature 'MSpectra' precScanNum(object) ## S4 method for signature 'MSpectra' precursorIntensity(object) ## S4 method for signature 'MSpectra' acquisitionNum(object) ## S4 method for signature 'MSpectra' scanIndex(object) ## S4 method for signature 'MSpectra,ANY' peaksCount(object) ## S4 method for signature 'MSpectra' msLevel(object) ## S4 method for signature 'MSpectra' tic(object) ## S4 method for signature 'MSpectra' ionCount(object) ## S4 method for signature 'MSpectra' collisionEnergy(object) ## S4 method for signature 'MSpectra' fromFile(object) ## S4 method for signature 'MSpectra' polarity(object) ## S4 method for signature 'MSpectra' smoothed(object) ## S4 method for signature 'MSpectra' isEmpty(x) ## S4 method for signature 'MSpectra' centroided(object) ## S4 method for signature 'MSpectra' isCentroided(object) ## S4 method for signature 'MSpectra' writeMgfData(object, con = "spectra.mgf", COM = NULL, TITLE = NULL) ## S4 method for signature 'MSpectra' clean(object, all = FALSE, msLevel. = msLevel., ...) ## S4 method for signature 'MSpectra' removePeaks(object, t, msLevel., ...) ## S4 method for signature 'MSpectra' filterMz(object, mz, msLevel., ...) ## S4 method for signature 'MSpectra' pickPeaks( object, halfWindowSize = 3L, method = c("MAD", "SuperSmoother"), SNR = 0L, refineMz = c("none", "kNeighbors", "kNeighbours", "descendPeak"), msLevel. = unique(msLevel(object)), ... ) ## S4 method for signature 'MSpectra' smooth( x, method = c("SavitzkyGolay", "MovingAverage"), halfWindowSize = 2L, ... ) ## S4 method for signature 'MSpectra' filterMsLevel(object, msLevel.)
... |
For |
elementMetadata |
For |
object |
For all functions: a |
x |
For all functions: a |
con |
For |
COM |
For |
TITLE |
For |
all |
For |
msLevel. |
For |
t |
For |
mz |
For |
halfWindowSize |
For |
method |
For |
SNR |
For |
refineMz |
For |
MSpectra
inherits all methods from the SimpleList class of the
S4Vectors
package. This includes lapply
and other data manipulation
and subsetting operations.
New MSpectra can be created with the MSpectra(...)
function
where ...
can either be a single Spectrum object or a list
of
Spectrum
objects (Spectrum1 and/or Spectrum2).
These methods allow to access the attributes and values of the individual
Spectrum
(Spectrum1 or Spectrum2) objects within the list.
mz
return the m/z values of each spectrum as a list
of numeric
vectors.
intensity
return the intensity values of each spectrum as a list
of
numeric
vectors.
rtime
return the retention time of each spectrum as a numeric
vector
with length equal to the length of object
.
precursorMz
, precursorCharge
, precursorIntensity
, precScanNum
return precursor m/z values, charge, intensity and scan number for each
spectrum as a numeric
(or integer
) vector with length equal to the
length of object
. Note that for Spectrum1 objects NA
will be
returned.
acquisitionNum
and scanIndex
return the acquisition number of each
spectrum and its scan index as an integer
vector with the same length
than object
.
ionCount
and tic
return the ion count and total ion current of each
spectrum.
peaksCount
returns the number of peaks for each spectrum as a integer
vector.
msLevel
returns the MS level of each spectrum.
collisionEnergy
returns the collision energy for each spectrum or NA
for Spectrum1 objects.
polarity
returns the spectra's polarity.
fromFile
returns the index from the (e.g. mzML) file the spectra where
from. This applies only for spectra read using the readMSData()
function.
smoothed
whether spectra have been smoothed (i.e. processed with the
smooth()
method. Returns a logical
of length equal to the
number of spectra.
isEmpty
returns TRUE
for spectra without peak data.
centroided
, isCentroided
returns for each spectrum whether it contains
centroided data. While centroided
returns the internal attribute of
each spectrum, isCentroided
tries to guess whether spectra are
centroided from the actual peak data.
clean
cleans each spectrum. See clean()
for more details.
pickPeaks
performs peak picking to generate centroided spectra. See
pickPeaks()
for more details.
removePeaks
removes peaks lower than a threshold t
. See
removePeaks()
for more details.
smooth
smooths spectra. See smooth()
for more details.
[
can be used to subset the MSpectra
object.
filterMsLevel
filters MSpectra
to retain only spectra from certain MS
level(s).
filterMz
filters the spectra by the specified mz
range. See
filterMz()
for details.
Note that the Spectra package provides a more robust and efficient infrastructure for mass spectrometry data handling and analysis. So, wherever possible, the newer Spectra package should be used instead of the MSnbase.
For backward compatibility, it is however possible to convert between the
MSpectra
and the newer Spectra
objects:
A Spectra
object can be coerced to a MSpectra
using
as(sps, "MSpectra")
where sps
is a Spectra
object.
The extractSpectraData()
function can be used to extract the data from
a MSpectra
as a DataFrame
, which can then be used to create a
Spectra
object.
Johannes Rainer
## Create from Spectrum objects sp1 <- new("Spectrum1", mz = c(1, 2, 4), intensity = c(4, 5, 2)) sp2 <- new("Spectrum2", mz = c(1, 2, 3, 4), intensity = c(5, 3, 2, 5), precursorMz = 2) spl <- MSpectra(sp1, sp2) spl spl[[1]] ## Add also metadata columns mcols(spl)$id <- c("a", "b") mcols(spl) ## Create a MSpectra with metadata spl <- MSpectra(sp1, sp2, elementMetadata = DataFrame(id = c("a", "b"))) mcols(spl) mcols(spl)$id ## Extract the mz values for the individual spectra mz(spl) ## Extract the intensity values for the individual spectra intensity(spl) ## Extract the retention time values for the individual spectra rtime(spl) ## Extract the precursor m/z of each spectrum. precursorMz(spl) ## Extract the precursor charge of each spectrum. precursorCharge(spl) ## Extract the precursor scan number for each spectrum. precScanNum(spl) ## Extract the precursor intensity of each spectrum. precursorIntensity(spl) ## Extract the acquisition number of each spectrum. acquisitionNum(spl) ## Extract the scan index of each spectrum. scanIndex(spl) ## Get the number of peaks per spectrum. peaksCount(spl) ## Get the MS level of each spectrum. msLevel(spl) ## Get the total ion current for each spectrum. tic(spl) ## Get the total ion current for each spectrum. ionCount(spl) ## Extract the collision energy for each spectrum. collisionEnergy(spl) ## Extract the file index for each spectrum. fromFile(spl) ## Get the polarity for each spectrum. polarity(spl) ## Whether spectra are smoothed (i.e. processed with the `smooth` ## function). smoothed(spl) ## Are spectra empty (i.e. contain no peak data)? isEmpty(spl) ## Do the spectra contain centroided data? centroided(spl) ## Do the spectra contain centroided data? Whether spectra are centroided ## is estimated from the peak data. isCentroided(spl) ## Export the spectrum list to a MGF file. Values in metadata columns are ## exported as additional field for each spectrum. tmpf <- tempfile() writeMgfData(spl, tmpf) ## Evaluate the written output. The ID of each spectrum (defined in the ## "id" metadata column) is exported as field "ID". readLines(tmpf) ## Set mcols to NULL to avoid export of additional data fields. mcols(spl) <- NULL file.remove(tmpf) writeMgfData(spl, tmpf) readLines(tmpf) ## Filter the object by MS level filterMsLevel(spl, msLevel. = 1)
## Create from Spectrum objects sp1 <- new("Spectrum1", mz = c(1, 2, 4), intensity = c(4, 5, 2)) sp2 <- new("Spectrum2", mz = c(1, 2, 3, 4), intensity = c(5, 3, 2, 5), precursorMz = 2) spl <- MSpectra(sp1, sp2) spl spl[[1]] ## Add also metadata columns mcols(spl)$id <- c("a", "b") mcols(spl) ## Create a MSpectra with metadata spl <- MSpectra(sp1, sp2, elementMetadata = DataFrame(id = c("a", "b"))) mcols(spl) mcols(spl)$id ## Extract the mz values for the individual spectra mz(spl) ## Extract the intensity values for the individual spectra intensity(spl) ## Extract the retention time values for the individual spectra rtime(spl) ## Extract the precursor m/z of each spectrum. precursorMz(spl) ## Extract the precursor charge of each spectrum. precursorCharge(spl) ## Extract the precursor scan number for each spectrum. precScanNum(spl) ## Extract the precursor intensity of each spectrum. precursorIntensity(spl) ## Extract the acquisition number of each spectrum. acquisitionNum(spl) ## Extract the scan index of each spectrum. scanIndex(spl) ## Get the number of peaks per spectrum. peaksCount(spl) ## Get the MS level of each spectrum. msLevel(spl) ## Get the total ion current for each spectrum. tic(spl) ## Get the total ion current for each spectrum. ionCount(spl) ## Extract the collision energy for each spectrum. collisionEnergy(spl) ## Extract the file index for each spectrum. fromFile(spl) ## Get the polarity for each spectrum. polarity(spl) ## Whether spectra are smoothed (i.e. processed with the `smooth` ## function). smoothed(spl) ## Are spectra empty (i.e. contain no peak data)? isEmpty(spl) ## Do the spectra contain centroided data? centroided(spl) ## Do the spectra contain centroided data? Whether spectra are centroided ## is estimated from the peak data. isCentroided(spl) ## Export the spectrum list to a MGF file. Values in metadata columns are ## exported as additional field for each spectrum. tmpf <- tempfile() writeMgfData(spl, tmpf) ## Evaluate the written output. The ID of each spectrum (defined in the ## "id" metadata column) is exported as field "ID". readLines(tmpf) ## Set mcols to NULL to avoid export of additional data fields. mcols(spl) <- NULL file.remove(tmpf) writeMgfData(spl, tmpf) readLines(tmpf) ## Filter the object by MS level filterMsLevel(spl, msLevel. = 1)
MzTab
filesThe MzTab
class stores the output of a basic parsing of a
mzTab
file. It contain the metadata (a list
), comments
(a character
vector), and the at least of of the following data
types: proteins, peptides, PSMs and small molecules (as
data.frames
).
At this stage, the metadata and data are only minimally parsed. The
column names are kept as they are defined in the original files and
are thus not all going to be valid colnames. To access them using the
dollar operator, use backticks. More specific data extraction and
preparation are delegated to more specialised functions, such as the
as(., to = "MSnSetList")
and readMzTabData
for
proteomics data.
Note that no attempts are made to verify the validitiy of the mzTab file.
Objects can be created by calls the the constructor MzTab
that
takes a single mzTab
file as input.
The objects can subsequently be coerced to MSnSetList
instances with as(object, "MSnSetList")
. The resulting
MSnSetList
contains possibly empty MSnSet
instances for
proteins, peptide and PSMs, respectively named "Proteins"
,
"Peptides"
and "PSMs"
.
The assaydata slots of the two former are populated with the
protein_abundance_assay[1-n]
and
peptide_abundance_assay[1-n]
columns in the mzTab
file. No abundance values are defined for the latter. The respective
feature names correspond to protein accessions, peptide sequences and
PSM identifiers, possibly made unique as by appending sequence numbers
to duplicates.
Metadata
:Object of class "list"
storing the
metadata section.
Filename
:Object of class "character"
storing
the orginal file name.
Proteins
:Object of class "data.frame"
storing
the protein data.
Peptides
:Object of class "data.frame"
storing
the peptide data.
PSMs
:Object of class "data.frame"
storing
the PSM data.
SmallMolecules
:Object of class "data.frame"
storing the small molecules data.
MoleculeFeatures
:Object of class "data.frame"
storing the molecule features.
MoleculeEvidence
:Object of class
"data.frame"
storing the molecule evidence.
Comments
:Object of class "character"
storing
the comments that were present in the file.
signature(x = "MzTab")
: returns the meta data
list
.
signature(x = "MzTab")
: returns the mode
(complete or summary) of the mzTab
data. A shortcut for
metadata(x)$`mzTab-mode`
.
signature(x = "MzTab")
: returns the type
(quantification or identification) of the mzTab
data. A
shortcut for metadata(x)$`mzTab-type`
.
signature(object = "MzTab")
: returns the file name
of the original mzTab
file.
signature(object = "MzTab")
: returns the
peptide data.frame
.
signature(object = "MzTab")
: returns the
proteins data.frame
.
signature(object = "MzTab")
: returns the
PSMs data.frame
.
signature(object = "MzTab")
: returns
the small molecules (SML) data.frame
.
signature(object = "MzTab")
: returns
the small molecules features (SMF) data.frame
.
signature(object = "MzTab")
: returns
the small molecule identification evidence (SME) data.frame
.
signature(object = "MzTab")
: returns the
comments.
Laurent Gatto, with contributions from Richard Cotton (see https://github.com/lgatto/MSnbase/issues/41) and Steffen Neuman (see https://github.com/lgatto/MSnbase/pull/500).
The mzTab format is a light-weight, tab-delimited file format for proteomics data. Version mzTab 1.0 is aimed at proteomics, mzTab-M 2.0 was adapted to metabolomics. See https://github.com/HUPO-PSI/mzTab for details and specifications.
Griss J, Jones AR, Sachsenberg T, Walzer M, Gatto L, Hartler J, Thallinger GG, Salek RM, Steinbeck C, Neuhauser N, Cox J, Neumann S, Fan J, Reisinger F, Xu QW, Del Toro N, Perez-Riverol Y, Ghali F, Bandeira N, Xenarios I, Kohlbacher O, Vizcaino JA, Hermjakob H. The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics. 2014 Oct;13(10):2765-75. doi: 10.1074/mcp.O113.036681. Epub 2014 Jun 30. PubMed PMID: 24980485; PubMed Central PMCID: PMC4189001.
Hoffmann N, Rein J, Sachsenberg T, et al. mzTab-M: A Data Standard for Sharing Quantitative Results in Mass Spectrometry Metabolomics. Anal Chem. 2019;91(5):3302‐3310. doi:10.1021/acs.analchem.8b04310 PubMed PMID: 30688441; PubMed Central PMCID: PMC6660005.
## Test files from the mzTab developement repository fls <- c("Cytidine.mzTab", "MTBLS2.mztab", "PRIDE_Exp_Complete_Ac_1643.xml-mztab.txt", "PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt", "SILAC_CQI.mzTab", "SILAC_SQ.mzTab", "iTRAQ_CQI.mzTab", "iTRAQ_SQI.mzTab", "labelfree_CQI.mzTab", "labelfree_SQI.mzTab", "lipidomics-HFD-LD-study-PL-DG-SM.mzTab", "lipidomics-HFD-LD-study-TG.mzTab") baseUrl <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/" ## a list of mzTab objects mzt <- sapply(file.path(baseUrl, fls), MzTab) stopifnot(length(mzt) == length(fls)) mzt[[4]] dim(proteins(mzt[[4]])) dim(psms(mzt[[4]])) prots4 <- proteins(mzt[[4]]) class(prots4) prots4[1:5, 1:4]
## Test files from the mzTab developement repository fls <- c("Cytidine.mzTab", "MTBLS2.mztab", "PRIDE_Exp_Complete_Ac_1643.xml-mztab.txt", "PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt", "SILAC_CQI.mzTab", "SILAC_SQ.mzTab", "iTRAQ_CQI.mzTab", "iTRAQ_SQI.mzTab", "labelfree_CQI.mzTab", "labelfree_SQI.mzTab", "lipidomics-HFD-LD-study-PL-DG-SM.mzTab", "lipidomics-HFD-LD-study-TG.mzTab") baseUrl <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/" ## a list of mzTab objects mzt <- sapply(file.path(baseUrl, fls), MzTab) stopifnot(length(mzt) == length(fls)) mzt[[4]] dim(proteins(mzt[[4]])) dim(psms(mzt[[4]])) prots4 <- proteins(mzt[[4]]) class(prots4) prots4[1:5, 1:4]
Visualise missing values as a heatmap and barplots along the samples and features.
naplot( object, verbose = isMSnbaseVerbose(), reorderRows = TRUE, reorderColumns = TRUE, ... )
naplot( object, verbose = isMSnbaseVerbose(), reorderRows = TRUE, reorderColumns = TRUE, ... )
object |
An object of class |
verbose |
If verbose (default is |
reorderRows |
If reorderRows (default is |
reorderColumns |
If reorderColumns (default is |
... |
Additional parameters passed to |
Used for its side effect. Invisibly returns NULL
Laurent Gatto
data(naset) naplot(naset)
data(naset) naplot(naset)
This function computes the number of features in the group defined
by the feature variable fcol
and appends this information
in the feature data of object
.
nFeatures(object, fcol)
nFeatures(object, fcol)
object |
An instance of class |
fcol |
Feature variable defining the feature grouping structure. |
An updated MSnSet
with a new feature variable
fcol.nFeatures
.
Laurent Gatto
library(pRolocdata) data("hyperLOPIT2015ms3r1psm") hyperLOPIT2015ms3r1psm <- nFeatures(hyperLOPIT2015ms3r1psm, "Protein.Group.Accessions") i <- c("Protein.Group.Accessions", "Protein.Group.Accessions.nFeatures") fData(hyperLOPIT2015ms3r1psm)[1:10, i]
library(pRolocdata) data("hyperLOPIT2015ms3r1psm") hyperLOPIT2015ms3r1psm <- nFeatures(hyperLOPIT2015ms3r1psm, "Protein.Group.Accessions") i <- c("Protein.Group.Accessions", "Protein.Group.Accessions.nFeatures") fData(hyperLOPIT2015ms3r1psm)[1:10, i]
MSnExp
, MSnSet
and
Spectrum
objects The normalise
method (also available as normalize
)
performs basic normalisation on spectra
intensities of single spectra ("Spectrum"
or
"Spectrum2"
objects),
whole experiments ("MSnExp"
objects) or
quantified expression data ("MSnSet"
objects).
Raw spectra and experiments are normalised using max
or
sum
only. For MSMS spectra could be normalised to their
precursor
additionally. Each peak intensity is divided by the
highest intensity in the spectrum, the sum of intensities or the intensity
of the precursor.
These methods aim at facilitating relative peaks heights between
different spectra.
The method
parameter for "MSnSet"
can be
one of sum
, max
, quantiles
, center.mean
,
center.median
, .median
, quantiles.robust
or
vsn
. For sum
and max
, each feature's reporter
intensity is divided by the maximum or the sum respectively. These two
methods are applied along the features (rows).
center.mean
and center.median
translate the respective
sample (column) intensities according to the column mean or
median. diff.median
translates all samples (columns) so that
they all match the grand median. Using quantiles
or
quantiles.robust
applies (robust) quantile normalisation, as
implemented in normalize.quantiles
and
normalize.quantiles.robust
of the preprocessCore
package. vsn
uses the vsn2
function from the
vsn
package. Note that the latter also glog-transforms the
intensities. See respective manuals for more details and function
arguments.
A scale
method, mimicking the base scale
method exists
for "MSnSet"
instances. See
?base::scale
for details.
object |
An object of class |
method |
A character vector of length one that describes how to normalise the object. See description for details. |
... |
Additional arguments passed to the normalisation function. |
The normalise
methods:
signature(object = "MSnSet", method = "character")
Normalises the object
reporter ions intensities using
method
.
signature(object = "MSnExp", method = "character")
Normalises the object
peak intensities using
method
.
signature(object = "Spectrum", method = "character")
Normalises the object
peak intensities using
method
.
signature(object = "Spectrum2", method = "character",
precursorIntensity)
Normalises the object
peak intensities using
method
. If method == "precursor"
,
precursorIntensity
allows to specify the intensity of the
precursor manually.
The scale
method:
signature(x = "MSnSet", center = "logical", scale =
"logical")
See ?base::scale
.
## quantifying full experiment data(msnset) msnset.nrm <- normalise(msnset, "quantiles") msnset.nrm
## quantifying full experiment data(msnset) msnset.nrm <- normalise(msnset, "quantiles") msnset.nrm
This function combines peptides into their proteins by normalising the intensity values to a reference run/sample for each protein.
normToReference( x, group, reference = .referenceFractionValues(x = x, group = group) )
normToReference( x, group, reference = .referenceFractionValues(x = x, group = group) )
x |
|
group |
|
reference |
|
This function is not intented to be used directly (that's why it is not
exported via NAMESPACE
). Instead the user should use
combineFeatures
.
The algorithm is described in Nikolovski et al., briefly it works as follows:
Find reference run (column) for each protein (grouped rows).
We use the run (column) with the lowest number of NA
.
If multiple candidates are available we use the one with the highest
intensity. This step is skipped if the user use his own reference
vector.
For each protein (grouped rows) and each run (column):
Find peptides (grouped rows) shared by the current run (column) and the reference run (column).
Sum the shared peptides (grouped rows) for the current run (column) and the reference run (column).
The ratio of the shared peptides (grouped rows) of the current run (column) and the reference run (column) is the new intensity for the current protein for the current run.
a matrix with one row per protein.
Sebastian Gibb [email protected], Pavel Shliaha
Nikolovski N, Shliaha PV, Gatto L, Dupree P, Lilley KS. Label-free protein quantification for plant Golgi protein localization and abundance. Plant Physiol. 2014 Oct;166(2):1033-43. DOI: 10.1104/pp.114.245589. PubMed PMID: 25122472.
library("MSnbase") data(msnset) # choose the reference run automatically combineFeatures(msnset, groupBy=fData(msnset)$ProteinAccession) # use a user-given reference combineFeatures(msnset, groupBy=fData(msnset)$ProteinAccession, reference=rep(2, 55))
library("MSnbase") data(msnset) # choose the reference run automatically combineFeatures(msnset, groupBy=fData(msnset)$ProteinAccession) # use a user-given reference combineFeatures(msnset, groupBy=fData(msnset)$ProteinAccession, reference=rep(2, 55))
Calculates a non-parametric version of the coefficient of
variation where the standard deviation is replaced by the median
absolute deviations (see mad
for details) and
divided by the absolute value of the mean.
Note that the mad
of a single value is 0 (as opposed to
NA
for the standard deviation, see example below).
npcv(x, na.rm = TRUE)
npcv(x, na.rm = TRUE)
x |
A |
na.rm |
A |
A numeric
.
Laurent Gatto
set.seed(1) npcv(rnorm(10)) replicate(10, npcv(rnorm(10))) npcv(1) mad(1) sd(1)
set.seed(1) npcv(rnorm(10)) replicate(10, npcv(rnorm(10))) npcv(1) mad(1) sd(1)
This function counts the number of quantified features, i.e
non NA quantitation values, for each group of features
for all the samples in an "MSnSet"
object.
The group of features are defined by a feature variable names, i.e
the name of a column of fData(object)
.
nQuants(x, groupBy)
nQuants(x, groupBy)
x |
An instance of class |
groupBy |
An object of class |
This function is typically used after topN
and before
combineFeatures
, when the summerising function is
sum
, or any function that does not normalise to the number of
features aggregated. In the former case, sums of features might
be the result of 0 (if no feature was quantified) to n
(if all topN
's n
features were quantified) features,
and one might want to rescale the sums based on the number of
non-NA features effectively summed.
A matrix
of dimensions
length(levels(groupBy))
by ncol(x)
A matrix
of dimensions
length(levels(factor(fData(object)[, fcol])))
by
ncol(object)
of integers.
Laurent Gatto [email protected], Sebastian Gibb [email protected]
data(msnset) n <- 2 msnset <- topN(msnset, groupBy = fData(msnset)$ProteinAccession, n) m <- nQuants(msnset, groupBy = fData(msnset)$ProteinAccession) msnset2 <- combineFeatures(msnset, groupBy = fData(msnset)$ProteinAccession, method = sum) stopifnot(dim(n) == dim(msnset2)) head(exprs(msnset2)) head(exprs(msnset2) * (n/m))
data(msnset) n <- 2 msnset <- topN(msnset, groupBy = fData(msnset)$ProteinAccession, n) m <- nQuants(msnset, groupBy = fData(msnset)$ProteinAccession) msnset2 <- combineFeatures(msnset, groupBy = fData(msnset)$ProteinAccession, method = sum) stopifnot(dim(n) == dim(msnset2)) head(exprs(msnset2)) head(exprs(msnset2) * (n/m))
OnDiskMSnExp
Class for MS Data And Meta-DataLike the MSnExp
class, the OnDiskMSnExp
class
encapsulates data and meta-data for mass spectrometry
experiments, but does, in contrast to the former, not keep the
spectrum data in memory, but fetches the M/Z and intensity values on
demand from the raw files. This results in some instances to a
reduced performance, has however the advantage of a much smaller
memory footprint.
The OnDiskMSnExp
object stores many spectrum related
information into the featureData
, thus, some calls, like
rtime
to retrieve the retention time of the individual scans
does not require the raw data to be read. Only M/Z and intensity
values are loaded on-the-fly from the original files. Extraction of
values for individual scans is, for mzML files, very fast. Extraction
of the full data (all spectra) are performed in a per-file parallel
processing strategy.
Data manipulations related to spectras' M/Z or intensity values
(e.g. removePeaks
or clean
) are (for
OnDiskMSnExp
objects) not applied immediately, but are stored
for later execution into the spectraProcessingQueue
. The
manipulations are performed on-the-fly upon data retrieval.
Other manipulations, like removal of individual spectra are applied
directly, since the corresponding data is available in the object's
featureData
slot.
Objects can be created by calls of the form
new("OnDiskMSnExp",...)
. However, it is preferred to use the
readMSData
function with argument backend="disk"
that will read raw mass spectrometry data to generate a valid
"OnDiskMSnExp"
instance.
backend
:Character string specifying the used backend.
spectraProcessingQueue
:list
of ProcessingStep
objects
defining the functions to be applied on-the-fly to the
spectra data (M/Z and intensity duplets).
assayData
:Object of class "environment"
that
is however empty, as no spectrum data is stored.
Slot is inherited from "pSet"
.
phenoData
:Object of class
"AnnotatedDataFrame"
containing
experimenter-supplied variables describing sample (i.e the
individual tags for an labelled MS experiment)
See phenoData
for more details.
Slot is inherited from "pSet"
.
featureData
:Object of class
"AnnotatedDataFrame"
containing variables
describing features (spectra in our case). See
featureData
for more details.
Slot is inherited from "pSet"
.
experimentData
:Object of class
"MIAPE"
, containing details of experimental
methods. See experimentData
for more details.
Slot is inherited from "pSet"
.
protocolData
:Object of class
"AnnotatedDataFrame"
containing
equipment-generated variables (inherited from
"eSet"
). See protocolData
for
more details.
Slot is inherited from "pSet"
.
processingData
:Object of class
"MSnProcess"
that records all processing.
Slot is inherited from "pSet"
.
.__classVersion__
:Object of class
"Versions"
describing the versions of R,
the Biobase package, "pSet"
and
MSnExp
of the current instance.
Slot is inherited from "pSet"
.
Intended for developer use and debugging (inherited from
"eSet"
).
Class "MSnExp"
, directly.
Class "pSet"
, by class "MSnExp", distance 3.
Class "VersionedBiobase"
, by class "pSet", distance 4.
Class "Versioned"
, by class "pSet", distance 5.
(in alphabetical order)
See also methods for MSnExp
or
pSet
objects.
object[i]
:subset the OnDiskMSnExp
by
spectra. i
can be a numeric
or logical
vector specifying to which spectra the data set should be reduced
(with i
being the index of the spectrum in the object's
featureData
).
The method returns a OnDiskMSnExp
object with the data
sub-set.
object[[i]]
: extract s single spectrum from the
OnDiskMSnExp
object object
. Argument i
can be
either numeric or character specifying the index or the name of
the spectrum in the object (i.e. in the featureData
). The
relevant information will be extracted from the corresponding raw
data file.
The method returns a Spectrum1
object.
acquisitionNum(signature(object="OnDiskMSnExp"))
: get the
acquisition number of each spectrum in each individual file. The
relevant information is extracted from the object's
featureData
slot.
Returns a numeric vector with names corresponding to the spectrum names.
assayData(signature(object = "OnDiskMSnExp"))
:
Extract the full data, i.e. read all spectra from the original files,
apply all processing steps from the spectraProcessingQueue
slot and return the data. Due to the required processing time
accessing the full data should be avoided wherever possible.
Returns an environment
.
centroided(signature(object="OnDiskMSnExp", msLevel, =
"numeric"))
: whether individual spectra are centroided or
uncentroided. The relevant information is extracted from the
object's featureData
slot. Returns a logical vector with
names corresponding to the spectrum names. Use
centroided(object) <- value
to update the information, with
value being a logical vector of length equal to the number of
spectra in the experiment.
isCentroided(object, k = 0.025, qtl = 0.9, verbose =
TRUE)
A heuristic assessing if the spectra in the object
are in profile or centroided mode. The function takes the
qtl
th quantile top peaks, then calculates the difference
between adjacent M/Z value and returns TRUE
if the first
quartile is greater than k
. (See
MSnbase:::.isCentroided
for the code.) If verbose
(default), a table indicating mode for all MS levels is printed.
The function has been tuned to work for MS1 and MS2 spectra and data centroided using different peak picking algorithms, but false positives can occur. See https://github.com/lgatto/MSnbase/issues/131 for details. For whole experiments, where all MS1 and MS2 spectra are expected to be in the same, albeit possibly different modes, it is advised to assign the majority result for MS1 and MS2 spectra, rather than results for individual spectra.
See also isCentroidedFromFile
that accessed the mode
directly from the raw data file.
fromFile(signature(object = "OnDiskMSnExp"))
: get the
index of the file (in fileNames(object)
) from which the
spectra were read. The relevant information is extracted from the
object's featureData
slot.
Returns a numeric vector with names corresponding to the spectrum names.
intensity(signature(object="OnDiskMSnExp"))
:
return the intensities from each spectrum in the data
set. Intensities are first read from the raw files followed by an
optional processing (depending on the processing steps defined in
the spectraProcessingQueue
). To reduce the amount of
required memory, this is performed on a per-file basis.
The BPPARAM
argument allows to specify how
and if parallel processing should be used. Information from
individual files will be processed in parallel (one process per
original file).
The method returns a list
of numeric intensity values. Each
list element represents the intensities from one spectrum.
ionCount(signature(object="OnDiskMSnExp",
BPPARAM=bpparam()))
:
extract the ion count (i.e. sum of intensity values) for each
spectrum in the data set. The relevant data has to be extracted
from the raw files (with eventually applying processing steps).
The BPPARAM
argument can be used to define how
and if parallel processing should be used. Information from
individual files will be processed in parallel (one process per
original file).
Returns a numeric vector with names corresponding to the spectrum names.
isolationWindowLowerMz(object = "OnDiskMSnExp")
: return the
lower m/z boundary for the isolation window.
Returns a numeric vector of length equal to the number of spectra
with the lower m/z value of the isolation window or NA
if
not specified in the original file.
isolationWindowUpperMz(object = "OnDiskMSnExp")
: return the
upper m/z boundary for the isolation window.
Returns a numeric vector of length equal to the number of spectra
with the upper m/z value of the isolation window or NA
if
not specified in the original file.
length(signature(object="OnDiskMSnExp"))
:
Returns the number of spectra of the current experiment.
msLevel(signature(object = "OnDiskMSnExp"))
: extract the
MS level from the spectra. The relevant information is extracted
from the object's featureData
slot.
Returns a numeric vector with names corresponding to the spectrum names.
mz(signature(object="OnDiskMSnExp"))
:
return the M/Z values from each spectrum in the data
set. M/Z values are first read from the raw files followed by an
optional processing (depending on the processing steps defined in
the spectraProcessingQueue
). To reduce the amount of
required memory, this is performed on a per-file basis.
The BPPARAM
argument allows to specify how
and if parallel processing should be used. Information from
individual files will be processed in parallel (one process per
original file).
The method returns a list
of numeric M/Z values. Each
list element represents the values from one spectrum.
peaksCount(signature(object="OnDiskMSnExp",
scans="numeric"), BPPARAM=bpparam())
:
extrac the peaks count from each spectrum in the object. Depending
on the eventually present ProcessingStep
objects in the
spectraProcessingQueue
raw data will be loaded to calculate
the peaks count. If no steps are present, the data is extracted
from the featureData
. Optional argument scans
allows
to specify the index of specific spectra from which the count
should be returned. The BPPARAM
argument can be used to define how
and if parallel processing should be used. Information from
individual files will be processed in parallel (one process per
original file).
Returns a numeric vector with names corresponding to the spectrum names.
polarity(signature(object="OnDiskMSnExp"))
:
returns a numeric vector with the polarity of the individual
spectra in the data set. The relevant information is extracted
from the featureData
.
rtime(signature(object="OnDiskMSnExp"))
:
extrac the retention time of the individual spectra in the data
set (from the featureData
).
Returns a numeric vector with names corresponding to the spectrum names.
scanIndex(signature(object="OnDiskMSnExp"))
:
get the spectra scan indices within the respective file. The
relevant information is extracted from the object's featureData
slot.
Returns a numeric vector of indices with names corresponding to the
spectrum names.
smoothed(signature(object="OnDiskMSnExp", msLevel. =
"numeric"))
: whether individual spectra are smoothed or
unsmoothed. The relevant information is extracted from the
object's featureData
slot. Returns a logical vector with
names corresponding to the spectrum names. Use
smoothed(object) <- value
to update the information, with
value being a logical vector of length equal to the number of
spectra in the experiment.
spectra(signature(object="OnDiskMSnExp"), BPPARAM=bpparam())
:
extract spectrum data from the individual files. This causes the
spectrum data to be read from the original raw files. After that
all processing steps defined in the spectraProcessingQueue
are applied to it. The results are then returned as a list
of Spectrum1
objects.
The BPPARAM
argument can be used to define how and if
parallel processing should be used. Information from individual
files will be processed in parallel (one process per file).
Note: extraction of selected spectra results in a considerable
processing speed and should thus be preferred over whole data
extraction.
Returns a list
of Spectrum1
objects
with names corresponding to the spectrum names.
tic(signature(object="OnDiskMSnExp"), initial = TRUE,
BPPARAM = bpparam())
:
get the total ion current (TIC) of each spectrum in the data
set. If initial = TRUE
, the information is extracted from
the object's featureData
and represents the tic provided in
the header of the original raw data files. For initial =
FALSE
, the TIC is calculated from the actual intensity values in
each spectrum after applying all data manipulation
methods (if any).
See also https://github.com/lgatto/MSnbase/issues/332 for more details.
BPPARAM
parameter: see spectra
method above.
Returns a numeric vector with names corresponding to the spectrum names.
bpi(signature(object="OnDiskMSnExp"), initial = TRUE,
BPPARAM = bpparam())
:
get the base peak intensity (BPI), i.e. the maximum intensity from
each spectrum in the data set. If initial = TRUE
, the
information is extracted from the object's featureData
and
represents the bpi provided in the header of the original raw data
files. For initial = FALSE
, the BPI is calculated from the
actual intensity values in each spectrum after applying all
eventual data manipulation methods.
See also https://github.com/lgatto/MSnbase/issues/332 for more details.
BPPARAM
parameter: see spectra
method above.
Returns a numeric vector with names corresponding to the spectrum names.
tic(signature(object="OnDiskMSnExp"))
: return a
character
of length length(object)
containing the
feature names. A replacement method is also available.
spectrapply(signature(object = "OnDiskMSnExp"), FUN = NULL,
BPPARAM = bpparam(), ...)
: applies the function FUN
to each
spectrum passing additional parameters in ...
to that
function and return its results. For FUN = NULL
it returns
the list of spectra (same as a call to spectra
). Parameter
BPPARAM
allows to specify how and if parallel processing
should be enabled.
Returns a list with the result for each of spectrum.
(in alphabetical order)
See also methods for MSnExp
or
pSet
objects. In contrast to the same-named
methods for pSet
or MSnExp
classes, the actual data manipulation is not performed immediately,
but only on-demand, e.g. when intensity or M/Z values are loaded.
clean(signature(object="OnDiskMSnExp"), all=TRUE, verbose=TRUE)
:
add an clean processing step to the lazy processing queue
of the OnDiskMSnExp
object. The clean
command will
only be executed when spectra information (including M/Z and
intensity values) is requested from the OnDiskMSnExp
object. Optional arguments to the methods are all=TRUE
and
verbose=TRUE
.
The method returns an OnDiskMSnExp
object.
For more details see documentation of the clean
method.
normalize(signature(object="OnDiskMSnExp"), method=c("max",
"sum"), ...)
:
add a normalize
processing step to the lazy processing
queue of the returned OnDiskMSnExp
object.
The method returns an OnDiskMSnExp
object.
For more details see documentation of the
normalize
method.
removePeaks(signature(object="OnDiskMSnExp"), t="min", verbose=TRUE)
:
add a removePeaks
processing step to the lazy processing
queue of the returned OnDiskMSnExp
object.
The method returns an OnDiskMSnExp
object.
For more details see documentation of the removePeaks
method.
trimMz(signature(object="OnDiskMSnExp", mzlim="numeric"),...)
:
add a trimMz
processing step to the lazy processing queue
of the returned OnDiskMSnExp
object.
The method returns an OnDiskMSnExp
object.
For more details see documentation of the trimMz
method.
validateOnDiskMSnExp(signature(object = "OnDiskMSnExp"))
:
validates an OnDiskMSnExp
object and all of its spectra. In
addition to the standard validObject
method, this
method reads also all spectra from the original files, applies
eventual processing steps and evaluates their validity.
as(from, "MSnExp")
Converts the OnDiskMSnExp
object from
, to an in-memory MSnExp
. Also available
as an S3 method as.MSnExp()
.
Johannes Rainer <[email protected]>
## Get some example mzML files library(msdata) mzfiles <- c(system.file("microtofq/MM14.mzML", package="msdata"), system.file("microtofq/MM8.mzML", package="msdata")) ## Read the data as an OnDiskMSnExp odmse <- readMSData(mzfiles, msLevel=1, centroided = TRUE) ## Get the length of data, i.e. the total number of spectra. length(odmse) ## Get the MS level head(msLevel(odmse)) ## Get the featureData, use fData to return as a data.frame head(fData(odmse)) ## Get to know from which file the spectra are head(fromFile(odmse)) ## And the file names: fileNames(odmse) ## Scan index and acquisitionNum head(scanIndex(odmse)) head(acquisitionNum(odmse)) ## Extract the spectra; the data is retrieved from the raw files. head(spectra(odmse)) ## Extracting individual spectra or a subset is much faster. spectra(odmse[1:50]) ## Alternatively, we could also subset the whole object by spectra and/or samples: subs <- odmse[rtime(odmse) >= 2 & rtime(odmse) <= 20, ] fileNames(subs) rtime(subs) ## Extract intensities and M/Z values per spectrum; the methods return a list, ## each element representing the values for one spectrum. ints <- intensity(odmse) mzs <- mz(odmse) ## Return a data.frame with mz and intensity pairs for each spectrum from the ## object res <- spectrapply(odmse, FUN = as, Class = "data.frame") ## Calling removePeaks, i.e. setting intensity values below a certain threshold to 0. ## Unlike the name suggests, this is not actually removing peaks. Such peaks with a 0 ## intensity are then removed by the "clean" step. ## Also, the manipulations are not applied directly, but put into the "lazy" ## processing queue. odmse <- removePeaks(odmse, t=10000) odmse <- clean(odmse) ## The processing steps are only applied when actual raw data is extracted. spectra(odmse[1:2]) ## Get the polarity of the spectra. head(polarity(odmse)) ## Get the retention time of all spectra head(rtime(odmse)) ## Get the intensities after removePeaks and clean intsAfter <- intensity(odmse) head(lengths(ints)) head(lengths(intsAfter)) ## The same for the M/Z values mzsAfter <- intensity(odmse) head(lengths(mzs)) head(lengths(mzsAfter)) ## Centroided or profile mode f <- msdata::proteomics(full.names = TRUE, pattern = "MS3TMT11.mzML") odmse <- readMSData(f, mode = "onDisk") validObject(odmse) odmse[[1]] table(isCentroidedFromFile(odmse), msLevel(odmse)) ## centroided status could be set manually centroided(odmse, msLevel = 1) <- FALSE centroided(odmse, msLevel = 2) <- TRUE centroided(odmse, msLevel = 3) <- TRUE ## or when reading the data odmse2 <- readMSData(f, centroided = c(FALSE, TRUE, TRUE), mode = "onDisk") table(centroided(odmse), msLevel(odmse)) ## Filtering precursor scans head(acquisitionNum(odmse)) head(msLevel(odmse)) ## Extract all spectra stemming from the first MS1 spectrum (from1 <- filterPrecursorScan(odmse, 21945)) table(msLevel(from1)) ## Extract the second sepctrum's parent (MS1) and children (MS3) ## spectra (from2 <- filterPrecursorScan(odmse, 21946)) table(msLevel(from2))
## Get some example mzML files library(msdata) mzfiles <- c(system.file("microtofq/MM14.mzML", package="msdata"), system.file("microtofq/MM8.mzML", package="msdata")) ## Read the data as an OnDiskMSnExp odmse <- readMSData(mzfiles, msLevel=1, centroided = TRUE) ## Get the length of data, i.e. the total number of spectra. length(odmse) ## Get the MS level head(msLevel(odmse)) ## Get the featureData, use fData to return as a data.frame head(fData(odmse)) ## Get to know from which file the spectra are head(fromFile(odmse)) ## And the file names: fileNames(odmse) ## Scan index and acquisitionNum head(scanIndex(odmse)) head(acquisitionNum(odmse)) ## Extract the spectra; the data is retrieved from the raw files. head(spectra(odmse)) ## Extracting individual spectra or a subset is much faster. spectra(odmse[1:50]) ## Alternatively, we could also subset the whole object by spectra and/or samples: subs <- odmse[rtime(odmse) >= 2 & rtime(odmse) <= 20, ] fileNames(subs) rtime(subs) ## Extract intensities and M/Z values per spectrum; the methods return a list, ## each element representing the values for one spectrum. ints <- intensity(odmse) mzs <- mz(odmse) ## Return a data.frame with mz and intensity pairs for each spectrum from the ## object res <- spectrapply(odmse, FUN = as, Class = "data.frame") ## Calling removePeaks, i.e. setting intensity values below a certain threshold to 0. ## Unlike the name suggests, this is not actually removing peaks. Such peaks with a 0 ## intensity are then removed by the "clean" step. ## Also, the manipulations are not applied directly, but put into the "lazy" ## processing queue. odmse <- removePeaks(odmse, t=10000) odmse <- clean(odmse) ## The processing steps are only applied when actual raw data is extracted. spectra(odmse[1:2]) ## Get the polarity of the spectra. head(polarity(odmse)) ## Get the retention time of all spectra head(rtime(odmse)) ## Get the intensities after removePeaks and clean intsAfter <- intensity(odmse) head(lengths(ints)) head(lengths(intsAfter)) ## The same for the M/Z values mzsAfter <- intensity(odmse) head(lengths(mzs)) head(lengths(mzsAfter)) ## Centroided or profile mode f <- msdata::proteomics(full.names = TRUE, pattern = "MS3TMT11.mzML") odmse <- readMSData(f, mode = "onDisk") validObject(odmse) odmse[[1]] table(isCentroidedFromFile(odmse), msLevel(odmse)) ## centroided status could be set manually centroided(odmse, msLevel = 1) <- FALSE centroided(odmse, msLevel = 2) <- TRUE centroided(odmse, msLevel = 3) <- TRUE ## or when reading the data odmse2 <- readMSData(f, centroided = c(FALSE, TRUE, TRUE), mode = "onDisk") table(centroided(odmse), msLevel(odmse)) ## Filtering precursor scans head(acquisitionNum(odmse)) head(msLevel(odmse)) ## Extract all spectra stemming from the first MS1 spectrum (from1 <- filterPrecursorScan(odmse, 21945)) table(msLevel(from1)) ## Extract the second sepctrum's parent (MS1) and children (MS3) ## spectra (from2 <- filterPrecursorScan(odmse, 21946)) table(msLevel(from2))
This method performs a peak picking on individual spectra
(Spectrum
instances) or whole experiments (MSnExp
instances) to
create centroided spectra.
For noisy spectra there are currently two different noise estimators
available, the Median Absolute Deviation (method = "MAD"
) and
Friedman's Super Smoother (method = "SuperSmoother"
),
as implemented in the MALDIquant::detectPeaks
and
MALDIquant::estimateNoise
functions respectively.
The method supports also to optionally refine the m/z value of the identified centroids by considering data points that belong (most likely) to the same mass peak. The m/z value is calculated as an intensity weighted average of the m/z values within the peak region. How the peak region is defined depends on the method chosen:
refineMz = "kNeighbors"
: m/z values (and their respective
intensities) of the2 * k
closest signals to the centroid are
used in the intensity weighted average calculation. The number of
neighboring signals can be defined with the argument k
.
refineMz = "descendPeak"
: the peak region is defined by
descending from the identified centroid/peak on both sides until the
measured signal increases again. Within this defined region all
measurements with an intensity of at least signalPercentage
of
the centroid's intensity are used to calculate the refined m/z. By
default the descend is stopped when the first signal that is equal or
larger than the last observed one is encountered. Setting
stopAtTwo = TRUE
, two consecutively increasing signals are
required.
By default (refineMz = "none"
, simply the m/z of the largest
signal (the identified centroid) is reported. See below for examples.
signature(x = "MSnExp", halfWindowSize = "integer",
method = "character", SNR = "numeric", verbose = "logical",
refineMz = "character", ...)
Performs the peak picking for all spectra in an MSnExp
instance.
method
could be "MAD"
or "SuperSmoother"
.
halfWindowSize
controls the window size of the peak picking
algorithm. The resulting window size is 2 * halfWindowSize + 1
.
The size should be nearly (or slightly larger) the FWHM
(full width at half maximum).
A local maximum is considered as peak if its intensity is SNR
times larger than the estimated noise.
refineMz
allows to choose a method for an optional centroid
m/z refinement (see description for more details). Choises are
"none"
(default, no m/z refinement), "kNeighbors"
and "descendPeak"
.
The arguments ...
are passed to the noise estimator or
m/z refinement functions.
For the noise estimator functions, currenlty only the
method = "SuperSmoother"
accepts additional arguments,
e.g. span
. Please see supsmu
for
details. refineMethod = "kNeighbors"
supports additional
argument k
and refineMethod = "descendPeak"
arguments signalPercentage
and stopAtTwo
. See
description above for more details.
This method displays a progress bar if verbose = TRUE
.
Returns an MSnExp
instance with centroided spectra.
signature(x = "Spectrum", method = "character",
halfWindowSize = "integer", ...)
Performs the peak picking for the spectrum (Spectrum
instance).
This method is the same as above but returns a centroided Spectrum
instead of an MSnExp
object. It has no verbose
argument.
Please read the details for the above MSnExp
method.
Sebastian Gibb <[email protected]> with contributions from Johannes Rainer.
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28: 2270-2271. http://strimmerlab.org/software/maldiquant/
clean
, removePeaks
smooth
,
estimateNoise
and trimMz
for other spectra
processing methods.
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11, centroided = FALSE) sp2 <- pickPeaks(sp1) intensity(sp2) data(itraqdata) itraqdata2 <- pickPeaks(itraqdata) processingData(itraqdata2) ## Examples for refineMz: ints <- c(5, 3, 2, 3, 1, 2, 4, 6, 8, 11, 4, 7, 5, 2, 1, 0, 1, 0, 1, 1, 1, 0) mzs <- 1:length(ints) sp1 <- new("Spectrum1", intensity = ints, mz = mzs, centroided = FALSE) plot(mz(sp1), intensity(sp1), type = "h") ## Without m/z refinement: sp2 <- pickPeaks(sp1) points(mz(sp2), intensity(sp2), col = "darkgrey") ## Using k = 1, closest signals sp3 <- pickPeaks(sp1, refineMz = "kNeighbors", k = 1) points(mz(sp3), intensity(sp3), col = "green", type = "h") ## Using descendPeak requiring at least 50% or the centroid's intensity sp4 <- pickPeaks(sp1, refineMz = "descendPeak", signalPercentage = 50) points(mz(sp4), intensity(sp4), col = "red", type = "h")
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11, centroided = FALSE) sp2 <- pickPeaks(sp1) intensity(sp2) data(itraqdata) itraqdata2 <- pickPeaks(itraqdata) processingData(itraqdata2) ## Examples for refineMz: ints <- c(5, 3, 2, 3, 1, 2, 4, 6, 8, 11, 4, 7, 5, 2, 1, 0, 1, 0, 1, 1, 1, 0) mzs <- 1:length(ints) sp1 <- new("Spectrum1", intensity = ints, mz = mzs, centroided = FALSE) plot(mz(sp1), intensity(sp1), type = "h") ## Without m/z refinement: sp2 <- pickPeaks(sp1) points(mz(sp2), intensity(sp2), col = "darkgrey") ## Using k = 1, closest signals sp3 <- pickPeaks(sp1, refineMz = "kNeighbors", k = 1) points(mz(sp3), intensity(sp3), col = "green", type = "h") ## Using descendPeak requiring at least 50% or the centroid's intensity sp4 <- pickPeaks(sp1, refineMz = "descendPeak", signalPercentage = 50) points(mz(sp4), intensity(sp4), col = "red", type = "h")
These methods provide the functionality to plot mass spectrometry data
provided as MSnExp
,
OnDiskMSnExp
or Spectrum
objects. Most functions plot mass spectra M/Z values against
intensities.
Full spectra (using the full
parameter) or specific peaks of
interest can be plotted using the reporters
parameter. If
reporters
are specified and full
is set to 'TRUE', a
sub-figure of the reporter ions is inlaid inside the full spectrum.
If an "MSnExp"
is provided as argument, all the
spectra are aligned vertically. Experiments can be subset to
extract spectra of interest using the [
operator or
extractPrecSpectra
methods.
Most methods make use the ggplot2
system in which case an
object of class 'ggplot' is returned invisibly.
If a single "Spectrum2"
and a "character"
representing a valid peptide sequence are passed as argument, the
expected fragement ions are calculated and matched/annotated on the
spectum plot.
x |
Objects of class |
y |
Missing, |
reporters |
An object of class
|
full |
Logical indicating whether full spectrum (respectively
spectra) of only reporter ions of interest should be
plotted. Default is 'FALSE', in which case |
centroided. |
Logical indicating if spectrum or spectra are in centroided mode, in which case peaks are plotted as histograms, rather than curves. |
plot |
Logical specifying whether plot should be printed to current device. Default is 'TRUE'. |
w1 |
Width of sticks for full centroided spectra. Default is to use maximum MZ value divided by 500. |
w2 |
Width of histogram bars for centroided reporter ions plots. Default is 0.01. |
See below for more details.
plot(signature(x = "MSnExp", y = "missing"),
type = c("spectra", "XIC"), reporters = "ReporterIons",
full = "logical", plot = "logical", ...)
For type = "spectra"
: Plots all the spectra in the
MSnExp
object vertically. One of reporters
must be
defined or full
set to 'TRUE'. In case of MSnExp
objects, repoter ions are not inlaid when full
is 'TRUE'.
For type = "XIC"
: Plots a combined plot of retention time
against m/z values and retention time against largest signal per
spectrum for each file. Data points are colored by intensity. The
lower part of the plot represents the location of the individual
signals in the retention time - m/z space, the upper part the base
peak chromatogram of the data (i.e. the largest signal for each
spectrum). This plot type is restricted to MS level 1 data and is
most useful for LC-MS data.
Ideally, the MSnExp
(or OnDiskMSnExp
)
object should be filtered first using the filterRt
and filterMz
functions to narrow on an ion of
interest. See examples below. This plot uses base R
plotting. Additional arguments to the plot
function can be
passed with ...
.
Additional arguments for type = "XIC"
are:
col
color for the border of the points. Defaults to
col = "grey"
.
colramp
color function/ramp to be used for the
intensity-dependent background color of data points. Defaults
to colramp = topo.colors
.
grid.color
color for the grid lines. Defaults to
grid.color = "lightgrey"
; use grid.color = NA
to
disable grid lines altogether.
pch
point character. Defaults to pch = 21
.
...
additional parameters for the low-level
plot
function.
plot(signature(x = "Spectrum", y = "missing"), reporters =
"ReporterIons", full = "logical", centroided. = "logical", plot =
"logical", w1, w2)
Displays the MZs against intensities of
the Spectrum
object as a line plot.
At least one of reporters
being defined or full
set to 'TRUE' is required.
reporters
and full
are used only for
"Spectrum2"
objects. Full "Spectrum1"
spectra are plotted
by default.
plot(signature(x = "Spectrum2", y = "character"), orientation
= "numeric", add = "logical", col = "character", pch, xlab =
"character", ylab = "character", xlim = "numeric", ylim =
"numeric", tolerance = "numeric", relative = "logical", type =
"character", modifications = "numeric", x = "numeric", fragments
= "data.frame", fragments.cex = "numeric", ... )
Plots a single
MS2 spectrum and annotates the fragment ions based on the
matching between the peaks in x
and the fragment peaks
calculated from the peptide sequence y
. The default
values are orientation=1
, add=FALSE
,
col="#74ADD1"
, pch=NA
, xlab="m/z"
,
ylab="intensity"
, ylim=c(0, 1)
,
tolerance=25e-6
, relative=TRUE, type=c("b", "y"),
modifications=c(C=160.030649)
, z=1
,
fragments=MSnbase:::calculateFragments_Spectrum2
and
fragments.cex=0.75
. Additional arguments ...
are
passed to plot.default
.
Laurent Gatto, Johannes Rainer and Sebastian Gibb
calculateFragments
to calculate ions produced by
fragmentation and plot.Spectrum.Spectrum
to plot and
compare 2 spectra and their shared peaks.
Chromatogram
for plotting of chromatographic data.
data(itraqdata) ## plotting experiments plot(itraqdata[1:2], reporters = iTRAQ4) plot(itraqdata[1:2], full = TRUE) ## plotting spectra plot(itraqdata[[1]],reporters = iTRAQ4, full = TRUE) itraqdata2 <- pickPeaks(itraqdata) i <- 14 s <- as.character(fData(itraqdata2)[i, "PeptideSequence"]) plot(itraqdata2[[i]], s, main = s) ## Load profile-mode LC-MS files library(msdata) od <- readMSData(dir(system.file("sciex", package = "msdata"), full.names = TRUE), mode = "onDisk") ## Restrict the MS data to signal for serine serine <- filterMz(filterRt(od, rt = c(175, 190)), mz = c(106.04, 106.06)) plot(serine, type = "XIC") ## Same plot but using heat.colors, rectangles and no point border plot(serine, type = "XIC", pch = 22, colramp = heat.colors, col = NA)
data(itraqdata) ## plotting experiments plot(itraqdata[1:2], reporters = iTRAQ4) plot(itraqdata[1:2], full = TRUE) ## plotting spectra plot(itraqdata[[1]],reporters = iTRAQ4, full = TRUE) itraqdata2 <- pickPeaks(itraqdata) i <- 14 s <- as.character(fData(itraqdata2)[i, "PeptideSequence"]) plot(itraqdata2[[i]], s, main = s) ## Load profile-mode LC-MS files library(msdata) od <- readMSData(dir(system.file("sciex", package = "msdata"), full.names = TRUE), mode = "onDisk") ## Restrict the MS data to signal for serine serine <- filterMz(filterRt(od, rt = c(175, 190)), mz = c(106.04, 106.06)) plot(serine, type = "XIC") ## Same plot but using heat.colors, rectangles and no point border plot(serine, type = "XIC", pch = 22, colramp = heat.colors, col = NA)
These method plot mass spectra MZ values against the intensities as line plots. The first spectrum is plotted in the upper panel and the other in upside down in the lower panel. Common peaks are drawn in a slightly darker colour. If a peptide sequence is provided it automatically calculates and labels the fragments.
x |
Object of class |
y |
Object of class |
... |
Further arguments passed to internal functions. |
signature(x = "Spectrum", y = "Spectrum", ...)
Plots two spectra against each other. Common peaks are drawn in a slightly
darker colour.
The ...
arguments are passed to the internal functions.
Currently tolerance
, relative
, sequences
and most of
the plot.default
arguments (like xlim
, ylim
,
main
, xlab
, ylab
, ...) are supported.
You could change the tolerance
(default 25e-6
) and
decide whether this tolerance should be applied relative
(default relative = TRUE
) or absolute (relative = FALSE
)
to find and colour common peaks.
Use a character
vector of length 2 to provide sequences
which would be used to calculate and draw the corresponding fragments.
If sequences
are given the
type
argument (default: type=c("b", "y")
specify the
fragment types which should calculated. Also it is possible to allow some
modifications
. Therefore you have to apply a named character
vector for modifications
where the name corresponds to the
one-letter-code of the modified amino acid
(default: Carbamidomethyl modifications=c(C=57.02146)
). Additional
you can specifiy the type of neutralLoss
(default:
PSMatch::defaultNeutralLoss()
).
See calculateFragments
for details.
There are a lot of graphical arguments available to control the
representation of the peaks and fragments. Use peaks.pch
to set the
character on top of the peaks (default: peaks.pch=19
). In a similar
way you can set the line width peaks.lwd=1
and the magnification
peaks.cex=0.5
of the peaks. The size of the fragment/legend labels could
be set using fragments.cex=0.75
or legend.cex
respectively.
See par
for details about graphical parameters in general.
Sebastian Gibb <[email protected]>
More spectrum plotting available in plot.Spectrum
.
More details about fragment calculation: calculateFragments
.
## find path to a mzXML file file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## create basic MSnExp msexp <- readMSData(file, centroided.=FALSE) ## centroid them msexp <- pickPeaks(msexp) ## plot the first against the second spectrum plot(msexp[[1]], msexp[[2]]) ## add sequence information plot(msexp[[1]], msexp[[2]], sequences=c("VESITARHGEVLQLRPK", "IDGQWVTHQWLKK")) itraqdata2 <- pickPeaks(itraqdata) (k <- which(fData(itraqdata2)[, "PeptideSequence"] == "TAGIQIVADDLTVTNPK")) mzk <- precursorMz(itraqdata2)[k] zk <- precursorCharge(itraqdata2)[k] mzk * zk plot(itraqdata2[[k[1]]], itraqdata2[[k[2]]])
## find path to a mzXML file file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") ## create basic MSnExp msexp <- readMSData(file, centroided.=FALSE) ## centroid them msexp <- pickPeaks(msexp) ## plot the first against the second spectrum plot(msexp[[1]], msexp[[2]]) ## add sequence information plot(msexp[[1]], msexp[[2]], sequences=c("VESITARHGEVLQLRPK", "IDGQWVTHQWLKK")) itraqdata2 <- pickPeaks(itraqdata) (k <- which(fData(itraqdata2)[, "PeptideSequence"] == "TAGIQIVADDLTVTNPK")) mzk <- precursorMz(itraqdata2)[k] zk <- precursorCharge(itraqdata2)[k] mzk * zk plot(itraqdata2[[k[1]]], itraqdata2[[k[2]]])
These methods plot the retention time vs. precursor MZ for the whole
"MSnExp"
experiment. Individual dots will be
colour-coded to describe individual spectra's peaks count, total ion
count, precursor charge (MS2 only) or file of origin.
The methods make use the ggplot2
system. An object of class
'ggplot' is returned invisibly.
object |
An object of class |
z |
A character indicating according to what variable to colour the dots. One of, possibly abreviated, "ionCount" (total ion count), "file" (raw data file), "peaks.count" (peaks count) or "charge" (precursor charge). |
alpha |
Numeric [0,1] indicating transparence level of points. |
plot |
A logical indicating whether the plot should be printed (default is 'TRUE'). |
signature(object = "MSnExp", ...)
Plots a 'MSnExp' summary.
signature(object = "data.frame", ...)
Plots a summary of the 'MSnExp' experiment described by the data frame.
Laurent Gatto
The plotDensity
and plotMzDelta
methods
for other QC plots.
itraqdata plot2d(itraqdata,z="ionCount") plot2d(itraqdata,z="peaks.count") plot2d(itraqdata,z="charge")
itraqdata plot2d(itraqdata,z="ionCount") plot2d(itraqdata,z="peaks.count") plot2d(itraqdata,z="charge")
These methods plot the distribution of several parameters of interest
for the different precursor charges for "MSnExp"
experiment.
The methods make use the ggplot2
system. An object of class
'ggplot' is returned invisibly.
object |
An object of class |
z |
A character indicating which parameter's densitiy to plot. One of, possibly abreviated, "ionCount" (total ion count), "peaks.count" (peaks count) or "precursor.mz" (precursor MZ). |
log |
Logical, whether to log transform the data (default is 'FALSE'). |
plot |
A logical indicating whether the plot should be printed (default is 'TRUE'). |
signature(object = "MSnExp", ...)
Plots a 'MSnExp' summary.
signature(object = "data.frame", ...)
Plots a summary of the 'MSnExp' experiment described by the data frame.
Laurent Gatto
The plot2d
and plotDensity
methods for
other QC plots.
itraqdata plotDensity(itraqdata,z="ionCount") plotDensity(itraqdata,z="peaks.count") plotDensity(itraqdata,z="precursor.mz")
itraqdata plotDensity(itraqdata,z="ionCount") plotDensity(itraqdata,z="peaks.count") plotDensity(itraqdata,z="precursor.mz")
The m/z delta plot illustrates the suitability of MS2 spectra for identification by plotting the m/z differences of the most intense peaks. The resulting histogram should optimally shown outstanding bars at amino acid residu masses. The plots have been described in Foster et al 2011.
Only a certain percentage of most intense MS2 peaks are taken into
account to use the most significant signal. Default value is 10% (see
percentage
argument). The difference between peaks is then
computed for all individual spectra and their distribution is plotted
as a histogram where single bars represent 1 m/z differences. Delta
m/z between 40 and 200 are plotted by default, to encompass the
residue masses of all amino acids and several common contaminants,
although this can be changes with the xlim
argument.
In addition to the processing described above, isobaric reporter tag
peaks (see the reporters
argument) and the precursor peak (see
the precMz
argument) can also be removed from the MS2 spectrum,
to avoid interence with the fragment peaks.
Note that figures in Foster et al 2011 have been produced and optimised for centroided data. Application of the plot as is for data in profile mode has not been tested thoroughly, although the example below suggest that it might work.
The methods make use the ggplot2
system. An object of class
ggplot
is returned invisibly.
Most of the code for plotMzDelta has kindly been contributed by Guangchuang Yu.
object |
An object of class |
reporters |
An object of class class
|
subset |
A numeric between 0 and 1 to use a subset of
|
percentage |
The percentage of most intense peaks to be used for the plot. Default is 0.1. |
precMz |
A |
precMzWidth |
A |
bw |
A |
xlim |
A |
withLabels |
A |
size |
A |
plot |
A |
verbose |
A |
signature(object = "MSnExp", ...)
Plots and (invisibly) returns the m/z delta histogram.
Laurent Gatto and Guangchuang Yu
Foster JM, Degroeve S, Gatto L, Visser M, Wang R, Griss J, Apweiler R, Martens L. "A posteriori quality control for the curation and reuse of public proteomics data." Proteomics, 2011 Jun;11(11):2182-94. doi:10.1002/pmic.201000602. Epub 2011 May 2. PMID: 21538885
The plotDensity
and plot2d
methods for
other QC plots.
mzdplot <- plotMzDelta(itraqdata, subset = 0.5, reporters = iTRAQ4, verbose = FALSE, plot = FALSE) ## let's retrieve peptide sequence information ## and get a table of amino acids peps <- as.character(fData(itraqdata)$PeptideSequence) aas <- unlist(strsplit(peps,"")) ## table of aas table(aas) ## mzDelta plot print(mzdplot)
mzdplot <- plotMzDelta(itraqdata, subset = 0.5, reporters = iTRAQ4, verbose = FALSE, plot = FALSE) ## let's retrieve peptide sequence information ## and get a table of amino acids peps <- as.character(fData(itraqdata)$PeptideSequence) aas <- unlist(strsplit(peps,"")) ## table of aas table(aas) ## mzDelta plot print(mzdplot)
These methods produce plots that illustrate missing data.
is.na
returns the expression matrix of it MSnSet
argument as a matrix of logicals referring whether the corresponding
cells are NA
or not. It is generally used in conjunction with
table
and image
(see example below).
The plotNA
method produces plots that illustrate missing data.
The completeness of the full dataset or a set of proteins (ordered by
increasing NA content along the x axis) is represented.
The methods make use the ggplot2
system. An object of class
'ggplot' is returned invisibly.
signature(x = "MSnSet")
Returns the a matrix of logicals of dimensions dim(x)
specifiying if respective values are missing in the
MSnSet
's expression matrix.
signature(object = "MSnSet", pNA = "numeric")
Plots missing data for an MSnSet
instance. pNA
is a
numeric
of length 1 that specifies the percentage
of accepted missing data values per features. This value will be
highlighted with a point on the figure, illustrating the overall
percentage of NA values in the full data set and the number of
proteins retained. Default is 1/2.
Laurent Gatto
See also the filterNA
method to filter out features with
a specified proportion if missing values.
data(msnset) exprs(msnset)[sample(prod(dim(msnset)), 120)] <- NA head(is.na(msnset)) table(is.na(msnset)) image(msnset) plotNA(msnset, pNA = 1/4)
data(msnset) exprs(msnset)[sample(prod(dim(msnset)), 120)] <- NA head(is.na(msnset)) table(is.na(msnset)) image(msnset) plotNA(msnset, pNA = 1/4)
precSelection
computes the number of selection events each
precursor ions has undergone in an tandem MS experiment. This will be
a function of amount of peptide loaded, chromatography efficiency,
exclusion time,... and is useful when optimising and experimental
setup. This function returns a named integer vector or length equal to
the number of unique precursor MZ values in the original
experiment. See n
parameter to set the number of MZ significant
decimals.
precSelectionTable
is a wrapper around precSelection
and
returns a table with the number of single, 2-fold, ... selection events.
precSelection(object,n)
precSelection(object,n)
object |
An instane of class |
n |
The number of decimal places to round the precursor MZ to. Is passed to the round function. |
A named integer in case of precSelection
and a table
for
precSelectionTable
.
Laurent Gatto
precSelection(itraqdata) precSelection(itraqdata,n=2) precSelectionTable(itraqdata) ## only single selection event in this reduced exeriment
precSelection(itraqdata) precSelection(itraqdata,n=2) precSelectionTable(itraqdata) ## only single selection event in this reduced exeriment
The ProcessingStep
class is a simple object to encapsule all
relevant information of a data analysis processing step, i.e. the
function name and all arguments.
Objects of this class are mainly used to record all possible
processing steps of an OnDiskMSnExp
object for
later lazy execution.
Objects can be created by calls of the form
new("ProcessingStep",...)
or using the ProcessingStep
constructor function.
FUN
:The function name to be executed as a character string.
ARGS
:A named list
with all arguments to the function.
Execute the processing step object
. Internally this
calls do.call
passing all arguments defined in the
ProcessingStep
object
along with potential
additional arguments in ...
to the function
object@FUN
.
Class "Versioned"
, directly.
Johannes Rainer <[email protected]>
## Define a simple ProcessingStep procS <- ProcessingStep("sum", list(c(1, 3, NA, 5), na.rm= TRUE)) executeProcessingStep(procS)
## Define a simple ProcessingStep procS <- ProcessingStep("sum", list(c(1, 3, NA, 5), na.rm= TRUE)) executeProcessingStep(procS)
Container for high-throughput mass-spectrometry assays and
experimental metadata. This class is based on Biobase's
"eSet"
virtual class, with the notable exception
that 'assayData' slot is an environment contain objects of class
"Spectrum"
.
A virtual Class: No objects may be created from it.
See "MSnExp"
for instantiatable sub-classes.
assayData
:Object of class "environment"
containing the MS spectra (see "Spectrum1"
and "Spectrum2"
).
phenoData
:Object of class
"AnnotatedDataFrame"
containing
experimenter-supplied variables describing sample (i.e the
individual tags for an labelled MS experiment)
See phenoData
for more details.
featureData
:Object of class
"AnnotatedDataFrame"
containing variables
describing features (spectra in our case), e.g. identificaiton data,
peptide sequence, identification score,... (inherited from
"eSet"
). See featureData
for
more details.
experimentData
:Object of class
"MIAPE"
, containing details of experimental
methods. See experimentData
for more details.
protocolData
:Object of class
"AnnotatedDataFrame"
containing
equipment-generated variables (inherited from
"eSet"
). See protocolData
for
more details.
processingData
:Object of class
"MSnProcess"
that records all processing.
.cache
:Object of class environment
used to
cache data. Under development.
.__classVersion__
:Object of class
"Versions"
describing the versions of the class.
Class "VersionedBiobase"
, directly.
Class "Versioned"
, by class "VersionedBiobase", distance 2.
Methods defined in derived classes may override the methods described here.
signature(x = "pSet")
: Subset current object and
return object of same class.
signature(x = "pSet")
: Direct access to individual
spectra.
signature(x = "pSet")
: directly access a specific
sample annotation column from the pData
.
signature(x = "pSet")
: replace or add a
sample annotation column in the pData
.
Access abstract in experimentData
.
signature(object = "pSet")
: Access the
assayData
slot. Returns an environment
.
signature(x = "pSet")
: Synonymous with
experimentData.
signature(x = "pSet")
: Returns the dimensions of
the phenoData
slot.
signature(x = "pSet")
: Access details
of experimental methods.
signature(x = "pSet")
: Access the
featureData
slot.
signature(x = "pSet")
: Access feature data
information.
signature(x = "pSet")
: Coordinate access
of feature names (e.g spectra, peptides or proteins) in
assayData
slot.
signature(object = "pSet")
: Access file
names in the processingData
slot.
signature(object = "pSet")
: Access raw data
file indexes (to be found in the processingData
slot) from
which the individual object's spectra where read from.
signature(object = "pSet")
: Indicates
whether individual spectra are centroided ('TRUE') of uncentroided
('FALSE'). Use centroided(object) <- value
to update a
whole experiment, ensuring that object
and value
have the same length.
signature(object = "pSet")
: Indicates
whether individual spectra are smoothed ('TRUE') of unsmoothed
('FALSE'). Use smoothed(object) <- value
to update a
whole experiment, ensuring that object
and value
have the same length.
signature(x = "pSet")
: Access metadata
describing features reported in fData
.
signature(x = "pSet")
: Access variable
labels in featureData
.
signature(x = "pSet")
: Returns the number of
features in the assayData
slot.
signature(x = "pSet")
: Retrieve and
unstructured notes associated with pSet
in the
experimentData
slot.
signature(x = "pSet")
: Access sample data
information.
signature(x = "pSet", value)
: Replace sample data
information with value
, value being a data.frame
.
signature(x = "pSet")
: Access the
phenoData
slot.
signature(x = "pSet", value)
: Replace
sample data information with value
. value
can be a
data.frame
or an AnnotatedDataFrame
.
signature(object = "pSet")
: Access the
processingData
slot.
signature(x = "pSet")
: Access the
protocolData
slot.
signature(x = "pSet")
: Access PMIDs in
experimentData
.
signature(x = "pSet")
: Access sample names
in phenoData
. A replacement method is also available.
signature(x = "pSet", ...)
: Access the
assayData
slot, returning the features as a list
.
Additional arguments are currently ignored.
signature(x = "pSet")
: Access metadata
describing variables reported in pData
.
signature(x = "pSet")
: Access variable
labels in phenoData
.
signature(object = "pSet")
: Accessor
for spectra acquisition numbers.
signature(object = "pSet")
: Accessor
for spectra scan indices.
signature(object = "pSet")
: Accessor
for MS2 spectra collision energies.
signature(object = "pSet", ...)
: Accessor
for spectra instenities, returned as named list. Additional
arguments are currently ignored.
signature(object = "pSet")
: Prints the MIAPE-MS
meta-data stored in the experimentData
slot.
signature(object = "pSet")
: Accessor for
spectra MS levels.
signature(object = "pSet", ...)
: Accessor for spectra
M/Z values, returned as a named list. Additional arguments are
currently ignored.
signature(object = "pSet")
: Accessor for
spectra preak counts.
signature(object = "pSet", scans =
"numeric")
: Accessor to scans
spectra preak counts.
signature(object = "pSet")
: Accessor for MS1
spectra polarities.
signature(object = "pSet")
: Accessor
for MS2 precursor charges.
signature(object = "pSet")
: Accessor
for MS2 precursor intensity.
signature(object = "pSet")
: Accessor
for MS2 precursor M/Z values.
signature(object = "pSet")
: Accessor
for MS2 precursor scan numbers.
see precAcquisitionNum
.
signature(object = "pSet", ...)
: Accessor for spectra
retention times. Additional arguments are currently ignored.
signature(object = "pSet", ...)
: Accessor for spectra
total ion counts. Additional arguments are currently ignored.
signature(object = "pSet")
: Accessor for spectra
total ion current.
signature(object = "pSet")
: Returns a data
frame containing all available spectra parameters (MSn only).
signature(object = "pSet", scans = "numeric")
:
Returns a data frame containing scans
spectra parameters
(MSn only).
spectrapply(signature(object = "pSet"), FUN = NULL,
BPPARAM = bpparam(), ...)
: applies the function FUN
to each
spectrum passing additional parameters in ...
to that
function and return its results. For FUN = NULL
it returns
the list of spectra (same as a call to spectra
). Parameter
BPPARAM
allows to specify how and if parallel processing
should be enabled.
Returns a list with the result for each of spectrum.
isolationWindowLowerMz(object = "pSet")
: return the
lower m/z boundary for the isolation window. Note that this method
is at present only available for OnDiskMSnExp
objects.
isolationWindowUpperMz(object = "pSet")
: return the
upper m/z boundary for the isolation window. Note that this method
is at present only available for OnDiskMSnExp
objects.
Additional accessors for the experimental metadata
(experimentData
slot) are defined. See
"MIAPE"
for details.
Laurent Gatto
The "eSet"
class, on which pSet
is based.
"MSnExp"
for an instantiatable application of
pSet
.
showClass("pSet")
showClass("pSet")
Manufacturers sometimes provide purity correction values indicating the percentages of each reporter ion that have masses differing by +/- n Da from the nominal reporter ion mass due to isotopic variants. This correction is generally applied after reporter peaks quantitation.
Purity correction here is applied using solve
from the
base
package using the purity correction values as coefficient of
the linear system and the reporter quantities as the right-hand side
of the linear system. 'NA' values are ignored and negative
intensities after correction are also set to 'NA'.
A more elaborated purity correction method is described in Shadforth et al., i-Tracker: for quantitative proteomics using iTRAQ. BMC Genomics. 2005 Oct 20;6:145. (PMID 16242023).
Function makeImpuritiesMatrix(x, filename, edit = TRUE)
helps
the user to create such a matrix. The function can be used in two ways.
If given an integer x
, it is used as the dimension of the
square matrix (i.e the number of reporter ions). For TMT6-plex and
iTRAQ4-plex, default values taken from manufacturer's certification
sheets are used as templates, but batch specific values should be used
whenever possible. Alternatively, the filename
of a csv
spreadsheet can be provided. The sheet should define the correction
factors as illustrated below (including reporter names in the first
column and header row) and the corresponding correction matrix is
calculated. Examples of such csv
files are available in the
package's extdata
directory. Use
dir(system.file("extdata", package = "MSnbase"), pattern =
"PurityCorrection", full.names = TRUE)
to locate them.
If edit = TRUE
, the the matrix can be edited before
it is returned.
object |
An object of class |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
impurities |
A square 'matrix' of dim equal to ncol(object) defining the correction coefficients to be applied. The reporter ions should be ordered along the columns and the relative percentages along the rows. As an example, below is the correction factors as provided in an ABI iTRAQ 4-plex certificate of analysis:
The impurity table will be
where, the diagonal is computed as 100 - sum of rows of the original table and subsequent cells are directly filled in. Similarly, for TMT 6-plex tags, we observe
and obtain the following impurity correction matrix
For iTRAQ 8-plex, given the following correction factors (to make such a matrix square, if suffices to add -4, -3, +3 and +4 columns filled with zeros):
we calculate the impurity correction matrix shown below
Finally, for a TMT 10-plex impurity matrix (for example lot RH239932)
(Note that a previous example, taken from lot PB199188A, contained a typo.) the impurity correction matrix is
These examples are provided as defaults impurity correction matrices
in |
signature(object = "MSnSet", impurities = "matrix")
## quantifying full experiment data(msnset) impurities <- matrix(c(0.929,0.059,0.002,0.000, 0.020,0.923,0.056,0.001, 0.000,0.030,0.924,0.045, 0.000,0.001,0.040,0.923), nrow=4, byrow = TRUE) ## or, using makeImpuritiesMatrix() ## Not run: impurities <- makeImpuritiesMatrix(4) msnset.crct <- purityCorrect(msnset, impurities) head(exprs(msnset)) head(exprs(msnset.crct)) processingData(msnset.crct) ## default impurity matrix for iTRAQ 8-plex makeImpuritiesMatrix(8, edit = FALSE) ## default impurity matrix for TMT 10-plex makeImpuritiesMatrix(10, edit = FALSE)
## quantifying full experiment data(msnset) impurities <- matrix(c(0.929,0.059,0.002,0.000, 0.020,0.923,0.056,0.001, 0.000,0.030,0.924,0.045, 0.000,0.001,0.040,0.923), nrow=4, byrow = TRUE) ## or, using makeImpuritiesMatrix() ## Not run: impurities <- makeImpuritiesMatrix(4) msnset.crct <- purityCorrect(msnset, impurities) head(exprs(msnset)) head(exprs(msnset.crct)) processingData(msnset.crct) ## default impurity matrix for iTRAQ 8-plex makeImpuritiesMatrix(8, edit = FALSE) ## default impurity matrix for TMT 10-plex makeImpuritiesMatrix(10, edit = FALSE)
This method quantifies individual "Spectrum"
objects or full "MSnExp"
experiments. Current,
MS2-level isobar tagging using iTRAQ and TMT (or any arbitrary peaks
of interest, see "ReporterIons"
) and MS2-level
label-free quantitation (spectral counting, spectral index or spectral
abundance factor) are available.
Isobaric tag peaks of single spectra or complete experiments can be
quantified using appropriate methods
. Label-free quantitation
is available only for MSnExp
experiments.
Since version 1.13.5, parallel quantitation is supported by the
BiocParallel
package and controlled by the BPPARAM
argument.
object |
An instance of class |
method |
Peak quantitation method. For isobaric tags, one of, possibly
abreviated For label-free quantitation, one of Finally, the simple |
reporters |
An instance of class |
strict |
For isobaric tagging only. If strict is |
BPPARAM |
Support for parallel processing using the |
parallel |
Deprecated. Please see |
qual |
Should the |
pepseq |
A |
verbose |
Verbose of the output (only for |
... |
Further arguments passed to the quantitation functions. |
"ReporterIons"
define specific MZ at which peaks
are expected and a window around that MZ value. A peak of interest is
searched for in that window. Since version 1.1.2, warnings are not
thrown anymore in case no data is found in that region or if the peak
extends outside the window. This can be checked manually after
quantitation, by inspecting the quantitation data (using the
exprs
accessor) for NA
values or by comaring the
lowerMz
and upperMz
columns in the
"MSnSet"
qual
slot against the respective
expected mz(reporters)
+/- width(reporters)
.
Once the range of the curve is found, quantification is performed. If
no data points are found in the expected region, NA
is returned
for the reporter peak MZ.
Note that for label-free, spectra that have not been identified (the
corresponding fields in the feature data are populated with NA
values) or that have been uniquely assigned to a protein (the
nprot
feature data is greater that 1) are removed prior to
quantitation. The latter does not apply for method = "count"
but can be applied manually with
removeMultipleAssignment
.
signature(object = "MSnExp", method = "character", reporters
= "ReporterIons", verbose = "logical", ...)
For isobaric tagging, quantifies peaks defined in reporters
using method
in all spectra of the MSnExp
object. If
verbose is set to TRUE
, a progress bar will be displayed.
For label-free quantitation, the respective quantitation methods
and normalisations are applied to the spectra. These methods
require two additional arguments (...
), namely the protein
accession of identifiers (fcol
, with detault value
"DatabaseAccess"
) and the protein lengths (plength
,
with default value "DBseqLength"
). These values are
available of the identification data had been collated using
addIdentificationData
.
An object of class "MSnSet"
is returned
containing the quantified feature expression and all meta data
inherited from the MSnExp
object
argument.
signature(object = "Spectrum", method = "character",
reporters = "ReporterIons")
Quantifies peaks defined in reporters
using method
in the Spectrum
object (isobaric tagging only).
A list of length 2 will be returned. The first element, named
peakQuant
, is a 'numeric' of length equal to
length(reporters)
with quantitation of the reporter peaks
using method
.
The second element, names curveStats
, is a 'data.frame' of
dimension length(reporters)
times 7 giving, for each
reporter curve parameters: maximum intensity ('maxInt'
),
number of maxima ('nMaxInt'
), number of data points defined
the curve ('baseLength'
), lower and upper MZ values for the
curve ('lowerMz'
and 'upperMz'
), reporter
('reporter'
) and precursor MZ value ('precursor'
)
when available.
Laurent Gatto and Sebastian Gibb
For details about the spectral index (SI), see Griffin NM, Yu J, Long F, Oh P, Shore S, Li Y, Koziol JA, Schnitzer JE. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat Biotechnol. 2010 Jan;28(1):83-9. doi: 10.1038/nbt.1592. PMID: 20010810; PubMed Central PMCID: PMC2805705.
For details about the spectra abundance factor, see Paoletti AC, Parmely TJ, Tomomori-Sato C, Sato S, Zhu D, Conaway RC, Conaway JW, Florens L, Washburn MP. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. PNAS. 2006 Dec 12;103(50):18928-33. PMID: 17138671; PubMed Central PMCID: PMC1672612.
## Quantifying a full experiment using iTRAQ4-plex tagging data(itraqdata) msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4) msnset ## specifying a custom parallel framework ## bp <- MulticoreParam(2L) # on Linux/OSX ## bp <- SnowParam(2L) # on Windows ## quantify(itraqdata[1:10], method = "trap", iTRAQ4, BPPARAM = bp) ## Checking for non-quantified peaks sum(is.na(exprs(msnset))) ## Quantifying a single spectrum qty <- quantify(itraqdata[[1]], method = "trap", iTRAQ4[1]) qty$peakQuant qty$curveStats ## Label-free quantitation ## Raw (mzXML) and identification (mzid) files quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") msexp <- readMSData(quantFile) msexp <- addIdentificationData(msexp, identFile) fData(msexp)$DatabaseAccess si <- quantify(msexp, method = "SIn") processingData(si) exprs(si) saf <- quantify(msexp, method = "NSAF") processingData(saf) exprs(saf)
## Quantifying a full experiment using iTRAQ4-plex tagging data(itraqdata) msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4) msnset ## specifying a custom parallel framework ## bp <- MulticoreParam(2L) # on Linux/OSX ## bp <- SnowParam(2L) # on Windows ## quantify(itraqdata[1:10], method = "trap", iTRAQ4, BPPARAM = bp) ## Checking for non-quantified peaks sum(is.na(exprs(msnset))) ## Quantifying a single spectrum qty <- quantify(itraqdata[[1]], method = "trap", iTRAQ4[1]) qty$peakQuant qty$curveStats ## Label-free quantitation ## Raw (mzXML) and identification (mzid) files quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") msexp <- readMSData(quantFile) msexp <- addIdentificationData(msexp, identFile) fData(msexp)$DatabaseAccess si <- quantify(msexp, method = "SIn") processingData(si) exprs(si) saf <- quantify(msexp, method = "NSAF") processingData(saf) exprs(saf)
Reads a mgf file and generates an "MSnExp"
object.
readMgfData(filename, pdata = NULL, centroided = TRUE, smoothed = FALSE, verbose = isMSnbaseVerbose(), cache = 1)
readMgfData(filename, pdata = NULL, centroided = TRUE, smoothed = FALSE, verbose = isMSnbaseVerbose(), cache = 1)
filename |
character vector with file name to be read. |
pdata |
an object of class |
smoothed |
|
centroided |
|
cache |
Numeric indicating caching level. Default is 1. Under development. |
verbose |
verbosity flag. |
Note that when reading an mgf file, the original order of the spectra
is lost. Thus, if the data was originally written to mgf from an
MSnExp
object using writeMgfData
, although the feature
names will be identical, the spectra are not as a result of the
reordering. See example below.
An instance of
Guangchuang Yu and Laurent Gatto
writeMgfData
method to write the content of
"Spectrum"
or "MSnExp"
objects to mgf files. Raw data files can also be read with the
readMSData
function.
data(itraqdata) writeMgfData(itraqdata, con="itraqdata.mgf", COM="MSnbase itraqdata") itraqdata2 <- readMgfData("itraqdata.mgf") ## note that the order of the spectra is altered ## and precision of some values (precursorMz for instance) match(signif(precursorMz(itraqdata2),4),signif(precursorMz(itraqdata),4)) ## [1] 1 10 11 12 13 14 15 16 17 18 ... ## ... but all the precursors are there all.equal(sort(precursorMz(itraqdata2)), sort(precursorMz(itraqdata)), check.attributes=FALSE, tolerance=10e-5) ## is TRUE all.equal(as.data.frame(itraqdata2[[1]]),as.data.frame(itraqdata[[1]])) ## is TRUE all.equal(as.data.frame(itraqdata2[[3]]),as.data.frame(itraqdata[[11]])) ## is TRUE f <- dir(system.file(package="MSnbase",dir="extdata"), full.name=TRUE, pattern="test.mgf") (x <- readMgfData(f)) x[[2]] precursorMz(x[[2]]) precursorIntensity(x[[2]]) precursorMz(x[[1]]) precursorIntensity(x[[1]]) ## was not in test.mgf scanIndex(x)
data(itraqdata) writeMgfData(itraqdata, con="itraqdata.mgf", COM="MSnbase itraqdata") itraqdata2 <- readMgfData("itraqdata.mgf") ## note that the order of the spectra is altered ## and precision of some values (precursorMz for instance) match(signif(precursorMz(itraqdata2),4),signif(precursorMz(itraqdata),4)) ## [1] 1 10 11 12 13 14 15 16 17 18 ... ## ... but all the precursors are there all.equal(sort(precursorMz(itraqdata2)), sort(precursorMz(itraqdata)), check.attributes=FALSE, tolerance=10e-5) ## is TRUE all.equal(as.data.frame(itraqdata2[[1]]),as.data.frame(itraqdata[[1]])) ## is TRUE all.equal(as.data.frame(itraqdata2[[3]]),as.data.frame(itraqdata[[11]])) ## is TRUE f <- dir(system.file(package="MSnbase",dir="extdata"), full.name=TRUE, pattern="test.mgf") (x <- readMgfData(f)) x[[2]] precursorMz(x[[2]]) precursorIntensity(x[[2]]) precursorMz(x[[1]]) precursorIntensity(x[[1]]) ## was not in test.mgf scanIndex(x)
Reads as set of XML-based mass-spectrometry data files and
generates an MSnExp object. This function uses the
functionality provided by the mzR
package to access data and
meta data in mzData
, mzXML
and mzML
.
readMSData( files, pdata = NULL, msLevel. = NULL, verbose = isMSnbaseVerbose(), centroided. = NA, smoothed. = NA, cache. = 1L, mode = c("inMemory", "onDisk") )
readMSData( files, pdata = NULL, msLevel. = NULL, verbose = isMSnbaseVerbose(), centroided. = NA, smoothed. = NA, cache. = 1L, mode = c("inMemory", "onDisk") )
files |
A |
pdata |
An object of class AnnotatedDataFrame or
|
msLevel. |
MS level spectra to be read. In |
verbose |
Verbosity flag. Default is to use
|
centroided. |
A |
smoothed. |
A |
cache. |
Numeric indicating caching level. Default is 0 for
MS1 and 1 MS2 (or higher). Only relevant for |
mode |
On of |
When using the inMemory
mode, the whole MS data is read from
file and kept in memory as Spectrum objects within the
MSnExp'es assayData
slot.
To reduce the memory footpring especially for large MS1 data sets
it is also possible to read only selected information from the MS
files and fetch the actual spectrum data (i.e. the M/Z and
intensity values) only on demand from the original data
files. This can be achieved by setting mode = "onDisk"
. The
function returns then an OnDiskMSnExp object instead of a
MSnExp object.
An MSnExp object for inMemory
mode and a
OnDiskMSnExp object for onDisk
mode.
readMSData
uses normalizePath
to replace relative with
absolute file paths.
Laurent Gatto
readMgfData()
to read mgf
peak lists.
file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") mem <- readMSData(file, mode = "inMemory") mem dsk <- readMSData(file, mode = "onDisk") dsk
file <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") mem <- readMSData(file, mode = "inMemory") mem dsk <- readMSData(file, mode = "onDisk") dsk
This function reads data files to generate an
MSnSet
instance. It is a wrapper around
Biobase
's readExpressionSet
function with an
additional featureDataFile
parameter to include feature data.
See also readExpressionSet
for more details.
readMSnSet2
is a simple version that takes a single text
spreadsheet as input and extracts the expression data and feature
meta-data to create and MSnSet
.
Note that when using readMSnSet2
, one should not set
rownames
as additional argument to defined feature names. It is
ignored and used to set fnames
if not provided otherwise.
readMSnSet(exprsFile, phenoDataFile, featureDataFile, experimentDataFile, notesFile, path, annotation, exprsArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, ...), phenoDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), featureDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), experimentDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), sep = "\t", header = TRUE, quote = "", stringsAsFactors = FALSE, row.names = 1L, widget = getOption("BioC")$Base$use.widgets, ...) readMSnSet2(file, ecol, fnames, ...)
readMSnSet(exprsFile, phenoDataFile, featureDataFile, experimentDataFile, notesFile, path, annotation, exprsArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, ...), phenoDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), featureDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), experimentDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...), sep = "\t", header = TRUE, quote = "", stringsAsFactors = FALSE, row.names = 1L, widget = getOption("BioC")$Base$use.widgets, ...) readMSnSet2(file, ecol, fnames, ...)
Arguments direclty passed to readExpressionSet
. The description
is from the readExpressionSet
documentation page.
exprsFile |
(character) File or connection from which to read
expression values. The file should contain a matrix with rows as
features and columns as samples. |
phenoDataFile |
(character) File or connection from which to read
phenotypic data. |
experimentDataFile |
(character) File or connection from which to
read experiment data. |
notesFile |
(character) File or connection from which to read
notes; |
path |
(optional) directory in which to find all the above files. |
annotation |
(character) A single character string indicating the annotation associated with this ExpressionSet. |
exprsArgs |
A list of arguments to be used with
|
phenoDataArgs |
A list of arguments to be used (with
|
experimentDataArgs |
A list of arguments to be used (with
|
sep , header , quote , stringsAsFactors , row.names
|
arguments used
by the |
widget |
A boolean value indicating whether widgets can be
used. Widgets are NOT yet implemented for
|
... |
Further arguments that can be passed on to the
|
Additional argument, specific to readMSnSet
:
featureDataFile |
(character) File or connection from which to read
feature data. |
featureDataArgs |
A list of arguments to be used (with
|
Arguments for readMSnSet2
:
file |
A Passing a |
ecol |
A |
fnames |
An optional |
An instance of the MSnSet
class.
Laurent Gatto <[email protected]>
The grepEcols
and getEcols
helper
functions to identify the ecol
values. The MSnbase-io
vignette illustrates these functions in detail. It can be accessed
with vignette("MSnbase-io")
.
## Not run: exprsFile <- "path_to_intensity_file.csv" fdatafile <- "path_to_featuredata_file.csv" pdatafile <- "path_to_sampledata_file.csv" ## Read ExpressionSet with appropriate parameters res <- readMSnSet(exprsFile, pdataFile, fdataFile, sep = "\t", header=TRUE) ## End(Not run) library("pRolocdata") f0 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "Dunkley2006") basename(f0) res <- readMSnSet2(f0, ecol = 5:20) res head(exprs(res)) ## columns 5 to 20 head(fData(res)) ## other columns
## Not run: exprsFile <- "path_to_intensity_file.csv" fdatafile <- "path_to_featuredata_file.csv" pdatafile <- "path_to_sampledata_file.csv" ## Read ExpressionSet with appropriate parameters res <- readMSnSet(exprsFile, pdataFile, fdataFile, sep = "\t", header=TRUE) ## End(Not run) library("pRolocdata") f0 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "Dunkley2006") basename(f0) res <- readMSnSet2(f0, ecol = 5:20) res head(exprs(res)) ## columns 5 to 20 head(fData(res)) ## other columns
Reads as set of mzId
files containing PSMs an generates a
data.frame
.
readMzIdData(files)
readMzIdData(files)
files |
A |
This function uses the functionality provided by the mzR
package
to access data in the mzId
files. An object of class mzRident
can also be coerced to a data.frame
using as(, "data.frame")
.
A data.frame
containing the PSMs stored in the mzId
files.
Laurent Gatto
filterIdentificationDataFrame()
to filter out unreliable PSMs.
idf <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid" f <- msdata::ident(full.names = TRUE, pattern = idf) basename(f) readMzIdData(f)
idf <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid" f <- msdata::ident(full.names = TRUE, pattern = idf) basename(f) readMzIdData(f)
This function can be used to create an
"MSnSet"
by reading and parsing an
mzTab
file. The metadata section is always used to populate
the MSnSet
's experimentData()@other$mzTab
slot.
readMzTabData( file, what = c("PRT", "PEP", "PSM"), version = c("1.0", "0.9"), verbose = isMSnbaseVerbose() )
readMzTabData( file, what = c("PRT", "PEP", "PSM"), version = c("1.0", "0.9"), verbose = isMSnbaseVerbose() )
file |
A |
what |
One of |
version |
A |
verbose |
Produce verbose output. |
An instance of class MSnSet
.
Laurent Gatto
See MzTab
and MSnSetList
for
details about the inners of readMzTabData
.
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt" prot <- readMzTabData(testfile, "PRT") prot head(fData(prot)) head(exprs(prot)) psms <- readMzTabData(testfile, "PSM") psms head(fData(psms))
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt" prot <- readMzTabData(testfile, "PRT") prot head(fData(prot)) head(exprs(prot)) psms <- readMzTabData(testfile, "PSM") psms head(fData(psms))
This function can be used to create a "MSnSet"
by reading and parsing an mzTab
file. The metadata section
is always used to populate the MSnSet
's experimentData
slot.
readMzTabData_v0.9(file, what = c("PRT", "PEP"), verbose = isMSnbaseVerbose())
readMzTabData_v0.9(file, what = c("PRT", "PEP"), verbose = isMSnbaseVerbose())
file |
A |
what |
One of |
verbose |
Produce verbose output. |
An instance of class MSnSet
.
Laurent Gatto
writeMzTabData
to save an
"MSnSet"
as an mzTab
file.
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/legacy/jmztab-1.0/examples/mztab_itraq_example.txt" prot <- readMzTabData_v0.9(testfile, "PRT") prot pep <- readMzTabData_v0.9(testfile, "PEP") pep
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/legacy/jmztab-1.0/examples/mztab_itraq_example.txt" prot <- readMzTabData_v0.9(testfile, "PRT") prot pep <- readMzTabData_v0.9(testfile, "PEP") pep
The readSRMData
function reads MRM/SRM data from provided mzML files and
returns the results as a MChromatograms()
object.
readSRMData(files, pdata = NULL)
readSRMData(files, pdata = NULL)
files |
|
pdata |
|
readSRMData
supports reading chromatogram entries from mzML files. If
multiple files are provided the same precursor and product m/z for SRM/MRM
chromatograms are expected across files. The number of columns of the
resulting MChromatograms()
object corresponds to the number of files. Each
row in the MChromatograms
object is supposed to contain chromatograms
with same polarity, precursor and product m/z. If chromatograms with
redundant polarity, precursor and product m/z values and precursor collision
energies are found, they are placed into multiple consecutive rows in the
MChromatograms
object.
A MChromatograms()
object. See details above for more information.
readSRMData
reads only SRM/MRM chromatogram data, i.e. chromatogram data
from mzML files with precursorIsolationWindowTargetMZ
and
productIsolationWindowTargetMZ
attributes. Total ion chromatogram data is
hence not extracted.
The number of features and hence rows of the resulting MChromatograms
object depends on the total list of unique precursor and product m/z
isolation windows (and precursor collision energies) found across all input
files. In cases in which not each file has chromatgraphic data for the same
polarity, precursor m/z, product m/z and collision energy,
an empty Chromatogram()
object is reported for the specific precursor
and product m/z combination of the respective file (and a warning is
thrown).
Johannes Rainer
## Read an example MRM/SRM data library(msdata) fl <- proteomics(full.names = TRUE, pattern = "MRM") ## Read the data mrm <- readSRMData(fl) ## The data is represented as a MChromatograms object, each column ## containing the data from one input file mrm ## Access the polarity for each chromatogram (row) polarity(mrm) ## Access the precursor m/z. The result is returned as a matrix with ## columns representing the minimum and maximum m/z (will be identical in ## most cases). precursorMz(mrm) ## Access the product m/z. productMz(mrm) ## Plot one chromatogram plot(mrm[1, ])
## Read an example MRM/SRM data library(msdata) fl <- proteomics(full.names = TRUE, pattern = "MRM") ## Read the data mrm <- readSRMData(fl) ## The data is represented as a MChromatograms object, each column ## containing the data from one input file mrm ## Access the polarity for each chromatogram (row) polarity(mrm) ## Access the precursor m/z. The result is returned as a matrix with ## columns representing the minimum and maximum m/z (will be identical in ## most cases). precursorMz(mrm) ## Access the product m/z. productMz(mrm) ## Plot one chromatogram plot(mrm[1, ])
Reduce a data.frame so that the (primary) key column contains only unique entries and other columns pertaining to that entry are combined into semicolon-separated values into a single row/observation.
An important side-effect of reducing a data.frame
is that all
columns other than the key are converted to characters when they
are collapsed to a semi-column separated value (even if only one
value is present) as soon as one observation of transformed.
## S4 method for signature 'data.frame' reduce(x, key, sep = ";")
## S4 method for signature 'data.frame' reduce(x, key, sep = ";")
x |
A |
key |
The column name (currenly only one is supported) to be used as primary key. |
sep |
The separator. Default is |
A reduced data.frame
.
Laurent Gatto
dfr <- data.frame(A = c(1, 1, 2), B = c("x", "x", "z"), C = LETTERS[1:3]) dfr dfr2 <- reduce(dfr, key = "A") dfr2 ## column A used as key is still num str(dfr2) dfr3 <- reduce(dfr, key = "B") dfr3 ## A is converted to chr; B remains factor str(dfr3) dfr4 <- data.frame(A = 1:3, B = LETTERS[1:3], C = c(TRUE, FALSE, NA)) ## No effect of reducing, column classes are maintained str(reduce(dfr4, key = "B"))
dfr <- data.frame(A = c(1, 1, 2), B = c("x", "x", "z"), C = LETTERS[1:3]) dfr dfr2 <- reduce(dfr, key = "A") dfr2 ## column A used as key is still num str(dfr2) dfr3 <- reduce(dfr, key = "B") dfr3 ## A is converted to chr; B remains factor str(dfr3) dfr4 <- data.frame(A = 1:3, B = LETTERS[1:3], C = c(TRUE, FALSE, NA)) ## No effect of reducing, column classes are maintained str(reduce(dfr4, key = "B"))
The method removes non-identifed features in MSnExp
and MSnSet
instances using relevant information from the
feaureData
slot of a user-provide filtering vector of logicals.
signature(object = "MSnExp", fcol = "pepseq", keep =
NULL)
Removes the feature from object
that have a
feature fcol
(default is "pepseq"
) equal to
NA
. Alternatively, one can also manually define
keep
, a vector of logical, defining the feature to be
retained.
signature(object = "MSnSet", fcol = "pepseq", keep =
NULL)
As above of MSnSet
instances.
Laurent Gatto
quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") msexp <- readMSData(quantFile) msexp <- addIdentificationData(msexp, identFile) fData(msexp)$sequence length(msexp) ## using default fcol msexp2 <- removeNoId(msexp) length(msexp2) fData(msexp2)$sequence ## using keep print(fvarLabels(msexp)) (k <- fData(msexp)$'MS.GF.EValue' > 75) k[is.na(k)] <- FALSE k msexp3 <- removeNoId(msexp, keep = k) length(msexp3) fData(msexp3)$sequence
quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$") identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid") msexp <- readMSData(quantFile) msexp <- addIdentificationData(msexp, identFile) fData(msexp)$sequence length(msexp) ## using default fcol msexp2 <- removeNoId(msexp) length(msexp2) fData(msexp2)$sequence ## using keep print(fvarLabels(msexp)) (k <- fData(msexp)$'MS.GF.EValue' > 75) k[is.na(k)] <- FALSE k msexp3 <- removeNoId(msexp, keep = k) length(msexp3) fData(msexp3)$sequence
This method sets low intensity peaks from individual spectra
(Spectrum
instances) or whole experiments (MSnExp
instances) to 0. The intensity threshold is set with the t
parameter. Default is the "min"
character. The threshold is
then set as the non-0 minimum intensity found in the spectrum. Any
other numeric values is valid. All peaks with maximum intensity
smaller or equal to t
are set to 0.
If the spectrum is in profile mode, ranges of successive non-0 peaks
<= t
are set to 0. If the spectrum is centroided, then
individual peaks <= t
are set to 0. See the example below for
an illustration.
Note that the number of peaks is not changed; the peaks below the
threshold are set to 0 and the object is not cleanded out (see
clean
). An illustrative example is shown below.
signature(object = "MSnExp", t, verbose = "logical" )
Removes low intensity peaks of all spectra in MSnExp
object. t
sets the minimum peak intensity. Default is
"min", i.e the smallest intensity in each spectrum. Other
numeric
values are valid.
Displays a control bar if verbose set to TRUE
(default). Returns a new MSnExp
instance.
signature(object = "Spectrum", t, msLevel. =
"numeric")
Removes low intensity peaks of Spectrum
object. t
sets the minimum peak intensity. Default is
"min", i.e the smallest intensity in each spectrum. Other
numeric
values are valid. msLevel.
defines the level
of the spectrum, and if msLevel(object) != msLevel.
,
cleaning is ignored. Only relevant when called from
OnDiskMSnExp
and is only relevant for developers.
Returns a new Spectrum
instance.
Laurent Gatto
clean
and trimMz
for other spectra
processing methods.
int <- c(2, 0, 0, 0, 1, 5, 1, 0, 0, 1, 3, 1, 0, 0, 1, 4, 2, 1) sp1 <- new("Spectrum2", intensity = int, mz = 1:length(int), centroided = FALSE) sp2 <- removePeaks(sp1) ## no peaks are removed here ## as min intensity is 1 and ## no peak has a max int <= 1 sp3 <- removePeaks(sp1, 3) intensity(sp1) intensity(sp2) intensity(sp3) peaksCount(sp1) == peaksCount(sp2) peaksCount(sp3) <= peaksCount(sp1) data(itraqdata) itraqdata2 <- removePeaks(itraqdata, t = 2.5e5) table(unlist(intensity(itraqdata)) == 0) table(unlist(intensity(itraqdata2)) == 0) processingData(itraqdata2) ## difference between centroided and profile peaks int <- c(104, 57, 32, 33, 118, 76, 38, 39, 52, 140, 52, 88, 394, 71, 408, 94, 2032) sp <- new("Spectrum2", intensity = int, centroided = FALSE, mz = seq_len(length(int))) ## unchanged, as ranges of peaks <= 500 considered intensity(removePeaks(sp, 500)) stopifnot(identical(intensity(sp), intensity(removePeaks(sp, 500)))) centroided(sp) <- TRUE ## different! intensity(removePeaks(sp, 500))
int <- c(2, 0, 0, 0, 1, 5, 1, 0, 0, 1, 3, 1, 0, 0, 1, 4, 2, 1) sp1 <- new("Spectrum2", intensity = int, mz = 1:length(int), centroided = FALSE) sp2 <- removePeaks(sp1) ## no peaks are removed here ## as min intensity is 1 and ## no peak has a max int <= 1 sp3 <- removePeaks(sp1, 3) intensity(sp1) intensity(sp2) intensity(sp3) peaksCount(sp1) == peaksCount(sp2) peaksCount(sp3) <= peaksCount(sp1) data(itraqdata) itraqdata2 <- removePeaks(itraqdata, t = 2.5e5) table(unlist(intensity(itraqdata)) == 0) table(unlist(intensity(itraqdata2)) == 0) processingData(itraqdata2) ## difference between centroided and profile peaks int <- c(104, 57, 32, 33, 118, 76, 38, 39, 52, 140, 52, 88, 394, 71, 408, 94, 2032) sp <- new("Spectrum2", intensity = int, centroided = FALSE, mz = seq_len(length(int))) ## unchanged, as ranges of peaks <= 500 considered intensity(removePeaks(sp, 500)) stopifnot(identical(intensity(sp), intensity(removePeaks(sp, 500)))) centroided(sp) <- TRUE ## different! intensity(removePeaks(sp, 500))
This methods sets all the reporter tag ion peaks from one MS2
spectrum or all the MS2 spectra of an experiment to 0. Reporter data
is specified using an "ReporterIons"
instance. The peaks are selected around the expected reporter ion
m/z value +/- the reporter width.
Optionally, the spectrum/spectra can be cleaned
to
remove successive 0 intensity data points (see the clean
function for details).
Note that this method only works for MS2 spectra or experiments that contain MS2 spectra. It will fail for MS1 spectrum.
signature(object = "MSnExp", reporters = "ReporterIons",
clean = "logical", verbose = "logical" )
The reporter ion
peaks defined in the reporters
instance of all the MS2
spectra of the "MSnExp"
instance are set to 0
and, if clean
is set to TRUE
, cleaned. The default
value of reporters
is NULL
, which leaves the spectra
as unchanged. The verbose
parameter (default is
TRUE
) defines whether a progress bar should be showed.
signature(object = "Spectrum", reporters = "ReporterIons",
clean = "FALSE")
The reporter ion peaks defined in the
reporters
instance of MS2 "Spectrum"
instance are set to 0 and, if clean
is set to TRUE
,
cleaned. The default value of reporters
is NULL
,
which leaves the spectrum as unchanged.
Laurent Gatto
clean
and removePeaks
for other spectra
processing methods.
sp1 <- itraqdata[[1]] sp2 <- removeReporters(sp1,reporters=iTRAQ4) sel <- mz(sp1) > 114 & mz(sp1) < 114.2 mz(sp1)[sel] intensity(sp1)[sel] plot(sp1,full=TRUE,reporters=iTRAQ4) intensity(sp2)[sel] plot(sp2,full=TRUE,reporters=iTRAQ4)
sp1 <- itraqdata[[1]] sp2 <- removeReporters(sp1,reporters=iTRAQ4) sel <- mz(sp1) > 114 & mz(sp1) < 114.2 mz(sp1)[sel] intensity(sp1)[sel] plot(sp1,full=TRUE,reporters=iTRAQ4) intensity(sp2)[sel] plot(sp2,full=TRUE,reporters=iTRAQ4)
The ReporterIons
class allows to define a set of isobaric
reporter ions that are used for quantification in MSMS
mode, e.g. iTRAQ (isobaric tag for relative and absolute quantitation)
or TMT (tandem mass tags).
ReporterIons
instances can them be used when quantifying
"MSnExp"
data of plotting the reporters peaks
based on in "Spectrum2"
ojects.
Some reporter ions are provided with MSnbase
an can be loaded
with the data
function. These reporter ions data sets
are:
iTRAQ4
:ReporterIon
object for the iTRAQ
4-plex set. Load with data(iTRAQ4)
.
iTRAQ5
:ReporterIon
object for the iTRAQ
4-plex set plus the isobaric tag. Load with data(iTRAQ5)
.
TMT6
:ReporterIon
object for the TMT
6-plex set. Load with data(TMT6)
.
TMT7
:ReporterIon
object for the TMT
6-plex set plus the isobaric tag. Load with data(TMT6)
.
Objects can be created by calls of the form new("ReporterIons", ...)
.
name
:Object of class "character"
to identify
the ReporterIons
instance.
reporterNames
:Object of class "character"
naming each individual reporter of the ReporterIons
instance. If not provided explicitely, they are names by
concatenating the ReporterIons
name and the respective MZ
values.
description
:Object of class "character"
to
describe the ReporterIons
instance.
mz
:Object of class "numeric"
providing the MZ
values of the reporter ions.
col
:Object of class "character"
providing
colours to highlight the reporters on plots.
width
:Object of class "numeric"
indicating the
width around the individual reporter ions MZ values were to search
for peaks. This is dependent on the mass spectrometer's resolution
and is used for peak picking when quantifying the reporters. See
quantify
for more details about quantification.
.__classVersion__
:Object of class "Versions"
indicating the version of the ReporterIons
instance. Intended for developer use and debugging.
Class "Versioned"
, directly.
show(object)
Displays object content as text.
object[]
Subsets one or several reporter ions of the
ReporterIons
object and returns a new instance of the same
class.
length(object)
Returns the number of reporter ions in the instance.
mz(object, ...)
Returns the expected mz values of reporter ions. Additional arguments are currently ignored.
reporterColours(object)
or reporterColors(object)Returns the colours used to highlight the reporter ions.
reporterNames(object)
Returns the name of the individual reporter ions. If not specified or is an incorrect number of names is provided at initialisation, the names are generated automatically by concatenating the instance name and the reporter's MZ values.
reporterNames(object) <- value
Sets the reporter
names to value
, which must be a character of the same
length as the number of reporter ions.
width(object)
Returns the widths in which the reporter ion peaks are expected.
names(object)
Returns the name of the
ReporterIons
object.
description(object)
Returns the description of the
ReporterIons
object.
Laurent Gatto
Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ. "Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents." Mol Cell Proteomics, 2004 Dec;3(12):1154-69. Epub 2004 Sep 22. PubMed PMID: 15385600.
Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C. "Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS." Anal Chem. 2003 Apr 15;75(8):1895-904. Erratum in: Anal Chem. 2006 Jun 15;78(12):4235. Mohammed, A Karim A [added] and Anal Chem. 2003 Sep 15;75(18):4942. Johnstone, R [added]. PubMed PMID: 12713048.
TMT6
or iTRAQ4
for readily available examples.
## Code used for the iTRAQ4 set ri <- new("ReporterIons", description="4-plex iTRAQ", name="iTRAQ4", reporterNames=c("iTRAQ4.114","iTRAQ4.115", "iTRAQ4.116","iTRAQ4.117"), mz=c(114.1,115.1,116.1,117.1), col=c("red","green","blue","yellow"), width=0.05) ri reporterNames(ri) ri[1:2]
## Code used for the iTRAQ4 set ri <- new("ReporterIons", description="4-plex iTRAQ", name="iTRAQ4", reporterNames=c("iTRAQ4.114","iTRAQ4.115", "iTRAQ4.116","iTRAQ4.117"), mz=c(114.1,115.1,116.1,117.1), col=c("red","green","blue","yellow"), width=0.05) ri reporterNames(ri) ri[1:2]
Select feature variables to be retained.
requiredFvarLabels
returns a character
vector with the
required feature data variable names (fvarLabels
, i.e. the column
names in the fData
data.frame
) for the specified object.
selectFeatureData(object, graphics = TRUE, fcol) requiredFvarLabels(x = c("OnDiskMSnExp", "MSnExp", "MSnSet"))
selectFeatureData(object, graphics = TRUE, fcol) requiredFvarLabels(x = c("OnDiskMSnExp", "MSnExp", "MSnSet"))
object |
An |
graphics |
A |
fcol |
A |
x |
|
For selectFeatureData
: updated object containing only
selected feature variables.
For requiredFvarLabels
: character
with the required feature
variable names.
Laurent Gatto
library("pRolocdata") data(hyperLOPIT2015) ## 5 first feature variables x <- selectFeatureData(hyperLOPIT2015, fcol = 1:5) fvarLabels(x) ## Not run: ## select via GUI x <- selectFeatureData(hyperLOPIT2015) fvarLabels(x) ## End(Not run) ## Subset the feature data of an OnDiskMSnExp object to the minimal ## required columns f <- system.file("microtofq/MM14.mzML", package = "msdata") od <- readMSData(f, mode = "onDisk") ## what columns do we have? fvarLabels(od) ## Reduce the feature data data.frame to the required columns only od <- selectFeatureData(od, fcol = requiredFvarLabels(class(od))) fvarLabels(od)
library("pRolocdata") data(hyperLOPIT2015) ## 5 first feature variables x <- selectFeatureData(hyperLOPIT2015, fcol = 1:5) fvarLabels(x) ## Not run: ## select via GUI x <- selectFeatureData(hyperLOPIT2015) fvarLabels(x) ## End(Not run) ## Subset the feature data of an OnDiskMSnExp object to the minimal ## required columns f <- system.file("microtofq/MM14.mzML", package = "msdata") od <- readMSData(f, mode = "onDisk") ## what columns do we have? fvarLabels(od) ## Reduce the feature data data.frame to the required columns only od <- selectFeatureData(od, fcol = requiredFvarLabels(class(od))) fvarLabels(od)
This method smooths individual spectra (Spectrum
instances)
or whole experiments (MSnExp
instances).
Currently, the Savitzky-Golay-Smoothing (method = "SavitzkyGolay"
)
and the Moving-Average-Smoothing (method = "MovingAverage"
) are
available, as implemented in the MALDIquant::smoothIntensity
function.
Additional methods might be added at a later stage.
signature(x = "MSnExp", method = "character",
halfWindowSize = "integer", verbose = "logical", ...)
Smooths all spectra in MSnExp
. method
could be
"SavitzkyGolay"
or
"MovingAverage"
. "halfWindowSize"
controls the
window size of the filter. The resulting window size is 2 *
halfWindowSize + 1
. The best size differs depending on the
selected method
. For method = "SavitzkyGolay"
it
should be lower than FWHM of the peaks (full width at half
maximum; please find details in Bromba and Ziegler 1981). The
arguments ...
are passed to the internal functions.
For method="MovingAverage"
there is an additional weighted
argument (default: FALSE
) to indicate if the average should
be equal weight (default) or if it should have weights depending
on the distance from the center as calculated as
1/2^abs(-halfWindowSize:halfWindowSize)
with the sum
of all weigths normalized to 1.
For method="SavitzkyGolay"
an additonal argument
is polynomialOrder
(default: 3). It controls the
polynomial order of the Savitzky-Golay Filter.
This method displays a progress bar if verbose = TRUE
.
Returns an MSnExp
instance with smoothed spectra.
signature(x = "Spectrum", method = "character",
halfWindowSize = "integer", ...)
Smooths the spectrum (Spectrum
instance). This method is
the same as above but returns a smoothed Spectrum
instead
of an MSnExp
object. It has no verbose
argument. Please read the details for the above MSnExp
method.
Sebastian Gibb <[email protected]>
A. Savitzky and M. J. Golay. 1964. Smoothing and differentiation of data by simplified least squares procedures. Analytical chemistry, 36(8), 1627-1639.
M. U. Bromba and H. Ziegler. 1981. Application hints for Savitzky-Golay digital smoothing filters. Analytical Chemistry, 53(11), 1583-1586.
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28: 2270-2271. http://strimmerlab.org/software/maldiquant/
clean
, pickPeaks
, removePeaks
and
trimMz
for other spectra processing methods.
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11) sp2 <- smooth(sp1, method = "MovingAverage", halfWindowSize = 2) intensity(sp2) data(itraqdata) itraqdata2 <- smooth(itraqdata, method = "MovingAverage", halfWindowSize = 2) processingData(itraqdata2)
sp1 <- new("Spectrum1", intensity = c(1:6, 5:1), mz = 1:11) sp2 <- smooth(sp1, method = "MovingAverage", halfWindowSize = 2) intensity(sp2) data(itraqdata) itraqdata2 <- smooth(itraqdata, method = "MovingAverage", halfWindowSize = 2) processingData(itraqdata2)
Virtual container for spectrum data common to all different types of
spectra. A Spectrum
object can not be directly instanciated. Use
"Spectrum1"
and "Spectrum2"
instead.
In version 1.19.12, the polarity
slot has been added to this
class (previously in "Spectrum1"
).
msLevel
:Object of class "integer"
indicating
the MS level: 1 for MS1 level Spectrum1
objects and 2 for MSMSM
Spectrum2
objects. Levels > 2 have not been tested and will be
handled as MS2 spectra.
polarity
:Object of class "integer"
indicating
the polarity if the ion.
peaksCount
:Object of class "integer"
indicating the number of MZ peaks.
rt
:Object of class "numeric"
indicating the
retention time (in seconds) for the current ions.
tic
:Object of class "numeric"
indicating the
total ion current, as reported in the original raw data file.
acquisitionNum
:Object of class "integer"
corresponding to the acquisition number of the current spectrum.
scanIndex
:Object of class "integer"
indicating
the scan index of the current spectrum.
mz
:Object of class "numeric"
of length equal
to the peaks count (see peaksCount
slot) indicating the MZ
values that have been measured for the current ion.
intensity
:Object of class "numeric"
of same
length as mz
indicating the intensity at which each mz
datum has been measured.
centroided
:Object of class "logical"
indicating if instance is centroided ('TRUE') of uncentroided
('FALSE'). Default is NA
.
smoothed
:Object of class "logical"
indicating if instance is smoothed ('TRUE') of unsmoothed
('FALSE'). Default is NA
.
fromFile
:Object of class "integer"
referencing
the file the spectrum originates. The file names are stored in the
processingData
slot of the "MSnExp"
or
"MSnSet"
instance that contains the current
"Spectrum"
instance.
.__classVersion__
:Object of class "Versions"
indicating the version of the Spectrum
class. Intended for
developer use and debugging.
Class "Versioned"
, directly.
acquisitionNum(object)
Returns the acquisition number of the spectrum as an integer.
scanIndex(object)
Returns the scan index of the spectrum as an integer.
centroided(object)
Indicates whether spectrum is
centroided (TRUE
), in profile mode (FALSE
), or
unkown (NA
).
isCentroided(object, k=0.025, qtl=0.9)
A heuristic
assessing if a spectrum is in profile or centroided mode. The
function takes the qtl
th quantile top peaks, then
calculates the difference between adjacent M/Z value and returns
TRUE
if the first quartile is greater than k
. (See
MSnbase:::.isCentroided
for the code.) The function has
been tuned to work for MS1 and MS2 spectra and data centroided
using different peak picking algorithms, but false positives can
occur. See https://github.com/lgatto/MSnbase/issues/131 for
details. It should however be safe to use is at the experiment
level, assuming that all MS level have the same mode. See
class?MSnExp
for an example.
smoothed(object)
Indicates whether spectrum is
smoothed (TRUE
) or not (FALSE
).
centroided(object) <- value
Sets the centroided
status of the spectrum object.
smoothed(object) <- value
Sets the smoothed
status of the spectrum object.
fromFile(object)
Returns the index of the raw data file from which the current instances originates as an integer.
intensity(object)
Returns an object of class
numeric
containing the intensities of the spectrum.
msLevel(object)
Returns an MS level of the spectrum as an integer.
mz(object, ...)
Returns an object of class
numeric
containing the MZ value of the spectrum
peaks. Additional arguments are currently ignored.
peaksCount(object)
Returns the number of peaks (possibly of 0 intensity) as an integer.
rtime(object, ...)
Returns the retention time for the spectrum as an integer. Additional arguments are currently ignored.
ionCount(object)
Returns the total ion count for the spectrum as a numeric.
tic(object, ...)
Returns the total ion current for
the spectrum as a numeric. Additional arguments are currently
ignored. This is the total ion current as originally reported in
the raw data file. To get the current total ion count, use
ionCount
.
signature(object = "Spectrum")
: Bins Spectrum.
See bin
documentation for more details and examples.
signature(object = "Spectrum")
: Removes unused 0
intensity data points. See clean
documentation
for more details and examples.
signature(x = "Spectrum",
y = "Spectrum")
: Compares spectra. See
compareSpectra
documentation for more details and
examples.
signature(object = "Spectrum")
: Estimates the
noise in a profile spectrum.
See estimateNoise
documentation for more
details and examples.
signature(object = "Spectrum")
: Performs the peak
picking to generate a centroided spectrum.
See pickPeaks
documentation for more
details and examples.
signature(x = "Spectrum", y = "missing")
: Plots
intensity against mz.
See plot.Spectrum
documentation for more details.
signature(x = "Spectrum", y = "Spectrum")
: Plots
two spectra above/below each other.
See plot.Spectrum.Spectrum
documentation for more
details.
signature(x = "Spectrum", y = "character")
: Plots
an MS2 level spectrum and its highlight the fragmention peaks.
See plot.Spectrum.character
documentation for more
details.
signature(object = "Spectrum")
: Quatifies
defined peaks in the spectrum.
See quantify
documentation for more details.
signature(object = "Spectrum")
: Remove
peaks lower that a threshold t
. See
removePeaks
documentation for more details and
examples.
signature(x = "Spectrum")
: Smooths spectrum.
See smooth
documentation for more details and examples.
signature(object = "Spectrum")
: Displays object
content as text.
signature(object = "Spectrum")
: Trims the MZ
range of all the spectra of the MSnExp
instance. See
trimMz
documentation for more details and
examples.
signature(x = "Spectrum")
: Checks if the
x
is an empty Spectrum
.
signature(object = "Spectrum", "data.frame")
:
Coerces the Spectrum
object to a two-column
data.frame
containing intensities and MZ values.
This is a virtual class and can not be instanciated directly.
Laurent Gatto
Instaciable sub-classes "Spectrum1"
and
"Spectrum2"
for MS1 and MS2 spectra.
Spectrum1
extends the "Spectrum"
class and
introduces an MS1 specific attribute in addition to the slots in
"Spectrum"
. Spectrum1
instances are not
created directly but are contained in the assayData
slot of an
"MSnExp"
.
See the "Spectrum"
class for inherited slots.
Class "Spectrum"
, directly.
Class "Versioned"
, by class "Spectrum", distance 2.
See "Spectrum"
for additional accessors and
methods to process Spectrum1
objects.
polarity(object)
Returns the polarity of the spectrum as an integer.
Laurent Gatto
Virtual super-class "Spectrum"
,
"Spectrum2"
for MS2 spectra and
"MSnExp"
for a full experiment container.
Spectrum2
extends the "Spectrum"
class and
introduces several MS2 specific attributes in addition to the slots in
"Spectrum"
. Since version 1.99.2, this class is
used for any MS levels > 1. Spectrum2
are not created directly
but are contained in the assayData
slot of an
"MSnExp"
.
In version 1.19.12, the polarity
slot had been added to the
"Spectrum"
class (previously in
"Spectrum1"
). Hence, "Spectrum2"
objects
created prior to this change will not be valid anymore, since they
will miss the polarity
slots. Object can be appropriately
updated using the updateObject
method.
See the "Spectrum"
class for inherited slots.
merged
:Object of class "numeric"
indicating of
how many combination the current spectrum is the result of.
precScanNum
:Object of class "integer"
indicating
the precursor MS scan index in the original input file. Accessed
with the precScanNum
or precAcquisitionNum
methods.
precursorMz
:Object of class "numeric"
providing the precursor ion MZ value.
precursorIntensity
:Object of class "numeric"
providing the precursor ion intensity.
precursorCharge
:Object of class "integer"
indicating the precursor ion charge.
collisionEnergy
:Object of class "numeric"
indicating the collision energy used to fragment the parent ion.
Class "Spectrum"
, directly.
Class "Versioned"
, by class "Spectrum", distance 2.
See "Spectrum"
for additional accessors and
methods for Spectrum2
objects.
precursorMz(object)
Returns the precursor MZ value as a numeric.
precursorMz(object)
Returns the precursor scan number in the original data file as an integer.
precursorIntensity(object)
Returns the precursor intensity as a numeric.
precursorCharge(object)
Returns the precursor intensity as a integer.
collisionEnergy(object)
Returns the collision energy as an numeric.
removeReporters(object, ...)
Removes all reporter ion
peaks. See removeReporters
documentation for more
details and examples.
precAcquisitionNum
:Returns the precursor's acquisition number.
precScanNum
:See precAcquisitionNum
.
signature(sequence = "character",
object = "Spectrum2")
:
Calculates and matches the theoretical fragments of a peptide
sequence
with the ones observed in a spectrum.
See calculateFragments
documentation
for more details and examples.
Laurent Gatto
Virtual super-class "Spectrum"
,
"Spectrum1"
for MS1 spectra and
"MSnExp"
for a full experiment container.
This instance of class "ReporterIons"
corresponds
to the TMT 6-plex set, i.e the 126, 127, 128, 129, 130 and 131
isobaric tags. In the TMT7
data set, an unfragmented tag, i.e
reporter and attached isobaric tag, is also included at MZ 229. A
second TMT6b
has slightly different values.
The TMT10
instance corresponds to the 10-plex version. There
are spectific HCD (TMT10HCD
, same as TMT10
) and ETD
(TMT10ETD
) sets.
These objects are used to plot the reporter ions of interest in an
MSMS spectra (see "Spectrum2"
) as well as for
quantification (see quantify
).
TMT6 TMT6b TMT7 TMT7b TMT10 TMT10ETD TMT10HCD TMT11 TMT11HCD
TMT6 TMT6b TMT7 TMT7b TMT10 TMT10ETD TMT10HCD TMT11 TMT11HCD
Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C. "Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS." Anal Chem. 2003 Apr 15;75(8):1895-904. Erratum in: Anal Chem. 2006 Jun 15;78(12):4235. Mohammed, A Karim A [added] and Anal Chem. 2003 Sep 15;75(18):4942. Johnstone, R [added]. PubMed PMID: 12713048.
TMT6 TMT6[1:2] TMT10 newReporter <- new("ReporterIons", description="an example", name="my reporter ions", reporterNames=c("myrep1","myrep2"), mz=c(121,122), col=c("red","blue"), width=0.05) newReporter
TMT6 TMT6[1:2] TMT10 newReporter <- new("ReporterIons", description="an example", name="my reporter ions", reporterNames=c("myrep1","myrep2"), mz=c(121,122), col=c("red","blue"), width=0.05) newReporter
This method selects a range of MZ values in a single spectrum
(Spectrum
instances) or all the spectra of an experiment
(MSnExp
instances). The regions to trim are defined by the
range of mz
argument, such that MZ values <= min(mz)
and
MZ values >= max(mz)
are trimmed away.
signature(object = "MSnExp", mz = "numeric", msLevel. =
"numeric")
Trims all spectra in MSnExp
object according
to mz
. If msLevel.
is defined, then only spectra of
that level are trimmer.
signature(object = "Spectrum", mz = "numeric",
msLevel. = "numeric")
Trims the Spectrum
object and
retruns a new trimmed object. msLevel.
defines the level
of the spectrum, and if msLevel(object) != msLevel.
,
cleaning is ignored. Only relevant when called from
OnDiskMSnExp
and is only relevant for developers.
Laurent Gatto
removePeaks
and clean
for other spectra
processing methods.
mz <- 1:100 sp1 <- new("Spectrum2", mz = mz, intensity = abs(rnorm(length(mz)))) sp2 <- trimMz(sp1, c(25, 75)) range(mz(sp1)) range(mz(sp2)) data(itraqdata) itraqdata2 <- filterMz(itraqdata, c(113, 117)) range(mz(itraqdata)) range(mz(itraqdata2)) processingData(itraqdata2)
mz <- 1:100 sp1 <- new("Spectrum2", mz = mz, intensity = abs(rnorm(length(mz)))) sp2 <- trimMz(sp1, c(25, 75)) range(mz(sp1)) range(mz(sp2)) data(itraqdata) itraqdata2 <- filterMz(itraqdata, c(113, 117)) range(mz(itraqdata)) range(mz(itraqdata2)) processingData(itraqdata2)
Methods for function updateObject
for objects from the MSnbase
package. See updateObject
for details.
signature(object = "MSnExp")
Update the MSnExp
object to the latest class version
signature(object = "Spectrum")
Update the
Spectrum
object (and it's sub-classes Spectrum1
and
Spectrum2
) to the latest class version.
Methods writeMgfData
write individual
"Spectrum"
instances of whole
"MSnExp"
experiments to a file
in Mascot Generic Format (mgf) (see
http://www.matrixscience.com/help/data_file_help.html
for more details). Function readMgfData
read spectra from and
mgf file and creates an "MSnExp"
object.
object |
|
con |
A valid |
COM |
Optional character vector with the value for the 'COM' field. |
TITLE |
Optional character vector with the value for the spectrum 'TITLE' field. Not applicable for experiments. |
Note that when reading an mgf file, the original order of the spectra
is lost. Thus, if the data was originally written to mgf from an
MSnExp
object using writeMgfData
, although the feature
names will be identical, the spectra are not as a result of the
reordering. See example below.
signature(object = "MSnExp")
Writes the full exeriment to an mgf file.
signature(object = "Spectrum")
Writes an individual spectrum to an mgf file.
readMgfData
function to read data from and mgf file.
data(itraqdata) f <- tempfile() writeMgfData(itraqdata, con = f) itraqdata2 <- readMgfData(f) ## note that the order of the spectra and precision of some values ## (precursorMz for instance) are altered match(signif(precursorMz(itraqdata2),4), signif(precursorMz(itraqdata),4)) ## [1] 1 10 11 12 13 14 15 16 17 18 ... ## ... but all the precursors are there all.equal(sort(precursorMz(itraqdata2)), sort(precursorMz(itraqdata)), check.attributes = FALSE, tolerance = 10e-5) all.equal(as.data.frame(itraqdata2[[1]]), as.data.frame(itraqdata[[1]])) all.equal(as.data.frame(itraqdata2[[3]]), as.data.frame(itraqdata[[11]])) all(featureNames(itraqdata2) == featureNames(itraqdata))
data(itraqdata) f <- tempfile() writeMgfData(itraqdata, con = f) itraqdata2 <- readMgfData(f) ## note that the order of the spectra and precision of some values ## (precursorMz for instance) are altered match(signif(precursorMz(itraqdata2),4), signif(precursorMz(itraqdata),4)) ## [1] 1 10 11 12 13 14 15 16 17 18 ... ## ... but all the precursors are there all.equal(sort(precursorMz(itraqdata2)), sort(precursorMz(itraqdata)), check.attributes = FALSE, tolerance = 10e-5) all.equal(as.data.frame(itraqdata2[[1]]), as.data.frame(itraqdata[[1]])) all.equal(as.data.frame(itraqdata2[[3]]), as.data.frame(itraqdata[[11]])) all(featureNames(itraqdata2) == featureNames(itraqdata))
The writeMSData,MSnExp
and writeMSData,OnDiskMSnExp
saves
the content of a MSnExp or OnDiskMSnExp object to MS file(s) in
either mzML or mzXML format.
## S4 method for signature 'MSnExp,character' writeMSData( object, file, outformat = c("mzml", "mzxml"), merge = FALSE, verbose = isMSnbaseVerbose(), copy = FALSE, software_processing = NULL )
## S4 method for signature 'MSnExp,character' writeMSData( object, file, outformat = c("mzml", "mzxml"), merge = FALSE, verbose = isMSnbaseVerbose(), copy = FALSE, software_processing = NULL )
object |
|
file |
|
outformat |
|
merge |
|
verbose |
|
copy |
|
software_processing |
optionally provide specific data processing steps.
See documentation of the |
The writeMSData
method uses the proteowizard libraries through
the mzR
package to save the MS data. The data can be written to
mzML or mzXML files with or without copying additional metadata
information from the original files from which the data was read by the
readMSData()
function. This can be set using the copy
parameter.
Note that copy = TRUE
requires the original files to be available and
is not supported for input files in other than mzML or mzXML format.
All metadata related to the run is copied, such as instrument
information, data processings etc. If copy = FALSE
only processing
information performed in R (using MSnbase
) are saved to the mzML file.
Currently only spectrum data is supported, i.e. if the original mzML file contains also chromatogram data it is not copied/saved to the new mzML file.
General spectrum data such as total ion current, peak count, base peak m/z or base peak intensity are calculated from the actual spectrum data before writing the data to the files.
For MSn data, if the OnDiskMSnExp
or MSnExp
does not contain also
the precursor scan of a MS level > 1 spectrum (e.g. due to filtering on
the MS level) precursorScanNum
is set to 0 in the output file to
avoid potentially linking to a wrong spectrum.
The exported mzML
file should be valid according to the mzML 1.1.2
standard. For exported mzXML
files it can not be guaranteed that they
are valid and can be opened with other software than mzR
/MSnbase
.
Johannes Rainer
writeMzTabData
exports an MzTab object as mzTab file. Note
that the comment section "COM" are not written out.
writeMzTabData( object, file, what = c("MT", "PEP", "PRT", "PSM", "SML", "SMF", "SME") )
writeMzTabData( object, file, what = c("MT", "PEP", "PRT", "PSM", "SML", "SMF", "SME") )
object |
MzTab object, either read in by |
file |
|
what |
|
Steffen Neumann