Title: | Workflow to process tandem MS files and build MassBank records |
---|---|
Description: | Workflow to process tandem MS files and build MassBank records. Functions include automated extraction of tandem MS spectra, formula assignment to tandem MS fragments, recalibration of tandem MS spectra with assigned fragments, spectrum cleanup, automated retrieval of compound information from Internet databases, and export to MassBank records. |
Authors: | Michael Stravs, Emma Schymanski, Steffen Neumann, Erik Mueller, Paul Stahlhofen, Tobias Schulze with contributions of Hendrik Treutler |
Maintainer: | RMassBank at Eawag <[email protected]> |
License: | Artistic-2.0 |
Version: | 3.17.0 |
Built: | 2024-12-19 02:57:12 UTC |
Source: | https://github.com/bioc/RMassBank |
Shifts both 'parent' and 'children' spectra of the 'RmbSpectraSet' by the same mass.
## S4 method for signature 'RmbSpectraSet,ANY'
e1 - e2
e1 |
a 'RmbSpectraSet' object containing zero or more 'children' spectra and a 'parent' spectrum |
e2 |
a numeric mass shift |
Shifts all spectra in a 'RmbSpectrum2List' by the same mass
## S4 method for signature 'RmbSpectrum2List,ANY'
e1 - e2
e1 |
a 'RmbSpectrum2List' object containing zero or more 'RmbSpectrum2' spectra |
e2 |
a numeric mass shift |
Add a negative mass shift to a spectrum
## S4 method for signature 'Spectrum,numeric'
e1 - e2
e1 |
a 'MSnbase::Spectrum' object |
e2 |
a numeric mass shift |
Parses a title for a single MassBank record using the title format specified in the option titleFormat. Internally used, not exported.
.parseTitleString(mbdata)
mbdata |
list
The information data block for the record header, as stored in
|
If the option is not set, a standard title format is used (for record definition version 1 or 2).
A string with the title.
Michael Stravs, Eawag
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
## Not run:
# used in buildRecord()
title <- .parseTitleString(mbdata)
## End(Not run)
TODO: consider whether to add functionality to move reanalysis stuff from legacy data back in.
.updateObject.RmbSpectrum2.formulaSource(w)
w |
RmbSpectrum2 The object to be updated |
The updated RmbSpectrum2 object
stravsmi
Shifts both 'parent' and 'children' spectra of the 'RmbSpectraSet' by the same mass.
## S4 method for signature 'RmbSpectraSet,ANY'
e1 + e2
e1 |
a 'RmbSpectraSet' object containing zero or more 'children' spectra and a 'parent' spectrum |
e2 |
a numeric mass shift |
Shifts all spectra in a 'RmbSpectrum2List' by the same mass
## S4 method for signature 'RmbSpectrum2List,ANY'
e1 + e2
e1 |
a 'RmbSpectrum2List' object containing zero or more 'RmbSpectrum2' spectra |
e2 |
a numeric mass shift |
Add a mass shift to a spectrum
## S4 method for signature 'Spectrum,numeric'
e1 + e2
e1 |
a 'MSnbase::Spectrum' object |
e2 |
a numeric mass shift |
Add, subtract, and multiply molecular formulas.
add.formula(f1, f2, as.formula = TRUE, as.list = FALSE)
multiply.formula(f1, n, as.formula = TRUE, as.list = FALSE)
f1 , f2
|
Molecular formulas (in list form or in text form) to calculate with. |
as.formula |
Return the result as a text formula (e.g.
|
as.list |
Return the result in list format (e.g. |
n |
Multiplier (positive or negative, integer or non-integer.) |
Note that the results are not checked for plausibility at any stage, nor reordered.
The resulting formula, as specified above.
Michael Stravs
formulastring.to.list, is.valid.formula, order.formula
add.formula("C6H12O6", "C3H3")
add.formula("C6H12O6", "C-3H-3")
add.formula("C6H12O6", multiply.formula("C3H3", -1))
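The arithmetic behind these calls can be illustrated in base R on named count vectors. This is only a sketch: the helper names `add_counts` and `multiply_counts` are hypothetical and do not belong to RMassBank, and the package's own list format may differ. As in the note above, nothing is checked for plausibility, so negative counts are allowed.

```r
# Hypothetical helpers: a formula is a named numeric vector of element counts.
add_counts <- function(f1, f2) {
  elements <- union(names(f1), names(f2))
  total <- setNames(numeric(length(elements)), elements)
  total[names(f1)] <- total[names(f1)] + f1
  total[names(f2)] <- total[names(f2)] + f2
  total
}

# Multiplying by a negative number turns an addition into a subtraction.
multiply_counts <- function(f1, n) f1 * n

glucose <- c(C = 6, H = 12, O = 6)
c3h3 <- c(C = 3, H = 3)

sum_result <- add_counts(glucose, c3h3)                        # C 9, H 15, O 6
diff_result <- add_counts(glucose, multiply_counts(c3h3, -1))  # C 3, H 9, O 6
```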
Adds the peaklist of a MassBank-Record to the specs of an msmsWorkspace
addMB(w, cpdID, fileName, mode)
w |
The msmsWorkspace that the peaklist should be added to. |
cpdID |
The compoundID of the compound that has been used for the record |
fileName |
The path to the record |
mode |
The ionization mode that has been used to create the record |
The msmsWorkspace with the additional peaklist from the record
Erik Mueller
## Not run: addMB("filepath_to_records/RC00001.txt") ## End(Not run)
Loads a table with additional peaks to add to the MassBank spectra. Required columns are cpdID, scan, int, mzFound, OK.
addPeaks(mb, filename_or_dataframe)
mb |
The |
filename_or_dataframe |
Filename of the csv file, or name of the R dataframe containing the peaklist. |
All peaks with OK=1 will be included in the spectra.
The mbWorkspace with loaded additional peaks.
Michael Stravs
## Not run:
addPeaks("myrun_additionalPeaks.csv")
## End(Not run)
Adds a manual peaklist in matrix-format
addPeaksManually(w, cpdID, handSpec, mode)
w |
The msmsWorkspace that the peaklist should be added to. |
cpdID |
The compoundID of the compound that has been used for the peaklist |
handSpec |
A peaklist with 2 columns, one with "mz", one with "int" |
mode |
The ionization mode that has been used for the spectrum represented by the peaklist |
The msmsWorkspace with the additional peaklist added to the right spectrum
Erik Mueller
## Not run:
handSpec <- cbind(mz=c(274.986685367956, 259.012401087427, 95.9493025990907, 96.9573002472772),
                  int=c(357, 761, 2821, 3446))
addPeaksManually(w, cpdID, handSpec)
## End(Not run)
Adds a new column of a defined type to a data.frame and initializes it to a value. The advantage of doing this over adding it with $ or [,""] is that the case nrow(o) == 0 is adequately handled and doesn't raise an error.
addProperty(o, name, type, value = NA)
## S4 method for signature 'RmbSpectrum2,character,character'
addProperty(o, name, type, value = NA)
## S4 method for signature 'data.frame,character,character'
addProperty(o, name, type, value = NA)
o |
|
name |
Name of the new column |
type |
Data type of the new column |
value |
Initial value of the new column ( |
Expanded data frame.
addProperty(o = data.frame, name = character, type = character): Add a new column to a data.frame
stravsmi
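The zero-row case mentioned in the description of addProperty can be reproduced in base R. The helper `add_column` below is a hypothetical stand-in written for this illustration, not the package's implementation; it only shows why building the column as a vector of length nrow(df) avoids the error.

```r
# Hypothetical stand-in for the described behaviour; not the package code.
add_column <- function(df, name, type, value = NA) {
  # Coerce the initial value to the requested type and repeat it once per row.
  # For a zero-row data frame this yields a zero-length vector of that type.
  df[[name]] <- rep(methods::as(value, type), nrow(df))
  df
}

empty <- data.frame(mz = numeric(0))

# Assigning a scalar with $ fails on a zero-row data frame.
direct <- tryCatch({
  empty$good <- NA
  "ok"
}, error = function(e) "error")

# The helper handles both the empty and the non-empty case.
empty_extended <- add_column(empty, "good", "logical", FALSE)
full_extended <- add_column(data.frame(mz = c(100.1, 200.2)), "dppm", "numeric")
```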
Groups an array of analyzed spectra and creates aggregated peak tables
aggregateSpectra(spec, addIncomplete=FALSE)
spec |
The |
addIncomplete |
Whether or not to include the peaks from incomplete files (files for which fewer than the maximal number of spectra are present). |
addIncomplete is relevant for recalibration. For recalibration, we want to use only high-confidence peaks, therefore we set addIncomplete to FALSE. When we want to generate a peak list for actually generating MassBank records, we want to include all peaks into the peak tables.
A summary data.frame with all peaks (possibly multiple rows for one m/z value from a spectrum, see below) with columns:
mzFound , intensity
|
Mass and intensity of the peak |
good |
if the peak passes filter criteria |
mzCalc , formula , dbe , dppm
|
calculated mass, formula, dbe and ppm deviation of the assigned formula |
formulaCount , dppmBest
|
Number of matched formulae for this m/z value, and ppm deviation of the best match |
scan , cpdID , parentScan
|
Scan number of the child and parent spectrum in the raw file, also the compound ID to which the peak belongs |
dppmRc |
ppm deviation recalculated from the aggregation function |
index |
Aggregate-table peak index, so the table can be subsetted, edited and results reinserted back into this table easily |
Further columns are later added by workflow steps 6 (electronic noise culler), 7 and 8.
Michael Stravs
## As used in the workflow:
## Not run:
w@spectra <- lapply(w@spectra, function(spec)
  analyzeMsMs(spec, mode="pH", detail=TRUE, run="recalibrated", cut=0, cut_ratio=0))
w@aggregate <- aggregateSpectra(w@spectra)
## End(Not run)
Analyzes MSMS spectra of a compound by fitting formulas to each subpeak.
analyzeMsMs(
  msmsPeaks,
  mode = "pH",
  detail = FALSE,
  run = "preliminary",
  filterSettings = getOption("RMassBank")$filterSettings,
  spectraList = getOption("RMassBank")$spectraList,
  method = "formula"
)
analyzeMsMs.formula(
  msmsPeaks,
  mode = "pH",
  detail = FALSE,
  run = "preliminary",
  filterSettings = getOption("RMassBank")$filterSettings
)
analyzeMsMs.intensity(
  msmsPeaks,
  mode = "pH",
  detail = FALSE,
  run = "preliminary",
  filterSettings = getOption("RMassBank")$filterSettings
)
msmsPeaks |
A |
mode |
Specifies the processing mode, i.e. which molecule species the
spectra contain. |
detail |
Whether detailed return information should be provided
(defaults to |
run |
|
filterSettings |
Settings for the filter parameters, by default loaded from the RMassBank settings
set with e.g.
|
spectraList |
The list of MS/MS spectra present in each data block. As also defined in the settings file. |
method |
Selects which function to actually use for data evaluation. The default "formula" runs a full analysis via formula assignment to fragment peaks. The alternative setting "intensity" calls a "mock" implementation which circumvents formula assignment and filters peaks purely based on intensity cutoffs and the satellite filtering. (In this case, the ppm and dbe related settings in filterSettings are ignored.) |
The analysis function uses Rcdk. Note that in this step, satellite peaks are removed by a simple heuristic rule (refer to the documentation of filterPeakSatellites for details.)
The processed RmbSpectraSet object.
Added (or filled, respectively, since the slots are present before) data include
complete |
whether all spectra have useful value |
empty |
whether there are no useful spectra |
children |
The processed
|
analyzeMsMs.formula(): Analyze the peaks using formula annotation
analyzeMsMs.intensity(): Analyze the peaks going only by intensity values
Michael Stravs
msmsWorkflow, filterLowaccResults, filterPeakSatellites, reanalyzeFailpeaks
## Not run:
analyzed <- analyzeMsMs(spec, "pH", TRUE)
## End(Not run)
Generates the PK$ANNOTATION entry from the peaklist obtained. This function is overridable by using the "annotator" option in the settings file.
annotator.default(annotation, formulaTag)
annotation |
A peak list to be annotated. Contains columns:
|
formulaTag |
The ion type to be added to annotated formulas ("+" or "-" usually) |
The annotated peak table. Table colnames() will be used for the titles (preferably don't use spaces in the column titles; however, no format is strictly enforced by the MassBank data format).
Michele Stravs, Eawag <[email protected]>
## Not run: annotation <- annotator.default(annotation) ## End(Not run)
Writes the results from different msmsWorkflow steps to a file.
archiveResults(w, fileName, settings = getOption("RMassBank"))
w |
The |
fileName |
The filename to store the results under. |
settings |
The settings to be stored into the msmsWorkspace image. |
# This doesn't really make a lot of sense,
# it stores an empty workspace.
RmbDefaultSettings()
w <- newMsmsWorkspace()
archiveResults(w, "narcotics.RData")
Takes a spectra block for a compound, as returned from analyzeMsMs, and an aggregated cleaned peak table, together with a MassBank information block, as stored in the infolists and loaded via loadInfolist/readMbdata and processes them to a MassBank record.
buildRecord(o, ..., cpd, mbdata, analyticalInfo, additionalPeaks)
## S4 method for signature 'RmbSpectraSet'
buildRecord(o, ..., cpd, mbdata, analyticalInfo, additionalPeaks)
## S4 method for signature 'RmbSpectrum2'
buildRecord(
  o,
  ...,
  cpd = NULL,
  mbdata = list(),
  analyticalInfo = list(),
  additionalPeaks = NULL
)
o |
|
... |
keyword arguments for intensity normalization and peak selection (see |
cpd |
|
mbdata |
list
The information data block for the record header, as stored in
|
analyticalInfo |
A list containing information for the 'AC$' section of a MassBank record, with elements 'ai, ac_lc, ac_ms' for general, LC and MS info respectively. |
additionalPeaks |
data.frame
If present, a table with additional peaks to add into the spectra.
As loaded with |
An object of the same type as was used for the input with new information added to it
Michael Stravs
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
mbWorkflow, addPeaks, toMassbank
This is a wrapper for webchem::cir_query, using the CACTUS API at https://cactus.nci.nih.gov/chemical/structure_documentation for the conversion. Before converting the CAS number, the name is checked whether it contains the word 'derivative'. If so, the conversion is stopped and NA is returned. Also, a warning will be printed in this case.
CAS2SMILES(CAS_number, name)
CAS_number |
character The CAS registry number of a compound |
name |
character The compound's name |
The API allows only one query per second. This is a hard-coded feature.
The SMILES code of the compound as character-string
pstahlhofen
SMILES_ethanol <- CAS2SMILES("64-17-5", "Ethanol")
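The guard logic described for CAS2SMILES (check the name, warn and return NA for derivatives, otherwise wait and query) can be sketched without any network access. Everything here is hypothetical: `cas_to_smiles_sketch`, the `lookup` argument and `fake_lookup` are illustration names, the real function queries CACTUS through webchem, and whether its name check is case-sensitive is an assumption.

```r
# Hypothetical sketch of the guard logic; `lookup` stands in for the web query.
cas_to_smiles_sketch <- function(cas, name, lookup, delay = 1) {
  if (grepl("derivative", name, fixed = TRUE)) {
    warning("Name contains 'derivative'; skipping conversion of ", cas)
    return(NA_character_)
  }
  Sys.sleep(delay)  # stay within the one-query-per-second limit
  lookup(cas)
}

# Offline replacement for the API call, used only for this example.
fake_lookup <- function(cas) c("64-17-5" = "CCO")[[cas]]

smiles_ethanol <- cas_to_smiles_sketch("64-17-5", "Ethanol", fake_lookup, delay = 0)
smiles_derivative <- suppressWarnings(
  cas_to_smiles_sketch("0-00-0", "Some derivative", fake_lookup, delay = 0)
)
```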
Checks for isotopes in a msmsWorkspace
checkIsotopes(
  w,
  mode = "pH",
  intensity_cutoff = 0,
  intensity_precision = "none",
  conflict = "strict",
  isolationWindow = 2,
  evalMode = "complete",
  plotSpectrum = TRUE,
  settings = getOption("RMassBank")
)
w |
A |
mode |
|
intensity_cutoff |
The cutoff (as an absolute intensity value) under which isotopic peaks shouldn't be checked for or accepted as valid. Please note: The cutoff is not hard in the sense that it interacts with the intensity_precision parameter. |
intensity_precision |
The difference that is accepted between the calculated and observed intensity of a possible isotopic peak. Further details down below. |
conflict |
Either "isotopic" (peak formulas are always chosen if they fit the requirements for an isotopic peak) or "strict" (peaks are only marked as isotopic when no formula has been assigned before). |
isolationWindow |
Half of the width of the isolation window in Da |
evalMode |
Not functional yet, but planned. Currently must be "complete". |
plotSpectrum |
A boolean specifying whether the spectrum should be plotted |
settings |
Options to be used for processing. Defaults to the options loaded via
|
text describing parameter inputs in more detail.
intensity_precision
This parameter determines how strictly the intensity values must adhere to the calculated intensity in relation to the parent peak. Options for this parameter are "none", where the intensity is irrelevant, "low", which has an error margin of 70%, and "high", where the error margin is set to 35%. The recommended setting is "low", but it can be changed to match the intensity precision of the mass spectrometer.
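One way to read these error margins is as a relative deviation from the calculated intensity. The sketch below assumes exactly that; the function `intensity_matches` and the example intensities are hypothetical, and the actual comparison inside checkIsotopes (including its interaction with intensity_cutoff) may differ.

```r
# Assumed mapping of the settings to relative error margins.
precision_margin <- c(low = 0.70, high = 0.35)

# Hypothetical check: is the observed intensity within the margin of the
# calculated (expected) isotope peak intensity?
intensity_matches <- function(observed, expected, precision = "none") {
  if (precision == "none") {
    return(rep(TRUE, length(observed)))  # intensity is irrelevant
  }
  margin <- precision_margin[[precision]]
  abs(observed - expected) <= margin * expected
}

expected <- 1000               # hypothetical calculated intensity
observed <- c(950, 1500, 200)  # hypothetical observed intensities

match_none <- intensity_matches(observed, expected, "none")  # TRUE TRUE TRUE
match_low  <- intensity_matches(observed, expected, "low")   # TRUE TRUE FALSE
match_high <- intensity_matches(observed, expected, "high")  # TRUE FALSE FALSE
```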
The msmsWorkspace with annotated isolation peaks
Michael Stravs, Eawag <[email protected]>
Erik Mueller, UFZ
Checks if a specific compound (RmbSpectraSet) was found with child spectra in the raw file (found), has a complete set of MS2 spectra with useful peaks (complete), or is empty (note: spectra are currently never marked empty; empty should mean found, but no useful peaks at all. This is not yet tested.)
checkSpectra(s, property)
## S4 method for signature 'RmbSpectraSet,character'
checkSpectra(s, property)
s |
The ( |
property |
The property to check ( |
TRUE or FALSE
checkSpectra(s = RmbSpectraSet, property = character)
:
stravsmi
Removes known electronic noise peaks from a peak table
cleanElnoise(peaks, noise = getOption("RMassBank")$electronicNoise,
    width = getOption("RMassBank")$electronicNoiseWidth)
## S4 method for signature 'data.frame,numeric,numeric'
cleanElnoise(
  peaks,
  noise = getOption("RMassBank")$electronicNoise,
  width = getOption("RMassBank")$electronicNoiseWidth
)
## S4 method for signature 'RmbSpectrum2,numeric,numeric'
cleanElnoise(
  peaks,
  noise = getOption("RMassBank")$electronicNoise,
  width = getOption("RMassBank")$electronicNoiseWidth
)
## S4 method for signature 'RmbSpectrum2List,numeric,numeric'
cleanElnoise(
  peaks,
  noise = getOption("RMassBank")$electronicNoise,
  width = getOption("RMassBank")$electronicNoiseWidth
)
## S4 method for signature 'RmbSpectraSet,numeric,numeric'
cleanElnoise(
  peaks,
  noise = getOption("RMassBank")$electronicNoise,
  width = getOption("RMassBank")$electronicNoiseWidth
)
peaks |
An aggregated peak frame as described in |
noise |
A numeric vector of known m/z of electronic noise peaks from the instrument. Defaults to the entries in the RMassBank settings. |
width |
The window for the noise peak in m/z units. Defaults to the entries in the RMassBank settings. |
Extends the aggregate data frame by column noise (logical), which is TRUE if the peak is marked as noise.
cleanElnoise(peaks = data.frame, noise = numeric, width = numeric): Remove known electronic noise peaks
cleanElnoise(peaks = RmbSpectrum2, noise = numeric, width = numeric): Remove known electronic noise peaks
cleanElnoise(peaks = RmbSpectrum2List, noise = numeric, width = numeric): Remove known electronic noise peaks
cleanElnoise(peaks = RmbSpectraSet, noise = numeric, width = numeric): Remove known electronic noise peaks
Michael Stravs
# As used in the workflow:
## Not run:
w@aggregated <- cleanElnoise(w@aggregated)
## End(Not run)
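The idea of flagging peaks that fall into a window around known noise m/z values can be sketched in base R. The function `flag_noise`, the noise positions and the peak table below are hypothetical, and the sketch assumes that `width` is the full window centred on each noise m/z; the package may interpret the width differently.

```r
# Hypothetical sketch: mark peaks that lie inside a window around a noise m/z.
flag_noise <- function(peaks, noise, width) {
  half <- width / 2  # assumption: the window is centred on the noise m/z
  peaks$noise <- vapply(
    peaks$mzFound,
    function(mz) any(abs(mz - noise) <= half),
    logical(1)
  )
  peaks
}

# Hypothetical peak table and noise positions for illustration.
peaks <- data.frame(
  mzFound = c(100.0000, 189.8250, 250.1234),
  intensity = c(5e4, 2e3, 1e5)
)
flagged <- flag_noise(peaks, noise = c(189.825, 201.725), width = 0.3)
# flagged$noise is FALSE, TRUE, FALSE
```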
Combines multiple msmsWorkspace items to one workspace which is used for multiplicity filtering.
combineMultiplicities(workspaces)
workspaces |
A vector of |
This feature is particularly meant to be used in conjunction with the confirmMode option of msmsWorkflow: a file can be analyzed with confirmMode = 0 (default) and subsequently with confirmMode = 1 (take second highest scan). The second analysis should contain "the same" spectra as the first one (but less intense) and can be used to confirm the peaks in the first spectra.
TO DO: Enable the combination of workspaces for combining e.g. multiple energy settings measured separately.
A msmsWorkspace object prepared for step 8 processing.
Stravs MA, Eawag <[email protected]>
## Not run:
w <- newMsmsWorkspace()
w@files <- c("spec1", "spec2")
w1 <- msmsWorkflow(w, steps=c(1:7), mode="pH")
w2 <- msmsWorkflow(w, steps=c(1:7), mode="pH", confirmMode = 1)
wTotal <- combineMultiplicities(c(w1, w2))
wTotal <- msmsWorkflow(wTotal, steps=8, mode="pH", archivename = "output")
# continue here with mbWorkflow
## End(Not run)
The resulting SDF will be written to a file named 'Compoundlist.sdf'. The header for each block is the chemical name, tags for ID, SMILES and CAS are added in the description block
compoundlist2SDF(filename)
filename |
character The name of the csv-file to be read. Note that the compoundlist has to be filtered already. |
This method has no return value.
pstahlhofen
## Not run: compoundlist2SDF("Compoundlist_filtered.csv") ## End(Not run)
This method will automatically look for all single-block JCAMP files in the directory by picking all files ending in '.dx' (and not '.jdx'). A csv-file named 'Compoundlist.csv' will be created in the same directory. The Compoundlist contains columns 'ID', 'Name', 'SMILES' and 'CAS' where 'SMILES' might be empty if the compound is a derivative or if the CAS number could not be converted (see CAS2SMILES).
createCompoundlist()
This method has no return value.
pstahlhofen
CAS2SMILES
## Not run:
# Prepare the compoundlist-creation
splitMultiblockDX('my_multiblock_jcamp.jdx')
createCompoundlist()
## End(Not run)
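The file selection described above (files ending in '.dx' but not '.jdx') can be expressed with an anchored regular expression. This is a sketch of the selection rule only; whether createCompoundlist uses this exact pattern is an assumption, and the file names are made up for the example.

```r
# Create a scratch directory with hypothetical file names.
dx_dir <- file.path(tempdir(), "jcamp_sketch")
dir.create(dx_dir, showWarnings = FALSE)
created <- file.create(
  file.path(dx_dir, c("cpd_1.dx", "cpd_2.dx", "multiblock.jdx", "notes.txt"))
)

# "\\.dx$" requires a literal dot directly before "dx" at the end of the name,
# so "multiblock.jdx" is not matched.
single_block <- list.files(dx_dir, pattern = "\\.dx$")
single_block
```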
Creates a MOL file (in memory or on disk) for a compound specified by the compound ID or by a SMILES code.
createMolfile(id_or_smiles, fileName = FALSE)
id_or_smiles |
The compound ID or a SMILES code. |
fileName |
If the filename is set, the file is written directly to disk using the specified filename. Otherwise, it is returned as a text array. |
The function invokes OpenBabel (and therefore needs a correctly set OpenBabel path in the RMassBank settings), using the SMILES code retrieved with findSmiles or using the SMILES code directly. The current implementation of the workflow uses the latter version, reading the SMILES code directly from the MassBank record itself.
A character array containing the MOL/SDF format file, ready to be written to disk.
Michael Stravs
OpenBabel: http://openbabel.org
# Benzene:
## Not run:
createMolfile("C1=CC=CC=C1")
## End(Not run)
Select a subset of external IDs from a CTS record.
CTS.externalIdSubset(data, database)
data |
The complete CTS record as retrieved by |
database |
The database for which keys should be returned. |
Returns an array of all external identifiers stored in the record for the given database.
Michele Stravs, Eawag <[email protected]>
## Not run:
# Return all CAS registry numbers stored for benzene.
data <- getCtsRecord("UHOVQNZJYSORNB-UHFFFAOYSA-N")
cas <- CTS.externalIdSubset(data, "CAS")
## End(Not run)
Find all available databases for a CTS record
CTS.externalIdTypes(data)
data |
The complete CTS record as retrieved by |
Returns an array of all database names for which there are external identifiers stored in the record.
Michele Stravs, Eawag <[email protected]>
## Not run:
# Return all databases for which the benzene entry has
# links in the CTS record.
data <- getCtsRecord("UHOVQNZJYSORNB-UHFFFAOYSA-N")
databases <- CTS.externalIdTypes(data)
## End(Not run)
Calculates the Ring and Double Bond Equivalents for a chemical formula. The highest valence state of each atom is used, such that the returned DBE should never be below 0.
dbe(formula)
formula |
A molecular formula in text or list representation (e.g.
|
Returns the DBE for the given formula.
Michael Stravs
dbe("C6H12O6")
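The calculation can be written as DBE = 1 + sum over elements of n * (valence - 2) / 2. The sketch below follows that formula in base R. The helper names and the valence table are assumptions made for this illustration (highest common valence states for a few elements); the package's own table and formula parser may differ.

```r
# Hypothetical parser: "C6H12O6" -> named counts c(C = 6, H = 12, O = 6).
parse_formula <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"  # an element without a number occurs once
  sapply(split(as.numeric(counts), elements), sum)
}

# Assumed highest valence states; only a few elements are covered.
max_valence <- c(H = 1, C = 4, N = 5, O = 2, P = 5, S = 6)

dbe_sketch <- function(formula) {
  counts <- parse_formula(formula)
  unknown <- setdiff(names(counts), names(max_valence))
  if (length(unknown) > 0) {
    stop("no valence defined for: ", paste(unknown, collapse = ", "))
  }
  1 + sum(counts * (max_valence[names(counts)] - 2) / 2)
}

dbe_glucose <- dbe_sketch("C6H12O6")  # 1 + 6 - 6 + 0 = 1
dbe_benzene <- dbe_sketch("C6H6")     # 1 + 6 - 3 = 4
```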
The deprofile functions convert profile-mode high-resolution input data to "centroid"-mode data amenable to e.g. centWave. This is done using a full-width half-maximum algorithm, a spline algorithm or a local maximum method.
deprofile.scan(scan, noise = NA, method = "deprofile.fwhm", colnames = TRUE, ...)
deprofile(df, noise, method, ...)
deprofile.fwhm(df, noise = NA, cut = 0.5)
deprofile.localMax(df, noise = NA)
deprofile.spline(df, noise=NA, minPts = 5, step = 0.00001)
df |
A dataframe with at least the columns |
noise |
The noise cutoff. A peak is not included if the maximum stick intensity of the peak is below the noise cutoff. |
method |
"deprofile.fwhm" for full-width half-maximum or "deprofile.localMax" for local maximum. |
... |
Arguments to the workhorse functions |
scan |
A matrix with 2 columns for m/z and intensity; from profile-mode high-resolution data. Corresponds to the spectra obtained with xcms::getScan or mzR::peaks. |
colnames |
For deprofile.scan: return matrix with column names (xcms-style,
|
cut |
A parameter for |
minPts |
The minimal number of points in a peak needed to build a spline; for peaks with fewer points the local maximum will be used instead. Note: The minimum value is 4! |
step |
The interpolation step for the calculated spline, which limits the maximum precision which can be achieved. |
The deprofile.fwhm method is basically an R-semantic version of the "Exact Mass" m/z deprofiler from MZmine. It takes the center between the m/z values at half-maximum intensity for the exact peak mass. The logic is stolen verbatim from the Java MZmine algorithm, but it has been rewritten to use the fast R vector operations instead of loops wherever possible. It's slower than the Java implementation, but still decently fast IMO. Using matrices instead of the data frame would be more memory-efficient and also faster, probably.
The deprofile.localMax method uses local maxima and is probably the same used by e.g. Xcalibur. For well-formed peaks, "deprofile.fwhm" gives more accurate mass results; for some applications, deprofile.localMax might be better (e.g. for fine isotopic structure peaks which are not separated by a valley and also not at half maximum.) For MS2 peaks, which have no isotopes, deprofile.fwhm is probably the better choice generally.
deprofile.spline calculates the mass using a cubic spline, as the HiRes peak detection in OpenMS does.
The word "centroid" is used for convenience to denote not-profile-mode data. The data points are NOT mathematical centroids; we would like to have a better word for it.
The noise parameter was only included for completeness, I personally don't use it.
deprofile.fwhm and deprofile.localMax are the workhorses; deprofile.scan takes a 2-column scan as input. deprofile dispatches the call to the appropriate worker method.
deprofile.scan: a matrix with 2 columns for m/z and intensity
Known limitations: If the absolute leftmost stick or the absolute rightmost stick in a scan are maxima, they will be discarded! However, I don't think this will ever present a practical problem.
Michael Stravs, Eawag <[email protected]>
mzMine source code http://sourceforge.net/svn/?group_id=139835
## Not run:
mzrFile <- openMSfile("myfile.mzML")
acqNo <- xraw@acquisitionNum[[50]]
scan.mzML.profile <- mzR::peaks(mzrFile, acqNo)
scan.mzML <- deprofile.scan(scan.mzML.profile)
close(mzrFile)
## End(Not run)
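The local maximum approach is simple enough to sketch in a few lines of base R. This is not the code of deprofile.localMax; the function name `local_max_sticks` and the column names `mz` and `int` are assumptions for this illustration. Like the limitation noted above, the sketch never reports the leftmost or rightmost stick of a scan.

```r
# Hypothetical sketch: keep sticks that are higher than their left neighbour
# and at least as high as their right neighbour.
local_max_sticks <- function(df, noise = NA) {
  n <- nrow(df)
  if (n < 3) {
    return(df[0, , drop = FALSE])  # no interior points to test
  }
  i <- 2:(n - 1)  # interior points only; edge sticks are never reported
  is_max <- df$int[i] > df$int[i - 1] & df$int[i] >= df$int[i + 1]
  sticks <- df[i[is_max], , drop = FALSE]
  if (!is.na(noise)) {
    sticks <- sticks[sticks$int >= noise, , drop = FALSE]
  }
  sticks
}

# Hypothetical profile-mode data with two peaks.
profile <- data.frame(
  mz  = c(100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06),
  int = c(0, 10, 50, 20, 5, 30, 0)
)
sticks <- local_max_sticks(profile)                  # m/z 100.02 and 100.05
sticks_above_noise <- local_max_sticks(profile, 40)  # m/z 100.02 only
```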
Exports MassBank recfile data arrays and corresponding molfiles to physical files on hard disk, for one compound.
exportMassbank(compiled, molfile = NULL)
compiled |
|
molfile |
A molfile from |
The data from compiled is still used here, because it contains the "visible" accession number. In the plain-text format contained in files, the accession number is not "accessible" anymore since it's in the file.
No return value.
An improvement would be to write the accession numbers into names(compiled) and later into names(files) so compiled wouldn't be needed here anymore. (The compound ID would have to go into names(molfile), since it is also retrieved from compiled.)
Michael Stravs
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
createMolfile, toMassbank, mbWorkflow
This method takes the info which is added to the aggregated table in the reanalysis and multiplicity filtering steps of the workflow, and adds it back into the spectra.
fillback(o, id, aggregated)
## S4 method for signature 'msmsWorkspace,missing,missing'
fillback(o)
## S4 method for signature 'RmbSpectraSet,missing,data.frame'
fillback(o, aggregated)
## S4 method for signature 'RmbSpectrum2,character,data.frame'
fillback(o, id, aggregated)
o |
msmsWorkspace, RmbSpectraSet or RmbSpectrum2 The object information is filled back into. If applied to an RmbSpectraSet, information is added to all its RmbSpectrum2 children. If applied to the whole msmsWorkspace, information is added to all SpectraSets. |
id |
character or missing The id of the parent RmbSpectraSet if applied to RmbSpectrum2 |
aggregated |
data.frame or missing The aggregated table of the parent msmsWorkspace if applied to RmbSpectraSet or RmbSpectrum2 |
o msmsWorkspace, RmbSpectraSet or RmbSpectrum2 The same object that was given as input, with new information filled into it.
Read the Compoundlist given by the filename and write a 'Compoundlist_filtered.csv', containing only the lines with a SMILES string
filterCompoundlist(filename)
filename |
character The name of the csv-file to be read |
This method has no return value.
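The filtering step described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper name filter_compoundlist_sketch, its out argument, and the assumption that the SMILES strings live in a column called SMILES are ours.

```r
# Hypothetical helper (not part of RMassBank): keep only rows with a SMILES string.
filter_compoundlist_sketch <- function(filename,
                                       out = file.path(dirname(filename),
                                                       "Compoundlist_filtered.csv")) {
  cpds <- read.csv(filename, stringsAsFactors = FALSE)
  # Assumption: the SMILES strings are stored in a column named "SMILES".
  hasSmiles <- !is.na(cpds$SMILES) & nzchar(trimws(cpds$SMILES))
  write.csv(cpds[hasSmiles, , drop = FALSE], out, row.names = FALSE)
  invisible(out)
}

# Small demonstration in a temporary directory.
tmp <- file.path(tempdir(), "Compoundlist.csv")
write.csv(data.frame(ID = 1:3,
                     Name = c("Benzene", "Unknown", "Ethanol"),
                     SMILES = c("C1=CC=CC=C1", "", "CCO")),
          tmp, row.names = FALSE)
out <- filter_compoundlist_sketch(tmp)
res <- read.csv(out, stringsAsFactors = FALSE)
```

Here res keeps the rows for IDs 1 and 3; the row with an empty SMILES field is dropped.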
pstahlhofen
## Not run:
filterCompoundlist('Compoundlist.csv')
## End(Not run)
Filters a peak table (with annotated formulas) for accuracy. Low-accuracy peaks are removed.
filterLowaccResults(peaks, mode="fine", filterSettings = getOption("RMassBank")$filterSettings)
peaks |
A data frame with at least the columns |
mode |
|
filterSettings |
Settings for filtering. For details, see documentation of
|
In the coarse mode, mass tolerance is set to 10 ppm (above m/z 120) and 15 ppm (below m/z 120). This is useful for formula assignment before recalibration, where a wide window is desirable to accommodate the high mass deviations at low m/z values, so we get a nice recalibration curve.
In the fine run, the mass tolerance is set to 5 ppm over the whole mass range. This should be applied after recalibration.
A list(TRUE = goodPeakDataframe, FALSE = badPeakDataframe) is returned: a data frame with all peaks which are "good" is in return[["TRUE"]], and the rejected peaks are in return[["FALSE"]].
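The two tolerance regimes and the TRUE/FALSE return structure can be illustrated with a small base-R sketch. It is not the package's code: the helper name split_by_accuracy_sketch and the column names mzFound and mzCalc are assumptions, and the tolerances are hard-coded from the description above rather than read from filterSettings.

```r
# Hypothetical helper (not part of RMassBank): split peaks by ppm accuracy.
split_by_accuracy_sketch <- function(peaks, mode = c("fine", "coarse")) {
  mode <- match.arg(mode)
  # Tolerances in ppm, taken from the text: 15/10 ppm (coarse), 5 ppm (fine).
  tol <- if (mode == "coarse") {
    ifelse(peaks$mzFound < 120, 15, 10)
  } else {
    rep(5, nrow(peaks))
  }
  # Assumed columns: mzFound (measured m/z) and mzCalc (m/z of the assigned formula).
  dppm <- (peaks$mzFound - peaks$mzCalc) / peaks$mzCalc * 1e6
  good <- abs(dppm) <= tol
  list("TRUE"  = peaks[good, , drop = FALSE],
       "FALSE" = peaks[!good, , drop = FALSE])
}

peaks <- data.frame(mzFound = c(100.0012, 250.0010, 250.0030),
                    mzCalc  = c(100, 250, 250))
coarse <- split_by_accuracy_sketch(peaks, "coarse")
fine   <- split_by_accuracy_sketch(peaks, "fine")
```

With these made-up values the deviations are about 12, 4 and 12 ppm, so the coarse pass keeps the first two peaks while the fine pass keeps only the second.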
Michael Stravs
analyzeMsMs, filterPeakSatellites
# from analyzeMsMs:
## Not run: childPeaksFilt <- filterLowaccResults(childPeaksInt, filterMode)
Multiplicity filtering: Removes peaks which occur only once in an n-spectra set.
filterMultiplicity(w, archivename=NA, mode="pH", recalcBest = TRUE, multiplicityFilter = getOption("RMassBank")$multiplicityFilter)
w |
Workspace containing the data to be processed (aggregate table and |
archivename |
The archive name, used for generation of archivename_Failpeaks.csv |
mode |
Mode of ion analysis |
recalcBest |
Boolean, whether to recalculate the formula multiplicity after the first multiplicity filtering step. Sometimes, setting this to FALSE can be a solution if you have many compounds with e.g. fluorine atoms, which often have multiple assigned formulas per peak and might occasionally lose peaks because of that. |
multiplicityFilter |
Threshold for the multiplicity filter. If set to 1, no filtering will apply (minimum 1 occurrence of peak). 2 equals minimum 2 occurrences etc. |
This function executes multiplicity filtering for a set of spectra using the workhorse function filterPeaksMultiplicity (see details there) and retrieves problematic filtered peaks (peaks which are of high intensity but were discarded, because either no formula was assigned or it was not present at least 2x), using the workhorse function problematicPeaks. The results are returned in a format ready for further processing with mbWorkflow.
A list object with values:
peaksOK |
Peaks with >1-fold formula multiplicity from the "normal" peak analysis. |
peaksReanOK |
Peaks with >1-fold formula multiplicity from peak reanalysis. |
peaksFiltered |
All peaks with annotated formula multiplicity from first analysis. |
peaksFilteredReanalysis |
All peaks with annotated formula multiplicity from peak reanalysis. |
peaksProblematic |
Peaks with high intensity which do not match inclusion criteria (possible false negatives). The list will be exported into archivename_Failpeaks.csv. |
Michael Stravs
filterPeaksMultiplicity, problematicPeaks
## Not run:
refilteredRcSpecs <- filterMultiplicity(w, "myarchive", "pH")
## End(Not run)
Filters satellite peaks in FT spectra which arise from FT artifacts and from conversion to stick mode. A very simple rule is used which holds mostly true for MSMS spectra (and shouldn't be applied to MS1 spectra which contain isotope structures...)
filterPeakSatellites(peaks, filterSettings = getOption("RMassBank")$filterSettings)
peaks |
A peak dataframe with at least the columns |
filterSettings |
The settings used for filtering. Refer to |
The function cuts off all peaks within 0.5 m/z of every peak, in decreasing intensity order, which are below 5% of that peak's intensity. E.g. for peaks m/z=100, int=100; m/z=100.2, int=2; m/z=100.3, int=6; m/z=150, int=10: The most intense peak (m/z=100) is selected, all neighborhood peaks below 5% of its intensity are removed (here only the m/z=100.2 peak), and the next less intense peak is selected. Here this is the m/z=150 peak. All low-intensity neighborhood peaks are removed (nothing). The next less intense peak is selected (m/z=100.3) and again neighborhood peaks are cut away (nothing to cut here; note that the m/z=100.2 peak was already removed).
Returns the peak table with satellite peaks removed.
This is a very crude rule, but works remarkably well for our spectra.
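The rule can be written down as a short base-R sketch. This is an illustration and not the package's implementation: the helper name filter_satellites_sketch and the column names mz and int are assumptions, and the intensity cutoff is taken to be relative (5% of the reference peak's intensity), which is what the worked example in the text implies.

```r
# Hypothetical helper (not part of RMassBank): remove satellite peaks.
filter_satellites_sketch <- function(peaks, mzLimit = 0.5, intLimit = 0.05) {
  keep <- rep(TRUE, nrow(peaks))
  # Visit peaks from most to least intense.
  for (i in order(peaks$int, decreasing = TRUE)) {
    if (!keep[i]) next  # this peak was already removed as a satellite
    near <- abs(peaks$mz - peaks$mz[i]) < mzLimit      # within the m/z window
    weak <- peaks$int < intLimit * peaks$int[i]        # below the relative cutoff
    keep[near & weak] <- FALSE
  }
  peaks[keep, , drop = FALSE]
}

# The example from the text: only the m/z 100.2 peak is a satellite.
peaks <- data.frame(mz = c(100, 100.2, 100.3, 150), int = c(100, 2, 6, 10))
filtered <- filter_satellites_sketch(peaks)
```

Running the sketch on the example leaves the peaks at m/z 100, 100.3 and 150.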
Michael Stravs
analyzeMsMs, filterLowaccResults
# From the workflow:
## Not run:
# Filter out satellite peaks:
shot <- filterPeakSatellites(shot)
shot_satellite_n <- setdiff(row.names(shot_full), row.names(shot))
shot_satellite <- shot_full[shot_satellite_n,]
# shot_satellite contains the peaks which were eliminated as satellites.
## End(Not run)
For every compound, every peak (with annotated formula) is compared across all spectra. Peaks whose formula occurs only once for all collision energies / spectra types, are discarded. This eliminates "stochastic formula hits" of pure electronic noise peaks efficiently from the spectra. Note that in the author's experimental setup two spectra were recorded at every collision energy, and therefore every peak-formula should appear at least twice if it is real, even if it is by chance a fragment which appears on only one collision energy setting. The function was not tested in a different setup. Therefore, use with a bit of caution.
filterPeaksMultiplicity(w, recalcBest = TRUE)
w |
a 'msmsWorkspace' object where formulas have been assigned to peaks |
recalcBest |
Whether the best formula for each peak should be re-determined.
This is necessary for results from the ordinary |
The peak table is returned, enriched with columns:
formulaMultiplicity
The number of occurrences of this formula in the spectra of its compound.
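The counting behind formulaMultiplicity can be illustrated on a toy peak table. This is a sketch under stated assumptions, not the package's code: the column names cpdID, scan and formula and the data are invented for illustration, and the threshold variable mirrors the multiplicityFilter setting described for filterMultiplicity.

```r
# Toy peak table (invented data): one compound, three spectra.
peaks <- data.frame(
  cpdID   = c(1, 1, 1, 1, 1),
  scan    = c(10, 11, 12, 10, 12),
  formula = c("C6H5", "C6H5", "C6H5", "C3H3", "C2H2N"),
  stringsAsFactors = FALSE
)

# Count how often each formula occurs across all spectra of its compound.
key <- paste(peaks$cpdID, peaks$formula)
peaks$formulaMultiplicity <- ave(seq_along(key), key, FUN = length)

# Keep peaks whose formula occurs at least 'multiplicityFilter' times.
multiplicityFilter <- 2
peaksOK <- peaks[peaks$formulaMultiplicity >= multiplicityFilter, , drop = FALSE]
```

In this toy table C6H5 occurs three times and survives, while C3H3 and C2H2N occur once each and are discarded.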
Michael Stravs, EAWAG <[email protected]>
## Not run:
peaksFiltered <- filterPeaksMultiplicity(w, recalcBest = TRUE)
peaksOK <- subset(peaksFiltered, formulaMultiplicity > 1)
## End(Not run)
Extract EICs from raw data for a determined mass window.
findEIC( msRaw, mz, limit = NULL, rtLimit = NA, headerCache = NULL, floatingRecalibration = NULL, peaksCache = NULL, polarity = NA, msLevel = 1, precursor = NULL )
msRaw |
The mzR file handle |
mz |
The mass or mass range to extract the EIC for: either a single mass
(with the range specified by |
limit |
If a single mass was given for |
rtLimit |
If given, the retention time limits in form |
headerCache |
If present, the complete |
floatingRecalibration |
A fitting function that |
peaksCache |
If present, the complete output of |
polarity |
If a value is given, scans are filtered to this polarity before EIC extraction. Valid values are 1 for positive, 0 for negative (according to mzR), or a RMassBank 'mode' (e.g. 'pH, pM, mH...') from which the polarity can be derived. |
msLevel |
Which MS level to target for EIC extraction. By default this is 1; level 2 can be used to extract the EIC of fragments for a specific precursor or to extract EICs from DIA data. |
precursor |
Which precursor to target for EIC extraction. If 'NULL', the scans are not filtered by precursor. Use this only for 'msLevel != 1'. If a 'precursor' filter is set for 'msLevel == 1', all scans will be filtered out. |
A [rt, intensity, scan] matrix (scan being the scan number).
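The shape of the returned matrix can be illustrated with a base-R sketch that works on an in-memory list of scans instead of an mzR file handle. Everything here is an assumption made for illustration: the helper name extract_eic_sketch, the structure of the scans list, and the choice to sum the intensities inside the mass window.

```r
# Hypothetical helper (not part of RMassBank): build an EIC from a list of scans.
# Each scan is assumed to be list(rt = <numeric>, peaks = <matrix with columns mz, int>).
extract_eic_sketch <- function(scans, mz, limit) {
  rows <- lapply(seq_along(scans), function(i) {
    p <- scans[[i]]$peaks
    inWindow <- p[, "mz"] >= mz - limit & p[, "mz"] <= mz + limit
    # Assumption: intensities inside the window are summed per scan.
    c(rt = scans[[i]]$rt, intensity = sum(p[inWindow, "int"]), scan = i)
  })
  do.call(rbind, rows)
}

scans <- list(
  list(rt = 1, peaks = cbind(mz = c(100, 200.001), int = c(5, 50))),
  list(rt = 2, peaks = cbind(mz = c(199.999, 200.002, 300), int = c(10, 20, 7)))
)
eic <- extract_eic_sketch(scans, mz = 200, limit = 0.005)
```

The result has one row per scan and the columns rt, intensity and scan, matching the layout described above.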
Michael A. Stravs, Eawag <[email protected]>
findMsMsHR
Retrieves the exact mass of the uncharged molecule. It works directly from the SMILES and is therefore used in the MassBank workflow (mbWorkflow), where all properties are calculated from the SMILES code retrieved from the database. (Alternatively, it also accepts the compound ID as parameter and looks it up.) Calculation relies on Rcdk.
findMass(cpdID_or_smiles, retrieval = "standard", mode = "pH")
cpdID_or_smiles |
SMILES code or compound ID of the molecule. (Numerics are treated as compound ID). |
retrieval |
A value that determines whether the files should be handled either as "standard", if the compoundlist is complete, "tentative", if at least a formula is present or "unknown" if the only known thing is the m/z |
mode |
|
Returns the exact mass of the uncharged molecule.
Michael Stravs
## findMass("OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O")
Extracts MS/MS spectra from LC-MS raw data for a specified precursor, specified either via the RMassBank compound list (see loadList) or via a mass.
findMsMsHR( fileName = NULL, msRaw = NULL, cpdID, mode = "pH", confirmMode = 0, useRtLimit = TRUE, ppmFine = getOption("RMassBank")$findMsMsRawSettings$ppmFine, mzCoarse = getOption("RMassBank")$findMsMsRawSettings$mzCoarse, fillPrecursorScan = getOption("RMassBank")$findMsMsRawSettings$fillPrecursorScan, rtMargin = getOption("RMassBank")$rtMargin, deprofile = getOption("RMassBank")$deprofile, headerCache = NULL, peaksCache = NULL, enforcePolarity = getOption("RMassBank")$enforcePolarity, diaWindows = getOption("RMassBank")$findMsMsRawSettings$diaWindows ) findMsMsHR.mass( msRaw, mz, limit.coarse, limit.fine, rtLimits = NA, maxCount = NA, headerCache = NULL, fillPrecursorScan = FALSE, deprofile = getOption("RMassBank")$deprofile, peaksCache = NULL, cpdID = NA, polarity = NA, diaWindows = getOption("RMassBank")$findMsMsRawSettings$diaWindows )
fileName |
The file to open and search the MS2 spectrum in. |
msRaw |
The opened raw file (mzR file handle) to search the MS2 spectrum in. Specify either this
or |
cpdID |
The compound ID in the compound list (see |
mode |
The processing mode (determines which ion/adduct is searched):
|
confirmMode |
Whether to use the highest-intensity precursor (=0), second-highest (=1), third-highest (=2)... |
useRtLimit |
Whether to respect retention time limits from the compound list. |
ppmFine |
The limit in ppm to use for fine limit (see below) calculation. |
mzCoarse |
The coarse limit to use for locating potential MS2 scans: this tolerance is used when finding scans with a suitable precursor ion value. |
fillPrecursorScan |
If |
rtMargin |
The retention time tolerance to use. |
deprofile |
Whether deprofiling should take place, and what method should be
used (cf. |
headerCache |
If present, the complete |
peaksCache |
If present, the complete output of |
enforcePolarity |
If TRUE, scans are filtered for the polarity of the given 'mode' when finding the target spectrum. |
diaWindows |
A data frame with columns |
mz |
The mass to use for spectrum search. |
limit.coarse |
Parameter in |
limit.fine |
The fine limit to use for locating MS2 scans: this tolerance is used when locating an appropriate analyte peak in the MS1 precursor spectrum. |
rtLimits |
|
maxCount |
The maximal number of spectra groups to return. One spectra group consists of all data-dependent scans from the same precursor whose precursor mass matches the specified search mass. |
polarity |
If set (for findMsMsHR.mass), scans are filtered for the given polarity when finding the target spectrum. |
Different versions of the function get the data from different sources. Note that findMsMsHR and findMsMsHR.direct differ mainly in that findMsMsHR opens a file whereas findMsMsHR.direct uses an open file handle; both are intended to be used in a full process which involves compound lists etc. In contrast, findMsMsHR.mass is a low-level function which uses the mass directly for lookup and is intended for use as a standalone function in unrelated applications.
An RmbSpectraSet (for findMsMsHR). Contains the parent MS1 spectrum (@parent), a block of dependent MS2 spectra (@children) and some metadata (id, mz, name, and the mode in which the spectrum was acquired).
For findMsMsHR.mass: a list of RmbSpectraSets as defined above, sorted by decreasing precursor intensity.
findMsMsHR.mass(): A submethod of findMsMsHR that retrieves basic spectrum data
findMsMsHR.direct is deactivated
Michael A. Stravs, Eawag <[email protected]>
findEIC
## Not run:
loadList("mycompoundlist.csv")
# if Atrazine has compound ID 1:
msms_atrazine <- findMsMsHR(fileName = "Atrazine_0001_pos.mzML", cpdID = 1, mode = "pH")
# Or alternatively:
msRaw <- openMSfile("Atrazine_0001_pos.mzML")
msms_atrazine <- findMsMsHR(msRaw=msRaw, cpdID = 1, mode = "pH")
# Or directly by mass (this will return a list of spectra sets):
mz <- findMz(1)$mzCenter
msms_atrazine_all <- findMsMsHR.mass(msRaw, mz, 1, ppm(mz, 10, p=TRUE))
msms_atrazine <- msms_atrazine_all[[1]]
## End(Not run)
This interface has been discontinued. findMsMsHR now supports the same parameters (use named parameters).
findMsMsHR.direct( msRaw, cpdID, mode = "pH", confirmMode = 0, useRtLimit = TRUE, ppmFine = getOption("RMassBank")$findMsMsRawSettings$ppmFine, mzCoarse = getOption("RMassBank")$findMsMsRawSettings$mzCoarse, fillPrecursorScan = getOption("RMassBank")$findMsMsRawSettings$fillPrecursorScan, rtMargin = getOption("RMassBank")$rtMargin, deprofile = getOption("RMassBank")$deprofile, headerCache = NULL )
msRaw |
x |
cpdID |
x |
mode |
x |
confirmMode |
x |
useRtLimit |
x |
ppmFine |
x |
mzCoarse |
x |
fillPrecursorScan |
x |
rtMargin |
x |
deprofile |
x |
headerCache |
x |
an error
stravsmi
Extract an MS/MS spectrum or multiple MS/MS spectra based on the TIC of the MS2 and precursor mass, picking the most intense MS2 scan. Can be used, for example, to get a suitable MS2 from direct infusion data which was collected with purely targeted MS2 without MS1.
findMsMsHR.ticms2( msRaw, mz, limit.coarse, limit.fine, rtLimits = NA, maxCount = NA, headerCache = NULL, fillPrecursorScan = FALSE, deprofile = getOption("RMassBank")$deprofile, trace = "ms2tic" )
msRaw |
The mzR raw file |
mz |
Mass to find |
limit.coarse |
Allowed mass deviation for scan precursor (in m/z values) |
limit.fine |
Unused here, but present for interface compatibility with findMsMsHR |
rtLimits |
Unused here, but present for interface compatibility with findMsMsHR |
maxCount |
Maximal number of spectra to return |
headerCache |
Cached results of header(msRaw), either to speed up the operations or to operate with preselected header() data |
fillPrecursorScan |
Unused here, but present for interface compatibility with findMsMsHR |
deprofile |
Whether deprofiling should take place, and what method should be
used (cf. |
trace |
Either |
Note that this is not a precise function and only really makes sense in direct infusion and if the precursor is really known, because MS2 precursor data is only "roughly" accurate (to two decimal places). The regular findMsMsHR functions confirm the exact mass of the precursor in the MS1 scan.
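The selection logic (match MS2 scans by precursor within a coarse tolerance, then rank by TIC) can be sketched on a header-like table. This is not the package's implementation: the helper name pick_ms2_by_tic is hypothetical and the data are invented; the column names msLevel, precursorMZ, totIonCurrent and acquisitionNum follow the scan header table returned by mzR::header().

```r
# Invented header table in the style of mzR::header() output.
header <- data.frame(
  acquisitionNum = 1:5,
  msLevel        = c(2, 2, 2, 1, 2),
  precursorMZ    = c(216.10, 216.10, 300.20, NA, 216.11),
  totIonCurrent  = c(1e5, 4e5, 9e5, 2e6, 3e5)
)

# Hypothetical helper (not part of RMassBank): rank matching MS2 scans by TIC.
pick_ms2_by_tic <- function(header, mz, limit.coarse, maxCount = NA) {
  isHit <- header$msLevel == 2 &
    !is.na(header$precursorMZ) &
    abs(header$precursorMZ - mz) < limit.coarse
  hits <- header[isHit, , drop = FALSE]
  # Most intense MS2 scan (by total ion current) first.
  hits <- hits[order(hits$totIonCurrent, decreasing = TRUE), , drop = FALSE]
  if (!is.na(maxCount)) hits <- head(hits, maxCount)
  hits$acquisitionNum
}

ranked <- pick_ms2_by_tic(header, mz = 216.1, limit.coarse = 0.5)
best   <- pick_ms2_by_tic(header, mz = 216.1, limit.coarse = 0.5, maxCount = 1)
```

Because only the rounded precursor value in the scan header is compared, the sketch shares the limitation noted above: it cannot confirm the exact precursor mass.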
A list of "spectrum sets" as defined in findMsMsHR, sorted by decreasing precursor intensity.
stravsmi
This function is currently used to read msp files containing data that were already processed in order to convert the results to MassBank records.
findMsMsHRperMsp(fileName, cpdIDs, mode = "pH") findMsMsHRperMsp.direct(fileName, cpdIDs, mode = "pH")
fileName |
vector of character-strings The msp files to be searched for spectra |
cpdIDs |
vector of integers The IDs of compounds in the compoundlist for which spectra should be retrieved |
mode |
character, default: "pH"
The processing mode that was used to produce the spectrum.
Should be one of
"pH": ([M+H]+)
"pNa": ([M+Na]+)
"pM": ([M]+)
"mH": ([M-H]-)
or "mFA": ([M+FA]-)
(see the |
An RmbSpectraSet
with integrated information from the msp files
findMsMsHRperMsp.direct(): A submethod of findMsMsHRperMsp that retrieves basic spectrum data
Picks peaks from mz-files and returns the pseudospectra that CAMERA creates with the help of XCMS
findMsMsHRperxcms( fileName, cpdID, mode = "pH", findPeaksArgs = NULL, plots = FALSE, MSe = FALSE ) findMsMsHRperxcms.direct( fileName, cpdID, mode = "pH", findPeaksArgs = NULL, plots = FALSE, MSe = FALSE )
fileName |
The path to the mz-file that should be read |
cpdID |
The compoundID(s) of the compound that has been used for the file |
mode |
The ionization mode that has been used for the spectrum represented by the peaklist |
findPeaksArgs |
A list of arguments that will be handed to the xcms-method findPeaks via do.call |
plots |
A parameter that determines whether the spectra should be plotted or not |
MSe |
A boolean value that determines whether the spectra were recorded using MSe or not |
The spectra generated from XCMS
findMsMsHRperxcms.direct(): A submethod of findMsMsHRperxcms that retrieves basic spectrum data
Erik Mueller
## Not run:
fileList <- list.files(system.file("XCMSinput", package = "RMassBank"), "Glucolesquerellin", full.names=TRUE)[3]
loadList(system.file("XCMSinput/compoundList.csv",package="RMassBank"))
psp <- findMsMsHRperxcms(fileList,2184)
## End(Not run)
Retrieves compound information from the loaded compound list or calculates it from the SMILES code in the list.
findMz(cpdID, mode = "pH", ppm = 10, deltaMz = 0, retrieval="standard", unknownMass = getOption("RMassBank")$unknownMass ) findRt(cpdID) findSmiles(cpdID) findFormula(cpdID, retrieval="standard") findCAS(cpdID) findName(cpdID) findLevel(cpdID, compact=FALSE)
cpdID |
The compound ID in the compound list. |
mode |
Specifies the species of the molecule: An empty string specifies
uncharged monoisotopic mass, |
ppm |
Specifies ppm window (10 ppm will return the range of the molecular mass + and - 10 ppm). |
deltaMz |
Specifies additional m/z window to add to the range (deltaMz = 0.02 will return the range of the molecular mass +- 0.02, and additionally +- the set ppm value). |
retrieval |
A value that determines whether the files should be handled either as "standard", if the compoundlist is complete, "tentative", if at least a formula is present or "unknown" if the only known thing is the m/z |
unknownMass |
'charged' or 'neutral' ('charged' assumed by default) specifies whether a mass of an unknown compound (level 5) refers to the charged or neutral mass (and correspondingly, whether it must be shifted or not to find the m/z value) |
compact |
Only for |
findMz will return a list(mzCenter=, mzMin=, mzMax=) with the molecular weight of the given ion, as calculated from the SMILES code and Rcdk.
findRt, findSmiles, findCAS, findName will return the corresponding entry from the compound list. findFormula returns the molecular formula as determined from the SMILES code.
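How the ppm and deltaMz arguments combine into the returned window can be shown with a few lines of arithmetic. This is a sketch based only on the argument descriptions above (both margins are added on each side); the helper name mz_window_sketch is hypothetical and the ion mass is supplied directly instead of being computed from a SMILES code.

```r
# Hypothetical helper (not part of RMassBank): m/z window around a known ion mass.
mz_window_sketch <- function(mzCenter, ppm = 10, deltaMz = 0) {
  # Relative margin (ppm of the mass) plus absolute margin (deltaMz), on each side.
  margin <- mzCenter * ppm * 1e-6 + deltaMz
  list(mzCenter = mzCenter,
       mzMin = mzCenter - margin,
       mzMax = mzCenter + margin)
}

# For m/z 200, 10 ppm is 0.002; with deltaMz = 0.02 the total margin is 0.022.
w <- mz_window_sketch(200, ppm = 10, deltaMz = 0.02)
```

The list elements use the same names as the documented return value, so w$mzMin and w$mzMax bracket w$mzCenter symmetrically.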
Michael Stravs
findMass, loadList, findMz.formula
## Not run:
findMz(123, "pH", 5)
findFormula(123)
## End(Not run)
Find the exact mass +/- a given margin for a given formula or its ions and adducts.
findMz.formula(formula, mode = "pH", ppm = 10, deltaMz = 0)
formula |
The molecular formula in text or list format (see |
mode |
|
ppm |
The ppm margin to add/subtract |
deltaMz |
The absolute mass to add/subtract. Cumulative with |
A list(mzMin=, mzCenter=, mzMax=) with the masses.
Michael A. Stravs, Eawag <[email protected]>
findMz.formula("C6H6")
This function reads out the content of different slots of the workspace
object and finds out which steps have already been processed on it.
findProgress(workspace)
workspace |
A |
An array containing all msmsWorkflow steps which have likely been processed.
Stravs MA, Eawag <[email protected]>
## Not run:
findProgress(w)
## End(Not run)
flatten converts a list of MassBank compound information sets (as retrieved by gatherData) to a flat table, to be exported into an infolist. readMbdata reads a single record from an infolist flat table back into a MassBank (half-)entry.
flatten(mbdata) readMbdata(row)
mbdata |
A list of MassBank compound information sets as returned from
|
row |
One row of MassBank compound information retrieved from an infolist. |
Neither the flattening system itself nor the implementation are particularly fantastic, but since hand-checking of records is a necessary evil, there is currently no alternative (short of coding a complete GUI for this and working directly on the records.)
flatten returns a tibble (not a data frame or matrix) to be written to CSV.
readMbdata returns a list of type list(id= compoundID, ..., 'ACCESSION' = '', 'RECORD_TITLE' = '', ) etc.
Michael Stravs
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
## Not run:
# Collect some data to flatten
ids <- c(40,50,60,70)
data <- lapply(ids, gatherData)
# Flatten the data trees to a table
flat.table <- flatten(data)
# reimport the table into a tree
data.reimported <- apply(flat.table, 1, readMbdata)
## End(Not run)
Converts molecular formulas from string to list representation or vice versa.
list.to.formula(flist) formulastring.to.list(formula)
formula |
A molecular formula in string format, e.g. |
flist |
A molecular formula in list format, e.g. |
The function doesn't care about whether your formula makes sense. Note that "C3.5O4" will give list("C" = 3, "O" = 4) because regular expressions are used for matching (whereas list("C" = 3.5, "O" = 4) gives "C3.5O4"). Duplicate elements cause problems; only "strict" molecular formulas ("CH4O", but not "CH3OH") work correctly.
list.to.formula returns a string representation of the formula; formulastring.to.list returns the list representation.
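The regular-expression behaviour described above (including the "C3.5O4" case) can be reproduced with a small base-R sketch. The two helpers below are hypothetical stand-ins written for illustration; they are not the package's list.to.formula and formulastring.to.list, and details such as element ordering or how a count of 1 is printed may differ from the real functions.

```r
# Hypothetical helper: parse "CH4O" into list(C = 1, H = 4, O = 1).
formula_to_list_sketch <- function(formula) {
  # One token per element symbol followed by an optional integer count.
  m <- regmatches(formula, gregexpr("[A-Z][a-z]*[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", m)
  counts <- sub("^[A-Za-z]+", "", m)
  counts[counts == ""] <- "1"   # a missing count means one atom
  as.list(setNames(as.integer(counts), elements))
}

# Hypothetical helper: turn a named list of counts back into a string.
list_to_formula_sketch <- function(flist) {
  n <- unlist(flist)
  paste0(names(n), ifelse(n == 1, "", n), collapse = "")
}

methanol <- formula_to_list_sketch("CH4O")
# The ".5" is skipped by the pattern, so only the integer part of the count survives.
truncated <- list_to_formula_sketch(formula_to_list_sketch("C3.5O4"))
```

Because tokens are matched independently, a formula such as "CH3OH" would yield two separate H entries, which illustrates why only "strict" formulas work correctly.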
Michael Stravs
add.formula, order.formula, is.valid.formula
list.to.formula(list("C" = 4, "H" = 12))
# This is also OK and useful to calculate e.g. adducts or losses.
list.to.formula(list("C" = 4, "H" = -1))
formulastring.to.list(list.to.formula(formulastring.to.list("CHIBr")))
Retrieves annotation data for a compound from the internet service US EPA CCTE based on the inchikey generated by babel or Cactus
gatherCCTE(key, api_key)
key |
An Inchi-Key or other chemical identifier (e.g. Chemical name, DTXSID, CASRN, InChIKey, DTXCID) |
api_key |
An US EPA CCTE API key (personal or application) |
The data retrieved is the US EPA DTXSID, the US EPA chemical dashboard substance ID, the CAS-RN, the DTX preferred name, and the DTXCID (chemical ID).
Returns a list with 5 slots:
dtxsid
The US EPA chemical dashboard substance id
dtxcid
The US EPA chemical dashboard chemical id
preferredName
The US EPA chemical dashboard preferred name
casrn
The latest CAS registration number
smiles
The SMILES annotation of the structure
Tobias Schulze
CCTE REST: https://api-ccte.epa.gov/docs/
# Gather data for compound ID 131 ## Not run: gatherCCTE("QEIXBXXKTUNWDK-UHFFFAOYSA-N", api_key = NA)
Retrieves annotation data for a compound from the internet services CTS, Pubchem, Chemspider and Cactvs, based on the SMILES code and name of the compounds stored in the compound list.
gatherData(id)
id |
The compound ID. |
Composes the "upper part" of a MassBank record filled with chemical data about the compound: name, exact mass, structure, CAS no., links to PubChem, KEGG, ChemSpider. The instrument type is also written into this block (even if not strictly part of the chemical information). Additionally, index fields are added at the start of the record, which will be removed later: id, dbcas, dbname from the compound list, dataused to indicate the used identifier for CTS search (smiles or dbname).
Additionally, the fields ACCESSION and RECORD_TITLE are inserted empty and will be filled later on.
Returns a list of type list(id= compoundID, ..., 'ACCESSION' = '', 'RECORD_TITLE' = '', ) etc.
Michael Stravs
Chemical Translation Service: http://uranus.fiehnlab.ucdavis.edu:8080/cts/homePage
cactus Chemical Identifier Resolver: http://cactus.nci.nih.gov/chemical/structure
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
Pubchem REST: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
Chemspider InChI conversion: https://www.chemspider.com/InChI.asmx
# Gather data for compound ID 131 ## Not run: gatherData(131)
Retrieves annotation data for a compound by using babel, based on the SMILES code and name of the compounds stored in the compound list.
gatherDataBabel(id)
id |
The compound ID. |
Composes the "upper part" of a MassBank record filled with chemical data about the compound: name, exact mass, structure, CAS no. The instrument type is also written into this block (even if not strictly part of the chemical information). Additionally, index fields are added at the start of the record, which will be removed later: id, dbcas, dbname from the compound list.
Additionally, the fields ACCESSION and RECORD_TITLE are inserted empty and will be filled later on.
This function is an alternative to gatherData, in case CTS is down or if information on one or more of the compounds in the compound list is sparse.
Returns a list of type list(id= compoundID, ..., 'ACCESSION' = '', 'RECORD_TITLE' = '', ) etc.
Michael Stravs, Erik Mueller
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
# Gather data for compound ID 131 ## Not run: gatherDataBabel(131)
Retrieves annotation data for an unknown compound by using basic information present
gatherDataUnknown(id, mode, retrieval)
id |
The compound ID. |
mode |
|
retrieval |
A value that determines whether the files should be handled either as "standard", if the compoundlist is complete, "tentative", if at least a formula is present or "unknown" if the only known thing is the m/z |
Composes the "upper part" of a MassBank record filled with chemical data about the compound: name, exact mass, structure, CAS no. The instrument type is also written into this block (even if not strictly part of the chemical information). Additionally, index fields are added at the start of the record, which will be removed later: id, dbcas, dbname from the compound list.
Additionally, the fields ACCESSION and RECORD_TITLE are inserted empty and will be filled later on.
This function is used to generate the data in case a substance is unknown, i.e. not enough information is present to derive anything about formulas or links.
Returns a list of type list(id= compoundID, ..., 'ACCESSION' = '', 'RECORD_TITLE' = '', ) etc.
Michael Stravs, Erik Mueller
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
# Gather data for compound ID 131 ## Not run: gatherDataUnknown(131,"pH")
Retrieves annotation data for a compound from the internet service Pubchem based on the inchikey generated by babel or Cactus
gatherPubChem(key)
key |
An InChIKey |
The data retrieved is the PubChem CID, a synonym from the PubChem database, the IUPAC name (the preferred one if available) and a ChEBI link.
Returns a list with 4 slots:
PcID: The PubChem CID
Synonym: An arbitrary synonym for the compound name
IUPAC: An IUPAC name (preferred if available)
Chebi: The identification number in the ChEBI database
Erik Mueller
PubChem REST: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
ChEBI: http://www.ebi.ac.uk/chebi
# Gather data for compound ID 131 ## Not run: gatherPubChem("QEIXBXXKTUNWDK-UHFFFAOYSA-N")
Collects the info for 'ai, ac_lc, ac_ms' for general, LC and MS info respectively. The info comes from the settings except for the compound-specific part, which is omitted if there is no 'cpd' specified.
getAnalyticalInfo(cpd = NULL)
cpd |
A 'RmbSpectraSet' object |
Retrieves information from the Cactus Chemical Identifier Resolver (hosted by the NCI).
getCactus(identifier, representation)
identifier |
Any identifier interpreted by the resolver, e.g. an InChI key or a SMILES code. |
representation |
The desired representation, as required from the
resolver. e.g. |
It is not necessary to specify the format of the identifier; the resolver detects it automatically.
The result of the query, in plain text. Can be NA, or one or multiple lines (character array) of results.
Note that the InChI key is retrieved with a prefix (InChIkey=), which must be removed before most searches in other databases (e.g. CTS).
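As an illustration of the prefix removal mentioned in the note, the following base R sketch strips the prefix from a resolver-style string. The input string is a made-up example of what the resolver might return, and the match is case-insensitive because the exact capitalisation of the prefix is an assumption here.

```r
# Hypothetical resolver output for benzene (prefix followed by the key).
raw_key <- "InChIKey=UHOVQNZJYSORNB-UHFFFAOYSA-N"

# Remove the leading "InChIKey=" prefix, ignoring case, and keep the bare key.
inchikey <- sub("^inchikey=", "", raw_key, ignore.case = TRUE)
inchikey
```

The bare key can then be passed on to functions that expect an InChIKey without prefix.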
Michael Stravs
cactus Chemical Identifier Resolver: http://cactus.nci.nih.gov/chemical/structure
# Benzene:
getCactus("C1=CC=CC=C1", "cas")
getCactus("C1=CC=CC=C1", "stdinchikey")
getCactus("C1=CC=CC=C1", "chemspider_id")
Retrieves CCTE CASRN from US EPA for a search term.
getCASRN(key, api_key)
key |
ID to be converted |
api_key |
API key for CCTE |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The CCTE CAS RN (as a string)
Tobias Schulze
CCTE search: https://api-ccte.epa.gov/docs
CCTE REST: https://api-ccte.epa.gov/docs
## Not run: getCASRN("MKXZASYAUGDDCJ-NJAFHUGGSA-N") ## End(Not run)
Given an InChIKey, this function queries the ChemSpider web API to retrieve the ChemSpider ID of the compound with that InChIKey.
getCSID(query)
query |
The InChIKey of the compound |
Returns the ChemSpider ID.
Michele Stravs, Eawag <[email protected]>
Erik Mueller, UFZ <[email protected]>
## Not run:
# Return all CAS registry numbers stored for benzene.
data <- getCtsRecord("UHOVQNZJYSORNB-UHFFFAOYSA-N")
cas <- CTS.externalIdSubset(data, "CAS")
## End(Not run)
Convert a single ID to another using CTS.
getCtsKey(query, from = "Chemical Name", to = "InChIKey")
query |
ID to be converted |
from |
Type of input ID |
to |
Desired output ID |
An unordered array with the resulting converted key(s).
Michele Stravs, Eawag <[email protected]>
k <- getCtsKey("benzene", "Chemical Name", "InChIKey")
Retrieves a complete CTS record from the InChI key.
getCtsRecord(key)
key |
The InChI key. |
Returns a list with all information from CTS: inchikey, inchicode, formula, exactmass contain single values. synonyms contains an unordered list of scored synonyms (type, name, score, where type indicates either a normal name or a specific IUPAC name, see below). externalIds contains an unordered list of identifiers of the compound in various databases (name, value, where name is the database name and value the identifier in that database).
Currently, the CTS results are still incomplete; the name scores are all 0, formula and exact mass return zero.
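To make the described record layout concrete, the sketch below builds a mock record by hand and extracts information from it with base R. The record contents, the synonym type labels and the helper external_ids_for are illustrative assumptions, not output of getCtsRecord and not part of the package.

```r
# Mock record: the structure follows the description above; the values and
# synonym type labels are placeholders chosen for illustration.
record <- list(
  inchikey = "UHOVQNZJYSORNB-UHFFFAOYSA-N",
  synonyms = list(
    list(type = "Synonym", name = "benzene", score = 0),
    list(type = "IUPAC Name (Preferred)", name = "benzene", score = 0)
  ),
  externalIds = list(
    list(name = "CAS", value = "71-43-2"),
    list(name = "PubChem CID", value = "241")
  )
)

# Hypothetical helper: collect the values of all external IDs for one database.
external_ids_for <- function(record, db) {
  hits <- Filter(function(id) id$name == db, record$externalIds)
  vapply(hits, function(id) id$value, character(1))
}

cas <- external_ids_for(record, "CAS")

# All distinct synonym types present in the record.
types <- unique(vapply(record$synonyms, function(s) s$type, character(1)))
```

Because both synonyms and externalIds are unordered lists of small named lists, filtering by the name or type field is usually more robust than indexing by position.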
Michele Stravs, Eawag <[email protected]>
Chemical Translation Service: https://cts.fiehnlab.ucdavis.edu
data <- getCtsRecord("UHOVQNZJYSORNB-UHFFFAOYSA-N")
# show all synonym "types"
types <- unique(unlist(lapply(data$synonyms, function(i) i$type)))
## Not run: print(types)
Returns a data frame with columns for all non-empty slots in a RmbSpectrum2 object. Note that MSnbase::Spectrum has a method as.data.frame; however, that one will return only mz and intensity. This function is kept separate to ensure backward compatibility, since it returns more columns than the MSnbase as.data.frame.
## S4 method for signature 'RmbSpectrum2'
getData(s)
s |
The |
A data frame with columns for every set slot.
stravsmi
Retrieves CCTE DTXCID from US EPA for a search term.
getDTXCID(key, api_key)
key |
ID to be converted |
api_key |
API key for CCTE |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The DTXCID (as a string)
Tobias Schulze
CCTE search: https://api-ccte.epa.gov/docs
CCTE REST: https://api-ccte.epa.gov/docs
## Not run: getDTXCID("MKXZASYAUGDDCJ-NJAFHUGGSA-N") ## End(Not run)
Retrieves CCTE DTXSID from US EPA for a search term.
getDTXSID(key, api_key)
key |
ID to be converted |
api_key |
API key for CCTE |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The DTXSID (as a string)
Tobias Schulze
CCTE search: https://api-ccte.epa.gov/docs
CCTE REST: https://api-ccte.epa.gov/docs
## Not run: getDTXSID("MKXZASYAUGDDCJ-NJAFHUGGSA-N") ## End(Not run)
Retrieves CCTE SMILES from US EPA for a search term.
getDTXSMILES(key, api_key)
key |
ID to be converted |
api_key |
API key for CCTE |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The SMILES (as a string)
Tobias Schulze
CCTE search: https://api-ccte.epa.gov/docs
CCTE REST: https://api-ccte.epa.gov/docs
## Not run: getDTXSMILES("MKXZASYAUGDDCJ-NJAFHUGGSA-N") ## End(Not run)
The content will always be returned as a character string.
getField(parsedJDX, field_name)
parsedJDX |
list as created by readJDX. A parsed, single-block JCAMP file |
field_name |
character. The name of the field (e.g. 'CAS REGISTRY NO') |
The field's content
pstahlhofen
readJDX
## Not run:
parsedJDX <- readJDX('my_singleblock_jcamp.dx')
title <- getField(parsedJDX, "TITLE")
## End(Not run)
Generates a Rcdk molecule object from SMILES code, which is fully typed and usable (in contrast to the built-in parse.smiles).
getMolecule(smiles)
smiles |
The SMILES code of the compound. |
NOTE: As of today (2012-03-16), Rcdk discards stereochemistry when loading the SMILES code! Therefore, do not trust this function blindly, e.g. don't generate InChI keys from the result. It is, however, useful if you want to compute the mass (or something else) with Rcdk.
A Rcdk IAtomContainer reference.
Michael Stravs
# Lindane:
getMolecule("C1(C(C(C(C(C1Cl)Cl)Cl)Cl)Cl)Cl")
# Benzene:
getMolecule("C1=CC=CC=C1")
Retrieves PubChem CIDs for a search term.
getPcId(query, from = "inchikey")
query |
ID to be converted |
from |
Type of input ID |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The PubChem CID (as a string).
Michael Stravs, Erik Mueller
PubChem search: http://pubchem.ncbi.nlm.nih.gov/
PubChem REST: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
getPcId("MKXZASYAUGDDCJ-NJAFHUGGSA-N")
Retrieves CCTE Preferred Name from US EPA for a search term.
getPrefName(key, api_key)
key |
ID to be converted |
api_key |
API key for CCTE |
Currently, only the first result is returned. The function should be regarded as experimental and has not been thoroughly tested.
The CCTE Preferred Name (as a string)
Tobias Schulze
CCTE search: https://api-ccte.epa.gov/docs
CCTE REST: https://api-ccte.epa.gov/docs
## Not run: getPrefName("MKXZASYAUGDDCJ-NJAFHUGGSA-N") ## End(Not run)
Checks whether the formula is chemically valid, i.e. has no zero-count or negative-count elements.
is.valid.formula(formula)
formula |
A molecular formula in string or list representation
( |
The check is only meant to identify formulas which have negative elements, which can arise from the subtraction of adducts. It is not a high-level formula "validity" check like e.g. the Rcdk function isvalid.formula, which uses the nitrogen rule or a DBE rule.
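The kind of check described here can be sketched in base R on the list representation of a formula. The helpers subtract_counts and has_positive_counts are hypothetical names introduced for this illustration; they are not the package's implementation and do not parse formula strings.

```r
# Hypothetical helper: element-wise subtraction of two formulas given as
# named lists of element counts (e.g. removing an adduct).
subtract_counts <- function(a, b) {
  elements <- union(names(a), names(b))
  count <- function(f, el) if (is.null(f[[el]])) 0 else f[[el]]
  setNames(lapply(elements, function(el) count(a, el) - count(b, el)),
           elements)
}

# Hypothetical helper: TRUE only if every element count is strictly positive.
has_positive_counts <- function(formula) {
  counts <- unlist(formula)
  length(counts) > 0 && all(counts > 0)
}

ok  <- subtract_counts(list(C = 6, H = 7), list(H = 1))  # C6H6
bad <- subtract_counts(list(C = 6, H = 1), list(H = 2))  # H count becomes -1

has_positive_counts(ok)
has_positive_counts(bad)
has_positive_counts(list(C = 0, H = 1, Br = 2))          # zero count
```

A check like this only guards against arithmetic artefacts; it says nothing about whether the formula is chemically plausible.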
Michael Stravs
list.to.formula, add.formula, order.formula
#
is.valid.formula(list(C=0,H=1,Br=2))
is.valid.formula("CH2Cl")
is.valid.formula("C0H2")
Loads MassBank compound information lists (i.e. the lists which were created in the first two steps of the MassBank mbWorkflow and subsequently edited by hand).
loadInfolists(mb, path)
loadInfolist(mb, fileName)
resetInfolists(mb)
mb |
The |
path |
Directory in which the namelists reside. All CSV files in this directory will be loaded. |
fileName |
A single namelist to be loaded. |
resetInfolists clears the information lists, i.e. it creates a new empty list in mbdata_archive. loadInfolist loads a single CSV file, whereas loadInfolists loads a whole directory.
The new workspace with loaded/reset lists.
Michael Stravs, Tobias Schulze
#
## Not run:
mb <- resetInfolists(mb)
mb <- loadInfolist(mb, "my_csv_infolist.csv")
## End(Not run)
Loads a CSV compound list with compound IDs
loadList(path, listEnv=NULL, check=TRUE)
resetList()
path |
Path to the CSV list. |
listEnv |
The environment to load the list into. By default, the namelist is loaded into an environment internally in RMassBank. |
check |
A parameter that specifies whether the SMILES-Codes in the list should be checked for readability by rcdk. |
The list is loaded into the variable compoundList in the environment listEnv (which defaults to the global environment) and used by the findMz, findCAS, ... functions. The CSV file is required to have at least the following columns, which are used for further processing and must be named correctly (but may be present in any order): ID, Name, SMILES, RT, CAS
resetList() clears a currently loaded list.
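Before loading a compound list it can help to confirm that the CSV carries the required columns. The following base R sketch writes a small illustrative list to a temporary file and checks the column names; the retention times are placeholders, and the check itself is an illustration rather than what loadList does internally.

```r
required <- c("ID", "Name", "SMILES", "RT", "CAS")

# Illustrative two-compound list (retention times are placeholder values).
compounds <- data.frame(
  ID     = c(1, 2),
  Name   = c("Benzene", "Lindane"),
  SMILES = c("C1=CC=CC=C1", "C1(C(C(C(C(C1Cl)Cl)Cl)Cl)Cl)Cl"),
  RT     = c(5.2, 11.8),
  CAS    = c("71-43-2", "58-89-9")
)

path <- tempfile(fileext = ".csv")
write.csv(compounds, path, row.names = FALSE)

# Read the file back and report any required column that is absent.
loaded <- read.csv(path, stringsAsFactors = FALSE)
missing_cols <- setdiff(required, colnames(loaded))
if (length(missing_cols) > 0) {
  stop("Compound list lacks columns: ", paste(missing_cols, collapse = ", "))
}
```

Since the columns may appear in any order, comparing sets of names with setdiff is sufficient; positions do not matter.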
No return value.
Michael Stravs
## ## Not run: loadList("mylist.csv")
Makes a list.tsv file in the "moldata" folder.
makeMollist(compiled)
compiled |
list of |
Generates the list.tsv file which is needed by MassBank to connect records with their respective molfiles. The first compound name is linked to a mol-file with the compound ID (e.g. 2334.mol for ID 2334).
No return value.
Michael A. Stravs, Eawag <[email protected]>
Generates a peak cache table for use with findMsMsHR functions.
makePeaksCache(msRaw, headerCache)
msRaw |
the input raw datafile (opened) |
headerCache |
the cached header, or subset thereof for which peaks should be extracted. Peak extraction goes
by |
A list of dataframes as from mzR::peaks.
stravsmi
Recalibrates MS/MS spectra by building a recalibration curve of the
assigned putative fragments of all spectra in aggregatedSpecs
(measured mass vs. mass of putative associated fragment) and additionally
the parent ion peaks.
makeRecalibration(w,
  recalibrateBy = getOption("RMassBank")$recalibrateBy,
  recalibrateMS1 = getOption("RMassBank")$recalibrateMS1,
  recalibrator = getOption("RMassBank")$recalibrator,
  recalibrateMS1Window = getOption("RMassBank")$recalibrateMS1Window)
recalibrateSpectra(rawspec = NULL, rc = NULL, rc.ms1=NULL, w = NULL,
  recalibrateBy = getOption("RMassBank")$recalibrateBy,
  recalibrateMS1 = getOption("RMassBank")$recalibrateMS1)
recalibrateSingleSpec(spectrum, rc,
  recalibrateBy = getOption("RMassBank")$recalibrateBy)
w |
For |
recalibrateBy |
Whether recalibration should be done by ppm ("ppm") or by m/z ("mz"). |
recalibrateMS1 |
Whether MS1 spectra should be recalibrated separately ("separate"), together with MS2 ("common") or not at all ("none"). Usually taken from settings. |
recalibrator |
The recalibrator functions to be used.
Refer to |
recalibrateMS1Window |
Window width to look for MS1 peaks to recalibrate (in ppm). |
spectrum |
For |
rawspec |
For |
rc , rc.ms1
|
The recalibration curves to be used in the recalibration. |
Note that the actually used recalibration functions are governed by the general MassBank settings (see recalibrate).
If a set of acquired LC-MS runs contains spectra for two different ion types (e.g. [M+H]+ and [M+Na]+) which should both be processed by RMassBank, it is necessary to do this in two separate runs. Since it is likely that one ion type will be the vast majority of spectra (e.g. most in [M+H]+ mode), and only few spectra will be present for other specific adducts (e.g. only few [M+Na]+ spectra), it is possible that too few spectra are present to build a good recalibration curve using only e.g. the [M+Na]+ ions. Therefore we recommend, for one set of LC/MS runs, to build the recalibration curve for one ion type (msmsWorkflow(mode="pH", steps=c(1:8), newRecalibration=TRUE)) and reuse the same curve for processing different ion types (msmsWorkflow(mode="pNa", steps=c(1:8), newRecalibration=FALSE)). This also ensures a consistent recalibration across all spectra of the same batch.
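The idea of a recalibration curve (measured mass vs. mass of the assigned fragment) can be sketched with synthetic data in base R. Everything below is an assumption made for illustration: the masses, the linear drift, and the use of lm as a stand-in for whatever recalibrator is configured in the settings. It corresponds to recalibrating by ppm deviation.

```r
# Synthetic example: theoretical fragment masses and measured masses that
# carry a systematic, mass-dependent error (expressed in ppm).
mz_theo   <- c(91.0542, 119.0491, 147.0441, 175.0390, 203.0339, 231.0288)
true_dppm <- 2 + 0.01 * mz_theo                 # hypothetical drift
mz_meas   <- mz_theo * (1 + true_dppm * 1e-6)

# Build the curve: ppm deviation as a function of the measured m/z.
dppm <- (mz_meas - mz_theo) / mz_theo * 1e6
rc   <- lm(dppm ~ mz_meas)                      # stand-in recalibrator

# Apply the curve: remove the predicted ppm deviation from each measured mass.
pred     <- predict(rc, newdata = data.frame(mz_meas = mz_meas))
mz_recal <- mz_meas / (1 + pred * 1e-6)

# Remaining deviation after recalibration, in ppm.
residual_ppm <- (mz_recal - mz_theo) / mz_theo * 1e6
```

Reusing one fitted curve for a second ion type, as recommended above, amounts to calling predict with the same rc object on the masses of the other run instead of refitting on very few points.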
makeRecalibration: a list(rc, rc.ms1) with recalibration curves for the MS2 and MS1 spectra.
recalibrateSpectra: if rawspec is not NULL, returns the recalibrated spectra as RmbSpectraSetList. All spectra have their mass recalibrated and evaluation data deleted.
recalibrateSingleSpec: the recalibrated Spectrum (same object, recalibrated masses, evaluation data like assigned formulae etc. deleted).
Michael Stravs, Eawag <[email protected]>
## Not run:
rcCurve <- recalibrateSpectra(w, "pH")
w@spectra <- recalibrateSpectra(mode="pH", rawspec=w@spectra, w=myWorkspace)
w@spectra <- recalibrateSpectra(mode="pH", rawspec=w@spectra, rcCurve$rc, rcCurve$rc.ms1)
## End(Not run)
Uses data generated by msmsWorkflow to create MassBank records.
mbWorkflow( mb, steps = c(1, 2, 3, 4, 5, 6, 7, 8), infolist_path = "./infolist.csv", gatherData = "online", filter = TRUE )
mb |
The |
steps |
Which steps in the workflow to perform. |
infolist_path |
A path where newly downloaded compound information is stored, which should then be manually inspected. |
gatherData |
A variable denoting whether to retrieve information using several online databases |
filter |
If |
See the vignette vignette("RMassBank") for detailed information about the usage.
Steps:
Step 1: Find which compounds don't have annotation information yet. For these compounds, pull information from several databases (using gatherData).
Step 2: If new compounds were found, then export the infolist.csv and stop the workflow. Otherwise, continue.
Step 3: Take the archive data (in table format) and reformat it to MassBank tree format.
Step 4: Compile the spectra. Using the skeletons from the archive data, create MassBank records per compound and fill them with peak data for each spectrum. Also, assign accession numbers based on scan mode and relative scan no.
Step 5: Convert the internal tree-like representation of the MassBank data into flat-text string arrays (basically, into text-file style, but still in memory)
Step 6: For all OK records, generate a corresponding molfile with the structure of the compound, based on the SMILES entry from the MassBank record. (This molfile is still in memory only, not yet a physical file)
Step 7: If necessary, generate the appropriate subdirectories, and actually write the files to disk.
Step 8: Create the list.tsv in the molfiles folder, which is required by MassBank to attribute substances to their corresponding structure molfiles.
The processed mbWorkspace.
Michael A. Stravs, Eawag <[email protected]>
## Not run:
mb <- newMbWorkspace(w) # w being a msmsWorkspace
mb <- loadInfolists(mb, "D:/myInfolistPath")
mb <- mbWorkflow(mb, steps=c(1:3), "newinfos.csv")
## End(Not run)
mbWorkflow data
A workspace which stores input and output data for use with mbWorkflow.
## S4 method for signature 'mbWorkspace'
show(object)
object |
The |
Slots:
The corresponding input data from msmsWorkspace-class
A list of additional peaks which can be loaded using addPeaks.
Infolist data: Data for annotation of MassBank records, which can be loaded using loadInfolists.
Compiled tree-structured MassBank records. compiled_ok contains only the compounds with at least one valid spectrum.
Compiled MassBank records in text representation.
MOL files with the compound structures.
Index lists for internal use which denote which compounds have valid spectra.
Methods:
Shows a brief summary of the object. Currently only a stub.
Michael Stravs, Eawag <[email protected]>
This procedure first sorts peaks by intensity (descending sort) and then starts iterating over the peaks, removing all entries that deviate "sufficiently far" from the currently selected peak. See the Details section for a full explanation and information on how to fine-tune peak removal.
mergePeaks(peaks, ...)
## S4 method for signature 'data.frame'
mergePeaks(peaks, ...)
## S4 method for signature 'matrix'
mergePeaks(peaks, ...)
## S4 method for signature 'RmbSpectrum2'
mergePeaks(peaks, ...)
## S4 method for signature 'Spectrum'
mergePeaks(peaks, ...)
peaks |
data.frame, matrix or RmbSpectrum2
The peak-table to be merged. In case of an |
... |
3 numeric values. These define cutoff limits (see Details) |
Three parameters must be passed to mergePeaks for peak-removal control in this order:
- cutoff_dppm_limit
- cutoff_absolute_limit
- cutoff_intensity_limit
The method iterates through the peaks, beginning with the highest-intensity peak, and in each step removes all other peaks that fulfill conditions 1 AND 2 relative to the selected peak:
1. Their m/z value does not deviate too far from that of the selected peak, i.e. if the selected peak is p and the checked peak is c, it holds that EITHER |p$mz - c$mz| <= cutoff_absolute_limit OR |p$mz - c$mz| <= ppm(p$mz, cutoff_dppm_limit, p=TRUE) (see ppm).
2. Their intensity is much smaller than that of the selected peak, i.e. c$intensity < cutoff_intensity_limit * p$intensity for a suitable cutoff_intensity_limit between 0 and 1.
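The procedure described in this section can be sketched in base R for a plain data frame with columns mz and intensity. The function merge_peaks_sketch is a hypothetical stand-in written for illustration, not the package's mergePeaks; in particular, the ppm tolerance is computed directly as p$mz * cutoff_dppm_limit * 1e-6, which is an assumption about what the ppm helper returns with p=TRUE.

```r
# Hypothetical re-implementation of the described procedure for data frames.
merge_peaks_sketch <- function(peaks, cutoff_dppm_limit,
                               cutoff_absolute_limit,
                               cutoff_intensity_limit) {
  # Sort by descending intensity so the strongest peak is selected first.
  peaks <- peaks[order(peaks$intensity, decreasing = TRUE), , drop = FALSE]
  keep <- rep(TRUE, nrow(peaks))
  for (i in seq_len(nrow(peaks))) {
    if (!keep[i]) next                 # already removed by a stronger peak
    p <- peaks[i, ]
    dmz <- abs(peaks$mz - p$mz)
    # Condition 1: close in m/z, by absolute OR ppm tolerance.
    is_close <- dmz <= cutoff_absolute_limit |
      dmz <= p$mz * cutoff_dppm_limit * 1e-6
    # Condition 2: much less intense than the selected peak.
    is_weak <- peaks$intensity < cutoff_intensity_limit * p$intensity
    to_remove <- is_close & is_weak
    to_remove[i] <- FALSE              # never remove the selected peak itself
    keep[to_remove] <- FALSE
  }
  merged <- peaks[keep, , drop = FALSE]
  merged[order(merged$mz), , drop = FALSE]   # result ordered by m/z
}

peaks <- data.frame(
  mz        = c(100.0000, 100.0004, 150.0000, 150.2000),
  intensity = c(1000,     20,       500,      400)
)
merged <- merge_peaks_sketch(peaks, 10, 0.5, 0.05)
```

In this toy input, the weak satellite at m/z 100.0004 is removed because it lies close to the peak at m/z 100 and has less than 5 % of its intensity, whereas the peak at m/z 150.2 survives because it is close to m/z 150 but not weak enough.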
An object of the same class as peaks. The result contains a reduced peak table ordered by m/z.
## Not run: mergePeaks(spectrum, 10, 0.5, 0.05)
This method takes a collection of RmbSpectrum2 objects and merges them into a single RmbSpectrum2 object.
mergeSpectra(spectra, ...)
## S4 method for signature 'RmbSpectrum2List'
mergeSpectra(spectra, ...)
spectra |
|
... |
NOTHING (This parameter is reserved for future implementations of the generic) |
Information from all spectra is retrieved via getData, combined with rbind and placed into the new spectrum with setData.
A single RmbSpectrum2 object containing the merged information.
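The combination step can be pictured with plain data frames standing in for the tables that getData would return. The peak values below are invented, and the sketch deliberately stops at the rbind step, since constructing an RmbSpectrum2 via setData requires the package itself.

```r
# Stand-ins for per-spectrum peak tables (illustrative values only).
spectra <- list(
  data.frame(mz = c(91.0542, 119.0491), intensity = c(1500, 300)),
  data.frame(mz = c(91.0543, 147.0441), intensity = c(1200, 800))
)

# Stack all peak tables row-wise into one combined table.
merged <- do.call(rbind, spectra)
```

Note that row-binding keeps every peak from every input, so near-identical masses from different spectra appear as separate rows in the combined table.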
The filenames of the raw LC-MS runs are read from the array files in the global environment. See the vignette vignette("RMassBank") for further details about the workflow.
msmsRead( w, filetable = NULL, files = NULL, cpdids = NULL, readMethod, mode = NULL, confirmMode = FALSE, useRtLimit = TRUE, Args = NULL, settings = getOption("RMassBank"), progressbar = "progressBarHook", MSe = FALSE, plots = FALSE )
w |
A |
filetable |
The path to a .csv-file that contains the columns "Files" and "ID" supplying the relationships between files and compound IDs. Either this or the parameter "files" need to be specified. |
files |
A vector or list containing the filenames of the files that are to be read as spectra. For the IDs to be inferred from the filenames alone, there need to be exactly 2 underscores. |
cpdids |
A vector or list containing the compound IDs of the files that are to be read as spectra.
The ordering of this and |
readMethod |
Several methods are available to get peak lists from the files. Currently supported are "mzR", "xcms", "MassBank" and "peaklist". The first two read MS/MS raw data, and differ in the strategy used to extract peaks. MassBank will read existing records, so that e.g. a recalibration can be performed, and "peaklist" just requires a CSV with two columns and the column header "mz", "int". |
mode |
|
confirmMode |
Defaults to FALSE (use the most intense precursor). A value of 1 uses the second-most intense precursor for a chosen ion (and its data-dependent scans), and so on. |
useRtLimit |
Whether to enforce the given retention time window. |
Args |
A list of arguments that will be handed to the xcms-method findPeaks via do.call |
settings |
Options to be used for processing. Defaults to the options loaded via
|
progressbar |
The progress bar callback to use. Only needed for specialized applications.
Cf. the documentation of |
MSe |
A boolean value that determines whether the spectra were recorded using MSe or not |
plots |
A boolean value that determines whether the pseudospectra in XCMS should be plotted |
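The description of the files argument states that file names need exactly two underscores for the compound IDs to be inferred. The sketch below shows one way such a convention could be checked and parsed in base R. The helper infer_cpd_id, the example paths, and above all the assumption that the ID is the middle field are hypothetical; the source does not state which field holds the ID.

```r
# Hypothetical helper: assumes names like "<prefix>_<ID>_<suffix>.<ext>".
infer_cpd_id <- function(files) {
  base <- basename(files)
  n_underscores <- lengths(regmatches(base, gregexpr("_", base, fixed = TRUE)))
  if (any(n_underscores != 2)) {
    stop("File names must contain exactly 2 underscores: ",
         paste(base[n_underscores != 2], collapse = ", "))
  }
  fields <- strsplit(base, "_", fixed = TRUE)
  # Assumption: the compound ID is the second (middle) field.
  vapply(fields, function(f) f[2], character(1))
}

ids <- infer_cpd_id(c("data/run_2818_pos.mzML", "data/run_2819_pos.mzML"))
```

When file names do not follow such a pattern, supplying cpdids or a filetable explicitly avoids relying on name parsing at all.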
The msmsWorkspace with msms-spectra read.
Michael Stravs, Eawag <[email protected]>
Erik Mueller, UFZ
msmsWorkspace-class, msmsWorkflow
The filenames of the raw LC-MS runs are read from the array files in the global environment. See the vignette vignette("RMassBank") for further details about the workflow.
msmsRead.RAW( w, xRAW = NULL, cpdids = NULL, mode, findPeaksArgs = NULL, settings = getOption("RMassBank"), progressbar = "progressBarHook", plots = FALSE )
w |
A |
xRAW |
A list of xcmsRaw objects whose peaks should be detected and added to the workspace.
The relevant data must be in the MS1 data of the xcmsRaw object. You can coerce the
msn-data in a usable object with the |
cpdids |
A vector or list containing the compound IDs of the files that are to be read as spectra.
The ordering of this and |
mode |
|
findPeaksArgs |
A list of arguments that will be handed to the xcms-method findPeaks via do.call |
settings |
Options to be used for processing. Defaults to the options loaded via
|
progressbar |
The progress bar callback to use. Only needed for specialized applications.
Cf. the documentation of |
plots |
A boolean value that determines whether the pseudospectra in XCMS should be plotted |
The msmsWorkspace with msms-spectra read.
Michael Stravs, Eawag <[email protected]>
Erik Mueller, UFZ
msmsWorkspace-class, msmsWorkflow
Extracts and processes spectra from a specified file list, according to loaded options and given parameters.
msmsWorkflow( w, mode = "pH", steps = c(1:8), confirmMode = FALSE, newRecalibration = TRUE, useRtLimit = TRUE, archivename = NA, readMethod = "mzR", filetable = NULL, findPeaksArgs = NULL, plots = FALSE, precursorscan.cf = FALSE, settings = getOption("RMassBank"), analyzeMethod = "formula", progressbar = "progressBarHook", MSe = FALSE )
w |
A |
mode |
|
steps |
Which steps of the workflow to process. See the vignette
|
confirmMode |
Defaults to FALSE (use the most intense precursor). A value of 1 uses the second-most intense precursor for a chosen ion (and its data-dependent scans), and so on. |
newRecalibration |
Whether to generate a new recalibration curve ( |
useRtLimit |
Whether to enforce the given retention time window. |
archivename |
The prefix under which to store the analyzed result files. |
readMethod |
Several methods are available to get peak lists from the files. Currently supported are "mzR", "xcms", "MassBank" and "peaklist". The first two read MS/MS raw data, and differ in the strategy used to extract peaks. MassBank will read existing records, so that e.g. a recalibration can be performed, and "peaklist" just requires a CSV with two columns and the column header "mz", "int". |
filetable |
If including step 1 (data extraction), a 'filetable' argument
to be passed to |
findPeaksArgs |
A list of arguments that will be handed to the xcms-method findPeaks via do.call |
plots |
A parameter that determines whether the spectra should be plotted or not (This parameter is only used for the xcms-method) |
precursorscan.cf |
Whether to fill precursor scans. To be used with files which for some reason do not contain precursor scan IDs in the mzML, e.g. AB Sciex converted files. |
settings |
Options to be used for processing. Defaults to the options loaded via
|
analyzeMethod |
The "method" parameter to pass to |
progressbar |
The progress bar callback to use. Only needed for specialized applications.
Cf. the documentation of |
MSe |
A boolean value that determines whether the spectra were recorded using MSe or not |
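For the "peaklist" read method, the readMethod description only requires a CSV with two columns headed "mz" and "int". A minimal base R sketch for producing such a file follows; the peak values and the file name are invented for illustration.

```r
# Illustrative peak list with the two required column headers.
peaklist <- data.frame(
  mz  = c(91.0542, 119.0491, 147.0441),
  int = c(1500, 300, 800)
)

# Hypothetical file name; written to the session's temporary directory.
path <- file.path(tempdir(), "example_peaklist.csv")
write.csv(peaklist, path, row.names = FALSE)

# Read back to confirm the header is exactly "mz", "int".
check <- read.csv(path)
colnames(check)
```

Setting row.names = FALSE matters here, because write.csv would otherwise add an unnamed first column and the file would no longer have exactly two columns.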
The filenames of the raw LC-MS runs are read from the array files in the global environment. See the vignette vignette("RMassBank") for further details about the workflow.
The processed msmsWorkspace.
Michael Stravs, Eawag <[email protected]>
msmsWorkflow data
A workspace which stores input and output data for msmsWorkflow.
## S4 method for signature 'msmsWorkspace'
show(object)
object |
The |
Slots:
The input file names
The spectra per compound (RmbSpectraSet) extracted from the raw files
A data.frame with an aggregated peak table from all spectra. Further columns are added during processing.
The recalibration curves generated in workflow step 4.
For the workflow steps after 4: the parent workspace containing the state (spectra, aggregate) before recalibration, such that the workflow can be reprocessed from start.
The base name of the files the archive is stored to during the workflow.
The RMassBank settings used during the workflow, if stored with the workspace.
Methods:
Shows a brief summary of the object and processing progress.
Michael Stravs, Eawag <[email protected]>
mbWorkflow
Creates a new workspace for use with mbWorkflow.
newMbWorkspace(w)
w |
The input |
The workspace input data will be loaded from the msmsWorkspace-class object provided by the parameter w.
A new mbWorkspace object with the loaded input data.
Michael Stravs, Eawag <[email protected]>
mbWorkflow, msmsWorkspace-class
msmsWorkflow
Creates an empty workspace or loads an existing workspace from disk.
newMsmsWorkspace(files = character(0))
files |
If given, the files list to initialize the workspace with. |
newMsmsWorkspace creates a new empty workspace for use with msmsWorkflow.
loadMsmsWorkspace loads a workspace saved using archiveResults. Note that it also successfully loads data saved with the old RMassBank data format into the new msmsWorkspace object.
A new msmsWorkspace object
Michael Stravs, Eawag <[email protected]>
msmsWorkflow, msmsWorkspace-class
Scale spectrum to specified intensity range
## S4 method for signature 'RmbSpectrum2'
normalize(object, ..., scale = 999, precision = 0, slot = "intensity")
object |
the 'RmbSpectrum2' object to scale |
... |
arguments passed to 'selectPeaks' to choose peaks for normalization |
scale |
Maximum intensity in normalized spectrum |
precision |
Number of digits after the decimal point for the normalized intensity, typically 0 |
slot |
Which property of the spectrum should be scaled |
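The effect of scale and precision can be illustrated on a bare intensity vector. The function normalize_sketch is a hypothetical stand-in that assumes the most intense peak is mapped to scale and the result is rounded to precision digits; it ignores peak selection and does not operate on RmbSpectrum2 objects.

```r
# Hypothetical stand-in: rescale so the maximum equals `scale`, then round.
normalize_sketch <- function(intensity, scale = 999, precision = 0) {
  round(intensity / max(intensity) * scale, digits = precision)
}

normalized <- normalize_sketch(c(1200, 300, 45))
```

With the defaults, the base peak becomes 999 and all other intensities are integers relative to it.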
Scale all spectra in a 'RmbSpectrum2List' to a specified intensity.
## S4 method for signature 'RmbSpectrum2List'
normalize(object, ...)
object |
the 'RmbSpectrum2List' with spectra to scale |
... |
Arguments passed to 'normalize,RmbSpectrum2' |
Orders a chemical formula in the commonly accepted order (CH followed by alphabetic ordering).
order.formula(formula, as.formula = TRUE, as.list = FALSE)
formula |
A molecular formula in string or list representation
( |
as.formula |
If |
as.list |
If |
Michele Stravs
list.to.formula, add.formula, is.valid.formula
#
order.formula("H4C9")
order.formula("C2N5HClBr")
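The ordering rule stated for order.formula (C and H first, then the remaining elements alphabetically) can be sketched for the list representation in base R. The helper hill_order is a hypothetical name used for illustration only and does not parse formula strings.

```r
# Hypothetical helper: reorder a named list of element counts so that
# C and H come first, followed by all other elements alphabetically.
hill_order <- function(formula) {
  elements <- names(formula)
  first <- intersect(c("C", "H"), elements)
  rest  <- sort(setdiff(elements, first), method = "radix")
  formula[c(first, rest)]
}

ordered_small <- hill_order(list(H = 4, C = 9))
ordered_large <- hill_order(list(C = 2, N = 5, H = 1, Cl = 1, Br = 1))
names(ordered_large)
```

Using the radix sort method makes the alphabetical step independent of the session locale.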
Parses MassBank records (record format V2 only).
parseMassBank(Files)
Files |
array of character-strings Paths to the plaintext-records that should be read |
The mbWorkspace that the plaintext-record creates.
All parsed information will be stored in the 'compiled_ok' slot.
Erik Mueller
## Not run:
paths <- c("filepath_to_records/RC000001.txt", "filepath_to_records/RC000002.txt")
mb <- parseMassBank(paths)
## End(Not run)
Parses a single MassBank record (record format version 2 only).
parseMbRecord(filename, readAnnotation=TRUE)
filename |
character A path to the plaintext-record that should be read |
readAnnotation |
logical, Default: TRUE If TRUE, parse annotations from the record file and add columns for 'formula', 'formulaCount', 'mzCalc' and 'dppm' to the peak table |
An RmbSpectrum2
object created from the plaintext-record
Erik Mueller
## Not run: 
parseMbRecord("filepath_to_records/RC00001.txt")
## End(Not run)
Select matching/unmatching peaks from aggregate table
peaksMatched(o)
## S4 method for signature 'data.frame'
peaksMatched(o)
## S4 method for signature 'msmsWorkspace'
peaksMatched(o)
o |
Workspace or aggregate table from a workspace |
Selects the peaks from the aggregate table which matched within filter criteria (peaksMatched) or didn't match (peaksUnmatched).
peaksMatched(data.frame): A method to retrieve the matched peaks from the "aggregated" slot (a data.frame object) in an msmsWorkspace
peaksMatched(msmsWorkspace): A method to retrieve the matched peaks from an msmsWorkspace
stravsmi
Select matching/unmatching peaks from aggregate table
peaksUnmatched(o, cleaned = FALSE)
## S4 method for signature 'data.frame'
peaksUnmatched(o, cleaned = FALSE)
## S4 method for signature 'msmsWorkspace'
peaksUnmatched(o, cleaned = FALSE)
o |
Workspace or aggregate table from a workspace |
cleaned |
Return only peaks which pass electronic noise filtering if |
Selects the peaks from the aggregate table which matched within filter criteria (peaksMatched) or didn't match (peaksUnmatched).
peaksUnmatched(data.frame): A method to retrieve the unmatched peaks from the "aggregated" slot (a data.frame object) in an msmsWorkspace
peaksUnmatched(msmsWorkspace): A method to retrieve the unmatched peaks from an msmsWorkspace
stravsmi
Plots the peaks of one or two mbWorkspace
to compare them.
plotMbWorkspaces(w1, w2 = NULL)
w1 |
The |
w2 |
Another optional |
This function plots one or two mbWorkspaces, for cases where the user has used different methods to acquire similar spectra. w1 must always be supplied, while w2 is optional. The workspaces need to be fully processed for this function to work.
A logical indicating whether the information was plotted or not
Erik Mueller
# ## Not run: plotMbWorkspaces(w1,w2)
Plot the recalibration graph.
plotRecalibration(w, recalibrateBy = getOption("RMassBank")$recalibrateBy)
plotRecalibration.direct(rcdata, rc, rc.ms1, title, mzrange, recalibrateBy = getOption("RMassBank")$recalibrateBy)
w |
The workspace to plot the calibration graph from |
recalibrateBy |
Whether recalibration was done by ppm ("ppm") or by m/z ("mz"). Important only for graph labeling here. |
rcdata |
A data frame with columns |
rc |
Predictor for MS2 data |
rc.ms1 |
Predictor for MS1 data |
title |
Prefix for the graph titles |
mzrange |
m/z value range for the graph |
Michele Stravs, Eawag <[email protected]>
Calculates ppm values for a given mass.
ppm(mass, dppm, l = FALSE, p = FALSE)
mass |
The "real" mass |
dppm |
The mass deviation to calculate |
l |
Boolean: return limits? Defaults to FALSE. |
p |
Boolean: return ppm error itself? Defaults to FALSE. |
This is a helper function used in RMassBank code.
By default (l=FALSE, p=FALSE) the function returns the mass plus the ppm error (for 123.00000 and 10 ppm: 123.00123, or for 123 and -10 ppm: 122.99877).
For l=TRUE, the function returns the upper and lower limit (sic!)
For p=TRUE, just the difference itself is returned (0.00123 for 123/10ppm).
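The three return modes can be reproduced with a few lines of base R. The function ppm_sketch below is a hypothetical re-implementation written from the description above, not the package source; it assumes that the limits for l=TRUE are returned in the order upper, lower.

```r
# Sketch only: ppm arithmetic as described above.
ppm_sketch <- function(mass, dppm, l = FALSE, p = FALSE) {
  delta <- mass * dppm * 1e-6             # absolute deviation in mass units
  if (p) return(delta)                    # only the difference
  if (l) return(c(mass + delta, mass - delta))  # upper and lower limit
  mass + delta                            # mass shifted by the ppm error
}

shifted <- ppm_sketch(123, 10)            # approx. 123.00123
lowered <- ppm_sketch(123, -10)           # approx. 122.99877
limits  <- ppm_sketch(123, 10, l = TRUE)  # approx. 123.00123 and 122.99877
delta   <- ppm_sketch(123, 10, p = TRUE)  # approx. 0.00123
```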
Michael A. Stravs, Eawag <[email protected]>
ppm(100, 10)
Finds a list of peaks in spectra with a high relative intensity (>10% and 1e4, or >1% and 1e5) which must be manually checked. Peaks orbiting around the parent peak mass (calculated from the compound ID), which are very likely co-isolated substances, are ignored.
problematicPeaks(sp)
sp |
a RmbSpectrum2 object to be checked for problematic peaks. |
The modified RmbSpectrum2 object with additional columns/properties 'problematicPeaks' (logical 'TRUE' if the peak is intense and unannotated), 'aMax' (base peak intensity), 'mzCenter' (the precursor m/z).
TODO: there is hardcoded logic in this function that needs to be resolved eventually!
Michael Stravs
# As used in the workflow:
sp <- new("RmbSpectrum2", mz = c(100,200,300,400,500), intensity = c(999999,888888,777777,666666,555555))
sp@ok <- TRUE
property(sp, "mzFound", addNew=TRUE) <- sp@mz
sp@good <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
sp@precursorMz <- 600
sp_checked <- problematicPeaks(sp)
# stopifnot(sum(getData(sp_checked)$problematicPeak) == 2)
Generates a list of intense unmatched peaks for further review (the "failpeak list") and exports it if the archive name is given.
processProblematicPeaks(w, archivename = NA)
w |
|
archivename |
Base name of the archive to write to (for "abc" the exported failpeaks list will be "abc_Failpeaks.csv"). |
Returns the aggregate data.frame with added column "problematic" (logical) which marks peaks which match the problematic criteria.
stravsmi
This function provides a standard implementation for the progress bar in RMassBank.
progressBarHook(object = NULL, value = 0, min = 0, max = 100, close = FALSE)
object |
An identifier representing an instance of a progress bar. |
value |
The new value to assign to the progress indicator |
min |
The minimal value of the progress indicator |
max |
The maximal value of the progress indicator |
close |
If |
RMassBank calls the progress bar function in the following three ways:
pb <- progressBarHook(object=NULL, value=0, min=0, max=LEN)
to create a new progress bar.
pb <- progressBarHook(object=pb, value= VAL)
to set the progress bar to a new value (between the set min and max).
progressBarHook(object=pb, close=TRUE)
to close the progress bar.
(The actual calls are performed with do.call, e.g.
progressbar <- "progressBarHook"
pb <- do.call(progressbar, list(object=pb, value= nProg))
See the source code for details.)
To substitute the standard progress bar with an alternative implementation (e.g. for use in a GUI), developers can write their own function which behaves in the same way as progressBarHook, i.e. takes the same parameters and can be called in the same way.
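As an illustration of such a substitute, the following base R sketch implements a hook with the same parameters that prints messages instead of drawing a text bar. The name quietProgressHook and the use of an environment as the progress bar instance are assumptions made for this example; any object that can be passed back as object would do.

```r
# Sketch only: a drop-in hook with the same parameters as progressBarHook.
quietProgressHook <- function(object = NULL, value = 0, min = 0, max = 100,
                              close = FALSE) {
  if (is.null(object)) {
    # First call: create the progress bar instance
    object <- new.env()
    object$min <- min
    object$max <- max
    object$closed <- FALSE
  }
  if (close) {
    object$closed <- TRUE
    message("progress: done")
    return(invisible(object))
  }
  object$value <- value
  percent <- 100 * (value - object$min) / (object$max - object$min)
  message(sprintf("progress: %.0f%%", percent))
  object
}

# Called the same way as the standard hook, by name via do.call:
progressbar <- "quietProgressHook"
pb <- do.call(progressbar, list(object = NULL, value = 0, min = 0, max = 10))
pb <- do.call(progressbar, list(object = pb, value = 5))
do.call(progressbar, list(object = pb, close = TRUE))
```

Functions that accept a progressbar argument could then be given "quietProgressHook" instead of the default "progressBarHook".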
Returns a progress bar instance identifier (i.e. an identifier which can be used as object in subsequent calls).
Michele Stravs, Eawag <[email protected]>
This searches the 'properties' slot of the object and returns a column with matching name (if found) or NULL otherwise.
property(o, property)
## S4 method for signature 'RmbSpectrum2,character'
property(o, property)
o |
|
property |
character The name of a property |
The corresponding column of o@properties
Update the 'properties' slot of the given object. If the column you want to update does not exist yet and addNew = FALSE (default), this will cause a warning and the object will not be changed.
property(o, property, addNew=FALSE, class="") <- value
## S4 replacement method for signature 'RmbSpectrum2,character,logical,character'
property(o, property, addNew = FALSE, class = "") <- value
## S4 replacement method for signature 'RmbSpectrum2,character,missing,character'
property(o, property, addNew = FALSE, class = "") <- value
## S4 replacement method for signature 'RmbSpectrum2,character,logical,missing'
property(o, property, addNew = FALSE, class = "") <- value
## S4 replacement method for signature 'RmbSpectrum2,character,missing,missing'
property(o, property, addNew = FALSE, class = "") <- value
o |
|
property |
character The name of the column in the 'properties' data frame to be updated |
addNew |
logical, Default: FALSE Whether or not a new column should be added in case a column of the given name does not exist yet. |
class |
character or missing The class of the entries for the column to be added/updated |
value |
ANY The value(s) to be written into the column |
Please note that this is a replacement method, meaning that
property(o, property) <- value
can be used as a short-hand for the equivalent
o <- 'property<-'(o, property, value = value)
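How a replacement function with extra arguments behaves can be tried out on a plain list. The toy function prop<- below is hypothetical and only mirrors the documented addNew behaviour (warning and unchanged object for an unknown column); it does not touch RmbSpectrum2 objects.

```r
# Sketch only: a toy replacement function on a plain list.
`prop<-` <- function(o, property, addNew = FALSE, value) {
  if (!(property %in% names(o)) && !addNew) {
    warning("property '", property, "' does not exist; object unchanged")
    return(o)
  }
  o[[property]] <- value
  o
}

x <- list(mz = c(100, 200))
prop(x, "mz") <- c(101, 201)                             # short-hand form
x <- `prop<-`(x, "int", addNew = TRUE, value = c(5, 7))  # explicit form
unchanged <- suppressWarnings(`prop<-`(x, "noise", value = c(TRUE, FALSE)))
```

In the explicit form the new value has to be passed as value = ..., because the positional slots after property are taken by the other arguments.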
The RmbSpectrum2
object with an updated 'properties' slot
Reanalysis of peaks with no matching molecular formula by allowing additional elements (e.g. "N2O").
reanalyzeFailpeaks(w, custom_additions, filterSettings = getOption("RMassBank")$filterSettings, progressbar = "progressBarHook")
reanalyzeFailpeak(mass, custom_additions, cpdID, mode, filterSettings = getOption("RMassBank")$filterSettings)
w |
A 'msmsWorkspace' with annotated peaks. |
custom_additions |
The allowed additions, e.g. "N2O". |
filterSettings |
Settings for filtering data. Refer to |
progressbar |
The progress bar callback to use. Only needed for specialized
applications. Cf. the documentation of |
mass |
(Usually recalibrated) m/z value of the peak. |
cpdID |
Compound ID of this spectrum. |
mode |
for 'reanalyzeFailpeak', the 'mode' (adduct) of the analyzed spectrum. |
reanalyzeFailpeaks examines the unmatchedPeaksC table in specs and sends every peak through reanalyzeFailpeak.
The aggregate data frame extended by the columns:
reanalyzed.??? |
If reanalysis (step 7) has already been processed: matching values from the reanalyzed peaks |
matchedReanalysis |
Whether reanalysis has matched ( |
It would be good to merge the analysis functions of analyzeMsMs
with
the one used here, to simplify code changes.
Michael Stravs
## As used in the workflow:
## Not run: 
reanalyzedRcSpecs <- reanalyzeFailpeaks(w, custom_additions="N2O")
# A single peak:
reanalyzeFailpeak(mass=105.0447, custom_additions="N2O", cpdID=1234, mode="pH")
## End(Not run)
Predefined fits to use for recalibration: Loess fit, identity (no recalibration), mean shift and linear fit.
recalibrate.loess(rcdata)
recalibrate.identity(rcdata)
recalibrate.mean(rcdata)
recalibrate.linear(rcdata)
rcdata |
A data frame with at least the columns |
recalibrate.loess() provides a Loess fit to a given recalibration parameter. If MS and MS/MS data should be fit together, recalibrate.loess provides good default settings for Orbitrap instruments.
recalibrate.identity() returns a non-recalibration, i.e. a predictor which predicts 0 for all input values. This can be used if the user wants to skip recalibration in the RMassBank workflow.
recalibrate.mean() and recalibrate.linear() are simple recalibrations which return a constant shift or a linear recalibration. They will only be useful in particular cases.
recalibrate() itself is only a dummy function and does not do anything.
Alternatively, other functions can be defined. Which functions are used for recalibration is specified in the RMassBank options file. (Note: if recalibrateMS1: common, the recalibrator: MS1 value is irrelevant, since the common curve is generated with the function specified in recalibrator: MS2.)
Returns a model for recalibration to be used with predict
and the like.
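What "a model to be used with predict" means can be shown with a straight-line fit in base R. The recalibrator name recalibrate.myLinear and the synthetic data are hypothetical; the column names mzFound and recalfield are assumed to be the ones a recalibrator receives, and the sketch does not reproduce the package's own recalibrate.linear.

```r
# Synthetic calibration points: deviation (e.g. in ppm) versus measured m/z
rcdata <- data.frame(mzFound    = c(100, 200, 300, 400),
                     recalfield = c(3, 4, 5, 6))

# Hypothetical custom recalibrator: returns a fitted model object
recalibrate.myLinear <- function(rcdata) {
  lm(recalfield ~ mzFound, data = rcdata)
}

rcCurve <- recalibrate.myLinear(rcdata)
# The model predicts the expected deviation at any m/z value
predicted <- predict(rcCurve, newdata = data.frame(mzFound = 150))
```

Any function that takes rcdata and returns an object with a working predict method could in principle be named in the recalibrator settings in the same way.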
Michael Stravs, EAWAG <[email protected]>
## Not run: 
rcdata <- subset(spec$peaksMatched, formulaCount==1)
ms1data <- recalibrate.addMS1data(spec, recalibrateMS1Window = 15)
rcdata <- rbind(rcdata, ms1data)
rcdata$recalfield <- rcdata$dppm
rcCurve <- recalibrate.loess(rcdata)
# define a spectrum and recalibrate it
s <- matrix(c(100,150,200,88.8887,95.0005,222.2223), ncol=2)
colnames(s) <- c("mz", "int")
recalS <- recalibrateSingleSpec(s, rcCurve)

# Alternative: define a custom recalibrator function with different parameters
recalibrate.MyOwnLoess <- function(rcdata) {
  return(loess(recalfield ~ mzFound, data=rcdata, family=c("symmetric"), degree = 2, span=0.4))
}
# This can then be specified in the RMassBank settings file:
# recalibrateMS1: common
# recalibrator:
#    MS1: recalibrate.loess
#    MS2: recalibrate.MyOwnLoess
# [...]
## End(Not run)
Returns the precursor peaks for all MS1 spectra in the spec dataset with annotated formula to be used in recalibration.
For all spectra in spec$specFound, the precursor ion is extracted from the MS1 precursor spectrum. All found ions are returned in a data frame with a format matching spec$peaksMatched and therefore suitable for rbind-ing to the spec$peaksMatched table. However, only minimal information needed for recalibration is returned.
recalibrate.addMS1data(spec, recalibrateMS1Window = getOption("RMassBank")$recalibrateMS1Window)
spec |
A |
recalibrateMS1Window |
Window width to look for MS1 peaks to recalibrate (in ppm). |
A dataframe with columns mzFound, formula, mzCalc, dppm, dbe, int, dppmBest, formulaCount, good, cpdID, scan, parentScan, dppmRc. However, columns dbe, int, formulaCount, good, scan, parentScan do not contain real information and are provided only as fillers.
Michael Stravs, EAWAG <[email protected]>
## Not run: 
# More or less as used in recalibrateSpectra:
rcdata <- peaksMatched(w)
rcdata <- rcdata[rcdata$formulaCount == 1, ,drop=FALSE]
ms1data <- recalibrate.addMS1data(w, recalibrateMS1Window = 15)
rcdata <- rbind(rcdata, ms1data)
# ... continue constructing recalibration curve with rcdata
## End(Not run)
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_debug(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_debug
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_error(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_error
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_fatal(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_fatal
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_info(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_info
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_success(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_success
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_trace(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_trace
The logging file to be used can be specified by the user in the logging_file field of settings.ini.
rmb_log_warn(...)
... |
The log message, as for 'logger::log_...' functions |
pstahlhofen
logger::log_warn
Load, set and reset settings for RMassBank.
loadRmbSettings(file_or_list)
loadRmbSettingsFromEnv(env = .GlobalEnv)
RmbDefaultSettings()
RmbSettingsTemplate(target)
file_or_list |
The file (YML or R format) or R |
target |
The path where the template setting file should be stored. |
env |
The environment to load the settings from. |
RmbSettingsTemplate creates a template file in which you can adjust the settings as you like. Before using RMassBank, you must then load the settings file using loadRmbSettings. RmbDefaultSettings loads the default settings. loadRmbSettingsFromEnv loads the settings stored in env$RmbSettings, which is useful when reloading archives with saved settings inside.
Note: no settings are loaded upon loading RMassBank! This is intended, so that one never forgets to load the correct settings.
The settings are described in RmbSettings.
None.
The default settings will not work for you unless you have, by chance, installed OpenBabel into the same directory as I have!
Michael Stravs
# Create a standard settings file and load it (unedited)
RmbSettingsTemplate("mysettings.ini")
loadRmbSettings("mysettings.ini")
unlink("mysettings.ini")
Describes all settings for the RMassBank settings file.
deprofile
Whether and how to deprofile input raw files. Leave the
setting empty if your raw files are already in "centroid" mode. If your
input files are in profile mode, you have the choice between algorithms
deprofile.spline, deprofile.fwhm, deprofile.localMax
; refer to
the individual manpages for more information.
rtMargin, rtShift
The allowed retention time deviation relative to the values specified in your compound list (see loadList), and the systematic shift (due to the use of, e.g., pre-columns or other special equipment).
babeldir
Directory to OpenBabel. Required for creating molfiles for MassBank export.
If no OpenBabel directory is given, RMassBank will attempt to use the CACTUS webservice
for SDF generation. It is strongly advised to install OpenBabel; the CACTUS structures
have explicit hydrogen atoms.
The path should point to the directory where babel.exe (or the Linux "babel" equivalent) lies.
use_version
Which MassBank record format to use; version 2 is strongly advised,
version 1 is considered outdated and should be used only if for some reason you are running
old servers and an upgrade is not feasible.
use_rean_peaks
Whether to include peaks from reanalysis (see
reanalyzeFailpeaks
) in the MassBank records. Boolean, TRUE or FALSE.
annotations
A list of constant annotations to use in the MassBank records. The entries authors, copyright, publication, license, instrument, instrument_type, compound_class correspond to the MassBank entries AUTHORS, COPYRIGHT, PUBLICATION, LICENSE, AC$INSTRUMENT, AC$INSTRUMENT_TYPE, CH$COMPOUND_CLASS. The entry confidence_comment is added as COMMENT: CONFIDENCE entry.
The entry internal_id_fieldname
is used to name
the MassBank entry which will keep a reference to the internal compound ID used in
the workflow: for internal_id_fieldname = MYID
and e.g. compound 1234, an
entry will be added to the MassBank record with
COMMENT: MYID 1234
. The internal fieldname should not be left empty!
The entries lc_gradient, lc_flow, lc_solvent_a, lc_solvent_b, lc_column
correspond
to the MassBank entries AC$CHROMATOGRAPHY: FLOW_GRADIENT, FLOW_RATE,
SOLVENT A, SOLVENT B, COLUMN_NAME
.
ms_type, ionization
correspond to AC$MASS_SPECTROMETRY: MS_TYPE, IONIZATION
.
entry_prefix
is the two-letter prefix used when building MassBank accession codes.
Entries under ms_dataprocessing
are added as MS$DATA_PROCESSING:
entries,
in addition to the default WHOLE: RMassBank
.
annotator
For advanced users: option to select your own custom annotator.
Check annotator.default
and the source code for details.
spectraList
This setting describes the experimental annotations for the single data-dependent scans. For every data-dependent scan event, there is a spectraList entry with mode, ces, ce, res denoting collision mode, collision energy in short and verbose notation, and FT resolution.
accessionNumberShifts
This denotes the starting points for accession numbers
for different ion types. For example, pH: 0, mH: 50
means that [M+H]+ spectra will
start at XX123401
(XX
being the entry_prefix
and 1234
the compound
id) and [M-H]- will start at XX123451
.
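The numbering in this example can be reproduced with a short base R sketch. The layout (entry prefix, four-digit compound ID, two-digit spectrum number) is an assumption inferred from the XX123401 and XX123451 codes above, and accession_sketch is a hypothetical helper, not the accession builder used by the package.

```r
# Sketch only: accession codes as <prefix><4-digit compound id><2-digit number>
accession_sketch <- function(entry_prefix, cpdID, shift, scan) {
  sprintf("%s%04d%02d", entry_prefix, as.integer(cpdID),
          as.integer(shift + scan))
}

accessionNumberShifts <- c(pH = 0, mH = 50)
first_pH <- accession_sketch("XX", 1234, accessionNumberShifts[["pH"]], 1)
first_mH <- accession_sketch("XX", 1234, accessionNumberShifts[["mH"]], 1)
```

Under this assumption the shifts reserve separate number ranges per ion type, so records of the same compound in different modes do not collide.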
electronicNoise, electronicNoiseWidth
Known electronic noise peaks and the window to be used by cleanElnoise.
recalibrateBy
dppm
or dmz
to recalibrate either by delta ppm or by
delta mz.
recalibrateMS1
common
or separate
to recalibrate MS1 data points together
or separately from MS2 data points.
recalibrator: MS1, MS2
The functions to use for recalibration of MS1 and MS2 data points.
Note that the MS1
setting is only meaningful if recalibrateMS1: separate
, otherwise
the MS2
setting is used for a common recalibration curve. See recalibrate.loess
for details.
multiplicityFilter
Define the multiplicity filtering level. Default is 2, a value of 1
is off (no filtering) and >2 is harsher filtering.
titleFormat
The title of MassBank records is a mini-summary
of the record, for example "Dinotefuran; LC-ESI-QFT; MS2; CE: 35
By default, the first compound name CH$NAME
, instrument type
AC$INSTRUMENT_TYPE
, MS/MS type AC$MASS_SPECTROMETRY: MS_TYPE
,
collision energy RECORD_TITLE_CE
, resolution AC$MASS_SPECTROMETRY: RESOLUTION
and precursor MS$FOCUSED_ION: PRECURSOR_TYPE
are used. If alternative
information is relevant to differentiate acquired spectra, the title should be adjusted.
For example, many TOFs do not have a resolution setting.
See MassBank documentation for more.
filterSettings
A list of settings that affect the MS/MS processing. The entries
ppmHighMass, ppmLowMass, massRangeDivision
set values for
pre-processing, prior to recalibration. ppmHighMass
defines the
ppm error for the high mass range (default 10 ppm for Orbitraps),
ppmLowMass
is the error for the low mass range (default 15 ppm
for Orbitraps) and massRangeDivision
is the m/z value defining
the split between the high and low mass range (default m/z = 120).
The entry ppmFine
defines the ppm cut-off post recalibration.
The default value of 5 ppm is recommended for Orbitraps. For other
instruments this can be interpreted from the recalibration plot.
All ppm limits are one-sided (e.g. this includes values to +5 ppm or -5 ppm
deviation from the exact mass).
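How the three ppm limits interact can be sketched in base R with the default values quoted above. The helper names are hypothetical, and treating an m/z exactly at massRangeDivision as belonging to the high mass range is an assumption of this sketch.

```r
# Default values as quoted in the text (Orbitrap)
filterSettings <- list(ppmHighMass = 10, ppmLowMass = 15,
                       massRangeDivision = 120, ppmFine = 5)

# Sketch only: ppm tolerance applied before recalibration
ppm_limit_sketch <- function(mz, fs) {
  ifelse(mz >= fs$massRangeDivision, fs$ppmHighMass, fs$ppmLowMass)
}

# TRUE if the measured m/z lies within +/- limit ppm of the calculated m/z
within_ppm <- function(mzFound, mzCalc, limit) {
  abs((mzFound - mzCalc) / mzCalc * 1e6) <= limit
}

limits <- ppm_limit_sketch(c(100, 200), filterSettings)
# A peak with 8 ppm deviation at m/z 200:
ok_before <- within_ppm(200.0016, 200, ppm_limit_sketch(200, filterSettings))
ok_after  <- within_ppm(200.0016, 200, filterSettings$ppmFine)
```

In this example the peak would pass the coarse pre-recalibration limit but fail the ppmFine cut-off unless recalibration reduces its deviation.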
The entries prelimCut, prelimCutRatio define the intensity cut-off and cut-off ratio (in % of the most intense peak) for the peak selection for the recalibration only. Careful: the default value 1e4 for Orbitrap LTQ positive mode could remove all peaks for TOF data and will remove too many peaks for Orbitrap LTQ negative mode spectra!
The entry specOKLimit
defines the intensity limit to include MS/MS spectra.
MS/MS spectra must have at least one peak above this limit to proceed through
the workflow.
dbeMinLimit
defines the minimum allowable ring and double bond equivalents (DBE) for assigned formulas. This assumes maximum valences for elements with multiple valence states. The default is -0.5 (accounting for fragments being ions).
The entries satelliteMzLimit, satelliteIntLimit define the cut-off m/z and intensity values for satellite peak removal (an artefact of Fourier Transform processing). All peaks within the m/z limit (default 0.5) and intensity ratio (default 0.05 or 5%) of the respective peak will be removed. Applicable to Fourier Transform instruments only (e.g. Orbitrap).
findMsMsRawSettings
Parameters for adjusting the raw data retrieval.
The entry ppmFine
defines the ppm error to look for the precursor in
the MS1 (parent) spectrum. Default is 10 ppm for Orbitrap.
mzCoarse defines the error to search for the precursor specification in the MS2 spectrum. This is often only saved to 2 decimal places and thus can be quite inaccurate. The accuracy also depends on the isolation window used. The default setting (e.g. for Orbitrap) is 0.5 (Da, or Th for m/z).
The entry fillPrecursorScan is largely untested. The default value (FALSE) assumes all necessary precursor information is available in the mzML file. A setting of TRUE tries to fill in the precursor data scan number if it is missing. Only tested on one case study so far - feedback welcome!
Michael Stravs, Emma Schymanski
Set of spectra pertaining to one compound
parent
Spectrum1 The precursor spectrum
children
RmbSpectrum2List List of 'RmbSpectrum2' objects for the fragmentation spectra, which are first extracted and later processed during 'msmsWorkflow'
found
logical, denotes whether or not fragmentation spectra were found for this compound
complete
logical, denotes whether or not *all* expected collision energies were found for this compound
empty
logical, 'TRUE' if there are zero found spectra for this compound
formula
character, the molecular formula of the neutral compound
id
The ID of the compound in the RMassBank compound list (see loadList
)
mz
the m/z value of the precursor
name
The name of the compound
mode
The ion type of the precursor, e.g. 'pH, mH, mNa'
smiles
the SMILES string for the compound structure
This extends the Spectrum2
class of the MSnbase
package and introduces further slots that are used to store information
during the RMassBank
workflow.
satellite
logical
If TRUE
, the corresponding peak was removed as satellite.
low
logical
If TRUE
, the corresponding peak was removed
because it failed the intensity cutoff.
rawOK
logical
If TRUE
, the peak passed satellite and low-intensity cutoff removal.
good
logical
If TRUE, a formula could be found for the peak and the peak passed all filter criteria (see the RMassBank vignette or the documentation of analyzeMsMs for details on filter settings).
mzCalc
numeric The mz value calculated from the found formula for each peak (if any)
formula
character
The formula found for each peak.
generate.formula
is used
for formula-fitting
dbe
numeric The number of double bond equivalents. This is calculated from the found formula for each peak (if any)
formulaCount
integer
The number of different formulae found for each peak.
Note: A peak for which multiple formulas were found will appear
multiple times. Hence there may be multiple entries in the formula
, dppm
and mzCalc
slot for the same mz value.
formulaSource
character "analyze" or "reanalysis" Shows whether the current formula for the peak was determined by normal analysis ("analyze") or by reanalysis of a failpeak ("reanalysis")
dppm
numeric The ppm deviation of the mz value from the found formula (if any).
dppmBest
numeric The ppm deviation of the mz value from the best formula found.
ok
logical one-element vector
If this is TRUE
, the spectrum was successfully processed
with at least one resulting peak.
Otherwise, one of the following cases applies:
All peaks failed the intensity cutoff, i.e. the whole spectrum contains only low-intensity peaks.
All peaks were marked as satellites.
All peaks in the spectrum have a lower intensity than the value given in the specOKLimit filter setting (see the RMassBank vignette or the documentation of analyzeMsMs).
The precursor ion formula is invalid (see is.valid.formula
)
The spectrum is empty.
No molecular formula could be found for any of the peaks.
All peaks failed the dbeMinLimit
criterion. (see the
RMassBank
vignette or the documentation of analyzeMsMs
)
info
list
Spectrum identifying information
(collision energy, resolution, collision mode) from the spectraList
properties
data.frame
This is used as a flexible placeholder to store additional properties
for each peak throughout the workflow. After the last step of the
mbWorkflow
, this will typically contain columns mzRaw
,
noise
, formulaMultiplicity
, bestMultiplicity
and filterOK
. However, new columns may be added on demand
(see property<-
)
generate.formula
, property<-
analyzeMsMs
, generate.formula
,
is.valid.formula
Selects peaks from aggregate table according to different criteria.
selectPeaks(o, ...)
## S4 method for signature 'RmbSpectrum2'
selectPeaks(o, filter, ..., enclos = parent.frame(2))
## S4 method for signature 'Spectrum'
selectPeaks(o, filter, ..., enclos = parent.frame(2))
## S4 method for signature 'RmbSpectrum2List'
selectPeaks(o, ..., enclos = parent.frame(2))
## S4 method for signature 'RmbSpectraSetList'
selectPeaks(o, ..., enclos = parent.frame(2))
## S4 method for signature 'RmbSpectraSet'
selectPeaks(o, ..., enclos = parent.frame(2))
## S4 method for signature 'data.frame'
selectPeaks(o, filter, ..., enclos = parent.frame(2))
## S4 method for signature 'msmsWorkspace'
selectPeaks(o, ..., enclos = parent.frame(2))
o |
|
... |
no additional parameters |
filter |
The expression (to be evaluated in context of the 'getData()' result on the spectrum) to define the peaks to keep. For example, 'good & filterOK' |
enclos |
The context in which to evaluate the filter expression, by default set such that the spectrum 'getData()' is retrieved. |
Peak dataframe according to the specified criteria.
selectPeaks(RmbSpectraSetList): A method to filter spectra to the specified peaks
selectPeaks(RmbSpectraSet): A method to filter spectra to the specified peaks
selectPeaks(data.frame): A method to retrieve the specified peaks from the "aggregated" slot (a data.frame object) in an msmsWorkspace
selectPeaks(msmsWorkspace): A method to retrieve the specified peaks from an msmsWorkspace
stravsmi
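The filter argument is described as an expression evaluated in the context of the peak table. A minimal base R sketch of that idea follows. It is not the RMassBank implementation: the helper `selectPeaksSketch`, the table `peaks` and all values in it are hypothetical and only illustrate how an unevaluated expression can be resolved against data frame columns first and an enclosing environment second.

```r
# Stand-in for the peak table of one spectrum (invented values).
peaks <- data.frame(
  mz        = c(91.0542, 119.0491, 147.0440, 165.0546),
  intensity = c(1.2e5, 3.4e4, 8.9e5, 2.1e3),
  good      = c(TRUE, TRUE, FALSE, TRUE),
  filterOK  = c(TRUE, FALSE, TRUE, TRUE)
)

# Hypothetical helper: capture the filter unevaluated, then evaluate it with
# the columns of 'df' in scope; other names are looked up in 'enclos'.
selectPeaksSketch <- function(df, filter, enclos = parent.frame()) {
  if (missing(filter)) return(df)
  keep <- eval(substitute(filter), df, enclos)
  df[!is.na(keep) & keep, , drop = FALSE]
}

# 'good' and 'filterOK' are columns of the table.
kept <- selectPeaksSketch(peaks, good & filterOK)

# 'minIntensity' is not a column, so it is resolved in the calling environment.
minIntensity <- 1e4
strong <- selectPeaksSketch(peaks, good & intensity > minIntensity)
```

Evaluating against the table first is what lets a filter such as 'good & filterOK' be written without quoting or prefixing column names.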
From a list of RmbSpectraSets, returns the spectra which match a criterion (found, complete, empty as in checkSpectra). This can be returned either as a TRUE/FALSE vector, as a vector of indices for matching elements, as a vector of RmbSpectraSet objects matching the conditions, or as a vector of RmbSpectraSet objects NOT matching the conditions (sic!).
selectSpectra(s, property, value = "logical")
## S4 method for signature 'RmbSpectraSetList,character'
selectSpectra(s, property, value = "logical")
## S4 method for signature 'msmsWorkspace,character'
selectSpectra(s, property, value = "logical")
s |
The |
property |
The property to check ( |
value |
|
As described above.
selectSpectra(s = RmbSpectraSetList, property = character): A method for selecting spectra from a spectra set list
selectSpectra(s = msmsWorkspace, property = character): A method for selecting spectra from an msmsWorkspace
stravsmi
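The four return forms described above can be illustrated with a small base R sketch. Everything here is a stand-in: `selectSketch` is a hypothetical helper, the list holds placeholder strings instead of RmbSpectraSet objects, and only the mode name "logical" appears in the usage above; the other three mode names are assumptions made for this illustration.

```r
# One flag per compound, as a 'found'-style check might yield (invented values).
found <- c(TRUE, FALSE, TRUE, TRUE)
# Placeholders standing in for RmbSpectraSet objects.
spectraSets <- list("cpd A", "cpd B", "cpd C", "cpd D")

# Hypothetical helper: return the same match result in four different shapes.
selectSketch <- function(objects, matches,
                         value = c("logical", "index", "object", "mismatch")) {
  value <- match.arg(value)
  switch(value,
    logical  = matches,           # TRUE/FALSE vector
    index    = which(matches),    # positions of matching elements
    object   = objects[matches],  # the matching elements themselves
    mismatch = objects[!matches]  # the elements that do NOT match
  )
}

asLogical  <- selectSketch(spectraSets, found)
asIndex    <- selectSketch(spectraSets, found, "index")
matching   <- selectSketch(spectraSets, found, "object")
notMatched <- selectSketch(spectraSets, found, "mismatch")
```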
Define an ACCESSION builder, either programmatically (as a function) or as a glue string.
setAccessionBuilder(accessionBuilder)
accessionBuilder |
a function that takes parameters 'cpd' (an instance of 'RmbSpectraSet'), 'spectrum' (an instance of 'RmbSpectrum2') and 'subscan' (an integer denoting relative scan id) and returns a 'character'. Alternatively a glue string just like the one in the RMassBank settings. |
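As a rough illustration of the function form, the sketch below defines a builder with the three documented parameters that returns a character string. The accession layout (prefix "XX", six digits of compound ID, two digits of subscan) is invented, and the compound is represented by a plain list with an `id` element instead of a real 'RmbSpectraSet'; how the ID is read from the real S4 object is not shown in this section.

```r
# Hypothetical builder: ignores 'spectrum' and formats an invented accession
# layout from a compound ID and the relative scan number.
exampleBuilder <- function(cpd, spectrum, subscan) {
  sprintf("XX%06d%02d", as.integer(cpd$id), as.integer(subscan))
}

# Stand-in call with a list in place of an 'RmbSpectraSet'.
accession <- exampleBuilder(list(id = 2184), NULL, 3L)
```

A function of this shape is what would be handed to setAccessionBuilder; the call itself is omitted because it requires the package and real spectra objects.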
Sets RmbSpectrum2 data from a data.frame. All slots which are present as columns in the given data frame are set. Optionally cleans the object, i.e. empties slots not defined in the data frame.
## S4 method for signature 'RmbSpectrum2,data.frame'
setData(s, df, clean = TRUE)
s |
The |
df |
The data frame with new data |
clean |
|
The modified RmbSpectrum2.
stravsmi
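The effect of the clean flag can be pictured with a base R sketch in which a named list stands in for the slots of an RmbSpectrum2. The helper `setDataSketch`, the slot names and the values are hypothetical; the sketch only mirrors the documented behaviour (columns overwrite matching slots, and cleaning empties the rest).

```r
# Hypothetical helper: 's' is a named list standing in for the object's slots.
setDataSketch <- function(s, df, clean = TRUE) {
  cols <- intersect(names(s), names(df))
  if (clean) {
    # Empty every slot that has no matching column, keeping its type.
    for (n in setdiff(names(s), cols)) s[[n]] <- s[[n]][0]
  }
  # Overwrite the slots that do have a matching column.
  for (n in cols) s[[n]] <- df[[n]]
  s
}

spec <- list(mz = c(100.1, 200.2), intensity = c(10, 20), good = c(TRUE, FALSE))
df <- data.frame(mz = 150.5, intensity = 99)

cleaned   <- setDataSketch(spec, df)                 # 'good' is emptied
uncleaned <- setDataSketch(spec, df, clean = FALSE)  # 'good' is left untouched
```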
Uses a SMILES-String to calculate the mass using rcdk-integrated functions.
smiles2mass(SMILES)
SMILES |
A String-object representing a SMILES |
The calculated mass of the given SMILES-Formula
Erik Mueller
## Not run:
smiles2mass("CC(=O)NC(C(O)1)C(O)C(OC(O2)C(O)C(OC(O3)C(O)C(O)C(O)C(CO)3)C(O)C(CO)2)C(CO)O1")
## End(Not run)
Counts the number of acquired spectra for a compound or multiple compounds
spectraCount(s)
## S4 method for signature 'RmbSpectraSet'
spectraCount(s)
## S4 method for signature 'RmbSpectraSetList'
spectraCount(s)
## S4 method for signature 'msmsWorkspace'
spectraCount(s)
s |
The object ( |
For RmbSpectraSet objects, a single number counting the spectra in that object. For RmbSpectraSetList or msmsWorkspace, a vector with spectra counts for all compounds (RmbSpectraSets) in the object.
spectraCount(RmbSpectraSet): Counts the number of acquired spectra for an RmbSpectraSet
spectraCount(RmbSpectraSetList): Counts the number of acquired spectra for an RmbSpectraSetList
spectraCount(msmsWorkspace): Counts the number of acquired spectra for an msmsWorkspace
stravsmi
Converts a molecular formula, e.g. C15H20, into an upper limit appropriate for use with the elements argument of rcdk's generate.formula function.
to.limits.rcdk(formula)
formula |
A molecular formula in string or list representation
( |
This helper function is used to make the upper limits for generate.formula when finding subformulas to match to a MS2 fragment peak.
An array in the form c(c("C", "0", "12"), c("H", "0", "12")) (for input of "C12H12").
Michael Stravs
#
to.limits.rcdk("C6H6")
to.limits.rcdk(add.formula("C6H12O6", "H"))
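To make the shape of the result concrete, here is a base R sketch that turns a simple formula string into element, lower limit, upper limit triples. It is an illustration only: `toLimitsSketch` is a hypothetical helper, it handles only plain formulas (no brackets, charges or isotopes, no repeated elements), and it returns a list because nested c() calls would flatten into a single vector in R.

```r
# Hypothetical helper: one c(element, "0", count) triple per element.
toLimitsSketch <- function(formula) {
  # Split into tokens such as "C12" or "Cl2": capital letter, optional
  # lowercase letter, optional count.
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"  # an element without a count occurs once
  Map(function(el, n) c(el, "0", n), elements, counts)
}

limits <- toLimitsSketch("C12H12")
limitsMethane <- toLimitsSketch("CH4")
```

Using the parent formula as the upper bound reflects the description above: a fragment cannot contain more atoms of an element than the compound it came from.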
Writes a MassBank record in list format to a text array.
toMassbank(o, ...)
## S4 method for signature 'RmbSpectraSet'
toMassbank(o, addAnnotation = getOption("RMassBank")$add_annotation)
## S4 method for signature 'RmbSpectrum2'
toMassbank(o, addAnnotation = getOption("RMassBank")$add_annotation)
o |
An object to convert to MassBank record format. This may be a single 'RmbSpectrum2', or a complete compound (an 'RmbSpectraSet'), |
... |
Parameters passed to the implementation, in particular 'addAnnotation' |
addAnnotation |
'logical', whether to add peak annotations (putative formulas) to the record. |
The function is a general conversion tool for the MassBank format; i.e. the field names are not fixed. mbdata must be a named list, and the entries can be as follows:
A single text line: 'CH$EXACT_MASS' = '329.1023' is written as
CH$EXACT_MASS: 329.1023
A character array: 'CH$NAME' = c('2-Aminobenzimidazole', '1H-Benzimidazol-2-amine') is written as
CH$NAME: 2-Aminobenzimidazole
CH$NAME: 1H-Benzimidazol-2-amine
A named list of strings: 'CH$LINK' = list('CHEBI' = "27822", "KEGG" = "C10901") is written as
CH$LINK: CHEBI 27822
CH$LINK: KEGG C10901
A data frame (e.g. the peak table) is written as specified in the MassBank record format (Section 2.6.3): the column names are used as headers for the first line, all data rows are printed space-separated.
The result is a text array, which is ready to be written to the disk as a file.
The function iterates over the list item names. This means that duplicate entries in mbdata are (partially) discarded! The correct way to add them is by making a character array (as specified above): Instead of 'CH$NAME' = 'bla', 'CH$NAME' = 'blub' specify 'CH$NAME' = c('bla','blub').
Michael Stravs
MassBank record format: http://www.massbank.jp/manuals/MassBankRecord_en.pdf
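The three list entry types described above can be rendered with a few lines of base R. The sketch below is not the package's code: `renderRecordSketch` is a hypothetical helper, it deliberately leaves out the data frame (peak table) case, and it loops over item names only to show why duplicated names lose information in a name-based iteration.

```r
# Hypothetical helper: render a named list into "FIELD: value" lines.
renderRecordSketch <- function(mbdata) {
  lines <- character(0)
  for (field in names(mbdata)) {
    entry <- mbdata[[field]]  # lookup by name: the first match wins
    if (is.list(entry)) {
      # Named list of strings: "FIELD: SUBTAG value", one line per item.
      lines <- c(lines, paste0(field, ": ", names(entry), " ", unlist(entry)))
    } else {
      # Single string or character array: one line per value.
      lines <- c(lines, paste0(field, ": ", entry))
    }
  }
  lines
}

mbdata <- list(
  "CH$EXACT_MASS" = "329.1023",
  "CH$NAME" = c("2-Aminobenzimidazole", "1H-Benzimidazol-2-amine"),
  "CH$LINK" = list(CHEBI = "27822", KEGG = "C10901")
)
record <- renderRecordSketch(mbdata)

# Duplicated names: in this sketch the second value is never reached.
duplicated <- renderRecordSketch(list("CH$NAME" = "bla", "CH$NAME" = "blub"))
```

In this sketch the duplicated entry yields the line for 'bla' twice and drops 'blub', which is one way a name-based loop can partially discard duplicates; the character array form avoids the problem.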
## Not run:
# Read just the compound info skeleton from the Internet for some compound ID
id <- 35
mbdata <- gatherData(id)
# Export the mbdata blocks to line arrays
# (there is no spectrum information, just the compound info...)
mbtext <- toMassbank(mbdata)
## End(Not run)
Converts a pseudospectrum extracted from XCMS using CAMERA into the msmsWorkspace@spectrum format that RMassBank uses.
toRMB(msmsXCMSspecs, cpdID, mode, MS1spec)
msmsXCMSspecs |
The compoundID of the compound that has been used for the peaklist |
cpdID |
The compound ID of the substance of the given spectrum |
mode |
The ionization mode that has been used for the spectrum |
MS1spec |
The MS1-spectrum from XCMS, which can be optionally supplied |
One list element of the @specs entry from an msmsWorkspace
Erik Mueller
## Not run:
XCMSpspectra <- findmsmsHRperxcms.direct("Glucolesquerellin_2184_1.mzdata", 2184)
wspecs <- toRMB(XCMSpspectra)
## End(Not run)
JCAMP files containing multiple blocks are usually structured by so-called link blocks. If no link block is present, the readJDX package is not able to parse the file. This method will add a link block at the top of the given file or print a message if an existing link block is found. The file is not changed in this case.
updateHeader(filename)
filename |
character The name of the file to which a link block should be added. The filename is also used as content for the TITLE field in the link block |
Nothing is returned
pstahlhofen
## Not run:
updateHeader("my_multiblock_jcamp.jdx")
## End(Not run)
Checks if all necessary fields are present in the current settings and fills in default values from the RmbDefaultSettings if required.
updateSettings(settings, warn = TRUE)
settings |
The set of settings to check and update. |
warn |
Whether to update parameters quietly ( |
The updated set of settings.
Important: There is a change in behaviour of RMassBank in certain cases when filterSettings is not present in the old settings! The default pre-recalibration cutoff from RmbDefaultSettings is 10000. Formerly the pre-recalibration cutoff was set to be 10000 for positive spectra but 0 for negative spectra. Updating the settings files is preferred to using the updateSettings function.
Stravs MA, Eawag <[email protected]>
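The general idea of filling missing fields from a set of defaults can be sketched in base R as follows. This is not the RMassBank code: `fillDefaultsSketch` is a hypothetical helper, the settings lists are invented, and the field name `cutoff` inside filterSettings is a placeholder. The sketch also assumes that warn = TRUE means a warning is raised for every field that had to be filled in.

```r
# Hypothetical helper: copy every default whose name is absent from 'settings'.
fillDefaultsSketch <- function(settings, defaults, warn = TRUE) {
  missingFields <- setdiff(names(defaults), names(settings))
  if (warn && length(missingFields) > 0) {
    warning("Filling in defaults for: ", paste(missingFields, collapse = ", "))
  }
  settings[missingFields] <- defaults[missingFields]
  settings
}

# Invented defaults; 'cutoff' is a placeholder field name.
defaults <- list(
  annotator = "exampleAnnotator",
  filterSettings = list(cutoff = 10000)
)
# Old settings that predate 'filterSettings'.
oldSettings <- list(annotator = "customAnnotator")

updated <- fillDefaultsSketch(oldSettings, defaults, warn = FALSE)
```

Existing values are left untouched and only absent fields are added, which is why old settings without filterSettings silently pick up the default cutoff described above.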
## Not run:
w@settings <- updateSettings(w@settings)
## End(Not run)
Validates a plain text MassBank record, or recursively all records within a directory. The Unit Tests to be used are installed in RMassBank/inst/validationTests and currently include checks for NAs, peaks versus precursor, precursor mz, precursor type, SMILES vs exact mass, total intensities and title versus type. The validation report is saved as "report.html" in the working directory.
validate(path, simple = TRUE)
path |
The filepath to a single record, or a directory to search recursively |
simple |
If TRUE the function creates a simpler form of the RUnit .html report, better readable for humans. If FALSE it returns the unchanged RUnit report. |
## Not run:
validate("/tmp/MassBank/OpenData/record/")
## End(Not run)