Title: | Baffling Recursive Algorithm for Isotope distributioN calculations |
---|---|
Description: | Package for calculating aggregated isotopic distribution and exact center-masses for chemical substances (in this version composed of C, H, N, O and S). This is an implementation of the BRAIN algorithm described in the paper by J. Claesen, P. Dittwald, T. Burzykowski and D. Valkenborg. |
Authors: | Piotr Dittwald, with contributions of Dirk Valkenborg and Jurgen Claesen |
Maintainer: | Piotr Dittwald <[email protected]> |
License: | GPL-2 |
Version: | 1.53.0 |
Built: | 2024-11-24 06:26:16 UTC |
Source: | https://github.com/bioc/BRAIN |
This package implements BRAIN (Baffling Recursive Algorithm for Isotope distributioN calculations), which is described in full detail by Claesen et al. [Clae] (see also the application note [Ditt]). The algorithm uses an algebraic approach (Viete's formulas, Newton identities [Macd]), which is especially useful for large molecules because of its advantageous scaling properties. This version of the package provides functions for calculating the aggregated isotopic distribution and the center-mass of each aggregated isotopic variant for chemical components built from carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides). The natural abundances and masses of the stable isotopes of C, H, N, O and S are taken from IUPAC 1997 [Rosm]. Some heuristics that speed up the computation of the isotopic distribution are also applied [Ditt2].
Package: | BRAIN |
Type: | Package |
Version: | 1.5.0 |
Date: | 2018-08-02 |
License: | GPL-2 |
LazyLoad: | yes |
Piotr Dittwald with contribution of Dirk Valkenborg and Jurgen Claesen
Maintainer: Piotr Dittwald <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2.
[Ditt] Dittwald P., Claesen J., Burzykowski T., Valkenborg D. and Gambin A. BRAIN: a universal tool for high-throughput calculations of the isotopic distribution for mass spectrometry. Anal Chem., 2013, doi:10.1021/ac303439m.
[Ditt2] Dittwald P. and Valkenborg D. BRAIN 2.0: time and memory complexity improvements in the algorithm for calculating the isotope distribution. JASMS, 2014, doi:10.1007/s13361-013-0796-5.
[Macd] Macdonald I.G. Symmetric functions and Hall polynomials. Clarendon Press; Oxford University Press, Oxford; New York, 1979.
[Rosm] Rosman K.J.R. and Taylor P.D.P. Isotopic compositions of the elements 1997. Pure and Applied Chemistry, 70(1):217-235, 1998.
    nrPeaks <- 1000
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- useBRAIN(aC = aC, nrPeaks = nrPeaks)
    iso <- res$isoDistr
    masses <- res$masses
    mono <- res$monoisotopicMass
    avgMass <- res$avgMass
Function computing the theoretical average masses for chemical components composed of carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides).
calculateAverageMass(aC)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
Mass is calculated in Daltons.
Average mass (numeric)
Piotr Dittwald <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- calculateAverageMass(aC = aC)
Function computing probabilities of aggregated isotopic variants for chemical components built from carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides).
calculateIsotopicProbabilities(aC, stopOption = "nrPeaks", nrPeaks, coverage, abundantEstim)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
stopOption |
one of the following strings: "nrPeaks" (default), "coverage", "abundantEstim" |
nrPeaks |
Integer indicating the number of consecutive isotopic variants to be calculated, starting from the monoisotopic one. This value can always be provided, even when stopOption is not set to its default; in that case it acts as a hard stopping criterion. |
coverage |
Scalar indicating the value of the cumulative aggregated distribution. The calculations will be stopped after reaching this value. |
abundantEstim |
Integer indicating the number of consecutive isotopic variants to be calculated, starting from one after the most abundant one. All consecutive isotopic variants before the most abundant peak are also returned. |
Remember that the isotopic variants start from the monoisotopic one. For large molecules, the first (lowest-mass) aggregated variants may have very low abundances, so a sufficient number of peaks must be calculated to reach the most abundant isotopic variant.
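The three stopping rules can be illustrated on a precomputed probability vector. The base R sketch below is not the package's implementation: it uses a binomial distribution as a stand-in for an aggregated isotopic distribution (a hypothetical molecule of 1000 carbon atoms only, with an assumed 13C abundance of 0.0107), and the `stop_by_*` helper names are hypothetical.

```r
# Stand-in distribution: number of 13C atoms among 1000 carbons (illustrative only).
probs <- dbinom(0:60, size = 1000, prob = 0.0107)

# "nrPeaks": keep a fixed number of variants, starting at the monoisotopic one.
stop_by_nr_peaks <- function(p, nrPeaks) {
  p[seq_len(min(nrPeaks, length(p)))]
}

# "coverage": keep variants until the cumulative probability reaches the target.
stop_by_coverage <- function(p, coverage) {
  reached <- which(cumsum(p) >= coverage)
  if (length(reached) == 0) p else p[seq_len(reached[1])]
}

# "abundantEstim": keep everything up to the most abundant variant,
# plus a fixed number of variants after it.
stop_by_abundant <- function(p, abundantEstim) {
  last <- min(which.max(p) + abundantEstim, length(p))
  p[seq_len(last)]
}

length(stop_by_nr_peaks(probs, 20))
length(stop_by_coverage(probs, 0.99))
length(stop_by_abundant(probs, 5))
```

With a coverage rule the number of returned variants depends on the molecule, which is why a hard cap such as nrPeaks is useful alongside it.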
Probabilities of aggregated isotopic variants (numeric vector)
If also masses associated with the aggregated isotopic variants are needed, then the function useBRAIN should be used.
Piotr Dittwald <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2
    nrPeaks <- 1000
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- calculateIsotopicProbabilities(aC = aC, stopOption = "nrPeaks", nrPeaks = nrPeaks)
Function computing the theoretical monoisotopic masses for chemical components composed of carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides).
calculateMonoisotopicMass(aC)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
Mass is calculated in Daltons.
Monoisotopic mass (numeric)
Piotr Dittwald <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- calculateMonoisotopicMass(aC = aC)
Function computing heuristically the required number of consecutive aggregated isotopic variants (starting from the monoisotopic mass).
calculateNrPeaks(aC)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
This function uses the following rule of thumb: the difference between the theoretical monoisotopic mass and the theoretical average mass is computed and multiplied by two, and the result is rounded up to the nearest integer. For small molecules, the minimal number of returned variants is five.
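The rule of thumb can be written out in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's code: the isotope masses and abundances below are values supplied for illustration (the package's internal IUPAC 1997 table may differ in the last digits), and the helper names `complete_aC`, `mono_mass`, `avg_mass` and `nr_peaks_rule` are hypothetical.

```r
# Assumed isotope masses (Da) and natural abundances, for illustration only.
isotopes <- list(
  C = list(mass = c(12, 13.0033548378), abund = c(0.9893, 0.0107)),
  H = list(mass = c(1.0078250321, 2.0141017780), abund = c(0.999885, 0.000115)),
  N = list(mass = c(14.0030740052, 15.0001088984), abund = c(0.99632, 0.00368)),
  O = list(mass = c(15.9949146, 16.9991312, 17.9991603),
           abund = c(0.99757, 0.00038, 0.00205)),
  S = list(mass = c(31.97207070, 32.97145843, 33.96786665, 35.96708062),
           abund = c(0.9493, 0.0076, 0.0429, 0.0002))
)

# Fill in omitted elements with 0, mirroring the documented aC convention.
complete_aC <- function(aC) {
  out <- c(C = 0, H = 0, N = 0, O = 0, S = 0)
  out[names(aC)] <- unlist(aC)
  out
}

# Monoisotopic mass: every atom is its lightest isotope.
mono_mass <- function(aC) {
  n <- complete_aC(aC)
  sum(sapply(names(n), function(e) n[[e]] * isotopes[[e]]$mass[1]))
}

# Average mass: every atom contributes its abundance-weighted mean mass.
avg_mass <- function(aC) {
  n <- complete_aC(aC)
  sum(sapply(names(n), function(e) {
    n[[e]] * sum(isotopes[[e]]$mass * isotopes[[e]]$abund)
  }))
}

# Twice the mass difference, rounded up, but never fewer than 5 peaks.
nr_peaks_rule <- function(aC) {
  max(5, ceiling(2 * (avg_mass(aC) - mono_mass(aC))))
}

nr_peaks_rule(list(C = 2, H = 6, O = 1))  # small molecule: falls back to the minimum of 5
nr_peaks_rule(list(C = 23832, H = 37816, N = 6528, O = 7031, S = 170))
```

The intuition is that the average mass sits near the centre of the distribution, so covering twice the distance from the monoisotopic mass spans roughly the whole distribution when peaks are about 1 Da apart.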
Integer number not lower than 5.
Jurgen Claesen <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- calculateNrPeaks(aC = aC)
Function computing an atomic composition from a (naturally occurring) amino acid sequence.
getAtomsFromSeq(seq)
seq |
A character string or an AAString object (see the Biostrings package) containing the amino acid sequence. It should contain only the letters "A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V" (1-letter symbols of the 20 naturally occurring amino acids). |
The atomic composition is the sum of the atomic compositions of all amino acids in the sequence minus (n-1) water molecules, where n is the length of the given amino acid sequence.
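The computation described above can be sketched in base R as follows. This is an illustration rather than the package's implementation: the lookup table `aa_atoms` covers only four amino acids to keep the example short, and both it and the function name `atoms_from_seq` are hypothetical.

```r
# Compositions of free amino acids for a few residues only (illustrative subset;
# a full table would cover all 20 one-letter codes).
aa_atoms <- list(
  A = c(C = 3, H = 7, N = 1, O = 2, S = 0),  # alanine
  C = c(C = 3, H = 7, N = 1, O = 2, S = 1),  # cysteine
  D = c(C = 4, H = 7, N = 1, O = 4, S = 0),  # aspartic acid
  G = c(C = 2, H = 5, N = 1, O = 2, S = 0)   # glycine
)

atoms_from_seq <- function(seq) {
  residues <- strsplit(seq, "")[[1]]
  unknown <- setdiff(residues, names(aa_atoms))
  if (length(unknown) > 0) {
    stop("unsupported residue(s): ", paste(unknown, collapse = ", "))
  }
  # Sum the compositions of all residues in the sequence.
  total <- Reduce(`+`, aa_atoms[residues])
  n <- length(residues)
  # Each of the n - 1 peptide bonds releases one water molecule (H2O).
  total[["H"]] <- total[["H"]] - 2 * (n - 1)
  total[["O"]] <- total[["O"]] - (n - 1)
  as.list(total)
}

atoms_from_seq("AACD")
```

For "AACD" the four free amino acids sum to C13 H28 N4 O10 S1, and removing three water molecules leaves C = 13, H = 22, N = 4, O = 7, S = 1.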
Named list with the following fields, each giving the number of corresponding atoms (non-negative integer values):
C,
H,
N,
O,
S
Piotr Dittwald <[email protected]>
    library(Biostrings)  # provides AAString
    seq1 <- "AACD"
    aC1 <- getAtomsFromSeq(seq = seq1)
    seq2 <- AAString("ACCD")
    aC2 <- getAtomsFromSeq(seq = seq2)
Function computing probabilities of isotopic variants and their aggregated masses for chemical components composed of carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides). The function also returns the monoisotopic mass and the average mass of the given chemical component.
useBRAIN(aC, stopOption = "nrPeaks", nrPeaks, coverage, abundantEstim)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
stopOption |
one of the following strings: "nrPeaks" (default), "coverage", "abundantEstim" |
nrPeaks |
Integer indicating the number of consecutive isotopic variants to be calculated, starting from the monoisotopic one. This value can always be provided, even when stopOption is not set to its default; in that case it acts as a hard stopping criterion. |
coverage |
Scalar indicating the value of the cumulative aggregated distribution. The calculations will be stopped after reaching this value. |
abundantEstim |
Integer indicating the number of consecutive isotopic variants to be calculated, starting from one after the most abundant one. All consecutive isotopic variants before the most abundant peak are also returned. |
The function uses recursive formulae based on the algebraic Newton-Girard identities (see [Clae]).
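The flavour of such a recurrence can be sketched in base R. The code below is not the package's implementation and does not reproduce the exact formulae of [Clae]. It uses the closely related logarithmic-derivative recurrence Q'(x) = R(x) Q(x) for the generating polynomial Q(x) = prod_E P_E(x)^(n_E), where P_E has the isotope abundances of element E as coefficients and R(x) = sum_E n_E P_E'(x) / P_E(x). The abundances are assumed values for illustration, and the function names are hypothetical.

```r
# Assumed natural abundances, indexed by nominal mass shift (+0, +1, +2, ...).
abund <- list(
  C = c(0.9893, 0.0107),
  H = c(0.999885, 0.000115),
  N = c(0.99632, 0.00368),
  O = c(0.99757, 0.00038, 0.00205),
  S = c(0.9493, 0.0076, 0.0429, 0, 0.0002)
)

# First m power-series coefficients of P'(x) / P(x); p[1] is the constant term.
log_deriv_series <- function(p, m) {
  d <- length(p) - 1
  dp <- p[-1] * seq_len(d)  # coefficients of P'(x)
  r <- numeric(m)
  for (k in 0:(m - 1)) {
    rhs <- if (k < d) dp[k + 1] else 0
    for (i in seq_len(min(k, d))) rhs <- rhs - p[i + 1] * r[k - i + 1]
    r[k + 1] <- rhs / p[1]
  }
  r
}

# Probabilities of the first nrPeaks aggregated variants (monoisotopic first).
iso_probs <- function(aC, nrPeaks) {
  n <- unlist(aC)
  r <- numeric(nrPeaks)
  for (e in names(n)) {
    r <- r + n[[e]] * log_deriv_series(abund[[e]], nrPeaks)
  }
  q <- numeric(nrPeaks)
  # Monoisotopic probability: every atom is its lightest isotope.
  q[1] <- prod(sapply(names(n), function(e) abund[[e]][1]^n[[e]]))
  for (j in seq_len(nrPeaks - 1)) {
    # j * q_j = sum_{k = 0}^{j - 1} r_k * q_{j - 1 - k}
    q[j + 1] <- sum(r[1:j] * q[j:1]) / j
  }
  q
}

iso_probs(list(C = 2, H = 6, O = 1), nrPeaks = 5)
```

Each new probability depends only on previously computed ones, so the cost grows with the number of requested peaks rather than with the number of isotopic combinations, which is the property that makes recursive approaches attractive for large molecules.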
Named list with the following fields:
isoDistr |
Probabilities of aggregated isotopic variants (numeric vector) |
masses |
Aggregated masses for isotopic variants (numeric vector) |
monoisotopicMass |
Monoisotopic mass (numeric) |
avgMass |
Average mass: weighted average of the isotopic variants contributing to the most abundant aggregated variant (numeric) |
Remember that the isotopic variants start from the monoisotopic one. For large molecules, the first masses may have very low abundances, so a sufficient number of peaks must be calculated to reach the most abundant isotopic variant.
If only isotopic probabilities are needed, then the function calculateIsotopicProbabilities should be used.
Piotr Dittwald <[email protected]>
[Clae] Claesen J., Dittwald P., Burzykowski T. and Valkenborg D. An efficient method to calculate the aggregated isotopic distribution and exact center-masses. JASMS, 2012, doi:10.1007/s13361-011-0326-2
See also: calculateIsotopicProbabilities
    nrPeaks <- 1000
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- useBRAIN(aC = aC, stopOption = "nrPeaks", nrPeaks = nrPeaks)
Function computing probabilities of isotopic variants using heuristics, for chemical components composed of carbon, hydrogen, oxygen, nitrogen and sulfur (e.g. peptides). The function also returns the monoisotopic mass and the average mass of the given chemical component.
useBRAIN2(aC, stopOption = "nrPeaks", nrPeaks, approxStart = 1, approxParam = NULL)
aC |
List with fields C, H, N, O, S of non-negative integer values (if any field is omitted, its value is set to 0). |
stopOption |
Only the option "nrPeaks" is allowed |
nrPeaks |
Integer indicating the number of consecutive isotopic variants to be calculated, starting from the monoisotopic one. This value can always be provided, even when stopOption is not set to its default; in that case it acts as a hard stopping criterion. |
approxStart |
Integer indicating the index of the first isotopic peak to be calculated |
approxParam |
Integer indicating the length of recurrence (see RCL in [Ditt2]) |
The function uses the RCL and LSP heuristics from [Ditt2].
Named list with the following fields:
isoDistr |
Probabilities of aggregated isotopic variants (numeric vector) |
Remember that the isotopic variants start from the monoisotopic one. For large molecules, the first masses may have very low abundances, so a sufficient number of peaks must be calculated to reach the most abundant isotopic variant.
If only isotopic probabilities are needed, then the function calculateIsotopicProbabilities should be used.
Piotr Dittwald <[email protected]>
[Ditt2] Dittwald P., Valkenborg D. BRAIN 2.0: time and memory complexity improvements in the algorithm for calculating the isotope distribution. JASMS, 2014, doi: 10.1007/s13361-013-0796-5.
See also: calculateIsotopicProbabilities
    nrPeaks <- 1000
    aC <- list(C=23832, H=37816, N=6528, O=7031, S=170)  # Human dynein heavy chain
    res <- useBRAIN(aC = aC, stopOption = "nrPeaks", nrPeaks = nrPeaks)
    res2 <- useBRAIN2(aC = aC, stopOption = "nrPeaks", nrPeaks = nrPeaks, approxStart = 10)
    # Compare ratios of consecutive peaks from the exact and the heuristic run
    old <- res$isoDistr[10:109] / res$isoDistr[11:110]
    new <- res2$isoDistr[1:100] / res2$isoDistr[2:101]
    max(old - new)
    max((old - new) / old)
    # approxStart is the documented argument name (the original passed approx=TRUE)
    res3 <- useBRAIN2(aC = aC, stopOption = "nrPeaks", nrPeaks = nrPeaks, approxStart = 1, approxParam = 10)
    max(res3$isoDistr - res$isoDistr)