Title: | Analysis of Metabolomics GC/MS Data |
---|---|
Description: | Fragment-level analysis of gas chromatography-massspectrometry metabolomics data. |
Authors: | Mark Robinson <[email protected]>, Riccardo Romoli <[email protected]> |
Maintainer: | Mark Robinson <[email protected]>, Riccardo Romoli <[email protected]> |
License: | LGPL (>= 2) |
Version: | 1.63.0 |
Built: | 2025-01-09 06:30:37 UTC |
Source: | https://github.com/bioc/flagme |
Reads ASCII ELU-format files (output from AMDIS) and attaches them to an
already created peaksDataset
object
addAMDISPeaks(object, fns = dir(, "[Eu][Ll][Uu]"), verbose = TRUE, ...)
addAMDISPeaks(object, fns = dir(, "[Eu][Ll][Uu]"), verbose = TRUE, ...)
object |
a |
fns |
character vector of same length as |
verbose |
whether to give verbose output, default |
... |
arguments passed on to |
Repeated calls to parseELU
to add peak detection results to the
original peaksDataset
object.
peaksDataset
object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
# need access to CDF (raw data) and ELU files require(gcspikelite) gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") # full paths to file names cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # create a 'peaksDataset' object and add AMDIS peaks to it pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1])
# need access to CDF (raw data) and ELU files require(gcspikelite) gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") # full paths to file names cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # create a 'peaksDataset' object and add AMDIS peaks to it pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1])
Reads ASCII tab-delimited format files (output from ChromaTOF) and attaches
them to an already created peaksDataset
object
addChromaTOFPeaks( object, fns = dir(, "[Tt][Xx][Tx]"), rtDivide = 60, verbose = TRUE, ... )
addChromaTOFPeaks( object, fns = dir(, "[Tt][Xx][Tx]"), rtDivide = 60, verbose = TRUE, ... )
object |
a |
fns |
character vector of same length as |
rtDivide |
number giving the amount to divide the retention times by. |
verbose |
whether to give verbose output, default |
... |
arguments passed on to |
Repeated calls to parseChromaTOF
to add peak detection results to the
original peaksDataset
object.
peaksDataset
object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
# need access to CDF (raw data) and ChromaTOF files require(gcspikelite) gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") # full paths to file names cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) # [not run] cTofFiles<-dir(gcmsPath,"txt",full=TRUE) # create a 'peaksDataset' object and add ChromaTOF peaks to it pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) # [not run] pd<-addChromTOFPeaks(pd,...)
# need access to CDF (raw data) and ChromaTOF files require(gcspikelite) gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") # full paths to file names cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) # [not run] cTofFiles<-dir(gcmsPath,"txt",full=TRUE) # create a 'peaksDataset' object and add ChromaTOF peaks to it pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) # [not run] pd<-addChromTOFPeaks(pd,...)
Add xcms/CAMERA peak detection results
addXCMSPeaks( files, object, settings = list(), minintens = 100, minfeat = 6, BPPARAM = bpparam(), multipleMF = FALSE, multipleMFParam = list(fwhm = c(5, 10, 15), mz.abs = 0.2, rt.abs = 2) )
addXCMSPeaks( files, object, settings = list(), minintens = 100, minfeat = 6, BPPARAM = bpparam(), multipleMF = FALSE, multipleMFParam = list(fwhm = c(5, 10, 15), mz.abs = 0.2, rt.abs = 2) )
files |
list of chromatogram files |
object |
a |
settings |
|
minintens |
minimum ion intensity to be included into a pseudospectra |
minfeat |
minimum number of ion to be created a pseudospectra |
BPPARAM |
a parameter class specifying if and how parallel processing should be performed |
multipleMF |
logical Try to remove redundant peaks, in this case where there are any peaks within an absolute m/z value of 0.2 and within 3 s for any one sample in the xcmsSet (the largest peak is kept) |
multipleMFParam |
list. It conteins the settings for the peak-picking. mz_abs represent the the mz range; rt_abs represent thert range |
mz.abs |
mz range |
rt.abs |
rt range |
Reads the raw data using xcms, group each extracted ion according to their
retention time using CAMERA and attaches them to an already created
peaksDataset
object
Repeated calls to xcmsSet and annotate to perform peak-picking and
deconvolution. The peak detection results are added to the original
peaksDataset
object. Two peak detection alorithms are available:
continuous wavelet transform (peakPicking=c('cwt')) and the matched filter
approach (peakPicking=c('mF')) described by Smith et al (2006). For further
information consult the xcms package manual.
peaksDataset
object
Riccardo Romoli [email protected]
peaksDataset
findPeaks-matchedFilter
findPeaks-centWave
xcmsRaw-class
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data
This function creates a "between" alignment (i.e. comparing merged peaks)
betweenAlignment( pD, cAList, pAList, impList, filterMin = 1, gap = 0.7, D = 10, usePeaks = TRUE, df = 30, verbose = TRUE, metric = 2, type = 2, penality = 0.2, compress = FALSE )
betweenAlignment( pD, cAList, pAList, impList, filterMin = 1, gap = 0.7, D = 10, usePeaks = TRUE, df = 30, verbose = TRUE, metric = 2, type = 2, penality = 0.2, compress = FALSE )
pD |
a |
cAList |
|
pAList |
|
impList |
|
filterMin |
minimum number of peaks within a merged peak to be kept in the analysis |
gap |
gap parameter |
D |
retention time penalty parameter |
usePeaks |
logical, whether to use peaks (if |
df |
distance from diagonal to calculate similarity |
verbose |
logical, whether to print information |
metric |
numeric, different algorithm to calculate the similarity
matrix between two mass spectrum. |
type |
numeric, two different type of alignment function |
penality |
penalization applied to the matching between two mass
spectra if |
compress |
logical whether to compress the similarity matrix into a sparse format. |
betweenAlignment
objects gives the data structure which stores the
result of an alignment across several "pseudo" datasets. These pseudo
datasets are constructed by merging the "within" alignments.
betweenAlignment
object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) ## see 'multipleAlignment'
require(gcspikelite) ## see 'multipleAlignment'
This function takes the set of all pairwise profile alignments and use these to estimate retention time shifts between each pair of samples. These will then be used to normalize the retention time penalty of the signal peak alignment.
calcTimeDiffs(pd, ca.full, verbose = TRUE)
calcTimeDiffs(pd, ca.full, verbose = TRUE)
pd |
a |
ca.full |
a |
verbose |
logical, whether to print out information |
Using the set of profile alignments,
list
of same length as ca.full@alignments
with the matrices
giving the retention time penalties.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksAlignment
, clusterAlignment
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"),"data",sep="/") cdfFiles <- dir(gcmsPath,"CDF",full=TRUE) eluFiles <- dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd,eluFiles[1:2]) # pairwise alignment using all scans fullca <- clusterAlignment(pd, usePeaks=FALSE, df=100) # calculate retention time shifts timedf <- calcTimeDiffs(pd, fullca)
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"),"data",sep="/") cdfFiles <- dir(gcmsPath,"CDF",full=TRUE) eluFiles <- dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd,eluFiles[1:2]) # pairwise alignment using all scans fullca <- clusterAlignment(pd, usePeaks=FALSE, df=100) # calculate retention time shifts timedf <- calcTimeDiffs(pd, fullca)
Store the raw data and optionally, information regarding signal peaks for a number of GCMS runs
clusterAlignment( pD, runs = 1:length(pD@rawdata), timedf = NULL, usePeaks = TRUE, verbose = TRUE, ... )
clusterAlignment( pD, runs = 1:length(pD@rawdata), timedf = NULL, usePeaks = TRUE, verbose = TRUE, ... )
pD |
a |
runs |
vector of integers giving the samples to calculate set of pairwise alignments over. |
timedf |
list (length = the number of pairwise alignments) of matrices
giving the expected time differences expected at each pair of peaks used
with |
usePeaks |
logical, |
verbose |
logical, whether to print out info. |
... |
other arguments passed to |
clusterAlignment computes the set of pairwise alignments.
clusterAlignment
object
Mark Robinson, Riccardo Romoli
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz=seq(50,550), rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ca <- clusterAlignment(pd, gap=0.5, D=0.05, df=30, metric=1, type=1)
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz=seq(50,550), rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ca <- clusterAlignment(pd, gap=0.5, D=0.05, df=30, metric=1, type=1)
This function calculates the similarity of all pairs of peaks from 2 samples, using the spectra similarity and the rretention time differencies
corPrt(d1, d2, t1, t2, D, penality = 0.2)
corPrt(d1, d2, t1, t2, D, penality = 0.2)
d1 |
data matrix for sample 1 |
d2 |
data matrix for sample 2 |
t1 |
vector of retention times for sample 1 |
t2 |
vector of retention times for sample 2 |
D |
retention time window for the matching |
penality |
penalization applied to the matching between two mass
spectra if |
Computes the Pearson carrelation between every pair of peak vectors in the
retention time window (D
)and returns the similarity matrix.
matrix of similarities
Riccardo Romoli
## Not Run require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) r <- corPrt(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, penality = 0.2) ## End (Not Run)
## Not Run require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) r <- corPrt(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, penality = 0.2) ## End (Not Run)
Duplicate peak removal function
deDuper(object, mz.abs = 0.1, rt.abs = 2)
deDuper(object, mz.abs = 0.1, rt.abs = 2)
object |
xcms object |
mz.abs |
mz range |
rt.abs |
rt range |
Remove redundant peaks, in this case where there are any peaks within an absolute m/z value of 0.2 and within 3 s for any one sample in the xcmsSet (the largest peak is kept)
an object of xcms class
r
The function calculate the distance between each mas spec in the msp file and the aligned mass spec from each sampe
distToLib(mspLib, outList)
distToLib(mspLib, outList)
mspLib |
a .msp file from NIST |
outList |
an object from gatherInfo() |
Return the distance matrix
the distance matrix between the mass spec and the aligned spec
Riccardo Romoli
This function calls C code for a bare-bones dynamic programming algorithm, finding the best cost path through a similarity matrix.
dp(M, gap = 0.5, big = 1e+10, verbose = FALSE)
dp(M, gap = 0.5, big = 1e+10, verbose = FALSE)
M |
similarity matrix |
gap |
penalty for gaps |
big |
large value used for matrix margins |
verbose |
logical, whether to print out information |
This is a pretty standard implementation of a bare-bones dynamic programming algorithm, with a single gap parameter and allowing only simple jumps through the matrix (up, right or diagonal).
list
with element match
with the set of pairwise matches.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) # similarity matrix r<-normDotProduct(pd@peaksdata[[1]],pd@peaksdata[[2]]) # dynamic-programming-based matching of peaks v<-dp(r,gap=.5)
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) # similarity matrix r<-normDotProduct(pd@peaksdata[[1]],pd@peaksdata[[2]]) # dynamic-programming-based matching of peaks v<-dp(r,gap=.5)
Dynamic Retention Time Based Alignment algorithm, given a similarity matrix
dynRT(S)
dynRT(S)
S |
similarity matrix |
This function align two chromatograms finding the maximum similarity among the mass spectra
list containing the matched peaks between the two chromatograms. The number represent position of the spectra in the S matrix
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) ## similarity r <- ndpRT(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50) ## dynamic retention time based alignment algorithm v <- dynRT(S = r)
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) ## similarity r <- ndpRT(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50) ## dynamic retention time based alignment algorithm v <- dynRT(S = r)
Write the mass spectum into a .msp file to be used in NIST search.
exportSpectra(object, outList, spectra, normalize = TRUE)
exportSpectra(object, outList, spectra, normalize = TRUE)
object |
an object of class "peaksDataset" |
outList |
an object created using the gatherInfo() function |
spectra |
numeric. The number of the mass spectra to be printed. It correspond to the number of the peak in the plot() and the number of the peak in the gatherInfo() list. |
normalize |
logical. If the mass spectra has to be normalized to 100 |
Write the mass spectum into a .msp file to be used in NIST search.
a .msp file
Given an alignment table (indices of matched peaks across several samples)
such as that within a progressiveAlignment
or
multipleAlignment
object, this routines goes through the raw data and
collects the abundance of each fragment peak, as well as the retention times
across the samples.
gatherInfo( pD, obj, newind = NULL, method = c("apex"), findmzind = TRUE, useTIC = FALSE, top = NULL, intensity.cut = 0.05 )
gatherInfo( pD, obj, newind = NULL, method = c("apex"), findmzind = TRUE, useTIC = FALSE, top = NULL, intensity.cut = 0.05 )
pD |
a |
obj |
either a |
newind |
list giving the |
method |
method used to gather abundance information, only |
findmzind |
logical, whether to take a subset of all m/z indices |
useTIC |
logical, whether to use total ion current for abundance summaries |
top |
only use the top |
intensity.cut |
percentage of the maximum intensity |
This procedure loops through the the table of matched peaks and gathers the
Returns a list (of lists) for each row in the alignment table. Each list has 3 elements:
mz |
a numerical vector of the m/z fragments used |
rt |
a numerical vector for the exact retention time of each peak across all samples |
data |
matrix of fragment intensities. If
|
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath, "CDF", full = TRUE) eluFiles <- dir(gcmsPath, "ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ## multiple alignment ma <- multipleAlignment(pd, c(1,1), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 1, type = 1) ## gather apex intensities d <- gatherInfo(pd, ma) ## table of retention times nm <- list(paste("MP", 1:length(d), sep = ""), c("S1", "S2")) rts <- matrix(unlist(sapply(d, .subset, "rt")), byrow = TRUE, nc = 2, dimnames = nm)
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath, "CDF", full = TRUE) eluFiles <- dir(gcmsPath, "ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ## multiple alignment ma <- multipleAlignment(pd, c(1,1), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 1, type = 1) ## gather apex intensities d <- gatherInfo(pd, ma) ## table of retention times nm <- list(paste("MP", 1:length(d), sep = ""), c("S1", "S2")) rts <- matrix(unlist(sapply(d, .subset, "rt")), byrow = TRUE, nc = 2, dimnames = nm)
The head-to-tail-plot for the mass spectra
headToTailPlot(specFromLib, specFromList)
headToTailPlot(specFromLib, specFromList)
specFromLib |
the mass spectra obtained from the .msp file |
specFromList |
the mass spectra obtained from |
Head-to-tail-plot to visually compare the mass spectra
the plot
Riccardo Romoli
Read the mass spectra from an external msp file
importSpec(file)
importSpec(file)
file |
a .msp file from NIST search library database |
Read the mass spectra from an external file in msp format. The format is used in NIST search library database.
list conaining the mass spctra
Using the information within the peaks that are matched across several runs, we can impute the location of the peaks that are undetected in a subset of runs
imputePeaks(pD, obj, typ = 1, obj2 = NULL, filterMin = 1, verbose = TRUE)
imputePeaks(pD, obj, typ = 1, obj2 = NULL, filterMin = 1, verbose = TRUE)
pD |
a |
obj |
the alignment object, either |
typ |
type of imputation to do, 1 for simple linear interpolation
(default), 2 only works if |
obj2 |
a |
filterMin |
minimum number of peaks within a merged peak to impute |
verbose |
logical, whether to print out information |
If you are aligning several samples and for a (small) subset of the samples in question, a peak is undetected, there is information within the alignment that can be useful in determining where the undetected peak is, based on the surrounding matched peaks. Instead of moving forward with missing values into the data matrices, this procedures goes back to the raw data and imputes the location of the apex (as well as the start and end), so that we do not need to bother with post-hoc imputation or removing data because of missing components.
We realize that imputation is prone to error and prone to attributing intensity from neighbouring peaks to the unmatched peak. We argue that this is still better than having to deal with these in statistical models after that fact. This may be an area of future improvement.
list
with 3 elements apex
, start
and
end
, each masked matrices giving the scan numbers of the imputed
peaks.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
multipleAlignment
,
progressiveAlignment
, peaksDataset
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath,"CDF", full = TRUE) eluFiles <- dir(gcmsPath,"ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:3], mz = seq(50,550), rtrange = c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:3]) ## alignments ca <- clusterAlignment(pd, gap = 0.5, D = 0.05, df = 30, metric = 1, type = 1, compress = FALSE) pa <-progressiveAlignment(pd, ca, gap = 0.6, D = 0.1, df = 30, compress = FALSE) v <- imputePeaks(pd, pa, filterMin = 1)
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath,"CDF", full = TRUE) eluFiles <- dir(gcmsPath,"ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:3], mz = seq(50,550), rtrange = c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:3]) ## alignments ca <- clusterAlignment(pd, gap = 0.5, D = 0.05, df = 30, metric = 1, type = 1, compress = FALSE) pa <-progressiveAlignment(pd, ca, gap = 0.6, D = 0.1, df = 30, compress = FALSE) v <- imputePeaks(pd, pa, filterMin = 1)
Calculate the distance between a reference mass spectrum
matchSpec(spec1, outList, whichSpec)
matchSpec(spec1, outList, whichSpec)
spec1 |
reference mass spectrum |
outList |
the return of |
whichSpec |
the entry number of outList |
Calculate the distance between a reference mass spectrum and one from the sample
the distance between the reference mass spectrum and the others
Riccardo Romoli
Store the raw data and optionally, information regarding signal peaks for a number of GCMS runs
multipleAlignment( pd, group, bw.gap = 0.8, wn.gap = 0.6, bw.D = 0.2, wn.D = 0.05, filterMin = 1, lite = FALSE, usePeaks = TRUE, df = 50, verbose = TRUE, timeAdjust = FALSE, doImpute = FALSE, metric = 2, type = 2, penality = 0.2, compress = FALSE )
multipleAlignment( pd, group, bw.gap = 0.8, wn.gap = 0.6, bw.D = 0.2, wn.D = 0.05, filterMin = 1, lite = FALSE, usePeaks = TRUE, df = 50, verbose = TRUE, timeAdjust = FALSE, doImpute = FALSE, metric = 2, type = 2, penality = 0.2, compress = FALSE )
pd |
a |
group |
factor variable of experiment groups, used to guide the alignment algorithm |
bw.gap |
gap parameter for "between" alignments |
wn.gap |
gap parameter for "within" alignments |
bw.D |
distance penalty for "between" alignments. When |
wn.D |
distance penalty for "within" alignments. When |
filterMin |
minimum number of peaks within a merged peak to be kept in the analysis |
lite |
logical, whether to keep "between" alignment details (default, |
usePeaks |
logical, whether to use peaks (if |
df |
distance from diagonal to calculate similarity |
verbose |
logical, whether to print information |
timeAdjust |
logical, whether to use the full 2D profile data to estimate retention time drifts (Note: time required) |
doImpute |
logical, whether to impute the location of unmatched peaks |
metric |
numeric, different algorithm to calculate the similarity matrix
between two mass spectrum. |
type |
numeric, two different type of alignment function |
penality |
penalization applied to the matching between two mass
spectra if |
compress |
logical whether to compress the similarity matrix into a sparse format. |
multipleAlignment is the data structure giving the result of an alignment
across several GCMS runs. Multiple alignments are done progressively.
First, all samples with the same tg$Group
label with be aligned
(denoted a "within" alignment). Second, each group will be summarized into
a pseudo-data set, essentially a spectrum and retention time for each matched
peak of the within-alignment. Third, these "merged peaks" are aligned in the
same progressive manner, here called a "between" alignment.
multipleAlignment object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksDataset
, betweenAlignment
,
progressiveAlignment
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath, "CDF", full = TRUE) eluFiles <- dir(gcmsPath, "ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) pd <- addAMDISPeaks(pd,eluFiles[1:2]) ## multiple alignment ma <- multipleAlignment(pd, c(1, 1), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 1, type = 1)
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep = "/") cdfFiles <- dir(gcmsPath, "CDF", full = TRUE) eluFiles <- dir(gcmsPath, "ELU", full = TRUE) ## read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) pd <- addAMDISPeaks(pd,eluFiles[1:2]) ## multiple alignment ma <- multipleAlignment(pd, c(1, 1), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 1, type = 1)
This function calculates the similarity of all pairs of peaks from 2 samples, using the spectra similarity and the retention time differencies
ndpRT(s1, s2, t1, t2, D)
ndpRT(s1, s2, t1, t2, D)
s1 |
data matrix for sample 1 |
s2 |
data matrix for sample 2 |
t1 |
vector of retention times for sample 1 |
t2 |
vector of retention times for sample 2 |
D |
retention time window for the matching |
Computes the normalized dot product between every pair of peak vectors in
the retention time window (D
)and returns a similarity matrix.
matrix of similarities
Riccardo Romoli
## Not Run require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange = c(7.5, 10.5), runs = c(1:2)) r <- ndpRT(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50) ## End (Not Run)
## Not Run require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam() data <- addXCMSPeaks(files[1:2], data, settings = mfp, multipleMF = FALSE) data ## review peak picking plotChrom(data, rtrange = c(7.5, 10.5), runs = c(1:2)) r <- ndpRT(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50) ## End (Not Run)
This function calculates the similarity of all pairs of peaks from 2 samples, using the spectra similarity
normDotProduct( x1, x2, t1 = NULL, t2 = NULL, df = max(ncol(x1), ncol(x2)), D = 1e+05, timedf = NULL, verbose = FALSE )
normDotProduct( x1, x2, t1 = NULL, t2 = NULL, df = max(ncol(x1), ncol(x2)), D = 1e+05, timedf = NULL, verbose = FALSE )
x1 |
data matrix for sample 1 |
x2 |
data matrix for sample 2 |
t1 |
vector of retention times for sample 1 |
t2 |
vector of retention times for sample 2 |
df |
distance from diagonal to calculate similarity |
D |
retention time penalty |
timedf |
matrix of time differences to normalize to. if |
verbose |
logical, whether to print out information |
Efficiently computes the normalized dot product between every pair of peak vectors and returns a similarity matrix. C code is called.
matrix of similarities
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) r<-normDotProduct(pd@peaksdata[[1]],pd@peaksdata[[2]])
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) r<-normDotProduct(pd@peaksdata[[1]],pd@peaksdata[[2]])
Reads ASCII ChromaTOF-format files from AMDIS (Automated Mass Spectral Deconvolution and Identification System)
parseChromaTOF( fn, min.pc = 0.01, mz = seq(85, 500), rt.cut = 0.008, rtrange = NULL, skip = 1, rtDivide = 60 )
parseChromaTOF( fn, min.pc = 0.01, mz = seq(85, 500), rt.cut = 0.008, rtrange = NULL, skip = 1, rtDivide = 60 )
fn |
ChromaTOF filename to read. |
min.pc |
minimum percent of maximum intensity. |
mz |
vector of mass-to-charge bins of raw data table. |
rt.cut |
the difference in retention time, below which peaks are merged together. |
rtrange |
retention time range to parse peaks from, can speed up
parsing if only interested in a small region (must be |
skip |
number of rows to skip at beginning of the ChromaTOF |
rtDivide |
multiplier to divide the retention times by (default: 60) |
parseChromaTOF
will typically be called by
addChromaTOFPeaks
, not called directly.
Peaks that are detected within rt.cut
are merged together. This
avoids peaks which are essentially overlapping.
Fragments that are less than min.pc
of the maximum intensity fragment
are discarded.
list
with components peaks
(table of spectra – rows are
mass-to-charge and columns are the different detected peaks) and tab
(table of features for each detection), according to what is stored in the
ChromaTOF file.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") tofFiles<-dir(gcmsPath,"tof",full=TRUE) # parse ChromaTOF file cTofList<-parseChromaTOF(tofFiles[1])
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") tofFiles<-dir(gcmsPath,"tof",full=TRUE) # parse ChromaTOF file cTofList<-parseChromaTOF(tofFiles[1])
Reads ASCII ELU-format files from AMDIS (Automated Mass Spectral Deconvolution and Identification System)
parseELU(f, min.pc = 0.01, mz = seq(50, 550), rt.cut = 0.008, rtrange = NULL)
parseELU(f, min.pc = 0.01, mz = seq(50, 550), rt.cut = 0.008, rtrange = NULL)
f |
ELU filename to read. |
min.pc |
minimum percent of maximum intensity. |
mz |
vector of mass-to-charge bins of raw data table. |
rt.cut |
the difference in retention time, below which peaks are merged together. |
rtrange |
retention time range to parse peaks from, can speed up
parsing if only interested in a small region (must be |
parseELU
will typically be called by addAMDISPeaks
, not
called directly.
Peaks that are detected within rt.cut
are merged together. This
avoids peaks which are essentially overlapping.
Fragments that are less than min.pc
of the maximum intensity fragment
are discarded.
list
with components peaks
(table of spectra – rows are
mass-to-charge and columns are the different detected peaks) and tab
(table of features for each detection), according to what is stored in the
ELU file.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # parse ELU file eluList<-parseELU(eluFiles[1])
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # parse ELU file eluList<-parseELU(eluFiles[1])
Store the raw data and optionally, information regarding signal peaks for a number of GCMS runs
peaksAlignment( d1, d2, t1, t2, gap = 0.5, D = 50, timedf = NULL, df = 30, verbose = TRUE, usePeaks = TRUE, compress = TRUE, metric = 2, type = 2, penality = 0.2 )
peaksAlignment( d1, d2, t1, t2, gap = 0.5, D = 50, timedf = NULL, df = 30, verbose = TRUE, usePeaks = TRUE, compress = TRUE, metric = 2, type = 2, penality = 0.2 )
d1 |
matrix of MS intensities for 1st sample (if doing a peak alignment, this contains peak apexes/areas; if doing a profile alignment, this contains scan intensities. Rows are m/z bins, columns are peaks/scans. |
d2 |
matrix of MS intensities for 2nd sample |
t1 |
vector of retention times for 1st sample |
t2 |
vector of retention times for 2nd sample |
gap |
gap penalty for dynamic programming algorithm. Not used if
|
D |
time window (on same scale as retention time differences,
|
timedf |
list (length = the number of pairwise alignments) of matrices
giving the expected time differences expected at each pair of peaks used
with |
df |
integer, how far from the diagonal to go to calculate the similarity of peaks. Smaller value should run faster, but be careful not to choose too low. |
verbose |
logical, whether to print out info. |
usePeaks |
logical, |
compress |
logical, whether to compress the similarity matrix into a sparse format. |
metric |
numeric, different algorithm to calculate the similarity
matrix between two mass spectrum. |
type |
numeric, two different type of alignment function |
penality |
penalization applied to the matching between two mass
spectra if |
peaksAlignment is a hold-all data structure of the raw and peak detection data.
peaksAlignment
object
Mark Robinson, Riccardo Romoli
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksDataset
, clusterAlignment
## see clusterAlignment, it calls peaksAlignment ## Not Run: files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, metric = 3, compress = FALSE, type = 2, penality = 0.2) plotAlignment(pA) pA@v$match par(mfrow=c(2,1)) plot(data@peaksdata[[1]][,15], type = 'h', main = paste(data@peaksrt[[1]][[15]])) plot(data@peaksdata[[2]][,17], type = 'h', main = paste(data@peaksrt[[2]][[17]])) ## End (Not Run)
## see clusterAlignment, it calls peaksAlignment ## Not Run: files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data plotChrom(data, rtrange=c(7.5, 10.5), runs=c(1:2)) ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, metric = 3, compress = FALSE, type = 2, penality = 0.2) plotAlignment(pA) pA@v$match par(mfrow=c(2,1)) plot(data@peaksdata[[1]][,15], type = 'h', main = paste(data@peaksrt[[1]][[15]])) plot(data@peaksdata[[2]][,17], type = 'h', main = paste(data@peaksrt[[2]][[17]])) ## End (Not Run)
Store the raw data and optionally, information regarding signal peaks for a number of GCMS runs
peaksDataset( fns = dir(, "[Cc][Dd][Ff]"), verbose = TRUE, mz = seq(50, 550), rtDivide = 60, rtrange = NULL )
peaksDataset( fns = dir(, "[Cc][Dd][Ff]"), verbose = TRUE, mz = seq(50, 550), rtDivide = 60, rtrange = NULL )
fns |
character vector, filenames of raw data in CDF format. |
verbose |
logical, if |
mz |
vector giving bins of raw data table. |
rtDivide |
number giving the amount to divide the retention times by. |
rtrange |
retention time range to limit data to (must be |
peaksDataset is a hold-all data structure of the raw and peak detection data.
peaksDataset
object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) show(pd)
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) show(pd)
Plot the aligned mass spectra
plotAlignedFrags( object, outList, specID, fullRange = TRUE, normalize = TRUE, ... )
plotAlignedFrags( object, outList, specID, fullRange = TRUE, normalize = TRUE, ... )
object |
where to keep the mass range of the experiment |
outList |
where to keep the mass spectra; both abundance than m/z |
specID |
a vector containing the index of the spectra to be plotted. Is referred to outList |
fullRange |
if TRUE uses the mass range of the whole experiment, otherwise uses only the mass range of each plotted spectum |
normalize |
if TRUE normalize the intensity of the mass peak to 100, the most abundant is 100% and the other peaks are scaled consequetially |
... |
further arguments passed to the ‘plot’ command |
Plot the deconvoluted and aligned mass spectra collected using gatherInfo()
Riccardo Romoli ([email protected])
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:4], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:4], data, settings = mfp) data ## multiple alignment ma <- multipleAlignment(data, c(1,1,2,2), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 2, type = 2) ## gather apex intensities gip <- gatherInfo(data, ma) gip[[33]] plotAlignedFrags(object = data, outList = gip, specID = 33)
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:4], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:4], data, settings = mfp) data ## multiple alignment ma <- multipleAlignment(data, c(1,1,2,2), wn.gap = 0.5, wn.D = 0.05, bw.gap = 0.6, bw.D = 0.2, usePeaks = TRUE, filterMin = 1, df = 50, verbose = TRUE, metric = 2, type = 2) ## gather apex intensities gip <- gatherInfo(data, ma) gip[[33]] plotAlignedFrags(object = data, outList = gip, specID = 33)
Plotting functions for GCMS data objects
## S4 method for signature 'peaksAlignment' plotAlignment( object, xlab = "Peaks - run 1", ylab = "Peaks - run 2", plotMatches = TRUE, matchPch = 19, matchLwd = 3, matchCex = 0.5, matchCol = "black", col = colorpanel(50, "white", "green", "navyblue"), breaks = seq(0, 1, length = 51), ... )
## S4 method for signature 'peaksAlignment' plotAlignment( object, xlab = "Peaks - run 1", ylab = "Peaks - run 2", plotMatches = TRUE, matchPch = 19, matchLwd = 3, matchCex = 0.5, matchCol = "black", col = colorpanel(50, "white", "green", "navyblue"), breaks = seq(0, 1, length = 51), ... )
object |
a |
xlab |
x-axis label |
ylab |
y-axis label |
plotMatches |
logical, whether to plot matches |
matchPch |
match plotting character |
matchLwd |
match line width |
matchCex |
match character expansion factor |
matchCol |
match colour |
col |
vector of colours for colourscale |
breaks |
vector of breaks for colourscale |
... |
further arguments passed to |
Plot an object of peaksAlignment
The similarity matrix is plotted and optionally, the set of matching peaks.
clusterAlignment
objects are just a collection of all pairwise
peakAlignment
objects.
plot an object of class peaksAlignment
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksAlignment
plotAlignment
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ## image plot plotChrom(data, rtrange = c(7.5,8.5), plotPeaks = TRUE, plotPeakLabels =TRUE) ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, compress = FALSE, type = 1, metric = 1, gap = 0.5) plotAlignment(pA)
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ## image plot plotChrom(data, rtrange = c(7.5,8.5), plotPeaks = TRUE, plotPeakLabels =TRUE) ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, compress = FALSE, type = 1, metric = 1, gap = 0.5) plotAlignment(pA)
Store the raw data and optionally, information regarding signal peaks for a number of GCMS runs
## S4 method for signature 'peaksDataset' plotChrom( object, runs = 1:length(object@rawdata), mzind = 1:nrow(object@rawdata[[1]]), mind = NULL, plotSampleLabels = TRUE, calcGlobalMax = FALSE, peakCex = 0.8, plotPeaks = TRUE, plotPeakBoundaries = FALSE, plotPeakLabels = FALSE, plotMergedPeakLabels = TRUE, mlwd = 3, usePeaks = TRUE, plotAcrossRuns = FALSE, overlap = F, rtrange = NULL, cols = NULL, thin = 1, max.near = median(object@rawrt[[1]]), how.near = 50, scale.up = 1, ... )
## S4 method for signature 'peaksDataset' plotChrom( object, runs = 1:length(object@rawdata), mzind = 1:nrow(object@rawdata[[1]]), mind = NULL, plotSampleLabels = TRUE, calcGlobalMax = FALSE, peakCex = 0.8, plotPeaks = TRUE, plotPeakBoundaries = FALSE, plotPeakLabels = FALSE, plotMergedPeakLabels = TRUE, mlwd = 3, usePeaks = TRUE, plotAcrossRuns = FALSE, overlap = F, rtrange = NULL, cols = NULL, thin = 1, max.near = median(object@rawrt[[1]]), how.near = 50, scale.up = 1, ... )
object |
a |
runs |
set of run indices to plot |
mzind |
set of mass-to-charge indices to sum over (default, all) |
mind |
matrix of aligned indices |
plotSampleLabels |
logical, whether to display sample labels |
calcGlobalMax |
logical, whether to calculate an overall maximum for scaling |
peakCex |
character expansion factor for peak labels |
plotPeaks |
logical, whether to plot hashes for each peak |
plotPeakBoundaries |
logical, whether to display peak boundaries |
plotPeakLabels |
logical, whether to display peak labels |
plotMergedPeakLabels |
logical, whether to display 'merged' peak labels |
mlwd |
line width of lines indicating the alignment |
usePeaks |
logical, whether to plot alignment of peaks (otherwise, scans) |
plotAcrossRuns |
logical, whether to plot across peaks when unmatched peak is given |
overlap |
logical, whether to plot TIC/XICs overlapping |
rtrange |
vector of length 2 giving start and end of the X-axis |
cols |
vector of colours (same length as the length of runs) |
thin |
when |
max.near |
where to look for maximum |
how.near |
how far away from |
scale.up |
a constant factor to scale the TICs |
... |
further arguments passed to the |
Each TIC is scale to the maximum value (as specified by the
how.near
and max.near
values). The many parameters gives
considerable flexibility of how the TICs can be visualized.
plot the chromatograms
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) ## read data pd <- peaksDataset(cdfFiles[1:3], mz=seq(50,550), rtrange=c(7.5,8.5)) ## image plot plotChrom(pd, rtrange = c(7.5,8.5), plotPeaks = TRUE, plotPeakLabels = TRUE)
require(gcspikelite) ## paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) ## read data pd <- peaksDataset(cdfFiles[1:3], mz=seq(50,550), rtrange=c(7.5,8.5)) ## image plot plotChrom(pd, rtrange = c(7.5,8.5), plotPeaks = TRUE, plotPeakLabels = TRUE)
Plotting functions for GCMS data objects
## S4 method for signature 'clusterAlignment' plotClustAlignment(object, alignment = 1, ...)
## S4 method for signature 'clusterAlignment' plotClustAlignment(object, alignment = 1, ...)
object |
|
alignment |
the set of alignments to plot |
... |
further arguments passed to |
For clusterAlignment
objects, the similarity matrix is plotted and
optionally, the set of matching peaks. clusterAlignment
objects are
just a collection of all pairwise peakAlignment
objects.
plot the pairwise alignment
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
plotAlignment
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz=seq(50,550), rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ca <- clusterAlignment(pd, gap=0.5, D=0.05, df=30, metric=1, type=1) plotClustAlignment(ca, run = 1) plotClustAlignment(ca, run = 2) plotClustAlignment(ca, run = 3)
require(gcspikelite) # paths and files gcmsPath <- paste(find.package("gcspikelite"), "data", sep="/") cdfFiles <- dir(gcmsPath, "CDF", full=TRUE) eluFiles <- dir(gcmsPath, "ELU", full=TRUE) # read data, peak detection results pd <- peaksDataset(cdfFiles[1:2], mz=seq(50,550), rtrange=c(7.5,8.5)) pd <- addAMDISPeaks(pd, eluFiles[1:2]) ca <- clusterAlignment(pd, gap=0.5, D=0.05, df=30, metric=1, type=1) plotClustAlignment(ca, run = 1) plotClustAlignment(ca, run = 2) plotClustAlignment(ca, run = 3)
Plot the mass spectra from the profile matrix
plotFrags(object, sample, specID, normalize = TRUE, ...)
plotFrags(object, sample, specID, normalize = TRUE, ...)
object |
an object of class "peaksDataset" where to keep the mass spectra; both abundance (y) than m/z (x) |
sample |
character, the sample from were to plot the mass spectra |
specID |
numerical, a vector containing the index of the spectra to be plotted. |
normalize |
logical, if TRUE normalize the intensity of the mass peak to 100, the most abundant is 100 consequetially |
... |
other parameter passed to the plot() function |
Plot the deconvoluted mass spectra from the profile matrix
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, metric = 3, compress = FALSE, type = 2, penality = 0.2) pA@v$match ## plot the mass spectra par(mfrow=c(2,1)) plotFrags(object=data, sample=1, specID=10) plotFrags(object=data, sample=2, specID=12)
files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ## align two chromatogram pA <- peaksAlignment(data@peaksdata[[1]], data@peaksdata[[2]], data@peaksrt[[1]], data@peaksrt[[2]], D = 50, metric = 3, compress = FALSE, type = 2, penality = 0.2) pA@v$match ## plot the mass spectra par(mfrow=c(2,1)) plotFrags(object=data, sample=1, specID=10) plotFrags(object=data, sample=2, specID=12)
Image plots (i.e. 2D heatmaps) of raw GCMS profile data
## S4 method for signature 'peaksDataset' plotImage( object, run = 1, rtrange = c(11, 13), main = NULL, mzrange = c(50, 200), SCALE = log2, ... )
## S4 method for signature 'peaksDataset' plotImage( object, run = 1, rtrange = c(11, 13), main = NULL, mzrange = c(50, 200), SCALE = log2, ... )
object |
a |
run |
index of the run to plot an image for |
rtrange |
vector of length 2 giving start and end of the X-axis (retention time) |
main |
main title (auto-constructed if not specified) |
mzrange |
vector of length 2 giving start and end of the Y-axis (mass-to-charge ratio) |
SCALE |
function called to scale the data (default: |
... |
further arguments passed to the |
For peakDataset
objects, each TIC is scale to the maximum value (as
specified by the how.near
and max.near
values). The many
parameters gives considerable flexibility of how the TICs can be visualized.
For peakAlignment
objects, the similarity matrix is plotted and
optionally, the set of matching peaks. clusterAlignment
objects are
just a collection of all pairwise peakAlignment
objects.
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) # image plot plotImage(pd,run=1,rtrange=c(7.5,8.5),main="")
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data pd<-peaksDataset(cdfFiles[1],mz=seq(50,550),rtrange=c(7.5,8.5)) # image plot plotImage(pd,run=1,rtrange=c(7.5,8.5),main="")
Performs a progressive peak alignment (clustalw style) of multiple GCMS peak lists
progressiveAlignment( pD, cA, D = 50, gap = 0.5, verbose = TRUE, usePeaks = TRUE, df = 30, compress = FALSE, type = 2 )
progressiveAlignment( pD, cA, D = 50, gap = 0.5, verbose = TRUE, usePeaks = TRUE, df = 30, compress = FALSE, type = 2 )
pD |
a |
cA |
a |
D |
retention time penalty |
gap |
gap parameter |
verbose |
logical, whether to print information |
usePeaks |
logical, whether to use peaks (if |
df |
distance from diagonal to calculate similarity |
compress |
logical, whether to store the similarity matrices in sparse form |
type |
numeric, two different type of alignment function |
The progressive peak alignment we implemented here for multiple GCMS peak
lists is analogous to how clustalw
takes a set of pairwise sequence
alignments and progressively builds a multiple alignment. More details can
be found in the reference below.
progressiveAlignment
object
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksDataset
, multipleAlignment
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ca <- clusterAlignment(data, gap = 0.5, D = 0.05, df = 30, metric = 1, type = 1, compress = FALSE) pa <- progressiveAlignment(data, ca, gap = 0.6, D = 0.1, df = 30, type = 1, compress = FALSE)
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ca <- clusterAlignment(data, gap = 0.5, D = 0.05, df = 30, metric = 1, type = 1, compress = FALSE) pa <- progressiveAlignment(data, ca, gap = 0.6, D = 0.1, df = 30, type = 1, compress = FALSE)
Build a fat data matrix
retFatMatrix(object, data, minFilter = round(length(object@files)/3 * 2))
retFatMatrix(object, data, minFilter = round(length(object@files)/3 * 2))
object |
peakDataset object |
data |
a gatherInfo() object |
minFilter |
the minimum number for a feature to be returned in the data matrix. Default is 2/3 of the samples |
This function allows to extract the data from an object created using
gatherInfo
and build a data matrix using the area of the deconvoluted
and aligned peaks. The row are the samples while the column represent the
different peaks.
A fat data matrix containing the area of the deconvoluted and aligned peaks. The row are the samples while the column represent the different peaks
Riccardo Romoli [email protected]
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ma <- multipleAlignment(pd = data, group = c(1,1), filterMin = 1, metric = 2, type = 2) outList <- gatherInfo(data, ma) mtxD <- retFatMatrix(object = data, data = outList, minFilter = 1)
require(gcspikelite) files <- list.files(path = paste(find.package("gcspikelite"), "data", sep = "/"),"CDF", full = TRUE) data <- peaksDataset(files[1:2], mz = seq(50, 550), rtrange = c(7.5, 8.5)) ## create settings object mfp <- xcms::MatchedFilterParam(fwhm = 10, snthresh = 5) cwt <- xcms::CentWaveParam(snthresh = 3, ppm = 3000, peakwidth = c(3, 40), prefilter = c(3, 100), fitgauss = FALSE, integrate = 2, noise = 0, extendLengthMSW = TRUE, mzCenterFun = "wMean") data <- addXCMSPeaks(files[1:2], data, settings = mfp) data ma <- multipleAlignment(pd = data, group = c(1,1), filterMin = 1, metric = 2, type = 2) outList <- gatherInfo(data, ma) mtxD <- retFatMatrix(object = data, data = outList, minFilter = 1)
Using rlm
from MASS, this procedure fits a linear model using all the
fragments
rmaFitUnit( u, maxit = 5, mzEffect = TRUE, cls = NULL, fitSample = TRUE, fitOrCoef = c("coef", "fit"), TRANSFORM = log2 )
rmaFitUnit( u, maxit = 5, mzEffect = TRUE, cls = NULL, fitSample = TRUE, fitOrCoef = c("coef", "fit"), TRANSFORM = log2 )
u |
a metabolite unit (list object with vectors |
maxit |
maximum number of iterations (default: 5) |
mzEffect |
logical, whether to fit m/z effect (default: |
cls |
class variable |
fitSample |
whether to fit individual samples (alternative is fit by group) |
fitOrCoef |
whether to return a vector of coefficients (default:
"coef"), or an |
TRANSFORM |
function to transform the raw data to before fitting
(default: |
Fits a robust linear model.
list
giving elements of fragment
and sample
coefficients (if fitOrCoef="coef"
) or a list
of elements from
the fitting process (if fitOrCoef="fit"
)
Mark Robinson
Mark D Robinson (2008). Methods for the analysis of gas chromatography - mass spectrometry data PhD dissertation University of Melbourne.
peaksAlignment
, clusterAlignment
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) # pairwise alignment using all scans fullca<-clusterAlignment(pd, usePeaks = FALSE, df = 100) # calculate retention time shifts timedf<-calcTimeDiffs(pd, fullca)
require(gcspikelite) # paths and files gcmsPath<-paste(find.package("gcspikelite"),"data",sep="/") cdfFiles<-dir(gcmsPath,"CDF",full=TRUE) eluFiles<-dir(gcmsPath,"ELU",full=TRUE) # read data, peak detection results pd<-peaksDataset(cdfFiles[1:2],mz=seq(50,550),rtrange=c(7.5,8.5)) pd<-addAMDISPeaks(pd,eluFiles[1:2]) # pairwise alignment using all scans fullca<-clusterAlignment(pd, usePeaks = FALSE, df = 100) # calculate retention time shifts timedf<-calcTimeDiffs(pd, fullca)
multipleAlignment is the data structure giving the result of an alignment
across several GCMS runs. Multiple alignments are done progressively.
First, all samples with the same tg$Group
label with be aligned
(denoted a "within" alignment). Second, each group will be summarized into
a pseudo-data set, essentially a spectrum and retention time for each matched
peak of the within-alignment. Third, these "merged peaks" are aligned in the
same progressive manner, here called a "between" alignment.
## S4 method for signature 'multipleAlignment' show(object)
## S4 method for signature 'multipleAlignment' show(object)
object |
multipleAlignment object |
Mark Robinson