The ptairMS package provides a workflow to
process PTR-TOF-MS raw data in the open Hierarchical Data Format 5 (HDF5; .h5 extension), and generate
the peak table as an ExpressionSet
object for subsequent
data analysis with the many methods and packages available in R.
Applications include the analysis of exhaled breath, cell culture
headspace or ambient air. The package offers several features to check
the raw data and tune the few processing parameters. It also enables to
include new samples in a study without re-processing all the previous
data, providing a convenient management for cohort studies
(e.g. duration of the inclusion process, longitudinal studies,
etc.).
Proton Transfer Reaction - Mass Spectrometry (PTR-MS) has emerged with excellent sensitivity and specificity for VOC analysis in a wide range of applications, including environment, food quality, and biology (Blake, Monks, and Ellis 2009). In the area of health and care, PTR-MS opens up unique opportunities for real-time analysis at the patient’s bedside.
The characterization of volatile organic compounds (VOCs) emitted by living organisms is of major interest in medicine, food sciences, and ecology. As an example, thousands of VOCs have been identified in the exhaled breath, resulting from normal metabolism or pathological processes (Lacy Costello et al. 2014). The main advantage of breath analysis in medicine is that the sampling is non-invasive (Devillier et al. 2017). Methods based on mass spectrometry (MS) are the reference technologies for VOC analysis because of their sensitivity and large dynamic range.
The workflow consists of five steps:
createPtrSet
: A ptrSet
object is
generated by taking as input the name of the directory containing the
raw files (in HDF5 format), possibly grouped into subfolders according
to classes of samples
detectPeak
: peak detection and quantification are
performed within each file and the ptrSet
object is updated
with the sample metadata, the peak list for each sample, and several
quality metrics
alignSamples
: The peak lists are aligned between
samples and an ExpressionSet
object is returned, containing
the table of peak intensities, the sample metadata, and the feature
metadata (which can be accessed with the exprs
,
pData
and fData
methods from the
Biobase
package, respectively)
imputing
: Missing values in the table of intensities
may be replaced by the integrated signal in the expected raw data
region
annotateVOC
: Features may be annotated, based on the
Human Breathomics
Database (Kuo et al. 2020)
Two real PTR-TOF-MS raw data sets are available in the ptairData package.
exhaledAir: Exhaled air from two healthy individual: three acquisitions per individual (with two expirations per acquisition), on distinct days [6 files]
mycobacteria: cell culture headspace: two replicates from two species and one control (culture medium) [6 files]
Note: To limit the size of the data and speed up the analysis, the raw data are truncated in the mass dimension within the range [20.4, 21.6] ∪ [50, 150] for individuals, and [20.4, 21.6] ∪ [56.4, 90.6] for mycobacteria.
The whole workflow of ptairMS can be run interactively through a graphical user interface, which provides visualizations (expiration phases, peaks in the raw data, peak table, individual VOCs), quality controls (calibration, resolution, peak shape and evolution of reagent ions depending on time), and exploratory data analysis.
Alternatively, the workflow can be run on the command line. We describe hereafter the main steps and parameters.
createPtrSet
: Checking raw data and setting
parametersThe analysis starts from a directory containing the raw data. For this example, we use the exhaledAir from the ptairData package:
Before processing the data, there are two important steps:
calibrate the mass axis
determine the time periods of interest within the acquisitions (i.e., expirations or headspace analysis duration in our examples)
To perform these steps and check the quality of the raw data, we
first use the createPtrSet
function which, for each
file:
reads the raw data
performs calibration on the calibrationPeriod
sum
spectra, with the mzCalibRef
reference masses (see the
Calibrating the mass axis section)
determines the expiration time limits (or headspace duration for
bacteria example) on the trace of
mzBreathTracer
mass or on the total sum
trace if this parameter is NULL
, with the
fracMaxTIC
parameter (see the Determine the time limits
for expirations or headspace duration section)
builds a default sampleMetadata
table with the file
names, date and possibly the subdirectories
This function creates a unique ptrSet
object for the
whole study, which contains all necessary information about the files:
calibration parameters, time limits, peak lists (at this step they are
empty), and primary ion quantification. The corresponding raw files are
available in the ptairData companion
package.
Importantly, the ptrSet
may be updated if new files are
added or deleted from the directory, with the function
updatePtrSet
(see the Updating the ptrSet
section).
To create a ptrSet
object form the raw data
directory:
exhaledPtrset <- createPtrSet(dir=dirRaw,
setName="exhaledPtrset",
mzCalibRef = c(21.022, 60.0525),fracMaxTIC = 0.7,
saveDir = NULL )
## ind1-1.h5 check
## ind1-2.h5 check
## ind1-3.h5 check
## ind2-1.h5 check
## ind2-2.h5 check
## ind2-3.h5 check
## ptrSet object : exhaledPtrset
## directory: /github/workspace/pkglib/ptairData/extdata/exhaledAir
## 6 files contained in the directory
## 6 files checked
## 0 files processed by detectPeak
To get the list of the file names in your directory (at the last time
the ptrSet
was created or updated), use
getFileNames
:
## [1] "ind1-1.h5" "ind1-2.h5" "ind1-3.h5" "ind2-1.h5" "ind2-2.h5" "ind2-3.h5"
To check the quality and view useful information about your files,
you can use the plot
method on the ptrSet
,
that provides four plots:
calibError
: the calibration error boxplot in
ppm
$\frac{m}{\Delta m}$: an
estimation of the resolution for the mzCalibRef
peaks
the average normalized shape of the mzCalibRef
peaks
The primary ion isotope intensity in cps in function of acquisition date
We will now quickly explain each step of this function, and show how
to choose and check the quality of the mzCalibRef
,
mzBreathTracer
and fracMaxTIC
parameters.
To convert Time Of Flight (TOF) axis to mass-to-charge ratio (mz), the following formula is used (Müller et al. 2013):
To estimate the parameters (a,b)
, reference peaks with
known masses (and without overlapping peaks) are needed. For optimal
results, the calibration peaks have to be well distributed over the
entire m/z axis. The masses of the calibration peaks are set with the
mzCalibRef
parameter, in an numeric vector. For exhaled
air, we propose by default 6 peaks (two of them are suggested by the
IONICON manufacturer):
calib_table<-read.csv(system.file("extdata", "reference_tables/calib_table.tsv", package = "ptairMS"),sep="\t")
knitr::kable(calib_table)
Formula | mz | Compound |
---|---|---|
H3 (18O)+ | 21.02204 | Isotope of primary ion (H3O+) |
N2H+ | 29.01342 | Diazote |
C3H5+ | 41.03858 | Allyl cation |
C3H7O+ | 60.05250 | Isotope of Acetone |
C6H6I+ | 203.94300 | Iodobenzene (1,3-diiodobenzene fragment) |
C6H5I2+ | 330.84950 | 1,3-diiodobenzene (instrument internal standard) |
Other reference masses (at least two) can be provided by the user.
Since there may be a drift of mass calibration of the instrument
during the acquisition, e.g. due to low change of temperature, periodic
calibration is performed every calibrationPeriods
seconds
(default 60): correction of the drifts is then performed by linear
interpolation, by using the first calibration as the reference.
If a mass contains one or several outlier values on the summary plot
(i.e. an error > 20 ppm) for specific files, you can check those
files with the function plotCalib
that plots the
Average total ion spectrum around all the
mzCalibRef
(with the exact masses as red vertical lines),
after calibration drift correction.
You can then choose to keep or delete these masses from
mzCalibRef
. To change the calibration masses, use the
calibration
function on the ptrSet
object.
In our dataset example, since we have truncated the mass axis, only two masses from those listed above are present. To calibrate with three masses, we therefore add the peak 75.04406 ([C3H6O2+H]+, Hydroxyacetone).
To see and change the expirations or headspace duration, use the
changeTimeLimits
function that opens the Shiny
application for interactive visualization and selection, or use the
timeLimits
function.
The limits are determined on the total trace if
mzBreathTracer
is NULL
, or on the trace around
mzBreathTracer
mass. They correspond to the part of the
trace where the intensity is higher than
fracMaxTIC * max(TIC)
, after baseline removal if
baseline
is set to TRUE
.
Note: To analyze the entire spectrum (e.g. in ambient air studies),
set fracMaxTIC = 0
.
Example of expiration detection at fracMaxTIC = 0.5
on a
single file:
samplePath <-getFileNames(exhaledPtrset,fullNames = TRUE)[1]
sampleRaw <- readRaw(samplePath, calib = FALSE)
expirationLimit <- timeLimits(sampleRaw,fracMaxTIC = 0.5,plotDel = TRUE, mzBreathTracer = 60.05)
Note: You can also see the expiration limits on all or several files
from the ptrSet
with the plotTIC
function. By
default, when fileNames
is NULL
, all TICs
files are plotted. A pdf file may be generated by setting the
pdfFile
argument to the absolute file path ending with the
.pdf
extension. Finally, you can remove the baseline by
setting baselineRm = TRUE
, and add the time limits to the
plot by setting showLimits = TRUE
. Coloring the TICs
according to a column from the sampleMetadata is also possible by
indicating the column name as the colorBy
parameter.
The createPtrSet
function automatically generates a
default sampleMetadata data.frame
. It contains the file
names as row names, a column named ‘subfolder’ when the files are
organized into subfolders in the parent directory, and a column date
with the date and hour of the acquisition. To get this data frame, use
the getSampleMetadata
method. Remember that the row names
of the sampleMetadata
must always correspond to all the
file names from the directory.
## subfolder date
## ind1-1.h5 ind1 21/02/2019 11:52:49
## ind1-2.h5 ind1 27/02/2019 11:41:24
## ind1-3.h5 ind1 28/02/2019 11:28:29
## ind2-1.h5 ind2 07/02/2019 10:29:33
## ind2-2.h5 ind2 18/02/2019 11:06:52
## ind2-3.h5 ind2 22/02/2019 09:52:20
You can at any moment obtain this default sample metadata with the
function resetSampleMetadata(exhaledPtrset)
.
To modify the sample metadata, there exists two different ways:
setSampleMetadata
method: modify the data frame in your
R session and set it back into the ptrSet
object.sampleMD <- getSampleMetadata(exhaledPtrset)
colnames(sampleMD)[1] <- "individual"
exhaledPtrset <- setSampleMetadata(exhaledPtrset,sampleMD)
getSampleMetadata(exhaledPtrset)
## individual date
## ind1-1.h5 ind1 21/02/2019 11:52:49
## ind1-2.h5 ind1 27/02/2019 11:41:24
## ind1-3.h5 ind1 28/02/2019 11:28:29
## ind2-1.h5 ind2 07/02/2019 10:29:33
## ind2-2.h5 ind2 18/02/2019 11:06:52
## ind2-3.h5 ind2 22/02/2019 09:52:20
exportSampleMetada
and
importSampleMetadata
methods:
exportSampleMetada
saves the data.frame in the
.tsv
format in the directory of your choice. The parameter
saveFile
must always have the .tsv
extension.
You can open and modify it on any spreadsheet editor, but row
names must never be modified: if one of the files happens to be
missing from the row names during the subsequent import, an error will
be thrown. Once the .tsv
file has been modified, it can be
imported back into the ptrSet
object, with the function
importSampleMetadata
.When calling the createPtrSet
function, you may use the
saveDir
argument to save the ptrSet
object in
the directory of your choice, with setName
parameter as
name (the .RData
extension will automatically be added at
the end of the file name). Subsequent import of the saved
ptrSet
object relies on the classical load
function.
There exist two functions for plotting the raw data:
plotRaw
: Displays for a file the image of the matrix of
intensities, the Extracted Ion Chromatogram (EIC), the
Extracted Ion Spectrum (EIS), for the selected m/z and
time ranges, and all the putative annotations available in the
vocDB
within this mz range (no peak detection has yet been
done as this step)## ion_mass ion_formula name_iupac
## 40 59.08552 [C4H10+H]+ butane|2-methylpropane
## 39 59.06037 [C2H6N2+H]+ dimethyldiazene
## 38 59.04914 [C3H6O+H]+ prop-2-en-1-ol|propanal|propan-2-one
## 37 58.94261 [Ni+H]+ nickel
plotFeatures
: for all selected files, the average
spectrum for all time periods is plotted, with the background
superimposed as a dashed line. As with the plotTIC
function, you can choose the type of display (plotly
or
ggplot
), and the possibility to color spectra individually
by indicating a column name of the sample MetadataupdatePtrSet
: Updating the ptrSetIf you delete or add files to the directory after the
ptrSet
object has been created, the
updatePtrSet
function must be run.
## nothing to update
detectPeak
: Peak detection and quantificationNow that we have checked that the calibration is efficient, and that
the expiration or headspace time limits are correct, the peaks can be
detected and quantified in each file with the detectPeak
function, which works on the ptrSet
object (and the
corresponding raw files).
For each file in the directory, this function:
performs the calibration of the mass axis every
calibrationPeriod
second
detects peaks on the total ion spectrum with an untargeted peak picking algorithm
estimates the temporal evolution of each detected peak with a 2-dimensional model
quantifies each peak in both exhaled air and background, in cps,
ncps (normalized by the primary ion, H3(O18)+ * 488) and ppb [(Hansel et al. 1995)] units, and stores the
corresponding information in the peakList
slot of the
ptrSet object as a list of ExpressionSet objects (see below)
performs two unilateral t-test comparing the mean intensities
between background and expiration/headspace phases, and stores the
p-value in the peakListAligned
slot
The peakList
is then written in the ptrSet
object as a list of ExpressionSet
, each containing all the
peaks detected in one file (e.g., sample; see below for the details of
the ExpressionSet content).
## ind1-1.h5 : 154 peaks detected
## ind1-2.h5 : 164 peaks detected
## ind1-3.h5 : 140 peaks detected
## ind2-1.h5 : 162 peaks detected
## ind2-2.h5 : 177 peaks detected
## ind2-3.h5 : 161 peaks detected
The peak detection may be restricted to specific nominal masses with
the mzNominal
argument: for example
exhaledPtrset <- detectPeak(exhaledPtrset , mzNominal = c(5,60))
.
The peak detection step may take a few minutes if there are many
files and a large m/z range (1 to 2 minutes for files for files with an
average acquisition time of 3 minutes) . Parallel computing is available
by setting parallelize = TRUE
and by giving the number of
available cores of your computer in nbCores
. To find the
number of CPU cores available in your computer, use
parallel::detectCores()
.
To see the resulting peak lists, use the getPeakList
method. It returns a list of ExpressionSet, where:
assay Data: the matrix of peak intensities, with m/z peak centre as row names, and the quantification in cps at each time point
feature Data: the data frame with m/z peak centre as row names, and the following columns:
peakList<-getPeakList(exhaledPtrset)
peakList1<-peakList$`ind1-1.h5`
X<-Biobase::exprs(peakList1)
Y<-Biobase::fData(peakList1)
mz<-Y[,"Mz"]
plot(X[which.min(abs(mz-59.0498)),],ylab="cps",xlab="time",main=paste("Temporal evolution of acetone "))
## Mz lowerMz upperMz overlap parameterPeak.delta1
## 21.022 21.02199 21.01237 21.03169 0 0.002627749
## 51.0442 51.04420 51.02412 51.06458 0 0.005044793
## 53.0031 53.00311 52.97578 53.03075 1 0.006625388
## 53.0388 53.03880 53.01560 53.06232 1 0.004657318
## 54.0076 54.00762 53.97978 54.03579 1 0.006750953
## 54.036 54.03596 54.00811 54.06414 1 0.006754495
## parameterPeak.delta2 parameterPeak.height quanti_cps background_cps
## 21.022 0.002033698 816.75081 3974.1939 4640.8017
## 51.0442 0.004725133 442.09372 5558.3477 122.7511
## 53.0031 0.006625388 101.42118 696.9602 1414.9976
## 53.0388 0.006629476 88.69103 757.6235 708.0097
## 54.0076 0.006750953 20.54174 197.6101 247.6568
## 54.036 0.006754495 17.77483 124.2098 260.0253
## diffAbs_cps pValLess pValGreater quanti_ncps background_ncps
## 21.022 602.36010 2.729988e-13 1.000000e+00 1.895138e-03 0.0022130174
## 51.0442 5386.01378 1.000000e+00 4.256174e-23 2.650559e-03 0.0000585352
## 53.0031 818.19828 2.856432e-11 1.000000e+00 3.323531e-04 0.0006747572
## 53.0388 223.18254 1.000000e+00 1.206099e-01 3.612811e-04 0.0003376222
## 54.0076 15.58893 7.769693e-04 1.000000e+00 9.423254e-05 0.0001180979
## 54.036 147.33833 1.271412e-09 1.000000e+00 5.923082e-05 0.0001239959
## diffAbs_ncps quanti_ppb background_ppb diffAbs_ppb
## 21.022 2.872420e-04 45.30682241 52.57848477 6.824506409
## 51.0442 2.568380e-03 2.81161476 0.06170721 2.707559808
## 53.0031 3.901669e-04 0.33808927 0.68214957 0.394441388
## 53.0388 1.064271e-04 0.36751649 0.34132110 0.107593030
## 54.0076 7.433754e-06 0.09393274 0.11699268 0.007364187
## 54.036 7.025991e-05 0.05904238 0.12283554 0.069602391
ptrSet
peak lists with
detectPeak
As described previously, an important feature of the
ptairMS package is the possibility to update
the ptrSet
object linked to a directory, as new data files
are added and included in this directory. In such a case, we have seen
that the updatePtrSet
function must be used to reset the
sample metadata or append it. Then, the detectPeak
function
must be used on the updated ptrSet
to compute the
additional peak lists.
alignSamples
: Aligning features between samplesThe alignment between samples (i.e. the matching of variables between
the peak lists within the ptrSet
object) is performed by
using a kernel gaussian density (Delabrière et al. 2017).
The alignSamples
returns an ExpressionSet
object, with the table of peak intensities which has just been built,
the sample meta data (borrowed from the input ptrSet
) and
the variable meta data which contains peak intensities in the
background.
It is possible to apply two filters:
On the background: only variables with a significant higher
intensity in the expiration phases compared to the background phases (at
the pValGreaterThres
p-value threshold) for
fracExp
% of the samples are kept
On sample reproducibility: only variables which are detected in
more than fracGroup
percent of the samples (or percent of
at least one group
) are kept.
If you do not want to apply those filters, set fracGroup
to 0 and pValGreaterThres
to 1.
## 113 peaks aligned
The three tables from the ExpressionSet
can be accessed
with the classical exprs
, pData
, and
fData
accessors:
ind1-1.h5 | ind1-2.h5 | ind1-3.h5 | ind2-1.h5 | ind2-2.h5 | ind2-3.h5 | |
---|---|---|---|---|---|---|
51.0442 | 2.8116148 | 2.7230144 | 2.3933849 | 8.2962850 | 6.9375697 | 10.9301556 |
52.0473 | NA | NA | NA | 0.1268996 | 0.1101952 | 0.1610980 |
53.0029 | 0.3380893 | 0.4097092 | 0.3345380 | 0.3943865 | 0.6936014 | 0.4844526 |
53.0138 | NA | NA | NA | 0.1195139 | 0.1170446 | 0.1568373 |
53.0388 | 0.3675165 | 0.3534616 | 0.3584981 | 0.6067432 | 0.7399354 | 0.7080680 |
54.0085 | 0.0939327 | 0.0945617 | NA | 0.1446772 | 0.2594131 | 0.1919683 |
individual | date | |
---|---|---|
ind1-1.h5 | ind1 | 21/02/2019 11:52:49 |
ind1-2.h5 | ind1 | 27/02/2019 11:41:24 |
ind1-3.h5 | ind1 | 28/02/2019 11:28:29 |
ind2-1.h5 | ind2 | 07/02/2019 10:29:33 |
ind2-2.h5 | ind2 | 18/02/2019 11:06:52 |
ind2-3.h5 | ind2 | 22/02/2019 09:52:20 |
ion_mass | Background - ind1-1.h5 | Background - ind1-2.h5 | Background - ind1-3.h5 | Background - ind2-1.h5 | Background - ind2-2.h5 | Background - ind2-3.h5 | |
---|---|---|---|---|---|---|---|
51.0442 | 51.0442 | 0.0617072 | 0.0372129 | 0.0373825 | 0.0699982 | 0.0228573 | 0.1021184 |
52.0473 | 52.0473 | NA | NA | NA | 0.0095295 | 0.0055960 | 0.0102491 |
53.0029 | 53.0029 | 0.6821496 | 0.4545806 | 0.2022318 | 0.9540640 | 0.5317314 | 0.6168134 |
53.0138 | 53.0138 | NA | NA | NA | 0.1546882 | -0.0016491 | 0.1394140 |
53.0388 | 53.0388 | 0.3413211 | 0.3598626 | 0.2159814 | 0.1840640 | 0.3216666 | 0.3652356 |
54.0085 | 54.0085 | 0.1169927 | 0.0806165 | NA | 0.2357813 | 0.0953313 | 0.1249523 |
Note: The view
method from the ropls
package may be used to print these tables:
imputing
: Imputation of missing valuesTo impute missing values, ptairMS returns
back to the raw data, and performs the quantification with the same
method as detectPeak
but this time without any limit of
detection.
## ind1-1.h5 done
## ind1-2.h5 done
## ind1-3.h5 done
## ind2-1.h5 done
## ind2-2.h5 done
annotateVOC
: AnnotationptairMS provides putative annotations by
matching the measured ion masses to the Human Breathomics Database (Kuo et al. 2020). Applied to an
ExpressionSet
, it appends the feature metadata
(fData
) with new columns containing chemical information
(formulas, IUPAC name, InChI, etc.), as well as the isotopes for
nuclides C13, O17 and O18.
## vocDB_ion_mass vocDB_ion_formula vocDB_name_iupac
## 59.049 59.04914 [C3H6O+H]+ prop-2-en-1-ol, propan-2-one, propanal
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
## NOTE: You are sure that is the mass of an electrone?
ion_mass | Background - ind1-1.h5 | Background - ind1-2.h5 | Background - ind1-3.h5 | Background - ind2-1.h5 | Background - ind2-2.h5 | Background - ind2-3.h5 | vocDB_ion_mass | vocDB_ion_formula | vocDB_name_iupac | isotope | |
---|---|---|---|---|---|---|---|---|---|---|---|
51.0442 | 51.0442 | 0.0617072 | 0.0372129 | 0.0373825 | 0.0699982 | 0.0228573 | 0.1021184 | 53.0388/52.0473 | |||
52.0473 | 52.0473 | NA | NA | NA | 0.0095295 | 0.0055960 | 0.0102491 | NA | |||
53.0029 | 53.0029 | 0.6821496 | 0.4545806 | 0.2022318 | 0.9540640 | 0.5317314 | 0.6168134 | 54.0085 | |||
53.0138 | 53.0138 | NA | NA | NA | 0.1546882 | -0.0016491 | 0.1394140 | NA | |||
53.0388 | 53.0388 | 0.3413211 | 0.3598626 | 0.2159814 | 0.1840640 | 0.3216666 | 0.3652356 | 53.03857 | [C4H4+H]+ | but-1-en-3-yne | NA |
54.0085 | 54.0085 | 0.1169927 | 0.0806165 | NA | 0.2357813 | 0.0953313 | 0.1249523 | NA |
writeEset
: Export data and metadata to 3 tabular
filesFinally, the ExpressionSet
can be exported to 3
tabulated files ‘dataMatrix.tsv’, sampleMetadata.tsv’, and
‘variableMetadata.tsv’:
The ExpressionSet
object is now ready for subsequent
data analysis (e.g. data mining, classification or feature selection)
with the many R packages.
As an example, we describe here how to perform Exploratory Data Analysis with the ropls package (Thevenot 2015).
As a preliminary step, log transformation is often used in Mass Spectrometry to stabilize the variance:
Then, the data and metadata may be printed and plotted with the
view
method:
To avoid redundancy, the isotopes may be discarded:
isotopes<-Biobase::fData(exhaledEset)[,"isotope"]
isotopes<-isotopes[!is.na(isotopes)]
exhaledEset <- exhaledEset[!(Biobase::fData(exhaledEset)[, "ion_mass"] %in% isotopes), ]
Principal Component Analysis may then be performed (due to the limited number of samples, cross-validation is decreased to 5 instead of 7):
exhaledPca<-ropls::opls(exhaledEset,crossvalI=5,info.txtC="none",fig.pdfC="none")
ropls::plot(exhaledPca, parAsColFcVn=Biobase::pData(exhaledEset)[, "individual"],typeVc="x-score")
Subsequent supervised analysis, e.g. (Orthogonal) Partial Least Squares, or feature selection may be performed with the ropls and biosigner packages, respectively.
To get the most important variable that contribute to the dimension that discriminate the two individual (here the first dimension), we look at the loadings:
knitr::kable(Biobase::fData(exhaledEset)[names(sort(abs(load1),decreasing = TRUE)[1:10]),c("vocDB_ion_formula","vocDB_name_iupac")])
vocDB_ion_formula | vocDB_name_iupac | |
---|---|---|
110.9705 | ||
53.0388 | [C4H4+H]+ | but-1-en-3-yne |
67.0542 | [C5H6+H]+ | (E)-pent-3-en-1-yne, 2-methylbut-1-en-3-yne, cyclopenta-1,3-diene, pent-3-en-1-yne |
135.1139 | [C10H14+H]+ | 1,2,3,4-tetramethylbenzene, 1,2,3,5-tetramethylbenzene, 1,2,4,5-tetramethylbenzene, 1,3-diethylbenzene, 1,4-diethylbenzene, 1-ethyl-2,3-dimethylbenzene, 1-ethyl-2,4-dimethylbenzene, 1-ethyl-3,5-dimethylbenzene, 1-methyl-2-propan-2-ylbenzene, 1-methyl-2-propylbenzene, 1-methyl-3-propan-2-ylbenzene, 1-methyl-4-propan-2-ylbenzene, 1-methyl-4-propylbenzene, 2-ethyl-1,4-dimethylbenzene, 2-methyl-5-prop-1-en-2-ylcyclohexa-1,3-diene, 4-ethyl-1,2-dimethylbenzene, butylbenzene |
69.0704 | [C5H8+H]+ | (3E)-penta-1,3-diene, (3Z)-penta-1,3-diene, 2-methylbuta-1,3-diene, 3-methylbuta-1,2-diene, pent-2-yne, penta-1,2-diene, penta-1,4-diene |
129.127 | [C8H16O+H]+ | 2-ethylhexanal, 3-methylheptan-2-one, 6-methylheptan-2-one, 6-methylheptan-3-one, octan-2-one, octan-3-one, octanal |
109.0652 | [C7H8O+H]+ | 2-prop-2-enylfuran, 3-methylphenol, 4-methylphenol, phenylmethanol |
52.0473 | ||
63.0443 | ||
143.0969 |
We then could plot the raw data around this masses to check the robustness of these potential marker:
The package can also be used to process data from headspace analysis (e.g., cell culture headspace). The acquisition phase is detected on the total ion trace, in a similar way as for exhaled air. The mycobacteria dataset contains two replicates from two bacterial species and one control (culture medium). The 6 corresponding raw files are available in the companion ptairData package.
dir <- system.file("extdata/mycobacteria", package = "ptairData")
mycobacteriaSet <- createPtrSet(dir = dir, setName = "test",
mzCalibRef = c(21.022,59.049))
## Control1.h5 check
## Control2.h5 check
## Specie-a1.h5 check
## Specie-a2.h5 check
## specie-b1.h5 check
## specie-b2.h5 check
## Control1.h5 : 36 peaks detected
## Control2.h5 : 38 peaks detected
## Specie-a1.h5 : 38 peaks detected
## Specie-a2.h5 : 39 peaks detected
## specie-b1.h5 : 51 peaks detected
## specie-b2.h5 : 41 peaks detected
## 30 peaks aligned
## specie-b1.h5 done
## specie-b2.h5 done
To read and access the entire raw data of a single file, use the
readRaw
function. It opens the data from an absolute file
path with the rhdf5
library, and returns a ptrRaw
object containing the raw
data matrix, the time axis, the m/z axis obtained after calibration if
calibTIS = TRUE
, and additional information contained in
the h5 file (transmission curve, drift temperature, …).
As an example, we get the absolute file path for the first raw file:
dirRaw <- system.file("extdata/exhaledAir", package = "ptairData")
samplePath<-list.files(dirRaw,recursive = TRUE,full.names = TRUE,pattern = ".h5$")[1]
To read the file:
## /github/workspace/pkglib/ptairData/extdata/exhaledAir/ind1/ind1-1.h5
## mz range: 20.4 - 150.7
## time range: 0 - 49
## Calibration error in ppm:
## No calibration performs
At this stage, the same methods as with the ptrSet
object can be used: calibration
, timeLimits
,
plotCalib
, plotTIC
, plotRaw
, and
detectPeak
.
This research was conducted by Camille Roquencourt, with the contributions from Paul Zheng, Pierrick Roger-Mele and Etienne A. Thevenot (Data Sciences for Deep Phenotyping and Precision Medicine team at CEA), in collaboration with Stanislas Grassin-Delyle, Helene Salvator, Emmanuel Naline, Philippe Devillier and Louis-Jean Couderc (Exhalomics platform at the Foch Hospital) within the SoftwAiR project funded by the Agence Nationale de la Recherche (ANR-18-CE45-0017 grant).
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ptairData_1.14.0 ptairMS_1.15.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] MultiDataSet_1.35.0 gridExtra_2.3
## [3] rlang_1.1.4 magrittr_2.0.3
## [5] matrixStats_1.4.1 compiler_4.4.2
## [7] vctrs_0.6.5 stringr_1.5.1
## [9] crayon_1.5.3 pkgconfig_2.0.3
## [11] fastmap_1.2.0 XVector_0.47.0
## [13] backports_1.5.0 labeling_0.4.3
## [15] utf8_1.2.4 ropls_1.39.0
## [17] promises_1.3.0 rmarkdown_2.29
## [19] UCSC.utils_1.3.0 purrr_1.0.2
## [21] bit_4.5.0 MultiAssayExperiment_1.33.0
## [23] xfun_0.49 zlibbioc_1.52.0
## [25] cachem_1.1.0 GenomeInfoDb_1.43.1
## [27] jsonlite_1.8.9 later_1.3.2
## [29] DelayedArray_0.33.2 rhdf5filters_1.19.0
## [31] Rhdf5lib_1.29.0 broom_1.0.7
## [33] parallel_4.4.2 cluster_2.1.6
## [35] R6_2.5.1 bslib_0.8.0
## [37] stringi_1.8.4 limma_3.63.2
## [39] car_3.1-3 rpart_4.1.23
## [41] GenomicRanges_1.59.1 jquerylib_0.1.4
## [43] Rcpp_1.0.13-1 SummarizedExperiment_1.37.0
## [45] iterators_1.0.14 knitr_1.49
## [47] base64enc_0.1-3 IRanges_2.41.1
## [49] Matrix_1.7-1 httpuv_1.6.15
## [51] splines_4.4.2 nnet_7.3-19
## [53] tidyselect_1.2.1 rstudioapi_0.17.1
## [55] abind_1.4-8 yaml_2.3.10
## [57] doParallel_1.0.17 codetools_0.2-20
## [59] minpack.lm_1.2-4 lattice_0.22-6
## [61] tibble_3.2.1 plyr_1.8.9
## [63] Biobase_2.67.0 shiny_1.9.1
## [65] withr_3.0.2 evaluate_1.0.1
## [67] signal_1.8-1 foreign_0.8-87
## [69] pillar_1.9.0 BiocManager_1.30.25
## [71] ggpubr_0.6.0 MatrixGenerics_1.19.0
## [73] carData_3.0-5 checkmate_2.3.2
## [75] foreach_1.5.2 stats4_4.4.2
## [77] generics_0.1.3 S4Vectors_0.45.2
## [79] ggplot2_3.5.1 munsell_0.5.1
## [81] scales_1.3.0 calibrate_1.7.7
## [83] chron_2.3-61 xtable_1.8-4
## [85] glue_1.8.0 shinyscreenshot_0.2.1
## [87] Hmisc_5.2-0 maketools_1.3.1
## [89] tools_4.4.2 sys_3.4.3
## [91] data.table_1.16.2 enviPat_2.6
## [93] ggsignif_0.6.4 buildtools_1.0.0
## [95] cowplot_1.1.3 rhdf5_2.51.0
## [97] grid_4.4.2 tidyr_1.3.1
## [99] colorspace_2.1-1 GenomeInfoDbData_1.2.13
## [101] htmlTable_2.4.3 Formula_1.2-5
## [103] cli_3.6.3 fansi_1.0.6
## [105] S4Arrays_1.7.1 dplyr_1.1.4
## [107] gtable_0.3.6 rstatix_0.7.2
## [109] sass_0.4.9 digest_0.6.37
## [111] BiocGenerics_0.53.3 SparseArray_1.7.2
## [113] htmlwidgets_1.6.4 farver_2.1.2
## [115] htmltools_0.5.8.1 lifecycle_1.0.4
## [117] httr_1.4.7 statmod_1.5.0
## [119] mime_0.12 qqman_0.1.9
## [121] bit64_4.5.2 MASS_7.3-61
createPtrSet
:
Checking raw data and setting parametersupdatePtrSet
:
Updating the ptrSetdetectPeak
:
Peak detection and quantificationptrSet
peak lists with detectPeak
alignSamples
:
Aligning features between samplesimputing
:
Imputation of missing valuesannotateVOC
:
AnnotationwriteEset
:
Export data and metadata to 3 tabular files