## Warning: multiple methods tables found for 'union'
## Warning: multiple methods tables found for 'intersect'
## Warning: multiple methods tables found for 'setdiff'
## Warning: multiple methods tables found for 'setequal'
Cardinal 3.6 is a major update with breaking changes. It bring support many of the new low-level signal processing functions implemented for matter 2.4 and matter 2.6. Almost the entire Cardinal codebase has been refactored to support these improvements.
The most notable of the new features include:
Redesign class hierarchy that includes a greater emphasis on
spectra: SpectralImagingData
,
SpectralImagingArrays
, and
SpectralImagingExperiment
lay the groundwork for the new
data structures
Updated MSImagingExperiment
class with a new
counterpart MSImagingArrays
class for better representing
raw spectra.
New spectral processing methods in smooth()
:
New spectral baseline reduction methods in
reduceBaseline()
:
New spectral alignment methods in recalibrate()
:
New peak picking methods in peakPick()
:
Improved image()
contrast enhancement via
enhance
:
Improved image()
spatial smoothing via
smooth
:
All statistical methods improved and updated
crossValidate()
methodNMF()
PCA()
and spatialFastmap()
PLS()
and OPLS()
with new
algorithmsspatialKMeans()
with better
initializationsspatialShrunkenCentroids()
with better
initializationsspatialDGMM()
with improved stabilitymeansTest()
with improved data preparationSpatialResults
output with simplified
interfaceAnd many other updates! Many redundant functions and methods have been merged to simplify and streamline workflows. Many unnecessary functions and methods have been deprecated.
Major improvements from earlier versions are further described below.
Cardinal 3 lays the groundwork for future improvements to the existing toolbox of pre-processing, visualization, and statistical methods for mass spectrometry (MS) imaging experiments. Cardinal has been updated to support matter 2, and legacy support has been dropped.
Despite minimal user-visible changes in Cardinal (at first), the entire matter package that provides the backend for Cardinal’s computing on larger-than-memory MS imaging datasets has been rewritten. This should provide more robust support for larger-than-memory computations, as well as greater flexibility in handling many data files in the future.
Further changes will be coming soon to Cardinal 3 in future point updates that are aimed to greatly improve the user experience and simplify the code that users need to write to process and analyze MS imaging data.
Major improvements from earlier versions are further described below.
Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments–specifically MS imaging experiments.
MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.
The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.
Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:
New imaging experiment classes such as
ImagingExperiment
, SparseImagingExperiment
,
and MSImagingExperiment
provide better support for
out-of-memory datasets and a more flexible representation of complex
experiments
New imaging metadata classes such as
PositionDataFrame
and MassDataFrame
make it
easier to manipulate experimental runs, pixel coordinates, and
m/z-values by storing them as separate slots rather than
ordinary columns
New plot()
and image()
visualization
methods that can handle non-gridded pixel coordinates and allow
assigning the resulting plot (and data) to a variable for later
re-plotting
Support for writing imzML in addition to reading it; more options and support for importing out-of-memory imzML for both “continuous” and “processed” formats
Data manipulation and summarization verbs such as
subset()
, aggregate()
, and
summarizeFeatures()
, etc. for easier subsetting and
summarization of imaging datasets
Delayed pre-processing via a new process()
method
that allows queueing of delayed pre-processing methods such as
normalize()
and peakPick()
for later
execution
Parallel processing support via the BiocParallel package
for all pre-processing methods and any statistical analysis methods with
a BPPARAM
option
Classes from older versions of Cardinal should be coerced to their
Cardinal 2 equivalents. For example, to return an updated
MSImageSet
object called x
, use
as(x, "MSImagingExperiment")
.
Cardinal can be installed via the BiocManager package.
The same function can be used to update Cardinal and other Bioconductor packages.
Once installed, Cardinal can be loaded with
library()
:
Cardinal natively supports reading and writing imzML (both
“continuous” and “processed” types) and Analyze 7.5 formats via the
readMSIData()
and writeMSIData()
functions.
The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Vendor-specific raw formats can be converted to imzML with the help of free applications available online at .
We can read an example of a “continuous” imzML file from the
CardinalIO
package:
## [1] "/github/workspace/pkglib/CardinalIO/extdata/Example_Continuous_imzML1.1.1/Example_Continuous.imzML"
## MSImagingExperiment with 8399 features and 9 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Continuous
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## mass range: 100.0833 to 799.9167
## centroided: FALSE
A “continuous” imzML file contains mass spectra where all of the
spectra have the same m/z values. It is returned as an
MSImagingExperiment
object, which contains both the spectra
and the experimental metadata.
We can also read an example of a “processed” imzML file from the
CardinalIO
package:
## [1] "/github/workspace/pkglib/CardinalIO/extdata/Example_Processed_imzML1.1.1/Example_Processed.imzML"
## MSImagingArrays with 9 spectra
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE
## continuous: FALSE
A “processed” imzML file contains mass spectra where each spectrum
has its own m/z values. Despite the name, it can still contain
profile spectra. For “processed” imzML, the data is returned as an
MSImagingArrays
object.
Cardinal 3.6 introduces a simple set of new data structures for organizing data from MS imaging experiments.
SpectraArrays
: Storage for high-throughput
spectra
SpectralImagingData
: Virtual container for spectral
imaging data, i.e., spectra with spatial metadata
MSImagingArrays
: Specializes
SpectralImagingData
(via
SpectralImagingExperiment
) for representing raw mass
spectra where each spectrum has its own m/z values
MSImagingExperiment
: Specializes
SpectralImagingData
(via
SpectralImagingArrays
) for representing mass spectra where
all spectra have the same m/z values
These are further explored in the next sections.
MSImagingArrays
: Mass spectra with differing
m/z valuesIn Cardinal, mass spectral data with differing m/z
values are stored in MSImagingArrays
objects.
## MSImagingArrays with 9 spectra
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE
## continuous: FALSE
An MSImagingArrays
object is conceptually a list of mass
spectra with a companion data frame of spectrum-level pixel
metadata.
This dataset contains 9 mass spectra. It can be subset like a list:
## MSImagingArrays with 3 spectra
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...1
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE
## continuous: FALSE
spectraData()
The spectral data can be accessed with
spectraData()
.
## SpectraArrays of length 2
## names(2): intensity mz
## class(2): matter_list matter_list
## length(2): <9> <9>
## real mem(2): 6.75 KB 6.75 KB
## shared mem(2): 0 KB 0 KB
## virtual mem(2): 302.37 KB 302.37 KB
The list of spectral data arrays are stored in a
SpectraArrays
object. An MSImagingArrays
object must have at least two arrays named “mz” and “intensity”, which
are lists of the m/z arrays and intensity arrays.
The spectra()
accessor can be used to access specific
spectra arrays.
## <9 length> matter_list :: out-of-core list
## [1] [2] [3] [4] [5] [6] ...
## $Scan=1 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=2 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=3 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=4 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=5 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=6 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
## <9 length> matter_list :: out-of-core list
## [1] [2] [3] [4] [5] [6] ...
## $Scan=1 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=2 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=3 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=4 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=5 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=6 0 0 0 0 0 0 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
Alternatively, we can use the mz()
and
intensity()
accessors to get these specific arrays.
## <9 length> matter_list :: out-of-core list
## [1] [2] [3] [4] [5] [6] ...
## $Scan=1 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=2 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=3 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=4 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=5 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=6 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
## <9 length> matter_list :: out-of-core list
## [1] [2] [3] [4] [5] [6] ...
## $Scan=1 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=2 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=3 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=4 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=5 0 0 0 0 0 0 ...
## [1] [2] [3] [4] [5] [6] ...
## $Scan=6 0 0 0 0 0 0 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
Note that the full spectra are not fully loaded into memory. Instead,
they are represented as out-of-memory matter
lists. For the
most part, these lists can be treated as ordinary R lists, but the
spectra are only loaded from storage on-the-fly as they are
accessed.
pixelData()
The spectrum-level pixel metadata can be accessed with
pixelData()
. Alternatively, pData()
is a
shorter alias that does the same thing.
## PositionDataFrame with 9 rows and 3 columns
## x y run
## <numeric> <numeric> <factor>
## Scan=1 1 1 Example_Processed
## Scan=2 2 1 Example_Processed
## Scan=3 3 1 Example_Processed
## Scan=4 1 2 Example_Processed
## Scan=5 2 2 Example_Processed
## Scan=6 3 2 Example_Processed
## Scan=7 1 3 Example_Processed
## Scan=8 2 3 Example_Processed
## Scan=9 3 3 Example_Processed
## coord(2): x, y
## run(1): run
## PositionDataFrame with 9 rows and 3 columns
## x y run
## <numeric> <numeric> <factor>
## Scan=1 1 1 Example_Processed
## Scan=2 2 1 Example_Processed
## Scan=3 3 1 Example_Processed
## Scan=4 1 2 Example_Processed
## Scan=5 2 2 Example_Processed
## Scan=6 3 2 Example_Processed
## Scan=7 1 3 Example_Processed
## Scan=8 2 3 Example_Processed
## Scan=9 3 3 Example_Processed
## coord(2): x, y
## run(1): run
The pixel metadata is stored in a PositionDataFrame
,
with a row for each mass spectrum in the dataset. This data frame stores
position information, run information, and all other spectrum-level
metadata.
The coord()
accessor retrieves the columns giving the
positions of the spectra.
## DataFrame with 9 rows and 2 columns
## x y
## <numeric> <numeric>
## Scan=1 1 1
## Scan=2 2 1
## Scan=3 3 1
## Scan=4 1 2
## Scan=5 2 2
## Scan=6 3 2
## Scan=7 1 3
## Scan=8 2 3
## Scan=9 3 3
Use runNames()
to access the names of the experimental
runs (by default set to the file name) and run()
to access
the run for each spectrum.
## [1] "Example_Processed"
## [1] Example_Processed Example_Processed Example_Processed Example_Processed
## [5] Example_Processed Example_Processed
## Levels: Example_Processed
This data frame is also used to store any other spectrum-level metadata or statistical summaries.
Visualization of mass spectra and molecular ion images is vital for
exploratory analysis of MS imaging experiments. Cardinal
provides plot()
methods for plotting mass spectra and
aimage()
methods for plotting images.
We will use simulated data for visualization. We will create versions
of the dataset represented as both MSImagingArrays
and
MSImagingExperiment
.
# Simulate an MSImagingExperiment
set.seed(2020, kind="L'Ecuyer-CMRG")
mse <- simulateImage(preset=6, dim=c(32,32), baseline=0.5)
mse
## MSImagingExperiment with 3879 features and 2048 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
## MSImagingArrays with 2048 spectra
## spectraData(2): intensity, mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## centroided: FALSE
## continuous: TRUE
plot()
Use plot()
to plot mass spectra from a
MSImagingArrays
or MSImagingExperiment
object.
Below we plot the 463rd and 628th mass spectra in the dataset.
Alternatively, we can specify the coordinates.
We can use superpose
to overlay the mass spectra and
xlim
to control the mass range.
image()
Use image()
to plot ion images from
MSImagingExperiment
. Below we plot the image for the
2,489th m/z value.
Alternatively, we can specify the m/z value. The closest matching m/z value will be used.
Use tolerance
to sum together the images for a all
m/z values within a certain tolerance.
By default, images from all experimental runs are plotted. Use
run
to specify specific runs to plot by name or index.
Alternatively, use subset
to plot an arbitrary subset of
pixels.
Multiplicative variance in spectral intensities can cause images to be noisy and dark due to hot spots.
Often, images may require some type of processing and enhancement to improve interpretation.
Multiple images can be superposed with superpose=TRUE
.
Use scale=TRUE
to rescale all images to the same intensity
range.
Use selectROI()
to select regions-of-interest on an ion
image. It is important to specify a subset so that selection is only
made on a single experimental run, otherwise results may be unexpected.
The form of selectROI()
is the same as
image()
.
sampleA <- selectROI(mse, mz=1003.3, subset=run(mse) == "run0")
sampleB <- selectROI(mse, mz=1003.3, subset=run(mse) == "run1")
selectROI()
returns a logical vector specifying which
pixels from the imaging experiment are contained in the selected
region.
makeFactor()
can then be used to combine multiple
logical vectors (e.g., from selectROI()
) into a single
factor.
Plots and images can be saved to a file by using R’s built-in graphics devices.
Use pdf()
to initialize a PDF graphics device, create
the plot, and then use dev.off()
to turn off the graphics
device.
Any plots printed while the graphics device is active will be saved to the specified file(s).
pdffile <- tempfile(fileext=".pdf")
pdf(file=pdffile, width=9, height=4)
image(mse, mz=1003.3)
dev.off()
Graphics devices for png()
, jpeg()
,
bmp()
, and tiff()
are also available. See
their documentation for usage.
While many software for MS imaging data use a light-on-dark theme,
Cardinal uses a dark-on-light theme by default. However, a dark
theme is also provided with style="dark
.
While plotting spectra should typically be fast, plotting images can be be (much) slower for large out-of-memory datasets.
This is due to the way the spectra are stored in imzML and Analyze files. Exracting the images simply takes longer than reading the spectra.
For the fastest visualization of images, experiments should be coerced to an in-memory matrix.
Also note that all Cardinal visualizations produce a
plot()
-able object that can be assigned to a variable and
plot()
-ed later without the need to read the data again.
Some parameters can even be updated this way, such as smoothing,
contrast enhancement, and scaling.
This is useful for re-creating or updating plots without accessing the data again.
MSImagingArrays
and MSImagingExperiment
can
be subsetted using the [
operator.
When subsetting MSImagingArrays
, the object is treated
as a list of mass spectra.
## MSImagingArrays with 5 spectra
## spectraData(2): intensity, mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...5, y = 1...1
## runNames(1): runA1
## metadata(1): design
## centroided: FALSE
## continuous: TRUE
When subsetting MSImagingExperiment
, the “rows” are the
flattened images, and the “columns” are the mass spectra.
## MSImagingExperiment with 10 features and 5 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...5, y = 1...1
## runNames(1): runA1
## metadata(1): design
## mass range: 462.3758 to 464.0434
## centroided: FALSE
Subsetting the dataset this way requires knowing the desired row and column indices.
Use features()
to find row indices of a
MSImagingExperiment
corresponding to specific m/z
values or other feature metadata.
## [1] 1938
## [1] 1935 1936 1937 1938 1939
Use pixels()
to find indices of
MSImagingArrays
or column indices of
MSImagingExperiment
that correspond to specific mass
spectra based on coordinates or other metadata.
# get column indices corresponding to x = 10, y = 10 in all runs
pixels(mse, coord=list(x=10, y=10))
## [1] 298 1322
# get column indices corresponding to x <= 3, y <= 3 in "runA1"
pixels(mse, x <= 3, y <= 3, run == "runA1")
## [1] 1 2 3 33 34 35 65 66 67
These methods can be used to determine row/column indices of particular m/z-values or pixel coordinates to use for subsetting.
fid <- features(mse, 1002 < mz, mz < 1004)
pid <- pixels(mse, x <= 3, y <= 3, run == "runA1")
mse[fid, pid]
## MSImagingExperiment with 5 features and 9 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...3, y = 1...3
## runNames(1): runA1
## metadata(1): design
## mass range: 1002.225 to 1003.830
## centroided: FALSE
subset()
and friendsAlternatively, subset()
can be used to subset MS imaging
datasets based on metadata.
For MSImagingArrays
, subset()
takes a
single argument specifying the pixels (i.e., the mass spectra).
## MSImagingArrays with 192 spectra
## spectraData(2): intensity, mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...3, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## centroided: FALSE
## continuous: TRUE
For MSImagingExperiment
, subset()
takes a
two arguments specifying both the features and the pixels.
## MSImagingExperiment with 5 features and 192 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...3, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 1002.225 to 1003.830
## centroided: FALSE
We can also use subsetFeatures()
and
subsetPixels()
if subsetting an
MSImagingExperiment
.
## MSImagingExperiment with 5 features and 2048 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 1002.225 to 1003.830
## centroided: FALSE
## MSImagingExperiment with 3879 features and 18 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...3, y = 1...3
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
MSImagingExperiment
represents the data as a matrix,
where each column is a mass spectrum, rather than as a true “data cube”.
This is typically simpler when operating on the mass spectra, and more
space efficient when the data is non-rectangular.
Sometimes, however, it is useful to index into the data as an actual “data cube”, with explicit array dimensions for each spatial dimension.
Use sliceImage()
to slice an
MSImagingExperiment
as a data cube and extract images.
## x y run
## 32 32 2
Because we only sliced a single image, the first 2 dimensions are the
spatial dimensions and the 3rd dimension are the experimental runs. We
can use drop=FALSE
to indicate we want to preserve the
feature dimension even for a single image.
## x y run feature
## 32 32 2 1
Note that when plotting images from raw arrays, the images are
upside-down due to differing coordinate conventions used by
graphics::image()
.
Because MSImagingExperiment
is matrix-like,
rbind()
and cbind()
can be used to combine
multiple MSImagingExperiment
objects by “row” or “column”,
assumping certain conditions are met.
Use cbind()
to combine datasets from different
experimental runs. The m/z-values must match between all
datasets to succesfully combine them.
# divide dataset into separate objects for each run
mse_run0 <- mse[,run(mse) == "runA1"]
mse_run1 <- mse[,run(mse) == "runB1"]
mse_run0
## MSImagingExperiment with 3879 features and 1024 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(1): runA1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
## MSImagingExperiment with 3879 features and 1024 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(1): runB1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
## MSImagingExperiment with 3879 features and 2048 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(2): design, design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
Some processing may be necessary to ensure datasets are compatible before combining them.
Most components of an MSImagingExperiment
that can be
accessed through getter functions like spectraData()
,
featureData()
, and pixelData()
can also be
re-assigned with analogous setter functions. These can likewise be used
to get and set columns of the pixel and feature metadata.
Note that pData()
and fData()
are aliases
for pixelData()
and featureData()
,
respectively. The $
operator will access the corresponding
columns of pixelData()
.
Here, we use makeFactor()
to create a new pixel metadata
columns.
## PositionDataFrame with 2048 rows and 9 columns
## x y run square1 square2 circleA circleB
## <integer> <integer> <factor> <logical> <logical> <logical> <logical>
## 1 1 1 runA1 FALSE FALSE FALSE FALSE
## 2 2 1 runA1 FALSE FALSE FALSE FALSE
## 3 3 1 runA1 FALSE FALSE FALSE FALSE
## 4 4 1 runA1 FALSE FALSE FALSE FALSE
## 5 5 1 runA1 FALSE FALSE FALSE FALSE
## ... ... ... ... ... ... ... ...
## 2044 28 32 runB1 FALSE TRUE FALSE FALSE
## 2045 29 32 runB1 FALSE TRUE FALSE FALSE
## 2046 30 32 runB1 FALSE TRUE FALSE FALSE
## 2047 31 32 runB1 FALSE TRUE FALSE FALSE
## 2048 32 32 runB1 FALSE TRUE FALSE FALSE
## condition region
## <factor> <factor>
## 1 A NA
## 2 A NA
## 3 A NA
## 4 A NA
## 5 A NA
## ... ... ...
## 2044 B other
## 2045 B other
## 2046 B other
## 2047 B other
## 2048 B other
## coord(2): x, y
## run(1): run
Here, we create new feature metadata columns based on the design of the simulated data.
fData(mse)$region <- makeFactor(
circle=mz(mse) > 1000 & mz(mse) < 1250,
square=mz(mse) < 1000 | mz(mse) > 1250)
fData(mse)
## MassDataFrame with 3879 rows and 2 columns
## mz region
## <numeric> <factor>
## 1 462.376 square
## 2 462.561 square
## 3 462.746 square
## 4 462.931 square
## 5 463.116 square
## ... ... ...
## 3875 2177.60 square
## 3876 2178.47 square
## 3877 2179.34 square
## 3878 2180.21 square
## 3879 2181.09 square
## mz(1): mz
Use spectra()
to access elements of the
spectraData()
list of spectra arrays by name or index. It
is possible to have multiple spectra arrays. Calling
spectra()
with no other arguments will get or set the first
element of spectraData()
. Providing a name or index will
get or set that element.
# create a new spectra matrix of log-intensities
spectra(mse, "log2intensity") <- log2(spectra(mse) + 1)
spectraData(mse)
## SpectraArrays of length 2
## names(2): intensity log2intensity
## class(2): matrix matrix
## dim(2): <3879 x 2048> <3879 x 2048>
## real mem(2): 63.56 MB 63.56 MB
## shared mem(2): 0 MB 0 MB
## virtual mem(2): 0 MB 0 MB
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.5891243 0.7066315 0.5849625 0.6498496 0.6965572
## [2,] 0.5845547 0.5845547 0.5845547 0.5845547 0.5845547
## [3,] 0.5841471 0.5841471 0.5841471 0.7135359 0.5841471
## [4,] 0.6285481 0.5837394 0.6467127 0.7211373 0.5837394
## [5,] 0.6613471 0.5833319 0.6701298 0.7298229 0.5833319
Whether or not the spectra have been centroided or not can be
accessed using centroided()
## [1] FALSE
This can also be used to set whether the spectra should be treated as centroided or not.
Cardinal provides functions for summarizing over the
features or pixels of an MSImagingExperiment
.
summarizeFeatures()
summarizes by feature (e.g.,
mean spectrum)
summarizePixels()
summarizes by pixel (e.g.,
TIC)
When applied to an MSImagingExperiment
, the summary
statistics are stored as new columns in featureData()
or
pixelData()
, respectively.
Below, we summarize and plot the mean specrum.
# calculate mean spectrum
mse <- summarizeFeatures(mse, stat="mean")
# mean spectrum stored in featureData
fData(mse)
## MassDataFrame with 3879 rows and 3 columns
## mz region mean
## <numeric> <factor> <numeric>
## 1 462.376 square 0.538982
## 2 462.561 square 0.539069
## 3 462.746 square 0.540151
## 4 462.931 square 0.539510
## 5 463.116 square 0.541830
## ... ... ... ...
## 3875 2177.60 square 0.0410674
## 3876 2178.47 square 0.0406372
## 3877 2179.34 square 0.0390261
## 3878 2180.21 square 0.0412461
## 3879 2181.09 square 0.0400341
## mz(1): mz
Below, we summarize and plot the total ion current.
## PositionDataFrame with 2048 rows and 10 columns
## x y run square1 square2 circleA circleB
## <integer> <integer> <factor> <logical> <logical> <logical> <logical>
## 1 1 1 runA1 FALSE FALSE FALSE FALSE
## 2 2 1 runA1 FALSE FALSE FALSE FALSE
## 3 3 1 runA1 FALSE FALSE FALSE FALSE
## 4 4 1 runA1 FALSE FALSE FALSE FALSE
## 5 5 1 runA1 FALSE FALSE FALSE FALSE
## ... ... ... ... ... ... ... ...
## 2044 28 32 runB1 FALSE TRUE FALSE FALSE
## 2045 29 32 runB1 FALSE TRUE FALSE FALSE
## 2046 30 32 runB1 FALSE TRUE FALSE FALSE
## 2047 31 32 runB1 FALSE TRUE FALSE FALSE
## 2048 32 32 runB1 FALSE TRUE FALSE FALSE
## condition region TIC
## <factor> <factor> <numeric>
## 1 A NA 655.277
## 2 A NA 619.358
## 3 A NA 630.155
## 4 A NA 633.181
## 5 A NA 627.942
## ... ... ... ...
## 2044 B other 856.771
## 2045 B other 873.250
## 2046 B other 860.006
## 2047 B other 851.561
## 2048 B other 830.818
## coord(2): x, y
## run(1): run
It is also possible to summarize over different groups.
Here, we summarize over different pixel groups and plot the resulting mean spectra.
# calculate mean spectrum
mse <- summarizeFeatures(mse, stat="mean", groups=mse$region)
# mean spectrum stored in featureData
fData(mse)
## MassDataFrame with 3879 rows and 6 columns
## mz region mean A.mean B.mean other.mean
## <numeric> <factor> <numeric> <numeric> <numeric> <numeric>
## 1 462.376 square 0.538982 0.532676 0.536587 0.540117
## 2 462.561 square 0.539069 0.535976 0.546470 0.541043
## 3 462.746 square 0.540151 0.543922 0.542662 0.538899
## 4 462.931 square 0.539510 0.539947 0.541177 0.538993
## 5 463.116 square 0.541830 0.540383 0.544032 0.541032
## ... ... ... ... ... ... ...
## 3875 2177.60 square 0.0410674 0.0364042 0.0425991 0.0418534
## 3876 2178.47 square 0.0406372 0.0404122 0.0368558 0.0384662
## 3877 2179.34 square 0.0390261 0.0388301 0.0477074 0.0381444
## 3878 2180.21 square 0.0412461 0.0381640 0.0407892 0.0417756
## 3879 2181.09 square 0.0400341 0.0412113 0.0429291 0.0424167
## mz(1): mz
Here, we summarize over different feature groups and plot the resulting images.
# calculate mean spectrum
mse <- summarizePixels(mse, stat="sum", groups=fData(mse)$region)
# mean spectrum stored in featureData
pData(mse)
## PositionDataFrame with 2048 rows and 12 columns
## x y run square1 square2 circleA circleB
## <integer> <integer> <factor> <logical> <logical> <logical> <logical>
## 1 1 1 runA1 FALSE FALSE FALSE FALSE
## 2 2 1 runA1 FALSE FALSE FALSE FALSE
## 3 3 1 runA1 FALSE FALSE FALSE FALSE
## 4 4 1 runA1 FALSE FALSE FALSE FALSE
## 5 5 1 runA1 FALSE FALSE FALSE FALSE
## ... ... ... ... ... ... ... ...
## 2044 28 32 runB1 FALSE TRUE FALSE FALSE
## 2045 29 32 runB1 FALSE TRUE FALSE FALSE
## 2046 30 32 runB1 FALSE TRUE FALSE FALSE
## 2047 31 32 runB1 FALSE TRUE FALSE FALSE
## 2048 32 32 runB1 FALSE TRUE FALSE FALSE
## condition region TIC circle.sum square.sum
## <factor> <factor> <numeric> <numeric> <numeric>
## 1 A NA 655.277 49.5216 605.756
## 2 A NA 619.358 39.7829 579.575
## 3 A NA 630.155 41.4789 588.676
## 4 A NA 633.181 40.1374 593.043
## 5 A NA 627.942 41.0550 586.887
## ... ... ... ... ... ...
## 2044 B other 856.771 165.707 691.064
## 2045 B other 873.250 173.984 699.266
## 2046 B other 860.006 170.695 689.311
## 2047 B other 851.561 161.945 689.615
## 2048 B other 830.818 157.879 672.939
## coord(2): x, y
## run(1): run
By default, Cardinal does not load the spectra from imzML and Analyze files into memory, but retrieves them from files when necessary. For very large datasets, this is necessary and memory-efficient.
However, for datasets that are known to fit in computer memory, this may be unnecessarily slow, especially when plotting images (which are perpendicular to how data are stored in the files).
## <8399 row x 9 col> matter_mat :: out-of-core double matrix
## Scan=1 Scan=2 Scan=3 Scan=4 Scan=5 Scan=6 ...
## [1,] 0 0 0 0 0 0 ...
## [2,] 0 0 0 0 0 0 ...
## [3,] 0 0 0 0 0 0 ...
## [4,] 0 0 0 0 0 0 ...
## [5,] 0 0 0 0 0 0 ...
## [6,] 0 0 0 0 0 0 ...
## ... ... ... ... ... ... ... ...
## (7.16 KB real | 0 bytes shared | 302.37 KB virtual)
## SpectraArrays of length 1
## names(1): intensity
## class(1): matter_mat
## dim(1): <8399 x 9>
## real mem(1): 7.16 KB
## shared mem(1): 0 KB
## virtual mem(1): 302.37 KB
For MSImagingExperiment
, use as.matrix()
on
the spectra()
to load the spectra into memory as a dense
matrix.
# coerce spectra to an in-memory matrix
spectra(mse_tiny) <- as.matrix(spectra(mse_tiny))
spectraData(mse_tiny) # 'intensity' array is 'matrix' object
## SpectraArrays of length 1
## names(1): intensity
## class(1): matrix
## dim(1): <8399 x 9>
## real mem(1): 605.8 KB
## shared mem(1): 0 KB
## virtual mem(1): 0 KB
Use as()
to coerce between different data
representations.
Here, we coerce from MSImagingArrays
to
MSImagingExperiment
.
## MSImagingArrays with 2048 spectra
## spectraData(2): intensity, mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## centroided: FALSE
## continuous: TRUE
## MSImagingExperiment with 3879 features and 2048 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
We can also coerce from MSImagingExperiment
to
MSImagingArrays
.
## MSImagingExperiment with 3879 features and 2048 spectra
## spectraData(2): intensity, log2intensity
## featureData(6): mz, region, mean, A.mean, B.mean, other.mean
## pixelData(12): x, y, run, ..., TIC, circle.sum, square.sum
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 462.3758 to 2181.0856
## centroided: FALSE
## MSImagingArrays with 2048 spectra
## spectraData(2): intensity, mz
## pixelData(12): x, y, run, ..., TIC, circle.sum, square.sum
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## centroided: FALSE
## continuous: TRUE
This will often change the underlying data representation, so some information may be lost depending on the coercion.
In practice, it’s rarely necessary to coerce between data
representations. Instead, it is more common to process the data into an
MSImagingExperiment
, as in the next section.
Cardinal provides a full suite of pre-processing methods.
Processing steps that are applied to mass spectra individually are be
queued to be applied in sequence later. Use process()
to
apply queued processing steps.
Spectral processing
normalize()
for normalizing mass spectra
smooth()
for smoothing mass spectra
reduceBaseline()
for baseline subtraction
recalibrate()
for recalibration of m/z
values
peakPick()
for peak detection and
summarization
Peak processing
peakAlign()
for aligning detected peaks
peakProcess()
for streamlined peak detection and
alignment
Other processing
bin()
for binning or resampling mass spectraWe will demonstrate how to apply these pre-processing steps in the following sections.
Use normalize()
to queue normalization on
MSImagingArrays
or MSImagingExperiment
.
The supported normalization methods include:
method="tic"
performs total-ion-current (TIC)
normalization
method="rms"
performs root-mean-square (RMS)
normalization
method="reference"
normalizes spectra to a reference
feature
TIC normalization is one of the most common normalization methods for mass spectrometry imaging. For comparison between datasets, TIC normalization requires that all spectra are the same length. RMS normalization is more appropriate when spectra are of different lengths.
Normalization to a reference is the most reliable form of normalization, but is only possible when the experiment contains a known reference peak with constant abundance throughout the dataset. This is often not possible in unsupervised and exploratory experiments.
(We won’t plot normalization, because it is simply re-scaling the intensities.)
Use smooth()
to queue smoothing on
MSImagingArrays
or MSImagingExperiment
.
p1 <- smooth(msa, method="gaussian") |>
plot(coord=list(x=16, y=16),
xlim=c(1150, 1450), linewidth=2)
p2 <- smooth(msa, method="sgolay") |>
plot(coord=list(x=16, y=16),
xlim=c(1150, 1450), linewidth=2)
matter::as_facets(list(p1, p2), nrow=2,
labels=c("Gaussian smoothing", "Savitsky-Golay smoothing"))
Note above we use |>
to chain together sequences of
functions, and we can use plot()
to preview the results of
queued spectral processing before applying it.
The supported smoothing methods include:
method="gaussian"
performs Gaussian
smoothing
method="bilateral"
performs bilateral
filtering
method="adaptive"
performs adaptive bilateral
filtering
method="diff"
performs nonlinear diffusion
method="guide"
performs guided filtering
method="pag"
performs peak-aware guided
filtering
method="sgolay"
performs Savitzky-Golay
smoothing
method="ma"
performs moving average
smoothing
Use reduceBaseline()
to queue baseline subtraction on
MSImagingArrays
or MSImagingExperiment
.
p1 <- reduceBaseline(mse, method="locmin") |>
plot(coord=list(x=16, y=16), linewidth=2)
p2 <- reduceBaseline(mse, method="median") |>
plot(coord=list(x=16, y=16), linewidth=2)
matter::as_facets(list(p1, p2), nrow=2,
labels=c("Local minima interpolation", "Running medians"))
The supported smoothing methods include:
method="locmin"
interpolates a baseline from local
minima
method="hull"
uses convex hull estimation
method="snip"
uses sensitive nonlinear iterative
peak (SNIP) clipping
method="median"
estimates a baseline from running
medians
Although peak alignment (to be discussed shortly) will generally account for small differences in m/z values between spectra, alignment of the profile spectra is sometimes desireable as well.
Use recalibrate()
to queue recalibration on
MSImagingArrays
or MSImagingExperiment
.
First, we need to simulate spectra that are visibly in need of calibration.
set.seed(2020, kind="L'Ecuyer-CMRG")
mse_drift <- simulateImage(preset=1, npeaks=10,
from=500, to=600, sdmz=750, units="ppm")
plot(mse_drift, i=186:195, xlim=c(535, 570),
superpose=TRUE, key=FALSE, linewidth=2)
To align the spectra, we need to provide a vector of reference
m/z values of expected peaks. Here, we will simply use the
peaks of the mean spectrum from
estimateReferencePeaks()
.
peaks_drift <- estimateReferencePeaks(mse_drift)
mse_nodrift <- recalibrate(mse_drift, ref=peaks_drift,
method="locmax", tolerance=1500, units="ppm")
mse_nodrift <- process(mse_nodrift)
plot(mse_nodrift, i=186:195, xlim=c(535, 570),
superpose=TRUE, key=FALSE, linewidth=2)
The supported recalibration methods include:
method="locmax"
uses local regression to shift the
spectra
method="dtw"
uses dynamic time warping
(DTW)
method="cow"
uses correlation optimized warping
(COW)
The algorithms will shift the spectrum to try to match local maxima
to the reference peaks. The maximum shift is given by
tolerance
. If tolerance
is too small, the
spectra may not be shifted enough. If tolerance
is too
large, the local maxima may be matched to the wrong reference peaks.
Peak processing encompasses multiple steps, including (1) peak detection, (2) aligning peaks across all spectra, (3) filtering peaks, and/or (4) extracting peaks from spectra based on a reference. Some of these steps are optional.
Use peakPick()
to queue peak picking on
MSImagingArrays
or MSImagingExperiment
.
p1 <- peakPick(msa_pre, method="diff", SNR=3) |>
plot(coord=list(x=16, y=16), linewidth=2)
p2 <- peakPick(msa_pre, method="filter", SNR=3) |>
plot(coord=list(x=16, y=16), linewidth=2)
matter::as_facets(list(p1, p2), nrow=2,
labels=c("Derivative-based SNR", "Dynamic filtering-based SNR"))
We use SNR
to designate the minimum signal-to-noise
threshold for the detected peaks.
The supported peak picking methods include:
method="diff"
estimates SNR from deviations between
the spectrum and a rolling average of its derivative
method="sd"
estimates SNR from the standard
deviation of the spectrum convolved with a wavelet
method="mad"
estimates SNR from the mean absolute
deviation of the spectrum convolved with a wavelet
method="quantile"
estimates SNR from a rolling
quantile of the difference between the original spectrum and a smoothed
spectrum
method="filter"
uses dynamic filtering to separate
peaks into signal peaks and noise peaks
method="cwt"
uses the continuous wavelet transform
(CWT)
Use peakAlign()
to align the detected peaks.
Note that peakAlign()
will automatically call
process()
if there are queued spectral processing
steps.
## MSImagingExperiment with 1292 features and 2048 spectra
## spectraData(1): intensity
## featureData(3): mz, count, freq
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(4): design, processing_20241113061608, processing_20241113061615, processing_20241113061615.1
## mass range: 463.1162 to 2180.2133
## centroided: TRUE
Note that the result is returned as an
MSImagingExperiment
now that all of the spectra have been
aligned to the same m/z values.
Peaks are aligned to candidate locations based on the the given
tolerance
. A set of reference peaks to use can be specified
via ref
.
In this case, no reference peaks are specified, so the candidate locations are generated automatically from the detected peaks. This may results in many extraneous peaks that need to be removed.
When peaks are aligned without a reference, peakAlign()
will return the count and frequency of each peak as count
and freq
columns in featureData()
.
## MassDataFrame with 1292 rows and 3 columns
## mz count freq
## <numeric> <numeric> <numeric>
## 1 463.116 130 0.0634766
## 2 463.672 92 0.0449219
## 3 464.229 93 0.0454102
## 4 464.786 93 0.0454102
## 5 465.344 94 0.0458984
## ... ... ... ...
## 1288 2169.77 103 0.0502930
## 1289 2172.38 109 0.0532227
## 1290 2174.99 119 0.0581055
## 1291 2177.60 93 0.0454102
## 1292 2180.21 133 0.0649414
## mz(1): mz
# filter to peaks with frequencies > 0.1
mse_filt <- subsetFeatures(mse_peaks, freq > 0.1)
fData(mse_filt)
## MassDataFrame with 13 rows and 3 columns
## mz count freq
## <numeric> <numeric> <numeric>
## 1 610.317 1172 0.572266
## 2 796.620 401 0.195801
## 3 1011.488 1427 0.696777
## 4 1041.043 1607 0.784668
## 5 1128.199 1643 0.802246
## ... ... ... ...
## 9 1247.85 1049 0.512207
## 10 1341.01 444 0.216797
## 11 1497.54 614 0.299805
## 12 1797.19 931 0.454590
## 13 1983.02 974 0.475586
## mz(1): mz
Here, we use subsetFeatures()
to subset the data to
include only peaks observed in more than 10% of the dataset. For this
dataset, that results in 37 peaks. (Note that the dataset was simulated
with 30 peaks.)
We can also use peakPick()
to queue peak summarization
based on a set of reference peaks.
msa_peaks2 <- peakPick(msa_pre, ref=mz(mse_filt), type="area",
tolerance=600, units="ppm")
mse_peaks2 <- process(msa_peaks2)
In this case, local peaks are matched to the reference peaks within
tolerance
. The peak is then expanded to the nearest local
minima in both directions. The intensity of the peak is then summarized
either by the maximum intensity (type="height")
or sum of
intensities (type="area")
.
peakProcess()
We can use peakProcess()
to streamline the most common
peak processing workflows.
Note that peakProcess()
will automatically call
process()
if there are queued spectral processing
steps.
mse_peaks3 <- peakProcess(msa_pre, method="diff", SNR=6,
sampleSize=0.3, filterFreq=0.02)
mse_peaks3
## MSImagingExperiment with 28 features and 2048 spectra
## spectraData(1): intensity
## featureData(3): mz, count, freq
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(3): design, processing_20241113061608, processing_20241113061618
## mass range: 513.6943 to 1983.3448
## centroided: TRUE
When sampleSize
is specified, peakProcess()
first performs peak picking and alignment on a subset of the spectra (as
specified by sampleSize
) to create a set of reference
peaks. Then, these reference peaks are summarized for every spectrum in
the full dataset.
The advantage of this approach is that all of the peaks detected in the sample spectra will be summarized for every spectrum. So it is less likely that there will be missing peaks due to low signal-to-noise ratio. However, the drawback is that rare peaks may be less likely to be detected if they are not in the sample.
Alternatively, using peakProcess()
without setting
sampleSize
will perform peak picking followed by peak
alignment as usual.
Use bin()
to bin an MSImagingArrays
or
MSImagingExperiment
dataset to an arbitrary resolution. The
binning is applied on-the-fly, whenever data is accessed.
## MSImagingExperiment with 1721 features and 2048 spectra
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range: 462 to 2182
## centroided: FALSE
Here, we bin a dataset to unit m/z resolution (i.e., spacing of 1 between m/z values).
Queueing processing steps makes it easy to chain together processing
steps with the |>
operator, and then apply them all at
once.
mse_queue <- msa |>
normalize() |>
smooth() |>
reduceBaseline() |>
peakPick(SNR=6)
# preview processing
plot(mse_queue, coord=list(x=16, y=16), linewidth=2)
# apply processing and align peaks
mse_proc <- peakAlign(mse_queue)
mse_proc <- subsetFeatures(mse_proc, freq > 0.1)
mse_proc
## MSImagingExperiment with 28 features and 2048 spectra
## spectraData(1): intensity
## featureData(3): mz, count, freq
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(3): design, processing_20241113061621, processing_20241113061622
## mass range: 513.6763 to 1983.3586
## centroided: TRUE
We can use writeMSIData()
to write
MSImagingArrays
or MSImagingExperiment
objects
to imzML files.
## [1] "filee907dbeb538.fdata" "filee907dbeb538.ibd"
## [3] "filee907dbeb538.imzML" "filee907dbeb538.log"
## [5] "filee907dbeb538.metadata" "filee907dbeb538.pdata"
By default, the “.imzML” and “.ibd” file are bundled into a directory
of the specified file name. Note that the featureData()
and
pixelData()
are also written to tab-delimited files (if
they contain more than the default metadata columns).
## MSImagingExperiment with 28 features and 2048 spectra
## spectraData(1): intensity
## featureData(3): mz, count, freq
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(3): design, processing_20241113061621, processing_20241113061622
## experimentData(3): spectrumType, spectrumRepresentation, instrumentModel
## mass range: 513.6763 to 1983.3586
## centroided: TRUE
When reading the data back in with readMSIData()
, the
bundled imzML directory is detected and the files are imported
appropriately. The additional featureData()
and
pixelData()
columns are imported as well.
All pre-processing methods and some statistical analysis methods in Cardinal can be executed in parallel using BiocParallel.
By default, no parallelization is used. This is for maximum stability and compatibility across all users.
BPPARAM
Any method that supports parallelization includes
BPPARAM
as an argument (see method documentation). The
BPPARAM
argument can be used to specify a parallel backend
for the operation, such as SerialParam()
,
MulticoreParam()
, SnowParam()
, etc.
Several parallelization backends are available, depending on OS:
SerialParam()
creates a serial (non-parallel)
backend. Use this to avoid potential issues caused by
parallelization.
MulticoreParam()
creates a multicore backend by
forking the current R session. This is typically the fastest
parallelization option, but is only available on macOS and
Linux.
SnowParam()
creates a SNOW backend by creating new R
sessions via socket connections. This is typically slower than
multicore, but is available on all platforms including Windows.
Use of MulticoreParam()
will frequently improve speed on
macOS and Linux dramatically. However, due to the extra overhead of
SnowParam()
, Windows users may prefer
SerialParam()
(no parallelization), depending on the size
of the dataset.
Available backends can be viewed with
BiocParallel::registered()
.
## $MulticoreParam
## class: MulticoreParam
## bpisup: FALSE; bpnworkers: 2; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: FALSE
## bpfallback: TRUE
## bplogdir: NA
## bpresultdir: NA
## cluster type: FORK
##
## $SnowParam
## class: SnowParam
## bpisup: FALSE; bpnworkers: 2; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: TRUE; bpexportvariables: TRUE; bpforceGC: FALSE
## bpfallback: TRUE
## bplogdir: NA
## bpresultdir: NA
## cluster type: SOCK
##
## $SerialParam
## class: SerialParam
## bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
## bpfallback: FALSE
## bplogdir: NA
## bpresultdir: NA
The current backend used by Cardinal can be viewed with
getCardinalBPPARAM()
:
## NULL
The default is NULL
, which means no parallelization.
A new default backend can be set for use with Cardinal by calling
setCardinalBPPARAM()
.
# register a SNOW backend
setCardinalBPPARAM(SnowParam(workers=2, progressbar=TRUE))
getCardinalBPPARAM()
## class: SnowParam
## bpisup: FALSE; bpnworkers: 2; bptasks: 2147483647; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: TRUE
## bpexportglobals: TRUE; bpexportvariables: TRUE; bpforceGC: FALSE
## bpfallback: TRUE
## bplogdir: NA
## bpresultdir: NA
## cluster type: SOCK
See the BiocParallel package documentation for more details on available parallel backends.
Statistical methods are documented in a separate vignette. See
vignette("Cardinal3-stats")
to read about statistical
methods in Cardinal.
More in-depth walkthroughs using real experimental data are available in the CardinalWorkflows package. You can install it using:
Once installed, CardinalWorkflows can be loaded with
library()
:
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## Random number generation:
## RNG: L'Ecuyer-CMRG
## Normal: Inversion
## Sample: Rejection
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Cardinal_3.9.0 S4Vectors_0.45.1 ProtGenerics_1.39.0
## [4] BiocGenerics_0.53.2 generics_0.1.3 BiocParallel_1.41.0
## [7] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-1 EBImage_4.49.0 jsonlite_1.8.9
## [4] matter_2.9.0 compiler_4.4.2 BiocManager_1.30.25
## [7] Biobase_2.67.0 bitops_1.0-9 parallel_4.4.2
## [10] jquerylib_0.1.4 CardinalIO_1.5.0 png_0.1-8
## [13] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-6
## [16] R6_2.5.1 knitr_1.49 htmlwidgets_1.6.4
## [19] ontologyIndex_2.12 fftwtools_0.9-11 maketools_1.3.1
## [22] bslib_0.8.0 tiff_0.1-12 rlang_1.1.4
## [25] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [28] sys_3.4.3 cli_3.6.3 digest_0.6.37
## [31] grid_4.4.2 locfit_1.5-9.10 irlba_2.3.5.1
## [34] nlme_3.1-166 lifecycle_1.0.4 evaluate_1.0.1
## [37] codetools_0.2-20 buildtools_1.0.0 RCurl_1.98-1.16
## [40] abind_1.4-8 rmarkdown_2.29 tools_4.4.2
## [43] jpeg_0.1-10 htmltools_0.5.8.1