---
title: "Brings Orbitrap Mass Spectrometry Data to Life; Fast and Colorful"
author:
- name: Christian Trachsel
- name: Christian Panse
affiliation:
- &id Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, CH-1015 Lausanne, Switzerland
email: cp@fgcz.ethz.ch
- name: Tobias Kockmann
affiliation: *id
package: rawDiag
abstract: |
Optimizing liquid chromatography coupled to mass spectrometry (LC–MS) methods
presents a significant challenge. The 'rawDiag' package [@Trachsel2018],
accessible through `r BiocStyle::Biocpkg('rawDiag')`, streamlines method
optimization by generating MS operator-specific diagnostic plots based on
scan-level metadata. Tailored for use on the R shell or as a
`r BiocStyle::CRANpkg('shiny')` application on the Orbitrap instrument PC,
'rawDiag' leverages `r BiocStyle::Biocpkg('rawrr')` [@Kockmann2021] for
reading vendor proprietary instrument data.
Developed, rigorously tested, and actively employed at the
Functional Genomics Center Zurich ETHZ | UZH, 'rawDiag' stands as a robust
solution in advancing LC–MS Orbitrap method optimization."
output:
BiocStyle::html_document:
toc_float: true
bibliography: rawDiag.bib
vignette: >
%\usepackage[utf8]{inputenc}
%\VignetteIndexEntry{Brings Orbitrap Mass Spectrometry Data to Life; Fast and Colorful}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
urlcolor: blue
---
```{r style, results = 'asis'}
BiocStyle::markdown()
knitr::opts_chunk$set(fig.wide = TRUE, fig.retina = 3, error=FALSE, eval=TRUE)
```
```{r rawDiagLogo, out.width="50%", fig.cap="The octopussy `rawDiag` package logo by Lilly van de Venn."}
knitr::include_graphics("octopussy.png")
```
# Introduction
Over the past two decades, liquid chromatography coupled to mass
spectrometry (LC–MS) has evolved into the method of choice in the
field of proteomics. [@Cox2011; @Mallick2010] During a typical
LC–MS measurement, a complex mixture of analytes is separated by a
liquid chromatography system coupled to a mass spectrometer (MS)
through an ion source interface. This interface converts the analytes
that elute from the chromatography system over time into a beam of
ions. The MS records from this ion beam a series of mass spectra
containing detailed information on the analyzed sample. [@Savaryn2016]
The resulting raw data consist of the mass spectra and their metadata,
typically recorded in a vendor-specific binary format. During a
measurement the mass spectrometer applies internal heuristics, which
enables the instrument to adapt to sample properties, for example,
sample complexity and amount of ions in near real time. Still, method
parameters controlling these heuristics need to be set prior to the
measurement. Optimal measurement results require a careful balancing of
instrument parameters, but their complex interactions with each other
make LC–MS method optimization a challenging task.
Here we present `r BiocStyle::Biocpkg('rawDiag')`, a
platform-independent software tool implemented in the R language [@newS] that
supports LC–MS operators during the process of empirical method
optimization. Our work builds on the ideas of the discontinued software
*rawMeat* (VAST Scientific). Our application is currently
tailored toward spectral data acquired on Thermo Fisher Scientific
instruments (raw format), with a particular focus on Orbitrap
[@Zubarev2013] mass analyzers (Exactive or Fusion instruments). These
instruments are heavily used in the field of bottom-up proteomics
[@Aebersold2003] to analyze complex peptide mixtures derived from
enzymatic digests of proteomes.
`r BiocStyle::Biocpkg('rawDiag')` is meant to run after MS acquisition,
optimally as an interactive R shiny application, and produces a series
of diagnostic plots visualizing the impact of method parameter choices
on the acquired data across injections. If static reports are required
then pdf files can be generated using
`r BiocStyle::CRANpkg('rmarkdown')`. In this vignette, we present the
usage of our tool.
`r BiocStyle::Biocpkg('rawDiag')` gains advantages from being part of
the Bioconductor ecosystem, such as its ability to utilize the
`r BiocStyle::Biocpkg('rawrr')` package and potentially extend its
functionality through interaction with the `r BiocStyle::Biocpkg('Spectra')`
infrastructure, particularly with the
`r BiocStyle::Biocpkg('MsBackendRawFileReader')`.
# Requirements
`r BiocStyle::Biocpkg('rawDiag')` proides a wrapper function `readRaw` using the
`r BiocStyle::Biocpkg('rawrr')` methods `raw::readIndex`, `rawrr::readTrailer`,
and `rawrr::readChromatogram` to read proprietary mass spectrometer generated
data by invoking third-party managed methods through a `system2`
`text connection`.
The `r BiocStyle::Biocpkg('rawrr')` package provides the entire stack below,
which `r BiocStyle::Biocpkg('rawDiag')` utilizes.
`R>`
|
`text connection`
|
`system2`
|
Mono Runtime
|
Managed Assembly
(CIL/.NET code)
rawrr.exe
|
ThermoFisher.CommonCore.*.dll
|
# Install
## OS dependencies
### Linux (debian/ubuntu)
In case you prefer to compile `rawrr.exe` from C# source code, please install
the mono compiler and xbuild by installing the following Linux packages:
To install this package, start R (version ">=4.4") and enter:
```{r installRawrr, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rawrr")
```
assemblies aka Common Intermediate Language bytecode - the download and install
can be done on all platforms using the command:
```{r checkInstallRequirements}
rawDiag::checkRawrr
rawDiag::checkRawrr()
rawrr::installRawrrExe()
```
```{r listrawrrAssemblyPath}
rawrr::rawrrAssemblyPath()
rawrr::rawrrAssemblyPath() |> list.files()
```
for more information please read the INSTALL file in the
`r BiocStyle::Biocpkg('rawrr')` package.
# Usage
## R command line
### Input
fetch example Orbitrap raw files from
`r BiocStyle::Biocpkg('ExperimentHub')`'s `r BiocStyle::Biocpkg('tartare')`
package.
```{r fetchFromExperimentHub}
library(ExperimentHub)
ExperimentHub::ExperimentHub() -> eh
normalizePath(eh[["EH3222"]]) -> EH3222
normalizePath(eh[["EH4547"]]) -> EH4547
(rawfileEH3222 <- paste0(EH3222, ".raw"))
if (!file.exists(rawfileEH3222)){
file.copy(EH3222, rawfileEH3222)
}
(rawfileEH4547 <- paste0(EH4547, ".raw"))
if (!file.exists(rawfileEH4547)){
file.copy(EH4547, rawfileEH4547)
}
c(rawfileEH3222, rawfileEH4547) -> rawfile
```
Of note, the *proprietary* .Net assemblies [@RFR] require a file extentention of `.raw`. Therfore we have to rename the EH files and add the `.raw` suffix.
list meta data of the raw files.
```{r header}
(rawfile |>
lapply(FUN = rawrr::readFileHeader) -> rawFileHeader)
```
### `readRaw` - read Orbitrap raw file
read the two instrument raw files by using the `r BiocStyle::Biocpkg('rawDiag')`
package.
```{r readEH4547OrbitrapTrailerTable}
rawfile |>
lapply(FUN = rawDiag::readRaw) |>
Reduce(f = rbind) -> x
#BiocParallel::bplapply(FUN = rawDiag::readRaw) |>
```
### Output - Visualization
This package provides several plot functions tailored toward MS data.
The following list shows all available plot methods.
```{r listFUN}
library(rawDiag)
ls("package:rawDiag") |>
grep(pattern = '^plot', value = TRUE) -> pm
pm |>
knitr::kable(col.names = "package:rawDiag plot functions")
```
An inherent problem of visualizing data is the fact that depending on the data
at hand, specific visualizations lose their usefulness, e.g., overplotting in
a scatter plot if too many data points are present. To address this problem, we implemented most of the plot functions in different versions inspired by the
work of @Cleveland1993, @Sarkar2008, and @Wickham2009. The data can be displayed
in trellis plot manner using the faceting functionality of
`r BiocStyle::CRANpkg('ggplot2')`. Alternatively, overplotting using color
coding or violin plots based on descriptive statistics values can be chosen,
which allows the user to interactively change the appearance of the plots based
on the situation at hand. For instance, a large number of files are best
visualized by violin plots, giving the user an idea about the distribution of
the data points. On the basis of this, a smaller subset of files can be selected
and visualized with another technique.
The code snippet below applies all plot methods on the example data.
```{r plotALL, fig.width = 10, fig.height = 5}
pm |>
lapply(FUN = function(plotFUN) {
lapply(c('trellis'), function(method) {
message("plotting", plotFUN, "using method", method, "...")
do.call(plotFUN, list(x, method))
})
})
```
The appearance of each plot depends on the instrument, sample, and method used
to acquire the data. Therefore, it is hard to say what each ideal plot should
look like.
In particular, in the example above, we use data generated on an
`r rawFileHeader[[1]]$"Instrument name"`, `r rawFileHeader[[1]]$"RAW file"` and
`r rawFileHeader[[2]]$"Instrument name"`, `r rawFileHeader[[2]]$"RAW file"`
instrument using data-independent acquisition (DIA) [@Bruderer2017] and
data-dependent acquisition (DDA) methods.
For more information on the plot methods and its application, please read the
package man pages and the application examples in the manuscript
[@Trachsel2018].
## Launching the shiny application
The package provides a simple interactive `r BiocStyle::Biocpkg('shiny')`-based
graphical user interface for exploring Thermo Fisher Scientific *raw* data.
If you have a directory containing raw files, you can create a shiny application
as follows:
```{r shinyApp}
rawfile |>
dirname() |>
rawDiag::buildRawDiagShinyApp() -> app
```
The `r BiocStyle::Biocpkg('shiny')` runApp function launches the app in our
browser.
```{r runApp, eval=FALSE}
shiny::runApp(app)
```
By default, the application lets you choose the raw files in the provided
directory and provides the visualizations of the raw data as output.
The user can interactively change the by the
`r BiocStyle::Biocpkg('rawDiag')`
the package provided plot functions and arguments.
Additionally, the application provides PDF generation and download buttons.
Optionally height and width can be changed in the user interface.
Of note, the `rawDiag::rawDiagServer` module can be integrated into an existing
`r BiocStyle::CRANpkg('shinydashboard')` application, e.g., https://shiny-ms.fgcz.uzh.ch/fgczmsqc-dashboard/.
# FAQ
## I would like to load multiple files into a single data.frame to do comparisons; what is the preferred method for doing so?
consider all raw files of your working directory, e.g., `~/Downloads` and load them.
```{r FAQ1, eval=FALSE}
file.path(Sys.getenv("HOME"), "Downloads") |>
setwd()
list.files() |>
grep(pattern = '*.raw$', value = TRUE) |>
lapply(FUN = rawDiag::readRaw) |>
Reduce(f = rbind) -> x
```
as alternative to `lapply` you can utilize the
`r BiocStyle::Biocpkg('BiocParallel')` package `bplapply` function.
# References {-}
# Session information {-}
```{r sessioninfo, eval=TRUE}
sessionInfo()
```