---
title: "Retrieve and Use Mass Spectrometry Data from MetaboLights"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Retrieve and Use Mass Spectrometry Data from MetaboLights}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{MsBackendMetaboLights}
  %\VignetteDepends{Spectra,BiocStyle}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Package**: `r Biocpkg("MsBackendMetaboLights")`<br />
**Authors**: `r packageDescription("MsBackendMetaboLights")[["Author"]] `<br />
**Last modified:** `r file.info("MsBackendMetaboLights.Rmd")$mtime`<br />
**Compiled**: `r date()`
```{r, echo = FALSE, message = FALSE}
library(Spectra)
library(BiocStyle)
```
# Introduction
The `r Biocpkg("Spectra")` package provides a central infrastructure for
handling mass spectrometry (MS) data in Bioconductor. The package supports
interchangeable use of different *backends* to import and represent MS data from
a variety of sources and data formats. The *MsBackendMetaboLights* package
retrieves MS data files directly from the
[MetaboLights](https://www.ebi.ac.uk/metabolights/) repository. MetaboLights is
one of the main public repositories for the deposition of metabolomics
experiments, including (raw) MS and/or NMR data files and the related
experimental and analytical results. The *MsBackendMetaboLights* package
downloads and locally caches the MS data files of a MetaboLights data set and
enables further analysis of these data directly in R.

# Installation
The package can be installed from within R with the commands below:
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RforMassSpectrometry/MsBackendMetaboLights")
```
# Importing MS Data from MetaboLights
[MetaboLights](https://www.ebi.ac.uk/metabolights/) is one of the main public
repositories for the deposition of metabolomics experiments, including (raw)
mass spectrometry (MS) and NMR data files and experimental/analysis results. The
experimental metadata and results are stored as plain text files in ISA-tab
format. Each MetaboLights experiment must provide a file describing the samples
analyzed and at least one *assay* file that links the experimental samples to
the (raw and processed) data files with the quantification of
metabolites/features in these samples.

In this vignette we explore and load MS data files from a small MetaboLights
experiment. MetaboLights provides the information on a data set/experiment as a
set of plain text files in *ISA-tab* format, which can be accessed and read from
the data set's ftp folder. This set generally consists of a file with
information on the experiment/investigation (file name starting with *i_*), a
file describing the samples of the data set (file name starting with *s_*), a
file describing the *assay* (measurements/analysis) of the experiment (file name
starting with *a_*) and a file with quantified metabolite abundances (file name
starting with *m_*). Note that a data set can have more than one assay file.
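
The naming convention described above can be sketched with a few lines of base
R. The file names below are made up for illustration (they are not taken from a
real data set), and `isa_prefix` is a hypothetical lookup table that is not part
of *MsBackendMetaboLights*.

```{r}
#' Hypothetical file listing; the names are illustrative only
example_files <- c("i_Investigation.txt", "s_example.txt",
                   "a_example_assay.txt", "m_example_maf.tsv",
                   "FILES/sample_01.cdf")

#' Map the ISA-tab file name prefixes to the role of the file
isa_prefix <- c(i_ = "investigation", s_ = "sample",
                a_ = "assay", m_ = "metabolite abundances")
file_role <- unname(isa_prefix[substr(basename(example_files), 1, 2)])

#' Everything without a known prefix is treated as another type of file
file_role[is.na(file_role)] <- "other (e.g. data file)"
split(example_files, file_role)
```
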
Below we list all files from the MetaboLights data set with the ID *MTBLS39*.
```{r}
library(MsBackendMetaboLights)
#' List files of a MetaboLights data set
all_files <- mtbls_list_files("MTBLS39")
```
All these files are directly accessible in the ftp folder associated with the
MetaboLights data set. Below we use the `mtbls_ftp_path()` function to return
the ftp path for our test data set.
```{r}
mtbls_ftp_path("MTBLS39")
```
We could also inspect the content of this folder using a browser that supports
the ftp file transfer protocol and download individual files manually. The
files can, however, also be accessed directly from within R. Below we read the
*assay* data file using the base R `read.table()` function.
```{r}
#' Get the assay files of the data set
assay_files <- grep("^a_", all_files, value = TRUE)
assay_files

#' Read the (first) assay file; a data set can have more than one
a <- read.table(paste0(mtbls_ftp_path("MTBLS39"), assay_files[1L]),
                sep = "\t", header = TRUE, check.names = FALSE)
```
Each row in this assay table refers to one measurement (data file) of the data
set, with columns providing information on that measurement. The number and
content of columns can vary between data sets and depend on the information the
original researcher (manually) provided. Below we list the columns available in
the assay file of our test data set.
```{r}
colnames(a)
```
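
Column names in ISA-tab assay files typically contain spaces, which is why
`check.names = FALSE` is used when reading the assay file above. The following
minimal sketch uses a made-up two-row table (`toy_assay_txt` is hypothetical) to
show the difference to the `read.table()` default:

```{r}
#' A made-up assay table in tab-delimited layout
toy_assay_txt <- paste(
    "Sample Name\tDerived Spectral Data File\tRaw Spectral Data File",
    "sample_1\t\tFILES/sample_1.cdf",
    "sample_2\t\tFILES/sample_2.cdf",
    sep = "\n")

#' check.names = FALSE keeps the column names as they are in the file
names_kept <- colnames(read.table(text = toy_assay_txt, sep = "\t",
                                  header = TRUE, check.names = FALSE))
names_kept

#' The default (check.names = TRUE) converts them to syntactic names
names_fixed <- colnames(read.table(text = toy_assay_txt, sep = "\t",
                                   header = TRUE))
names_fixed
```
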
MS data files are generally provided in a column named
`"Derived Spectral Data File"`, but sometimes they are listed in a column named
`"Raw Spectral Data File"` instead. Note that providing MS data files is not
mandatory; for some data sets no MS data files might thus be available. Below we
list the content of these two columns.
```{r}
a[, c("Raw Spectral Data File", "Derived Spectral Data File")]
```
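
A minimal sketch of how one could pick the column that actually contains file
names, preferring the derived over the raw data files, is shown below. The table
`toy_cols` and the helper `pick_file_column()` are hypothetical and only
illustrate the logic described above; they are not the package's
implementation.

```{r}
#' Made-up assay table: only the raw data file column is filled
toy_cols <- data.frame(
    "Raw Spectral Data File" = c("FILES/a.cdf", "FILES/b.cdf"),
    "Derived Spectral Data File" = c(NA_character_, ""),
    check.names = FALSE)

#' Hypothetical helper: return the first candidate column that is present
#' and contains at least one non-missing, non-empty file name
pick_file_column <- function(x, candidates = c("Derived Spectral Data File",
                                               "Raw Spectral Data File")) {
    has_files <- vapply(candidates, function(z) {
        z %in% colnames(x) && any(!is.na(x[[z]]) & nzchar(x[[z]]))
    }, logical(1))
    if (any(has_files)) candidates[has_files][1L] else NA_character_
}
pick_file_column(toy_cols)
```
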
For this particular data set the MS data files are provided in the
`"Raw Spectral Data File"` column. These files are in CDF format and can hence
be loaded into R as a `Spectra` object using the `MsBackendMetaboLights` backend
(`MsBackendMetaboLights` directly extends *Spectra*'s `MsBackendMzR` backend and
therefore supports import of MS data files in *mzML*, *CDF* or *mzXML*
format). By default, all MS data files of all assays are retrieved, but in our
example below we restrict the import to a few data files to reduce the amount of
data that needs to be downloaded. To this end we use the `filePattern` parameter
to define a pattern that matches the names of only some data files.
Alternatively, for data sets with more than one assay, MS data files from one
particular assay can be selected using the `assayName` parameter. In our case we
load all MS data files with names ending in *63A.cdf*.
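
Assuming that `filePattern` is interpreted as a regular expression (common for
pattern arguments in R, but an assumption here), the behavior of a pattern can
be checked beforehand with `grepl()`. The file names below are made up for
illustration.

```{r}
#' Hypothetical file names, for illustration only
pattern_files <- c("FILES/AM063A.cdf", "FILES/AM163A.cdf",
                   "FILES/AM063B.cdf", "FILES/AM063A_cdf.txt")

#' Unanchored pattern: "." matches any character, so the last name matches too
loose <- grepl("63A.cdf", pattern_files)
loose

#' Escaping the dot and anchoring the end of the name is stricter
strict <- grepl("63A\\.cdf$", pattern_files)
strict
```

The shorter pattern is sufficient as long as no other file name of the data set
contains the same fragment.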
```{r}
library(Spectra)
#' Load MS data files of one data set
s <- Spectra("MTBLS39", filePattern = "63A.cdf",
             source = MsBackendMetaboLights())
s
```
The call above downloaded the files to the local cache and loaded them as a
`Spectra` object. Downloading and caching of the data is handled by
Bioconductor's `r Biocpkg("BiocFileCache")` package, so the local cache can be
managed directly using functionality from that package. Any subsequent loading
of the same data files will use the locally cached versions, thus avoiding
repeated downloads of the same data.

The message shown by the call above indicates that the MS data files were not
provided in the expected column (`"Derived Spectral Data File"`) but in the
column for raw data files.

The `Spectra` object with the MS data files of the MetaboLights data set now
enables any subsequent analysis of the data in R. In addition to the spectra
variables and mass peak data values provided by the MS data files, information
related to the MetaboLights data set is available as specific *spectra
variables*. We list all available spectra variables of the data set below.
```{r}
spectraVariables(s)
```
The MetaboLights-specific variables are `"mtbls_id"`, `"mtbls_assay_name"` and
`"derived_spectral_data_file"`, which provide the MetaboLights ID of the data
set, the assay/method with which the data files were generated and the original
file path/name of the data files on the MetaboLights ftp server.
```{r}
spectraData(s, c("mtbls_id", "mtbls_assay_name",
                 "derived_spectral_data_file"))
```
These variables can be used to link the individual spectra back to the original
sample (e.g. through the *assay* and *sample* tables of the MetaboLights data
set).
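
Such a lookup could, for example, be performed with `match()`. The sketch below
uses made-up tables (`link_spectra` and `link_assay`, with a fictitious data set
ID and file names) and assumes that the file names in the spectra variable and
in the assay table are identical. For a real data set the assay column holding
the file names may differ.

```{r}
#' Made-up per-spectrum annotation with one row per spectrum
link_spectra <- data.frame(
    mtbls_id = "MTBLS0000",
    derived_spectral_data_file = c("FILES/s1.cdf", "FILES/s1.cdf",
                                   "FILES/s2.cdf"))

#' Made-up assay table linking data files to sample names
link_assay <- data.frame(
    "Sample Name" = c("sample_1", "sample_2"),
    "Raw Spectral Data File" = c("FILES/s1.cdf", "FILES/s2.cdf"),
    check.names = FALSE)

#' Look up the sample of each spectrum through its data file name
idx <- match(link_spectra$derived_spectral_data_file,
             link_assay[, "Raw Spectral Data File"])
link_spectra$sample_name <- link_assay[idx, "Sample Name"]
link_spectra
```
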
The `mtbls_sync()` function can be used to *synchronize* the local content of a
`MsBackendMetaboLights` backend. This function checks whether all data files of
the backend are available locally and, if needed, downloads and caches missing
files.
```{r}
mtbls_sync(s@backend)
```
It is also possible to *manually* download and cache data files from
MetaboLights using the `mtbls_sync_data_files()` function. This function
evaluates whether the respective data files are already cached and, if so, does
not download them again. Below we use this function to retrieve the local
storage information on one of the data files of the MetaboLights data set
*MTBLS39*:
```{r}
res <- mtbls_sync_data_files("MTBLS39", fileName = "AM063A.cdf")
res
```
The `mtbls_cached_data_files()` function can be used to inspect and list locally
cached MetaboLights data files. This function does not require an active
internet connection since only local content is queried. With the default
settings, a `data.frame` with all available data files is returned.
```{r}
mtbls_cached_data_files()
```
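
Because caching is handled by `r Biocpkg("BiocFileCache")`, the cache can also
be queried with that package's own functions. The sketch below is not evaluated
and assumes that *MsBackendMetaboLights* uses the default cache location
returned by `BiocFileCache()`. What it returns depends on the files that have
been downloaded locally.

```{r, eval = FALSE}
library(BiocFileCache)

#' Assumption: the default BiocFileCache location is the one that is used
bfc <- BiocFileCache(ask = FALSE)

#' Overview of all cached resources
bfcinfo(bfc)

#' Cached resources whose name or path contains the data set ID
hits <- bfcquery(bfc, "MTBLS39")
hits

#' Cached files could be removed with bfcremove(bfc, rids = hits$rid)
```
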
# Session information
```{r}
sessionInfo()
```