---
title: "Retrieve and Use Mass Spectrometry Data from MetaboLights"
output:
    BiocStyle::html_document:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Retrieve and Use Mass Spectrometry Data from MetaboLights}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\VignettePackage{MsBackendMetaboLights}
    %\VignetteDepends{Spectra,BiocStyle}
---

```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

**Package**: `r Biocpkg("MsBackendMetaboLights")`
**Authors**: `r packageDescription("MsBackendMetaboLights")[["Author"]] `
**Last modified:** `r file.info("MsBackendMetaboLights.Rmd")$mtime`
**Compiled**: `r date()`

```{r, echo = FALSE, message = FALSE}
library(Spectra)
library(BiocStyle)
```

# Introduction

The `r Biocpkg("Spectra")` package provides a central infrastructure for the
handling of Mass Spectrometry (MS) data in Bioconductor. The package supports
interchangeable use of different *backends* to import and represent MS data from
a variety of sources and data formats. The *MsBackendMetaboLights* package
allows retrieving MS data files directly from the
[MetaboLights](https://www.ebi.ac.uk/metabolights/) repository. MetaboLights is
one of the main public repositories for deposition of metabolomics experiments
including (raw) MS and/or NMR data files and the related experimental and
analytical results. The *MsBackendMetaboLights* package downloads and locally
caches MS data files for a MetaboLights data set and enables further analyses of
this data directly in R.

# Installation

The package can be installed from within R with the commands below:

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RforMassSpectrometry/MsBackendMetaboLights")
```

# Importing MS Data from MetaboLights

[MetaboLights](https://www.ebi.ac.uk/metabolights/) is one of the main public
repositories for deposition of metabolomics experiments including (raw) mass
spectrometry (MS) and NMR data files and experimental/analysis results. The
experimental metadata and results are stored as plain text files in ISA-tab
format. Each MetaboLights experiment must provide a file describing the samples
analyzed and at least one *assay* file that links the experimental samples to
the (raw and processed) data files with the quantification of
metabolites/features in these samples. In this vignette we explore and load MS
data files from a small MetaboLights experiment.

MetaboLights provides information on a data set/experiment as a set of plain
text files in *ISA-tab* format.
These can be accessed and read from the data set's ftp folder. The set of files
generally consists of a file with information on the experiment/investigation
(file name starting with *i_*), a file describing the samples of the data set
(file name starting with *s_*), one or more *assay* files describing the
measurements/analysis of the experiment (file name starting with *a_*) and a
file with quantified metabolite abundances (file name starting with *m_*). Note
that a data set can have more than one assay file. Below we list all files from
the MetaboLights data set with the ID *MTBLS39*.

```{r}
library(MsBackendMetaboLights)

#' List files of a MetaboLights data set
all_files <- mtbls_list_files("MTBLS39")
```

All these files are directly accessible in the ftp folder associated with the
MetaboLights data set. Below we use the `mtbls_ftp_path()` function to return
the ftp path for our test data set.

```{r}
mtbls_ftp_path("MTBLS39")
```

We could also inspect the content of this folder with a browser that supports
the ftp protocol and download individual files manually. The files can however
also be accessed directly from within R. Below we read the *assay* data file
using the base R `read.table()` function.

```{r}
#' Get the assay files of the data set
grep("^a_", all_files, value = TRUE)

#' Read the assay file
a <- read.table(paste0(mtbls_ftp_path("MTBLS39"),
                       grep("^a_", all_files, value = TRUE)),
                sep = "\t", header = TRUE, check.names = FALSE)
```

Each row in this assay table refers to one measurement (data file) of the data
set, with columns providing information on that measurement. The number and
content of columns can vary between data sets and depend on the information the
original researcher (manually) provided. Below we list the columns available in
the assay file of our test data set.

```{r}
colnames(a)
```

MS data files are generally provided in a column named
`"Derived Spectral Data File"` but sometimes they are also listed in a column
named `"Raw Spectral Data File"`.
Note that providing MS data files is not mandatory, so for some data sets no MS
data files might be available. Below we list the content of these data columns.

```{r}
a[, c("Raw Spectral Data File", "Derived Spectral Data File")]
```

For this particular data set the MS data files are provided in the
`"Raw Spectral Data File"` column. These files are in CDF format and can hence
be loaded using the `MsBackendMetaboLights` backend into R as a `Spectra` object
(`MsBackendMetaboLights` directly extends *Spectra*'s `MsBackendMzR` backend and
therefore supports import of MS data files in *mzML*, *CDF* or *mzXML*
formats). By default, all MS data files of all assays would be retrieved, but in
our example below we restrict the import to a few data files to reduce the
amount of data that needs to be downloaded. To this end we define a pattern
matching the file name of only some data files using the `filePattern`
parameter. Alternatively, for data sets with more than one assay, it would also
be possible to select MS data files from one particular assay only using the
`assayName` parameter. In our case we load all MS data files that end with
*63A.cdf*.

```{r}
library(Spectra)

#' Load MS data files of one data set
s <- Spectra("MTBLS39", filePattern = "63A.cdf",
             source = MsBackendMetaboLights())
s
```

This call downloaded the files to the local cache and loaded them as a `Spectra`
object. The downloading and caching of the data is handled by Bioconductor's
`r Biocpkg("BiocFileCache")`. The local cache can thus be managed directly using
functionality from that package. Any subsequent loading of the same data files
will use the locally cached versions, thus avoiding repeated downloads of the
same data. The message that is shown by the call above indicates that the MS
data files were not provided in the expected column
(`"Derived Spectral Data File"`) but in the column for raw data files.

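
The cache management mentioned above can be sketched with a few functions from
`r Biocpkg("BiocFileCache")`. The chunk below is not evaluated and is only a
minimal sketch: it assumes that `MsBackendMetaboLights` stores its files in the
default `BiocFileCache()` location and that the data set ID is part of the name
or path of the cached resources. Neither is stated in this vignette, so the
query might need to be adapted.

```{r, eval = FALSE}
library(BiocFileCache)

#' Connect to the default cache (assumption: the cache used by the backend)
bfc <- BiocFileCache()

#' Location of the cache on disk
bfccache(bfc)

#' Cached resources whose name or path contains the data set ID (assumption)
res <- bfcquery(bfc, "MTBLS39")
res[, c("rid", "rname", "rpath")]

#' To free disk space, cached resources could be removed by their ID:
#' bfcremove(bfc, res$rid)
```

Removing cached resources deletes the local copies; the respective files would
then need to be downloaded again the next time they are requested.
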
The `Spectra` object with the MS data files of the MetaboLights data set now
enables any subsequent analysis of the data in R. In addition to the spectra
variables and mass peak data values provided by the MS data files, additional
information related to the MetaboLights data set is available as specific
*spectra variables*. We list all available spectra variables of the data set
below.

```{r}
spectraVariables(s)
```

The MetaboLights-specific variables are `"mtbls_id"`, `"mtbls_assay_name"` and
`"derived_spectral_data_file"`, providing the MetaboLights ID of the data set,
the assay/method with which the data files were generated and the original file
path/name of the data files on the MetaboLights ftp server.

```{r}
spectraData(s, c("mtbls_id", "mtbls_assay_name",
                 "derived_spectral_data_file"))
```

These variables can be used to link the individual spectra back to the original
sample (e.g. through the *assay* and *sample* tables of the MetaboLights data
set).

The `mtbls_sync()` function can be used to *synchronize* the local content of a
`MsBackendMetaboLights`. This function checks if all data files of the backend
are available locally and, if needed, downloads and caches missing files.

```{r}
mtbls_sync(s@backend)
```

It is also possible to *manually* download and cache data files from
MetaboLights using the `mtbls_sync_data_files()` function. This function
evaluates if the respective data files are already cached and, if so, does not
download them again. Below we use this function to retrieve the local storage
information on one of the data files of the MetaboLights data set *MTBLS39*:

```{r}
res <- mtbls_sync_data_files("MTBLS39", fileName = "AM063A.cdf")
res
```

The `mtbls_cached_data_files()` function can be used to inspect and list locally
cached MetaboLights data files. This function does not require an active
internet connection since only local content is queried. With the default
settings, a `data.frame` with all available data files is returned.

```{r}
mtbls_cached_data_files()
```

# Session information

```{r}
sessionInfo()
```