---
title: Introduction to Seahtrue
author: "Vincent C. J. de Boer, Gerwin Smits, Xiang Zhang"
date: "`r format(Sys.Date(), '%b %d, %Y')`"
output:
    BiocStyle::html_document:
        toc: true
        number_sections: true
        toc_depth: 3
        toc_float:
            collapsed: true
description: |
  Overview of the Seahtrue package.
vignette: >
  %\VignetteIndexEntry{Introduction to Seahtrue}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  #rmarkdown.html_vignette.check_title = FALSE
)
```

# Seahtrue overview

The seahtrue package provides functions for reproducible analysis of extracellular flux (XF) data. The main function, `revive_xfplate()`, reads, preprocesses and validates the input data, and returns the experimental details and outcome variables in an organized (tidy) way. The output of `revive_xfplate()` is a nested tibble.

# Extracellular flux analysis scientific primer

With instruments such as the Seahorse XF analyzer from Agilent, but also the O2K from Oroboros, other oxygraphs (for example from Hansatech Instruments) and the ReSipher from Lucid Scientific, scientists can analyze the oxygen consumption of living biological samples.

The oxygen consumption of cells or small model organisms can provide insight into mitochondrial function, since mitochondria are usually the main O2 consumers in cells. Apart from oxygen consumption, the Seahorse XF analyzer can simultaneously measure the extracellular acidification of the culture medium in which the sample is immersed, which can serve as a proxy for glycolytic activity.

Seahorse extracellular flux instruments analyze O2 and pH in 96-, 24- or 8-well formats. O2 and pH are typically monitored over a period of around 1 hour, in discrete measurements of typically 3 minutes each. In addition, the functional state of the cells can be perturbed by injecting compounds during the assay. The most common perturbation protocol, known as the mito stress test, consists of sequential injections of oligomycin, FCCP and antimycin/rotenone.

# Resources

Divakaruni, Ajit S., and Martin Jastroch. “A Practical Guide for the Analysis, Standardization and Interpretation of Oxygen Consumption Measurements.” Nature Metabolism 4, no. 8 (August 15, 2022): 978–94. https://doi.org/10.1038/s42255-022-00619-4.

Gerencser, A. A., A. Neilson, S. W. Choi, U. Edman, N. Yadava, R. J. Oh, D. A. Ferrick, D. G. Nicholls, and M. D. Brand. “Quantitative Microplate-Based Respirometry with Correction for Oxygen Diffusion.” Anal Chem 81, no. 16 (August 15, 2009): 6868–78. https://doi.org/10.1021/ac900881z.

Janssen, J. J. E., B. Lagerwaard, A. Bunschoten, H. F. J. Savelkoul, R. J. J. van Neerven, J. Keijer, and V. C. J. de Boer. “Novel Standardized Method for Extracellular Flux Analysis of Oxidative and Glycolytic Metabolism in Peripheral Blood Mononuclear Cells.” Sci Rep 11, no. 1 (January 18, 2021): 1662. https://doi.org/10.1038/s41598-021-81217-4.

Zhang, Xiang, Taolin Yuan, Jaap Keijer, and Vincent C. J. de Boer. “OCRbayes: A Bayesian Hierarchical Modeling Framework for Seahorse Extracellular Flux Oxygen Consumption Rate Data Analysis.” PLOS ONE 16, no. 7 (July 15, 2021): e0253926. https://doi.org/10.1371/journal.pone.0253926.

```{r load packages, include=FALSE}

suppressPackageStartupMessages({
  library(dplyr)
  library(seahtrue)
  #library(ggplot2)
})


```

# Seahtrue

## Inspect the seahtrue data

The example Seahorse XF export file that ships with the package is located with `system.file()` and processed with `revive_xfplate()`:

```{r}
# Locate the example XF export shipped with seahtrue and process it
revive_output_donor_A <- 
  system.file("extdata", 
              "20191219_SciRep_PBMCs_donor_A.xlsx",
              package = "seahtrue") %>% 
  seahtrue::revive_xfplate()
```

Take a glimpse at the output of `revive_xfplate()`:

```{r}
revive_output_donor_A %>%  
  dplyr::glimpse()

```

The output starts with four identifier columns, `plate_id`, `filepath_seahorse`, `date_run` and `date_processed`, which keep the data traceable at the top level. These are followed by five columns of nested tibbles. The `assay_info` column contains a tibble/data frame with information that was stored in the experimental file. This information was either entered by the user before running the experiment or while processing the data afterwards, or generated by the software. The next column, `injection_info`, lists the measurement, interval and injection. Then follow the two data columns, `raw_data` and `rate_data`, each a tibble/data frame. The final column, `validation_output`, holds the results of the validations together with the rules that were applied. In the next sections, we explore each of these outputs separately.
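
For readers less familiar with nested tibbles, the minimal base R sketch below builds a toy one-row data frame whose `rate_data` column is a list holding a data frame, the same kind of structure as the `revive_xfplate()` output. The values are made up and only a few of the real column names are mimicked.

```{r}
# Toy stand-in for the nested output: one row per plate, with a
# data frame stored inside a list-column (values are made up).
toy_rate <- data.frame(
  well = c("A01", "A02"),
  measurement = c(1L, 1L),
  OCR_wave_bc = c(0, 85.3)
)
toy_plate <- data.frame(plate_id = "toy_plate")
toy_plate$rate_data <- list(toy_rate)

# Extracting the first element of the list-column returns the inner
# data frame; this mirrors purrr::pluck(toy_plate, "rate_data", 1).
inner <- toy_plate$rate_data[[1]]
nrow(inner)
```

Storing whole data frames in list-columns keeps one row per plate, so several plates can later be stacked in a single tibble without losing their per-well data.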

## Rate data

The `rate_data` is what scientists typically use to interpret their XF experiments. It contains the calculated OCR (oxygen consumption rate) and ECAR (extracellular acidification rate) values. The XF software can also report the PER (proton efflux rate), which is calculated from ECAR when the buffer capacity of the medium is known and set in the experiment. Our `rate_data` contains only OCR and ECAR, since PER can easily be calculated from ECAR.
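
As an illustration of how PER follows from ECAR, the sketch below uses the relation PER = ECAR × buffer factor × chamber volume × Kvol, as we understand it from Agilent's XF documentation. The function `ecar_to_per()` is hypothetical and not part of seahtrue, and the volume and Kvol values in the example call are placeholders; the instrument-specific constants should be taken from the manufacturer's documentation.

```{r}
# Hypothetical helper (not a seahtrue function): convert ECAR to PER.
# Units: ECAR in mpH/min, buffer factor in mM/pH (mmol/L/pH) and
# chamber volume in uL, so the product is in pmol H+/min.
# `kvol` is a dimensionless volume correction factor (assumption: use
# the instrument-specific value from the manufacturer's documentation).
ecar_to_per <- function(ecar, buffer_factor, chamber_volume_ul, kvol) {
  ecar * buffer_factor * chamber_volume_ul * kvol
}

# Placeholder values, for illustration only
ecar_to_per(ecar = 20, buffer_factor = 2.5,
            chamber_volume_ul = 2.28, kvol = 1.6)
```

Because the function is vectorized, it can be applied directly to a whole ECAR column.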

```{r}
revive_output_donor_A %>%  
  purrr::pluck("rate_data", 1)

```

The `rate_data` has `well`, `measurement` and `group` identifiers for each row, followed by the `time_wave` column, which gives the time of the measurement in minutes, and the OCR and ECAR data columns. The `cell_n` and `flagged_well` status are also joined into this data frame, so it contains all the information needed for exploring and plotting the data. Because OCR and ECAR data can be exported with background correction either on or off, the read functions in the seahtrue package determine whether the OCR and ECAR values are background corrected by checking whether the background wells have an OCR of zero. If the data were not background corrected, the values are corrected while the .xlsx file is read. The background-corrected data are given in `OCR_wave_bc` and `ECAR_wave_bc`. If the data were exported without background correction, the `OCR_wave` and `ECAR_wave` columns contain the uncorrected OCR and ECAR.
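
The background check and correction described above can be sketched in base R as follows. This is a simplified, hypothetical version, not taken from the seahtrue source code: it assumes a long data frame with `group`, `measurement` and `OCR` columns, with background wells labelled `"Background"`, and subtracts the mean background OCR of each measurement from every well.

```{r}
# Hypothetical helper, not taken from the seahtrue source code.
# `rates`: long data frame with `group`, `measurement` and a rate column.
background_correct <- function(rates, value_col = "OCR",
                               bkg_group = "Background") {
  values <- rates[[value_col]]
  is_bkg <- rates$group == bkg_group
  # Background wells that all read zero indicate the export was already
  # background corrected (also TRUE when there are no background wells)
  already_corrected <- all(values[is_bkg] == 0)
  if (!already_corrected) {
    # Mean background value per measurement, repeated for every row;
    # non-background rows are set to NA so they are ignored in the mean
    bkg_vals <- ifelse(is_bkg, values, NA)
    bkg_mean <- ave(bkg_vals, rates$measurement,
                    FUN = function(v) mean(v, na.rm = TRUE))
    values <- values - bkg_mean
  }
  rates[[paste0(value_col, "_bc")]] <- values
  rates
}

# Made-up example: one background and one sample well, two measurements
toy_ocr <- data.frame(
  group = c("Background", "ctrl", "Background", "ctrl"),
  measurement = c(1, 1, 2, 2),
  OCR = c(5, 105, 4, 94)
)
background_correct(toy_ocr)$OCR_bc
```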

## Raw data

The `raw_data` tibble contains the raw data collected in an XF experiment. These data give detailed insight into the quality of the assay. In addition to the data presented in the raw data sheet of the .xlsx file, some preprocessing output is included, such as the timescale in seconds (`timescale`) and minutes (`minutes`), as well as an `interval` and `injection` id. Background-corrected raw values for pH and O2, and for their sensor emissions, are also given. As in the `rate_data` tibble, the `cell_n` and `flagged_well` status are included.

```{r}
revive_output_donor_A %>%  
  purrr::pluck("raw_data", 1)

```

## Assay info

The `assay_info` tibble holds user- or software-provided metadata associated with the experiment and plate, for example the barcode of the cartridge that was used:

```{r}
revive_output_donor_A %>%  
  purrr::pluck("assay_info", 1) %>% 
  pull(cartridge_barcode)

```

The XF analyzer reads a barcode from each cartridge and associates it with the assay. Part of the information in the barcode is used by the software for the OCR calculation. The emission of the fluorescent O2 sensors at zero oxygen, `F0`, is derived from the Stern-Volmer constant `KSV` (see Gerencser et al., 2009, for details), where the emission at ambient oxygen is typically set to 12500 AU and the ambient oxygen level in culture medium in the wells is set to 151.6900241 mmHg.

```{r}
# KSV in barcode
revive_output_donor_A %>%  
  purrr::pluck("assay_info", 1) %>% 
  pull(cartridge_barcode) %>% 
  stringr::str_sub(-18, -13)
  
# KSV in assay info sheet
revive_output_donor_A %>%  
  purrr::pluck("assay_info", 1) %>% 
  pull(KSV) 

# F0 can be calculated from KSV with the Stern-Volmer equation,
# using the following values:
# emission target at ambient O2 (F) = 12500 AU
# O2 level at ambient in sample medium in well = 151.69 mmHg
#
# F0/F = 1 + KSV*O2
# F0 = (1 + KSV*O2)*F
# F0 = (1 + KSV*151.69)*12500

# F0 from assay info sheet
revive_output_donor_A %>%  
  purrr::pluck("assay_info", 1) %>% 
  pull(F0) 
```
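
The Stern-Volmer calculation in the comments above can be written as two small R functions. These helper names are hypothetical (they are not seahtrue functions), the default arguments are the ambient emission target and ambient O2 level quoted above, and the KSV value in the example is made up.

```{r}
# Hypothetical helpers illustrating the Stern-Volmer relation
# F0 / F = 1 + KSV * O2 (O2 in mmHg, emissions in arbitrary units).
# Defaults: ambient emission target (12500 AU) and ambient O2 level
# in medium (151.69 mmHg), as quoted in the text.
f0_from_ksv <- function(ksv, f_ambient = 12500, o2_ambient = 151.69) {
  (1 + ksv * o2_ambient) * f_ambient
}

# Inverting the same relation gives O2 from a measured emission F
o2_from_emission <- function(f, f0, ksv) {
  (f0 / f - 1) / ksv
}

ksv_example <- 0.02  # made-up KSV value, for illustration only
f0_example <- f0_from_ksv(ksv_example)
f0_example
# Recovers the ambient O2 level when F equals the ambient emission
o2_from_emission(12500, f0_example, ksv_example)
```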

In addition to user- and software-generated metadata, the functions in the seahtrue package add some relevant derived information to this tibble, such as the time until the first measurement (`minutes_to_start_measurement_one`). This shows how long the user took to insert the cell plate and start the measurements. The timer starts at t = 0 minutes when the user calibrates the cartridge.

```{r}
revive_output_donor_A %>%  
  purrr::pluck("assay_info", 1) %>% 
  pull(minutes_to_start_measurement_one)

```
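
As an illustration of what this variable represents, the sketch below computes the elapsed minutes between two made-up timestamps with base R's `difftime()`. It is not necessarily how seahtrue derives `minutes_to_start_measurement_one`; both timestamps are hypothetical.

```{r}
# Hypothetical timestamps: start of calibration (t = 0) and
# the start of the first measurement
calibration_start <- as.POSIXct("2020-01-01 10:00:00", tz = "UTC")
first_measurement <- as.POSIXct("2020-01-01 10:27:30", tz = "UTC")

# Elapsed time in minutes as a plain number
minutes_to_start <- as.numeric(
  difftime(first_measurement, calibration_start, units = "mins")
)
minutes_to_start
```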

Besides the `assay_info`, the data tibbles can carry additional metadata in the form of R `attributes`, which can be inspected as shown in the following examples:

```{r}
revive_output_donor_A %>%  
  purrr::pluck("rate_data", 1) %>% str()

```

```{r}
revive_output_donor_A %>%  
  purrr::pluck("rate_data", 1) %>% 
  attributes() %>% 
  purrr::pluck("was_background_corrected")

```
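
Attributes are a base R mechanism for attaching named metadata to an object. The minimal sketch below sets and reads back such an attribute on a made-up data frame; the attribute name mirrors the one queried above, but the data and the value are invented.

```{r}
# Made-up data frame with a metadata attribute attached to it
toy_rates <- data.frame(well = c("A01", "A02"), OCR = c(0, 85.3))
attr(toy_rates, "was_background_corrected") <- TRUE

# Read a single attribute back
attr(toy_rates, "was_background_corrected")

# Or list all attribute names (names, class and row.names are
# attributes of every data frame as well)
names(attributes(toy_rates))
```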

## Injection info

Since XF experiments typically involve perturbations with compounds or nutrients, the `injection_info` is important for interpreting the experiment. It can be plucked from the output in the same way as the other nested tibbles:

```{r}
revive_output_donor_A %>%  
  purrr::pluck("injection_info", 1)

```
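
To connect measurements to injections when exploring the data, each measurement number can be mapped to its interval. The sketch below assumes a hypothetical mito stress test layout with three measurements per interval and uses base R's `findInterval()`; the table and its column names are illustrative and do not reproduce the exact format of `injection_info`.

```{r}
# Hypothetical injection table: four intervals of three measurements
toy_injections <- data.frame(
  interval = 1:4,
  first_measurement = c(1L, 4L, 7L, 10L),
  injection = c("baseline", "oligomycin", "FCCP", "antimycin/rotenone")
)

# Assign each measurement to the last interval that starts at or
# before it
measurement <- 1:12
interval <- findInterval(measurement, toy_injections$first_measurement)
labelled <- data.frame(
  measurement = measurement,
  interval = interval,
  injection = toy_injections$injection[interval]
)
labelled
```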

# Session info

```{r}
sessionInfo()
```