---
title: "MSstatsBioNet Introduction"
author: "Anthony Wu"
package: MSstatsBioNet
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{MSstatsBioNet: Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```


# Installation

Run the code below to install MSstatsBioNet from Bioconductor:

```{r eval = FALSE}
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("MSstatsBioNet")
```

# Purpose of MSstatsBioNet

The `MSstatsBioNet` package is a member of the `MSstats` family of packages.
It contains a set of functions for interpretation of mass spectrometry (MS) 
statistical analysis results in the context of protein-protein interaction 
networks. The package is designed to be used in conjunction with the 
`MSstats` package.

# Dataset

This vignette uses a subset of the dataset from this [paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC7331093/).

```{r}
input <- data.table::fread(system.file(
    "extdata/msstats.csv",
    package = "MSstatsBioNet"
))
```

# MSstats Convert from Upstream Dataset

```{r}
library(MSstatsConvert)
msstats_imported <- FragPipetoMSstatsFormat(input, use_log_file = FALSE)
head(msstats_imported)
```

We first convert the input data to a format that MSstats can process. In this
example, the `FragPipetoMSstatsFormat` function converts the input data from
FragPipe output to MSstats format. The MSstats format contains the information
needed for downstream analysis, such as peptide information, abundance values,
run ID, and experimental annotation information.
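
As a rough illustration of what "MSstats format" means, the sketch below builds
a tiny hand-written table and checks it for the columns that MSstats expects in
its long-format input. The toy values and the helper name
`missing_msstats_columns` are made up for illustration and are not part of any
package.

```{r}
# Columns that MSstats expects in its long-format input table.
required_columns <- c(
    "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
    "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
    "Run", "Intensity"
)

# Hypothetical helper (not part of MSstats): report which columns are missing.
missing_msstats_columns <- function(df) {
    setdiff(required_columns, colnames(df))
}

# Toy table with made-up values: one peptide measured in two runs.
toy_input <- data.frame(
    ProteinName = c("P16050", "P16050"),
    PeptideSequence = c("PEPTIDEK", "PEPTIDEK"),
    PrecursorCharge = c(2, 2),
    FragmentIon = c(NA, NA),
    ProductCharge = c(NA, NA),
    IsotopeLabelType = c("L", "L"),
    Condition = c("Control", "Treatment"),
    BioReplicate = c(1, 2),
    Run = c("run1", "run2"),
    Intensity = c(1.5e6, 3.1e6)
)

# A complete table reports no missing columns.
missing_msstats_columns(toy_input)

# A table reduced to three columns reports the remaining seven as missing.
missing_msstats_columns(toy_input[, c("ProteinName", "Run", "Intensity")])
```

The same kind of check can be applied to `msstats_imported` if a later step
fails because of an unexpected input layout.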

# MSstats Process and GroupComparison

```{r}
library(MSstats)
QuantData <- dataProcess(msstats_imported, use_log_file = FALSE)
model <- groupComparison(
    contrast.matrix = "pairwise",
    data = QuantData,
    use_log_file = FALSE
)
head(model$ComparisonResult)
```

Next, we preprocess the data using the `dataProcess` function and perform
a statistical analysis using the `groupComparison` function. The
`ComparisonResult` element of the `groupComparison` output is a table with one
row per protein and comparison, giving the estimated log fold change and the
associated p-value.
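
To make the structure of such a table concrete, here is a minimal sketch that
uses a hand-written stand-in for `model$ComparisonResult`. The protein names
and numbers are made up, only a few of the columns are included, and the 0.05
threshold is just a common convention.

```{r}
# Toy stand-in for model$ComparisonResult; all values are made up.
toy_result <- data.frame(
    Protein = c("ProtA", "ProtB", "ProtC", "ProtD"),
    Label = "Treatment vs Control",
    log2FC = c(1.8, -0.2, -2.4, 0.6),
    pvalue = c(0.001, 0.640, 0.0004, 0.090),
    adj.pvalue = c(0.002, 0.640, 0.0016, 0.120)
)

# Keep proteins below an adjusted p-value threshold of 0.05.
significant <- toy_result[toy_result$adj.pvalue < 0.05, ]

# Label the direction of change and sort by absolute effect size.
significant$direction <- ifelse(significant$log2FC > 0, "up", "down")
significant <- significant[order(-abs(significant$log2FC)), ]
significant[, c("Protein", "log2FC", "adj.pvalue", "direction")]
```

Sorting by absolute log fold change puts the strongest changes first,
regardless of direction.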

# MSstatsBioNet Analysis

## ID Conversion

First, we need to convert the group comparison results to a format that can be
processed by INDRA.  The `getSubnetworkFromIndra` function requires a table 
containing HGNC IDs.  We can use the `annotateProteinInfoFromIndra` function
to obtain these mappings.

In the example below, we convert UniProt IDs to their corresponding HGNC IDs. We
can also extract other information, such as the HGNC gene name and protein
function.

```{r}
library(MSstatsBioNet)
annotated_df <- annotateProteinInfoFromIndra(model$ComparisonResult, "Uniprot")
head(annotated_df)
```

## Subnetwork Query

The package provides a function `getSubnetworkFromIndra` that retrieves a
subnetwork of proteins from the INDRA database based on differential abundance
analysis results.  

```{r}
subnetwork <- getSubnetworkFromIndra(annotated_df, pvalueCutoff = 0.05)
head(subnetwork$nodes)
head(subnetwork$edges)
```
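
The choice of `pvalueCutoff` controls how many proteins are sent to the query.
The sketch below uses made-up adjusted p-values to show how the number of
retained proteins changes with the cutoff. Whether the function filters on raw
or adjusted p-values is an assumption here, so check the function documentation
before relying on it.

```{r}
# Toy adjusted p-values for six proteins; values are made up.
toy_adj_pvalues <- c(0.0005, 0.004, 0.03, 0.049, 0.2, 0.8)

# Count how many proteins fall below each candidate cutoff.
cutoffs <- c(0.001, 0.01, 0.05, 0.1)
n_passing <- sapply(cutoffs, function(cutoff) sum(toy_adj_pvalues < cutoff))
data.frame(cutoff = cutoffs, n_passing = n_passing)
```

A stricter cutoff gives a smaller, more focused subnetwork, while a looser one
admits more proteins at the cost of more false positives.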

This package is distributed under the
[Artistic-2.0](https://opensource.org/licenses/Artistic-2.0) license. However,
its dependencies may have different licenses. In this example,
`getSubnetworkFromIndra` depends on INDRA, which is distributed under the
[BSD 2-Clause](https://opensource.org/license/bsd-2-clause) license.
Furthermore, INDRA's knowledge sources may have different licenses for
commercial applications. Please refer to the
[INDRA README](https://github.com/sorgerlab/indra?tab=readme-ov-file#indra-modules)
for more information on its knowledge sources and their associated licenses.


## Visualize Networks

The `visualizeNetworks` function takes the output of
`getSubnetworkFromIndra` and visualizes the subnetwork. The Cytoscape desktop
application must be running for the visualization to work.

```{r}
visualizeNetworks(subnetwork$nodes, subnetwork$edges)
```

In the network diagram displayed in Cytoscape, you should see two arrows
connecting two nodes, P16050 and P84243. These arrows represent the
interactions between these two proteins, notably activation and phosphorylation.
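
The same information can be read from the edge table without Cytoscape. The
sketch below uses a hand-written edge table that mirrors the two interactions
described above. The column names (`source`, `target`, `interaction`) and the
direction of the edges are assumptions for illustration, so adjust them to
match the actual columns of `subnetwork$edges`.

```{r}
# Toy edge table mirroring the two interactions described above.
# Column names and edge direction are assumptions for illustration.
toy_edges <- data.frame(
    source = c("P16050", "P16050"),
    target = c("P84243", "P84243"),
    interaction = c("Activation", "Phosphorylation")
)

# Count edges per interaction type.
table(toy_edges$interaction)

# Count edges per protein pair to spot pairs linked by several statements.
pair_counts <- aggregate(
    interaction ~ source + target,
    data = toy_edges,
    FUN = length
)
pair_counts
```

In `pair_counts`, the `interaction` column holds the number of edges for each
pair, so a value above one marks a pair connected by more than one arrow in the
diagram.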

# Session info

```{r}
sessionInfo()
```