--- title: "MSstatsBioNet Introduction" author: "Anthony Wu" package: MSstatsBioNet output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{MSstatsBioNet: Introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Installation Run this code below to install MSstatsBioNet from bioconductor ```{r eval = FALSE} if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("MSstatsBioNet") ``` # Purpose of MSstatsBioNet The `MSstatsBioNet` package is a member of the `MSstats` family of packages. It contains a set of functions for interpretation of mass spectrometry (MS) statistical analysis results in the context of protein-protein interaction networks. The package is designed to be used in conjunction with the `MSstats` package. # Dataset We will be taking a subset of the dataset found in this [paper](https://pmc.ncbi.nlm.nih.gov/articles/PMC7331093/). ```{r} input = data.table::fread(system.file( "extdata/msstats.csv", package = "MSstatsBioNet" )) ``` # MSstats Convert from Upstream Dataset ```{r} library(MSstatsConvert) msstats_imported = FragPipetoMSstatsFormat(input, use_log_file = FALSE) head(msstats_imported) ``` We will first convert the input data to a format that can be processed by MSstats. In this example, the `MetamorpheusToMSstatsFormat` function is used to convert the input data from Metamorpheus to MSstats format. The MSstats format will contain the necessary information for downstream analysis, such as peptide information, abundance values, run ID, and experimental annotation information. # MSstats Process and GroupComparison ```{r} library(MSstats) QuantData <- dataProcess(msstats_imported, use_log_file = FALSE) model <- groupComparison( contrast.matrix = "pairwise", data = QuantData, use_log_file = FALSE ) head(model$ComparisonResult) ``` Next, we will preprocess the data using the `dataProcess` function and perform a statistical analysis using the `groupComparison` function. The output of `groupComparison` is a table containing a list of differentially abundant proteins with their associated p-values and log fold changes. # MSstatsBioNet Analysis ## ID Conversion First, we need to convert the group comparison results to a format that can be processed by INDRA. The `getSubnetworkFromIndra` function requires a table containing HGNC IDs. We can use the `annotateProteinInfoFromIndra` function to obtain these mappings. In the below example, we convert uniprot IDs to their corresponding Hgnc IDs. We can also extract other information, such as hgnc gene name and protein function. ```{r} library(MSstatsBioNet) annotated_df = annotateProteinInfoFromIndra(model$ComparisonResult, "Uniprot") head(annotated_df) ``` ## Subnetwork Query The package provides a function `getSubnetworkFromIndra` that retrieves a subnetwork of proteins from the INDRA database based on differential abundance analysis results. ```{r} subnetwork <- getSubnetworkFromIndra(annotated_df, pvalueCutoff = 0.05) head(subnetwork$nodes) head(subnetwork$edges) ``` This package is distributed under the [Artistic-2.0](https://opensource.org/licenses/Artistic-2.0) license. However, its dependencies may have different licenses. In this example, getSubnetworkFromIndra depends on INDRA, which is distributed under the [BSD 2-Clause](https://opensource.org/license/bsd-2-clause) license. Furthermore, INDRA's knowledge sources may have different licenses for commercial applications. Please refer to the [INDRA README](https://github.com/sorgerlab/indra?tab=readme-ov-file#indra-modules) for more information on its knowledge sources and their associated licenses. ## Visualize Networks The function `visualizeNetworks` then takes the output of `getSubnetworkFromIndra` and visualizes the subnetwork. The function requires Cytoscape desktop to be open for the visualization to work. ```{r} visualizeNetworks(subnetwork$nodes, subnetwork$edges) ``` In the network diagram displayed in Cytoscape, you should see two arrows connecting two nodes, P16050 and P84243. These arrows represent the interactions between these two proteins, notably activation and phosphorylation. # Session info ```{r} sessionInfo() ```