Title: | Network Analysis for MS-based Proteomics Experiments |
---|---|
Description: | A set of tools for network analysis using mass spectrometry-based proteomics data and network databases. The package takes as input the output of MSstats differential abundance analysis and provides functions to perform enrichment analysis and visualization in the context of prior knowledge from past literature. Notably, this package integrates with INDRA, which is a database of biological networks extracted from the literature using text mining techniques. |
Authors: | Anthony Wu [aut, cre], Olga Vitek [aut] |
Maintainer: | Anthony Wu <[email protected]> |
License: | file LICENSE |
Version: | 0.99.9 |
Built: | 2025-02-22 03:26:57 UTC |
Source: | https://github.com/bioc/MSstatsBioNet |
This function populates the HGNC IDs in the data frame based on the Uniprot IDs.
.populateHgncIdsInDataFrame(df)
df |
A data frame containing protein information. |
A data frame with populated HGNC IDs.
This function populates the HGNC names in the data frame based on the HGNC IDs.
.populateHgncNamesInDataFrame(df)
df |
A data frame containing protein information. |
A data frame with populated HGNC names.
This function populates the kinase information in the data frame based on the HGNC names.
.populateKinaseInfoInDataFrame(df)
df |
A data frame containing protein information. |
A data frame with populated kinase information.
This function populates the phosphatase information in the data frame based on the HGNC names.
.populatePhophataseInfoInDataFrame(df)
df |
A data frame containing protein information. |
A data frame with populated phosphatase information.
This function populates the transcription factor information in the data frame based on the HGNC names.
.populateTranscriptionFactorInfoInDataFrame(df)
df |
A data frame containing protein information. |
A data frame with populated transcription factor information.
This function populates the Uniprot IDs in the data frame based on the protein ID type.
.populateUniprotIdsInDataFrame(df, proteinIdType)
df |
A data frame containing protein information. |
proteinIdType |
A character string specifying the type of protein ID. It can be either "Uniprot" or "Uniprot_Mnemonic". |
A data frame with populated Uniprot IDs.
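The two accepted values of proteinIdType correspond to two identifier styles: accession-style IDs and entry-name (mnemonic) IDs such as "CLH1_HUMAN". The sketch below is a rough heuristic for telling them apart before choosing a value; guess_protein_id_type is a hypothetical helper, not part of the package, and the regular expression is an assumption based on the usual NAME_SPECIES shape of mnemonic IDs.

```r
# Hypothetical helper (not part of MSstatsBioNet): guess which
# proteinIdType value fits each identifier.
# Assumption: mnemonic IDs look like NAME_SPECIES (e.g. "CLH1_HUMAN");
# anything else is treated as an accession-style ID.
guess_protein_id_type <- function(ids) {
  is_mnemonic <- grepl("^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$", ids)
  ifelse(is_mnemonic, "Uniprot_Mnemonic", "Uniprot")
}

id_types <- guess_protein_id_type(c("CLH1_HUMAN", "Q00610"))
print(id_types)
```

A check like this only suggests which value to pass; the function itself still expects a single proteinIdType for the whole data frame.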
This function validates the input data frame for the annotateProteinInfoFromIndra function.
.validateAnnotateProteinInfoFromIndraInput(df)
df |
A data frame containing protein information. |
None. Throws an error if validation fails.
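The exact checks performed by this internal validator are not listed here. As an illustration of the "return nothing, throw on failure" pattern it describes, the sketch below uses a hypothetical validate_protein_df function; the specific rules (input is a data frame and has a Protein column) are assumptions inferred from the annotateProteinInfoFromIndra example, not the package's documented behavior.

```r
# Hypothetical validator illustrating the pattern; the actual checks
# in the package may differ.
validate_protein_df <- function(df) {
  if (!is.data.frame(df)) {
    stop("Input must be a data frame.")
  }
  # Assumption: a 'Protein' column is required, as in the package example.
  if (!"Protein" %in% colnames(df)) {
    stop("Input data frame must contain a 'Protein' column.")
  }
  invisible(NULL)
}

# Valid input passes silently.
ok <- validate_protein_df(data.frame(Protein = "CLH1_HUMAN"))

# Invalid input raises an error, captured here with tryCatch.
msg <- tryCatch(
  validate_protein_df(data.frame(Gene = "CLTC")),
  error = function(e) conditionMessage(e)
)
print(msg)
```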
This function annotates a data frame with protein information from INDRA.
annotateProteinInfoFromIndra(df, proteinIdType)
df |
output of |
proteinIdType |
A character string specifying the type of protein ID. It can be either "Uniprot" or "Uniprot_Mnemonic". |
A data frame with the following columns:
Character. The original protein identifier.
Character. The Uniprot ID of the protein.
Character. The HGNC ID of the protein.
Character. The HGNC name of the protein.
Logical. Indicates if the protein is a transcription factor.
Logical. Indicates if the protein is a kinase.
Logical. Indicates if the protein is a phosphatase.
df <- data.frame(Protein = c("CLH1_HUMAN"))
annotated_df <- annotateProteinInfoFromIndra(df, "Uniprot_Mnemonic")
head(annotated_df)
Using differential abundance results from MSstats, this function retrieves a subnetwork of protein interactions from the INDRA database.
getSubnetworkFromIndra(
  input,
  protein_level_data = NULL,
  pvalueCutoff = NULL,
  statement_types = c("IncreaseAmount", "DecreaseAmount"),
  paper_count_cutoff = 1,
  evidence_count_cutoff = 1,
  correlation_cutoff = 0.3
)
input |
output of |
protein_level_data |
output of the |
pvalueCutoff |
p-value cutoff for filtering. Default is NULL, i.e. no filtering |
statement_types |
list of interaction types to filter on. Equivalent to statement type in INDRA. Default is c("IncreaseAmount", "DecreaseAmount"). |
paper_count_cutoff |
number of papers to filter on. Default is 1. |
evidence_count_cutoff |
number of pieces of evidence to filter on for each paper. For example, one paper may contain 5 sentences describing the same interaction, while another contains only 1. Default is 1. |
correlation_cutoff |
if protein_level_data is not NULL, filter out edges whose correlation is less than this cutoff. Default is 0.3. |
list of 2 data.frames, nodes and edges
input <- data.table::fread(system.file(
  "extdata/groupComparisonModel.csv",
  package = "MSstatsBioNet"
))
subnetwork <- getSubnetworkFromIndra(input)
head(subnetwork$nodes)
head(subnetwork$edges)
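To make the filtering arguments more concrete, here is a minimal base R sketch of how cutoffs like these could be applied to a toy network. It is not the package's implementation: filter_subnetwork is a hypothetical helper, the paperCount column is an assumed name, and the comparison directions (p-value below the cutoff, counts at or above their cutoffs) are assumptions.

```r
# Toy nodes and edges (made-up values for illustration only).
nodes <- data.frame(
  id = c("A", "B", "C", "D"),
  pvalue = c(0.01, 0.20, 0.03, 0.04),
  logFC = c(1.2, -0.1, -0.8, 0.3),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  source = c("A", "A", "C", "B"),
  target = c("C", "B", "D", "D"),
  interaction = c("IncreaseAmount", "DecreaseAmount",
                  "Phosphorylation", "IncreaseAmount"),
  evidenceCount = c(5, 1, 3, 2),
  paperCount = c(2, 1, 3, 1), # hypothetical column name
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented argument names.
filter_subnetwork <- function(nodes, edges,
                              pvalueCutoff = NULL,
                              statement_types = c("IncreaseAmount",
                                                  "DecreaseAmount"),
                              paper_count_cutoff = 1,
                              evidence_count_cutoff = 1) {
  # NULL means no p-value filtering, as in the documented default.
  if (!is.null(pvalueCutoff)) {
    nodes <- nodes[nodes$pvalue < pvalueCutoff, , drop = FALSE]
  }
  # Keep edges of the requested types that meet both count cutoffs
  # and whose endpoints both survived the node filter.
  keep <- edges$interaction %in% statement_types &
    edges$paperCount >= paper_count_cutoff &
    edges$evidenceCount >= evidence_count_cutoff &
    edges$source %in% nodes$id &
    edges$target %in% nodes$id
  list(nodes = nodes, edges = edges[keep, , drop = FALSE])
}

toy <- filter_subnetwork(nodes, edges,
                         pvalueCutoff = 0.05,
                         paper_count_cutoff = 2)
print(toy$nodes)
print(toy$edges)
```

Returning a list with nodes and edges mirrors the documented return shape, so a result of this form could be inspected the same way as the output of getSubnetworkFromIndra.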
Use results from INDRA to generate a visualization of a network in Cytoscape Desktop. Note that the Cytoscape Desktop app must be open for this function to work.
visualizeNetworks(
  nodes,
  edges,
  pvalueCutoff = 0.05,
  logfcCutoff = 0.5,
  node_label_column = "id",
  main_targets = c()
)
nodes |
dataframe of nodes consisting of columns id (character), pvalue (number), logFC (number) |
edges |
dataframe of edges consisting of columns source (character), target (character), interaction (character), evidenceCount (number), evidenceLink (character) |
pvalueCutoff |
p-value cutoff for coloring significant proteins. Default is 0.05 |
logfcCutoff |
log fold change cutoff for coloring significant proteins. Default is 0.5 |
node_label_column |
The column of the nodes dataframe to use as the node label. Default is "id". "hgncName" can be used for gene name. |
main_targets |
character vector of main targets to highlight with a different node shape. Default is an empty vector c(). IDs of main targets should match the column used by the node_label_column parameter. |
Cytoscape visualization of the subnetwork
input <- data.table::fread(system.file(
  "extdata/groupComparisonModel.csv",
  package = "MSstatsBioNet"
))
subnetwork <- getSubnetworkFromIndra(input)
visualizeNetworks(subnetwork$nodes, subnetwork$edges)
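The pvalueCutoff and logfcCutoff arguments determine which proteins are colored as significant. The sketch below shows one plausible way such a rule could be applied to a nodes data frame with the documented columns; classify_nodes is a hypothetical helper, and the rule itself (p-value below the cutoff and absolute log fold change above the cutoff) is an assumption rather than the package's confirmed logic.

```r
# Hypothetical helper: label each node as "up", "down", or
# "not significant" under an assumed significance rule.
classify_nodes <- function(nodes, pvalueCutoff = 0.05, logfcCutoff = 0.5) {
  significant <- nodes$pvalue < pvalueCutoff &
    abs(nodes$logFC) > logfcCutoff
  ifelse(significant & nodes$logFC > 0, "up",
         ifelse(significant & nodes$logFC < 0, "down", "not significant"))
}

# Toy nodes with the documented columns (made-up values).
nodes <- data.frame(
  id = c("A", "B", "C", "D"),
  pvalue = c(0.01, 0.20, 0.03, 0.04),
  logFC = c(1.2, -0.1, -0.8, 0.3),
  stringsAsFactors = FALSE
)
nodes$status <- classify_nodes(nodes)
print(nodes)
```

Node D in this toy table has a small p-value but a log fold change inside the cutoff, so under the assumed rule it is not flagged; this is the kind of case where adjusting logfcCutoff changes the picture.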