Title: | STRINGdb - Protein-Protein Interaction Networks and Functional Enrichment Analysis |
---|---|
Description: | The STRINGdb package provides an R interface to the STRING protein-protein interactions database (https://string-db.org). |
Authors: | Andrea Franceschini <[email protected]> |
Maintainer: | Damian Szklarczyk <[email protected]> |
License: | GPL-2 |
Version: | 2.19.0 |
Built: | 2024-10-31 05:32:33 UTC |
Source: | https://github.com/bioc/STRINGdb |
Takes as input a data frame containing a logFC column that reports the log fold change in expression level. Adds a "color" column to the data frame such that strongly downregulated genes are colored green and strongly upregulated genes are colored red. When the down- or upregulation is weak, the color intensity is correspondingly weaker.
## S4 method for signature 'STRINGdb'
add_diff_exp_color(screen, logFcColStr="logFC")
screen |
Dataframe containing the results of the experiment (e.g. the analyzed results of a microarray or RNAseq experiment) |
logFcColStr |
name of the column that contains the logFC of the expression |
vector containing the colors
Andrea Franceschini
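The color scheme described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `logfc_to_color`, the linear scaling by the largest absolute logFC, and the fade toward white are all assumptions, and the input is assumed to contain no missing values.

```r
# Hypothetical helper (not part of STRINGdb): map logFC values to hex colors.
logfc_to_color <- function(logFC) {
  max_abs <- max(abs(logFC))
  if (max_abs == 0) max_abs <- 1        # avoid division by zero when all values are 0
  intensity <- abs(logFC) / max_abs     # 0 = no change, 1 = strongest change
  fade <- 1 - intensity                 # weaker changes fade toward white
  ifelse(logFC >= 0,
         rgb(1, fade, fade),            # upregulated: shades of red
         rgb(fade, 1, fade))            # downregulated: shades of green
}

# Toy data, made up for illustration
screen <- data.frame(gene = c("A", "B", "C"), logFC = c(-2, 0, 2))
screen$color <- logfc_to_color(screen$logFC)
print(screen)
```

Scaling by the largest absolute value keeps the strongest change in the data set at full intensity; a fixed cap on logFC would be an equally reasonable choice.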
Adds description columns for the proteins that are present in the input data frame. The data frame must contain a column named "STRING_id".
## S4 method for signature 'STRINGdb'
add_proteins_description(screen)
screen |
Dataframe containing the results of the experiment (e.g. the analyzed results of a microarray or RNAseq experiment) |
Returns the input data frame with additional columns containing a description of the proteins.
Andrea Franceschini
coefficient of variation
coeffOfvar(x)
x |
input number |
coefficient of variation
Andrea Franceschini
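As a point of reference, the coefficient of variation is conventionally the standard deviation divided by the mean. The sketch below assumes that definition and a numeric vector as input; the helper name `cv_sketch` is hypothetical, and the package's `coeffOfvar` may differ in detail.

```r
# Hypothetical helper (not part of STRINGdb): sample standard deviation over mean.
cv_sketch <- function(x) {
  sd(x) / mean(x)
}

# Toy values, made up for illustration
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv_value <- cv_sketch(values)
print(cv_value)
```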
delete a column in the data frame
delColDf(df, colName)
df |
data frame |
colName |
name of the column to be deleted |
data frame
Andrea Franceschini
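For readers who want the equivalent operation in base R, the following sketch removes a column by name. The helper name `drop_column` is hypothetical and the package's `delColDf` may be implemented differently.

```r
# Hypothetical helper (not part of STRINGdb): drop one column by name.
drop_column <- function(df, colName) {
  # drop = FALSE keeps the result a data frame even if one column remains
  df[, names(df) != colName, drop = FALSE]
}

# Toy data, made up for illustration
example_df <- data.frame(gene = c("A", "B"), pvalue = c(0.01, 0.2), logFC = c(1.5, -0.3))
trimmed <- drop_column(example_df, "pvalue")
print(names(trimmed))
```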
example of microarray data (data processed from GEO GSE9008)
data(diff_exp_example1)
A data frame with 20861 observations on the following 3 variables.
gene
a character vector
pvalue
a numeric vector
logFC
a numeric vector
Whyte L, Huang YY, Torres K, Mehta RG. Molecular mechanisms of resveratrol action in lung cancer cells using dual protein and microarray analyses. Cancer Res 2007.
download a file only if it is not present.
downloadAbsentFile(urlStr, oD = tempdir())
urlStr |
url from which to download the file |
oD |
directory where to store the file |
Andrea Franceschini
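The download-only-if-absent pattern can be sketched with base R as follows. The helper name `download_if_absent`, the use of the URL's base name as the local file name, and the example URL are assumptions made for illustration; the package's own function may name or check files differently.

```r
# Hypothetical helper (not part of STRINGdb): download only when the file is missing.
download_if_absent <- function(urlStr, oD = tempdir()) {
  dest <- file.path(oD, basename(urlStr))
  if (!file.exists(dest)) {
    download.file(urlStr, destfile = dest, mode = "wb", quiet = TRUE)
  }
  dest  # path of the local copy
}

# Demonstration without network access: the file already exists,
# so the (placeholder) URL is never contacted.
demo_dir <- tempfile("string_cache_")
dir.create(demo_dir)
writeLines("cached", file.path(demo_dir, "proteins.txt"))
cached_path <- download_if_absent("https://example.invalid/proteins.txt", oD = demo_dir)
print(readLines(cached_path))
```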
download a STRING file only if it is not present or if it is corrupted.
downloadAbsentFileSTRING(urlStr, oD = tempdir())
urlStr |
url from which to download the file |
oD |
directory where to store the file |
Andrea Franceschini
Loads and returns the STRING alias table.
## S4 method for signature 'STRINGdb'
get_aliases()
a data frame containing the STRING alias table
Andrea Franceschini
Loads and returns STRING annotations (i.e. GO annotations, KEGG pathways, domain databases). The annotations are stored in the "annotations" variable.
## S4 method for signature 'STRINGdb'
get_annotations()
a data frame containing the annotations to the STRING proteins (e.g. GeneOntology, KEGG pathways, InterPro domains)
Andrea Franceschini
Returns a data frame with the description of every STRING annotation term (it downloads and caches the information the first time it is called).
## S4 method for signature 'STRINGdb'
get_annotations_desc()
data frame with the description of every STRING annotation term.
Andrea Franceschini
Returns the interaction graph as an object of the graph package in Bioconductor.
## S4 method for signature 'STRINGdb'
get_bioc_graph()
interaction graph as an object of the graph package in Bioconductor.
Andrea Franceschini
Returns a list of clusters of interacting proteins. See the igraph documentation (http://igraph.sourceforge.net/) for additional information on the algorithms.
## S4 method for signature 'STRINGdb'
get_clusters(string_ids, algorithm="fastgreedy")
string_ids |
a vector of STRING identifiers. |
algorithm |
algorithm to use for the clustering. You can choose between "fastgreedy", "walktrap", "spinglass" and "edge.betweenness". |
list of clusters of interacting proteins.
Andrea Franceschini
Returns the enrichment in pathways of the vector of STRING proteins that is given in input.
## S4 method for signature 'STRINGdb'
get_enrichment(string_ids, category = "Process", methodMT = "fdr", iea = TRUE, minScore=NULL)
string_ids |
a vector of STRING identifiers. |
category |
category for which to compute the enrichment (i.e. "Process", "Component", "Function", "KEGG", "Pfam", "InterPro"). The default category is "Process". |
methodMT |
method to be used for the multiple testing correction. (i.e. "fdr", "bonferroni"). The default is "fdr". |
iea |
specify whether you also want to use electronically inferred annotations (IEA) |
minScore |
with the Tissue and Disease categories, it is possible to keep only the annotations that have an annotation score higher than this threshold (from 0 to 5) |
Data frame containing the enrichment in pathways of the vector of STRING proteins that is given in input.
Andrea Franceschini
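To make the role of `methodMT` concrete, the sketch below computes over-representation p-values and then applies the two correction methods named above using base R's `p.adjust`. The choice of a hypergeometric test and all counts are assumptions made for illustration; the statistics actually used by `get_enrichment` are not specified here and may differ.

```r
# Toy counts, made up for illustration
background_size <- 1000                                 # proteins in the background
input_size <- 50                                        # proteins in the input list
term_sizes <- c(termA = 100, termB = 20, termC = 200)   # background proteins annotated to each term
term_hits  <- c(termA = 20,  termB = 1,  termC = 10)    # input proteins annotated to each term

# Probability of seeing at least the observed number of hits by chance
# (upper tail of the hypergeometric distribution; an assumed test)
pvalues <- phyper(term_hits - 1, term_sizes, background_size - term_sizes,
                  input_size, lower.tail = FALSE)

enrichment <- data.frame(
  term = names(term_sizes),
  pvalue = as.numeric(pvalues),
  pvalue_fdr = as.numeric(p.adjust(pvalues, method = "fdr")),
  pvalue_bonferroni = as.numeric(p.adjust(pvalues, method = "bonferroni"))
)
print(enrichment)
```

Bonferroni is the more conservative of the two corrections, so its adjusted p-values are never smaller than the FDR-adjusted ones.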
Returns an igraph object with the STRING network (for information about igraph, visit http://igraph.sourceforge.net).
## S4 method for signature 'STRINGdb'
get_graph()
igraph object with the STRING network
Andrea Franceschini
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.sf.net
To simplify the most common tasks, the package also provides convenience functions that wrap some igraph functions:
get_interactions(string_ids) # returns the interactions between the input proteins
get_neighbors(string_ids) # returns the neighborhoods of a protein (or of a vector of proteins) given as input
get_subnetwork(string_ids) # returns a subgraph from the given input proteins
Returns the list of closest homologs (as measured by bitscore) of the given input identifiers in all STRING species or single target species.
## S4 method for signature 'STRINGdb'
get_homologs_besthits(string_ids, target_species_id=NULL)
string_ids |
a vector of STRING identifiers. |
target_species_id |
NCBI taxonomy identifier of the species to query for homologs (the species must be present in the STRING database) |
Data frame containing the best blast hits x species of the given input identifiers.
Andrea Franceschini
Shows the interactions between the proteins that are given as input.
## S4 method for signature 'STRINGdb'
get_interactions(string_ids)
string_ids |
a vector of STRING identifiers |
Data frame containing the interactions between the input proteins.
Andrea Franceschini
Returns a short link to the network page of our STRING website that shows the protein interactions between the given identifiers.
## S4 method for signature 'STRINGdb'
get_link(string_ids, required_score=NULL, network_flavor="evidence", payload_id = NULL)
string_ids |
a vector of STRING identifiers. |
required_score |
minimum STRING combined score of the interactions (if left NULL we get the combined score of the object, which is 400 by default). |
network_flavor |
specify the flavor of the network ("evidence", "confidence" or "actions". default "evidence"). |
payload_id |
an identifier of payload data on the STRING server (see the post_payload method for additional information) |
short link to the network page of our STRING website that shows the protein interactions between the input identifiers.
Andrea Franceschini
Get the neighborhoods of a protein (or of a vector of proteins) that is given in input.
## S4 method for signature 'STRINGdb'
get_neighbors(string_ids)
string_ids |
a vector of STRING identifiers |
vector containing the neighborhoods of a protein (or of a vector of proteins) that is given in input.
Andrea Franceschini
Returns the list of paralogs of the given input in their species.
## S4 method for signature 'STRINGdb'
get_paralogs(string_ids)
string_ids |
a vector of STRING identifiers. |
Data frame containing the best blast hits x species of the given input identifiers.
Andrea Franceschini
Returns a png image of a STRING protein network with the given identifiers.
## S4 method for signature 'STRINGdb'
get_png(string_ids, required_score=NULL, network_flavor="evidence", file=NULL, payload_id=NULL)
string_ids |
a vector of STRING identifiers. |
required_score |
minimum STRING combined score of the interactions (if left NULL we get the combined score of the object, which is 400 by default). |
network_flavor |
specify the flavor of the network ("evidence", "confidence" or "actions". default "evidence"). |
file |
file where to save the image |
payload_id |
identifier of the payload |
Returns a png image of a STRING protein network with the given identifiers.
Andrea Franceschini
Returns a pvalue representing the enrichment in interactions of the list of proteins (i.e. the probability to obtain such a number of interactions by chance).
## S4 method for signature 'STRINGdb'
get_ppi_enrichment(string_ids)
string_ids |
a vector of STRING identifiers |
Returns a pvalue representing the enrichment in interactions of the list of proteins (i.e. the probability to obtain such a number of interactions by chance).
Andrea Franceschini
Returns the STRING proteins data frame (it downloads and caches the information the first time it is called).
## S4 method for signature 'STRINGdb'
get_proteins()
STRING proteins data frame.
Andrea Franceschini
Returns the subgraph generated by the given input proteins.
## S4 method for signature 'STRINGdb'
get_subnetwork(string_ids)
string_ids |
a vector of STRING identifiers |
Returns the subgraph (i.e. an iGraph object) generated by the given input proteins.
Andrea Franceschini
Returns a summary of the STRING sub-network containing the identifiers provided in input.
## S4 method for signature 'STRINGdb'
get_summary(string_ids)
string_ids |
a vector of STRING identifiers |
Returns a summary (i.e. a text description) of the STRING sub-network containing the identifiers provided in input.
Andrea Franceschini
Returns the proteins annotated to belong to a given term.
## S4 method for signature 'STRINGdb'
get_term_proteins(term_ids, string_ids=NULL, enableIEA=TRUE)
term_ids |
vector of terms |
string_ids |
a vector of STRING identifiers. If the variable is set, the method returns only the proteins that are present in this vector. |
enableIEA |
whether to also consider electronically inferred annotations (IEA) |
Returns the proteins annotated to belong to a given term.
Andrea Franceschini
example of a sorted list of protein-protein interactions, resulting from our co-occurrence algorithm (SVD_Phy)
data(interactions_example)
Data frames with 20861 observations on the following 3 variables.
proteinA
a character vector
proteinB
a character vector
score
a numeric vector
Downloads and returns the STRING network (the network is set also in the graph variable of the STRING_db object).
It makes use of the following variables:
"backgroundV": vector containing STRING identifiers to be used as background (i.e. the loaded STRING network will contain only the proteins that are also present in this vector)
"score_threshold": STRING combined score threshold (the loaded network contains only interactions with a combined score greater than this threshold)
## S4 method for signature 'STRINGdb'
load()
STRING network (i.e. an igraph object; for more information, see http://igraph.sourceforge.net)
Andrea Franceschini
Forces the download and loading of all the files (so that you can later store the object on the hard disk if you like).
It makes use of the following variables:
"backgroundV": vector containing STRING identifiers to be used as background (i.e. the loaded STRING network will contain only the proteins that are also present in this vector)
"score_threshold": STRING combined score threshold (the loaded network contains only interactions with a combined score greater than this threshold)
## S4 method for signature 'STRINGdb'
load_all()
Andrea Franceschini
Maps the gene identifiers of the input dataframe to STRING identifiers. It returns the input dataframe with the "STRING_id" additional column.
## S4 method for signature 'STRINGdb'
map(my_data_frame, my_data_frame_id_col_names, takeFirst=TRUE, removeUnmappedRows=FALSE, quiet=FALSE)
my_data_frame |
data frame provided as input. |
my_data_frame_id_col_names |
vector containing the names of the columns of "my_data_frame" that have to be used for the mapping. |
takeFirst |
boolean indicating what to do in case of multiple STRING proteins that map to the same name. If TRUE, only the first of those is taken. Otherwise all of them are used. (default TRUE) |
removeUnmappedRows |
remove the rows that cannot be mapped to STRING (by default those lines are left and their STRING_id is set to NA). |
quiet |
Set this variable to TRUE to suppress the warning about unmapped values. |
Returns the dataframe that is given in input with the "STRING_id" additional column.
Andrea Franceschini
Maps the gene identifiers of the input vector to STRING identifiers (using a take first approach). It returns a vector with the STRING identifiers of the mapped proteins.
## S4 method for signature 'STRINGdb'
mp(protein_aliases)
protein_aliases |
vector of protein aliases that we want to convert to STRING identifiers |
It returns a vector with the STRING identifiers of the mapped proteins.
Andrea Franceschini
mapping function (it adds the ability to map using more than one column of the data frame)
multi_map_df(dfToMap, dfMap, strColsFrom, strColFromDfMap, strColToDfMap, caseSensitive=FALSE)
dfToMap |
input data frame (that contains the columns that need to be mapped) |
dfMap |
data frame containing the mapping data |
strColsFrom |
sorted vector containing the names of the columns to be used in the input data frame for the mapping (the order of the elements in the vector defines the priority for the mapping) |
strColFromDfMap |
name of the column in the mapping data frame to be used as source for the mapping |
strColToDfMap |
name of the column in the mapping data frame to be used as target for the mapping |
caseSensitive |
specify whether the mapping should be case sensitive |
data frame with an additional column containing the result of the mapping
Andrea Franceschini
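The priority-based mapping described by `strColsFrom` can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the helper name `map_with_priority` is hypothetical, matching is always case-insensitive here (mirroring the `caseSensitive=FALSE` default), the first match wins, and the identifiers are made-up placeholders.

```r
# Hypothetical helper (not part of STRINGdb): try each source column in priority order.
map_with_priority <- function(dfToMap, dfMap, strColsFrom, strColFromDfMap, strColToDfMap) {
  mapped <- rep(NA_character_, nrow(dfToMap))
  for (col in strColsFrom) {
    unmapped <- is.na(mapped)                    # rows not resolved by earlier columns
    idx <- match(toupper(dfToMap[[col]][unmapped]),
                 toupper(dfMap[[strColFromDfMap]]))
    mapped[unmapped] <- as.character(dfMap[[strColToDfMap]][idx])
  }
  dfToMap[[strColToDfMap]] <- mapped             # NA where no column could be mapped
  dfToMap
}

# Toy data with placeholder identifiers, made up for illustration
to_map <- data.frame(symbol = c("TP53", "unknown1", "BRCA1"),
                     alias = c("p53", "MDM2", "unknown2"),
                     stringsAsFactors = FALSE)
mapping <- data.frame(name = c("tp53", "mdm2", "brca1"),
                      target_id = c("id_tp53", "id_mdm2", "id_brca1"),
                      stringsAsFactors = FALSE)
result <- map_with_priority(to_map, mapping, c("symbol", "alias"), "name", "target_id")
print(result)
```

The second row shows the fallback: its `symbol` is not in the mapping table, so the `alias` column is used instead.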
Plots an image of the STRING network with the given proteins.
## S4 method for signature 'STRINGdb'
plot_network(string_ids, payload_id=NULL, required_score=NULL, add_link=TRUE, add_summary=TRUE)
string_ids |
a vector of STRING identifiers |
payload_id |
an identifier of payload data on the STRING server (see the post_payload method for additional information) |
required_score |
a threshold on the score that overrides the default score_threshold, that we use only for the picture |
add_link |
parameter to specify whether you want to generate and add a short link to the corresponding page in STRING. This option is active by default, but we suggest deactivating it when generating many images (e.g. in a loop). Deactivating this option avoids generating and storing many short URLs on our server. |
add_summary |
parameter to specify whether you want to add a summary text to the picture. This summary includes a p-value and the number of proteins/interactions. |
Andrea Franceschini
Posts the input to STRING and returns an identifier that you can use to access the payload on our website.
## S4 method for signature 'STRINGdb'
post_payload(stringIds, colors=NULL, comments=NULL, links=NULL, iframe_urls=NULL, logo_imgF=NULL, legend_imgF=NULL)
stringIds |
vector of STRING identifiers. |
colors |
vector containing the colors to use for every STRING identifier (the order of the elements must match those in the stringIds vector) |
comments |
vector containing the comments to use for every STRING identifier (the order of the elements must match those in the stringIds vector) |
links |
vector containing the links to use for every STRING identifier (the order of the elements must match those in the stringIds vector) |
iframe_urls |
vector containing the URLs of the iframes to use for every STRING identifier (the order of the elements must match those in the stringIds vector) |
logo_imgF |
path to a file containing the logo image to be displayed on the STRING website |
legend_imgF |
path to a file containing a legend image to be displayed on the STRING website |
identifier of the payload.
Andrea Franceschini
With this method it is possible to remove the interactions that are composed of a pair of homologous/similar proteins, i.e. proteins whose similarity bitscore with each other is higher than a threshold.
## S4 method for signature 'STRINGdb'
remove_homologous_interactions(interactions_dataframe, bitscore_threshold = 60)
interactions_dataframe |
a data frame containing the sorted interactions to be benchmarked. The data frame should have the following column names: proteinA, proteinB, score |
bitscore_threshold |
filter out pairs of homologous proteins, having a similarity bitscore higher than this parameter |
The input interactions data frame with the homologous pairs removed.
Andrea Franceschini
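The filtering idea can be sketched in base R with a small, made-up homology table. The `homology` data frame, its `bitscore` column, and the assumption that pairs appear in the same orientation in both tables are hypothetical; the package obtains homology information from STRING and may handle these details differently.

```r
# Toy interactions, sorted by score (made up for illustration)
interactions <- data.frame(proteinA = c("P1", "P2", "P3"),
                           proteinB = c("P4", "P5", "P6"),
                           score = c(0.9, 0.8, 0.7),
                           stringsAsFactors = FALSE)

# Hypothetical homology table: only P2/P5 are similar to each other
homology <- data.frame(proteinA = "P2", proteinB = "P5", bitscore = 250,
                       stringsAsFactors = FALSE)

bitscore_threshold <- 60

# Attach the bitscore where one is known (NA means no detected similarity)
merged <- merge(interactions, homology, by = c("proteinA", "proteinB"), all.x = TRUE)

# Keep pairs with no similarity or with a bitscore at or below the threshold
keep <- is.na(merged$bitscore) | merged$bitscore <= bitscore_threshold
filtered <- merged[keep, c("proteinA", "proteinB", "score")]

# merge() reorders rows, so restore the original sorting by score
filtered <- filtered[order(filtered$score, decreasing = TRUE), ]
print(filtered)
```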
Rename a column of a data frame
renameColDf(df, colOldName, colNewName)
df |
input data frame |
colOldName |
column name to be changed |
colNewName |
new column name |
data frame with the column name changed
Andrea Franceschini
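An equivalent rename in base R looks like the sketch below. The helper name `rename_column` is hypothetical and the package's `renameColDf` may be implemented differently.

```r
# Hypothetical helper (not part of STRINGdb): rename one column by name.
rename_column <- function(df, colOldName, colNewName) {
  names(df)[names(df) == colOldName] <- colNewName
  df
}

# Toy data, made up for illustration
scores <- data.frame(gene = c("A", "B"), logFC = c(1.5, -0.3))
renamed <- rename_column(scores, "logFC", "log_fold_change")
print(names(renamed))
```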
With this method you can specify a vector of proteins to be used as background. The network is reloaded and only the proteins that are present in the background vector are inserted in the graph. In addition, the background is taken into consideration for all the enrichment statistics.
## S4 method for signature 'STRINGdb'
set_background(background_vector)
background_vector |
vector of STRING protein identifiers |
Andrea Franceschini
"STRINGdb"
The R package STRINGdb provides a convenient interface to the STRING protein-protein interactions database for R/Bioconductor users. Please see the manual/vignette for additional information and examples on how to use the package. STRING is a database of known and predicted protein-protein interactions. It contains information from numerous sources, including experimental repositories, computational prediction methods and public text collections. Each interaction is associated with a combined confidence score that integrates the various lines of evidence. STRING is regularly updated; the latest version, 9.05, contains information on 5 million proteins from more than 1100 species. The STRING web interface is freely accessible at: http://string-db.org/
All reference classes extend and inherit methods from "envRefClass".
annotations: Object of class data.frame
annotations_description: Object of class data.frame
graph: Object of class igraph
proteins: Object of class data.frame
speciesList: Object of class data.frame
species: Object of class numeric
version: Object of class character
input_directory: Object of class character
backgroundV: Object of class vector
score_threshold: Object of class numeric
set_background(background_vector)
post_payload(stringIds, colors, comments, links, iframe_urls, logo_imgF, legend_imgF)
plot_network(string_ids, payload_id, required_score)
plot_ppi_enrichment(string_ids, file, sliceWindow, edgeWindow, windowExtendedReferenceThreshold, minVal, title)
map(my_data_frame, my_data_frame_id_col_names, takeFirst, removeUnmappedRows, quiet)
load()
get_term_proteins(term_ids, string_ids, enableIEA)
get_summary(string_ids)
get_subnetwork(string_ids)
get_ppi_enrichment_full(string_ids, sliceWindow, edgeWindow, windowExtendedReferenceThreshold, growingWindowLimit)
get_ppi_enrichment(string_ids)
get_proteins()
get_png(string_ids, required_score, network_flavor, file, payload_id)
get_neighbors(string_ids)
get_link(string_ids, required_score, network_flavor, payload_id)
get_interactions(string_ids)
get_homologs_besthits(string_ids, symbets, target_species_id, bitscore_threshold)
get_homologs(string_ids, target_species_id, bitscore_threshold)
get_graph()
get_enrichment(string_ids, category, methodMT, iea)
get_clusters(string_ids, algorithm)
get_annotations_desc()
get_annotations()
load_all()
initialize(...)
add_proteins_description(screen)
add_diff_exp_color(screen, logFcColStr)
show()
Andrea Franceschini
Franceschini, A (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. In:'Nucleic Acids Res. 2013 Jan;41(Database issue):D808-15. doi: 10.1093/nar/gks1094. Epub 2012 Nov 29'.
http://stitch-db.org
showClass("STRINGdb")