Title: | Visualization and management of the protein-protein interaction networks. |
---|---|
Description: | cisPath is an R package that uses web browsers to visualize and manage protein-protein interaction networks. |
Authors: | Likun Wang <[email protected]> |
Maintainer: | Likun Wang <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.47.0 |
Built: | 2024-12-27 03:57:16 UTC |
Source: | https://github.com/bioc/cisPath |
This method is used to identify and visualize the shortest functional paths between proteins in the protein–protein interaction (PPI) network.
cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="", nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE) ## S4 method for signature 'character,character' cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="", nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE)
cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="", nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE) ## S4 method for signature 'character,character' cisPath(infoFile, outputDir, proteinName=NULL, targetProteins=NULL, swissProtID=FALSE, sprotFile="", tremblFile="", nodeColors=c("#1F77B4", "#FF7F0E", "#D62728", "#9467BD", "#8C564B", "#E377C2"), leafColor="#2CA02C", byStep=FALSE)
infoFile |
File that contains PPI data (character(1)). |
outputDir |
Output directory (character(1)). |
proteinName |
Gene name or Swiss-Prot accession number of the source protein (character(1)). |
targetProteins |
Gene names or Swiss-Prot accession numbers of the target proteins (character vector). |
swissProtID |
A logical value. If |
sprotFile |
Input: File downloaded from the UniProt database (UniProtKB/Swiss-Prot) (character(1)). |
tremblFile |
Input: File downloaded from the UniProt database (UniProtKB/TrEMBL) (character(1)). |
nodeColors |
Represents colors for main nodes in the graph. If there are fewer values than main nodes, it will be recycled in the standard manner. |
leafColor |
Represents color for leaf nodes in the graph. (character(1)) |
byStep |
A logical value. If users wish to identify the paths utilizing the shortest number of steps (instead of minimal cost), set |
The input PPI data file infoFile
should follow the format as the output files of the method formatSTRINGPPI
, formatPINAPPI
, or formatiRefIndex
.
See files STRINGPPI.txt
or PINAPPI.txt
as examples.
The first four fields contain the Swiss-Prot accession numbers and gene names for two interacting proteins.
The PubMedID
field should be stated to be NA
if unavailable.
The evidence
field may present an introduction to the evidence.
The edgeValue
field should be assigned a value no less than 1
.
This value will be treated as the cost while identifying the shortest paths.
If there is no method available to estimate this value, please give the value as 1
.
The shortest functional paths between the proteins are calculated using Dijkstra's algorithm.
The results are shown in an HTML file, and users can easily query them using a browser.
Each shortest path is displayed as a force-directed graph (http://bl.ocks.org/4062045) with JavaScript library D3 (www.d3js.org).
The HTML file follows HTML 4.01 Strict and CSS version 3 standards to maintain consistency across different browsers.
Chrome, Firefox, Safari, and IE9 will all properly display the PPI view.
Please contact us if the paths do not display correctly.
As an example, we have generated PPI interaction data for several species
from the PINA database (http://cbg.garvan.unsw.edu.au/pina/), STRING database (http://string-db.org/), and iRefIndex database (http://www.irefindex.org/wiki/).
Users can download these files from http://www.isb.pku.edu.cn/cisPath/.
If you make use of these files, please cite PINA, STRING, and iRefIndex accordingly.
Users can edit the PPI interactions generated with these two databases, or combine them with their private data to construct more complete PPI interaction networks.
In this package, we select only a small portion of the available PPI interaction data as an example.
An ID mapping file is also provided in this package, which was generated according to the data from the UniProt (http://www.uniprot.org/) database.
Using this method, a protein must be chosen as the source protein.
However, target proteins need not be chosen.
If target proteins are not provided, this method will identify the shortest paths between the source protein and all other relevant proteins.
The user can query the results upon finding an interesting “target” protein.
Although the output in such cases is large, we strongly suggest users give this method of selecting a source but not a target a try.
A protein often has several names, and some of these names have perhaps not been included in the input file infoFile
.
We therefore suggest users take a look at the output file targetIDs.txt
to check whether the input protein names are valid.
In order to avoid inputting invalid target protein names, the unique identifier Swiss-Prot accession numbers may alternatively be used as input.
The Swiss-Prot accession numbers can be sought in the UniProt (http://www.uniprot.org/) database. We strongly suggest users provide
the files from downloaded from the UniProt database (sprotFile
and tremblFile
).
All species:
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
Taxonomic divisions:
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/
uniprot_sprot_archaea.dat.gz and uniprot_trembl_archaea.dat.gz contain all archaea entries.
uniprot_sprot_bacteria.dat.gz and uniprot_trembl_bacteria.dat.gz contain all bacteria entries.
uniprot_sprot_fungi.dat.gz and uniprot_trembl_fungi.dat.gz contain all fungi entries.
uniprot_sprot_human.dat.gz and uniprot_trembl_human.dat.gz contain all human entries.
uniprot_sprot_invertebrates.dat.gz and uniprot_trembl_invertebrates.dat.gz contain all invertebrate entries.
uniprot_sprot_mammals.dat.gz and uniprot_trembl_mammals.dat.gz contain all mammalian entries except human and rodent entries.
uniprot_sprot_plants.dat.gz and uniprot_trembl_plants.dat.gz contain all plant entries.
uniprot_sprot_rodents.dat.gz and uniprot_trembl_rodents.dat.gz contain all rodent entries.
uniprot_sprot_vertebrates.dat.gz and uniprot_trembl_vertebrates.dat.gz contain all vertebrate entries except mammals.
uniprot_sprot_viruses.dat.gz and uniprot_trembl_viruses.dat.gz contain all eukaryotic entries except those from vertebrates, fungi and plants.
We suggest you take a look at the README file before you download these files.
If you make use of these files, please cite the UniProt database.
A list will be returned, and each element will contain the shortest paths from the source protein to a target protein.
The output directory contains the shortest paths from a source protein to the target proteins.
Users can search for the paths easily using a browser.
The file validInputProteins.txt
contains the proteins that are valid as input to the HTML file.
Please take a look at the output file targetIDs.txt
to check whether the input protein names to this method are valid.
Cowley, M.J. and et al. (2012) PINA v2.0: mining interactome modules. Nucleic Acids Res, 40, D862-865.
Wu, J. and et al. (2009) Integrated network analysis platform for protein-protein interactions. Nature methods, 6, 75-77.
Razick S. and et al. (2008) iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics, 9, 405
Aranda, B. and et al. (2011) PSICQUIC and PSISCORE: accessing and scoring molecular interactions, Nat Methods, 8, 528-529.
Szklarczyk,D. and et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.
Franceschini,A. and et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.
UniProt Consortium and others. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res, 40, D71-D75.
formatSTRINGPPI
, formatPINAPPI
, formatSIFfile
, formatiRefIndex
, combinePPI
, networkView
, easyEditor
.
# examples infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath") # source protein: TP53 # Identify the shortest functional paths from TP53 to all other relevant proteins outputDir <- file.path(tempdir(), "TP53_example1") results <- cisPath(infoFile, outputDir, "TP53", byStep=TRUE) # Identify the shortest paths from TP53 to proteins MAGI1 and GH1 outputDir <- file.path(tempdir(), "TP53_example2") results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("MAGI1", "GH1"), byStep=TRUE) results # Identify the shortest paths from TP53 to proteins Q96QZ7 and P01241 (with the Swiss-Prot accession numbers) outputDir <- file.path(tempdir(), "TP53_example3") results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("Q96QZ7", "P01241"), swissProtID=TRUE, byStep=FALSE) # Identify the shortest functional paths in the web page outputDir <- file.path(tempdir(), "cisPath_example") results <- cisPath(infoFile, outputDir) ## Not run: # example of downloading PPI data from our website # Change to your own output directory outputDir <- file.path(getwd(), "TP53") # Create the output directory dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # infoFile: site where the PPI data file will be saved. infoFile <- file.path(outputDir, "PPIdata.txt") # Download PPI data from our website download.file("http://www.isb.pku.edu.cn/cispath/data/Homo_sapiens_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Caenorhabditis_elegans_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Drosophila_melanogaster_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Mus_musculus_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Rattus_norvegicus_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Saccharomyces_cerevisiae_PPI.txt", infoFile) results <- cisPath(infoFile, outputDir, "TP53") outputDir <- file.path(getwd(), "cisPathWeb") results <- cisPath(infoFile, outputDir) ## End(Not run)
# examples infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath") # source protein: TP53 # Identify the shortest functional paths from TP53 to all other relevant proteins outputDir <- file.path(tempdir(), "TP53_example1") results <- cisPath(infoFile, outputDir, "TP53", byStep=TRUE) # Identify the shortest paths from TP53 to proteins MAGI1 and GH1 outputDir <- file.path(tempdir(), "TP53_example2") results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("MAGI1", "GH1"), byStep=TRUE) results # Identify the shortest paths from TP53 to proteins Q96QZ7 and P01241 (with the Swiss-Prot accession numbers) outputDir <- file.path(tempdir(), "TP53_example3") results <- cisPath(infoFile, outputDir, "TP53", targetProteins=c("Q96QZ7", "P01241"), swissProtID=TRUE, byStep=FALSE) # Identify the shortest functional paths in the web page outputDir <- file.path(tempdir(), "cisPath_example") results <- cisPath(infoFile, outputDir) ## Not run: # example of downloading PPI data from our website # Change to your own output directory outputDir <- file.path(getwd(), "TP53") # Create the output directory dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # infoFile: site where the PPI data file will be saved. infoFile <- file.path(outputDir, "PPIdata.txt") # Download PPI data from our website download.file("http://www.isb.pku.edu.cn/cispath/data/Homo_sapiens_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Caenorhabditis_elegans_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Drosophila_melanogaster_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Mus_musculus_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Rattus_norvegicus_PPI.txt", infoFile) download.file("http://www.isb.pku.edu.cn/cispath/data/Saccharomyces_cerevisiae_PPI.txt", infoFile) results <- cisPath(infoFile, outputDir, "TP53") outputDir <- file.path(getwd(), "cisPathWeb") results <- cisPath(infoFile, outputDir) ## End(Not run)
This method is used to combine the PPI information generated from different databases.
combinePPI(input, output, mappingFile="", dbNames="", maxEdgeValue=-1) ## S4 method for signature 'character,character' combinePPI(input, output, mappingFile="", dbNames="", maxEdgeValue=-1)
combinePPI(input, output, mappingFile="", dbNames="", maxEdgeValue=-1) ## S4 method for signature 'character,character' combinePPI(input, output, mappingFile="", dbNames="", maxEdgeValue=-1)
input |
Files that contain PPI information (character vector). |
output |
Output file (character(1)). |
mappingFile |
Identifier mapping file (character(1)). |
dbNames |
dbNames for input files (character vector). |
maxEdgeValue |
Filter out PPI information with edge cost greater than this value. (double(1)). |
The input files should follow the format as the output files of the method formatSTRINGPPI
, formatPINAPPI
, formatSIFfile
, or formatiRefIndex
.
See the files STRINGPPI.txt
or PINAPPI.txt
as examples.
The first four fields contain the Swiss-Prot accession numbers and gene names for two interacting proteins.
The PubMedID
field should be stated to be NA
if unavailable.
The evidence
field may present an introduction to the evidence.
The edgeValue
field should be given a value no less than 1
.
This value will be treated as the cost while identifying the shortest paths.
If there is no method available to estimate this value, please give the value as 1
.
In some cases, a protein may have several swiss-Prot accession numbers. The parameter mappingFile
is used
to unify the swiss-Prot accession numbers for proteins from different databases. The proteins whose swiss-Prot accession numbers are not included
in the mappingfile will be ignored as well as the intereractions associated to them.
formatSTRINGPPI
, formatPINAPPI
, formatSIFfile
, formatiRefIndex
, cisPath
, getMappingFile
.
library(cisPath) inputFile1 <- system.file("extdata", "PINA2PPI.txt", package="cisPath") inputFile2 <- system.file("extdata", "iRefIndex.txt", package="cisPath") inputFile3 <- system.file("extdata", "STRINGPPI.txt", package="cisPath") output <- file.path(tempdir(), "allPPI.txt") combinePPI(c(inputFile1, inputFile2, inputFile3), output, maxEdgeValue=1.4)
library(cisPath) inputFile1 <- system.file("extdata", "PINA2PPI.txt", package="cisPath") inputFile2 <- system.file("extdata", "iRefIndex.txt", package="cisPath") inputFile3 <- system.file("extdata", "STRINGPPI.txt", package="cisPath") output <- file.path(tempdir(), "allPPI.txt") combinePPI(c(inputFile1, inputFile2, inputFile3), output, maxEdgeValue=1.4)
This method is used to open the editor for network graphs.
easyEditor(outputDir) ## S4 method for signature 'character' easyEditor(outputDir)
easyEditor(outputDir) ## S4 method for signature 'character' easyEditor(outputDir)
outputDir |
Output directory (character(1)). |
The output HTML file of this method is a editor for network graphs. Users can draw and edit their own network with this editor.
library(cisPath) outputDir <- file.path(tempdir(), "easyEditor") easyEditor(outputDir)
library(cisPath) outputDir <- file.path(tempdir(), "easyEditor") easyEditor(outputDir)
This method is used to format the PPI file which is downloaded from the iRefIndex database.
formatiRefIndex(input, output, taxonId="") ## S4 method for signature 'character,character' formatiRefIndex(input, output, taxonId="")
formatiRefIndex(input, output, taxonId="") ## S4 method for signature 'character,character' formatiRefIndex(input, output, taxonId="")
input |
File downloaded from the iRefIndex database (character(1)). |
output |
Output file (character(1)). |
taxonId |
NCBI taxonomy specie identifier (character(1)). |
The input file is downloaded from the iRefIndex database (http://irefindex.org/wiki/).
Access http://irefindex.org/download/irefindex/data/archive/release_13.0/psi_mitab/MITAB2.6/ to download PPI files with the MITAB2.6 format
for different species.
If you make use of this file, please cite the iRefIndex database.
Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins.
The edge value will be assigned as 1
for each link between two interacting proteins.
This may be treated as the “cost” while identifying the shortest paths between proteins.
Advanced users can edit the file and change this value for each edge.
Razick S. and et al. (2008) iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics, 9, 405
Aranda, B. and et al. (2011) PSICQUIC and PSISCORE: accessing and scoring molecular interactions, Nat Methods, 8, 528-529.
cisPath
, formatPINAPPI
, formatSIFfile
, formatSTRINGPPI
, combinePPI
.
library(cisPath) input <- system.file("extdata", "9606.mitab.100.txt", package="cisPath") output <- file.path(tempdir(), "iRefIndex.txt") formatiRefIndex(input, output) ## Not run: outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download iRefIndex PPI for humans only (compressed:~105M, decompressed:~757M) destfile <- file.path(outputDir, "9606.mitab.08122013.txt.zip") cat("Downloading...\n") httpURL <- "http://irefindex.org/download" filePath <- "irefindex/data/archive/release_13.0/psi_mitab/MITAB2.6" fileName <- "9606.mitab.08122013.txt.zip" URL <- paste(httpURL, filePath, fileName, sep="/") download.file(URL, destfile) unzip(destfile, overwrite=TRUE, exdir=outputDir) # Format iRefIndex PPI fileFromiRef <- file.path(outputDir, "9606.mitab.08122013.txt") iRefIndex <- file.path(outputDir, "iRefIndex.txt") formatiRefIndex(fileFromiRef, output=iRefIndex) ## End(Not run)
library(cisPath) input <- system.file("extdata", "9606.mitab.100.txt", package="cisPath") output <- file.path(tempdir(), "iRefIndex.txt") formatiRefIndex(input, output) ## Not run: outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download iRefIndex PPI for humans only (compressed:~105M, decompressed:~757M) destfile <- file.path(outputDir, "9606.mitab.08122013.txt.zip") cat("Downloading...\n") httpURL <- "http://irefindex.org/download" filePath <- "irefindex/data/archive/release_13.0/psi_mitab/MITAB2.6" fileName <- "9606.mitab.08122013.txt.zip" URL <- paste(httpURL, filePath, fileName, sep="/") download.file(URL, destfile) unzip(destfile, overwrite=TRUE, exdir=outputDir) # Format iRefIndex PPI fileFromiRef <- file.path(outputDir, "9606.mitab.08122013.txt") iRefIndex <- file.path(outputDir, "iRefIndex.txt") formatiRefIndex(fileFromiRef, output=iRefIndex) ## End(Not run)
This method is used to format the PPI file which is downloaded from the PINA database (MITAB Format).
formatPINAPPI(input, output) ## S4 method for signature 'character,character' formatPINAPPI(input, output)
formatPINAPPI(input, output) ## S4 method for signature 'character,character' formatPINAPPI(input, output)
input |
File downloaded from the PINA database (character(1)). |
output |
Output file (character(1)). |
The input file (MITAB Format) is downloaded from the PINA database (http://cbg.garvan.unsw.edu.au/pina/).
Access http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do to download PPI files with the MITAB format
for different species.
If you make use of this file, please cite the PINA database.
Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins.
The edge value will be assigned as 1
for each link between two interacting proteins.
This may be treated as the “cost” while identifying the shortest paths between proteins.
Advanced users can edit the file and change this value for each edge.
Cowley, M.J. and et al. (2012) PINA v2.0: mining interactome modules. Nucleic Acids Res, 40, D862-865.
Wu, J. and et al. (2009) Integrated network analysis platform for protein-protein interactions. Nature methods, 6, 75-77.
cisPath
, formatSIFfile
, formatiRefIndex
, formatSTRINGPPI
, combinePPI
.
library(cisPath) input <- system.file("extdata", "Homo_sapiens_PINA100.txt", package="cisPath") output <- file.path(tempdir(), "PINAPPI.txt") formatPINAPPI(input, output) ## Not run: outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download PINA PPI (MITAB format) for humans only (~96M) destfile <- file.path(outputDir, "Homo_sapiens.txt") cat("Downloading...\n") download.file("http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20121210.txt", destfile) # Format PINA PPI fileFromPINA <- file.path(outputDir, "Homo_sapiens.txt") PINAPPI <- file.path(outputDir, "PINAPPI.txt") formatPINAPPI(fileFromPINA, output=PINAPPI) ## End(Not run)
library(cisPath) input <- system.file("extdata", "Homo_sapiens_PINA100.txt", package="cisPath") output <- file.path(tempdir(), "PINAPPI.txt") formatPINAPPI(input, output) ## Not run: outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download PINA PPI (MITAB format) for humans only (~96M) destfile <- file.path(outputDir, "Homo_sapiens.txt") cat("Downloading...\n") download.file("http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20121210.txt", destfile) # Format PINA PPI fileFromPINA <- file.path(outputDir, "Homo_sapiens.txt") PINAPPI <- file.path(outputDir, "PINAPPI.txt") formatPINAPPI(fileFromPINA, output=PINAPPI) ## End(Not run)
This method is used to format the PPI file which is downloaded from the PINA2 database (SIF Format).
formatSIFfile(input, mappingFile, output) ## S4 method for signature 'character,character,character' formatSIFfile(input, mappingFile, output)
formatSIFfile(input, mappingFile, output) ## S4 method for signature 'character,character,character' formatSIFfile(input, mappingFile, output)
input |
File downloaded from the PINA2 database (SIF Format) (character(1)). |
mappingFile |
Identifier mapping file (character(1)). |
output |
Output file (character(1)). |
The input file (SIF Format) is downloaded from PINA2 database (http://cbg.garvan.unsw.edu.au/pina/).
Access http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do to download PPI files with the SIF format
for different species.
If you make use of this file, please cite the PINA database.
Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins.
The edge value will be assigned as 1
for each link between two interacting proteins.
This may be treated as the “cost” while identifying the shortest paths between proteins.
Advanced users can edit the file and change this value for each edge.
Cowley, M.J. and et al. (2012) PINA v2.0: mining interactome modules. Nucleic Acids Res, 40, D862-865.
Wu, J. and et al. (2009) Integrated network analysis platform for protein-protein interactions. Nature methods, 6, 75-77.
cisPath
, getMappingFile
, formatPINAPPI
, formatSTRINGPPI
, formatiRefIndex
, combinePPI
.
library(cisPath) # Generate the identifier mapping file input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") mappingFile <- file.path(tempdir(), "mappingFile.txt") getMappingFile(input, output=mappingFile, taxonId="9606") # Format the file downloaded from STRING database output <- file.path(tempdir(), "PINA2PPI.txt") fileFromPINA2 <- system.file("extdata", "Homo_sapiens_PINA2.sif", package="cisPath") formatSIFfile(fileFromPINA2, mappingFile, output) ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Generate the identifier mapping file fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(fileFromUniProt, output=mappingFile) # Download PINA2 PPI (SIF format) for humans only (~2.8M) destfile <- file.path(outputDir, "Homo_sapiens.sif") cat("Downloading...\n") download.file("http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20140521.sif", destfile) # Format PINA2 PPI fileFromPINA2 <- file.path(outputDir, "Homo_sapiens.sif") PINA2PPI <- file.path(outputDir, "PINA2PPI.txt") formatSIFfile(fileFromPINA2, mappingFile, output=PINA2PPI) ## End(Not run)
library(cisPath) # Generate the identifier mapping file input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") mappingFile <- file.path(tempdir(), "mappingFile.txt") getMappingFile(input, output=mappingFile, taxonId="9606") # Format the file downloaded from STRING database output <- file.path(tempdir(), "PINA2PPI.txt") fileFromPINA2 <- system.file("extdata", "Homo_sapiens_PINA2.sif", package="cisPath") formatSIFfile(fileFromPINA2, mappingFile, output) ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Generate the identifier mapping file fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(fileFromUniProt, output=mappingFile) # Download PINA2 PPI (SIF format) for humans only (~2.8M) destfile <- file.path(outputDir, "Homo_sapiens.sif") cat("Downloading...\n") download.file("http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20140521.sif", destfile) # Format PINA2 PPI fileFromPINA2 <- file.path(outputDir, "Homo_sapiens.sif") PINA2PPI <- file.path(outputDir, "PINA2PPI.txt") formatSIFfile(fileFromPINA2, mappingFile, output=PINA2PPI) ## End(Not run)
This method is used to format the PPI file which is downloaded from the STRING database.
formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700) ## S4 method for signature 'character,character,character,character' formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)
formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700) ## S4 method for signature 'character,character,character,character' formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)
input |
File downloaded from the STRING database (character(1)). |
mappingFile |
Identifier mapping file (character(1)). |
taxonId |
NCBI taxonomy specie identifier (character(1)). |
output |
Output file (character(1)). |
minScore |
Filter out PPI information with STRING scores less than this value. (integer(1)). |
The input file is downloaded from the STRING database (http://string-db.org/).
The URL of this file is http://string-db.org/newstring_download/protein.links.v9.1.txt.gz.
Access http://string-db.org/newstring_download/species.v9.1.txt to determine the parameter taxonId
.
Access http://string-db.org/newstring_cgi/show_download_page.pl for more details.
If you make use of this file, please cite the STRING database.
Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins.
An edge value is estimated for each link between two interacting proteins.
This value is defined as max(1,log(1000-STRING_SCORE,100))
.
This may be treated as the “cost” while determining the shortest paths between proteins.
Advanced users can edit the file and change this value for each edge.
Szklarczyk,D. and et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.
Franceschini,A. and et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.
UniProt Consortium and others. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40, D71-D75.
cisPath
, getMappingFile
, formatPINAPPI
, formatSIFfile
, formatiRefIndex
, combinePPI
.
library(cisPath) # Generate the identifier mapping file input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") mappingFile <- file.path(tempdir(), "mappingFile.txt") getMappingFile(input, output=mappingFile, taxonId="9606") # Format the file downloaded from STRING database output <- file.path(tempdir(), "STRINGPPI.txt") fileFromSTRING <- system.file("extdata", "protein.links.txt", package="cisPath") formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output, 700) ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Generate the identifier mapping file fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(fileFromUniProt, output=mappingFile) # Download STRING PPI for Homo sapiens (compressed:~27M, decompressed:~213M) destfile <- file.path(outputDir, "9606.protein.links.v9.1.txt.gz") cat("Downloading...\n") download.file("http://string-db.org/newstring_download/protein.links.v9.1/9606.protein.links.v9.1.txt.gz", destfile) cat("Uncompressing...\n") gunzip(destfile, overwrite=TRUE, remove=FALSE) # Format STRING PPI fileFromSTRING <- file.path(outputDir, "9606.protein.links.v9.1.txt") STRINGPPI <- file.path(outputDir, "STRINGPPI.txt") formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output=STRINGPPI, 700) ## End(Not run)
library(cisPath) # Generate the identifier mapping file input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") mappingFile <- file.path(tempdir(), "mappingFile.txt") getMappingFile(input, output=mappingFile, taxonId="9606") # Format the file downloaded from STRING database output <- file.path(tempdir(), "STRINGPPI.txt") fileFromSTRING <- system.file("extdata", "protein.links.txt", package="cisPath") formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output, 700) ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Generate the identifier mapping file fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(fileFromUniProt, output=mappingFile) # Download STRING PPI for Homo sapiens (compressed:~27M, decompressed:~213M) destfile <- file.path(outputDir, "9606.protein.links.v9.1.txt.gz") cat("Downloading...\n") download.file("http://string-db.org/newstring_download/protein.links.v9.1/9606.protein.links.v9.1.txt.gz", destfile) cat("Uncompressing...\n") gunzip(destfile, overwrite=TRUE, remove=FALSE) # Format STRING PPI fileFromSTRING <- file.path(outputDir, "9606.protein.links.v9.1.txt") STRINGPPI <- file.path(outputDir, "STRINGPPI.txt") formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output=STRINGPPI, 700) ## End(Not run)
This method is used to generate the identifier mapping file which is necessary for methods formatSIFfile
and formatSTRINGPPI
.
getMappingFile(sprotFile, output, tremblFile="", taxonId="") ## S4 method for signature 'character,character' getMappingFile(sprotFile, output, tremblFile="", taxonId="")
getMappingFile(sprotFile, output, tremblFile="", taxonId="") ## S4 method for signature 'character,character' getMappingFile(sprotFile, output, tremblFile="", taxonId="")
sprotFile |
Input: File downloaded from the UniProt database (UniProtKB/Swiss-Prot) (character(1)). |
output |
Output file (character(1)). |
tremblFile |
Input: File downloaded from the UniProt database (UniProtKB/TrEMBL) (character(1)). |
taxonId |
NCBI taxonomy specie identifier (character(1)). |
UniProtKB/Swiss-Prot: fully annotated curated entries.
UniProtKB/TrEMBL: computer-generated entries enriched with automated classification and annotation. sprotFile
is mandatory, while tremblFile
is optional.
If users only want to process the reviewed proteins from the UniProt database, tremblFile
should be ignored.
All species:
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
Taxonomic divisions:
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/
uniprot_sprot_archaea.dat.gz and uniprot_trembl_archaea.dat.gz contain all archaea entries.
uniprot_sprot_bacteria.dat.gz and uniprot_trembl_bacteria.dat.gz contain all bacteria entries.
uniprot_sprot_fungi.dat.gz and uniprot_trembl_fungi.dat.gz contain all fungi entries.
uniprot_sprot_human.dat.gz and uniprot_trembl_human.dat.gz contain all human entries.
uniprot_sprot_invertebrates.dat.gz and uniprot_trembl_invertebrates.dat.gz contain all invertebrate entries.
uniprot_sprot_mammals.dat.gz and uniprot_trembl_mammals.dat.gz contain all mammalian entries except human and rodent entries.
uniprot_sprot_plants.dat.gz and uniprot_trembl_plants.dat.gz contain all plant entries.
uniprot_sprot_rodents.dat.gz and uniprot_trembl_rodents.dat.gz contain all rodent entries.
uniprot_sprot_vertebrates.dat.gz and uniprot_trembl_vertebrates.dat.gz contain all vertebrate entries except mammals.
uniprot_sprot_viruses.dat.gz and uniprot_trembl_viruses.dat.gz contain all eukaryotic entries except those from vertebrates, fungi and plants.
We suggest you take a look at the README file before you download these files.
If you make use of these files, please cite the UniProt database.
The output file contains identifier mapping information which is necessary for methods formatSIFfile
and formatSTRINGPPI
.
Each line contains both the Ensembl Genomes Protein identifier and the Swiss-Prot accession number for a given protein.
UniProt Consortium and others. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40, D71-D75.
cisPath
,
formatSTRINGPPI
,
formatSIFfile
,
combinePPI
.
library(cisPath) sprotFile <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") output <- file.path(tempdir(), "mappingFile.txt") getMappingFile(sprotFile, output, taxonId="9606") ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download protein information file for humans only from UniProt (decompressed:~246M) destfile <- file.path(outputDir, "uniprot_sprot_human.dat.gz"); cat("Downloading...\n") download.file("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz", destfile) gunzip(destfile, overwrite=TRUE, remove=FALSE) destfile <- file.path(outputDir, "uniprot_trembl_human.dat.gz"); cat("Downloading...\n") download.file("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_trembl_human.dat.gz", destfile) gunzip(destfile, overwrite=TRUE, remove=FALSE) # Generate identifier mapping file sprotFile <- file.path(outputDir, "uniprot_sprot_human.dat") tremblFile <- file.path(outputDir, "uniprot_trembl_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(sprotFile, output=mappingFile, tremblFile) ## End(Not run)
library(cisPath) sprotFile <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath") output <- file.path(tempdir(), "mappingFile.txt") getMappingFile(sprotFile, output, taxonId="9606") ## Not run: if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("R.utils") library(R.utils) outputDir <- file.path(getwd(), "cisPath_test") dir.create(outputDir, showWarnings=FALSE, recursive=TRUE) # Download protein information file for humans only from UniProt (decompressed:~246M) destfile <- file.path(outputDir, "uniprot_sprot_human.dat.gz"); cat("Downloading...\n") download.file("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz", destfile) gunzip(destfile, overwrite=TRUE, remove=FALSE) destfile <- file.path(outputDir, "uniprot_trembl_human.dat.gz"); cat("Downloading...\n") download.file("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_trembl_human.dat.gz", destfile) gunzip(destfile, overwrite=TRUE, remove=FALSE) # Generate identifier mapping file sprotFile <- file.path(outputDir, "uniprot_sprot_human.dat") tremblFile <- file.path(outputDir, "uniprot_trembl_human.dat") mappingFile <- file.path(outputDir, "mappingFile.txt") getMappingFile(sprotFile, output=mappingFile, tremblFile) ## End(Not run)
This method can visualize the input proteins in the protein–protein interaction (PPI) network and display the evidence that supports the specific interactions among these proteins.
networkView(infoFile, proteinNames, outputDir, swissProtID=FALSE, mainNode=c(1), nodeColors=c("#1F77B4", "#FF7F0E", "#D62728","#9467BD","#8C564B","#E377C2"), displayMore=TRUE, leafColor="#2CA02C") ## S4 method for signature 'character,character,character' networkView(infoFile, proteinNames, outputDir, swissProtID=FALSE, mainNode=c(1), nodeColors=c("#1F77B4", "#FF7F0E", "#D62728","#9467BD","#8C564B","#E377C2"), displayMore=TRUE, leafColor="#2CA02C")
networkView(infoFile, proteinNames, outputDir, swissProtID=FALSE, mainNode=c(1), nodeColors=c("#1F77B4", "#FF7F0E", "#D62728","#9467BD","#8C564B","#E377C2"), displayMore=TRUE, leafColor="#2CA02C") ## S4 method for signature 'character,character,character' networkView(infoFile, proteinNames, outputDir, swissProtID=FALSE, mainNode=c(1), nodeColors=c("#1F77B4", "#FF7F0E", "#D62728","#9467BD","#8C564B","#E377C2"), displayMore=TRUE, leafColor="#2CA02C")
infoFile |
File that contains PPI data (character(1)). |
proteinNames |
Gene names or Swiss-Prot accession numbers of the proteins (character vector). |
outputDir |
Output directory (character(1)). |
swissProtID |
A logical value. If |
mainNode |
A vector of value |
nodeColors |
Represents colors for proteins. If there are fewer values than proteins, it will be recycled in the standard manner. |
displayMore |
A logical value. If |
leafColor |
Only used while |
As an example, we have generated PPI interaction data for several species from the PINA database (http://cbg.garvan.unsw.edu.au/pina/) and the STRING database (http://string-db.org/). Users can download these files from http://www.isb.pku.edu.cn/cisPath/. If you make use of these files, please cite PINA or STRING accordingly. Users can edit the PPIs generated with these two databases, or combine them with their private data to construct more complete PPI networks. In this package, we select only a small portion of the available PPI data as an example.
A protein often has several names, and some of these names have perhaps not been included in the input file infoFile
.
We therefore suggest users take a look at the output file proteinIDs.txt
to check whether the input protein names are valid.
In order to avoid inputting invalid target protein names, the unique identifier Swiss-Prot accession numbers may alternatively be used as input.
The Swiss-Prot accession numbers can be sought in the UniProt (http://www.uniprot.org/) database.
The output HTML file contains the PPIs and related evidence supporting them.
Users can view the PPI network and the evidence supporting these interactions easily using a browser.
Please take a look at the output file proteinIDs.txt
to check whether the input protein names are valid.
Cowley, M.J. and et al. (2012) PINA v2.0: mining interactome modules. Nucleic Acids Res, 40, D862-865.
Wu, J. and et al. (2009) Integrated network analysis platform for protein-protein interactions. Nature methods, 6, 75-77.
Szklarczyk,D. and et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.
Franceschini,A. and et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.
UniProt Consortium and others. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40, D71-D75.
library(cisPath) infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath") outputDir <- file.path(tempdir(), "networkView") networkView(infoFile, c("MAGI1","TP53BP2","TP53", "PTEN"), outputDir, FALSE, c(1,1,1,0), displayMore=TRUE) outputDir2 <- file.path(tempdir(), "networkView2") inputFile <- system.file("extdata", "networkView.txt", package="cisPath") rt <- read.table(inputFile, sep=",", comment.char="", header=TRUE) proteins <- as.vector(rt[,1]) sizes <- as.vector(rt[,2]) nodeColors <- as.vector(rt[,3]) networkView(infoFile, proteins, outputDir2, FALSE, sizes, nodeColors, displayMore=FALSE)
library(cisPath) infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath") outputDir <- file.path(tempdir(), "networkView") networkView(infoFile, c("MAGI1","TP53BP2","TP53", "PTEN"), outputDir, FALSE, c(1,1,1,0), displayMore=TRUE) outputDir2 <- file.path(tempdir(), "networkView2") inputFile <- system.file("extdata", "networkView.txt", package="cisPath") rt <- read.table(inputFile, sep=",", comment.char="", header=TRUE) proteins <- as.vector(rt[,1]) sizes <- as.vector(rt[,2]) nodeColors <- as.vector(rt[,3]) networkView(infoFile, proteins, outputDir2, FALSE, sizes, nodeColors, displayMore=FALSE)