Title: | Access Pathways from Multiple Databases Through BioPAX and Pathway Commons |
---|---|
Description: | The package provides a set of R functions for interacting with BioPAX OWL files using Paxtools and the querying Pathway Commons (PC) molecular interaction database. Pathway Commons is a project by the Memorial Sloan-Kettering Cancer Center (MSKCC), Dana-Farber Cancer Institute (DFCI), and the University of Toronto. Pathway Commons databases include: BIND, BioGRID, CORUM, CTD, DIP, DrugBank, HPRD, HumanCyc, IntAct, KEGG, MirTarBase, Panther, PhosphoSitePlus, Reactome, RECON, TRANSFAC. |
Authors: | Augustin Luna [aut, cre] |
Maintainer: | Augustin Luna <[email protected]> |
License: | LGPL-3 |
Version: | 1.41.0 |
Built: | 2024-10-31 03:45:44 UTC |
Source: | https://github.com/bioc/paxtoolsr |
Add attributes using a list of vectors to an igraph object
addAttributeList(g, attr, l)
addAttributeList(g, attr, l)
g |
an igraph object |
attr |
the name of the attribute |
l |
the list of vectors |
the modified igraph object
library(igraph) g <- barabasi.game(20) g <- set_vertex_attr(g, "name", value=LETTERS[1:20]) g <- addAttributeList(g, "isProt", list(A=TRUE, B=FALSE, C=TRUE, D=TRUE, E=FALSE))
library(igraph) g <- barabasi.game(20) g <- set_vertex_attr(g, "name", value=LETTERS[1:20]) g <- addAttributeList(g, "isProt", list(A=TRUE, B=FALSE, C=TRUE, D=TRUE, E=FALSE))
Convert columns with list in data.frame to vector
convertDataFrameListsToVectors(df, delimiter = ";")
convertDataFrameListsToVectors(df, delimiter = ";")
df |
a data.frame |
delimiter |
a delimiter to concatenate (DEFAULT: ;) |
a data.frame without list columns
Lists as columns are useful programmatically, but cause issue in writing output to text-based files
df <- data.frame(id = 1:2, name = c("Jon", "Mark"), children = I(list(c("Mary", "James"), c("Greta", "Sally")))) df <- convertDataFrameListsToVectors(df)
df <- data.frame(id = 1:2, name = c("Jon", "Mark"), children = I(list(c("Mary", "James"), c("Greta", "Sally")))) df <- convertDataFrameListsToVectors(df)
Convert SIF to GMT
convertSifToGmt(sif, name = "gmt", returnSmallMolecules = FALSE)
convertSifToGmt(sif, name = "gmt", returnSmallMolecules = FALSE)
sif |
a data.frame representing a SIF (Simple Interaction Format) |
name |
the name of the gene set |
returnSmallMolecules |
a boolean whether to return genes or small molecules in the gene set |
a list with one entry being a vector
sif <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) gmt <- convertSifToGmt(sif)
sif <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) gmt <- convertSifToGmt(sif)
Check Cache and Download File
downloadFile( baseUrl, fileName, destDir = NULL, cacheEnv = "PAXTOOLSR_CACHE", verbose = FALSE )
downloadFile( baseUrl, fileName, destDir = NULL, cacheEnv = "PAXTOOLSR_CACHE", verbose = FALSE )
baseUrl |
a string, entire download URL except filename |
fileName |
a string, the filename of file to be downloaded |
destDir |
a string, the path where a file should be saved |
cacheEnv |
a string, environment variable pointing to specific cache |
verbose |
show debugging information |
Description of file formats: http://www.pathwaycommons.org/pc2/formats
a boolean TRUE if the file was downloaded or already exists, FALSE otherwise
readSif, readBiopax, readSbgn, readSifnx, readGmt
downloadFile("http://google.com/", fileName="index.html", destDir=tempdir())
downloadFile("http://google.com/", fileName="index.html", destDir=tempdir())
Download Pathway Commons files (uses menu and cache)
downloadPc2( selectedFileName = NULL, destDir = NULL, returnNames = NULL, version, verbose = FALSE, ... )
downloadPc2( selectedFileName = NULL, destDir = NULL, returnNames = NULL, version, verbose = FALSE, ... )
selectedFileName |
a string, a name of a file to skip the the interactive selection |
destDir |
a string, the destination directory for the file to be downloaded (Default: NULL). If NULL, then file will be downloaded to cache directory at Sys.getenv("PAXTOOLSR_CACHE") |
returnNames |
return a vector of names matching the given regular expression |
version |
a version number for a previous version of Pathway Commons data; versions 3 and above. Parameter set as version="8". Available versions "http://www.pathwaycommons.org/archives/PC2/" |
verbose |
a flag to display debugging information (Default: FALSE) |
... |
additional parameters to send to corresponding read* methods |
an R object using one of the read* methods provided in this package corresponding to the file downloaded
## Not run: downloadPc2(version="8") downloadPc2(version="8", returnNames="ext.*sif") downloadPc2("PathwayCommons.8.inoh.GSEA.hgnc.gmt.gz", version="8", verbose=TRUE) ## End(Not run)
## Not run: downloadPc2(version="8") downloadPc2(version="8", returnNames="ext.*sif") downloadPc2("PathwayCommons.8.inoh.GSEA.hgnc.gmt.gz", version="8", verbose=TRUE) ## End(Not run)
Download a SIF file containing only signed interactions
downloadSignedPC(destDir = NULL, forceCache = FALSE)
downloadSignedPC(destDir = NULL, forceCache = FALSE)
destDir |
a string, the destination directory for the file to be downloaded (Default: NULL). If NULL, then file will be downloaded to cache directory at Sys.getenv("PAXTOOLSR_CACHE") |
forceCache |
a boolean to force the use of a cached version (DEFAULT: FALSE); the current host of the file (GitHub) does not support the LAST-MODIFIED header |
a SIF containing interactions that are considered signed (i.e. interactions causing an increase on decrease in a molecular species)
# downloadSignedPC()
# downloadSignedPC()
This function will create a subsetted object with specified URIs.
fetch(inputFile, outputFile = NULL, idList)
fetch(inputFile, outputFile = NULL, idList)
inputFile |
a string of the name of the input BioPAX OWL file |
outputFile |
a string with the name of the output BioPAX OWL file |
idList |
a vector of IDs from the BioPAX OWL file |
Only entities in the input BioPAX file will be used in the fetch. IDs used must be URIs for the entities of interest. Additional properties such as cross-references for fetched entities will be included in the output.
an XMLInternalDocument representing a BioPAX OWL file
outFile <- tempfile() ids <- c("http://identifiers.org/uniprot/P36894", "http://identifiers.org/uniprot/Q13873") results <- fetch(system.file("extdata", "REACT_12034-3.owl", package="paxtoolsr"), outFile, ids)
outFile <- tempfile() ids <- c("http://identifiers.org/uniprot/P36894", "http://identifiers.org/uniprot/Q13873") results <- fetch(system.file("extdata", "REACT_12034-3.owl", package="paxtoolsr"), outFile, ids)
Keep interactions in SIF network based on certain criteria
filterSif( sif, ids = NULL, interactionTypes = NULL, dataSources = NULL, interactionPubmedIds = NULL, pathwayNames = NULL, mediatorIds = NULL, edgelist = NULL, idsBothParticipants = FALSE, edgelistCheckReverse = TRUE, verbose = FALSE )
filterSif( sif, ids = NULL, interactionTypes = NULL, dataSources = NULL, interactionPubmedIds = NULL, pathwayNames = NULL, mediatorIds = NULL, edgelist = NULL, idsBothParticipants = FALSE, edgelistCheckReverse = TRUE, verbose = FALSE )
sif |
a binary SIF as a data.frame with three columns: "PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B" |
ids |
a vector of IDs to be kept |
interactionTypes |
a vector of interaction types to be kept (List of interaction types: http://www.pathwaycommons.org/pc2/formats) |
dataSources |
a vector of data sources to be kept. For Extended SIF. |
interactionPubmedIds |
a vector of Pubmed IDs to be kept. For Extended SIF. |
pathwayNames |
a vector of pathway names to be kept. For Extended SIF. |
mediatorIds |
a vector of mediator IDs to be kept. For Extended SIF. Mediator IDs are the full BioPAX objects that were simplified to interaction given in the SIF. For Extended SIF. |
edgelist |
a two-column data.frame where each row is an interaction to be kept. Directionality is ignored (e.g. Edge A B will return interactions A B and B A from SIF) |
idsBothParticipants |
a boolean whether both interaction participants should be in a given interaction when using the ids parameter; TRUE if both (DEFAULT: TRUE) |
edgelistCheckReverse |
a boolean whether to check for edges in the reverse order (DEFAULT: TRUE) |
verbose |
Show debugging information (DEFAULT: FALSE) |
filtered interactions with three columns: "PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B". The intersection of multiple filters is returned.
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) intTypes <- c("controls-state-change-of", "controls-expression-of", "catalysis-precedes") filteredNetwork <- filterSif(results, intTypes) tmp <- readSifnx(system.file("extdata", "test_sifnx_250.txt", package = "paxtoolsr")) results <- filterSif(tmp$edges, ids=c("CHEBI:17640", "MCM3")) results <- filterSif(tmp$edges, dataSources=c("INOH", "KEGG")) results <- filterSif(tmp$edges, dataSources=c("IntAct"), ids=c("CHEBI:17640", "MCM3")) results <- filterSif(tmp$edges, pathwayNames=c("Metabolic pathways")) results <- filterSif(tmp$edges, mediatorIds=c("http://purl.org/pc2/8/MolecularInteraction_1452626895158")) results <- filterSif(tmp$edges, interactionPubmedId="17654400") tmp <- readSifnx(system.file("extdata", "test_sifnx_250.txt", package = "paxtoolsr")) edgelist <- read.table(system.file("extdata", "test_edgelist.txt", package = "paxtoolsr"), sep="\t", header=FALSE, stringsAsFactors=FALSE) results <- filterSif(tmp$edges, edgelist=edgelist)
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) intTypes <- c("controls-state-change-of", "controls-expression-of", "catalysis-precedes") filteredNetwork <- filterSif(results, intTypes) tmp <- readSifnx(system.file("extdata", "test_sifnx_250.txt", package = "paxtoolsr")) results <- filterSif(tmp$edges, ids=c("CHEBI:17640", "MCM3")) results <- filterSif(tmp$edges, dataSources=c("INOH", "KEGG")) results <- filterSif(tmp$edges, dataSources=c("IntAct"), ids=c("CHEBI:17640", "MCM3")) results <- filterSif(tmp$edges, pathwayNames=c("Metabolic pathways")) results <- filterSif(tmp$edges, mediatorIds=c("http://purl.org/pc2/8/MolecularInteraction_1452626895158")) results <- filterSif(tmp$edges, interactionPubmedId="17654400") tmp <- readSifnx(system.file("extdata", "test_sifnx_250.txt", package = "paxtoolsr")) edgelist <- read.table(system.file("extdata", "test_edgelist.txt", package = "paxtoolsr"), sep="\t", header=FALSE, stringsAsFactors=FALSE) results <- filterSif(tmp$edges, edgelist=edgelist)
List files in cache directory
getCacheFiles()
getCacheFiles()
a vector of the files in the cache directory
getCacheFiles()
getCacheFiles()
Get Error Message for a Pathway Commons Error
getErrorMessage(code)
getErrorMessage(code)
code |
a three digit numerical error code |
an error message for the code
results <- getErrorMessage("452")
results <- getErrorMessage("452")
This function retrieves a set of neighbors for a set of IDs in a BioPAX file.
getNeighbors(inputFile, outputFile = NULL, idList)
getNeighbors(inputFile, outputFile = NULL, idList)
inputFile |
a string with the name of the input BioPAX OWL file |
outputFile |
a string with the name of the output BioPAX OWL file |
idList |
a vector of IDs from the BioPAX OWL file |
Only entities in the input BioPAX file will be searched for neighbors. IDs used must be URIs for the entities of interest.
an XMLInternalDocument representing a BioPAX OWL file
outFile <- tempfile() results <- getNeighbors(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), outFile, c("HTTP://WWW.REACTOME.ORG/BIOPAX/48887#PROTEIN2360_1_9606", "HTTP://WWW.REACTOME.ORG/BIOPAX/48887#PROTEIN1631_1_9606"))
outFile <- tempfile() results <- getNeighbors(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), outFile, c("HTTP://WWW.REACTOME.ORG/BIOPAX/48887#PROTEIN2360_1_9606", "HTTP://WWW.REACTOME.ORG/BIOPAX/48887#PROTEIN1631_1_9606"))
This command retrieves full pathway information for a set of elements such as pathway, interaction or physical entity given the RDF IDs.
getPc(uri, format = "BIOPAX", verbose = FALSE, ...)
getPc(uri, format = "BIOPAX", verbose = FALSE, ...)
uri |
a vector that includes valid/existing BioPAX element's URI (RDF ID; for utility classes that were "normalized", such as entity refereneces and controlled vocabularies, it is usually a Idntifiers.org URL. Multiple IDs are allowed per query, for example, c("http://identifiers.org/uniprot/Q06609", "http://identifiers.org/uniprot/Q549Z0") See also about MIRIAM and Identifiers.org in details. |
format |
output format (Default: BIOPAX). Valid options can be found using
|
verbose |
a boolean, display the command used to query Pathway Commons |
... |
additional arguments to read* methods that handle data from Pathway Commons |
Get commands only retrieve the BioPAX elements that are directly mapped to the ID. Use the "traverse query to traverse BioPAX graph and obtain child/owner elements.
Information on MIRIAM and Identifiers.org http://www.pathwaycommons.org/pc2/#miriam
a XMLInternalDocument object
uri <- "http://identifiers.org/uniprot/O14503" #results <- getPc(uri) uri <- c("http://identifiers.org/uniprot/O14503", "http://identifiers.org/uniprot/Q9P2X7") #results <- getPc(uri, verbose=TRUE)
uri <- "http://identifiers.org/uniprot/O14503" #results <- getPc(uri) uri <- c("http://identifiers.org/uniprot/O14503", "http://identifiers.org/uniprot/Q9P2X7") #results <- getPc(uri, verbose=TRUE)
Get a Pathway Commons Databases
getPcDatabaseNames(version)
getPcDatabaseNames(version)
version |
PC2 version |
a names of databases that can be used as part of queries
getPcDatabaseNames(version=10)
getPcDatabaseNames(version=10)
Get base Pathway Commons URL
getPcUrl()
getPcUrl()
paxtoolsr will support versions Pathway Commons 5 and later. Old versions of the webservice will not be not be operational. Users can parse older BioPAX outputs as an alternative.
a string with base Pathway Commons URL
url <- getPcUrl()
url <- getPcUrl()
Get the shortest between two IDs (HGNC or CHEBI)
getShortestPathSif( sif, idA, idB, mode = c("all", "out", "in"), weights = NULL, verbose = FALSE, filterFun, ... )
getShortestPathSif( sif, idA, idB, mode = c("all", "out", "in"), weights = NULL, verbose = FALSE, filterFun, ... )
sif |
a SIF network |
idA |
HGNC or CHEBI (CHEBI:XXXXX) ID |
idB |
HGNC or CHEBI (CHEBI:XXXXX) ID |
mode |
see shortest_paths() in igraph |
weights |
see shortest_paths() in igraph |
verbose |
a boolean whether to show debugging information |
filterFun |
a function to filter multiple paths of the same length |
... |
additional arguments passed on to filterFun |
a data.frame representing a SIF network
idA <- "DAP3" idB <- "RPS16" sif <- readSif(system.file("extdata", "test_sif_shortestPath.txt", package="paxtoolsr")) filterFun <- function(vpaths) { idx <- sample(1:length(vpaths), 1); return(vpaths[[idx]]) } m1 <- getShortestPathSif(sif, idA, idB, mode="all", weights=NULL, filterFun=filterFun, verbose=TRUE)
idA <- "DAP3" idB <- "RPS16" sif <- readSif(system.file("extdata", "test_sif_shortestPath.txt", package="paxtoolsr")) filterFun <- function(vpaths) { idx <- sample(1:length(vpaths), 1); return(vpaths[[idx]]) } m1 <- getShortestPathSif(sif, idA, idB, mode="all", weights=NULL, filterFun=filterFun, verbose=TRUE)
Get a list of categories of SIF interactions
getSifInteractionCategories()
getSifInteractionCategories()
Description of interaction types: http://www.pathwaycommons.org/pc2/formats Categories provided: BetweenProteins, BetweenProteinsOther (often from high-throughput experiments), BetweenProteinSmallMolecule, BetweenSmallMolecules, SignedInteractions
a list of interactions in categories
sifCat <- getSifInteractionCategories() sifCat[["BetweenProteins"]]
sifCat <- getSifInteractionCategories() sifCat[["BetweenProteins"]]
This function will retrieve a set of BioPAX elements given a graph query match.
graphPc( kind, source, target = NULL, direction = NULL, limit = NULL, format = NULL, datasource = NULL, organism = NULL, verbose = FALSE )
graphPc( kind, source, target = NULL, direction = NULL, limit = NULL, format = NULL, datasource = NULL, organism = NULL, verbose = FALSE )
kind |
graph query. Valid options can be found using |
source |
source object's URI/ID. Multiple source URIs/IDs are allowed per query, for example c("http://identifiers.org/uniprot/Q06609", "http://identifiers.org/uniprot/Q549Z0") See a note about MIRIAM and Identifiers.org in details |
target |
[Required for PATHSFROMTO graph query] target URI/ID. Multiple target URIs are allowed per query; for example c("http://identifiers.org/uniprot/Q06609", "http://identifiers.org/uniprot/Q549Z0") See a note about MIRIAM and Identifiers.org in details |
direction |
[Optional, for NEIGHBORHOOD and COMMONSTREAM algorithms] -
graph search direction. Valid options: |
limit |
graph query search distance limit (default: 1). |
format |
output format. Valid options: |
datasource |
datasource filter (same as for 'search'). |
organism |
organism filter (same as for 'search'). |
verbose |
a boolean, display the command used to query Pathway Commons |
depending on the the output format a different object may be returned.
pcFormats
source <- "http://identifiers.org/uniprot/O14503" #results <- graphPc(source=source, kind="neighborhood", format="TXT")
source <- "http://identifiers.org/uniprot/O14503" #results <- graphPc(source=source, kind="neighborhood", format="TXT")
This function merges two BioPAX OWL files
integrateBiopax(inputFile1, inputFile2, outputFile = NULL)
integrateBiopax(inputFile1, inputFile2, outputFile = NULL)
inputFile1 |
a string of the name of the input BioPAX OWL file |
inputFile2 |
a string of the name of the input BioPAX OWL file |
outputFile |
a string of the name of the output integrated BioPAX OWL file |
This method is deprecated. Use mergeBiopax instead.
an XMLInternalDocument representing a BioPAX OWL file
outFile <- tempfile() results <- integrateBiopax(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), system.file("extdata", "dna_replication.owl", package="paxtoolsr"), outFile)
outFile <- tempfile() results <- integrateBiopax(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), system.file("extdata", "dna_replication.owl", package="paxtoolsr"), outFile)
Load SIF as igraph Network
loadSifInIgraph(sif, directed = TRUE)
loadSifInIgraph(sif, directed = TRUE)
sif |
a binary SIF as a data.frame with three columns: "PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B" |
directed |
a boolean weather the returned graph should be directed (DEFAULT: TRUE) |
Users are likely to run into issues if the input SIF has factor levels
a directed igraph network with interaction types
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) g <- loadSifInIgraph(results)
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) g <- loadSifInIgraph(results)
Map Attributes from igraph to Cytoscape JSON
mapAttributes(attr.names, all.attr, i)
mapAttributes(attr.names, all.attr, i)
attr.names |
names of attributes |
all.attr |
all attributes |
i |
index |
attributes
From https://github.com/idekerlab/cy-rest-R/blob/17f748426bb5e48ba4075b9d97318ad582b250da/utility/cytoscape_util.R
Map values from One Vector to Another
mapValues(data, oldValue, newValue)
mapValues(data, oldValue, newValue)
data |
a vector of strings where values will be replaced |
oldValue |
a vector that matches values in the data vector |
newValue |
a vector of new values that will replace the old values |
return the vector with the mapped values. If there was no corresponding entry then replace it with an NA.
data <- c("A", "B", "C", "X", "Y", "Z") oldValue <- LETTERS[1:20] newValue <- letters[1:20] results <- mapValues(data, oldValue, newValue)
data <- c("A", "B", "C", "X", "Y", "Z") oldValue <- LETTERS[1:20] newValue <- letters[1:20] results <- mapValues(data, oldValue, newValue)
This function merges two BioPAX OWL files
mergeBiopax(inputFile1, inputFile2, outputFile = NULL)
mergeBiopax(inputFile1, inputFile2, outputFile = NULL)
inputFile1 |
a string of the name of the input BioPAX OWL file |
inputFile2 |
a string of the name of the input BioPAX OWL file |
outputFile |
a string of the name of the output merged BioPAX OWL file (Optional) |
Only entities that share IDs will be merged. No additional merging occurs on cross-references. Merging may result in warning messages caused as a result of redundant actions being checked against by the Java library; these messages may be ignored.
an XMLInternalDocument representing a BioPAX OWL file
outFile <- tempfile() results <- mergeBiopax(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), system.file("extdata", "dna_replication.owl", package="paxtoolsr"), outFile)
outFile <- tempfile() results <- mergeBiopax(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), system.file("extdata", "dna_replication.owl", package="paxtoolsr"), outFile)
A simple function to see valid options
pcDirections()
pcDirections()
BOTHSTREAM where the current entity can either be the source or target of an interaction
DOWNSTREAM where the current entity can only be the source
UPSTREAM where the current entity can only be the target
acceptable Pathway Commons directions
pcDirections()
pcDirections()
A simple function to see valid options
pcFormats()
pcFormats()
See references.
acceptable Pathway Commons formats
Output Formats Description: http://www.pathwaycommons.org/pc2/help/formats.html
pcFormats()
pcFormats()
A simple function to see valid options
pcGraphQueries()
pcGraphQueries()
COMMONSTREAM searches common downstream or common upstream of a specified set of entities based on the given directions within the boundaries of a specified length limit
NEIGHBORHOOD searches the neighborhood of given source set of nodes
PATHSBETWEEN finds the paths between specific source set of states or entities within the boundaries of a specified length limit
PATHSFROMTO finds the paths from a specific source set of states or entities to a specific target set of states or entities within the boundaries of a specified length limit
acceptable Pathway Commons graph queries
pcGraphQueries()
pcGraphQueries()
Process Pathway Commons request in various formats
processPcRequest(content, format, ...)
processPcRequest(content, format, ...)
content |
a string, content to be processed |
format |
a string, the type of format |
... |
other arguments passed to read* methods for reading different formats |
an R object using one of the read* methods provided in this package corresponding to the format
fileName <- system.file("extdata", "test_biopax.owl", package="paxtoolsr") content <- readChar(fileName, file.info(fileName)$size) results <- processPcRequest(content, "BIOPAX")
fileName <- system.file("extdata", "test_biopax.owl", package="paxtoolsr") content <- readChar(fileName, file.info(fileName)$size) results <- processPcRequest(content, "BIOPAX")
Read BioPAX files as XML documents
readBiopax(inputFile)
readBiopax(inputFile)
inputFile |
an inputFile |
an XMLInternalDocument
results <- readBiopax(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"))
results <- readBiopax(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"))
This function will read in gene sets in the GMT format into a named list.
readGmt(inputFile, removePrefix = FALSE, returnInfo = FALSE)
readGmt(inputFile, removePrefix = FALSE, returnInfo = FALSE)
inputFile |
an inputFile |
removePrefix |
Pathway Commons genesets are prefixed with a NCBI organism taxonomy number (e.g. 9606 for humans); this is a boolean whether to remove the prefix (default: FALSE) |
returnInfo |
a boolean whether to return information on genesets; these results are returned a list of two items: 1) basic GMT results and 2) datasource, organism, and id type information for each gene set (default: FALSE) |
a named list where each entry corresponds to a gene set or a list described in the returnInfo parameter
f1 <- system.file("extdata", "test_PathwayCommons12.kegg.hgnc.gmt", package="paxtoolsr") f2 <- system.file("extdata", "test_PathwayCommons12.netpath.hgnc.gmt", package="paxtoolsr") results <- readGmt(f1) results <- readGmt(f2) results <- readGmt(f1, removePrefix=TRUE) results <- readGmt(f2, returnInfo=TRUE)
f1 <- system.file("extdata", "test_PathwayCommons12.kegg.hgnc.gmt", package="paxtoolsr") f2 <- system.file("extdata", "test_PathwayCommons12.netpath.hgnc.gmt", package="paxtoolsr") results <- readGmt(f1) results <- readGmt(f2) results <- readGmt(f1, removePrefix=TRUE) results <- readGmt(f2, returnInfo=TRUE)
Read in Pathway Commons Pathways Information
readPcPathwaysInfo(inputFile = NULL, version = NULL)
readPcPathwaysInfo(inputFile = NULL, version = NULL)
inputFile |
an inputFile; if NULL then retrieve the current pathways.txt; see details (default: NULL) |
version |
a version number for a previous version of Pathway Commons data; versions 3 and above. Parameter set as version="8". Available versions "http://www.pathwaycommons.org/archives/PC2/" |
This file is generally found as pathways.txt.gz (e.g. http://www.pathwaycommons.org/archives/PC2/current/pathways.txt.gz)
a data.frame
inputFile <- system.file("extdata", "pathways.txt.gz", package="paxtoolsr") results <- readPcPathwaysInfo(inputFile, version="8")
inputFile <- system.file("extdata", "pathways.txt.gz", package="paxtoolsr") results <- readPcPathwaysInfo(inputFile, version="8")
Read SBGN files as XML documents
readSbgn(inputFile)
readSbgn(inputFile)
inputFile |
an inputFile |
an XMLInternalDocument
results <- readSbgn(system.file("extdata", "test_sbgn.xml", package="paxtoolsr"))
results <- readSbgn(system.file("extdata", "test_sbgn.xml", package="paxtoolsr"))
Read in a binary SIF file
readSif(inputFile)
readSif(inputFile)
inputFile |
an inputFile |
a data.frame with the interactions in the binary SIF format
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr"))
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr"))
Read in a Extended SIF file
readSifnx(inputFile)
readSifnx(inputFile)
inputFile |
an inputFile |
SIFNX files from Pathway Commons commonly come a single file that includes a tab-delimited sections for nodes and another for edges. The sections are separated by an empty lines. These sections must be split before they are read.
a list with nodes and edges entries
results <- readSifnx(system.file("extdata", "test_sifnx.txt", package="paxtoolsr"))
results <- readSifnx(system.file("extdata", "test_sifnx.txt", package="paxtoolsr"))
Search List of Vectors
searchListOfVectors(q, lst)
searchListOfVectors(q, lst)
q |
query vector |
lst |
list of vectors to search |
Taken from: http://stackoverflow.com/questions/11002391/fast-way-of-getting-index-of-match-in-list
a list of vectors with the same length as the query vector, each list entry will have indicies for lst where there was a match with the query vector. Return NA if there were no matches.
lst <- list(1:3, 3:5, 3:7) q <- c(3, 5) results <- searchListOfVectors(q, lst) names(results) <- q lst <- list(LETTERS[1:3], LETTERS[3:5], LETTERS[3:7]) q <- c("C", "E") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- "C" searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- c("C") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- c("C", "E") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- "Z" searchListOfVectors(q, lst)
lst <- list(1:3, 3:5, 3:7) q <- c(3, 5) results <- searchListOfVectors(q, lst) names(results) <- q lst <- list(LETTERS[1:3], LETTERS[3:5], LETTERS[3:7]) q <- c("C", "E") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- "C" searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- c("C") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- c("C", "E") searchListOfVectors(q, lst) lst <- list(LETTERS[3], LETTERS[4:6]) q <- "Z" searchListOfVectors(q, lst)
This command provides a text search using the Lucene query syntax.
searchPc( q, page = 0, datasource = NULL, organism = NULL, type = NULL, verbose = FALSE )
searchPc( q, page = 0, datasource = NULL, organism = NULL, type = NULL, verbose = FALSE )
q |
a keyword, name, external identifier, or a Lucene query string. |
page |
an integer giving the search result page number (N>=0, default: 0) |
datasource |
a vector that is a filter by data source (use names or URIs of pathway data sources or of any existing Provenance object). If multiple data source values are specified, a union of hits from specified sources is returned. For example, datasource as c("reactome", "pid") returns hits associated with Reactome or PID. |
organism |
a vector that is an organism filter. The organism can be specified either by official name, e.g. "homo sapiens" or by NCBI taxonomy id, e.g. "9606". Similar to data sources, if multiple organisms are declared a union of all hits from specified organisms is returned. For example organism as c("9606", "10016") returns results for both human and mice. Only humans, "9606" is officially supported. |
type |
BioPAX class filter. See Details. |
verbose |
a boolean, display the command used to query Pathway Commons |
Indexed fields were selected based on most common searches. Some of these fields are direct BioPAX properties, others are composite relationships. All index fields are (case-sensitive):comment, ecnumber, keyword, name, pathway, term, xrefdb, xrefid, dataSource, and organism. The pathway field maps to all participants of pathways that contain the keyword(s) in any of its text fields. This field is transitive in the sense that participants of all sub-pathways are also returned. Finally, keyword is a transitive aggregate field that includes all searchable keywords of that element and its child elements - e.g. a complex would be returned by a keyword search if one of its members has a match. Keyword is the default field type. All searches can also be filtered by data source and organism. It is also possible to restrict the domain class using the 'type' parameter. This query can be used standalone or to retrieve starting points for graph searches. Search strings are case insensitive unless put inside quotes.
BioPAX classes can be found at http://www.pathwaycommons.org/pc2/#biopax_types
an XMLInternalDocument with results
query <- "Q06609" #results <- searchPc(query) query <- "glycolysis" #results <- searchPc(query, type="Pathway")
query <- "Q06609" #results <- searchPc(query) query <- "glycolysis" #results <- searchPc(query, type="Pathway")
Splits SIFNX entries into individual pathways
splitSifnxByPathway(edges, parallel = FALSE)
splitSifnxByPathway(edges, parallel = FALSE)
edges |
a data.frame with SIF content with the additional column "PATHWAY_NAMES". "PATHWAY_NAMES" should include pathway names delimited with a semi-colon: ";". |
parallel |
a boolean that will parallelize the process; requires foreach/doSNOW/parallel packages |
This method can be slow; ~1.5 minutes for 150K+ rows. Has a parallelized method to speed things up.
a list of where each entry is a vector of row indicies for a given pathway
This function provides a summary of BioPAX classes.
summarize(inputFile)
summarize(inputFile)
inputFile |
a string of the name of the input BioPAX OWL file |
BioPAX classes are defined by the BioPAX specification: http://www.biopax.org/
list with BioPAX class counts
summary <- summarize(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"))
summary <- summarize(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"))
Summarize a SIF Network
summarizeSif(sif)
summarizeSif(sif)
sif |
a binary SIF as a data.frame with three columns: "PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B" |
a list containing a count of the unique genes in the SIF and counts for the interaction types in the network
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) summarizeSif(results)
results <- readSif(system.file("extdata", "test_sif.txt", package="paxtoolsr")) summarizeSif(results)
Convert igraph to Cytoscape JSON
toCytoscape(igraphobj)
toCytoscape(igraphobj)
igraphobj |
an igraph object |
a JSON object
From https://github.com/idekerlab/cy-rest-R/blob/17f748426bb5e48ba4075b9d97318ad582b250da/utility/cytoscape_util.R
library(igraph) g <- barabasi.game(20) json <- toCytoscape(g)
library(igraph) g <- barabasi.game(20) json <- toCytoscape(g)
This function converts pathway information stored as BioPAX files into the the GSEA .gmt format.
toGSEA( inputFile, outputFile = NULL, database = "uniprot", crossSpeciesCheckFlag = TRUE )
toGSEA( inputFile, outputFile = NULL, database = "uniprot", crossSpeciesCheckFlag = TRUE )
inputFile |
a string of the name of the input OWL file |
outputFile |
a string of the name of the output file |
database |
a string of the name of the identifier type to be included (e.g. "HGNC Symbol") |
crossSpeciesCheckFlag |
a boolean that ensures participant protein is from same species |
The GSEA GMT format is a tab-delimited format where each row represents a gene set. The first column is the gene set name. The second column is a brief description. Other columns for each row contain genes in the gene set; these rows may be of unequal lengths.
see readGmt()
outFile <- tempfile() results <- toGSEA(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"), outFile, "uniprot", crossSpeciesCheckFlag=TRUE)
outFile <- tempfile() results <- toGSEA(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"), outFile, "uniprot", crossSpeciesCheckFlag=TRUE)
This file will convert PSIMI or older BioPAX objects to BioPAX Level 3
toLevel3(inputFile, outputFile = NULL)
toLevel3(inputFile, outputFile = NULL)
inputFile |
a string of the name of the input file |
outputFile |
a string of the name of the output BioPAX OWL file |
an XMLInternalDocument representing a BioPAX OWL file
inputFile <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr") outFile <- tempfile() results <- toLevel3(inputFile, outFile)
inputFile <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr") outFile <- tempfile() results <- toLevel3(inputFile, outFile)
This command returns all "top" pathways.
topPathways(q = NULL, datasource = NULL, organism = NULL, verbose = FALSE)
topPathways(q = NULL, datasource = NULL, organism = NULL, verbose = FALSE)
q |
[Optional] a keyword, name, external identifier, or a Lucene query string, like in 'search', but the default is '*' (match all). |
datasource |
filter by data source (same as for 'search'). |
organism |
organism filter (same as for 'search'). |
verbose |
a boolean, display the command used to query Pathway Commons |
Pathways that are neither 'controlled' nor 'pathwayComponent' of another process.
a data.frame with the following columns:
uri URI ID for the pathway
biopaxClass the type of BioPAX object
name a human readable name
dataSource the dataSource for the pathway
organism an organism identifier
pathway URI ID for the pathway
#results <- topPathways(q="TP53", datasource="panther")
#results <- topPathways(q="TP53", datasource="panther")
This function will convert a BioPAX OWL file into the Systems Biology Graphical Notation (SBGN) Markup Language (SBGNML) XML representation
toSBGN(inputFile, outputFile = NULL)
toSBGN(inputFile, outputFile = NULL)
inputFile |
a string of the name of the input BioPAX OWL file |
outputFile |
a string of the name of the output SBGNML file |
Objects in the SBGNML format are laid out using a Compound Spring Embedder (CoSE) layout
see readSbgn()
http://www.cs.bilkent.edu.tr/~ivis/layout/cose-animated-demo/cose.html
outFile <- tempfile() results <- toSBGN(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"), outFile)
outFile <- tempfile() results <- toSBGN(system.file("extdata", "biopax3-short-metabolic-pathway.owl", package="paxtoolsr"), outFile)
Convert a BioPAX OWL file to a binary SIF file
toSif(inputFile, outputFile = NULL)
toSif(inputFile, outputFile = NULL)
inputFile |
a string of the name of the input BioPAX OWL file |
outputFile |
a string of the name of the output SIF file (Optional) |
Information on SIF conversion is provided on the Pathway Commons site: http://www.pathwaycommons.org/pc2/
see readSif()
outFile <- tempfile() results <- toSif(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), outFile)
outFile <- tempfile() results <- toSif(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), outFile)
Converts BioPAX OWL file to extended binary SIF representation
toSifnx(inputFile, outputFile = tempfile(), idType = "uniprot")
toSifnx(inputFile, outputFile = tempfile(), idType = "uniprot")
inputFile |
a string with the name of the input BioPAX OWL file |
outputFile |
a string with the name of the output file for SIFNX information |
idType |
a string either "hgnc" or "uniprot" (DEFAULT: uniprot, more common) |
Information on SIF conversion is provided on the Pathway Commons site: http://www.pathwaycommons.org/pc2/. Also, this is a Java-based methods, it is best to use full paths.
see readSifnx()
inputFile <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr") results <- toSifnx(inputFile=inputFile)
inputFile <- system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr") results <- toSifnx(inputFile=inputFile)
This command provides XPath-like access to the Pathway Commons.
traverse(uri, path, verbose = FALSE)
traverse(uri, path, verbose = FALSE)
uri |
a BioPAX element URI - specified similarly to the 'GET' command above). Multiple IDs are allowed (uri=...&uri=...&uri=...). |
path |
a BioPAX propery path in the form of property1[:type1]/property2[:type2]; see properties, inverse properties, Paxtools, org.biopax.paxtools.controller.PathAccessor. |
verbose |
a boolean, display the command used to query Pathway Commons |
With traverse users can explicitly state the paths they would like to access. The format of the path query is in the form: [Initial Class]/[property1]:[classRestriction(optional)]/[property2]... A "*" sign after the property instructs path accessor to transitively traverse that property. For example, the following path accessor will traverse through all physical entity components within a complex: "Complex/component*/entityReference/xref:UnificationXref" The following will list display names of all participants of interactions, which are components (pathwayComponent) of a pathway (note: pathwayOrder property, where same or other interactions can be reached, is not considered here): "Pathway/pathwayComponent:Interaction/participant*/displayName" The optional parameter classRestriction allows to restrict/filter the returned property values to a certain subclass of the range of that property. In the first example above, this is used to get only the Unification Xrefs. Path accessors can use all the official BioPAX properties as well as additional derived classes and parameters in paxtools such as inverse parameters and interfaces that represent anonymous union classes in OWL. (See Paxtools documentation for more details).
an XMLInternalDocument with results
Paxtools Documentation: http://www.biopax.org/m2site/
uri <- "http://identifiers.org/uniprot/P38398" #results <- traverse(uri=uri, path="ProteinReference/organism/displayName")
uri <- "http://identifiers.org/uniprot/P38398" #results <- traverse(uri=uri, path="ProteinReference/organism/displayName")
This function validates BioPAX files for errors.
validate( inputFile, outputFile = NULL, type = c("xml", "html", "biopax"), autoFix = FALSE, onlyErrors = FALSE, maxErrors = NULL, notStrict = FALSE )
validate( inputFile, outputFile = NULL, type = c("xml", "html", "biopax"), autoFix = FALSE, onlyErrors = FALSE, maxErrors = NULL, notStrict = FALSE )
inputFile |
a string of the name of the input BioPAX OWL file |
outputFile |
a string of the name of the output file containing validation results |
type |
a string denoting the type of output: xml (default), html, biopax |
autoFix |
a boolean that determines if the input file should be fixed automatically. Errors that can be automatically fixed include generating displayName properties from names, inferring organism, and inferring dataSource |
onlyErrors |
a boolean of whether to only display errors |
maxErrors |
a integer denoting the number of errors to return |
notStrict |
a boolean of whether to be strict in validation (default: FALSE) |
See the publication by Rodchenkov, et al. for information on the BioPAX validator. See http://biopax.baderlab.org/validator for additional information on validator. See http://biopax.baderlab.org/validator/errorTypes.html for information on error types.
an XMLInternalDocument is returned if type is set to "xml" otherwise the location of the outputfile is returned.
Rodchenkov I, Demir E, Sander C, Bader GD. The BioPAX Validator, http://www.ncbi.nlm.nih.gov/pubmed/23918249
outFile <- tempfile() rawDoc <- validate(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), onlyErrors=TRUE)
outFile <- tempfile() rawDoc <- validate(system.file("extdata", "raf_map_kinase_cascade_reactome.owl", package="paxtoolsr"), onlyErrors=TRUE)