Pathways, reactions, and biological entities in Reactome knowledge are systematically represented as an ordered network of molecular reactions. Graph database technology is an effective tool for modeling highly connected data, hence Reactome’s relational database is imported in Neo4j to create one large interconnected graph. Instances are represented as nodes and relationships between nodes as edges.
The ReactomeGraph4R
package is an R interface for retrieving data with network
structure from the Reactome Graph
Database. There is another R package, ReactomeContentService4R
,
for querying specific bits of information from the
Reactome Database through the RESTful API in the Content Service.
ReactomeGraph4R
is built on the Neo4j driver neo4r
,
thus returned data are mainly same as those called by neo4r
but with a little modifications, and are in these two formats:
nodes
and
relationships
information that can be used for
visualizationThis package will allow you to interact with the data in Reactome’s
graph database in R, with the aim of minimizing the number of Neo4j Cypher
queries that the user will need to perform. For example, if you wanted
to retrieve any Reactome information associated with the hypothetical
identifier ‘123456789’, you can use
matchObject(id="123456789")
, which would be equivalent to
using the Cypher query
MATCH (rgp:ReferenceGeneProduct) WHERE rgp.identifier = "123456789" RETURN rgp
on the Reactome graph database.
Aside from performing basic Cypher queries and formatting the results
as R objects, the package also contains functionality that can not be
easily performed using Cypher. This includes finding hierarchical data
of an instance (for example what Reactions and Pathways a Protein is
found in), getting the entire Reaction sequence/context using
preceding/following relationships, what role a PhysicalEntity plays in
each of its associated Reactions (catalyst, regulator, input, etc.),
searching for research papers that are cited in Reactome’s curations,
and even displaying network data. Please read on to see detailed
instructions for the ReactomeGraph4R
package - it is a
flexible package with plenty of useful functionality for the prospective
R-Reactome user!
Follow this instruction
to download and setup the Reactome Graph Database, then install
ReactomeGraph4R
package.
There are two questions needed to be answered for Neo4j server
connection when loading the package. You can change the url port if it’s
not 7474
. And if the Neo4j authentication is required, the
username and password are same as the ones to log in your local Neo4j
database.
## Is the url 'http://localhost:7474'? (Yes/no/cancel)
## Does Neo4J require authentication? (yes/No/cancel)
## Successfully connected to the local Reactome Graph Database v76!
The basic function matchObject
allows you to fetch
Reactome objects using:
id
: Reactome or non-Reactome identifier (e.g. UniProt
id)displayName
: display name of an objectschemaClass
: schema classproperty
: attributes of Reactome objectsrelationship
: relationship between two nodesMoreover, you could specify the argument
returnedAttributes
for retrieving only a few attributes of
the targeted object; species
for specific species; and
limit
for the number of returned objects. Note that this
function only returns “row” data.
The “id” input can be either non-Reactome or Reactome identifiers. If
you use a non-Reactome id, remember that you must also specify
databaseName
since the default one is “Reactome”. For
example, to get the Reactome instance associated with a circadian
rhythmic gene PER2:
# Retrieve the object of PER2 gene
# NOTE: if you're unsure which database to specify, you have to leave it as NULL
matchObject(id = "PER2", databaseName = NULL)
## $databaseObject
## schemaClass identifier databaseName displayName
## 1 ReferenceDNASequence PER2 COSMIC (genes) COSMIC (genes):PER2 PER2
## dbId geneName url
## 1 11509503 PER2 http://cancer.sanger.ac.uk/cosmic/gene/overview?ln=PER2
Now we know that the database name should be “COSMIC (genes)”! We can also try with a Reactome id “R-HSA-400219”:
## $databaseObject
## schemaClass speciesName oldStId isInDisease releaseDate
## 1 Reaction Homo sapiens REACT_25088 FALSE 2010-09-21
## displayName stIdVersion dbId
## 1 Beta-TrCP1 binds phosphorylated PER proteins R-HSA-400219.1 400219
## name stId category isInferred
## 1 Beta-TrCP1 binds phosphorylated PER proteins R-HSA-400219 binding TRUE
For multiple ids, say you want to get more information for your
significantly enriched pathways, you can use function
multiObjects
. The speedUp
option can determine
to use the doParallel
method or not, details see ?multiObjects
.
# retrieve multiple objects
ids <- c('R-HSA-74158', 'R-HSA-1566977', 'R-HSA-3000157', 'R-HSA-3000178', 'R-HSA-216083')
multiObjects(ids)
## schemaClass oldStId isInDisease releaseDate
## <char> <char> <lgcl> <char>
## 1: Pathway REACT_1371 FALSE 2004-07-06
## 2: Pathway REACT_160131 FALSE 2013-03-12
## 3: Pathway REACT_169262 FALSE 2013-09-18
## 4: Pathway REACT_163906 FALSE 2013-06-11
## 5: Pathway REACT_13552 FALSE 2008-06-30
## displayName stId speciesName diagramHeight
## <char> <char> <char> <int>
## 1: RNA Polymerase III Transcription R-HSA-74158 Homo sapiens 1012
## 2: Fibronectin matrix formation R-HSA-1566977 Homo sapiens NA
## 3: Laminin interactions R-HSA-3000157 Homo sapiens NA
## 4: ECM proteoglycans R-HSA-3000178 Homo sapiens NA
## 5: Integrin cell surface interactions R-HSA-216083 Homo sapiens 1564
## stIdVersion dbId name hasDiagram
## <char> <int> <char> <lgcl>
## 1: R-HSA-74158.2 74158 RNA Polymerase III Transcription TRUE
## 2: R-HSA-1566977.2 1566977 Fibronectin matrix formation FALSE
## 3: R-HSA-3000157.2 3000157 Laminin interactions FALSE
## 4: R-HSA-3000178.4 3000178 ECM proteoglycans FALSE
## 5: R-HSA-216083.4 216083 Integrin cell surface interactions TRUE
## isInferred doi diagramWidth
## <lgcl> <char> <int>
## 1: FALSE 10.3180/REACT_1371.1 1863
## 2: FALSE 10.3180/REACT_160131.1 NA
## 3: FALSE 10.3180/REACT_169262.1 NA
## 4: FALSE 10.3180/REACT_163906.1 NA
## 5: FALSE 10.3180/REACT_13552.1 3755
Instances can also be fetched by their “displayNames”. Do note that spaces and symbols within the name are required. Here we focus on the complex SUMO1:TOP1 in nucleoplasm “SUMO1:TOP1 [nucleoplasm]” in C. elegans:
## $databaseObject
## schemaClass speciesName isInDisease displayName
## 1 Complex Caenorhabditis elegans FALSE SUMO1:TOP1 [nucleoplasm]
## stIdVersion dbId name stId
## 1 R-CEL-4641301.1 10549504 SUMO1:TOP1 R-CEL-4641301
When retrieving instances belonging to one schema class, it’s better
specify the argument limit
as well for restricting the
number of returned instances. For all available schema classes see
Reactome Data Schema.
For instance, to get 5 “EntitySets” in human and then return their
display names and stId only:
# Get 5 instance in Class EntitySet and return displayName & stId
entity.set <- matchObject(schemaClass = "EntitySet", species = "human",
returnedAttributes = c("displayName", "stId"), limit = 5)
entity.set[["databaseObject"]] # show as dataframe
## displayName
## 1 HSP90AA1, HSP90AB1 [lysosomal lumen]
## 2 Substrates for chaperone mediated autophagy [lysosomal lumen]
## 3 Phosphorylated PLINs from lipid droplet surface [lysosomal lumen]
## 4 PolyUb-Misfolded Proteins [lysosomal lumen]
## 5 PolyUb-Misfolded cilia proteins [lysosomal lumen]
## stId
## 1 R-HSA-9622845
## 2 R-HSA-9625158
## 3 R-HSA-9639394
## 4 R-HSA-9660006
## 5 R-HSA-9660010
By specifying the property
, nodes with the given
property (or properties), which are actually attributes/slots of
Reactome instances, could be returned. Let’s try to get instances that
are chimeric and are in disease.
# Get instances with conditions of properties that are stored in a list
matchObject(property = list(isChimeric = TRUE, isInDisease = TRUE), limit = 10)[["databaseObject"]]
## schemaClass speciesName isInDisease releaseDate
## 1 Reaction Homo sapiens TRUE 2016-03-23
## 2 FailedReaction Mus musculus TRUE 2018-09-12
## 3 Complex Mus musculus TRUE <NA>
## 4 FailedReaction Homo sapiens TRUE 2018-09-12
## displayName
## 1 Mouse FGFR2 IIIa TM binds FGF1,2 and full-length receptors
## 2 Defective Mutyh mutants do not cleave adenine mispaired with 8-oxoguanine
## 3 Mutyh G257E,Mutyh A341V:(OGUA:Ade)-dsDNA [nucleoplasm]
## 4 MECP2 mutants P302R,K304E,K305R,R306C,(R306H) do not bind NCoR/Smrt complex
## stIdVersion dbId
## 1 R-NUL-8853328.2 8853328
## 2 R-NUL-9606338.3 9606338
## 3 R-NUL-9606341.1 9606341
## 4 R-NUL-9005752.1 9005752
## name
## 1 Mouse FGFR2 IIIa TM binds FGF1,2 and full-length receptors
## 2 Defective Mutyh mutants do not cleave adenine mispaired with 8-oxoguanine
## 3 Mutyh G257E,Mutyh A341V:(OGUA:Ade)-dsDNA
## 4 MECP2 mutants P302R,K304E,K305R,R306C,(R306H) do not bind NCoR/Smrt complex
## isChimeric stId category isInferred
## 1 TRUE R-NUL-8853328 binding FALSE
## 2 TRUE R-NUL-9606338 transition FALSE
## 3 TRUE R-NUL-9606341 <NA> NA
## 4 TRUE R-NUL-9005752 transition FALSE
The actual Cypher query for this command is
MATCH (n1)-[r:relationship]->(n2) RETURN n1,n2
,
therefore the n1
and n2
dataframes in the
returned list have the same number of rows, and every two rows with the
same index are connected with the given relationship.
## $n1
## schemaClass speciesName isInDisease displayName
## 1 Complex Homo sapiens FALSE p-GFAP:EEF1A1 [lysosomal membrane]
## 2 Complex Homo sapiens FALSE p-GFAP:EEF1A1 [lysosomal membrane]
## 3 Complex Rattus norvegicus FALSE p-Gfap:Eef1a1 [lysosomal membrane]
## stIdVersion dbId name isChimeric stId
## 1 R-HSA-9626070.1 9626070 p-GFAP:EEF1A1 FALSE R-HSA-9626070
## 2 R-HSA-9626070.1 9626070 p-GFAP:EEF1A1 FALSE R-HSA-9626070
## 3 R-RNO-9626031.1 9626031 p-Gfap:Eef1a1 FALSE R-RNO-9626031
##
## $n2
## schemaClass speciesName isInDisease
## 1 EntityWithAccessionedSequence Homo sapiens FALSE
## 2 EntityWithAccessionedSequence Homo sapiens FALSE
## 3 EntityWithAccessionedSequence Rattus norvegicus FALSE
## displayName stIdVersion dbId name stId
## 1 p-GFAP [lysosomal membrane] R-HSA-9626054.1 9626054 p-GFAP R-HSA-9626054
## 2 EEF1A1 [lysosomal membrane] R-HSA-9626022.1 9626022 EEF1A1 R-HSA-9626022
## 3 p-Gfap [lysosomal membrane] R-RNO-9626029.1 9626029 p-Gfap R-RNO-9626029
## startCoordinate referenceType endCoordinate
## 1 1 ReferenceGeneProduct 432
## 2 1 ReferenceGeneProduct 462
## 3 1 ReferenceGeneProduct 430
These following functions in the MATCH family provide several commonly used cases that you might be interested in for Reactome data querying.
Reactome data are organized in a hierarchical way:
Pathway --> Reaction --> PhysicalEntity
, or sometimes
it might be
Pathway --> Reaction --> PhysicalEntity --> ReferenceEntity
where the PhysicalEntity has links to external database information via
the ReferenceEntity. You could retrieve the hierarchical data of a given
Event (Pathway or Reaction) or Entity
(PhysicalEntity or ReferenceEntity) using matchHierarchy
.
In this example, we’ll take a look at a RNA sequence (PhysicalEntity)
“POU5F1 mRNA [cytosol]” with stable identifier “R-HSA-500358”:
# Get hierarchy data of R-HSA-500358
pou5f1.hierarchy <- matchHierarchy(id = "R-HSA-500358", type = "row")
str(pou5f1.hierarchy, max.level = 1)
## List of 4
## $ physicalEntity:'data.frame': 1 obs. of 12 variables:
## $ event :'data.frame': 3 obs. of 13 variables:
## $ upperevent :'data.frame': 2 obs. of 16 variables:
## $ relationships :'data.frame': 7 obs. of 9 variables:
The RNA sequence we specified is in the physicalEntity
dataframe of the result list. It’s directly connected with those Events
in the event
dataframe, which are then connected with
Events in the upperevent
. Relationships between all these
objects are in relationship
dataframe:
## type startNode.dbId startNode.schemaClass endNode.dbId
## 1 hasEvent 452723 Pathway 452392
## 2 output 452392 BlackBoxEvent 500358
## 3 hasEvent 1266738 TopLevelPathway 452723
## 4 input 500366 Reaction 500358
## 5 hasEvent 452723 Pathway 500366
## 6 input 2889036 BlackBoxEvent 500358
## 7 hasEvent 452723 Pathway 2889036
## endNode.schemaClass
## 1 BlackBoxEvent
## 2 EntityWithAccessionedSequence
## 3 Pathway
## 4 EntityWithAccessionedSequence
## 5 Reaction
## 6 EntityWithAccessionedSequence
## 7 BlackBoxEvent
This method can find all ReactionLikeEvents (RLEs) connected with a given Pathway by the relationship “hasEvent”. Additionally, the input can be a RLE, the result would be Pathway(s) linked via “hasEvent” together with other RLEs linked with the Pathways(s). Here we focus on a RLE “OAS1 oligomerizes” with identifier “R-HSA-8983688”.
# Find Reactions connected with R-HSA-8983688
rle <- matchReactionsInPathway(event.id = "R-HSA-8983688", type = "row")
## List of 4
## $ reactionLikeEvent :'data.frame': 1 obs. of 12 variables:
## $ pathway :'data.frame': 1 obs. of 14 variables:
## $ otherReactionLikeEvent:'data.frame': 14 obs. of 12 variables:
## $ relationships :'data.frame': 15 obs. of 9 variables:
## schemaClass speciesName isInDisease releaseDate displayName
## 1 Reaction Homo sapiens FALSE 2018-09-12 OAS1 oligomerizes
## stIdVersion dbId name isChimeric stId category
## 1 R-HSA-8983688.1 8983688 OAS1 oligomerizes FALSE R-HSA-8983688 binding
## isInferred
## 1 FALSE
## schemaClass speciesName isInDisease releaseDate displayName
## 1 Pathway Homo sapiens FALSE 2018-09-12 OAS antiviral response
## stIdVersion dbId name stId isInferred
## 1 R-HSA-8983711.2 8983711 OAS antiviral response R-HSA-8983711 FALSE
## diagramHeight hasDiagram doi diagramWidth
## 1 753 TRUE 10.3180/R-HSA-8983711.1 970
otherReactionLikeEvent
are RLEs other than “OAS1
oligomerizes” connected with Pathway “OAS antiviral response”.
## [1] "OAS1 binds viral dsRNA"
## [2] "RNASEL cleaves viral ssRNA"
## [3] "OAS2 binds viral dsRNA"
## [4] "RNASEL cleaves cellular ssRNA"
## [5] "OAS2 produces oligoadenylates"
## [6] "ABCE1 binds RNASEL"
## [7] "PDE12 cleaves 2'-5' oligoadenylates "
## [8] "Viral 2',5'-PDE cleaves 2'-5' oligoadenylates "
## [9] "OAS3 binds viral dsRNA"
## [10] "RNASEL binds 2'-5' oligoadenylate"
## [11] "OAS3 produces oligoadenylates"
## [12] "OAS1 produces oligoadenylates"
## [13] "OASL binds DDX58"
## [14] "OAS2 dimerizes"
The contect of these Events can actually be visualized in R using the
exportImage
function from the
ReactomeContentService4R
package! And it looks the same as
that in Pathway
Browser. To get the pathway diagram of Pathway “OAS antiviral
response” (stId: R-HSA-8983711) that we just retrieved, and highlight
the RLE (stId: R-HSA-8983688) that we specified:
## Connecting...welcome to Reactome v90!
With the diagram shown above, we can see that the Reaction
highlighted in blue is in the middle of a Reaction cascade, with other
RLEs immediately preceding and following it. In order to know what these
preceding and following Reactions are, we can use function
matchPrecedingAndFollowingEvents
to find RLEs linked via
“precedingEvent”. The argument depth
is used to describe
the “variable length relationships”, the default value is 1
(i.e. immediately connected); or you can set
all.depth = TRUE
for retrieving the whole context. Details
see ?matchPrecedingAndFollowingEvents
.
# Retrieve RLE context with depth = 2
rle.context <- matchPrecedingAndFollowingEvents(event.id = "R-HSA-8983688", depth = 2, type = "row")
str(rle.context, max.level = 1)
## List of 4
## $ precedingEvent:'data.frame': 1 obs. of 12 variables:
## $ event :'data.frame': 1 obs. of 12 variables:
## $ followingEvent:'data.frame': 2 obs. of 12 variables:
## $ relationships :'data.frame': 3 obs. of 9 variables:
Usually we query data in a way like parent to child
(parent) --> (child)
, where we provide information about
the parent. But with the Graph Database, we are able to search in a
reverse direction that is child to parent
(parent) <-- (child)
with child’s information only. This
“child-to-parent” relationship is called Referral. You
could carry out the referral fetching by matchReferrals
that supports Classes “Event”, “PhysicalEntity”, “Regulation”,
“CatalystActivity”, “ReferenceEntity”, “Interaction”,
“AbstractModifiedResidue”. Depth related arguments could also be
specified here. More details sees ?matchReferrals
.
We would look at a Regulation “Negative gene expression regulation by ’EGR2 [nucleoplasm]” with dbId “6810147”:
## $Regulation
## schemaClass
## 1 NegativeGeneExpressionRegulation
## displayName stIdVersion
## 1 Negative gene expression regulation by 'EGR2 [nucleoplasm]' R-HSA-6810147.1
## dbId stId
## 1 6810147 R-HSA-6810147
##
## $databaseObject
## schemaClass displayName stIdVersion dbId stId
## 1 BlackBoxEvent HOXB1 gene is transcribed R-HSA-5617454.3 5617454 R-HSA-5617454
## speciesName isInDisease releaseDate name isChimeric
## 1 Homo sapiens FALSE 2015-12-15 HOXB1 gene is transcribed FALSE
## category isInferred
## 1 omitted TRUE
##
## $relationships
## neo4jId type startNode.neo4jId startNode.dbId startNode.schemaClass
## 1 5505941 regulatedBy 1330001 5617454 BlackBoxEvent
## endNode.neo4jId endNode.dbId endNode.schemaClass properties
## 1 1330002 6810147 NegativeGeneExpressionRegulation 1, 4
The dbId of endNode (endNode.dbId
in
$relationships
) is exactly the dbId we just specified.
Interactions of a PhysicalEntity (PE) could be retrieved by
matchInteractors
. This method begins with finding the
ReferenceEntity matched with the PE, then get the Interactions having
“interactor” relationship with the ReferenceEntity. For example, to get
interactions of “FANCM [nucleoplasm]” with stable id “R-HSA-419535”:
## List of 4
## $ physicalEntity :'data.frame': 1 obs. of 12 variables:
## $ referenceEntity:'data.frame': 1 obs. of 17 variables:
## $ interaction :'data.frame': 7 obs. of 8 variables:
## $ relationships :'data.frame': 8 obs. of 9 variables:
## schemaClass displayName
## 1 UndirectedInteraction UniProt:Q8IYD8 <-> UniProt:Q8N2Z9 (IntAct)
## 2 UndirectedInteraction UniProt:Q8IYD8 <-> IntAct:EBI-12506732 (IntAct)
## 3 UndirectedInteraction UniProt:Q8IYD8 <-> UniProt:O95208-2 (IntAct)
## 4 UndirectedInteraction UniProt:P14373 <-> UniProt:Q8IYD8 (IntAct)
## 5 UndirectedInteraction UniProt:O15360 <-> UniProt:Q8IYD8 (IntAct)
## 6 UndirectedInteraction UniProt:Q9BTP7 <-> UniProt:Q8IYD8 (IntAct)
## 7 UndirectedInteraction UniProt:Q8NB91 <-> UniProt:Q8IYD8 (IntAct)
## dbId databaseName
## 1 11972059 IntAct
## 2 11972057 IntAct
## 3 11972055 IntAct
## 4 11902647 IntAct
## 5 11897311 IntAct
## 6 11895701 IntAct
## 7 11890537 IntAct
## url
## 1 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-8670867%20OR%20EBI-8670909
## 2 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-12507230%20OR%20EBI-12507920
## 3 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-25046655%20OR%20EBI-23491147%20OR%20EBI-24557288
## 4 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-10583656%20OR%20EBI-10262943%20OR%20EBI-10417427
## 5 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-12506540%20OR%20EBI-12506610%20OR%20EBI-7359857
## 6 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-12507902%20OR%20EBI-12506096%20OR%20EBI-12506470%20OR%20EBI-12506540%20OR%20EBI-12507795%20OR%20EBI-12506454%20OR%20EBI-12506136%20OR%20EBI-12506358%20OR%20EBI-12506512%20OR%20EBI-12506600
## 7 https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-7359880%20OR%20EBI-12506540
## score pubmed
## 1 0.524 23886707
## 2 0.524 17289582
## 3 0.556 32296183
## 4 0.556 25416956
## 5 0.527 17396147, 17289582
## 6 0.671 17289582
## 7 0.527 17396147, 17289582
## accession
## 1 EBI-8670867, EBI-8670909
## 2 EBI-12507230, EBI-12507920
## 3 EBI-25046655, EBI-23491147, EBI-24557288
## 4 EBI-10583656, EBI-10262943, EBI-10417427
## 5 EBI-12506540, EBI-12506610, EBI-7359857
## 6 EBI-12507902, EBI-12506096, EBI-12506470, EBI-12506540, EBI-12507795, EBI-12506454, EBI-12506136, EBI-12506358, EBI-12506512, EBI-12506600
## 7 EBI-7359880, EBI-12506540
The roles of PhysicalEntities include “input”, “output”, “regulator”,
“catalyst”, which are represented as relationships “input” ,“output”,
“regulatedBy”, “catalystActivity” respectively. Therefore, we could
retrieve instances that are possibly connected with the given
PhysicalEntity via these relationships, and see the exact role(s) from
the existing relationships. We’ll take a look at a Polymer “HSBP1
oligomer [cytosol]” and input it into matchPEroles
. Either
id
or displayName
could be specified.
# Find possible roles of the given PE
roles <- matchPEroles(pe.displayName = "HSBP1 oligomer [cytosol]")
## List of 3
## $ physicalEntity:'data.frame': 6 obs. of 10 variables:
## $ databaseObject:'data.frame': 6 obs. of 12 variables:
## $ relationships :'data.frame': 6 obs. of 9 variables:
## [1] "output"
Diseases related to a PhysicalEntity or an Event could be found using
function matchDisease
. In reverse, you can also get
PhysicalEntities/Events associated with a Disease.
# Fetch Reactome instances associated with 'neuropathy' in human
matchDiseases(displayName = "neuropathy", species = "human", type = "row")
## $disease
## schemaClass identifier synonym databaseName displayName dbId
## 1 Disease 870 peripheral neuropathy DOID neuropathy 9635395
## name definition
## 1 neuropathy A nervous system disease that is located_in nerves or nerve cells.
## url
## 1 https://www.ebi.ac.uk/ols/ontologies/doid/terms?obo_id=DOID:870
##
## $databaseObject
## schemaClass displayName dbId name
## 1 ChemicalDrug pralidoxime [extracellular region] 9635003 pralidoxime
## isInDisease stIdVersion stId
## 1 TRUE R-ALL-9635003.1 R-ALL-9635003
##
## $relationships
## neo4jId type startNode.neo4jId startNode.dbId startNode.schemaClass
## 1 1031748 disease 263887 9635003 ChemicalDrug
## endNode.neo4jId endNode.dbId endNode.schemaClass properties
## 1 263890 9635395 Disease 1, 0
Given the PubMed id or the title for a paper, Reactome instances
related to this paper could be found by matchPaperObjects
.
The DatabaseObjects are connected with the LiteratureReference
(i.e. paper) via “literatureReference” relationship. Let’s try with a
paper “Aggresomes: a cellular response to misfolded proteins”.
# fetch objects by paper title
matchPaperObjects(displayName = "Aggresomes: a cellular response to misfolded proteins", type = "row")
## $literatureReference
## volume schemaClass journal pages year
## 1 143 LiteratureReference J. Cell Biol. 1883-98 1998
## displayName dbId
## 1 Aggresomes: a cellular response to misfolded proteins 9646681
## pubMedIdentifier title
## 1 9864362 Aggresomes: a cellular response to misfolded proteins
##
## $databaseObject
## schemaClass displayName dbId
## 1 Reaction PolyUb-Misfolded proteins bind vimentin to form aggresome 9646679
## speciesName isInDisease releaseDate stIdVersion
## 1 Homo sapiens FALSE 2019-12-10 R-HSA-9646679.2
## name isChimeric
## 1 PolyUb-Misfolded proteins bind vimentin to form aggresome FALSE
## stId category isInferred
## 1 R-HSA-9646679 binding FALSE
##
## $relationships
## neo4jId type startNode.neo4jId startNode.dbId
## 1 46368 literatureReference 12675 9646679
## startNode.schemaClass endNode.neo4jId endNode.dbId endNode.schemaClass
## 1 Reaction 12676 9646681 LiteratureReference
## properties
## 1 1, 0
The ability to view network graphs is definitely a big advantage of a graph database. Fortunately, R has developed into a powerful tool for network analysis. There are a number of R packages targeted network analysis and visualization, therefore we are able to get a graph just like the one in the Neo4j server, and even to set more visualization options!
Don’t forget that results can also be returned in the “graph” format, which are used to create the network visualization in R! This comprehensive tutorial - Network visualization with R (Ognyanova, K., 2019) - walks through each step on the creation of network graphs in R.
Here we will show a couple of examples to generate an interactive network graph after retrieving the specific Reactome graph data. Let’s say we want to visualize the hierarchical data of a ReferenceEntity “UniProt:P33992 MCM5”.
First install and load the following packages.
# install packages
list.pkg <- c("stringr", "visNetwork", "networkD3", "wesanderson")
new.pkg <- list.pkg[!(list.pkg %in% installed.packages()[ ,"Package"])]
if (length(new.pkg)) {
install.packages(new.pkg, repos = "https://cloud.r-project.org/")
}
# load
invisible(suppressPackageStartupMessages(lapply(list.pkg, library, character.only = TRUE)))
We will try the visNetwork
package which visualizes networks using vis.js
javascript library.
# Get graph output data
graph <- matchHierarchy(displayName = "UniProt:P33992 MCM5", databaseName = "UniProt", type = "graph")
relationships <- graph[["relationships"]]
nodes <- graph[["nodes"]]
nodes <- unnestListCol(df = nodes, column = "properties") # unnest the 'properties' column of lists
head(nodes); head(relationships)
## id
## 1 437126
## 2 87093
## 3 87094
## 4 437804
## 5 437831
## 6 438701
## label
## 1 DatabaseObject, Event, ReactionLikeEvent, Reaction
## 2 DatabaseObject, PhysicalEntity, GenomeEncodedEntity, EntityWithAccessionedSequence
## 3 DatabaseObject, ReferenceEntity, ReferenceGeneProduct, ReferenceSequence
## 4 DatabaseObject, Event, Pathway
## 5 DatabaseObject, Event, Pathway
## 6 DatabaseObject, Event, Pathway
## schemaClass speciesName oldStId isInDisease releaseDate
## 1 Reaction Homo sapiens REACT_1303 FALSE 2004-07-06
## 2 EntityWithAccessionedSequence Homo sapiens REACT_4343 FALSE <NA>
## 3 ReferenceGeneProduct <NA> <NA> NA <NA>
## 4 Pathway Homo sapiens REACT_2148 FALSE 2004-07-06
## 5 Pathway Homo sapiens REACT_2014 FALSE 2004-07-06
## 6 Pathway Homo sapiens REACT_899 FALSE 2004-07-06
## displayName stIdVersion
## 1 Mcm4,6,7 trimer forms and associates with the replication fork R-HSA-69019.2
## 2 MCM5 [nucleoplasm] R-HSA-68551.1
## 3 UniProt:P33992 MCM5 <NA>
## 4 Switching of origins to a post-replicative state R-HSA-69052.3
## 5 Synthesis of DNA R-HSA-69239.2
## 6 S Phase R-HSA-69242.2
## dbId
## 1 69019
## 2 68551
## 3 94954
## 4 69052
## 5 69239
## 6 69242
## name
## 1 Mcm4,6,7 trimer forms and associates with the replication fork, Rearrangement and mobilization of Mcm2-7
## 2 MCM5, Mcm5, DNA replication licensing factor MCM5, CDC46 homolog, P1-CDC46
## 3 MCM5
## 4 Switching of origins to a post-replicative state
## 5 Synthesis of DNA
## 6 S Phase
## stId category isInferred startCoordinate referenceType
## 1 R-HSA-69019 dissociation FALSE NA <NA>
## 2 R-HSA-68551 <NA> NA 1 ReferenceGeneProduct
## 3 <NA> <NA> NA NA <NA>
## 4 R-HSA-69052 <NA> FALSE NA <NA>
## 5 R-HSA-69239 <NA> FALSE NA <NA>
## 6 R-HSA-69242 <NA> FALSE NA <NA>
## endCoordinate identifier chain databaseName
## 1 NA <NA> NULL <NA>
## 2 734 <NA> NULL <NA>
## 3 NA P33992 initiator methionine:1, chain:2-734 UniProt
## 4 NA <NA> NULL <NA>
## 5 NA <NA> NULL <NA>
## 6 NA <NA> NULL <NA>
## sequenceLength
## 1 NA
## 2 NA
## 3 734
## 4 NA
## 5 NA
## 6 NA
## description
## 1 NULL
## 2 NULL
## 3 recommendedName: fullName evidence="15"DNA replication licensing factor MCM5 ecNumber3.6.4.12/ecNumber alternativeName: fullName evidence="6"CDC46 homolog alternativeName: fullName evidence="14"P1-CDC46
## 4 NULL
## 5 NULL
## 6 NULL
## url geneName checksum
## 1 <NA> NULL <NA>
## 2 <NA> NULL <NA>
## 3 http://purl.uniprot.org/uniprot/P33992 MCM5, CDC46 A80280E61749998D
## 4 <NA> NULL <NA>
## 5 <NA> NULL <NA>
## 6 <NA> NULL <NA>
## comment
## 1 NULL
## 2 NULL
## 3 FUNCTION Acts as component of the MCM2-7 complex (MCM complex) which is the putative replicative helicase essential for 'once per cell cycle' DNA replication initiation and elongation in eukaryotic cells. The active ATPase sites in the MCM2-7 ring are formed through the interaction surfaces of two neighboring subunits such that a critical structure of a conserved arginine finger motif is provided in trans relative to the ATP-binding site of the Walker A box of the adjacent subunit. The six ATPase active sites, however, are likely to contribute differentially to the complex helicase activity.SUBUNIT Component of the MCM2-7 complex (PubMed:17296731, PubMed:16899510). The complex forms a toroidal hexameric ring with the proposed subunit order MCM2-MCM6-MCM4-MCM7-MCM3-MCM5 (PubMed:17296731, PubMed:16899510). Interacts with ANKRD17 (PubMed:23711367). Interacts with MCMBP (PubMed:17296731). Component of the CMG helicase complex, composed of the MCM2-7 complex, the GINS complex and CDC45 (By similarity).MISCELLANEOUS Early fractionation of eukaryotic MCM proteins yielded a variety of dimeric, trimeric and tetrameric complexes with unclear biological significance. The MCM2-7 hexamer is the proposed physiological active complex.SIMILARITY Belongs to the MCM family.
## 4 NULL
## 5 NULL
## 6 NULL
## secondaryIdentifier
## 1 NULL
## 2 NULL
## 3 MCM5_HUMAN, O60785, Q14578, Q9BTJ4, Q9BWL8
## 4 NULL
## 5 NULL
## 6 NULL
## keyword
## 1 NULL
## 2 NULL
## 3 3D-structure, Acetylation, ATP-binding, Cell cycle, Chromosome, Cytoplasm, Disease variant, DNA replication, DNA-binding, Dwarfism, Helicase, Hydrolase, Nucleotide-binding, Nucleus, Phosphoprotein, Reference proteome
## 4 NULL
## 5 NULL
## 6 NULL
## otherIdentifier
## 1 NULL
## 2 NULL
## 3 11748675_a_at, 16929573, 201755_at, 216237_s_at, 3944148, 3944149, 3944151, 3944153, 3944154, 3944155, 3944157, 3944159, 3944160, 3944162, 3944163, 3944164, 3944165, 3944167, 3944168, 3944169, 3944170, 3944171, 3944172, 3944174, 3944175, 3944180, 3944181, 3944182, 3944183, 4174, 46492_at, 8072687, 982_at, A_33_P3284951, GO:0000082, GO:0000166, GO:0000228, GO:0000278, GO:0000727, GO:0000781, GO:0003674, GO:0003677, GO:0003678, GO:0003688, GO:0003697, GO:0004386, GO:0005515, GO:0005524, GO:0005575, GO:0005622, GO:0005634, GO:0005654, GO:0005694, GO:0005737, GO:0005829, GO:0006259, GO:0006260, GO:0006267, GO:0006270, GO:0006950, GO:0007049, GO:0008150, GO:0009058, GO:0016020, GO:0016787, GO:0016887, GO:0017116, GO:0022607, GO:0032508, GO:0032991, GO:0034641, GO:0042555, GO:0043138, GO:0043167, GO:0043226, GO:0051276, GO:0051301, GO:0065003, GO:0071162, HMNXSV003016392, HMNXSV003039658, Hs.77171.1.A1_3p_a_at, PH_hs_0004448, TC22000260.hg, TC22001090.hg, X74795_at, g6981191_3p_a_at
## 4 NULL
## 5 NULL
## 6 NULL
## isSequenceChanged hasDiagram doi diagramHeight diagramWidth
## 1 NA NA <NA> NA NA
## 2 NA NA <NA> NA NA
## 3 FALSE NA <NA> NA NA
## 4 NA FALSE 10.3180/REACT_2148.1 NA NA
## 5 NA TRUE <NA> 1240 1837
## 6 NA TRUE 10.3180/REACT_899.2 1488 2354
## definition hasEHLD
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 The part of the cell cycle during which DNA synthesis takes place. NA
## id type startNode endNode properties
## 1 341524 referenceEntity 87093 87094 1, 0
## 2 1741794 hasEvent 437804 437126 1, 2
## 3 1737814 output 437126 87093 1, 2
## 4 1742098 hasEvent 437831 437804 1, 1
## 5 1745766 hasEvent 438701 437831 1, 1
## 6 5233969 hasEvent 1266713 438701 1, 1
# Transform into visNetwork format for nodes & edges
vis.nodes <- data.frame(id = nodes$id,
label = str_trunc(nodes$displayName, 20), # truncate the long names
group = nodes$schemaClass,
title = paste0("<p><b>", nodes$schemaClass, "</b><br>",
"dbId: ", nodes$dbId, "<br>", nodes$displayName, "</p>"))
vis.edges <- data.frame(from = relationships$startNode,
to = relationships$endNode,
label = relationships$type,
font.size = 16,
font.color = 'steelblue')
head(vis.nodes); head(vis.edges)
## id label group
## 1 437126 Mcm4,6,7 trimer f... Reaction
## 2 87093 MCM5 [nucleoplasm] EntityWithAccessionedSequence
## 3 87094 UniProt:P33992 MCM5 ReferenceGeneProduct
## 4 437804 Switching of orig... Pathway
## 5 437831 Synthesis of DNA Pathway
## 6 438701 S Phase Pathway
## title
## 1 <p><b>Reaction</b><br>dbId: 69019<br>Mcm4,6,7 trimer forms and associates with the replication fork</p>
## 2 <p><b>EntityWithAccessionedSequence</b><br>dbId: 68551<br>MCM5 [nucleoplasm]</p>
## 3 <p><b>ReferenceGeneProduct</b><br>dbId: 94954<br>UniProt:P33992 MCM5</p>
## 4 <p><b>Pathway</b><br>dbId: 69052<br>Switching of origins to a post-replicative state</p>
## 5 <p><b>Pathway</b><br>dbId: 69239<br>Synthesis of DNA</p>
## 6 <p><b>Pathway</b><br>dbId: 69242<br>S Phase</p>
## from to label font.size font.color
## 1 87093 87094 referenceEntity 16 steelblue
## 2 437804 437126 hasEvent 16 steelblue
## 3 437126 87093 output 16 steelblue
## 4 437831 437804 hasEvent 16 steelblue
## 5 438701 437831 hasEvent 16 steelblue
## 6 1266713 438701 hasEvent 16 steelblue
We are going to change the visual parameters of nodes and edges by
adding them as columns in the dataframes. More customizations see the
visNetwork
documentation or
?vignette("Introduction-to-visNetwork")
.
# nodes parameters
## get palette colors with package 'wesanderson'
node.colors <- as.character(wes_palette(n = length(unique(vis.nodes$group)), name = "Darjeeling2"))
names(node.colors) <- levels(factor(vis.nodes$group))
## NOTE: don't use `str_replace_all` here since 'TopLevelPathway' & 'Pathway' share the string 'Pathway'
vis.nodes$color.background <- node.colors[as.numeric(factor(vis.nodes$group))] # node color
vis.nodes$color.border <- "lightgray"
## highlight the instance we specified
vis.nodes$color.border[vis.nodes$label == "UniProt:P33992 MCM5"] <- "pink"
vis.nodes$color.highlight.border <- "darkred"
vis.nodes$borderWidth <- 2 # Node border width
# edges parameters
vis.edges$width <- 1.2 # line width
edges.colors <- as.character(wes_palette(n = length(unique(vis.edges$label)), name = "FantasticFox1"))
names(edges.colors) <- unique(vis.edges$label)
vis.edges$color <- str_replace_all(vis.edges$label, edges.colors) # line color
vis.edges$arrows <- "to" # arrows: 'from', 'to', or 'middle'
vis.edges$smooth <- TRUE # should the edges be curved?
# height & width of the plot can be set here
visnet <- visNetwork(vis.nodes, vis.edges, main = "The hierarchy of protein MCM5",
height = "500px", width = "100%")
visnet
Add a drop-down menu:
We can also take a look at another package networkD3
,
which generates network graphs using D3
javascript library.
# the node ids MUST be numeric, and start from 0
nodes.idx <- as.character(as.numeric(factor(nodes$id)) - 1)
names(nodes.idx) <- nodes$id
# transform into networkD3 format
d3.edges <- data.frame(source = as.numeric(str_replace_all(relationships$startNode, nodes.idx)),
target = as.numeric(str_replace_all(relationships$endNode, nodes.idx)),
label = relationships$type)
d3.edges <- d3.edges[order(d3.edges$source), ]
d3.nodes <- cbind(idx=as.numeric(nodes.idx), nodes)
d3.nodes <- d3.nodes[order(d3.nodes$idx), ] # the order MUST be consistent with the 'source'
forceNetwork(Links = d3.edges, Nodes = d3.nodes, Source="source", Target="target",
NodeID = "displayName", Group = "schemaClass", Value = "label",
linkColour = "#afafaf", fontSize = 12, zoom = TRUE, legend = TRUE,
Nodesize = 15, opacity = 0.9, charge = -50)
To modify the forceNetwork graph, one can execute custom javascript
code with the htmlwidgets
R
package, but it won’t be discussed here.
If you found this package useful and used in your projects, please cite it.
## To cite package 'ReactomeGraph4R' in publications use:
##
## Poon C, Cook J, Shorser S, Weiser J, Haw R, Stein L (2021). _R
## interface to the reactome graph database_, volume 10.
## doi:10.7490/f1000research.1118690.1
## <https://doi.org/10.7490/f1000research.1118690.1>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## author = {Chi-Lam Poon and Justin Cook and Solomon Shorser and Joel Weiser and Robin Haw and Lincoln Stein},
## title = {R interface to the reactome graph database},
## journal = {F1000Research},
## volume = {10},
## pages = {721 (poster)},
## year = {2021},
## doi = {10.7490/f1000research.1118690.1},
## }
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] wesanderson_0.3.7 networkD3_0.4
## [3] visNetwork_2.1.2 stringr_1.5.1
## [5] ReactomeContentService4R_1.15.0 ReactomeGraph4R_1.15.0
## [7] rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 tidyr_1.3.1
## [5] stringi_1.8.4 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
## [9] attempt_0.3.1 iterators_1.0.14 fastmap_1.2.0 foreach_1.5.2
## [13] doParallel_1.0.17 jsonlite_1.8.9 promises_1.3.2 httr_1.4.7
## [17] purrr_1.0.2 fansi_1.0.6 codetools_0.2-20 jquerylib_0.1.4
## [21] cli_3.6.3 shiny_1.9.1 rlang_1.1.4 cachem_1.1.0
## [25] yaml_2.3.10 tools_4.4.2 parallel_4.4.2 dplyr_1.1.4
## [29] httpuv_1.6.15 neo4r_0.1.1 curl_6.0.1 buildtools_1.0.0
## [33] vctrs_0.6.5 R6_2.5.1 mime_0.12 lifecycle_1.0.4
## [37] magick_2.8.5 htmlwidgets_1.6.4 pkgconfig_2.0.3 bslib_0.8.0
## [41] pillar_1.9.0 later_1.4.1 data.table_1.16.2 glue_1.8.0
## [45] Rcpp_1.0.13-1 xfun_0.49 tibble_3.2.1 tidyselect_1.2.1
## [49] rstudioapi_0.17.1 sys_3.4.3 knitr_1.49 xtable_1.8-4
## [53] igraph_2.1.1 htmltools_0.5.8.1 maketools_1.3.1 compiler_4.4.2
## [57] getPass_0.2-4