This vignette will demonstrate network retrieval from the STRING database, basic analysis, TCGA data loading and visualization in Cytoscape from R using the RCy3 package.
The whole point of RCy3 is to connect with Cytoscape. You will need to install and launch Cytoscape:
For this vignette, you’ll also need the STRING app to access the STRING database from within Cytoscape: * Install the STRING app from https://apps.cytoscape.org/apps/stringapp
#{r} #available in Cytoscape 3.7.0 and above installApp('STRINGapp') #
Use Cytoscape to query the STRING database for networks of genes associated with breast cancer and ovarian cancer.
If the STRING app is not installed, no error is reported, but your network will be empty
string.cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
commandsRun(string.cmd)
Here we are using Cytoscape’s command line syntax, which can be used for any core or app automation function, and then making a GET request. Use commandsHelp to interrogate the functions and parameters available in your active Cytoscape session, including the apps you’ve installed!
Now that we’ve got a couple networks into Cytoscape, let’s see what we can do with them from R…
Now, let’s look at the tablular data associated with our STRING networks…
One of the great things about the STRING database is all the node and edge attriubtes they provide. Let’s pull some of it into R to play with…
We can retrieve any set of columns from Cytoscape and store them as an R data frame keyed by SUID. In this case, let’s retrieve the disease score column from the node table. Those will be our two parameters:
In order to reflect your exploration back onto the network, let’s generate subnetworks…
…from top quartile of ‘disease score’
top.quart <- quantile(disease.score.table[,1], 0.75)
top.nodes <- row.names(disease.score.table)[which(disease.score.table[,1]>top.quart)]
createSubnetwork(top.nodes,subnetwork.name ='top disease quartile')
#returns a Cytoscape network SUID
…of connected nodes only
createSubnetwork(edges='all',subnetwork.name='top disease quartile connected') #handy way to exclude unconnected nodes!
…from first neighbors of top 3 genes, using the network connectivity together with the data to direct discovery.
setCurrentNetwork(network="STRING network - ovarian cancer")
top.nodes <- row.names(disease.score.table)[tail(order(disease.score.table[,1]),3)]
selectNodes(nodes=top.nodes)
selectFirstNeighbors()
createSubnetwork('selected', subnetwork.name='top disease neighbors') # selected nodes, all connecting edges (default)
…from diffusion algorithm starting with top 3 genes, using the network connectivity in a more subtle way than just first-degree neighbors.
setCurrentNetwork(network="STRING network - ovarian cancer")
selectNodes(nodes=top.nodes)
commandsPOST('diffusion diffuse') # diffusion!
createSubnetwork('selected',subnetwork.name = 'top disease diffusion')
layoutNetwork('force-directed')
Pro-tip: don’t forget to setCurrentNetwork() to the correct parent network before getting table column data and making selections.
Downloaded TCGA data, preprocessed as R objects. Also available via each TCGA publication, e.g.:
load(system.file("extdata","tutorial-ovc-expr-mean-dataset.robj", package="RCy3"))
load(system.file("extdata","tutorial-ovc-mut-dataset.robj", package="RCy3"))
load(system.file("extdata","tutorial-brc-expr-mean-dataset.robj", package="RCy3"))
load(system.file("extdata","tutorial-brc-mut-dataset.robj", package="RCy3"))
These datasets are similar to the data frames you normally encounter in R. For diversity, one using row.names to store corresponding gene names and the other uses the first column. Both are easy to import into Cytoscape.
str(brc.expr) # gene names in row.names of data.frame
str(brc.mut) # gene names in column named 'Hugo_Symbol'
Let’s return to the Breast Cancer network…
setCurrentNetwork(network="STRING network - breast cancer")
layoutNetwork('force-directed') #uses same settings as previously set
…and use the helper function from RCy3 called loadTableData
?loadTableData
loadTableData(brc.expr,table.key.column = "display name") #default data.frame key is row.names
loadTableData(brc.mut,'Hugo_Symbol',table.key.column = "display name") #specify column name if not default
Let’s create a new style to visualize our imported data …starting with the basics, we will specify a few defaults and obvious mappings in a custom style all our own.
style.name = "dataStyle"
createVisualStyle(style.name)
setVisualStyle(style.name)
setNodeShapeDefault("ellipse", style.name) #remember to specify your style.name!
setNodeSizeDefault(60, style.name)
setNodeColorDefault("#AAAAAA", style.name)
setEdgeLineWidthDefault(2, style.name)
setNodeLabelMapping('display name', style.name)
Now let’s update the style with a mapping for mean expression. The first step is to grab the column data from Cytoscape and pull out the min and max to define our data mapping range of values.
brc.expr.network = getTableColumns('node','expr.mean')
min.brc.expr = min(brc.expr.network[,1],na.rm=TRUE)
max.brc.expr = max(brc.expr.network[,1],na.rm=TRUE)
data.values = c(min.brc.expr,0,max.brc.expr)
Next, we use the RColorBrewer package to help us pick good colors to pair with our data values.
display.brewer.all(length(data.values), colorblindFriendly=TRUE, type="div") # div,qual,seq,all
node.colors <- c(rev(brewer.pal(length(data.values), "RdBu")))
Finally, we use the handy mapVisualProperty function to construct the data object that CyREST needs to specify style mappings and then we’ll send them off to Cytoscape with updateStyleMappings.
Pro-tip: depending on your data, it may be better to balance your color range over negative and positive values bounded by the largest min or max data value, so that color intensity scales similarly in both directions.
OK, now let’s update with a mapping for mutation. Here are all the same steps, but this time mapping mutation counts to both node border width and color.
brc.mut.network = getTableColumns('node','mut_count')
min.brc.mut = min(brc.mut.network[,1],na.rm=TRUE)
max.brc.mut = max(brc.mut.network[,1],na.rm=TRUE)
data.values = c(min.brc.mut,20,max.brc.mut)
display.brewer.all(length(data.values), colorblindFriendly=TRUE, type="seq")
border.colors <- c(brewer.pal(3, "Reds"))
setNodeBorderColorMapping('mut_count',data.values,border.colors,style.name=style.name)
border.width <- c(2,4,8)
setNodeBorderWidthMapping('mut_count',data.values,border.width,style.name=style.name)
This is a useful pair of visual properties to map to a single data column. See why?
Now, let’s pull in what we learned about subnetwork selection and apply it here…
top.mut <- (brc.mut$Hugo_Symbol)[tail(order(brc.mut$mut_count),2)]
top.mut
selectNodes(nodes=top.mut,'display name')
commandsPOST('diffusion diffuse')
createSubnetwork('selected',subnetwork.name = 'top mutated diffusion')
layoutNetwork('force-directed')
The top mutated genes are based on TCGA data and the diffusion algorithm is operating based on the network connectivity from STRING data, leading to a focused subnetwork view of critical Breast Cancer genes with mean patient expression data mapped to fill color. Now that’s data integration!
Pro-tip: You can generate a legend for this in Cytoscape Style tab > Options > Create style… This is no yet available as a command. Coming soon!
But what about the other network and datasets? Do we have to repeat all of those steps again? Actually, no!
First, let’s switch back over to the Ovarian Cancer network and load our data.
setCurrentNetwork(network="STRING network - ovarian cancer")
clearSelection()
str(ovc.expr) # gene names in row.names of data.frame
str(ovc.mut) # gene names in column named 'Hugo_Symbol'
loadTableData(ovc.expr, table.key.column = 'display name')
loadTableData(ovc.mut,'Hugo_Symbol', table.key.column = 'display name')
Because we used the same column names in our original data frames, now we can simply apply the same visual style created above!
Reusing the same style for both breast and ovarian cancers, we can compare the relative expression and mutation counts across the two datasets. For example, notice in the case of ovarian cancer: decreased range of mean expression and fewer mega-mutated genes.
Session files save everything. As with most project software, we recommend saving often!
Note: If you don’t specify a complete path, the files will be saved relative to your current working directory in R.