Title: | Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach. |
---|---|
Description: | Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model. |
Authors: | Gregory Ryslik, Hongyu Zhao |
Maintainer: | Gregory Ryslik <[email protected]> |
License: | GPL-2 |
Version: | 1.49.0 |
Built: | 2024-10-30 09:09:29 UTC |
Source: | https://github.com/bioc/GraphPAC |
The GraphPAC package identifies statistically significant clusters of nonsynonymous amino acid mutations and is a sister package to iPAC. GraphPAC reorders the protein into a one-dimensional space via a graph theoretical approach. Specifically, the traveling salesman problem (TSP) is solved heuristically via the TSP package. Once it is solved, the mutational data is reordered to follow the resulting Hamiltonian path, and the NMC algorithm is run to find the mutational clusters on the remapped protein. Unlike the MDS remapping approach used in iPAC, distant amino acids no longer affect each other's position in one-dimensional space, allowing for a closer representation of the underlying biology.
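The reordering step can be illustrated with a small base-R sketch of the cheapest-insertion heuristic on a distance matrix. This is a hypothetical re-implementation for illustration only, not the TSP package's code (GraphPAC delegates this step to the TSP package); the function names `cheapest_insertion_tour` and `tour_length` are assumptions.

```r
# Hypothetical illustration of the cheapest-insertion TSP heuristic
# (not the TSP package's implementation). d is a symmetric distance matrix.
# Starting from `start`, the tour repeatedly absorbs the unvisited amino acid
# whose cheapest insertion increases the closed-tour length the least.
cheapest_insertion_tour <- function(d, start = 1L) {
  n <- nrow(d)
  tour <- start
  remaining <- setdiff(seq_len(n), start)
  while (length(remaining) > 0) {
    best_cost <- Inf
    best_city <- NA_integer_
    best_pos <- NA_integer_
    for (k in remaining) {
      m <- length(tour)
      for (p in seq_len(m)) {
        i <- tour[p]
        j <- tour[if (p == m) 1L else p + 1L]  # edge (i, j) of the closed tour
        cost <- d[i, k] + d[k, j] - d[i, j]    # extra length from inserting k
        if (cost < best_cost) {
          best_cost <- cost
          best_city <- k
          best_pos <- p
        }
      }
    }
    tour <- append(tour, best_city, after = best_pos)
    remaining <- setdiff(remaining, best_city)
  }
  tour
}

# Closed-tour length: consecutive distances plus the edge back to the start.
tour_length <- function(d, tour) {
  sum(d[cbind(tour, c(tour[-1], tour[1]))])
}

# Toy example: four residues on a line at x = 0, 1, 2, 3.
d <- as.matrix(dist(matrix(c(0, 1, 2, 3), ncol = 1)))
tour <- cheapest_insertion_tour(d, start = 1L)
tour                  # 1 4 3 2
tour_length(d, tour)  # 6
```

Ties are broken in favor of the first candidate found, so the exact tour can differ from other implementations of the same heuristic.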
Please see the documentation for "get.Positions", "get.AlignedPositions", and "Plot.Protein.Linear" in the iPAC package. There you will find information on getting basic positional data and plotting functions.
Gregory Ryslik, Hongyu Zhao
Maintainer: Gregory A. Ryslik <[email protected]>
Ye et al., Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics. 2010. doi:10.1186/1471-2105-11-11.
Michael Hahsler and Kurt Hornik (2011). Traveling Salesperson Problem (TSP) R package version 1.0-7. http://CRAN.R-project.org/.
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.sf.net
Gregory Ryslik and Hongyu Zhao (2012). iPAC: Identification of Protein Amino acid Clustering. R package version 1.1.3. http://www.bioconductor.org/.
Bioconductor: Open software development for computational biology and bioinformatics R. Gentleman, V. J. Carey, D. M. Bates, B.Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, and others 2004, Genome Biology, Vol. 5, R80
## Not run:
library(iPAC)      # provides get.Positions
library(GraphPAC)
# Load the positional and mutational data
CIF <- "https://files.rcsb.org/view/3GFT.cif"
Fasta <- "https://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")
data(KRAS.Mutations)
# Calculate the required clusters
GraphClust(KRAS.Mutations, KRAS.Positions$Positions,
           insertion.type = "cheapest_insertion", alpha = 0.05,
           MultComp = "Bonferroni")
## End(Not run)
Employs a heuristic approach to solve the traveling salesman problem.
Find.TSP.Path(PositionList, mutation.matrix, insertion.type = "cheapest_insertion", fix.start.pos = "Y")
PositionList |
A data frame consisting of six columns: 1) Residue Name, 2) Amino Acid number in the protein, 3) Side Chain, 4) X-coordinate, 5) Y-coordinate and 6) Z-coordinate. Please see get.Positions and get.AlignedPositions in the iPAC package for further information on how to construct this data frame. |
mutation.matrix |
A matrix of 0's (no mutation) and 1's (mutation) where each column represents an amino acid in the protein and each row represents an individual sample (test subject, cell line, etc.). Thus, a 1 in row j, column i means that the ith amino acid in sample j carries a nonsynonymous mutation. |
insertion.type |
Specifies the type of insertion method used. Please see the TSP package for more details. |
fix.start.pos |
The TSP package starts the path at a random amino acid. To make results easily reproducible, the default ("Y") starts the path at the first amino acid in the protein. |
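A toy set of inputs in the shapes described above can help when testing a workflow. The column names and all values below are invented for illustration; in practice the position data frame comes from `get.Positions` or `get.AlignedPositions` in iPAC.

```r
# Hypothetical toy inputs shaped like PositionList and mutation.matrix.
# Column names and values are assumptions made for illustration only.
positions <- data.frame(
  Residue   = c("MET", "THR", "GLU", "TYR"),
  Can.Count = 1:4,                 # amino acid number in the protein
  SideChain = c("A", "A", "A", "A"),
  XCoord    = c(0, 1, 2, 3),
  YCoord    = c(0, 0, 0, 0),
  ZCoord    = c(0, 0, 0, 0)
)

# 3 samples (rows) x 4 amino acids (columns); 1 = nonsynonymous mutation.
mutations <- matrix(c(0, 1, 0, 0,
                      0, 1, 1, 0,
                      0, 0, 0, 1), nrow = 3, byrow = TRUE)
colnames(mutations) <- paste0("V", seq_len(ncol(mutations)))

# Number of samples carrying a mutation at each position.
colSums(mutations)  # V1 = 0, V2 = 2, V3 = 1, V4 = 1
```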
candidate.path |
A numeric vector giving the order in which the amino acids are visited along the path found through the protein. |
candidate.path.distance |
The distance traveled along the candidate path. |
dist.matrix |
The matrix of pairwise distances between amino acids. |
linear.path.distance |
The distance traveled if one were to visit the amino acids in the original sequence (1 -> 2 -> 3 -> ... -> N). |
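The relationship between these returned quantities can be sketched in base R. The helper `path_distance` and the local variable names are hypothetical, and the sketch assumes Euclidean distances and an open path (no edge back to the start); it is not GraphPAC's own code.

```r
# Hypothetical helpers mirroring the returned quantities (not GraphPAC code).
# coords: an N x 3 matrix of X, Y, Z coordinates, one row per amino acid.
coords <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1), z = 0)
dist.matrix <- as.matrix(dist(coords))   # pairwise Euclidean distances

# Length of an open path visiting the amino acids in the given order.
path_distance <- function(d, path) {
  sum(d[cbind(path[-length(path)], path[-1])])
}

linear.path <- seq_len(nrow(coords))     # 1 -> 2 -> 3 -> 4
candidate.path <- c(1, 2, 4, 3)          # an example reordering
path_distance(dist.matrix, linear.path)     # 1 + sqrt(2) + 1
path_distance(dist.matrix, candidate.path)  # 3
```

Here the reordered path is shorter than the sequential one because residues 3 and 4 are spatial neighbors of 4 and 2 respectively.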
Michael Hahsler and Kurt Hornik (2011). Traveling Salesperson Problem (TSP) R package version 1.0-7. http://CRAN.R-project.org/.
Gregory Ryslik and Hongyu Zhao (2012). iPAC: Identification of Protein Amino acid Clustering. R package version 1.1.3. http://www.bioconductor.org/.
library(iPAC)      # provides get.Positions
library(GraphPAC)
# Load the position and mutational data
CIF <- "https://files.rcsb.org/view/3GFT.cif"
Fasta <- "https://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")
data(KRAS.Mutations)
# Save all the results to path.results
path.results <- Find.TSP.Path(KRAS.Positions$Positions, KRAS.Mutations)
Finds mutational clusters after reordering the protein using the traveling salesman approach.
GraphClust(mutation.data, position.data, insertion.type = "cheapest_insertion", alpha = 0.05, MultComp = "Bonferroni", fix.start.pos = "Y", Include.Culled = "Y", Include.Full = "Y")
mutation.data |
A matrix of 0's (no mutation) and 1's (mutation) where each column represents an amino acid in the protein and each row represents an individual sample (test subject, cell line, etc.). Thus, a 1 in row j, column i means that the ith amino acid in sample j carries a nonsynonymous mutation. |
position.data |
A data frame consisting of six columns: 1) Residue Name, 2) Amino Acid number in the protein, 3) Side Chain, 4) X-coordinate, 5) Y-coordinate and 6) Z-coordinate. Please see get.Positions and get.AlignedPositions in the iPAC package for further information on how to construct this data frame. |
insertion.type |
Specifies the type of insertion method used. Please see the TSP package for more details. |
alpha |
The significance level at which a mutational cluster is declared significant. Please see the NMC package for further information. |
MultComp |
The multiple comparison adjustment to apply, since all pairwise mutations are considered. Options are: "Bonferroni", "BH", or "None". |
fix.start.pos |
The TSP package starts the path at a random amino acid. To make results easily reproducible, the default ("Y") starts the path at the first amino acid in the protein. |
Include.Culled |
If "Y", the standard NMC algorithm will be run on the protein after removing the amino acids for which there is no positional data. |
Include.Full |
If "Y", the standard NMC algorithm will be run on the full protein sequence. |
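The three MultComp options correspond to standard adjustments that base R exposes through `p.adjust()`. The sketch below assumes that correspondence; the helper `adjust_pvalues` and the p-values are hypothetical, and GraphPAC's own adjustment logic may differ.

```r
# Hypothetical mapping of the MultComp options onto base R's p.adjust();
# GraphPAC's internal adjustment may be implemented differently.
adjust_pvalues <- function(p, MultComp = c("Bonferroni", "BH", "None")) {
  MultComp <- match.arg(MultComp)
  method <- switch(MultComp,
                   Bonferroni = "bonferroni",
                   BH = "BH",
                   None = "none")
  stats::p.adjust(p, method = method)
}

p <- c(0.01, 0.02, 0.04)   # made-up raw p-values for three candidate clusters
alpha <- 0.05
adjust_pvalues(p, "Bonferroni")           # 0.03 0.06 0.12
adjust_pvalues(p, "BH")                   # 0.03 0.03 0.04
adjust_pvalues(p, "Bonferroni") <= alpha  # TRUE FALSE FALSE
```

Bonferroni is the more conservative choice: with the same raw p-values, only one cluster survives it, while all three survive BH.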
The protein reordering is done using the TSP package available on CRAN. The resulting Hamiltonian path then serves as the new protein ordering.
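Reordering the mutation data to follow a path amounts to permuting the columns of the mutation matrix. A minimal sketch, assuming `path[k]` holds the original amino acid number placed at position k; the helper `remap_mutations` is hypothetical.

```r
# Hypothetical sketch: reorder the mutation matrix so that its columns follow
# the path. path[k] is the original amino acid number placed at position k.
remap_mutations <- function(mutations, path) {
  mutations[, path, drop = FALSE]
}

mutations <- matrix(c(1, 0, 0, 1,
                      0, 1, 0, 1), nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, paste0("V", 1:4)))
path <- c(1, 2, 4, 3)
remapped <- remap_mutations(mutations, path)
colnames(remapped)   # "V1" "V2" "V4" "V3"
```

Adjacent columns of `remapped` are neighbors on the path, so a one-dimensional clustering method run on it sees spatial neighbors as sequence neighbors.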
The position data can be created via the "get.AlignedPositions" or the "get.Positions" functions available via the imported iPAC package.
The mutation matrix must have the default R column headings "V1", "V2", ..., "VN", where N is the last amino acid in the protein. No positions should be skipped in the mutation matrix.
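The heading requirement can be checked before calling the package. In base R, `as.data.frame()` on an unnamed matrix produces these default "V1", "V2", ... names. The helper `has_default_headings` below is a hypothetical input check, not part of GraphPAC.

```r
# Hypothetical input check for the column-heading requirement: headings must
# be exactly "V1", "V2", ..., "VN", with no positions skipped.
has_default_headings <- function(mutations) {
  identical(colnames(mutations), paste0("V", seq_len(ncol(mutations))))
}

ok <- as.matrix(as.data.frame(matrix(0, nrow = 2, ncol = 3)))  # V1 V2 V3
skipped <- ok[, c(1, 3), drop = FALSE]                          # V1 V3
has_default_headings(ok)       # TRUE
has_default_headings(skipped)  # FALSE
```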
When mapping back to the original space, the endpoints of each cluster in the remapped space are used as the endpoints of the cluster in the original space.
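The endpoint rule can be sketched as a lookup on the path. The helper `unmap_cluster` is hypothetical and assumes the cluster is given by its first and last positions in the remapped order.

```r
# Hypothetical sketch of the endpoint rule: a cluster spanning remapped
# positions remapped_start..remapped_end is reported in the original space
# using the amino acids found at those two positions on the path.
unmap_cluster <- function(path, remapped_start, remapped_end) {
  c(start = path[remapped_start], end = path[remapped_end])
}

path <- c(1, 2, 4, 3, 5)   # original amino acid at each remapped position
unmap_cluster(path, 2, 4)  # start = 2, end = 3
```

Under this rule the original-space interval need not contain every residue of the remapped cluster: here the remapped cluster covers residues 2, 4 and 3, but the reported interval runs from 2 to 3.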
Remapped |
This shows the clusters found while taking the 3D structure into account and remapping the protein using a traveling salesman approach. |
OriginalCulled |
This shows the clusters found if you run the NMC algorithm on the canonical linear protein, but with the amino acids for which we don't have 3D positional data removed. |
Original |
This shows the clusters found if you run the NMC algorithm on the canonical linear protein with all the amino acids. |
candidate.path |
This shows the path found by the TSP package that heuristically minimizes the total distance through the protein. |
path.distance |
The length of the candidate path if traveled from start to finish. |
linear.path.distance |
The length of the sequential path 1, 2, 3, ..., N (where N is the total number of amino acids in the protein). |
protein.graph |
A graph object created by the igraph package that has edges between amino acids on the candidate.path. This can be passed to plotting functions to create visual representations. |
missing.positions |
This shows which amino acids are present in the mutation matrix but for which we do not have positions. These amino acids are cut from the protein when calculating the Remapped and OriginalCulled results. |
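The culling behind missing.positions and the OriginalCulled result can be sketched as dropping the columns that lack positional data. The helper `cull_missing` and its `positioned` argument (the amino acid numbers that do have coordinates) are hypothetical.

```r
# Hypothetical sketch of the culling step: amino acids present in the
# mutation matrix but lacking positional data are dropped.
cull_missing <- function(mutations, positioned) {
  all_positions <- seq_len(ncol(mutations))
  missing <- setdiff(all_positions, positioned)
  list(culled = mutations[, !(all_positions %in% missing), drop = FALSE],
       missing.positions = missing)
}

mutations <- matrix(c(1, 0, 1, 0, 1,
                      0, 0, 1, 1, 0), nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, paste0("V", 1:5)))
res <- cull_missing(mutations, positioned = c(1, 2, 4, 5))
res$missing.positions   # 3
colnames(res$culled)    # "V1" "V2" "V4" "V5"
```

A logical index is used instead of `mutations[, -missing]` because negative indexing with an empty `missing` would drop every column rather than none.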
Ye et al., Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics. 2010. doi:10.1186/1471-2105-11-11.
Michael Hahsler and Kurt Hornik (2011). Traveling Salesperson Problem (TSP) R package version 1.0-7. http://CRAN.R-project.org/.
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.sf.net
Gregory Ryslik and Hongyu Zhao (2012). iPAC: Identification of Protein Amino acid Clustering. R package version 1.1.3. http://www.bioconductor.org/.
## Not run:
library(iPAC)      # provides get.Positions
library(GraphPAC)
# Load the positional and mutational data
CIF <- "https://files.rcsb.org/view/3GFT.cif"
Fasta <- "https://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")
data(KRAS.Mutations)
# Calculate the required clusters
GraphClust(KRAS.Mutations, KRAS.Positions$Positions,
           insertion.type = "cheapest_insertion", alpha = 0.05,
           MultComp = "Bonferroni")
## End(Not run)
Creates a circular interactive plot of the path through the protein.
Plot.Protein(graph, path, vertex.size = 5, color.palette = "heat")
graph |
The graph object returned by GraphClust ($protein.graph). |
path |
The path returned by GraphClust ($candidate.path). |
vertex.size |
The size of each vertex. |
color.palette |
Possible options are: "heat", "gray", "topo", "cm". |
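The palette names match base R's grDevices palette functions `heat.colors`, `gray.colors`, `topo.colors` and `cm.colors`. Assuming that correspondence, which the source does not confirm, the choice can be sketched as follows; `palette_colors` is a hypothetical helper, and GraphPAC's internal color handling may differ.

```r
# Hypothetical mapping of the color.palette names onto grDevices palette
# functions; GraphPAC's own choice of colors may differ.
palette_colors <- function(n, color.palette = c("heat", "gray", "topo", "cm")) {
  color.palette <- match.arg(color.palette)
  switch(color.palette,
         heat = grDevices::heat.colors(n),
         gray = grDevices::gray.colors(n),
         topo = grDevices::topo.colors(n),
         cm   = grDevices::cm.colors(n))
}

# One color per vertex, e.g. for a 10-residue path.
cols <- palette_colors(10, "heat")
length(cols)  # 10
```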
This will plot the amino acids in a circular directed graph. The vertices can be dragged around to enhance the visual representation. This is meant to complement the Plot.Protein.Linear function in iPAC which is also applicable in this package.
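The directed graph along the path can be described by an edge list linking each amino acid to the next one visited. A minimal base-R sketch; the helper `path_edges` is hypothetical, and such a two-column matrix could, for example, be converted with igraph's `graph_from_edgelist(edges, directed = TRUE)` (not run here).

```r
# Hypothetical sketch: directed edges between consecutive amino acids on the
# candidate path, as a two-column "from -> to" matrix.
path_edges <- function(path) {
  cbind(from = path[-length(path)], to = path[-1])
}

# Three edges for a four-residue path: 1 -> 2, 2 -> 4, 4 -> 3.
edges <- path_edges(c(1, 2, 4, 3))
```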
This function is based on the "tkplot" function in igraph. Please see the igraph documentation for the necessary requirements. Special thanks to Dr. Gábor Csárdi (creator of the igraph package) for his help.
Gregory Ryslik and Hongyu Zhao (2012). iPAC: Identification of Protein Amino acid Clustering. R package version 1.1.3. http://www.bioconductor.org/.
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.sf.net.
## Not run:
library(iPAC)      # provides get.Positions
library(GraphPAC)
# Load the mutational and positional data
CIF <- "https://files.rcsb.org/view/3GFT.cif"
Fasta <- "https://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")
data(KRAS.Mutations)
# Get the cluster results and graph object
my.graph.clusters <- GraphClust(KRAS.Mutations, KRAS.Positions$Positions,
                                insertion.type = "cheapest_insertion",
                                alpha = 0.05, MultComp = "Bonferroni")
Plot.Protein(my.graph.clusters$protein.graph, my.graph.clusters$candidate.path,
             vertex.size = 5, color.palette = "heat")
## End(Not run)