Title: | Evolutionary and plasticity analysis of orthologous groups |
---|---|
Description: | Geneplast is designed for evolutionary and plasticity analysis based on orthologous groups distribution in a given species tree. It uses Shannon information theory and orthologs abundance to estimate the Evolutionary Plasticity Index. Additionally, it implements the Bridge algorithm to determine the evolutionary root of a given gene based on its orthologs distribution. |
Authors: | Rodrigo Dalmolin, Mauro Castro |
Maintainer: | Mauro Castro <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.33.0 |
Built: | 2024-10-30 08:02:53 UTC |
Source: | https://github.com/bioc/geneplast |
Geneplast is designed for evolutionary and plasticity analysis based on orthologous groups distribution in a given species tree. It uses Shannon information theory and orthologs abundance to estimate the Evolutionary Plasticity Index. Additionally, it implements the Bridge algorithm to determine the evolutionary root of a given gene based on its orthologs distribution.
Package: | geneplast |
Type: | Package |
Version: | 1.0 |
Date: | 2016-06-10 |
License: | GPL (>= 2) |
R package for gene plasticity inference based on orthologous groups distribution.
Rodrigo JS Dalmolin, Mauro AA Castro.
Dalmolin RJS et al. Geneplast: an R package for gene plasticity inference based on orthologous groups distribution. Journal paper (in preparation).
A dataset used to demonstrate geneplast main functions.
data(gpdata.gs)
data(gpdata.gs)
gpdata.gs
A data frame containing orthology annotation.
The dataset consists of 4 R objects to be used for demonstration purposes only in the geneplast vignette.
A data frame with three columns listing orthology annotation retrieved from the STRING database (http://string-db.org/), release 9.1. Column 1 = Ensembl protein ID; column 2 = NCBI species ID; column 3 = OG ID. Note: This dataset is to be used for demonstration purposes only as it represents a subset of the STRING database; in order to reduce the dataset size, orthology annotation was mapped to genome stability genes (Castro et al.).
A data frame containing the species listed in STRING database (http://string-db.org/), release 9.1. Column 1 = NCBI species ID; column 2 = NCBI species name;column 3 = species domain (eukaryotes).
A one-column data.frame listing orthologous groups (OGs) available in the 'cogdata' object.
An object of class "phylo" for the eukaryotes listed in the STRING database, release 9.1.
a dataset.
Franceschini et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research 41(Database issue):D808-15, 2013. doi:10.1093/nar/gks1094. Epub 2012 Nov 29.
Castro et al. Evolutionary Origins of Human Apoptosis and Genome-Stability Gene Networks. Nucleic Acids Research 36(19): 6269-83, 2008. doi:10.1093/nar/gkn636.
data(gpdata.gs)
data(gpdata.gs)
Function to calculate abundance, diversity, and evolutionary plasticity of an orthologous group (OG).
gplast(object, verbose=TRUE)
gplast(object, verbose=TRUE)
object |
this argument is an object of class 'OGP' ( |
verbose |
a single logical value specifying to display detailed messages (when verbose=TRUE) or not (when verbose=FALSE). |
This method computes the abundance and diversity of an OG, and derives the evolutionary plasticity as described in Castro et al. (2008) and Dalmolin et al. (2011). The OG diversity corresponds to the normalized Shannon's diversity index and estimates the distribution of orthologous proteins across the species listed in the input dataset. The OG abundance is simply the ratio between the number of orthologs of a given OG and the number of organisms listed in the group. Evolutionary Plasticity Index is calculated as described in Dalmolin et al (2011).
A processed object of class 'OGP', including COG's abundance, diversity, and plasticity results.
Rodrigo Dalmolin, Mauro Castro
Dalmolin, RJ, Castro, MA, Rybarczyk-Filho JL, Souza LH, de Almeida RM, Moreira JC. Evolutionary plasticity determination by orthologous groups distribution. Biol Direct. 2011 May 17;6:22. DOI: 10.1186/1745-6150-6-22.
Castro MA, Dalmolin RJ, Moreira JC, Mombach JC, de Almeida RM. Evolutionary origins of human apoptosis and genome-stability gene networks. Nucleic Acids Res. 2008 Nov;36(19):6269-83. DOI: 10.1093/nar/gkn636.
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids) ## run the gplast function ## this example uses the especies/COGs listed in the gpdata object ogp <- gplast(ogp) res <- gplast.get(ogp,what="results")
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids) ## run the gplast function ## this example uses the especies/COGs listed in the gpdata object ogp <- gplast(ogp) res <- gplast.get(ogp,what="results")
Get information from individual slots in an OGP object and any available results from previous analysis.
gplast.get(object, what="status")
gplast.get(object, what="status")
object |
an object of class 'OGP' |
what |
a single character value specifying which information should be retrieved from the slots. Options: "cogids", "sspids", "orthodist", "results", and "status". |
slot content from an object of class 'OGP' OGP-class
.
Rodrigo Dalmolin, Mauro Castro
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids) ## run the gplast function ## this example uses the especies/COGs listed in the gpdata object ogp <- gplast(ogp) res <- gplast.get(ogp,what="results")
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids) ## run the gplast function ## this example uses the especies/COGs listed in the gpdata object ogp <- gplast(ogp) res <- gplast.get(ogp,what="results")
Constructor for the 'OGP-class' object.
gplast.preprocess(cogdata, sspids=NULL, cogids=NULL, verbose=TRUE)
gplast.preprocess(cogdata, sspids=NULL, cogids=NULL, verbose=TRUE)
cogdata |
a data frame or matrix with COG's data. |
sspids |
an optional data frame with species annotation. Alternatively, it can be a character vector with species IDs. |
cogids |
an optional data frame with COG's annotation. Alternatively, it can be a character vector with COG IDs. |
verbose |
a single logical value specifying to display detailed messages (when verbose=TRUE) or not (when verbose=FALSE). |
This function creates an OGP-class
object and checks the consistency of the input data for the evolutionary plasticity pipeline. Internally, the function counts the number of orthologs for each species in a given OG and computes the orthodist
matrix.
A preprocessed object of class 'OGP'.
Rodrigo Dalmolin, Mauro Castro
Dalmolin, RJ, Castro, MA, Rybarczyk-Filho JL, Souza LH, de Almeida RM, Moreira JC. Evolutionary plasticity determination by orthologous groups distribution. Biol Direct. 2011 May 17;6:22. DOI: 10.1186/1745-6150-6-22.
Castro MA, Dalmolin RJ, Moreira JC, Mombach JC, de Almeida RM. Evolutionary origins of human apoptosis and genome-stability gene networks. Nucleic Acids Res. 2008 Nov;36(19):6269-83. DOI: 10.1093/nar/gkn636.
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids)
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGP' ogp <- gplast.preprocess(cogdata=cogdata, sspids=sspids, cogids=cogids)
Function to determine the evolutionary root of a gene based on the distribution of its orthologs.
groot(object, method="BR", penalty=2, cutoff=0.3, nPermutations=1000, pAdjustMethod="bonferroni", verbose=TRUE)
groot(object, method="BR", penalty=2, cutoff=0.3, nPermutations=1000, pAdjustMethod="bonferroni", verbose=TRUE)
object |
this argument is an object of class 'OGR' ( |
method |
a single character value specifying the rooting algorithm. Options: "BR" and "KS" (see details). |
penalty |
a single numeric value specifying the penalty used in the rooting algorithm (see details). |
cutoff |
a single numeric value in [0,1] specifying the cutoff used in the BR statistics (see details). |
nPermutations |
a single integer value specifying the number of permutations used to compute a null distribution for the inferred roots in the species tree. |
pAdjustMethod |
a single character value specifying the p-value adjustment method to be used (see 'p.adjust' for details). |
verbose |
a single logical value specifying to display detailed messages (when verbose=TRUE) or not (when verbose=FALSE). |
The 'groot' function addresses the problem of finding the evolutionary root of a feature in an phylogenetic tree. The method infers the probability that such feature was present in the Last Common Ancestor (LCA) of a given lineage. Events like horizontal gene transfer, gene deletion, and de novo gene formation add noise to vertical heritage patterns. The 'groot' function assesses the presence and absence of the orthologs in the extant species of the phylogenetic tree in order to build a probability distribution, which is used to identify vertical heritage patterns.
The 'penalty' argument weighs gene gain and loss; penalty=1 indicates equal probability; penalty > 1 indicates higher probability of gene loss while penalty < 1 indicates higher probability of gene gain (penalty value should be greater than zero; default penalty=2).
After the probability distribution is built for a given lineage, then a rooting algorithm is used to search the LCA that provides the best vertical heritage pattern (default method='BR'). The rooting algorithms finds the optimum point that splits the probability distribution into two components: one enriched with the queried feature (supporting vertical heritage in the lineage) and another with low evidence in favour of the feature's presence. The cutoff sets the tolerance for the discrimination between the two components (default cutoff=0.3). Based on the optimization settings, then the 'groot' function computes the inconsistency score 'Dscore', which assesses the significance of the inferred root against a null distribution derived by permutation analysis.
An processed object of class 'OGR', including results from the rooting algorithm.
Rodrigo Dalmolin, Mauro Castro
Dalmolin RJ and Castro, MA. Geneplast: Evolutionary rooting using orthologous groups distribution. Journal Paper (in preparation), 2016.
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ## this example uses the orthologous groups listed in the gpdata object ogr <- groot(ogr, nPermutations=100) res <- groot.get(ogr, what="results") ## Not run: # Option: parallel version with SNOW package! library(snow) options(cluster=makeCluster(2, "SOCK")) ogr <- groot(ogr, nPermutations=100) stopCluster(getOption("cluster")) ## End(Not run)
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ## this example uses the orthologous groups listed in the gpdata object ogr <- groot(ogr, nPermutations=100) res <- groot.get(ogr, what="results") ## Not run: # Option: parallel version with SNOW package! library(snow) options(cluster=makeCluster(2, "SOCK")) ogr <- groot(ogr, nPermutations=100) stopCluster(getOption("cluster")) ## End(Not run)
Get information from individual slots in an OGR object and any available results from a previous analysis.
groot.get(object, what="status")
groot.get(object, what="status")
object |
an object of class 'OGR' |
what |
a single character value specifying which information should be retrieved from the slots. Options: "cogids", "spbranches", "orthoroot" ,"results" and "status". |
slot content from an object of class 'OGR' OGR-class
.
Rodrigo Dalmolin, Mauro Castro
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ## this example uses the orthologous groups listed in the gpdata object ogr <- groot(ogr, nPermutations=100) res <- groot.get(ogr, what="results")
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ## this example uses the orthologous groups listed in the gpdata object ogr <- groot(ogr, nPermutations=100) res <- groot.get(ogr, what="results")
Plot the inferred evolutionary root of a given OG onto the species tree or the map of LCAs of a given species.
groot.plot(ogr, whichOG, fname="gproot", width=4.5, height=6.5, cex.lab=0.3, cex.nodes=0.6, adj.tips=c(1, 0.5), lab.offset=1.5, col.tips=c("green2","grey"), col.edges=c("black","grey"), col.root="red", plot.sspnames=TRUE, plot.subtree=FALSE, plot.lcas=FALSE)
groot.plot(ogr, whichOG, fname="gproot", width=4.5, height=6.5, cex.lab=0.3, cex.nodes=0.6, adj.tips=c(1, 0.5), lab.offset=1.5, col.tips=c("green2","grey"), col.edges=c("black","grey"), col.root="red", plot.sspnames=TRUE, plot.subtree=FALSE, plot.lcas=FALSE)
ogr |
this argument is an object of class 'OGR' evaluated by the groot |
whichOG |
a single character value indicating the OG to be plotted. |
fname |
a character string naming a file. |
width |
a single numeric value specifying the width of the graphics region in inches. |
height |
a single numeric value specifying the height of the graphics region in inches. |
cex.lab |
numeric character expansion factor for tip labels. |
cex.nodes |
numeric expansion factor for node symbols. |
adj.tips |
two numeric values specifying the adjustment of the labels. |
lab.offset |
a single numeric value specifying the offset of the labels. |
col.tips |
a character vector of length=2 specifying the colors of the tips. |
col.edges |
a character vector of length=2 specifying the colors of the edges. |
col.root |
a character value specifying the color of the inferred root. |
plot.sspnames |
a single logical value specifying whether ssp names should be used to generate the plot. |
plot.subtree |
a single logical value specifying whether a sub-species tree should be used to generate the plot. |
plot.lcas |
a single logical value specifying whether a species tree should be generated mapping the positions of all possible roots. |
a pdf file.
Rodrigo Dalmolin, Mauro Castro
Dalmolin RJ and Castro, MA. Geneplast: Evolutionary rooting using orthologous groups distribution. Journal Paper (in preparation), 2016.
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ogr <- groot(ogr, nPermutations=100) ## this example plots NOG40170 in the phyloTree groot.plot(ogr,whichOG="NOG40170")
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ## run the groot function ogr <- groot(ogr, nPermutations=100) ## this example plots NOG40170 in the phyloTree groot.plot(ogr,whichOG="NOG40170")
Constructor for the 'OGR-class' object.
groot.preprocess(cogdata, phyloTree, spid, cogids=NULL, verbose=TRUE)
groot.preprocess(cogdata, phyloTree, spid, cogids=NULL, verbose=TRUE)
cogdata |
a data frame with COG's data. |
phyloTree |
an object of class "phylo". |
spid |
a single character or integer value specifying the reference species to be used in the rooting algorithm. This species should be listed in the 'phyloTree'. |
cogids |
an optional data frame with COG's annotation. Alternatively, it can be a character vector with COG IDs. |
verbose |
a single logical value specifying to display detailed messages (when verbose=TRUE) or not (when verbose=FALSE). |
This function creates an OGR-class
object and checks the consistency of the input data for the evolutionary root pipeline. Internally, the function access the presence and absence of orthologs for each species in a given OG and computes the orthoct
data.frame.
A preprocessed object of class 'OGR'.
Rodrigo Dalmolin, Mauro Castro
Dalmolin RJ and Castro, MA. Geneplast: Evolutionary rooting using orthologous groups distribution. Journal Paper (in preparation), 2016.
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids)
#load datasets used for demonstration data(gpdata.gs) #create and object of class 'OGR' for H. sapiens ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids)
"OGP"
: an S4 class for genetic plasticity analysis.This S4 class includes methods to access the genetic plasticity of orthologous groups.
Objects can be created by calls to the "gplast.preprocess"
constructor.
sspids
:Object of class "data.frame"
,
a data frame with species annotation.
cogids
:Object of class "data.frame"
,
a data frame with COG's data.
orthodist
:Object of class "matrix"
,
a matrix with COG's information (see return values in the OGP methods).
abundance
:Object of class "numeric"
,
a numeric vector with results from the gplast
function (see return values in the OGP methods).
diversity
:Object of class "numeric"
,
a numeric vector with results from the gplast
function (see return values in the OGP methods).
plasticity
:Object of class "numeric"
,
a numeric vector with results from the gplast
function (see return values in the OGP methods).
status
:Object of class "character"
,
a character value specifying the status of the OGP object
based on the available methods.
signature(object = "OGP")
: see gplast
signature(object = "OGP")
: see gplast.get
Rodrigo Dalmolin, Mauro Castro
"OGR"
: an S4 class for rooting analysis.This S4 class includes methods to do inferential analysis of evolutionary roots in a given species tree.
Objects can be created by calls to the "groot.preprocess"
constructor.
cogids
:Object of class "data.frame"
,
a data frame with COG's data.
tree
:Object of class "phylo"
,
a given species tree.
spbranches
:Object of class "data.frame"
,
a data frame listing branches of a given species tree.
orthoroot
:Object of class "data.frame"
,
a data.frame with results from the 'groot' function (see return values in the OGR methods).
orthoct
:Object of class "data.frame"
,
a data.frame with results from the 'groot.preprocess' function
(see return values in the OGR methods).
status
:Object of class "character"
,
a character value specifying the status of the OGR object
based on the available methods.
Rodrigo Dalmolin, Mauro Castro
This function adds evolutionary rooting information to an 'igraph' class object.
ogr2igraph(ogr, cogdata, g, idkey = "ENTREZ")
ogr2igraph(ogr, cogdata, g, idkey = "ENTREZ")
ogr |
a processed object of class 'OGR' evaluated by the |
cogdata |
a data.frame with COG to protein mapping, with at least one gene annotation type listed in the 'rtni' object (e.g. ENTREZ gene ID). |
g |
a 'igraph' object. |
idkey |
a single character value specifying a vertex attribute name used to map 'cogdata' annotation to 'g'. |
Return an updated 'igraph' object with evolutionary root information.
Rodrigo Dalmolin, Mauro Castro, Sheyla Trefflich
Dalmolin RJ and Castro, MA. Geneplast: Evolutionary rooting using orthologous groups distribution. Journal Paper (in preparation), 2016.
## Not run: #This example requires the geneplast.data.string.v91 package! (currently available under request) library(geneplast.data.string.v91) data(gpdata_string_v91) ## End(Not run)
## Not run: #This example requires the geneplast.data.string.v91 package! (currently available under request) library(geneplast.data.string.v91) data(gpdata_string_v91) ## End(Not run)
This function adds evolutionary rooting information to regulons of a TNI class object.
ogr2tni(ogr, cogdata, tni)
ogr2tni(ogr, cogdata, tni)
ogr |
a processed object of class 'OGR' evaluated by the |
cogdata |
a data.frame with COG to protein mapping, with at least one gene annotation type listed in the 'rtni' object (e.g. ENTREZ gene ID). |
tni |
a processed object of class 'TNI' |
Return an updated TNI object with evolutionary root information.
Rodrigo Dalmolin, Mauro Castro, Sheyla Trefflich
Dalmolin RJ and Castro, MA. Geneplast: Evolutionary rooting using orthologous groups distribution. Journal Paper (in preparation), 2016.
## Not run: #This example requires the geneplast.data.string.v91 package! (currently available under request) # Evolutionary rooting inference for all human genes library(geneplast.data.string.v91) data(gpdata_string_v91) cogids <- cogdata$cog_id[cogdata$ssp_id=="9606"] ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ogr <- groot(ogr, nPermutations=100, verbose=TRUE) # Integrates evolutionary rooting information and regulons library(RTN) library(Fletcher2013b) data("rtni1st") rtni1st <- ogr2tni(ogr, cogdata, rtni1st) ## End(Not run)
## Not run: #This example requires the geneplast.data.string.v91 package! (currently available under request) # Evolutionary rooting inference for all human genes library(geneplast.data.string.v91) data(gpdata_string_v91) cogids <- cogdata$cog_id[cogdata$ssp_id=="9606"] ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606", cogids=cogids) ogr <- groot(ogr, nPermutations=100, verbose=TRUE) # Integrates evolutionary rooting information and regulons library(RTN) library(Fletcher2013b) data("rtni1st") rtni1st <- ogr2tni(ogr, cogdata, rtni1st) ## End(Not run)
An igraph object used to demonstrate geneplast main functions.
data(ppi.gs)
data(ppi.gs)
ppi.gs
an igraph object.
Protein-protein interactions (PPI) mapped to H. sapiens apoptosis and genome-stability genes. The PPI information is derived from the STRING database, release 9.1, using combined score >=900 (Franceschini et al., 2013).
an igraph object.
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013 Jan;41(Database issue):D808-15. doi: 10.1093/nar/gks1094. Epub 2012 Nov 29.
data(ppi.gs)
data(ppi.gs)