PhyloProfile

Introduction

Phylogenetic profiles capture the presence - absence pattern of genes across species (Pellegrini et al., 1999). The presence of an ortholog in a given species is often taken as evidence that also the corresponding function is represented (Lee et al., 2007). Moreover, if two genes agree in their phylogenetic profile, it can suggest that they functionally interact (Pellegrini et al., 1999). Phylogenetic profiles are therefore commonly used for tracing functional protein clusters or metabolic networks across species and through time. However, orthology inference is not error-free (Altenhoff et al., 2016), and orthology does not guarantee functional equivalence for two genes (Studer and Robinson-Rechavi, 2009). Therefore, phylogenetic profiles are often integrated with accessory information layers, such as sequence similarity, domain architecture similarity, or semantic similarity of Gene Ontology-term descriptions.

Various approaches exist to visualize such profiles. However, there is still a shortage of tools that provide a comprehensive set of functions for the display, filtering and analysis of multi-layered phylogenetic profiles comprising hundreds of genes and taxa. To close this methodological gap, we present here PhyloProfile, an R-based tool to visualize, explore and analyze multi-layered phylogenetic profiles.

How to install PhyloProfile

To install the PhyloProfile package via Bioconductor using BiocManager:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("PhyloProfile")

To install the dev version from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version='devel')
BiocManager::install("PhyloProfile")

To install the dev version from github:

if (!requireNamespace("devtools"))
    install.packages("devtools", repos = "http://cran.us.r-project.org")
devtools::install_github(
    "BIONF/PhyloProfile",
    INSTALL_opts = c('--no-lock'),
    build_vignettes = TRUE
)

Or use directly the online version at http://applbio.biologie.uni-frankfurt.de/phyloprofile/.

Input

PhyloProfile expects as a main input the phylogenetic distribution of orthologs, or more generally of homologs. This information can be complemented with domain architecture annotation and data for up to two additional annotation layers.

Beside tab delimited text and sequences in FASTA format, the tool also accepts orthoXML (Schmitt et al., 2011), or a list of OMA IDs (Altenhoff et al., 2015) as input.

Here is an example of a tab delimited input with two additional annotation layers:

geneID ncbiID orthoID FAS_F FAS_B
100136at6656 ncbi36329 @1|Q8ILT8|1 0.9875289 0.8427314
100136at6656 ncbi319348 @0|319348_0:004132|1 1.0000000 1.0000000
100136at6656 ncbi208964 @1|Q9I5U5|1 0.9971027 0.9971027
100136at6656 ncbi418459 @1|E3KFA2|1 0.9895679 0.8232540
100136at6656 ncbi10116 @1|G3V7R8|1 0.9996617 0.8541265
100136at6656 ncbi284812 @1|Q9USU2|1 0.9994874 0.9994874
100136at6656 ncbi35128 @1|B8C2N6|1 0.9852370 0.7002961
100136at6656 ncbi7070 @1|D6X457|1 1.0000000 1.0000000
100136at6656 ncbi237631 @1|A0A0D1C927|1 0.9912998 0.6172244
100136at6656 ncbi559292 @1|P41819|1 0.9978912 0.9978912

The WIKI accompanying PhyloProfile gives a comprehensive guide of how to format input data.

Features and capabilities

Interactive visualization and dynamic exploration of phylogenetic profiles

Together with several functions for exploring phylogenetic profiles, we provide an interactive visualization application implemented with Shiny (https://CRAN.R-project.org/package=shiny).

PhyloProfile's GUI

PhyloProfile’s GUI

For both command-based and visualization analyses, users can:

  • dynamically change the resolution of the analysis from invidual species to phyla or entire kingdoms by collapsing the input taxa into higher systematic rank. Species are automatically linked to the NCBI taxonomy, and are ordered in increasing taxonomic distance from a user-specified reference taxon. PhyloProfile is able to represent co-orthologs (in-paralogs), if the working taxonomic rank is the deepest one (e.g. strain or species) that can be found in the input taxa.
  • dynamically filter data by applying different thresholds to the integrated information (e.g. increasing the fraction of species in a systematic group that must harbor an ortholog before the gene is considered present in this group reduces the impact of spurious ortholog identification on evolutionary interpretations).
  • dynamically modify the apperance of profile with diverse plot configuration options.

PhyloProfile is able to represent the entire data matrix (Main profile) or to visualize only a subset of genes and taxa for a detailed inspection (Customized profile), without the need of modifying the input data.

Besides, PhyloProfile’s interface will be automatically varied according to user’s input files, such as the names of two additional information layers or list of input taxa.

Analysis functions

PhyloProfile provides several functions for dynamically analyzing phylogenetic profiles.

Profile clustering

The identification of proteins with similar phylogenetic profiles is a crucial step in the identification and characterization of novel functional protein interaction networks (Pellegrini, 2012). PhyloProfile offers the option to cluster genes according to the distance of their phylogenetic profiles.

### An example for plotting the clustered profiles tree.
### See ?getDendrogram for more details.

#' Load built-in data
data("finalProcessedProfile", package="PhyloProfile")
data <- finalProcessedProfile

#' Calculate distance matrix
#' Check ?getDistanceMatrix
profileType <- "binary"
profiles <- getDataClustering(
    data, profileType, var1AggregateBy, var2AggregateBy)
method <- "mutualInformation"
distanceMatrix <- getDistanceMatrix(profiles, method)

#' Create clustered profile tree
clusterMethod <- "complete"
dd <- clusterDataDend(as.dist(distanceMatrix), clusterMethod)
getDendrogram(dd)
#> $type
#> [1] "phylogram"
#> 
#> $use.edge.length
#> [1] TRUE
#> 
#> $node.pos
#> [1] 1
#> 
#> $node.depth
#> [1] 1
#> 
#> $show.tip.label
#> [1] TRUE
#> 
#> $show.node.label
#> [1] FALSE
#> 
#> $font
#> [1] 3
#> 
#> $cex
#> [1] 1
#> 
#> $adj
#> [1] 0
#> 
#> $srt
#> [1] 0
#> 
#> $no.margin
#> [1] FALSE
#> 
#> $label.offset
#> [1] 0
#> 
#> $x.lim
#> [1] 0.0000000 0.1240092
#> 
#> $y.lim
#> [1] 1 4
#> 
#> $direction
#> [1] "rightwards"
#> 
#> $tip.color
#> [1] "black"
#> 
#> $Ntip
#> [1] 4
#> 
#> $Nnode
#> [1] 3
#> 
#> $root.time
#> NULL
#> 
#> $align.tip.label
#> [1] FALSE

Gene age estimation

PhyloProfile can estimate the evolutionary age of a gene from the phylogenetic profiles using an LCA algorithm (Capra et al., 2013). Specifically, the last common ancestor of the two most distantly related species displaying a given gene serves as the minimal gene age. Age estimates are dynamically updated upon filtering of the data.

### An example for calculating gene age for the built-in data set.
### See ?estimateGeneAge for more details.
#' Load built-in data
data("fullProcessedProfile", package="PhyloProfile")

#' Choose the working rank and the reference taxon
rankName <- "class"
refTaxon <- "Mammalia"

#' Count taxa within each supertaxon
taxonIDs <- levels(as.factor(fullProcessedProfile$ncbiID))
sortedInputTaxa <- sortInputTaxa(
    taxonIDs, rankName, refTaxon, NULL, NULL, NULL
)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
taxaCount <- sortedInputTaxa %>% dplyr::count(supertaxon)

#' Set cutoff for 2 additional variables and the percentage of present species
#' in each supertaxon
var1Cutoff <- c(0,1)
var2Cutoff <- c(0,1)
percentCutoff <- c(0,1)

#' Estimate gene age
estimateGeneAge(
    fullProcessedProfile, taxaCount, rankName, refTaxon,
    var1Cutoff, var2Cutoff, percentCutoff
)
#> Key: <geneID>
#>          geneID      cath                           age
#>          <char>    <char>                        <char>
#> 1: 100136at6656 000000001 10_Archaea-Bacteria-Eukaryota
#> 2: 100265at6656 000000111               06_Opisthokonta
#> 3: 101621at6656 000000001 10_Archaea-Bacteria-Eukaryota
#> 4: 103479at6656 000000011                  08_Eukaryota

Core gene identification

Phylogenomic reconstructions are typically based on a collection of core genes (Daubin et al., 2002), i.e. genes that are shared among all genomes in a taxon collection. PhyloProfile enables users to select a set of taxa and returns their core genes.

### An example for calculating core gene set for the built-in data set.
### See ?getCoreGene for more details.

#' Load built-in data
data("fullProcessedProfile", package="PhyloProfile")

#' Choose the working rank and a set of taxa of interest
rankName <- "class"
refTaxon <- "Mammalia"
taxaCore <- c("Mammalia", "Saccharomycetes", "Insecta")

#' Set cutoff for 2 additional variables and the percentage of present species
#' in each supertaxon
var1Cutoff <- c(0.75, 1.0)
var2Cutoff <- c(0.75, 1.0)
percentCutoff <- c(0.0, 1.0)

#' Set core coverage, the % of taxa that must be present in the selected set
coreCoverage <- 1

#' Count taxa within each supertaxon
taxonIDs <- levels(as.factor(fullProcessedProfile$ncbiID))
sortedInputTaxa <- sortInputTaxa(
    taxonIDs, rankName, refTaxon, NULL, NULL, NULL
)
taxaCount <- sortedInputTaxa %>% dplyr::count(supertaxon)

#' Identify core genes
getCoreGene(
    rankName,
    taxaCore,
    fullProcessedProfile,
    taxaCount,
    var1Cutoff, var2Cutoff,
    percentCutoff, coreCoverage
)
#> [1] "100136at6656" "100265at6656" "101621at6656" "103479at6656"

Group comparison

This function is used to compare the distribution of the additional variables between two taxon groups, an in- and an out-group. Users can define the in-group and all taxa not included in this are used as the out-group. The value distributions of the variables are then compared using statistical tests (Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney) using the specified significant level (0.05 by default). Genes that have a significantly different distribution will be shown in the candidate gene list for further analysis.

#' Load built-in data
data("mainLongRaw", package="PhyloProfile")
data <- mainLongRaw
#' choose the in-group taxa
inGroup <- c("ncbi9606", "ncbi10116")
#' choose variable to be compared
variable <- colnames(data)[4]
#' compare the selected variable between the in-group and out-group taxa
compareTaxonGroups(data, inGroup, TRUE, variable, 0.05)
#> 103479at6656 100136at6656 101621at6656 100265at6656 
#>    0.1399542    0.4889198    0.5620258    0.5850421

Distribution analysis

The interpretation of phylogenetic profiles, and the result of downstream analyses can change substantially upon filtering the data. To help users to decide on reasonable filtering thresholds, PhyloProfile provides a function to plot the distributions of the values incurred by the integrated information layers.

### An example for plotting the distribution of the 1st additional variable.
### See ?createVarDistPlot for more details.

#' Load built-in data
data("mainLongRaw", package="PhyloProfile")

#' Process data for distribution analysis
#' See ?createVariableDistributionData
data <- createVariableDistributionData(
    mainLongRaw, c(0, 1), c(0.5, 1)
)
head(data, 6)
#>                                             orthoID      var1      var2
#> 4596            100136at6656|PLAF7@36329@1|Q8ILT8|1 0.9875289 0.8427314
#> 4597 100136at6656|POLVAN@319348@0|319348_0:004132|1 1.0000000 1.0000000
#> 4598           100136at6656|PSEAE@208964@1|Q9I5U5|1 0.9971027 0.9971027
#> 4599           100136at6656|PUCGT@418459@1|E3KFA2|1 0.9895679 0.8232540
#> 4600              100136at6656|RAT@10116@1|G3V7R8|1 0.9996617 0.8541265
#> 4602           100136at6656|SCHPO@284812@1|Q9USU2|1 0.9994874 0.9994874

#' Choose a variable for plotting and set the variable name
varType <- "var1"
varName <- "Variable 1"

#' Set cutoff for the percentage of present species in each supertaxon
percentCutoff <- c(0,1)

#' Set text size
distTextSize <- 12

#' Create distribution plot
createVarDistPlot(
    data,
    varName,
    varType,
    percentCutoff,
    distTextSize
)

Examples

Process raw input

Process raw input (in different format) into a dataframe that contains all required information for the phylogenetic profile analysis.

#' Load built-in data
#' If input data is in other format (e.g. fasta, OrthoXML, or wide matrix),
#' see ?createLongMatrix
rawInput <- system.file(
    "extdata", "test.main.long", package = "PhyloProfile", mustWork = TRUE
)

#' Set working rank and the reference taxon
rankName <- "class"
refTaxon <- "Mammalia"

#' Input a user-defined taxonomy tree to replace NCBI taxonomy tree (optional)
taxaTree <- NULL
#' Or a sorted taxon list (optional)
sortedTaxonList <- NULL

#' Choose how to aggregate the additional variables when pocessing the data
#' into supertaxon
var1AggregateBy <- "max"
var2AggregateBy <- "mean"

#' Set cutoffs for for percentage of species present in a supertaxon,
#' allowed number of co orthologs, and cutoffs for the additional variables
percentCutoff <- c(0.0, 1.0)
coorthologCutoffMax <- 10
var1Cutoff <- c(0.75, 1.0)
var2Cutoff <- c(0.5, 1.0)

#' Choose the relationship of the additional variables, if they are related to
#' the orthologous proteins or to the species
var1Relation <- "protein"
var2Relation <- "species"

#' Identify categories for input genes (by a mapping tab-delimited file)
groupByCat <- FALSE
catDt <- NULL

#' Process the input file into a phylogenetic profile data that contains
#' taxonomy information and the aggregated values for the 2 additional variables
profileData <- fromInputToProfile(
    rawInput,
    rankName,
    refTaxon,
    taxaTree,
    sortedTaxonList,
    var1AggregateBy,
    var2AggregateBy,
    percentCutoff,
    coorthologCutoffMax,
    var1Cutoff,
    var2Cutoff,
    var1Relation,
    var2Relation,
    groupByCat,
    catDt
)

head(profileData)
#>         geneID          supertaxon supertaxonID      var1 presSpec category
#> 1 100136at6656     100001_Mammalia        40674 0.9996617        1      cat
#> 2 100136at6656         100002_Aves         8782 0.9994875        1      cat
#> 3 100136at6656  100003_Actinopteri       186623 1.0000000        1      cat
#> 4 100136at6656  100005_Leptocardii      2682552 1.0000000        1      cat
#> 5 100136at6656      100006_Insecta        50557 1.0000000        1      cat
#> 6 100136at6656 100007_Branchiopoda         6658 0.9887614        1      cat
#>                                          orthoID      var2 paralog presentTaxa
#> 1              100136at6656|RAT@10116@1|G3V7R8|1 0.8674499       1           5
#> 2             100136at6656|CHICK@9031@1|E1C0U9|1 0.9994875       1           1
#> 3             100136at6656|DANRE@7955@1|B0S688|1 0.8277451       1           2
#> 4             100136at6656|BRAFL@7739@1|C3YM50|1 1.0000000       1           1
#> 5 100136at6656|POLVAN@319348@0|319348_0:004132|1 0.9167176       1           9
#> 6    100136at6656|DAPMA@35525@0|35525_0:001fd7|1 0.7286372       1           1
#>   totalTaxa     geneName
#> 1         5 100136at6656
#> 2         1 100136at6656
#> 3         2 100136at6656
#> 4         1 100136at6656
#> 5         9 100136at6656
#> 6         1 100136at6656

Create profile plot

Generate phylogenetic profile heatmap after processing the raw input file.

#' Load built-in processed data
data("finalProcessedProfile", package="PhyloProfile")

#' Create data for plotting
plotDf <- dataMainPlot(finalProcessedProfile)

#' You can also choose a subset of genes and/or taxa for plotting with:
#' selectedTaxa <- c("Mammalia", "Echinoidea", "Gunneridae")
#' selectedSeq <- "all"
#' plotDf <- dataCustomizedPlot(
#'     finalProcessedProfile, selectedTaxa, selectedSeq
#' )

#' Identify plot's parameters
plotParameter <- list(
    "xAxis" = "taxa",
    "var1ID" = "FAS_FW",
    "var2ID"  = "FAS_BW",
    "midVar1" = 0.5,
    "midColorVar1" =  "#FFFFFF",
    "lowColorVar1" =  "#FF8C00",
    "highColorVar1" = "#4682B4",
    "midVar2" = 1,
    "midColorVar2" =  "#FFFFFF",
    "lowColorVar2" = "#CB4C4E",
    "highColorVar2" = "#3E436F",
    "paraColor" = "#07D000",
    "xSize" = 8,
    "ySize" = 8,
    "legendSize" = 8,
    "mainLegend" = "top",
    "dotZoom" = 0,
    "xAngle" = 60,
    "guideline" = 0,
    "colorByGroup" = FALSE,
    "colorByOrthoID" = FALSE
)

#' Generate profile plot
p <- heatmapPlotting(plotDf, plotParameter)
p
#' To highlight a gene and/or taxon of interest
taxonHighlight <- "Mammalia"
workingRank <- "class"
geneHighlight <- "none"
#' Then use ?highlightProfilePlot function
# highlightProfilePlot(
#    p, plotDf, taxonHighlight, workingRank, geneHighlight, plotParameter$xAxis
# )

Create protein domain architecture plot

Generate the domain architecture plot for a gene of interest and its orthologs.

#' Load protein domain architecture file
domainFile <- system.file(
    "extdata", "domainFiles/101621at6656.domains",
    package = "PhyloProfile", mustWork = TRUE
)

#' Identify IDs of gene of interest and its ortholog partner
seedID <- "101621at6656"
orthoID <- "101621at6656|AGRPL@224129@0|224129_0:001955|1"
info <- c(seedID, orthoID)

#' Get data for 2 selected genes from input file
domainDf <- parseDomainInput(seedID, domainFile, "file")

#' Modify feature IDs
domainDf$feature_id_mod <- domainDf$feature_id
domainDf$feature_id_mod <- gsub("SINGLE", "LCR", domainDf$feature_id_mod)
domainDf$feature_id_mod[domainDf$feature_type == "coils"] <- "Coils"
domainDf$feature_id_mod[domainDf$feature_type == "seg"] <- "LCR"
domainDf$feature_id_mod[domainDf$feature_type == "tmhmm"] <- "TM"

#' Generate plot
plot <- createArchiPlot(info, domainDf, 9, 9)
grid::grid.draw(plot)

Other use cases

Get taxonomy IDs and names in a specified rank for taxa in the input file.

#' Load raw input
data("mainLongRaw", package="PhyloProfile")
inputDf <- mainLongRaw

#' Set working rank and the reference taxon
rankName <- "phylum"

#' Get taxonomy IDs and names for the input data
inputTaxonID <- getInputTaxaID(inputDf)
inputTaxonName <- getInputTaxaName(rankName, inputTaxonID)

head(inputTaxonName)
#>   ncbiID            fullName   rank parentID
#> 1   1117       Cyanobacteria phylum  1798711
#> 2   1224      Proteobacteria phylum        2
#> 3   1224      Proteobacteria phylum        2
#> 4   1224      Proteobacteria phylum        2
#> 5   1297 Deinococcus-Thermus phylum  1783272
#> 6   2836     Bacillariophyta phylum  2696291

Get taxonomy info for list of input taxa and sort it based on the taxonomy distance to a reference taxon

#' Get list of taxon IDs and names for input profile
data("mainLongRaw", package="PhyloProfile")
inputDf <- mainLongRaw
rankName <- "phylum"
inputTaxonID <- getInputTaxaID(inputDf)

#' Input a user-defined taxonomy tree to replace NCBI taxonomy tree (optional)
inputTaxaTree <- NULL

#' Sort taxonomy list based on a selected refTaxon
refTaxon <- "Chordata"
sortedTaxonomy <- sortInputTaxa(
    taxonIDs = inputTaxonID,
    rankName = rankName,
    refTaxon = refTaxon,
    taxaTree = inputTaxaTree,
    NULL,
    NULL
)

head(
    sortedTaxonomy[
        , c("ncbiID", "fullName", "supertaxon", "supertaxonID", "rank")
    ]
)
#>       ncbiID                  fullName         supertaxon supertaxonID   rank
#> 1  ncbi36329 Plasmodium falciparum 3D7 100010_Apicomplexa         5794 phylum
#> 2 ncbi224324      Aquifex aeolicus VF5   100018_Aquificae       200783 phylum
#> 3   ncbi7165         Anopheles gambiae  100002_Arthropoda         6656 phylum
#> 4 ncbi319348  Polypedilum vanderplanki  100002_Arthropoda         6656 phylum
#> 5   ncbi7260     Drosophila willistoni  100002_Arthropoda         6656 phylum
#> 6   ncbi7227   Drosophila melanogaster  100002_Arthropoda         6656 phylum

More examples? Please tell us what you want to see ;-)

How to cite

Ngoc-Vinh Tran, Bastian Greshake Tzovaras, Ingo Ebersberger, PhyloProfile: dynamic visualization and exploration of multi-layered phylogenetic profiles, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages 3041–3043, https://doi.org/10.1093/bioinformatics/bty225

Or use the citation function in R CMD to have the citation in BibTex or LaTeX format

citation("PhyloProfile")
#> To cite PhyloProfile in publications use:
#> 
#>   Ngoc-Vinh Tran, Bastian Greshake Tzovaras, Ingo Ebersberger;
#>   PhyloProfile: Dynamic visualization and exploration of multi-layered
#>   phylogenetic profiles, Bioinformatics, Volume 34, Issue 17, 01
#>   September 2018, Pages 3041–3043
#>   https://doi.org/10.1093/bioinformatics/bty225
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {{PhyloProfile}: Dynamic visualization and exploration of multi-layered phylogenetic profiles},
#>     author = {Ngoc Vinh Tran and Bastian Greshake Tzovaras and Ingo Ebersberger},
#>     journal = {Bioinformatics},
#>     volume = {34(17)},
#>     pages = {3041–3043},
#>     year = {2018},
#>     url = {https://doi.org/10.1093/bioinformatics/bty225},
#>   }

How to contribute

Thanks so much for your interest in contributing to PhyloProfile 🎉👍🍾

Contributions to PhyloProfile can take many forms. If you are

biologist, you can

biologist and love coding, you can

  • fix existing bugs,
  • or add new features. Some things we’d love to see are: add scripts to add out-of-the-box support for further orthology prediction tools, increasing the test coverage from 0% to something above that, or basically whatever great idea you have!
  • and all points for non-coding contributors as well :)

not biologist but can code, it would be great if you can

  • test the tool in different environments (Windows, Linux, Mac - Firefox, Chrome, IE, Safari,…),
  • suggest a better user interface,
  • improve the code quality

Don’t hesitate to get in touch with us if you have any questions. You can contact us at [email protected]

SessionInfo()

Here is the output of sessionInfo() on the system on which this document was compiles:

#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> character(0)
#> 
#> other attached packages:
#> [1] PhyloProfile_1.99.1
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3      sys_3.4.3               jsonlite_1.8.9         
#>   [4] umap_0.2.10.0           magrittr_2.0.3          farver_2.1.2           
#>   [7] rmarkdown_2.29          fs_1.6.5                vctrs_0.6.5            
#>  [10] memoise_2.0.1           RCurl_1.98-1.16         askpass_1.2.1          
#>  [13] htmltools_0.5.8.1       energy_1.7-12           AnnotationHub_3.15.0   
#>  [16] curl_6.0.1              sass_0.4.9              shinyFiles_0.9.3       
#>  [19] bslib_0.8.0             htmlwidgets_1.6.4       plotly_4.10.4          
#>  [22] zoo_1.8-12              cachem_1.1.0            buildtools_1.0.0       
#>  [25] mime_0.12               lifecycle_1.0.4         pkgconfig_2.0.3        
#>  [28] colourpicker_1.3.0      Matrix_1.7-1            R6_2.5.1               
#>  [31] fastmap_1.2.0           GenomeInfoDbData_1.2.13 shiny_1.10.0           
#>  [34] digest_0.6.37           colorspace_2.1-1        AnnotationDbi_1.69.0   
#>  [37] S4Vectors_0.45.2        shinycssloaders_1.1.0   RSpectra_0.16-2        
#>  [40] ExperimentHub_2.15.0    RSQLite_2.3.9           filelock_1.0.3         
#>  [43] labeling_0.4.3          RcppZiggurat_0.1.6      httr_1.4.7             
#>  [46] compiler_4.4.2          bit64_4.5.2             withr_3.0.2            
#>  [49] gsl_2.1-8               DBI_1.2.3               Rttf2pt1_1.3.12        
#>  [52] tsne_0.1-3.1            base_4.4.2              openssl_2.3.0          
#>  [55] rappdirs_0.3.3          stats_4.4.2             tools_4.4.2            
#>  [58] ape_5.8-1               httpuv_1.6.15           extrafontdb_1.0        
#>  [61] glue_1.8.0              graphics_4.4.2          nlme_3.1-166           
#>  [64] promises_1.3.2          grid_4.4.2              generics_0.1.3         
#>  [67] gtable_0.3.6            shinyBS_0.61.1          tidyr_1.3.1            
#>  [70] data.table_1.16.4       xml2_1.3.6              XVector_0.47.1         
#>  [73] BiocGenerics_0.53.3     BiocVersion_3.21.1      pillar_1.10.0          
#>  [76] stringr_1.5.1           later_1.4.1             dplyr_1.1.4            
#>  [79] BiocFileCache_2.15.0    lattice_0.22-6          bit_4.5.0.1            
#>  [82] tidyselect_1.2.1        maketools_1.3.1         Biostrings_2.75.3      
#>  [85] miniUI_0.1.1.1          pbapply_1.7-2           knitr_1.49             
#>  [88] gridExtra_2.3           IRanges_2.41.2          scattermore_1.2        
#>  [91] stats4_4.4.2            xfun_0.49               datasets_4.4.2         
#>  [94] Biobase_2.67.0          DT_0.33                 stringi_1.8.4          
#>  [97] UCSC.utils_1.3.0        lazyeval_0.2.2          yaml_2.3.10            
#> [100] boot_1.3-31             evaluate_1.0.1          extrafont_0.19         
#> [103] tibble_3.2.1            BiocManager_1.30.25     cli_3.6.3              
#> [106] RcppParallel_5.1.9      xtable_1.8-4            reticulate_1.40.0      
#> [109] munsell_0.5.1           jquerylib_0.1.4         Rcpp_1.0.13-1          
#> [112] GenomeInfoDb_1.43.2     bioDist_1.79.0          dbplyr_2.5.0           
#> [115] utils_4.4.2             png_0.1-8               fastcluster_1.2.6      
#> [118] parallel_4.4.2          Rfast_2.1.0             ggplot2_3.5.1          
#> [121] blob_1.2.4              methods_4.4.2           bitops_1.0-9           
#> [124] viridisLite_0.4.2       scales_1.3.0            grDevices_4.4.2        
#> [127] purrr_1.0.2             crayon_1.5.3            BiocStyle_2.35.0       
#> [130] rlang_1.1.4             KEGGREST_1.47.0         shinyjs_2.1.0

References

  1. Adebali, O. and Zhulin, I.B. (2017) Aquerium: A web application for comparative exploration of domain-based protein occurrences on the taxonomically clustered genome tree. Proteins, 85, 72-77.

  2. Altenhoff, A.M. et al. (2016) Standardized benchmarking in the quest for orthologs. Nat Methods, 13, 425-430.

  3. Altenhoff, A.M. et al. (2015) The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res, 43, D240-249.

  4. Capra, J.A. et al. (2013) How old is my gene? Trends Genet, 29, 659-668. Daubin, V., Gouy, M. and Perriere, G. (2002) A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res, 12, 1080-1090.

  5. Huerta-Cepas, J., Serra, F. and Bork, P. (2016) ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol, 33, 1635-1638.

  6. Koestler, T., Haeseler, A.v. and Ebersberger, I. (2010) FACT: Functional annotation transfer between proteins with similar feature architectures. BMC Bioinformatics, 11, 417.

  7. Lee, D., Redfern, O. and Orengo, C. (2007) Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol, 8, 995-1005.

  8. Moore, A.D. et al. (2014) DoMosaics: software for domain arrangement visualization and domain-centric analysis of proteins. Bioinformatics, 30, 282-283.

  9. Pellegrini, M. (2012) Using phylogenetic profiles to predict functional relationships. Methods Mol Biol, 804, 167-177.

  10. Pellegrini, M. et al. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A, 96, 4285-4288.

  11. Schmitt, T. et al. (2011) Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information. Brief. Bioinform., 12, 485-488.

  12. Studer, R.A. and Robinson-Rechavi, M. (2009) How confident can we be that orthologs are similar, but paralogs differ? Trends Genet, 25, 210-216.