--- title: "README" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{README} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # goSorensen This R package implements inferential methods to compare gene lists (proving equivalence) in terms of their biological significance as expressed in the Gene Ontology GO. The compared gene lists are characterized by cross-tabulation frequency tables of enriched GO terms. Dissimilarity between gene lists is evaluated using the Sorensen-Dice dissimilarity. The fundamental guiding principle is that two gene lists are taken as similar if they share a significant proportion of common enriched GO terms. This inferential method, is developed and explained in the paper *An equivalence test between features lists, based on the Sorensen-Dice index and the joint frequencies of GO term enrichment*, available online in ## Installation Instructions goSorensen package has to be installed with a working R version (>=4.3.3). Installation could take a few minutes on a regular desktop or laptop. Package can be installed from Bioconductor as follows: ``` if (!requireNamespace("goSorensen", quietly = TRUE)) { BiocManager::install("goSorensen") } library(goSorensen) ``` ### Installing Dependencies of Genomic Annotation Packages in Bioconductor Before using `goSorensen`, the user must have adequate knowledge of the species they intend to focus their analysis. The genomic annotation packages available in Bioconductor provide all the essential information about these species. For instance, if the research involves mice, the user must install and activate the annotation package `org.Mm.eg.db` in the following manner: ``` if (!requireNamespace("org.Mm.eg.db", quietly = TRUE)) { BiocManager::install("org.Mm.eg.db") } ``` Other species may include: - `org.Hs.eg.db`: Genome wide annotation for Humans. - `org.At.tair.db`: Genome wide annotation for Arabidopsis - `org.Ag.eg.db`: Genome wide annotation for Anopheles - `org.Bt.eg.db`: Genome wide annotation for Bovine - `org.Ce.eg.db`: Genome wide annotation for Worm - `org.Cf.eg.db`: Genome wide annotation for Canine - `org.Dm.eg.db`: Genome wide annotation for Fly - `org.EcSakai.eg.db`: Genome wide annotation for E coli strain Sakai - `org.EcK12.eg.db`: Genome wide annotation for E coli strain K12 - `org.Dr.eg.db`: Genome wide annotation for Zebrafish - `org.Gg.eg.db`: Genome wide annotation for Chicken - `org.Mm.eg.db`: Genome wide annotation for Mouse - `org.Mmu.eg.db`: Genome wide annotation for Rhesus - etc... Due to the extensive research conducted on the human species and the examples documented in goSorensen about this species, the installation of the goSorensen package automatically includes the annotation package `org.Hs.eg.db` as a dependency. Therefore, there is no need to install it previously. The user must install the appropriate package to use genomic annotation for other species. The goSorensen functions require genome annotation via the `orgPackg` argument to annotate genes in GO terms for enrichment analysis. For example, the enrichment contingency table can be computed using the `buildEnrichTable` function. ``` buildEnrichTable(..., orgPackg = "org.Mm.eg.db") ``` ### Gene Identifiers `goSorensen` functions, such as `buidEnrichTable`, that entail identifying annotations in a GO term, which are essential for determining enrichment, require the user to provide, in the argument `geneUniverse`, a vector containing the identifiers of the universe of genes (wide genome) of the species being analysed. These gene identifiers can be readily obtained from the above-mentioned genomic annotation packages through the function `keys`. For instance, one can acquire the ENTREZ identifiers for the universe of genes in mice as follows: ``` # Load the package: library(org.Mm.eg.db) # Obtain the IDs: mouseEntrezIDs <- keys(org.Mm.eg.db, keytype = "ENTREZID") ``` Or, for the human species, as follows: ``` # Load the package: library(org.Hs.eg.db) # Obtain the IDs: humanEntrezIDs <- keys(org.Hs.eg.db, keytype = "ENTREZID") ``` And in this way, for any other species Then, one can use this object in the `geneUniverse` argument in whatever goSorensen function (e.g., `buidEnrichTable`) is needed: ``` buildEnrichTable(..., orgPackg = "org.Mm.eg.db", geneUniverse = mouseEntrezIDs) ``` ## Contribution Guidelines Contributions are welcome, if you wish to contribute or give ideas to improve the package, please you can contact with maintainer (Pablo Flores) to the address `p_flores@espoch.edu.ec`, and we can discuss your suggestion. ## References
Flores, P., Salicru, M., Sanchez-Pla, A., & Ocana, J. (2022). An equivalence test between features lists, based on the Sorensen-Dice index and the joint frequencies of GO term enrichment. BMC bioinformatics, 23(1), 1-21.