Title: | Interspecies gene mapping |
---|---|
Description: | `orthogene` is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across **700+ organisms**. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using **1:1**, **many:1**, **1:many** or **many:many** gene mappings, both within- and between-species. |
Authors: | Brian Schilder [cre] |
Maintainer: | Brian Schilder <[email protected]> |
License: | GPL-3 |
Version: | 1.13.0 |
Built: | 2024-12-29 06:24:08 UTC |
Source: | https://github.com/bioc/orthogene |
orthogene is an R package for easy mapping of orthologous genes across hundreds of species.
It pulls up-to-date interspecies gene ortholog mappings
across 700+ organisms.
It also provides various utility functions to map common objects
(e.g. data.frames, gene expression matrices, lists) onto
1:1 gene orthologs from any other species.
Maintainer: Brian Schilder [email protected] (ORCID)
GitHub : Source code and Issues submission.
Author Site :
orthogene
was created by Brian M. Schilder.
Useful links:
Report bugs at https://github.com/neurogenomics/orthogene/issues
Aggregate/expand a gene matrix (gene_df
) using a gene mapping
data.frame (gene_map
).
Importantly, mappings can be performed across a variety of scenarios that
can occur during within-species and between-species gene mapping:
1 gene : 1 gene
many genes : 1 gene
1 gene : many genes
many genes : many genes
For more details on how aggregation/expansion is performed, please see: many2many_rows.
aggregate_mapped_genes( gene_df, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", input_species = "human", output_species = input_species, method = c("gprofiler", "homologene", "babelgene"), agg_fun = "sum", agg_method = c("monocle3", "stats"), aggregate_orthologs = TRUE, transpose = FALSE, mthreshold = 1, target = "ENSG", numeric_ns = "", as_integers = FALSE, as_sparse = TRUE, as_DelayedArray = FALSE, dropNA = TRUE, sort_rows = FALSE, verbose = TRUE )
aggregate_mapped_genes( gene_df, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", input_species = "human", output_species = input_species, method = c("gprofiler", "homologene", "babelgene"), agg_fun = "sum", agg_method = c("monocle3", "stats"), aggregate_orthologs = TRUE, transpose = FALSE, mthreshold = 1, target = "ENSG", numeric_ns = "", as_integers = FALSE, as_sparse = TRUE, as_DelayedArray = FALSE, dropNA = TRUE, sort_rows = FALSE, verbose = TRUE )
gene_df |
Input matrix where row names are genes. |
gene_map |
A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
|
input_col |
Column name within |
output_col |
Column name within |
input_species |
Name of the input species (e.g., "mouse","fly"). Use map_species to return a full list of available species. |
output_species |
Name of the output species (e.g. "human","chicken"). Use map_species to return a full list of available species. |
method |
R package to use for gene mapping:
|
agg_fun |
Aggregation function. |
agg_method |
Aggregation method. |
aggregate_orthologs |
[Optional] After performing an initial round of
many:many aggregation/expansion with many2many_rows,
ensure each orthologous gene only appears in one row by using the
aggregate_rows function (default: |
transpose |
Transpose |
mthreshold |
maximum number of results per initial alias to show. Shows all by default. |
target |
target namespace. |
numeric_ns |
namespace to use for fully numeric IDs (list of available namespaces). |
as_integers |
Force all values in the matrix to become integers,
by applying floor (default: |
as_sparse |
Convert aggregated matrix to sparse matrix. |
as_DelayedArray |
Convert aggregated matrix to DelayedArray. |
dropNA |
Drop genes assigned to |
sort_rows |
Sort |
verbose |
Print messages. |
Aggregated matrix
#### Aggregate within species: gene synonyms #### data("exp_mouse_enst") X_agg <- aggregate_mapped_genes(gene_df = exp_mouse_enst, input_species = "mouse") #### Aggregate across species: gene orthologs #### data("exp_mouse") X_agg2 <- aggregate_mapped_genes(gene_df = exp_mouse, input_species = "mouse", output_species = "human", method="homologene")
#### Aggregate within species: gene synonyms #### data("exp_mouse_enst") X_agg <- aggregate_mapped_genes(gene_df = exp_mouse_enst, input_species = "mouse") #### Aggregate across species: gene orthologs #### data("exp_mouse") X_agg2 <- aggregate_mapped_genes(gene_df = exp_mouse, input_species = "mouse", output_species = "human", method="homologene")
Return all known genes from a given species.
all_genes( species, method = c("gprofiler", "homologene", "babelgene"), ensure_filter_nas = FALSE, run_map_species = TRUE, verbose = TRUE, ... )
all_genes( species, method = c("gprofiler", "homologene", "babelgene"), ensure_filter_nas = FALSE, run_map_species = TRUE, verbose = TRUE, ... )
species |
Species to get all genes for.
Will first be standardised with |
method |
R package to use for gene mapping:
|
ensure_filter_nas |
Perform an extra check to remove
genes that are |
run_map_species |
Standardise |
verbose |
Print messages. |
... |
Additional arguments to be passed to
gorth or homologene. |
References homologeneData or gconvert.
Table with all gene symbols
from the given species
.
genome_mouse <- all_genes(species = "mouse") genome_human <- all_genes(species = "human")
genome_mouse <- all_genes(species = "mouse") genome_human <- all_genes(species = "human")
List all species currently supported by orthogene.
Wrapper function for map_species.
When method=NULL
, all species from all available
methods will be returned.
all_species(method = NULL, verbose = TRUE)
all_species(method = NULL, verbose = TRUE)
method |
R package to use for gene mapping:
|
verbose |
Print messages. |
data.table of species names, provided in multiple formats.
species_dt <- all_species()
species_dt <- all_species()
Currently supports ortholog mapping between any
pair of 700+ species.
Use map_species to
return a full list of available organisms.
convert_orthologs( gene_df, gene_input = "rownames", gene_output = "rownames", standardise_genes = FALSE, input_species, output_species = "human", method = c("gprofiler", "homologene", "babelgene"), drop_nonorths = TRUE, non121_strategy = "drop_both_species", agg_fun = NULL, mthreshold = Inf, as_sparse = FALSE, as_DelayedArray = FALSE, sort_rows = FALSE, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", verbose = TRUE, ... )
convert_orthologs( gene_df, gene_input = "rownames", gene_output = "rownames", standardise_genes = FALSE, input_species, output_species = "human", method = c("gprofiler", "homologene", "babelgene"), drop_nonorths = TRUE, non121_strategy = "drop_both_species", agg_fun = NULL, mthreshold = Inf, as_sparse = FALSE, as_DelayedArray = FALSE, sort_rows = FALSE, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", verbose = TRUE, ... )
gene_df |
Data object containing the genes
(see
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the |
gene_input |
Which aspect of
|
gene_output |
How to return genes.
Options include:
|
standardise_genes |
If |
input_species |
Name of the input species (e.g., "mouse","fly"). Use map_species to return a full list of available species. |
output_species |
Name of the output species (e.g. "human","chicken"). Use map_species to return a full list of available species. |
method |
R package to use for gene mapping:
|
drop_nonorths |
Drop genes that don't have an ortholog
in the |
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
agg_fun |
Aggregation function passed to
aggregate_mapped_genes.
Set to |
mthreshold |
Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when |
as_sparse |
Convert
If |
as_DelayedArray |
Convert aggregated matrix to DelayedArray. |
sort_rows |
Sort |
gene_map |
A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
|
input_col |
Column name within |
output_col |
Column name within |
verbose |
Print messages. |
... |
Additional arguments to be passed to
gorth or homologene. |
gene_df
with orthologs converted to the
output_species
.
Instead returned as a dictionary (named list) if
gene_output="dict"
or "dict_rev"
.
data("exp_mouse") gene_df <- convert_orthologs( gene_df = exp_mouse, input_species = "mouse" )
data("exp_mouse") gene_df <- convert_orthologs( gene_df = exp_mouse, input_species = "mouse" )
Create a gene background as the union/intersect of
all orthologs between input species (species1
and species2
),
and the output_species
.
This can be useful when generating random lists of background genes
to test against in analyses with data from multiple species
(e.g. enrichment of mouse cell-type markers gene sets in
human GWAS-derived gene sets).
create_background( species1, species2, output_species = "human", as_output_species = TRUE, use_intersect = TRUE, bg = NULL, gene_map = NULL, method = "homologene", non121_strategy = "drop_both_species", verbose = TRUE )
create_background( species1, species2, output_species = "human", as_output_species = TRUE, use_intersect = TRUE, bg = NULL, gene_map = NULL, method = "homologene", non121_strategy = "drop_both_species", verbose = TRUE )
species1 |
First species. |
species2 |
Second species. |
output_species |
Species to convert all genes from
|
as_output_species |
Return background gene list as
|
use_intersect |
When |
bg |
User supplied background list that will be returned to the user after removing duplicate genes. |
gene_map |
User-supplied |
method |
R package to use for gene mapping:
|
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
verbose |
Print messages. |
Background gene list.
bg <- orthogene::create_background(species1 = "mouse", species2 = "rat", output_species = "human")
bg <- orthogene::create_background(species1 = "mouse", species2 = "rat", output_species = "human")
Mean pseudobulk single-cell RNA-seq gene expression matrix.
Data originally comes from Zeisel et al., 2018 (Cell).
data("exp_mouse")
data("exp_mouse")
sparse matrix
Publication
ctd <- ewceData::ctd()
exp_mouse <- as(ctd[[1]]$mean_exp, "sparseMatrix")
usethis::use_data(exp_mouse, overwrite = TRUE)
Mean pseudobulk single-cell RNA-seq Transcript expression matrix.
Data originally comes from Zeisel et al., 2018 (Cell).
data("exp_mouse_enst")
data("exp_mouse_enst")
sparse matrix
Publication
data("exp_mouse")
mapped_genes <- map_genes(genes = rownames(exp_mouse)[seq(1,100)],
target = "ENST",
species = "mouse",
drop_na = FALSE)
exp_mouse_enst <- exp_mouse[mapped_genes$input,]
rownames(exp_mouse_enst) <- mapped_genes$target
all_nas <- orthogene:::find_all_nas(rownames(exp_mouse_enst))
exp_mouse_enst <- exp_mouse_enst[!all_nas,]
exp_mouse_enst <- phenomix::add_noise(exp_mouse_enst)
usethis::use_data(exp_mouse_enst, overwrite = TRUE)
Format scientific species names into a standardised manner.
format_species( species, remove_parentheses = TRUE, abbrev = FALSE, remove_subspecies = FALSE, remove_subspecies_exceptions = c("Canis lupus familiaris"), split_char = " ", collapse = " ", remove_chars = c(" ", ".", "(", ")", "[", "]"), replace_char = "", lowercase = FALSE, trim = "'", standardise_scientific = FALSE )
format_species( species, remove_parentheses = TRUE, abbrev = FALSE, remove_subspecies = FALSE, remove_subspecies_exceptions = c("Canis lupus familiaris"), split_char = " ", collapse = " ", remove_chars = c(" ", ".", "(", ")", "[", "]"), replace_char = "", lowercase = FALSE, trim = "'", standardise_scientific = FALSE )
species |
Species query
(e.g. "human", "homo sapiens", "hsapiens", or 9606).
If given a list, will iterate queries for each item.
Set to |
remove_parentheses |
Remove substring within parentheses: e.g. "Xenopus (Silurana) tropicalis" –> "Xenopus tropicalis" |
abbrev |
Abbreviate all taxonomic levels except the last one: e.g. "Canis lupus familiaris" ==> "C l familiaris" |
remove_subspecies |
Only keep the first two taxonomic levels: e.g. "Canis lupus familiaris" –> "Canis lupus" |
remove_subspecies_exceptions |
Selected species to ignore when
|
split_char |
Character to split species names by. |
collapse |
Character to re-collapse species names with after splitting
with |
remove_chars |
Characters to remove. |
replace_char |
Character to replace |
lowercase |
Make species names all lowercase. |
trim |
Characters to trim from the beginning/end of each species name. |
standardise_scientific |
Automatically sets multiple arguments at once
to create standardised scientific names for each species. Assumes that
|
A named vector where the values are the standardised species names and the names are the original input species names.
species <- c("Xenopus (Silurana) tropicalis","Canis lupus familiaris") species2 <- format_species(species = species, abbrev=TRUE) species3 <- format_species(species = species, standardise_scientific=TRUE, remove_subspecies_exceptions=NULL)
species <- c("Xenopus (Silurana) tropicalis","Canis lupus familiaris") species2 <- format_species(species = species, abbrev=TRUE) species3 <- format_species(species = species, standardise_scientific=TRUE, remove_subspecies_exceptions=NULL)
Get silhouette images of each species from phylopic.
get_silhouettes( species, which = rep(1, length(species)), run_format_species = TRUE, include_image_data = FALSE, mc.cores = 1, add_png = FALSE, remove_bg = FALSE, verbose = TRUE )
get_silhouettes( species, which = rep(1, length(species)), run_format_species = TRUE, include_image_data = FALSE, mc.cores = 1, add_png = FALSE, remove_bg = FALSE, verbose = TRUE )
species |
A character vector of species names to query phylopic for. |
which |
An integer vector of the same length as |
run_format_species |
Standardise species names with
format_species before querying
phylopic (default: |
include_image_data |
Include the image data itself (not just the image UID) in the results. |
mc.cores |
Accelerate multiple species queries by parallelising across multiple cores. |
add_png |
Return URLs for both the SVG and PNG versions of the image. |
remove_bg |
Remove image background. |
verbose |
Print messages. |
data.frame with:
input_species : Species name (input).
species : Species name (standardised).
uid : Species UID.
url : Image URL.
Related function: ggimage::geom_phylopic
phylopic/rphylopic API changes
ggimage: Issue with finding valid PNGs
species <- c("Mus_musculus","Pan_troglodytes","Homo_sapiens") uids <- get_silhouettes(species = species)
species <- c("Mus_musculus","Pan_troglodytes","Homo_sapiens") uids <- get_silhouettes(species = species)
Available namespaces used by link[gprofiler2]gconvert.
data.frame
#### Manually-prepared CSV ####
path <- "inst/extdata/gprofiler_namespace.csv.gz"
gprofiler_namespace <- data.table::fread(path)
Organism for which gene references are available via gProfiler API. Used as a backup if API is not available.
data.frame
# NOTE!: Must run usethis::use_data for all internal data at once.
# otherwise, the prior internal data will be overwritten.
#### Internal data 1: gprofiler_namespace ####
#### Manually-prepared CSV ####
path <- "inst/extdata/gprofiler_namespace.csv.gz"
gprofiler_namespace <- data.table::fread(path)
#### Internal data 2: gprofiler_orgs
gprofiler_orgs <- orthogene:::get_orgdb_gprofiler(use_local=FALSE)
#### Save ####
usethis::use_data(gprofiler_orgs,gprofiler_namespace,
overwrite = TRUE, internal=TRUE)
Infers which species the genes within gene_df
is from.
Iteratively test the percentage of gene_df
genes that match with
the genes from each test_species
.
infer_species( gene_df, gene_input = "rownames", test_species = c("human", "monkey", "rat", "mouse", "zebrafish", "fly"), method = c("homologene", "gprofiler", "babelgene"), make_plot = TRUE, show_plot = TRUE, verbose = TRUE )
infer_species( gene_df, gene_input = "rownames", test_species = c("human", "monkey", "rat", "mouse", "zebrafish", "fly"), method = c("homologene", "gprofiler", "babelgene"), make_plot = TRUE, show_plot = TRUE, verbose = TRUE )
gene_df |
Data object containing the genes
(see
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the |
gene_input |
Which aspect of
|
test_species |
Which species to test for matches with.
If set to
|
method |
R package to use for gene mapping:
|
make_plot |
Make a plot of the results. |
show_plot |
Print the plot of the results. |
verbose |
Print messages. |
An ordered dataframe of test_species
from best to worst matches.
data("exp_mouse") matches <- orthogene::infer_species(gene_df = exp_mouse[1:200,])
data("exp_mouse") matches <- orthogene::infer_species(gene_df = exp_mouse[1:200,])
Input a list of genes, transcripts, proteins, SNPs, or genomic ranges in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and return a table with standardised gene symbols (the "names" column).
map_genes( genes, species = "hsapiens", target = "ENSG", mthreshold = Inf, drop_na = FALSE, numeric_ns = "", run_map_species = TRUE, verbose = TRUE )
map_genes( genes, species = "hsapiens", target = "ENSG", mthreshold = Inf, drop_na = FALSE, numeric_ns = "", run_map_species = TRUE, verbose = TRUE )
genes |
Gene list. |
species |
Species to map against. |
target |
target namespace. |
mthreshold |
maximum number of results per initial alias to show. Shows all by default. |
drop_na |
Drop all genes without mappings.
Sets |
numeric_ns |
namespace to use for fully numeric IDs (list of available namespaces). |
run_map_species |
Standardise |
verbose |
Print messages. |
Uses gconvert.
The exact contents of the output table will depend on
target
parameter.
See ?gprofiler2::gconvert
for more details.
Table with standardised genes.
genes <- c( "Klf4", "Sox2", "TSPAN12", "NM_173007", "Q8BKT6", "ENSMUSG00000012396", "ENSMUSG00000074637" ) mapped_genes <- map_genes( genes = genes, species = "mouse" )
genes <- c( "Klf4", "Sox2", "TSPAN12", "NM_173007", "Q8BKT6", "ENSMUSG00000012396", "ENSMUSG00000074637" ) mapped_genes <- map_genes( genes = genes, species = "mouse" )
Map orthologs from one species to another.
map_orthologs( genes, standardise_genes = FALSE, input_species, output_species = "human", method = c("gprofiler", "homologene", "babelgene"), mthreshold = Inf, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", verbose = TRUE, ... )
map_orthologs( genes, standardise_genes = FALSE, input_species, output_species = "human", method = c("gprofiler", "homologene", "babelgene"), mthreshold = Inf, gene_map = NULL, input_col = "input_gene", output_col = "ortholog_gene", verbose = TRUE, ... )
genes |
can be a mixture of any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to standardised HGNC symbol format. |
standardise_genes |
If |
input_species |
Name of the input species (e.g., "mouse","fly"). Use map_species to return a full list of available species. |
output_species |
Name of the output species (e.g. "human","chicken"). Use map_species to return a full list of available species. |
method |
R package to use for gene mapping:
|
mthreshold |
Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when |
gene_map |
A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
|
input_col |
Column name within |
output_col |
Column name within |
verbose |
Print messages. |
... |
Additional arguments to be passed to
gorth or homologene. |
map_orthologs()
is a core function within
convert_orthologs()
, but does not have many
of the extra checks, such as non121_strategy
)
and drop_nonorths
.
Ortholog map data.frame
with at
least the columns "input_gene" and "ortholog_gene".
data("exp_mouse") gene_map <- map_orthologs( genes = rownames(exp_mouse), input_species = "mouse")
data("exp_mouse") gene_map <- map_orthologs( genes = rownames(exp_mouse), input_species = "mouse")
Search gprofiler
database for species
that match the input text string.
Then translate to a standardised species ID.
map_species( species = NULL, search_cols = c("display_name", "id", "scientific_name", "taxonomy_id"), output_format = c("scientific_name", "id", "display_name", "taxonomy_id", "version", "scientific_name_formatted"), method = c("homologene", "gprofiler", "babelgene"), remove_subspecies = TRUE, remove_subspecies_exceptions = c("Canis lupus familiaris"), use_local = TRUE, verbose = TRUE )
map_species( species = NULL, search_cols = c("display_name", "id", "scientific_name", "taxonomy_id"), output_format = c("scientific_name", "id", "display_name", "taxonomy_id", "version", "scientific_name_formatted"), method = c("homologene", "gprofiler", "babelgene"), remove_subspecies = TRUE, remove_subspecies_exceptions = c("Canis lupus familiaris"), use_local = TRUE, verbose = TRUE )
species |
Species query
(e.g. "human", "homo sapiens", "hsapiens", or 9606).
If given a list, will iterate queries for each item.
Set to |
search_cols |
Which columns to search for
|
output_format |
Which column to return. |
method |
R package to use for gene mapping:
|
remove_subspecies |
Only keep the first two taxonomic levels: e.g. "Canis lupus familiaris" –> "Canis lupus" |
remove_subspecies_exceptions |
Selected species to ignore when
|
use_local |
If |
verbose |
Print messages. |
Species ID of type output_format
ids <- map_species(species = c( "human", 9606, "mus musculus", "fly", "C elegans" ))
ids <- map_species(species = c( "human", 9606, "mus musculus", "fly", "C elegans" ))
Automatically creates a phylogenetic tree plot annotated with metadata
describing how many orthologous genes each species shares with the
reference_species
("human" by default).
plot_orthotree( tree = NULL, orth_report = NULL, species = NULL, method = c("babelgene", "homologene", "gprofiler"), tree_source = "timetree", non121_strategy = "drop_both_species", reference_species = "human", clades = list(Primates = c("Homo sapiens", "Macaca mulatta"), Eutherians = c("Homo sapiens", "Mus musculus", "Bos taurus"), Mammals = c("Homo sapiens", "Mus musculus", "Bos taurus", "Ornithorhynchus anatinus", "Monodelphis domestica"), Tetrapods = c("Homo sapiens", "Mus musculus", "Gallus gallus", "Anolis carolinensis", "Xenopus tropicalis"), Vertebrates = c("Homo sapiens", "Mus musculus", "Gallus gallus", "Anolis carolinensis", "Xenopus tropicalis", "Danio rerio"), Invertebrates = c("Drosophila melanogaster", "Caenorhabditis elegans")), clades_rotate = list(), scaling_factor = NULL, show_plot = TRUE, save_paths = c(tempfile(fileext = ".ggtree.pdf"), tempfile(fileext = ".ggtree.png")), width = 15, height = width, mc.cores = 1, verbose = TRUE )
plot_orthotree( tree = NULL, orth_report = NULL, species = NULL, method = c("babelgene", "homologene", "gprofiler"), tree_source = "timetree", non121_strategy = "drop_both_species", reference_species = "human", clades = list(Primates = c("Homo sapiens", "Macaca mulatta"), Eutherians = c("Homo sapiens", "Mus musculus", "Bos taurus"), Mammals = c("Homo sapiens", "Mus musculus", "Bos taurus", "Ornithorhynchus anatinus", "Monodelphis domestica"), Tetrapods = c("Homo sapiens", "Mus musculus", "Gallus gallus", "Anolis carolinensis", "Xenopus tropicalis"), Vertebrates = c("Homo sapiens", "Mus musculus", "Gallus gallus", "Anolis carolinensis", "Xenopus tropicalis", "Danio rerio"), Invertebrates = c("Drosophila melanogaster", "Caenorhabditis elegans")), clades_rotate = list(), scaling_factor = NULL, show_plot = TRUE, save_paths = c(tempfile(fileext = ".ggtree.pdf"), tempfile(fileext = ".ggtree.png")), width = 15, height = width, mc.cores = 1, verbose = TRUE )
tree |
A phylogenetic tree of class phylo. If no tree
is provided ( |
orth_report |
An ortholog report from one or more species generated by report_orthologs. |
species |
Species to include in the final plot.
If |
method |
R package to use for gene mapping:
|
tree_source |
Can be one of the following:
|
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
reference_species |
Reference species. |
clades |
A named list of clades each containing a character vector of species used to define the respective clade using MRCA. |
clades_rotate |
A list of clades to rotate (via rotate), each containing a character vector of species used to define the respective clade using MRCA. |
scaling_factor |
How much to scale y-axis parameters (e.g. offset) by. |
show_plot |
Whether to print the final tree plot. |
save_paths |
Paths to save plot to. |
width |
Saved plot width. |
height |
Saved plot height. |
mc.cores |
Number of cores to parallelise different steps with. |
verbose |
Print messages. |
A list containing:
plot : Annotated ggtree object.
tree : The pruned, standardised phylogenetic tree used in the plot.
orth_report : Ortholog reports for each species
against
the reference_species
.
metadata : Metadata used in the plot, including silhouette PNG ids from phylopic.
clades : Metadata used for highlighting clades.
method : method
used.
reference_species : reference_species
used.
save_paths : save_paths
to plot.
orthotree <- plot_orthotree(species = c("human","monkey","mouse"))
orthotree <- plot_orthotree(species = c("human","monkey","mouse"))
Import a phylogenetic tree and then conduct
a series of optional standardisation steps.
Optionally, if output_format
is not NULL
, species names from
both the tree and the species
argument will first be standardised
using map_species.
prepare_tree( tree_source = "timetree", species = NULL, output_format = "scientific_name_formatted", run_map_species = c(TRUE, TRUE), method = c("homologene", "gprofiler", "babelgene"), force_ultrametric = TRUE, age_max = NULL, show_plot = TRUE, save_dir = tools::R_user_dir("orthogene", which = "cache"), verbose = TRUE, ... )
prepare_tree( tree_source = "timetree", species = NULL, output_format = "scientific_name_formatted", run_map_species = c(TRUE, TRUE), method = c("homologene", "gprofiler", "babelgene"), force_ultrametric = TRUE, age_max = NULL, show_plot = TRUE, save_dir = tools::R_user_dir("orthogene", which = "cache"), verbose = TRUE, ... )
tree_source |
Can be one of the following:
|
species |
Species names to subset the tree by
(after |
output_format |
Which column to return. |
run_map_species |
Whether to first standardise species names with map_species. |
method |
R package to use for gene mapping:
|
force_ultrametric |
Whether to force the tree to be ultrametric (i.e. make all tips the same date) using force.ultrametric. |
age_max |
Rescale the edges of the tree into units of
millions of years (MY) instead than evolutionary rates (e.g. dN/dS ratios).
Only used if |
show_plot |
Show a basic plot of the resulting tree. |
save_dir |
Directory to cache full tree in.
Set to |
verbose |
Print messages. |
... |
Additional arguments passed to makeChronosCalib. |
A filtered tree of class "phylo" (with standardised species names).
TimeTree 5: An Expanded Resource for Species Divergence Times
species <- c("human","chimp","mouse") tr <- orthogene::prepare_tree(species = species)
species <- c("human","chimp","mouse") tr <- orthogene::prepare_tree(species = species)
Identify the number of orthologous genes between two species.
report_orthologs( target_species = "mouse", reference_species = "human", standardise_genes = FALSE, method_all_genes = c("homologene", "gprofiler", "babelgene"), method_convert_orthologs = method_all_genes, drop_nonorths = TRUE, non121_strategy = "drop_both_species", round_digits = 2, return_report = TRUE, ref_genes = NULL, mc.cores = 1, verbose = TRUE, ... )
report_orthologs( target_species = "mouse", reference_species = "human", standardise_genes = FALSE, method_all_genes = c("homologene", "gprofiler", "babelgene"), method_convert_orthologs = method_all_genes, drop_nonorths = TRUE, non121_strategy = "drop_both_species", round_digits = 2, return_report = TRUE, ref_genes = NULL, mc.cores = 1, verbose = TRUE, ... )
target_species |
Target species. |
reference_species |
Reference species. |
standardise_genes |
If |
method_all_genes |
R package to to use in all_genes step:
|
method_convert_orthologs |
R package to to use in convert_orthologs step:
|
drop_nonorths |
Drop genes that don't have an ortholog
in the |
non121_strategy |
How to handle genes that don't have
1:1 mappings between
|
round_digits |
Number of digits to round to when printing percentages. |
return_report |
Return just the ortholog mapping
between two species ( |
ref_genes |
A table of all genes for the |
mc.cores |
Number of cores to parallelise each
|
verbose |
Print messages. |
... |
Arguments passed on to
|
A list containing:
map : A table of inter-species gene mappings.
report : A list of aggregate orthology report statistics.
If >1 target_species
are provided, then a table of
aggregated report
statistics concatenated across species
will be returned instead.
orth_fly <- report_orthologs( target_species = "fly", reference_species = "human")
orth_fly <- report_orthologs( target_species = "fly", reference_species = "human")