Package 'GOfuncR' reference manual

Title:	Gene ontology enrichment using FUNC
Description:	GOfuncR performs a gene ontology enrichment analysis based on the ontology enrichment software FUNC. GO-annotations are obtained from OrganismDb or OrgDb packages ('Homo.sapiens' by default); the GO-graph is included in the package and updated regularly (01-May-2021). GOfuncR provides the standard candidate vs. background enrichment analysis using the hypergeometric test, as well as three additional tests: (i) the Wilcoxon rank-sum test that is used when genes are ranked, (ii) a binomial test that is used when genes are associated with two counts and (iii) a Chi-square or Fisher's exact test that is used in cases when genes are associated with four counts. To correct for multiple testing and interdependency of the tests, family-wise error rates are computed based on random permutations of the gene-associated variables. GOfuncR also provides tools for exploring the ontology graph and the annotations, and options to take gene-length or spatial clustering of genes into account. It is also possible to provide custom gene coordinates, annotations and ontologies.
Authors:	Steffi Grote
Maintainer:	Steffi Grote <[email protected]>
License:	GPL (>= 2)
Version:	1.27.0
Built:	2025-03-15 05:48:08 UTC
Source:	https://github.com/bioc/GOfuncR

Get all associated ontology categories for the input genes

Description

Returns all associated GO-categories given a vector of gene-symbols, e.g. c('SPAG5', 'BTC').

Usage

get_anno_categories(genes,  database = 'Homo.sapiens', annotations = NULL,
    term_df = NULL, godir = NULL, silent = FALSE)
get_anno_categories(genes,  database = 'Homo.sapiens', annotations = NULL,
    term_df = NULL, godir = NULL, silent = FALSE)

Arguments

`genes`	a character() vector of gene-symbols, e.g. c('SPAG5', 'BTC').
`database`	optional character() defining an OrganismDb or OrgDb annotation package from Bioconductor, like 'Mus.musculus' (mouse) or 'org.Pt.eg.db' (chimp).
`annotations`	optional data.frame() with two character() columns: gene-symbols and GO-categories. Alternative to 'database'.
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`.
`godir`	optional character() specifying a directory that contains the ontology table 'term.txt'. Alternative to the default integrated GO-graph or `term_df`.
`silent`	logical. If TRUE all output to the screen except for warnings and errors is suppressed.

Details

Besides the default 'Homo.sapiens', also other OrganismDb or OrgDb packages from Bioconductor, like 'Mus.musculus' (mouse) or 'org.Pt.eg.db' (chimp), can be used. It is also possible to directly provide a dataframe with annotations, which is then searched for the input genes and filtered for GO-categories that are present in the ontology.

By default the package's integrated ontology is used, but a custom ontology can be defined, too. For details on how to use a custom ontology with term_df or godir please refer to the package's vignette. The advantage of term_df over godir is that the latter reads the file 'term.txt' from disk and therefore takes longer.

Value

a data.frame() with four columns: gene (character()), GO-ID (character(), GO-name (character() and GO-domain (character()).

Note

This gives only direct annotations of genes to GO-categories. By definition genes are also indirectly annotated to all parent nodes of those categories. Use get_parent_nodes to get the higher level categories of the directly annotated GO-categories.
Also note that GO-categories which are not represented or obsolete in the internal GO-graph of GOfuncR or the custom ontology provided through term_df or godir are removed to be consistent with the annotations used in go_enrich.

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples


## get the GO-annotations for two random genes
anno1 = get_anno_categories(c('BTC', 'SPAG5'))
head(anno1)
## get the GO-annotations for two random genes
anno1 = get_anno_categories(c('BTC', 'SPAG5'))
head(anno1)

Get genes that are annotated to GO-categories

Description

Given a vector of GO-IDs, e.g. c('GO:0072025','GO:0072221') this function returns all genes that are annotated to those GO-categories. This includes genes that are annotated to any of the child nodes of a GO-category.

Usage

get_anno_genes(go_ids, database = 'Homo.sapiens', genes = NULL, annotations = NULL,
    term_df = NULL, graph_path_df = NULL, godir = NULL)
get_anno_genes(go_ids, database = 'Homo.sapiens', genes = NULL, annotations = NULL,
    term_df = NULL, graph_path_df = NULL, godir = NULL)

Arguments

`go_ids`	character() vector of GO-IDs, e.g. c('GO:0051082', 'GO:0042254').
`database`	optional character() defining an OrganismDb or OrgDb annotation package from Bioconductor, like 'Mus.musculus' (mouse) or 'org.Pt.eg.db' (chimp).
`genes`	optional character() vector of gene-symbols. If defined, only annotations of those genes are returned.
`annotations`	optional data.frame() with two character() columns: gene-symbols and GO-categories. Alternative to 'database'.
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`. Also needs `graph_path_df`.
`graph_path_df`	optional data.frame() with an ontology 'graph_path' table. Alternative to the default integrated GO-graph or `godir`. Also needs `term_df`.
`godir`	optional character() specifying a directory that contains the ontology tables 'term.txt' and 'graph_path.txt'. Alternative to the default integrated GO-graph or `term_df` + `graph_path_df`.

Details

Besides the default 'Homo.sapiens', also other OrganismDb or OrgDb packages from Bioconductor, like 'Mus.musculus' (mouse) or 'org.Pt.eg.db' (chimp), can be used. It is also possible to directly provide a data.frame() with annotations, which is then searched for the input GO-categories and their child nodes.

By default the package's integrated GO-graph is used to find child nodes, but a custom ontology can be defined, too. For details on how to use a custom ontology with term_df + graph_path_df or godir please refer to the package's vignette. The advantage of term_df + graph_path_df over godir is that the latter reads the files 'term.txt' and 'graph_path.txt' from disk and therefore takes longer.

Value

A data.frame() with two columns: GO-IDs (character()) and the annotated genes (character()). The output is ordered by GO-ID and gene-symbol.

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples

 

## find all genes that are annotated to GO:0000109
## ("nucleotide-excision repair complex")
get_anno_genes(go_ids='GO:0000109')

## find out wich genes from a set of genes
## are annotated to some GO-categories
genes = c('AGTR1', 'ANO1', 'CALB1', 'GYG1', 'PAX2')
gos = c('GO:0001558', 'GO:0005536', 'GO:0072205', 'GO:0006821')
anno_genes = get_anno_genes(go_ids=gos, genes=genes)
# add the names and domains of the GO-categories
cbind(anno_genes ,get_names(anno_genes$go_id)[,2:3])

## find all annotations to GO-categories containing 'serotonin receptor'
sero_ids = get_ids('serotonin receptor')
sero_anno = get_anno_genes(go_ids=sero_ids$go_id)
# merge with names of GO-categories
head(merge(sero_ids, sero_anno))
## find all genes that are annotated to GO:0000109
## ("nucleotide-excision repair complex")
get_anno_genes(go_ids='GO:0000109')

## find out wich genes from a set of genes
## are annotated to some GO-categories
genes = c('AGTR1', 'ANO1', 'CALB1', 'GYG1', 'PAX2')
gos = c('GO:0001558', 'GO:0005536', 'GO:0072205', 'GO:0006821')
anno_genes = get_anno_genes(go_ids=gos, genes=genes)
# add the names and domains of the GO-categories
cbind(anno_genes ,get_names(anno_genes$go_id)[,2:3])

## find all annotations to GO-categories containing 'serotonin receptor'
sero_ids = get_ids('serotonin receptor')
sero_anno = get_anno_genes(go_ids=sero_ids$go_id)
# merge with names of GO-categories
head(merge(sero_ids, sero_anno))

Get all child nodes of gene ontology categories

Description

Returns all child nodes (sub-categories) of GO-categories given their GO-IDs, e.g. c('GO:0042254', 'GO:0000109'). The output also states the shortest distance to the child node. Note that a GO-ID itself is also considered as child with distance 0.

Usage

get_child_nodes(go_ids, term_df = NULL, graph_path_df = NULL, godir = NULL)
get_child_nodes(go_ids, term_df = NULL, graph_path_df = NULL, godir = NULL)

Arguments

`go_ids`	a character() vector of GO-IDs, e.g. c('GO:0051082', 'GO:0042254').
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`. Also needs `graph_path_df`.
`graph_path_df`	optional data.frame() with an ontology 'graph_path' table. Alternative to the default integrated GO-graph or `godir`. Also needs `term_df`.
`godir`	optional character() specifying a directory that contains the ontology tables 'term.txt' and 'graph_path.txt'. Alternative to the default integrated GO-graph or `term_df` + `graph_path_df`.

Details

By default the package's integrated GO-graph is used, but a custom ontology can be defined, too. For details on how to use a custom ontology with term_df + graph_path_df or godir please refer to the package's vignette. The advantage of term_df + graph_path_df over godir is that the latter reads the files 'term.txt' and 'graph_path.txt' from disk and therefore takes longer.

Value

a data.frame() with four columns: parent GO-ID (character()), child GO-ID (character()), child GO-name (character()) and distance (numeric()).

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples

## get the child nodes (sub-categories) of two random GO-IDs
child_nodes = get_child_nodes(c('GO:0090070', 'GO:0000112'))
child_nodes
## get the child nodes (sub-categories) of two random GO-IDs
child_nodes = get_child_nodes(c('GO:0090070', 'GO:0000112'))
child_nodes

Get the ID of a GO-category given its name

Description

Returns GO-categories given (part of) their name. Matching is not case-sensitive.

Usage

    get_ids(go_name, term_df = NULL, godir = NULL)
get_ids(go_name, term_df = NULL, godir = NULL)

Arguments

`go_name`	character(). (partial) name of a GO-category
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`.
`godir`	optional character() specifying a directory that contains the ontology table 'term.txt'. Alternative to the default integrated GO-graph or `term_df`.

Details

For details on how to use a custom ontology with term_df or godir please refer to the package's vignette. The advantage of term_df over godir is that the latter reads the file 'term.txt' from disk and therefore takes longer.

Value

a data.frame() with three columns: the full names (character()) of the GO-categories that contain go_name; together with the GO-domain ('cellular_component', 'biological_process' or 'molecular_function') and the GO-category IDs (character()).

Note

This is just a grep(..., ignore.case=TRUE) on the node names of the ontology.
More sophisticated searches, e.g. with regular expressions, could be performed on the table returned by get_ids('') which lists all non-obsolete GO-categories.

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples

## get GO-IDs of categories that contain 'gabaergic' in their names
get_ids('gabaergic')

## get GO-IDs of categories that contain 'blood-brain barrier' in their names
get_ids('blood-brain barrier')

## get all valid GO-categories
all_nodes = get_ids('')
head(all_nodes)
## get GO-IDs of categories that contain 'gabaergic' in their names
get_ids('gabaergic')

## get GO-IDs of categories that contain 'blood-brain barrier' in their names
get_ids('blood-brain barrier')

## get all valid GO-categories
all_nodes = get_ids('')
head(all_nodes)

Get the full names of gene ontology categories given the IDs

Description

Returns the full names and the domains of GO-categories given the GO-IDs, e.g. 'GO:0042254'. By default the package's integrated GO-graph is used, but a custom ontology can be defined, too.

Usage

get_names(go_ids, term_df = NULL, godir = NULL)
get_names(go_ids, term_df = NULL, godir = NULL)

Arguments

`go_ids`	a character() vector of GO-IDs, e.g. c('GO:0051082', 'GO:0042254').
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`.
`godir`	optional character() specifying a directory that contains the ontology table 'term.txt'. Alternative to the default integrated GO-graph or `term_df`.

Details

Value

a data.frame() with three columns: go_id (character()), go_name (character()) and root_node (domain, character()).

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples

## get the full names of three random GO-IDs
get_names(c('GO:0051082', 'GO:0042254', 'GO:0000109'))
## get the full names of three random GO-IDs
get_names(c('GO:0051082', 'GO:0042254', 'GO:0000109'))

Get all parent nodes of gene ontology categories

Description

Returns all parent nodes (higher level categories) of GO-categories given their GO-IDs, e.g. c('GO:0042254', 'GO:0000109'). The output also states the shortest distance to the parent node. Note that a GO-ID itself is also considered as parent with distance 0.

Usage

get_parent_nodes(go_ids, term_df = NULL, graph_path_df = NULL, godir = NULL)
get_parent_nodes(go_ids, term_df = NULL, graph_path_df = NULL, godir = NULL)

Arguments

`go_ids`	a character() vector of GO-IDs, e.g. c('GO:0051082', 'GO:0042254').
`term_df`	optional data.frame() with an ontology 'term' table. Alternative to the default integrated GO-graph or `godir`. Also needs `graph_path_df`.
`graph_path_df`	optional data.frame() with an ontology 'graph_path' table. Alternative to the default integrated GO-graph or `godir`. Also needs `term_df`.
`godir`	optional character() specifying a directory that contains the ontology tables 'term.txt' and 'graph_path.txt'. Alternative to the default integrated GO-graph or `term_df` + `graph_path_df`.

Details

Value

a data.frame() with four columns: child GO-ID (character()), parent GO-ID (character()), parent GO-name (character()) and distance (numeric()).

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25-29.

Examples

## get the parent nodes (higher level GO-categories) of two random GO-IDs
parents = get_parent_nodes(c('GO:0051082', 'GO:0042254'))
parents
## get the parent nodes (higher level GO-categories) of two random GO-IDs
parents = get_parent_nodes(c('GO:0051082', 'GO:0042254'))
parents

Test gene sets for enrichment in GO-categories

Description

Tests GO-categories for enrichment of user defined gene sets, using either the hypergeometric (default), Wilcoxon rank-sum, binomial or 2x2 contingency table test.

Usage

go_enrich(genes, test = 'hyper', n_randsets = 1000, organismDb = 'Homo.sapiens', gene_len = FALSE,
    regions = FALSE, circ_chrom = FALSE, silent = FALSE, domains = NULL, orgDb = NULL,
    txDb = NULL, annotations = NULL, gene_coords = NULL,  godir = NULL)
go_enrich(genes, test = 'hyper', n_randsets = 1000, organismDb = 'Homo.sapiens', gene_len = FALSE,
    regions = FALSE, circ_chrom = FALSE, silent = FALSE, domains = NULL, orgDb = NULL,
    txDb = NULL, annotations = NULL, gene_coords = NULL,  godir = NULL)

Arguments

`genes`	a data.frame() with gene-symbols (character()) in the first column and test-dependent additional numeric() columns: If `test='hyper'` (default) a second column with 1 for candidate genes and 0 for background genes. If no background genes are defined, all remaining genes from the internal dataset are used as background. All candidate genes are implicitly part of the background gene set. If `test='wilcoxon'` a second column with the score that is associated with each gene. If `test='binomial'` two additional columns with two gene-associated integers. If `test='contingency'` four additional columns with four gene-associated integers. For `test='hyper'` with `regions=TRUE` the first column describes chromosomal regions ('chr:start-stop', e.g. '9:0-39200000'). Note that each gene has to be unique in the data.frame; e.g. for `test='wilcoxon'` a gene cannot have two different scores assigned.
`test`	character(). 'hyper' (default) for the hypergeometric test, 'wilcoxon' for the Wilcoxon rank-sum test, 'binomial' for the binomial test and 'contingency' for the 2x2-contingency table test (fisher's exact test or chi-square).
`n_randsets`	integer defining the number of random sets for computing the FWER.
`organismDb`	character(). Annotation package for GO-annotations and gene coordinates. Besides the default 'Homo.sapiens' also 'Mus.musculus' and 'Rattus.norvegicus' are currently available on Bioconductor.
`gene_len`	logical. Correct for gene length. If `TRUE` the probability of a background gene to be chosen as a candidate gene in a random set is dependent on the gene length. If `FALSE` genes are chosen randomly with equal probability each. Only for `test='hyper'`.
`regions`	logical. If `TRUE` chromosomal regions are analyzed, and genes are automatically extracted from the regions defined in `genes[,1]` as e.g. '9:0-39200000'. Note that this option requires the input of background regions.
`circ_chrom`	logical. When `genes` defines chromosomal regions, `circ_chrom=TRUE` uses background regions from the same chromosome and allows randomly chosen blocks to overlap multiple background regions. Only if `test='hyper'`.
`silent`	logical. If `TRUE` all output to the screen except for warnings and errors is suppressed.
`domains`	optional character() vector containing one or more of the three GO-domains 'cellular_component', 'biological_process' and 'molecular_function'. If defined, the analysis will be reduced to those domains which saves time.\ cr If a custom ontology is specified in `godir` it might have a different domains.
`orgDb`	optional character() naming an OrgDb annotation package from Bioconductor. If `orgDb` is set, `organismDb` is not used. OrgDb annotations are available for a wider range of species, e.g. 'org.Pt.eg.db' for chimp and 'org.Gg.eg.db' for chicken. Note that options `gene_len` and `regions` also need a `txDb` for the gene coordinates (which are integrated in OrganismDb).
`txDb`	optional character() naming an TxDb annotation package from Bioconductor (e.g. 'TxDb.Ptroglodytes.UCSC.panTro4.refGene') for the gene coordinates. Only needed when `orgDb` is specified, and options `gene_len` or `regions` are set. Note that `orgDb` is needed whenever `txDb` is defined, even when custom `annotations` are provided, because the `orgDb` package is used for Entrez-ID to gene-symbol conversions.
`annotations`	optional data.frame() for custom annotations, with two character() columns: (1) gene-symbols and (2) GO-categories. Note that options `gene_len` and `regions` also need an `organismDb` or `txDb` + `orgDb`.
`gene_coords`	optional data.frame() for custom gene coordinates, with four columns: gene-symbols (character), chromosome (character), start (integer), end (integer). When `gene_len=TRUE` or `regions=TRUE` these custom gene coordinates are used instead of the ones obtained from `organismDb` or `txDb`.
`godir`	optional character() specifying a directory () that contains a custom GO-graph (files 'term.txt', 'term2term.txt' and 'graph_path.txt'). Alternative to the default integrated GO-graph. For details please refer to the package's vignette.

Details

Please also refer to the package's vignette.
GO-annotations are taken from a Bioconductor annotation package (OrganismDb package 'Homo.sapiens' by default), but also other 'OrganismDb' or 'OrgDb' packages can be used. It is also possible to provide custom annotations as a data.frame().

The ontology graph is integrated, but a custom version can be defined as well with parameter 'godir'. As long as the ontology tables are in the right format (see link to description in vignette), any ontology can be used in GOfuncR, it is not restricted to the gene ontology.

The statistical analysis is based on the ontology enrichment software FUNC [2]. go_enrich offers four different statistical tests: (1) the hypergeometric test for a candidate and a background gene set; (2) the Wilcoxon rank-sum test for genes that are ranked by scores (e.g. p-value for differential expression); (3) the binomial test for genes that have two associated counts (e.g. amino-acid changes on the human and the chimp lineage); and (4) a 2x2-contingency table test for genes that have four associated counts (e.g. for a McDonald-Kreitman test).

To account for multiple testing family-wise error rates are computed using randomsets. Besides naming candidate genes explicitly, for the hypergeometric test it is also possible to provide entire genomic regions as input. The enrichment analysis is then performed for all genes located in or overlapping these regions and the multiple testing correction accounts for the spatial clustering of genes.

Value

A list with components

`results`	a data.frame() with the FWERs from the enrichment analyses per ontology category, ordered by 'FWER_overrep', 'raw_p_overrep', -'FWER_underrep', -'raw_p_underrep', 'ontology' and 'node_id', or the corresponding columns if another test then the hypergeometric test is used. This table contains the combined results for all three ontology domains. Note that GO-categories without any annotations of candidate or background genes are not listed.
`genes`	the input `genes` data.frame(), excluding those genes for which no GO-annotations are available and which therefore were not included in the enrichment analysis. If `gene_len=TRUE`, also genes without gene coordinates are excluded.
`databases`	a data.frame() with the used annotation packages and GO-graph.
`min_p`	a data.frame() with the minimum p-values from the randomsets that are used to compute the FWER.

Author(s)

Steffi Grote

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29. doi:10.1038/75556
[2] Pruefer, K. et al. (2007). FUNC: A package for detecting significant associations between gene sets and ontological. BMC Bioinformatics 8: 41. doi:10.1186/1471-2105-8-41

Examples


#### Note that argument 'n_randsets' is reduced 
#### to lower computational time in the following examples. 
#### Using the default value is recommended.

#### Perform a GO-enrichment analysis for some human genes 
#### with a defined background set
# create input dataframe that defines the candidate and backround genes
candi_gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2', 
    'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2')
bg_gene_ids = c('FGR', 'NPHP1', 'DRD2', 'ABCC10', 'PTBP2', 'JPH4', 'SMARCC2',
    'FN1', 'NODAL', 'CYP1A2', 'ACSS1', 'CDHR1', 'SLC25A36', 'LEPR', 'PRPS2',
    'TNFAIP3', 'NKX3-1', 'LPAR2', 'PGAM2')
is_candidate = c(rep(1,length(candi_gene_ids)), rep(0,length(bg_gene_ids)))
genes = data.frame(gene_ids=c(candi_gene_ids, bg_gene_ids), is_candidate)
genes

# run enrichment analysis
go_res = go_enrich(genes, n_randset=100)

# go_enrich returns a list with 4 elements:
# 1) results from the anlysis
# (ordered by FWER for overrepresentation of candidate genes)
head(go_res[[1]])
# see the top GOs from every GO-domain
by(go_res[[1]], go_res[[1]][,'ontology'], head)
# 2) all valid input genes
go_res[[2]]
# 3) annotation databases used
go_res[[3]]
# 4) minimum p-values from randomsets
head(go_res[[4]])

#### see the package's vignette for more examples
#### Note that argument 'n_randsets' is reduced 
#### to lower computational time in the following examples. 
#### Using the default value is recommended.

#### Perform a GO-enrichment analysis for some human genes 
#### with a defined background set
# create input dataframe that defines the candidate and backround genes
candi_gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2', 
    'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2')
bg_gene_ids = c('FGR', 'NPHP1', 'DRD2', 'ABCC10', 'PTBP2', 'JPH4', 'SMARCC2',
    'FN1', 'NODAL', 'CYP1A2', 'ACSS1', 'CDHR1', 'SLC25A36', 'LEPR', 'PRPS2',
    'TNFAIP3', 'NKX3-1', 'LPAR2', 'PGAM2')
is_candidate = c(rep(1,length(candi_gene_ids)), rep(0,length(bg_gene_ids)))
genes = data.frame(gene_ids=c(candi_gene_ids, bg_gene_ids), is_candidate)
genes

# run enrichment analysis
go_res = go_enrich(genes, n_randset=100)

# go_enrich returns a list with 4 elements:
# 1) results from the anlysis
# (ordered by FWER for overrepresentation of candidate genes)
head(go_res[[1]])
# see the top GOs from every GO-domain
by(go_res[[1]], go_res[[1]][,'ontology'], head)
# 2) all valid input genes
go_res[[2]]
# 3) annotation databases used
go_res[[3]]
# 4) minimum p-values from randomsets
head(go_res[[4]])

#### see the package's vignette for more examples

Plot distribution of scores of genes annotated to GO-categories

Description

Uses the result of a GO-enrichment analysis performed with go_enrich and a vector of GO-IDs and plots for each of these GO-IDs the scores of the annotated genes. This refers to the scores that were provided as user-input in the go_enrich analysis.
plot_anno_scores works with all four tests implemented in go_enrich (hypergeometric, Wilcoxon rank-sum, binomial and 2x2 contingency table test), with test-specific output (see details).

Usage

plot_anno_scores(res, go_ids, annotations = NULL)
plot_anno_scores(res, go_ids, annotations = NULL)

Arguments

`res`	an object returned from `go_enrich` (list() of 4 elements).
`go_ids`	character() vector of GO-IDs, e.g. c('GO:0005737','GO:0071495'). This specifies the GO-categories that are plotted.
`annotations`	optional data.frame() for custom annotations, with two character() columns: (1) gene-symbols and (2) GO-categories. This is needed if `go_enrich` was run with custom `annotations` to generate `res`, too.

Details

The plot depends on the statistical test that was specified in the go_enrich call.

For the hypergeometric test pie charts show the amounts of candidate and background genes that are annotated to the GO-categories and the root nodes (candidate genes in the colour of the corresponding root node). The top panel shows the odds-ratio and 95%-CI from fisher's exact test (two-sided) comparing the GO-categories with their root nodes. Note that go_enrich reports the the hypergeometric tests for over- and under-representation of candidate genes which correspond to the one-sided fisher's exact tests.

For the Wilcoxon rank-sum test violin plots show the distribution of the scores of genes that are annotated to each GO-category and the root nodes. Horizontal lines in the left panel indicate the median of the scores that are annotated to the root nodes. The Wilcoxon rank-sum test reported in the go_enrich result compares the scores annotated to a GO-category with the scores annotated to the corresponding root node.

For the binomial test pie charts show the amounts of A and B counts associated with each GO-category and root node, (A in the colour of the corresponding root node). The top-panel shows point estimates and the 95%-CI of p(A) in the nodes, as well as horizontal lines that correspond to p(A) in the root nodes. The p-value in the returned object is based on the null hypothesis that p(A) in a node equals p(A) in the corresponding root node. Note that go_enrich reports that value for one-sided binomial tests.

For the 2x2 contingency table test pie charts show the proportions of A and B, as well as C and D counts associated with a GO-category. Root nodes are not shown, because this test is independent of the root category. The top panel shows the odds ratio and 95%-CI from Fisher's exact test (two-sided) comparing A/B and C/D inside one node. Note that in go_enrich, if all four values are >=10, a chi-square test is performed instead of fisher's exact test.

Value

For the hypergeometric, binomial and 2x2 contingency table test, a data.frame() with the statistics that are used in the plots.
For the Wilcoxon rank-sum test no statistical results are plotted, just the distribution of annotated scores. The returned element in this case is a list() with three data frames: annotations of genes to the GO-categories, annotations of genes to the root nodes and a table which contains for every GO-ID the corresponding root node.

Author(s)

Steffi Grote

Examples

 
#### see the package's vignette for more examples

#### Note that argument 'n_randsets' is reduced 
#### to lower computational time in the example.

## Assign two random counts to some genes to create example input
set.seed(123)
high_A_genes = c('G6PD', 'GCK', 'GYS1', 'HK2', 'PYGL', 'SLC2A8',
    'UGP2', 'ZWINT', 'ENGASE')
low_A_genes = c('CACNG2', 'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1',
    'GYG1', 'PAX2')
A_counts = c(sample(15:25, length(high_A_genes)),
    sample(5:15, length(low_A_genes)))
B_counts = c(sample(5:15, length(high_A_genes)),
    sample(15:25, length(low_A_genes)))
genes = data.frame(gene=c(high_A_genes, low_A_genes), A_counts, B_counts)

## perform enrichment analysis to find GO-categories with high fraction of A
go_binom = go_enrich(genes, test='binomial', n_randsets=20)

## plot sums of A and B counts associated with the top GO-categories
top_gos = head(go_binom[[1]]$node_id)
stats = plot_anno_scores(go_binom, go_ids=top_gos)

## look at the results of binomial test used for plotting
## (this is two-sided, go_enrich reports one-sided tests)
head(stats)

#### see the package's vignette for more examples

#### Note that argument 'n_randsets' is reduced 
#### to lower computational time in the example.

## Assign two random counts to some genes to create example input
set.seed(123)
high_A_genes = c('G6PD', 'GCK', 'GYS1', 'HK2', 'PYGL', 'SLC2A8',
    'UGP2', 'ZWINT', 'ENGASE')
low_A_genes = c('CACNG2', 'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1',
    'GYG1', 'PAX2')
A_counts = c(sample(15:25, length(high_A_genes)),
    sample(5:15, length(low_A_genes)))
B_counts = c(sample(5:15, length(high_A_genes)),
    sample(15:25, length(low_A_genes)))
genes = data.frame(gene=c(high_A_genes, low_A_genes), A_counts, B_counts)

## perform enrichment analysis to find GO-categories with high fraction of A
go_binom = go_enrich(genes, test='binomial', n_randsets=20)

## plot sums of A and B counts associated with the top GO-categories
top_gos = head(go_binom[[1]]$node_id)
stats = plot_anno_scores(go_binom, go_ids=top_gos)

## look at the results of binomial test used for plotting
## (this is two-sided, go_enrich reports one-sided tests)
head(stats)

Refine results given a FWER threshold

Description

Given a FWER threshold, this function refines the results from go_enrich() like described in the elim algorithm of [1].
This algorithm removes genes from significant child-categories and then checks whether a category is still significant.
This way significant results are restricted to more specific categories.

Usage

    refine(res, fwer = 0.05, fwer_col = 7, annotations = NULL)
refine(res, fwer = 0.05, fwer_col = 7, annotations = NULL)

Arguments

`res`	list() returned from `go_enrich()`
`fwer`	numeric() FWER-threshold. Categories with a FWER < `fwer` will be labeled significant.
`fwer_col`	numeric() or character() specifying the column of `go_enrich()[[1]]` that is to be filtered. E.g. 6 for under-representation or 7 over-representation of candidate genes in the hypergeometric test.
`annotations`	optional data.frame() with custom annotations. Only needed if `go_enrich()` was run with custom annotations in the first place.

Details

For each domain a p-value is found by interpolation, that corresponds to the input FWER threshold. Since GO-domains are independent graphs, the same FWER will correspond to different p-values, e.g. in 'molecular_function' and 'biological_process'.

Value

a data.frame() with p-values after refinement for categories that were significant in go_enrich()[[1]] given the FWER-threshold.

Author(s)

Steffi Grote

References

[1] Alexa, A. et al. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600-1607.

Examples


## perform enrichment analysis for some genes
gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2', 'AGTR1',
    'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2')
input_hyper = data.frame(gene_ids, is_candidate=1)
res_hyper = go_enrich(input_hyper, n_randset=100, silent=TRUE)
head(res_hyper[[1]])
## perform refinement for categories with FWER < 0.1
refined = refine(res_hyper, fwer=0.1)
refined

## perform enrichment analysis for some genes
gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2', 'AGTR1',
    'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2')
input_hyper = data.frame(gene_ids, is_candidate=1)
res_hyper = go_enrich(input_hyper, n_randset=100, silent=TRUE)
head(res_hyper[[1]])
## perform refinement for categories with FWER < 0.1
refined = refine(res_hyper, fwer=0.1)
refined

Package 'GOfuncR'

Help Index

Get all associated ontology categories for the input genes

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Get genes that are annotated to GO-categories

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Get all child nodes of gene ontology categories

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Get the ID of a GO-category given its name

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Get the full names of gene ontology categories given the IDs

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Get all parent nodes of gene ontology categories

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Test gene sets for enrichment in GO-categories

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Plot distribution of scores of genes annotated to GO-categories

Description

Usage

Arguments

Details

Value