| Title: | Gene Ontology enrichment analysis of gene pairs |
|---|---|
| Description: | GO-a-GO annotates Gene Ontology terms that are enriched in a given set of gene pairs. The enrichment is calculated from a permutation test for overrepresentation of gene pairs that are associated with a shared term. Such gene pairs are counted for the original set of gene pairs and compared against randomized sets in which the structure of the pairs is preserved, but the gene identities (including the associated terms) are permuted. |
| Authors: | Aleksander Jankowski [aut, cre] (ORCID: <https://orcid.org/0000-0002-2212-6224>) |
| Maintainer: | Aleksander Jankowski <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.1.1 |
| Built: | 2026-06-05 06:28:29 UTC |
| Source: | https://github.com/bioc/GOaGO |
Interaction anchors are associated to TSSes as follows. Each anchor is
associated to all the TSSes it overlaps. If there is no such overlap, then
the anchor is associated to all TSSes with the shortest distance to the
anchor, if this distance is not larger than maxDistanceToTSS.
annotateAnchors( anchors, transcripts = NULL, tss = NULL, keyType = NULL, maxDistanceToTSS = -1 )annotateAnchors( anchors, transcripts = NULL, tss = NULL, keyType = NULL, maxDistanceToTSS = -1 )
anchors |
object of class |
transcripts |
|
tss |
object of class |
keyType |
type of gene identifiers, such as "ENTREZID" or "ENSEMBL", if
it cannot be determined from metadata of |
maxDistanceToTSS |
maximal distance to extend the search for nearest TSS outside the anchor, or -1 (the default) to skip the extension |
Either transcripts or tss must be provided, but not both.
A data table with columns interactionID (index of the
anchor), chrom, start, end (coordinates of the
anchor), geneID (gene identifier from 'transcripts' or 'tss'),
tss (TSS position) and strand (TSS strand).
library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) tss <- convertTranscriptsToTSS(transcripts) gr <- GRanges("chr1", IRanges(c(42001, 890001), c(62000, 900000))) # note that anchors are associated to TSSes outside the anchor only if there # are no TSSes overlapping the anchor annotateAnchors(gr, tss, maxDistanceToTSS = 10e3) # this may yield more associations to TSSes outside the anchors annotateAnchors(gr + 10e3, tss)library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) tss <- convertTranscriptsToTSS(transcripts) gr <- GRanges("chr1", IRanges(c(42001, 890001), c(62000, 900000))) # note that anchors are associated to TSSes outside the anchor only if there # are no TSSes overlapping the anchor annotateAnchors(gr, tss, maxDistanceToTSS = 10e3) # this may yield more associations to TSSes outside the anchors annotateAnchors(gr + 10e3, tss)
Both interaction anchors are associated to TSSes as described in
annotateAnchors. Briefly, each anchor is associated to all the
TSSes it overlaps, or to all closest TSSes up to maxDistanceToTSS if
there is no such overlap. The annotation of an interaction is a Cartesian
product of annotations for both anchors.
annotateInteractions( interactions, transcripts = NULL, tss = NULL, keyType = NULL, maxDistanceToTSS = -1 )annotateInteractions( interactions, transcripts = NULL, tss = NULL, keyType = NULL, maxDistanceToTSS = -1 )
interactions |
object of class |
transcripts |
|
tss |
object of class |
keyType |
type of gene identifiers, such as "ENTREZID" or "ENSEMBL", if
it cannot be determined from metadata of |
maxDistanceToTSS |
maximal distance to extend the search for nearest TSS outside the anchor, or -1 (the default) to skip the extension |
Either transcripts or tss must be provided, but not both.
A data table with columns interactionID (index of the
interaction), chrom1, start1, end1, chrom2,
start2, end2 (coordinates of both anchors), geneID1,
geneID2 (gene identifiers from 'transcripts' or 'tss' for both
anchors), tss1, tss2 (TSS position for both anchors),
strand1 and strand2 (TSS strand for both anchors).
convertTranscriptsToTSS,
annotateAnchors
library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) fpath <- system.file("extdata", "GM12878_loops.bedpe.gz", package = "GOaGO") pairs <- rtracklayer::import(fpath, genome = "hg19") annotateInteractions(pairs, transcripts, maxDistanceToTSS = 10e3)library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) fpath <- system.file("extdata", "GM12878_loops.bedpe.gz", package = "GOaGO") pairs <- rtracklayer::import(fpath, genome = "hg19") annotateInteractions(pairs, transcripts, maxDistanceToTSS = 10e3)
GOaGO-result object to a data frameCoerce a GOaGO-result object to a data frame
as.data.frame(x, row.names=NULL, optional=FALSE, ...)as.data.frame(x, row.names=NULL, optional=FALSE, ...)
x |
The object to coerce. |
row.names, optional, ...
|
Not used, inherited from
|
A data frame of the enriched Gene Ontology terms, with the
following columns: ONTOLOGY, ID, Description (all of
the GO term), Count (number of input gene pairs sharing the given
term), PairRatio (fraction of input gene pairs sharing the given
term), BgRatio (fraction of permuted gene pairs sharing the given
term), FoldEnrichment (quotient of the two fractions),
pvalue, p.adjust, qvalue.
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) as.data.frame(goago)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) as.data.frame(goago)
Convert a GRanges object with gene transcripts to a GRanges
object with gene TSSes of length 1 bp, with duplicate rows removed. Assumes
that gene identifiers are provided in one of the metadata columns, either as
a vector (possibly containing NA values) or a CharacterList.
The function is idempotent, i.e. can be applied multiple times without
changing the result.
convertTranscriptsToTSS( transcripts, geneid_column = c("gene_id", "GENEID", "geneID") )convertTranscriptsToTSS( transcripts, geneid_column = c("gene_id", "GENEID", "geneID") )
transcripts |
|
geneid_column |
A character vector of recognized names for the metadata
column in |
A GRanges object with gene identifiers in metadata column
geneID being a vector.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) convertTranscriptsToTSS(transcripts)library(TxDb.Hsapiens.UCSC.hg19.knownGene) # take only the transcripts of coding genes by ensuring that the coding # sequence strand is not NA transcripts <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id", filter = list(cds_strand = c("-", "+")) ) convertTranscriptsToTSS(transcripts)
By default, plots the most enriched terms, with fold enrichment on the X-axis, point size indicating the number of gene pairs sharing the given term, and point color – the adjusted p-value.
DotPlot( object, minCount = 5, x = "FoldEnrichment", color = "p.adjust", size = "Count", showCategory = 10, orderBy = "FoldEnrichment", decreasing = TRUE, font.size = 12, label_format = 50 )DotPlot( object, minCount = 5, x = "FoldEnrichment", color = "p.adjust", size = "Count", showCategory = 10, orderBy = "FoldEnrichment", decreasing = TRUE, font.size = 12, label_format = 50 )
object |
GO-a-GO results of class |
minCount |
plot only the GO terms that are associated to at least the given number of gene pairs |
x |
Variable for X-axis, one of |
color |
Variable used to color enriched terms, e.g. |
size |
Variable used to scale the sizes of points, one of
|
showCategory |
number of terms to display or a vector of terms |
orderBy |
The order of the Y-axis, one of |
decreasing |
logical. Should the |
font.size |
font size |
label_format |
a numeric value sets wrap length, alternatively a custom function to format axis labels. By default wraps names longer than 50 characters. |
A ggplot object that can be further customized using the
ggplot2 package.
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) DotPlot(goago)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) DotPlot(goago)
The dataset is based on 9,448 chromatin loops identified in human cell line GM12878 as peaks in Hi-C contact maps. Of these chromatin loops, 1,581 overlapped at least one gene Transcription Start Site (TSS) at both loop anchors. As some loop anchors overlapped multiple TSSes, possibly of different genes, the dataset contains all combinations for these loops, yielding a total of 2,339 gene pairs, of which 1,743 pairs are unique and do not contain the same gene twice.
genePairsGM12878genePairsGM12878
A data frame with 2,339 rows and 13 columns:
loop identifier
chromosome of loop anchor 1
start coordinate of loop anchor 1
end coordinate of loop anchor 1
Entrez identifier of the gene associated to loop anchor 1
TSS coordinate of the associated gene
strand ("+" or "-") of the associated gene
chromosome of loop anchor 2
start coordinate of loop anchor 2
end coordinate of loop anchor 2
Entrez identifier of the gene associated to loop anchor 2
TSS coordinate of the associated gene
strand ("+" or "-") of the associated gene.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525
Rao, S. S., Huntley, M. H., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7), 1665-80.
Given a data frame of gene pairs, this function will return the enriched Gene Ontology terms after FDR control.
GOaGO( genePairs, OrgDb, keyType = NULL, ont = "MF", minCount = 1, numPermutations = 10000, universe, pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.2, minGSSize = 10, maxGSSize = 500 )GOaGO( genePairs, OrgDb, keyType = NULL, ont = "MF", minCount = 1, numPermutations = 10000, universe, pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.2, minGSSize = 10, maxGSSize = 500 )
genePairs |
a data frame with columns |
OrgDb |
OrgDb |
keyType |
type of gene identifiers, such as "ENTREZID" or "ENSEMBL", if
it cannot be determined from metadata of |
ont |
one of "BP", "MF", and "CC" subontologies, or "ALL" for all three |
minCount |
cutoff for number of pairs that share a GO term for this term to be considered |
numPermutations |
number of permutations performed in the enrichment test |
universe |
a set of background genes. If missing, all the genes from all the gene pairs will be used as background. |
pvalueCutoff |
adjusted p-value cutoff on enrichment tests to report |
pAdjustMethod |
one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" |
qvalueCutoff |
q-value cutoff on enrichment tests to report as
significant. Tests must pass (i) |
minGSSize |
minimal size of genes annotated for testing |
maxGSSize |
maximal size of genes annotated for testing |
A GOaGO-result object.
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) show(goago)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) show(goago)
GOaGO-result objectsAccessors and show method for GOaGO-result objects
genePairs(object) keyType(object) organism(object) show(object)genePairs(object) keyType(object) organism(object) show(object)
object |
of class |
genePairs returns a data frame with the input gene pairs, with the
columns geneID1, geneID2 and pairID.
keyType returns the type of gene identifiers, such as "ENTREZID" or
"ENSEMBL".
organism returns the scientific name (i.e. genus and species, or
genus and species and subspecies) of the organism.
show displays the object, and returns an invisible NULL.
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) show(goago) genePairs(goago) keyType(goago) organism(goago)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) show(goago) genePairs(goago) keyType(goago) organism(goago)
An S4 class to represent the results of GO-a-GO enrichment analysis
resultA data frame of the enriched Gene Ontology terms, with the
following columns: ONTOLOGY, ID, Description (all of
the GO term), Count (number of input gene pairs sharing the given
term), PairRatio (fraction of input gene pairs sharing the given
term), BgRatio (fraction of permuted gene pairs sharing the given
term), FoldEnrichment (quotient of the two fractions),
pvalue, p.adjust, qvalue.
pvalueCutoffadjusted p-value cutoff on enrichment tests
pAdjustMethodp-value adjustment method
qvalueCutoffq-value cutoff on enrichment tests
minCountcutoff for number of pairs that share a GO term for this term to be considered
numPermutationsnumber of permutations performed in the enrichment test
minGSSizeminimal size of genes annotated for testing
maxGSSizemaximal size of genes annotated for testing
organismscientific name of the organism
ontologyone of "BP", "MF", and "CC" subontologies, or "ALL" for all three
keyTypetype of gene identifiers, such as "ENTREZID" or "ENSEMBL"
genePairsA data frame with the input gene pairs, with the columns
geneID1, geneID2 and pairID.
pairTermsA data frame linking the enriched Gene Ontology terms with
the input gene pairs, with the columns pairID and ID (of the
GO term).
permutedResultA data frame with the columns ID (of the GO
term) and Count, keeping the numbers of permuted gene pairs sharing
the term as obtained in every random permutation.
universea set of background genes
Ridgeplot of the sampling distributions of numbers of gene pairs sharing each enriched Gene Ontology term, obtained for the randomized gene pairs.
RidgePlot( object, minCount = 5, showCategory = 10, orderBy = "FoldEnrichment", decreasing = TRUE, font.size = 12, label_format = 50 )RidgePlot( object, minCount = 5, showCategory = 10, orderBy = "FoldEnrichment", decreasing = TRUE, font.size = 12, label_format = 50 )
object |
GO-a-GO results of class |
minCount |
plot only the GO terms that are associated to at least the given number of gene pairs |
showCategory |
number of terms to display or a vector of terms |
orderBy |
The order of the Y-axis, one of |
decreasing |
logical. Should the |
font.size |
font size |
label_format |
a numeric value sets wrap length, alternatively a custom function to format axis labels. By default wraps names longer than 50 characters. |
A ggplot object that can be further customized using the
ggplot2 package.
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) RidgePlot(goago)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) RidgePlot(goago)
Extract the enriched Gene Ontology terms along with gene pairs sharing them
termGenePairs(object, OrgDb = NULL)termGenePairs(object, OrgDb = NULL)
object |
of class |
OrgDb |
OrgDb to map gene identifiers to gene symbols |
A data frame similar to returned by as.data.frame(object),
but including all the gene pairs sharing each enriched Gene Ontology term,
one gene pair in each row. Additional columns include pairID,
geneID1, geneID2 and any other columns provided in
genePairs argument to GOaGO. If OrgDb is
provided, geneSymbol1 and geneSymbol2 will also be added.
as.data.frame,GOaGO-result-method
library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) termGenePairs(goago, OrgDb = org.Hs.eg.db)library(org.Hs.eg.db) data("genePairsGM12878") goago <- GOaGO(genePairsGM12878, keyType = "ENTREZID", OrgDb = org.Hs.eg.db) termGenePairs(goago, OrgDb = org.Hs.eg.db)
Given a data frame of gene pairs, this function will return the unique pairs of genes, removing loops (gene pairs containing the same gene twice) and duplicates. Note that gene pair (A, B) is a duplicate of (B, A).
uniqueGenePairs(genePairs)uniqueGenePairs(genePairs)
genePairs |
a data frame with columns |
A data frame with columns pairID, geneID1 and
geneID2. If loops or duplicates were removed, a warning will
alert you. If column pairID was not provided in genePairs,
an integer vector equal to seq_len(nrow(result)) will be used.