Title: | GREAT Analysis - Functional Enrichment on Genomic Regions |
---|---|
Description: | GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions. |
Authors: | Zuguang Gu [aut, cre] |
Maintainer: | Zuguang Gu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.9.0 |
Built: | 2024-12-18 04:13:15 UTC |
Source: | https://github.com/bioc/rGREAT |
Available ontology categories of the GREAT job
## S4 method for signature 'GreatJob' availableCategories(object)
## S4 method for signature 'GreatJob' availableCategories(object)
object |
A |
The values of the supported categories sometime change. You should run the function to get the real-time values. The meaning of categories returned is quite self-explained by the name.
The returned value is a vector of categories.
Zuguang gu <[email protected]>
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) availableCategories(job)
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) availableCategories(job)
All available ontology names of the GREAT job
## S4 method for signature 'GreatJob' availableOntologies(object, category = NULL)
## S4 method for signature 'GreatJob' availableOntologies(object, category = NULL)
object |
A |
category |
one or multiple categories. All available categories can be got by |
The values of the supported ontologies sometime change. You should run the function to get the real-time values. The meaning of ontology returned is quite self-explained by the name.
The returned values is a vector of ontologies.
Zuguang gu <[email protected]>
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) availableOntologies(job)
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) availableOntologies(job)
Extend TSS
extendTSS(gene, seqlengths = NULL, genome = NULL, gene_id_type = NULL, mode = "basalPlusExt", extend_from = c("TSS", "gene"), basal_upstream = 5000, basal_downstream = 1000, extension = 1000000, verbose = great_opt$verbose, .attr = list())
extendTSS(gene, seqlengths = NULL, genome = NULL, gene_id_type = NULL, mode = "basalPlusExt", extend_from = c("TSS", "gene"), basal_upstream = 5000, basal_downstream = 1000, extension = 1000000, verbose = great_opt$verbose, .attr = list())
gene |
A |
extend_from |
Should the gene be extended only from its TSS or the complete gene? |
seqlengths |
A named vector of chromosome lengths. If it is not provided, it is taken by |
genome |
UCSC genome can be set here, then |
gene_id_type |
Gene ID types in |
mode |
The mode to extend TSS. Value should be one of 'basalPlusExt', 'twoClosest' and 'oneClosest'. See "Details" section. |
basal_upstream |
In 'basalPlusExt' mode, number of base pairs extending to the upstream of TSS to form the basal domains. |
basal_downstream |
In 'basalPlusExt' mode, number of base pairs extending to the downstream of TSS to form the basal domains. |
extension |
Extensions from the basal domains. The value can also be a vector of length two which corresponds to extension to upstream and downstream respectively. |
verbose |
Whether to print messages. |
.attr |
Only used internally. |
Following are general explanations of the three modes for extending TSS:
basalPlusExt
1. TSS are extended into basal domains (e.g. by upstream 5kb, downstream 1kb); 2. basal domains are sorted by their genomic coordinates; 3. each basal domain is extended to its both sides until it reaches the next TSS's basal domain or it reaches the maximal extension (e.g. 1000kb).
twoClosest
1. TSS are sorted by their genomic coordinates; 2. each TSS is extended to its both sides until it reaches the next TSS or it reaches the maximal extension (e.g. 1000kb).
oneClosest
1. TSS are sorted by their genomic coordinates; 2. each TSS is extended to its both sides until it reaches the middle point of itself and the next TSS or it reaches the maximal extension (e.g. 1000kb).
The official explanation is at https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655443/Association+Rules .
A GRanges
object with one meta column 'gene_id'.
# There is no example NULL
# There is no example NULL
Extend TSS
extendTSSFromDataFrame(df, seqlengths, genome = NULL, strand = NULL, gene_id = NULL, gene_id_type = NULL, verbose = great_opt$verbose, ...)
extendTSSFromDataFrame(df, seqlengths, genome = NULL, strand = NULL, gene_id = NULL, gene_id_type = NULL, verbose = great_opt$verbose, ...)
df |
A bed-like data frame where the first three columns should be chromosomes, start positions, end positions. It does not matter whether regions correspond to genes or TSS. |
seqlengths |
A named vector of chromosome lengths. |
genome |
UCSC genome can be set here, then |
strand |
The strand information can be provided in |
gene_id |
The gene ID information can be provided in |
gene_id_type |
Gene ID types in |
verbose |
Whether to print messages. |
... |
All pass to |
A GRanges
object with one meta column 'gene_id'.
# There is no example NULL
# There is no example NULL
Extend TSS
extendTSSFromOrgDb(orgdb, verbose = great_opt$verbose, ...)
extendTSSFromOrgDb(orgdb, verbose = great_opt$verbose, ...)
orgdb |
Name of "org.*" packages from Bioconductor. All supported OrgDb packages are in |
verbose |
Whether to print messages. |
... |
All pass to |
A GRanges
object with one meta column 'gene_id'.
if(FALSE) { extendTSSFromOrgDb("Org.Hs.eg.db") extendTSSFromOrgDb("hg19") }
if(FALSE) { extendTSSFromOrgDb("Org.Hs.eg.db") extendTSSFromOrgDb("hg19") }
Extend TSS
extendTSSFromTxDb(txdb, verbose = great_opt$verbose, ...)
extendTSSFromTxDb(txdb, verbose = great_opt$verbose, ...)
txdb |
Name of "TxDb.*" packages from Bioconductor. All supported TxDb packages are in |
verbose |
Whether to print messages. |
... |
All pass to |
A GRanges
object with one meta column 'gene_id'.
if(FALSE) { extendTSSFromTxDb("TxDb.Hsapiens.UCSC.hg19.knownGene") extendTSSFromTxDb("hg19") }
if(FALSE) { extendTSSFromTxDb("TxDb.Hsapiens.UCSC.hg19.knownGene") extendTSSFromTxDb("hg19") }
Method dispatch page for getEnrichmentTable
.
getEnrichmentTable
can be dispatched on following classes:
getEnrichmentTable,GreatJob-method
, GreatJob-class
class method
getEnrichmentTable,GreatObject-method
, GreatObject-class
class method
# no example NULL
# no example NULL
Get a single enrichment table from GREAT web server
## S4 method for signature 'GreatJob' getEnrichmentTable(object, ontology, ...)
## S4 method for signature 'GreatJob' getEnrichmentTable(object, ontology, ...)
object |
A |
ontology |
A single ontology names. Valid values are in |
... |
All pass to |
A data frame of the enrichment results for a single ontology.
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) tb = getEnrichmentTable(job, ontology = "GO Molecular Function") head(tb)
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) tb = getEnrichmentTable(job, ontology = "GO Molecular Function") head(tb)
Get enrichment table
## S4 method for signature 'GreatObject' getEnrichmentTable(object, min_region_hits = 5)
## S4 method for signature 'GreatObject' getEnrichmentTable(object, min_region_hits = 5)
object |
A |
min_region_hits |
Minimal number of input regions overlapping to the geneset associated regions. |
Note: adjusted p-values are re-calculated based on min_region_hits
.
A data frame of enrichment results
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) getEnrichmentTable(obj)
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) getEnrichmentTable(obj)
Method dispatch page for getEnrichmentTables
.
getEnrichmentTables
can be dispatched on following classes:
getEnrichmentTables,GreatObject-method
, GreatObject-class
class method
getEnrichmentTables,GreatJob-method
, GreatJob-class
class method
# no example NULL
# no example NULL
Get enrichment tables from GREAT web server
## S4 method for signature 'GreatJob' getEnrichmentTables(object, ontology = NULL, category = "GO", request_interval = 10, max_tries = 100, download_by = c("json", "tsv"), verbose = TRUE)
## S4 method for signature 'GreatJob' getEnrichmentTables(object, ontology = NULL, category = "GO", request_interval = 10, max_tries = 100, download_by = c("json", "tsv"), verbose = TRUE)
object |
A |
ontology |
Ontology names. Valid values are in |
category |
Pre-defined ontology categories. One category can contain more than one ontologies. Valid values are in |
request_interval |
Time interval for two requests. Default is 300 seconds. |
max_tries |
Maximal times for automatically reconnecting GREAT web server. |
download_by |
Internally used. The complete enrichment table is provided as json data on the website, but there is no information of gene-region association. By setting |
verbose |
Whether to print messages. |
The structure of the data frames are same as the tables available on GREAT website.
availableOntologies
, availableCategories
Zuguang gu <[email protected]>
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) tbl = getEnrichmentTables(job) names(tbl) head(tbl[[1]]) job tbl = getEnrichmentTables(job, ontology = "GO Molecular Function") tbl = getEnrichmentTables(job, category = "GO")
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) tbl = getEnrichmentTables(job) names(tbl) head(tbl[[1]]) job tbl = getEnrichmentTables(job, ontology = "GO Molecular Function") tbl = getEnrichmentTables(job, category = "GO")
Get enrichment table
## S4 method for signature 'GreatObject' getEnrichmentTables(object, ...)
## S4 method for signature 'GreatObject' getEnrichmentTables(object, ...)
object |
A |
... |
All passed to |
Please use getEnrichmentTable,GreatObject-method
directly.
A data frame of enrichment results
# There is no example NULL
# There is no example NULL
Get gap regions from UCSC
getGapFromUCSC(genome, seqnames = NULL)
getGapFromUCSC(genome, seqnames = NULL)
genome |
UCSC genome, such as "hg19". |
seqnames |
A vector of chromosome names. |
A GRanges
object.
getGapFromUCSC("hg19")
getGapFromUCSC("hg19")
Get GO gene sets from BioMart
getGeneSetsFromBioMart(dataset, ontology = "bp")
getGeneSetsFromBioMart(dataset, ontology = "bp")
dataset |
Name of the dataset. |
ontology |
Value should be bp, mf or cc. |
GO gene sets are from BioMartGOGeneSets::getBioMartGOGeneSets
.
A list of vectors where each vector contains Ensembl IDs annotated to a GO term.
# There is no example NULL
# There is no example NULL
Get GO gene sets from OrgDb object
getGeneSetsFromOrgDb(orgdb, ontology = "BP")
getGeneSetsFromOrgDb(orgdb, ontology = "BP")
orgdb |
An |
ontology |
Value should be bp, mf or cc. |
A list of vectors where each vector contains Entrez IDs annotated to a GO term.
# There is no example NULL
# There is no example NULL
Get Gencode genes
getGenesFromGencode(version)
getGenesFromGencode(version)
version |
Gencode version, e.g. v19 for human, vM21 for mouse. |
Only the protein coding genes.
A GRanges
object.
# There is no example NULL
# There is no example NULL
Get genome data from NCBI
getGenomeDataFromNCBI(refseq_assembly_accession, return_granges = FALSE)
getGenomeDataFromNCBI(refseq_assembly_accession, return_granges = FALSE)
refseq_assembly_accession |
The RefSeq accession number for the assembly, such as "GCF_000001405.40" for human. |
return_granges |
If the assembly is already on chromosome level, it will directly construct a GRanges object where "chromosomes" are only used and chromosome lengths are corrected fitted in its |
Only protein coding genes are used.
If return_granges
is set to FALSE
, it returns a list of two data frames:
A data frame of several columns.
A data frame for genes. The first column contains the RefSeq accession numbers of the corresponding contigs. If the genome is assembled on the chromosome level, the first column corresponds to chromosomes. The contig names can be converted to other names with the information in the genome
data frame.
if(FALSE) { getGenomeDataFromNCBI("GCF_000001405.40", return_granges = TRUE) getGenomeDataFromNCBI("GCF_000001405.40") }
if(FALSE) { getGenomeDataFromNCBI("GCF_000001405.40", return_granges = TRUE) getGenomeDataFromNCBI("GCF_000001405.40") }
Get built-in TSS from GREAT
getGREATDefaultTSS(genome)
getGREATDefaultTSS(genome)
genome |
Only support "hg19", "hg38", "mm10", "mm9". Files are downloaded from https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655445/Genes . |
A GRanges
object.
# There is no example NULL
# There is no example NULL
Get the corresponding assembly id for a kegg organism
getKEGGGenome(organism)
getKEGGGenome(organism)
organism |
The organism code on KEGG. |
The Refseq access ID for the genome.
# There is no example NULL
# There is no example NULL
Get KEGG pathway gene sets
getKEGGPathways(organism, as_table = FALSE)
getKEGGPathways(organism, as_table = FALSE)
organism |
The organism code on KEGG. |
as_table |
Whether to return the gene sets as a two-column table. |
A list of a data frame, depends on the value of as_table
.
# There is no example NULL
# There is no example NULL
Get RefSeq genes from UCSC
getRefSeqGenesFromUCSC(genome, subset = c("RefSeqSelect", "RefSeqCurated"))
getRefSeqGenesFromUCSC(genome, subset = c("RefSeqSelect", "RefSeqCurated"))
genome |
UCSC genome, such as "hg19". |
subset |
Subset of RefSeq genes. See https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite . |
A GenomicRanges
object.
# There is no example NULL
# There is no example NULL
Method dispatch page for getRegionGeneAssociations
.
getRegionGeneAssociations
can be dispatched on following classes:
getRegionGeneAssociations,GreatObject-method
, GreatObject-class
class method
getRegionGeneAssociations,GreatJob-method
, GreatJob-class
class method
# no example NULL
# no example NULL
Get region-gene associations
## S4 method for signature 'GreatJob' getRegionGeneAssociations(object, ontology = NULL, term_id = NULL, request_interval = 10, max_tries = 100, verbose = great_opt$verbose)
## S4 method for signature 'GreatJob' getRegionGeneAssociations(object, ontology = NULL, term_id = NULL, request_interval = 10, max_tries = 100, verbose = great_opt$verbose)
object |
A |
ontology |
ontology name |
term_id |
Term id in the selected ontology. |
request_interval |
Time interval for two requests. Default is 300 seconds. |
max_tries |
Maximal times for automatically reconnecting GREAT web server. |
verbose |
Whether to show messages. |
A GRanges
object. Please the two meta columns are in formats of CharacterList
and IntegerList
because a region may associate to multiple genes.
Please note, the distance is from the middle points of input regions to TSS.
Zuguang gu <[email protected]>
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) gr = getRegionGeneAssociations(job) gr
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) gr = getRegionGeneAssociations(job) gr
Get region-gene associations
## S4 method for signature 'GreatObject' getRegionGeneAssociations(object, term_id = NULL, by_middle_points = FALSE, use_symbols = TRUE)
## S4 method for signature 'GreatObject' getRegionGeneAssociations(object, term_id = NULL, by_middle_points = FALSE, use_symbols = TRUE)
object |
A |
term_id |
Term ID. |
by_middle_points |
Whether the distances are calculated from the middle points of input regions? |
use_symbols |
Whether to use gene symbols |
A GRanges
object. Please the two meta columns are in formats of CharacterList
and IntegerList
because a region may associate to multiple genes.
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) getRegionGeneAssociations(obj)
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) getRegionGeneAssociations(obj)
Get the internally used TSS
getTSS(tss_source, biomart_dataset = NULL)
getTSS(tss_source, biomart_dataset = NULL)
tss_source |
The same format as in |
biomart_dataset |
The same format as in |
A GRanges
object.
# There is no example NULL
# There is no example NULL
Perform GREAT analysis
great(gr, gene_sets, tss_source, biomart_dataset = NULL, min_gene_set_size = 5, mode = "basalPlusExt", extend_from = c("TSS", "gene"), basal_upstream = 5000, basal_downstream = 1000, extension = 1000000, extended_tss = NULL, background = NULL, exclude = "gap", cores = 1, verbose = great_opt$verbose)
great(gr, gene_sets, tss_source, biomart_dataset = NULL, min_gene_set_size = 5, mode = "basalPlusExt", extend_from = c("TSS", "gene"), basal_upstream = 5000, basal_downstream = 1000, extension = 1000000, extended_tss = NULL, background = NULL, exclude = "gap", cores = 1, verbose = great_opt$verbose)
gr |
A |
gene_sets |
A single string of defautly supported gene sets collections (see the full list in "Genesets" section), or a named list of vectors where each vector correspond to a gene set. |
tss_source |
Source of TSS. See "TSS" section. |
biomart_dataset |
The value should be in |
min_gene_set_size |
Minimal size of gene sets. |
mode |
The mode to extend genes. Value should be one of 'basalPlusExt', 'twoClosest' and 'oneClosest'. See |
extend_from |
Should the gene be extended only from its TSS or the complete gene? |
basal_upstream |
In 'basalPlusExt' mode, number of base pairs extending to the upstream of TSS to form the basal domains. |
basal_downstream |
In 'basalPlusExt' mode, number of base pairs extending to the downstream of TSS to form the basal domains. |
extension |
Extensions from the basal domains. |
extended_tss |
If your organism is not defaultly supported, you can first prepare one by |
background |
Background regions. The value can also be a vector of chromosome names. |
exclude |
Regions that are excluded from analysis such as gap regions (which can be get by |
cores |
Number of cores to use. |
verbose |
Whether to print messages. |
When background
or exclude
is set, the analysis is restricted in the background regions, still by using Binomial method. Note
this is different from the original GREAT method which uses Fisher's exact test if background regions is set. See submitGreatJob
for explanations.
By default, gap regions are excluded from the analysis.
A GreatObject-class
object. The following methods can be applied on it:
getEnrichmentTable,GreatObject-method
to retrieve the result table.
getRegionGeneAssociations,GreatObject-method
to get the associations between input regions and genes.
plotRegionGeneAssociations,GreatObject-method
to plot the associations bewteen input regions and genes.
shinyReport,GreatObject-method
to view the results by a shiny application.
rGREAT supports TSS from many organisms. The value of tss_source
should be encoded in a special format:
Name of TxDb.*
packages. Supported packages are in rGREAT:::BIOC_ANNO_PKGS$txdb
.
Genome version of the organism, e.g. "hg19". Then the corresponding TxDb will be used.
In a format of RefSeqCurated:$genome
where $genome
is the genome version of an organism, such as hg19. RefSeqCurated subset will be used.
In a format of RefSeqSelect:$genome
where $genome
is the genome version of an organism, such as hg19. RefSeqSelect subset will be used.
In a format of Gencode_v$version
where $version
is gencode version, such as 19 (for human) or M21 for mouse. Gencode protein coding genes will be used.
In a format of GREAT:$genome
, where $genome
can only be mm9, mm10, hg19, hg38. The TSS from GREAT will be used.
rGREAT supports the following built-in GO gene sets for all organisms (note "GO:" can be omitted):
Biological Process, from GO.db package.
Cellular Component, from GO.db package.
Molecular Function, from GO.db pacakge.
rGREAT also supports built-in gene sets collections from MSigDB (note this is only for human, "msigdb:" can be omitted):
Hallmark gene sets.
Positional gene sets.
Curated gene sets.
C2 subcategory: chemical and genetic perturbations gene sets.
C2 subcategory: canonical pathways gene sets.
C2 subcategory: BioCarta subset of CP.
C2 subcategory: KEGG subset of CP.
C2 subcategory: PID subset of CP.
C2 subcategory: REACTOME subset of CP.
C2 subcategory: WIKIPATHWAYS subset of CP.
Regulatory target gene sets.
miRDB of microRNA targets gene sets.
MIR_Legacy of MIRDB.
GTRD transcription factor targets gene sets.
TFT_Legacy.
Computational gene sets.
C4 subcategory: cancer gene neighborhoods gene sets.
C4 subcategory: cancer modules gene sets.
Ontology gene sets.
C5 subcategory: BP subset.
C5 subcategory: CC subset.
C5 subcategory: MF subset.
C5 subcategory: human phenotype ontology gene sets.
Oncogenic signature gene sets.
Immunologic signature gene sets.
ImmuneSigDB subset of C7.
C7 subcategory: vaccine response gene sets.
Cell type signature gene sets.
If the defaultly supported TxDb is used, Entrez gene ID is always used as the main gene ID. If you provide a self-defined
gene_sets
or extended_tss
, you need to make sure they two have the same gene ID types.
rGREAT supports a large number of organisms of which the information is retrieved from Ensembl BioMart. The name of a BioMart dataset
can be assigned to argument biomart_dataset
. All supported organisms can be found with BioMartGOGeneSets::supportedOrganisms
.
if(FALSE) { gr = randomRegions(genome = "hg19") res = great(gr, "MSigDB:H", "txdb:hg19") res = great(gr, "MSigDB:H", "TxDb.Hsapiens.UCSC.hg19.knownGene") res = great(gr, "MSigDB:H", "RefSeq:hg19") res = great(gr, "MSigDB:H", "GREAT:hg19") res = great(gr, "MSigDB:H", "Gencode_v19") res = great(gr, "GO:BP", "hsapiens_gene_ensembl") }
if(FALSE) { gr = randomRegions(genome = "hg19") res = great(gr, "MSigDB:H", "txdb:hg19") res = great(gr, "MSigDB:H", "TxDb.Hsapiens.UCSC.hg19.knownGene") res = great(gr, "MSigDB:H", "RefSeq:hg19") res = great(gr, "MSigDB:H", "GREAT:hg19") res = great(gr, "MSigDB:H", "Gencode_v19") res = great(gr, "GO:BP", "hsapiens_gene_ensembl") }
Global parameters for rGREAT
great_opt(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE, ADD = FALSE)
great_opt(..., RESET = FALSE, READ.ONLY = NULL, LOCAL = FALSE, ADD = FALSE)
... |
Arguments for the parameters, see "details" section |
RESET |
Reset to default values. |
READ.ONLY |
Please ignore. |
LOCAL |
Pllease ignore. |
ADD |
Please ignore. |
There are following parameters:
verbose
Whether to show messages.
great_opt
great_opt
Constructor method for GreatJob class
GreatJob(...)
GreatJob(...)
... |
arguments. |
There is no public constructor method for the GreatJob-class
.
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
# There is no example NULL
Class to store and retrieve GREAT results
After submitting request to GREAT server, the generated results will be
available on GREAT server for some time. The GreatJob-class
is defined
to store parameters that user has set and result tables what were retrieved
from GREAT server.
Users don't need to construct by hand, submitGreatJob
is used to generate a GreatJob-class
instance.
After submitting request to GREAT server, users can perform following steps:
getEnrichmentTables,GreatJob-method
to get enrichment tables for selected ontologies catalogues.
plotRegionGeneAssociations,GreatJob-method
to plot associations between regions and genes
getRegionGeneAssociations,GreatJob-method
to get a GRanges
object which contains associations bewteen regions and genes.
shinyReport,GreatJob-method
to view the results by a shiny application.
Zuguang gu <[email protected]>
# There is no example NULL
# There is no example NULL
Constructor method for GreatObject class
GreatObject(...)
GreatObject(...)
... |
arguments. |
There are following methods that can be applied on GreatObject-class
object:
getEnrichmentTable,GreatObject-method
to retrieve the result table.
getRegionGeneAssociations,GreatObject-method
to get the associations between input regions and genes.
plotRegionGeneAssociations,GreatObject-method
to plot the associations bewteen input regions and genes.
shinyReport,GreatObject-method
to view the results by a shiny application.
No value is returned.
Zuguang Gu <[email protected]>
# There is no example NULL
# There is no example NULL
Class for local GREAT analysis
great
returns A GreatObject-class
object.
# There is no example NULL
# There is no example NULL
Plot region-gene associations
## S4 method for signature 'GreatJob' plotRegionGeneAssociationGraphs(object, ...)
## S4 method for signature 'GreatJob' plotRegionGeneAssociationGraphs(object, ...)
object |
A |
... |
All passed to |
This function will be removed in the future, please use plotRegionGeneAssociations,GreatJob-method
instead.
# There is no example NULL
# There is no example NULL
Method dispatch page for plotRegionGeneAssociations
.
plotRegionGeneAssociations
can be dispatched on following classes:
plotRegionGeneAssociations,GreatObject-method
, GreatObject-class
class method
plotRegionGeneAssociations,GreatJob-method
, GreatJob-class
class method
# no example NULL
# no example NULL
Plot region-gene associations
## S4 method for signature 'GreatJob' plotRegionGeneAssociations(object, ontology = NULL, term_id = NULL, which_plot = 1:3, request_interval = 10, max_tries = 100, verbose = great_opt$verbose)
## S4 method for signature 'GreatJob' plotRegionGeneAssociations(object, ontology = NULL, term_id = NULL, which_plot = 1:3, request_interval = 10, max_tries = 100, verbose = great_opt$verbose)
object |
A |
ontology |
A single ontology names. Valid values are in |
term_id |
Term id in the selected ontology |
which_plot |
Which plots to draw? The value should be in |
request_interval |
Time interval for two requests. Default is 300 seconds. |
max_tries |
Maximal times for automatically reconnecting GREAT web server. |
verbose |
Whether to show messages. |
There are following figures:
Association between regions and genes (which_plot = 1
).
Distribution of distance to TSS (which_plot = 2
).
Distribution of absolute distance to TSS (which_plot = 3
).
If ontology
and term_id
are set, only regions and genes corresponding to
selected ontology term will be used. Valid value for ontology
is in
availableOntologies
and valid value for term_id
is from 'id' column
in the table which is returned by getEnrichmentTables
.
Zuguang gu <[email protected]>
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) plotRegionGeneAssociations(job) plotRegionGeneAssociations(job, which_plot = 1) # Do not use other term_id for this example, or you need to generate a new `job` object. plotRegionGeneAssociations(job, ontology = "GO Molecular Function", term_id = "GO:0004984")
job = readRDS(system.file("extdata", "GreatJob.rds", package = "rGREAT")) plotRegionGeneAssociations(job) plotRegionGeneAssociations(job, which_plot = 1) # Do not use other term_id for this example, or you need to generate a new `job` object. plotRegionGeneAssociations(job, ontology = "GO Molecular Function", term_id = "GO:0004984")
Plot region-gene associations
## S4 method for signature 'GreatObject' plotRegionGeneAssociations(object, term_id = NULL, which_plot = 1:3)
## S4 method for signature 'GreatObject' plotRegionGeneAssociations(object, term_id = NULL, which_plot = 1:3)
object |
A |
term_id |
Term ID. |
which_plot |
Which plots to draw? The value should be in |
There are following figures:
Association between regions and genes (which_plot = 1
).
Distribution of distance to TSS (which_plot = 2
).
Distribution of absolute distance to TSS (which_plot = 3
).
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) plotRegionGeneAssociations(obj)
obj = readRDS(system.file("extdata", "GreatObject.rds", package = "rGREAT")) plotRegionGeneAssociations(obj)
Method dispatch page for plotVolcano
.
plotVolcano
can be dispatched on following classes:
plotVolcano,GreatObject-method
, GreatObject-class
class method
plotVolcano,GreatJob-method
, GreatJob-class
class method
# no example NULL
# no example NULL
Make volcano plot
## S4 method for signature 'GreatJob' plotVolcano(object, ontology, min_region_hits = 5, x_values = c("fold_enrichment", "z-score"), y_values = c("p_value", "p_adjust"), main = NULL)
## S4 method for signature 'GreatJob' plotVolcano(object, ontology, min_region_hits = 5, x_values = c("fold_enrichment", "z-score"), y_values = c("p_value", "p_adjust"), main = NULL)
object |
A |
ontology |
A single ontology names. Valid values are in |
min_region_hits |
Minimal number of input regions overlapping to the geneset associated regions. |
x_values |
Which values for the x-axis. |
y_values |
Which values for the y-axis. |
main |
Title of the plot. |
Since the enrichment is an over-representation test, it is only the half volcano.
# There is no example NULL
# There is no example NULL
Make volcano plot
## S4 method for signature 'GreatObject' plotVolcano(object, min_region_hits = 5, x_values = c("fold_enrichment", "z-score"), y_values = c("p_value", "p_adjust"), main = NULL)
## S4 method for signature 'GreatObject' plotVolcano(object, min_region_hits = 5, x_values = c("fold_enrichment", "z-score"), y_values = c("p_value", "p_adjust"), main = NULL)
object |
A |
min_region_hits |
Minimal number of input regions overlapping to the geneset associated regions. |
x_values |
Which values for the x-axis. |
y_values |
Which values for the y-axis. |
main |
Title of the plot. |
Since the enrichment is an over-representation test, it is only the half volcano.
# There is no example NULL
# There is no example NULL
Generate random regions
randomRegions(genome = NULL, nr = 1000, seqlengths = NULL, width_fun = function(n) runif(n, min = 1000, max = 10000))
randomRegions(genome = NULL, nr = 1000, seqlengths = NULL, width_fun = function(n) runif(n, min = 1000, max = 10000))
genome |
UCSC genome version, e.g. "hg19". |
nr |
Number of regions. |
seqlengths |
Alternatively, you can also specify a named vector of seqlengths (chromosome lengths). |
width_fun |
A function which defines the distribution of region widths. |
The number of regions per chromosome is proportional to the chromsome length.
gr = randomRegions(genome = "hg19") quantile(width(gr))
gr = randomRegions(genome = "hg19") quantile(width(gr))
Generate random regions from a BioMart genome
randomRegionsFromBioMartGenome(biomart_dataset, nr = 1000, ...)
randomRegionsFromBioMartGenome(biomart_dataset, nr = 1000, ...)
biomart_dataset |
A BioMart dataset. Values should be in |
nr |
Number of regions. |
... |
Pass to |
The number of regions per chromosome is proportional to the chromsome length.
if(FALSE) { # Giant panda gr = randomRegionsFromBioMartGenome("amelanoleuca_gene_ensembl") }
if(FALSE) { # Giant panda gr = randomRegionsFromBioMartGenome("amelanoleuca_gene_ensembl") }
Read gmt gene sets file
read_gmt(x, from = NULL, to = NULL, orgdb = NULL)
read_gmt(x, from = NULL, to = NULL, orgdb = NULL)
x |
The file name of a .gmt file. |
from |
Gene ID type in the original gmt file. Value can only take values in 'ENTREZ/SYMBOL/ENSEMBL/REFSEQ'. |
to |
Gene ID type that you want to convert to. Value can only take values in 'ENTREZ/SYMBOL/ENSEMBL/REFSEQ'. |
orgdb |
The name of an OrgDb database. |
A named list of vectors.
read_gmt(url("http://dsigdb.tanlab.org/Downloads/D2_LINCS.gmt"))
read_gmt(url("http://dsigdb.tanlab.org/Downloads/D2_LINCS.gmt"))
Reduce by start and end
reduce_by_start_and_end(s, e)
reduce_by_start_and_end(s, e)
s |
Start positions. Sorted. |
e |
End positions. Sorted. |
Only internally used.
Sum of total widths of the reduced regions.
# There is no example NULL
# There is no example NULL
Method dispatch page for shinyReport
.
shinyReport
can be dispatched on following classes:
shinyReport,GreatJob-method
, GreatJob-class
class method
shinyReport,GreatObject-method
, GreatObject-class
class method
# no example NULL
# no example NULL
Shiny app on the GreatJob object
## S4 method for signature 'GreatJob' shinyReport(object)
## S4 method for signature 'GreatJob' shinyReport(object)
object |
The |
A shiny app object.
if(FALSE) { # pseudo code job = submitGreatJob(...) shinyReport(job) }
if(FALSE) { # pseudo code job = submitGreatJob(...) shinyReport(job) }
Shiny app on the GreatObject object
## S4 method for signature 'GreatObject' shinyReport(object)
## S4 method for signature 'GreatObject' shinyReport(object)
object |
The |
A shiny app object.
if(FALSE) { # pseudo code obj = great(...) shinyReport(obj) }
if(FALSE) { # pseudo code obj = great(...) shinyReport(obj) }
Perform online GREAT analysis
submitGreatJob(gr, bg = NULL, gr_is_zero_based = FALSE, species = "hg19", genome = species, includeCuratedRegDoms = TRUE, rule = c("basalPlusExt", "twoClosest", "oneClosest"), adv_upstream = 5.0, adv_downstream = 1.0, adv_span = 1000.0, adv_twoDistance = 1000.0, adv_oneDistance = 1000.0, request_interval = 60, max_tries = 10, version = DEFAULT_VERSION, base_url = "http://great.stanford.edu/public/cgi-bin", use_name_column = FALSE, verbose = help, help = great_opt$verbose)
submitGreatJob(gr, bg = NULL, gr_is_zero_based = FALSE, species = "hg19", genome = species, includeCuratedRegDoms = TRUE, rule = c("basalPlusExt", "twoClosest", "oneClosest"), adv_upstream = 5.0, adv_downstream = 1.0, adv_span = 1000.0, adv_twoDistance = 1000.0, adv_oneDistance = 1000.0, request_interval = 60, max_tries = 10, version = DEFAULT_VERSION, base_url = "http://great.stanford.edu/public/cgi-bin", use_name_column = FALSE, verbose = help, help = great_opt$verbose)
gr |
A |
bg |
Not supported any more. See explanations in section "When_background_regions_are_set". |
gr_is_zero_based |
Are start positions in |
genome |
Genome. "hg38", "hg19", "mm10", "mm9" are supported in GREAT version 4.x.x, "hg19", "mm10", "mm9", "danRer7" are supported in GREAT version 3.x.x and "hg19", "hg18", "mm9", "danRer7" are supported in GREAT version 2.x.x. |
species |
The same as |
includeCuratedRegDoms |
Whether to include curated regulatory domains, see https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655443/Association+Rules#AssociationRules-CuratedRegulatoryDomains . |
rule |
How to associate genomic regions to genes. See 'Details' section. |
adv_upstream |
Unit: kb, only used when rule is |
adv_downstream |
Unit: kb, only used when rule is |
adv_span |
Unit: kb, only used when rule is |
adv_twoDistance |
Unit: kb, only used when rule is |
adv_oneDistance |
Unit: kb, only used when rule is |
request_interval |
Time interval for two requests. Default is 300 seconds. |
max_tries |
Maximal times for aotumatically reconnecting GREAT web server. |
version |
Version of GREAT. The value should be "4.0.4", "3.0.0", "2.0.2". Shorten version numbers can also be used, such as using "4" or "4.0" is same as "4.0.4". |
base_url |
the url of |
use_name_column |
If the input is a data frame, whether to use the fourth column as the "names" of regions? |
verbose |
Whether to print help messages. |
help |
Whether to print help messages. This argument will be replaced by |
Note: On Aug 19 2019 GREAT released version 4(https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655442/Version+History ) where it supports hg38
genome and removes some ontologies such pathways. submitGreatJob
still
takes hg19
as default. hg38
can be specified by the genome = "hg38"
argument.
To use the older versions such as 3.0.0, specify as submitGreatJob(..., version = "3.0.0")
.
Note it does not use the standard GREAT API. This function directly send data to GREAT web server by HTTP POST.
Following text is copied from GREAT web site ( http://great.stanford.edu/public/html/ )
Explanation of rule
and settings with names started with 'adv_' (advanced settings):
Mode 'Basal plus extension'. Gene regulatory domain definition: Each gene is assigned a basal regulatory domain of a minimum distance upstream and downstream of the TSS (regardless of other nearby genes, controlled by adv_upstream
and adv_downstream
argument). The gene regulatory domain is extended in both directions to the nearest gene's basal domain but no more than the maximum extension in one direction (controlled by adv_span
).
Mode 'Two nearest genes'. Gene regulatory domain definition: Each gene is assigned a regulatory domain that extends in both directions to the nearest gene's TSS (controlled by adv_twoDistance
) but no more than the maximum extension in one direction.
Mode 'Single nearest gene'. Gene regulatory domain definition: Each gene is assigned a regulatory domain that extends in both directions to the midpoint between the gene's TSS and the nearest gene's TSS (controlled by adv_oneDistance
) but no more than the maximum extension in one direction.
A GreatJob-class
object which can be used to get results from GREAT server. The following methods can be applied on it:
getEnrichmentTables,GreatObject-method
to retreive the result tables.
getRegionGeneAssociations,GreatObject-method
to get the associations between input regions and genes.
plotRegionGeneAssociations,GreatObject-method
to plot the associations bewteen input regions and genes.
shinyReport,GreatObject-method
to view the results by a shiny application.
Note when bg
argument is set to a list of background regions, GREAT uses a completely different test!
When bg
is set, gr
should be exactly subset of bg
. For example, let's say a background region list contains
five regions: [1, 10], [15, 23], [34, 38], [40, 49], [54, 63]
, gr
can only be a subset of the five regions, which
means gr
can take [15, 23], [40, 49]
, but it cannot take [16, 20], [39, 51]
. In this setting, regions are taken
as single units and Fisher's exact test is applied for calculating the enrichment (by testing number of regions in the 2x2 contigency table).
Check https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655452/File+Formats#FileFormats-Whatshouldmybackgroundregionsfilecontain? for more explanations.
Please note from rGREAT 1.99.0, setting bg
is not supported any more and this argument will be removed in the future. You can either directly use GREAT website or use other Bioconductor packages such as "LOLA" to perform
the Fisher's exact test-based analysis.
If you want to restrict the input regions to background regions (by intersections) and still to apply Binomial test there, please
consider to use local GREAT by great
.
Zuguang gu <[email protected]>
great
for the local implementation of GREAT algorithm.
set.seed(123) gr = randomRegions(nr = 1000, genome = "hg19") job = submitGreatJob(gr) job # more parameters can be set for the job if(FALSE) { # suppress running it when building the package # current GREAT version is 4.0.4 job = submitGreatJob(gr, genome = "hg19") job = submitGreatJob(gr, adv_upstream = 10, adv_downstream = 2, adv_span = 2000) job = submitGreatJob(gr, rule = "twoClosest", adv_twoDistance = 2000) job = submitGreatJob(gr, rule = "oneClosest", adv_oneDistance = 2000) }
set.seed(123) gr = randomRegions(nr = 1000, genome = "hg19") job = submitGreatJob(gr) job # more parameters can be set for the job if(FALSE) { # suppress running it when building the package # current GREAT version is 4.0.4 job = submitGreatJob(gr, genome = "hg19") job = submitGreatJob(gr, adv_upstream = 10, adv_downstream = 2, adv_span = 2000) job = submitGreatJob(gr, rule = "twoClosest", adv_twoDistance = 2000) job = submitGreatJob(gr, rule = "oneClosest", adv_oneDistance = 2000) }