Title: | Calculate, visualize and analyse overlap between genomic regions |
---|---|
Description: | OGRE calculates overlap between user defined genomic region datasets. Any regions can be supplied i.e. genes, SNPs, or reads from sequencing experiments. Key numbers help analyse the extend of overlaps which can also be visualized at a genomic level. |
Authors: | Sven Berres [aut, cre], Jörg Gromoll [ctb], Marius Wöste [ctb], Sarah Sandmann [ctb], Sandra Laurentino [ctb] |
Maintainer: | Sven Berres <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.0 |
Built: | 2024-11-06 06:37:33 UTC |
Source: | https://github.com/bioc/OGRE |
OGRE calculates overlap between user defined annotated genomic region datasets. Any regions can be supplied such as public annotations (genes), genetic variation (SNPs, mutations), regulatory elements (TFBS, promoters, CpG islands) and basically all types of NGS output from sequencing experiments. After overlap calculation, key numbers help analyse the extend of overlaps which can also be visualized at a genomic level.
The main functions are:
OGREDataSetFromDir
- build an OGRE dataset from a user
defined directory with GRanges annotation files.
loadAnnotations
- Load dataset files containing genomic
regions annotation information from hard drive
OGREDataSet
- build an empty OGRE dataset to flexibly add
datasets from other sources like AnnotationHub or custom GRanges objects.
addDataSetFromHub
- adds datasets from AnnotationHub
addGRanges
- adds user defined GenomicRanges datasets
fOverlaps
- Finds all overlaps between query and subject
datasetssumPlot
- calculates key numbers, tables and plotsgvizPlot
- generates a genomic plot around query elements with
overlapping subject hits.
For additional information, see the package vignette, by typing
vignette("OGRE")
. Software-related questions or issues can be posted
to the Bioconductor Support Site:
https://support.bioconductor.org
or on github:
https://https://github.com/svenbioinf/OGRE
Sven Berres, Jörg Gromoll, Marius Wöste, Sarah Sandmann, Sandra Laurentino
AnnotationHub offers a wide range of annotated datasets which can be manually
aquired but need some parsing to work with OGRE as detailed in vignette
section "Load datasets from AnnotationHub".
For convienence addDataSetFromHub()
adds one of the predefined human
dataSets of listPredefinedDataSets()
to a OGREDataSet.Those are taken from
AnnotationHub and are ready to use for OGRE. Additional information on
dataSets can be found here listPredefinedDataSets
.
addDataSetFromHub(OGREDataSet, dataSet, type)
addDataSetFromHub(OGREDataSet, dataSet, type)
OGREDataSet |
OGREDataSet |
dataSet |
|
type |
Type of dataSet, must be either query or subject. If query the dataSet will be added as query and at the first position of OGREDataSet. |
OGREDataSet.
myOGRE <- OGREDataSet() myOGRE <- addDataSetFromHub(myOGRE,"protCodingGenes","query")
myOGRE <- OGREDataSet() myOGRE <- addDataSetFromHub(myOGRE,"protCodingGenes","query")
Add a GenomicRanges dataset to OGREDataSet
addGRanges(OGREDataSet, dataSet, type, label = NULL)
addGRanges(OGREDataSet, dataSet, type, label = NULL)
OGREDataSet |
An OGREDataSet |
dataSet |
A GRanges object. Each region needs chromosome, start, end and
strand information. A unique ID and a name column must be present in the
|
type |
Type of dataSet, must be either query or subject. If query the dataSet will be added as query and at the first position of OGREDataSet. |
label |
A |
OGREDataSet.
myOGRE <- OGREDataSet() myGRanges <- makeExampleGRanges() myOGRE <- addGRanges(myOGRE,myGRanges,"query")
myOGRE <- OGREDataSet() myGRanges <- makeExampleGRanges() myOGRE <- addGRanges(myOGRE,myGRanges,"query")
Generates coverage plots of all subject datasets and stores them as a list,
that can be accessed by metadata(OGREDataSet)$covPlot
covPlot( OGREDataSet, datasets = names(OGREDataSet)[seq(2, length(OGREDataSet))], nbin = 100 )
covPlot( OGREDataSet, datasets = names(OGREDataSet)[seq(2, length(OGREDataSet))], nbin = 100 )
OGREDataSet |
An OGREDataSet |
datasets |
|
nbin |
Number of bins |
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- covPlot(myOGRE) metadata(myOGRE)$covPlot
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- covPlot(myOGRE) metadata(myOGRE)$covPlot
Extend(shrink) ranges of a GRanges object.
extendGRanges(OGREDataSet, name, upstream = 0, downstream = 0)
extendGRanges(OGREDataSet, name, upstream = 0, downstream = 0)
OGREDataSet |
An OGREDataSet |
name |
|
upstream |
|
downstream |
|
OGREDataSet
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) #extend range by shifting start 100 bp in upstream direction myOGRE <- extendGRanges(myOGRE,"genes",upstream=100) #shrinking range by shifting end 100 bp in upstream direction myOGRE <- extendGRanges(myOGRE,"genes",downstream=-100) #shrinking range by shifting from both sides to the center myOGRE <- extendGRanges(myOGRE,"genes",upstream=-10,downstream=-10)
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) #extend range by shifting start 100 bp in upstream direction myOGRE <- extendGRanges(myOGRE,"genes",upstream=100) #shrinking range by shifting end 100 bp in upstream direction myOGRE <- extendGRanges(myOGRE,"genes",downstream=-100) #shrinking range by shifting from both sides to the center myOGRE <- extendGRanges(myOGRE,"genes",upstream=-10,downstream=-10)
A wrapper of GenomicRanges::promoters()
to extract promoter regions of
a GRanges object stored in a OGREDataSet
extractPromoters(OGREDataSet, name, upstream = 2000, downstream = 200)
extractPromoters(OGREDataSet, name, upstream = 2000, downstream = 200)
OGREDataSet |
An OGREDataSet |
name |
|
upstream |
|
downstream |
|
OGREDataSet
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- extractPromoters(myOGRE,"genes", upstream=2000, downstream=200)
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- extractPromoters(myOGRE,"genes", upstream=2000, downstream=200)
Finds all overlaps between query and subject(s) and stores each hit (overlap)
in data table detailDT
. Data table sumDT
shows all
overlaps of a certain subject type for all query elements. By default also
partially overlaps are reported. Overlap calculation is done using
GenomicRanges::findOverlaps()
implementation.
fOverlaps(OGREDataSet, selfHits = FALSE, ignoreStrand = TRUE, ...)
fOverlaps(OGREDataSet, selfHits = FALSE, ignoreStrand = TRUE, ...)
OGREDataSet |
A OGREDataSet. |
selfHits |
|
ignoreStrand |
|
... |
Additional parameters, see |
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE)
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE)
gvizPlot
generates a plot around one or many given query elements with all
overlapping subject hits. In addition, each generated plot can be stored in
the gvizPlots
folder get or set by gvizPlotsFolder
. A maximum of 25
elements can be plotted per track.
gvizPlot( OGREDataSet, query, gvizPlotsFolder = metadata(OGREDataSet)$gvizPlotsFolder, trackRegionLabels = setNames(rep("ID", length(OGREDataSet)), names(OGREDataSet)), trackShapes = setNames(rep("fixedArrow", length(OGREDataSet)), names(OGREDataSet)), showPlot = FALSE, extendPlot = c(-300, 300), nElements = 25 )
gvizPlot( OGREDataSet, query, gvizPlotsFolder = metadata(OGREDataSet)$gvizPlotsFolder, trackRegionLabels = setNames(rep("ID", length(OGREDataSet)), names(OGREDataSet)), trackShapes = setNames(rep("fixedArrow", length(OGREDataSet)), names(OGREDataSet)), showPlot = FALSE, extendPlot = c(-300, 300), nElements = 25 )
OGREDataSet |
A OGREDataSet. |
query |
A character vector of one or many query elements ID's (i.e. Gene ID's). |
gvizPlotsFolder |
A character pointing to the plot(s) output directory.
If not supplied a folder is automatically generated and can be accessed by
|
trackRegionLabels |
A labeled character vector that defines the type of
label that is displayed for query and subject elements during plotting.
Vector values represent the type of label and vector labels define the type
of subject element. In the following example
|
trackShapes |
A labeled character vector that defines the type of
shape in which every dataset's elements are displayed.
Vector values represent the type of shape and vector labels define the type
of subject element. In the following example
|
showPlot |
|
extendPlot |
|
nElements |
|
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- gvizPlot(myOGRE,query="ENSG00000142168")
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- gvizPlot(myOGRE,query="ENSG00000142168")
Use listPredefinedDataSets()
to receive a vector of names for predefined
datasets that can be aquired from AnnotationHub that are already correctly
parsed and formatted. Each of the listed names can be used as input for
addDataSetFromHub()
. Currently supported:
protCodingGenes - Protein coding genes from HG19 (GRCh37) Ensembl
For additional information use:
getInfoOnIds(AnnotationHub(), "AH10684")
CGI - CpG islands from HG19 UCSC
For additional information use:
getInfoOnIds(AnnotationHub(), "AH5086")
SNP - Common Single Nucleotide Polymorphism from HG19 UCSC
For additional information use:
getInfoOnIds(AnnotationHub(), "AH5105")
TFBS - Transcription Factor Binding Sites conserved from HG19 UCSC
For additional information use:
getInfoOnIds(AnnotationHub(), "AH5090")
Promoters - Promoter and flanking regions from HG19 Ensembl (Note: This annotation is currently not included in AnnotationHub and is therefore downloaded from Ensembl's ftp site)
listPredefinedDataSets()
listPredefinedDataSets()
character
vector.
listPredefinedDataSets()
listPredefinedDataSets()
Load dataset files containing genomic regions annotation information from
hard drive. loadAnnotations
calls readQuery
and readSubject
to read in genomic regions as GenomicRanges
objects stored as .RDS / .rds
files. Each region needs chromosome, start, end and strand information.
A unique ID and a name column must be present in the GenomicRanges
object
metadata. OGRE searches for the query file in your query folder and any
number of subject files in your subjects folder. Alternatively, .gff (v2&v3) files
in the query or subject folder with attribute columns containing "ID" and "name"
information are read in by OGRE.
loadAnnotations(OGREDataSet)
loadAnnotations(OGREDataSet)
OGREDataSet |
A OGREDataSet. |
A OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE)
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE)
makeExampleGRanges
generates an example GRanges dataset.
makeExampleGRanges()
makeExampleGRanges()
OGREDataSet.
myGRanges <- makeExampleGRanges()
myGRanges <- makeExampleGRanges()
makeExampleOGREDataSet
generates a example OGREDataSet from dataset files
stored in OGRE's extdata directory.
makeExampleOGREDataSet()
makeExampleOGREDataSet()
OGREDataSet.
myOGRE <- makeExampleOGREDataSet()
myOGRE <- makeExampleOGREDataSet()
Builds a OGREDataset
as a GenomicRangesList
for storing and analysing
datasets which can be added by addDataSetFromHub()
or addGRanges()
.
Use BuildOGREDataSetFromDir
for adding dataSets stored as files.
OGREDataSet()
OGREDataSet()
A OGREDataSet.
myOGRE <- OGREDataSet()
myOGRE <- OGREDataSet()
Builds a OGREDataset
from user specified directories containing datasets
for which an overlap between query and subject is to be calculated.
A OGREDataset
is a GenomicRangesList
which stores datasets in a list like
structure and possible metadata information.
OGREDataSetFromDir(queryFolder, subjectFolder)
OGREDataSetFromDir(queryFolder, subjectFolder)
queryFolder |
A |
subjectFolder |
A |
A OGREDataSet.
myQueryFolder <- file.path(system.file('extdata', package = 'OGRE'),"query") mySubjectFolder <- file.path(system.file('extdata', package = 'OGRE'),"subject") myOGRE <- OGREDataSetFromDir(queryFolder=myQueryFolder,subjectFolder=mySubjectFolder)
myQueryFolder <- file.path(system.file('extdata', package = 'OGRE'),"query") mySubjectFolder <- file.path(system.file('extdata', package = 'OGRE'),"subject") myOGRE <- OGREDataSetFromDir(queryFolder=myQueryFolder,subjectFolder=mySubjectFolder)
Plots overlap histograms of all subject datasets and stores them as a list,
that can be accessed by metadata(myOGRE)$hist
plotHist(OGREDataSet, plot0 = FALSE)
plotHist(OGREDataSet, plot0 = FALSE)
OGREDataSet |
An OGREDataSet |
plot0 |
plot0=FALSE(default) plots a histogram of all dataset elements with overlaps, excluding elements without overlaps. plot0=FALSE also includes elements without overlaps. |
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- plotHist(myOGRE) metadata(myOGRE)$hist
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- plotHist(myOGRE) metadata(myOGRE)$hist
readDataSetFromFolder()
scanns queryFolder and subjectFolder for either
.RDS/.rds or .CSV/.csv files and adds them to a OGREDataSet. Each region
needs chromosome, start, end and strand information. (tabular file columns
must be named accordingly). A unique ID and a name column must be present in
the GenomicRanges
object's metatdata and tabular file.
readDataSetFromFolder(OGREDataSet, type)
readDataSetFromFolder(OGREDataSet, type)
OGREDataSet |
A OGREDataSet. |
type |
|
A OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- readDataSetFromFolder(myOGRE,type="query") myOGRE <- readDataSetFromFolder(myOGRE,type="subject")
myOGRE <- makeExampleOGREDataSet() myOGRE <- readDataSetFromFolder(myOGRE,type="query") myOGRE <- readDataSetFromFolder(myOGRE,type="subject")
SHREC()
is a graphical user interface for OGRE
SHREC()
SHREC()
Runs GUI, this function normally does not return
Subsets a GRanges object with reference to it's ID column using a ID vector.
subsetGRanges(OGREDataSet, IDs, name)
subsetGRanges(OGREDataSet, IDs, name)
OGREDataSet |
An OGREDataSet |
IDs |
|
name |
|
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- subsetGRanges(myOGRE,c("ENSG00000142168","ENSG00000256715"),"genes")
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- subsetGRanges(myOGRE,c("ENSG00000142168","ENSG00000256715"),"genes")
Calculates min/max/average overlap for all datasets using summary()
.
Results can be accessed by metadata(OGREDataSet)$summaryDT
which is a
list()
of two data.table
objects. The first one includes
elements without any overlap at all and the second provides summary numbers
for all elements that have at least one overlap.
summarizeOverlap(OGREDataSet)
summarizeOverlap(OGREDataSet)
OGREDataSet |
An OGREDataSet |
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- summarizeOverlap(myOGRE) metadata(myOGRE)$summaryDT
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- summarizeOverlap(myOGRE) metadata(myOGRE)$summaryDT
sumPlot()
calculates key numbers i.e.
(total number of overlaps, number of overlaps per subject...) to help with an
exploratory data evaluation and displays them in an informative barplot.
sumPlot(OGREDataSet)
sumPlot(OGREDataSet)
OGREDataSet |
A OGREDataSet. |
OGREDataSet.
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- sumPlot(myOGRE)
myOGRE <- makeExampleOGREDataSet() myOGRE <- loadAnnotations(myOGRE) myOGRE <- fOverlaps(myOGRE) myOGRE <- sumPlot(myOGRE)