Title: | Summary, annotation and visualization of genomic data |
---|---|
Description: | A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, It can use BAM or BigWig files as input. |
Authors: | Altuna Akalin [aut, cre], Vedran Franke [aut, cre], Katarzyna Wreczycka [aut], Alexander Gosdschan [ctb], Liz Ing-Simmons [ctb], Bozena Mika-Gospodorz [ctb] |
Maintainer: | Altuna Akalin <[email protected]>, Vedran Franke <[email protected]>, Katarzyna Wreczycka <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.39.0 |
Built: | 2024-12-29 07:25:47 UTC |
Source: | https://github.com/bioc/genomation |
Extract method for a ScoreMatrix object.
## S4 method for signature 'ScoreMatrix,ANY,ANY,ANY' x[i, j]
## S4 method for signature 'ScoreMatrix,ANY,ANY,ANY' x[i, j]
x |
the |
i |
numeric value |
j |
numeric value |
Extract method for a ScoreMatrixList object.
## S4 method for signature 'ScoreMatrixList,ANY,ANY,ANY' x[i]
## S4 method for signature 'ScoreMatrixList,ANY,ANY,ANY' x[i]
x |
the |
i |
numeric value |
Function to annotate given GRanges object with a given genomic feature
annotateWithFeature(target, feature, strand = FALSE, extend = 0, feature.name = NULL, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRanges' annotateWithFeature(target, feature, strand = FALSE, extend = 0, feature.name = NULL, intersect.chr = FALSE)
annotateWithFeature(target, feature, strand = FALSE, extend = 0, feature.name = NULL, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRanges' annotateWithFeature(target, feature, strand = FALSE, extend = 0, feature.name = NULL, intersect.chr = FALSE)
target |
a GRanges object storing chromosome locations to be annotated |
feature |
a GRanges object storing chromosome locations of a feature (can be CpG islands, ChIP-seq peaks, etc) |
strand |
If set to TRUE, annotation features and target features will be overlapped based on strand (def:FAULT) |
extend |
specifiying a positive value will extend the feature on both
sides as much as |
feature.name |
name of the annotation feature. For example: H3K4me1,CpGisland etc. by default the name is taken from the given variable |
intersect.chr |
boolean, whether to select only chromosomes that are common to feature and target. FALSE by default |
returns an AnnotationByFeature
object
data(cpgi) data(promoters) annot = annotateWithFeature(cpgi, promoters)
data(cpgi) data(promoters) annot = annotateWithFeature(cpgi, promoters)
Function to annotate a given GRanges object with promoter,exon,intron & intergenic values
annotateWithFeatureFlank(target, feature, flank, feature.name = NULL, flank.name = "flank", strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRanges,GRanges' annotateWithFeatureFlank(target, feature, flank, feature.name = NULL, flank.name = "flank", strand = FALSE, intersect.chr = FALSE)
annotateWithFeatureFlank(target, feature, flank, feature.name = NULL, flank.name = "flank", strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRanges,GRanges' annotateWithFeatureFlank(target, feature, flank, feature.name = NULL, flank.name = "flank", strand = FALSE, intersect.chr = FALSE)
target |
a granges object storing chromosome locations to be annotated |
feature |
a granges object storing chromosome locations of a feature (can be CpG islands, ChIP-seq peaks, etc) |
flank |
a granges object storing chromosome locations of the flanks of the feature |
feature.name |
string for the name of the feature |
flank.name |
string for the name of the flanks |
strand |
If set to TRUE, annotation features and target features will be overlapped based on strand (def:FAULT) |
intersect.chr |
boolean, whether to select only chromosomes that are common to feature and target. FALSE by default |
returns an AnnotationByFeature
object
data(cpgi) data(cage) cpgi.flanks = getFlanks(cpgi) flank.annot = annotateWithFeatureFlank(cage, cpgi, cpgi.flanks)
data(cpgi) data(cage) cpgi.flanks = getFlanks(cpgi) flank.annot = annotateWithFeatureFlank(cage, cpgi, cpgi.flanks)
The function annotates a target GRangesList or GRanges object as overlapping or not with the elements of a named GRangesList. This is useful to annotate your regions of interest with genomic features with arbitrary categories such as repeat classes or families, or output from genome segmentation alogorithms such as chromHMM.
annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRangesList' annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRangesList,GRangesList' annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE)
annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRangesList' annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRangesList,GRangesList' annotateWithFeatures(target, features, strand.aware = FALSE, intersect.chr = FALSE)
target |
GRanges or GRangesList object storing chromosome locations to be annotated (e.g. chipseq peaks) |
features |
a named GRangesList object containing GRanges objects different set
of features. The function calculates percent overlaps with and without
precendence at the same time. The order of objects in GRangesList
defines their precedence. If a range in |
strand.aware |
if set to TRUE, annotation features and target features will be overlapped based on strand (def:FALSE) |
intersect.chr |
logical value, whether to select only chromosomes that are common to feature and target. FALSE by default |
returns an AnnotationByFeature
object or if target is
a GRangesList, a list of AnnotationByFeature
objects.
see getMembers
,
heatTargetAnnotation
,
plotTargetAnnotation
library(GenomicRanges) data(cage) data(cpgi) cage$tpm=NULL gl = GRangesList(cage=cage, cpgi=cpgi) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)
library(GenomicRanges) data(cage) data(cpgi) cage$tpm=NULL gl = GRangesList(cage=cage, cpgi=cpgi) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)
The function annotates GRangesList
or GRanges
object as
overlapping with promoter,exon,intron or intergenic regions.
annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRangesList' annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRangesList,GRangesList' annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE)
annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRanges,GRangesList' annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE) ## S4 method for signature 'GRangesList,GRangesList' annotateWithGeneParts(target, feature, strand = FALSE, intersect.chr = FALSE)
target |
|
feature |
GRangesList object containing GRanges object for promoter, exons, introns and transcription start sites, or simply output of readTranscriptFeatures function |
strand |
If set to TRUE, annotation features and target features will be overlapped based on strand (def:FALSE) |
intersect.chr |
boolean, whether to select only chromosomes that are common to feature and target. FALSE by default |
AnnotationByGeneParts
object or a list of
AnnotationByGeneParts
objects if target is a GRangesList
object.
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) cage.annot
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) cage.annot
This object is desgined to hold statistics and information about genomic feature overlaps
members
a matrix showing overlap of target features with annotation genomic features
annotation
a named vector of percentages
precedence
a named vector of percentages
num.annotation
vector
num.precedence
vector
no.of.OlapFeat
vector
perc.of.OlapFeat
vector
This object is desgined to hold statistics and information about genomic feature overlaps
members
a matrix showing overlap of target features with annotation genomic features
annotation
a named vector of percentages
precedence
a named vector of percentages
num.annotation
vector
num.precedence
vector
no.of.OlapFeat
vector
perc.of.OlapFeat
vector
a data frame showing distances to TSS and gene/TSS names and strand
Bins the columns of a matrix using a user provided function
binMatrix(x, bin.num = NULL, fun = "mean") ## S4 method for signature 'ScoreMatrix' binMatrix(x, bin.num = NULL, fun = "mean") ## S4 method for signature 'ScoreMatrixList' binMatrix(x, bin.num = NULL, fun = "mean")
binMatrix(x, bin.num = NULL, fun = "mean") ## S4 method for signature 'ScoreMatrix' binMatrix(x, bin.num = NULL, fun = "mean") ## S4 method for signature 'ScoreMatrixList' binMatrix(x, bin.num = NULL, fun = "mean")
x |
|
bin.num |
|
fun |
|
ScoreMatrix
or ScoreMatrixList
object
# binning the columns in a ScoreMatrix object library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7), strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+')) windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) sm = ScoreMatrix(target, windows) bin = binMatrix(sm, bin.num=2)
# binning the columns in a ScoreMatrix object library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7), strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+')) windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) sm = ScoreMatrix(target, windows) bin = binMatrix(sm, bin.num=2)
Combine a scoreMatrix into a scoreMatrixList object - when a ScoreMatrix is a first argument
## S3 method for class 'ScoreMatrix' c(..., recursive = FALSE, use.names = TRUE)
## S3 method for class 'ScoreMatrix' c(..., recursive = FALSE, use.names = TRUE)
... |
contais scoreMatrix and scoreMatrixList objects |
recursive |
logical |
use.names |
logical |
returns a scoreMatrixList
object
Combine a scoreMatrix into a scoreMatrixList object - when a ScoreMatrixList is a first argument
## S3 method for class 'ScoreMatrixList' c(..., recursive = FALSE, use.names = TRUE)
## S3 method for class 'ScoreMatrixList' c(..., recursive = FALSE, use.names = TRUE)
... |
contais scoreMatrix and scoreMatrixList objects |
recursive |
logical |
use.names |
logical |
returns a scoreMatrixList
object
Location and tag per million values for CAGE TSS clusters on chr21 and chr22 of human genome (hg19 assembly). The clusters are dowloaded from ENCODE project downloads for NHEK cells.
GRanges
object
This function calculates the significance of overlaps of two sets of features using randomization. #' It returns a distributon of overlaps of a target set with a given randomized feature set. The randomization can be constrained by supplied arguments. The function is still in Beta mode - the regions can overlap excluded regions, and the randomized regions are not disjoint. Please take care that the excluded and included regions are not too strict when compared to the total width of the ranges.
calculateOverlapSignificance(target, feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1) ## S4 method for signature 'GRanges,GRanges' calculateOverlapSignificance(target, feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1)
calculateOverlapSignificance(target, feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1) ## S4 method for signature 'GRanges,GRanges' calculateOverlapSignificance(target, feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1)
target |
a GRanges object for which the overlap needs to be calculates |
feature |
a GRanges object to be randomized |
chrom.sizes |
sizes of chromosomes as a named vector (names are chromsomes names and elements of the vectors are lengths). , if not given sizes in GRanges object will be used if no sizes there the end of each chr will be the end last feature on each chr |
stranded |
if FALSE, all of the returned features will be strandless (will have "*" in the strand slot) |
keep.strand.prop |
If TRUE strands will have the same proportion as the features |
keep.chrom |
If TRUE, number of features and randomized features for a chromosome will match. Currently seeting this to FALSE is not supported. |
exclude |
A GRanges object where no randomized feature should overlap, can be gaps or unmappable regions in the genome as an example. |
include |
A GRanges object which defines the boundaries of randomized features |
seed |
random number generator seed |
nrand |
number of randomizations (default:1) |
returns a GRanges object which is randomized version of the feature
convert a data frame read-in from a bed file to a GRanges object for exons
convertBed2Exons(bed.df) ## S4 method for signature 'data.frame' convertBed2Exons(bed.df)
convertBed2Exons(bed.df) ## S4 method for signature 'data.frame' convertBed2Exons(bed.df)
bed.df |
a data.frame where column order and content resembles a bed file with 12 columns |
GRanges
object
one bed track per file is only accepted, the bed files with multiple tracks will cause en error
file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation') bed12 = read.table(file) exons = convertBed2Exons(bed12) head(exons)
file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation') bed12 = read.table(file) exons = convertBed2Exons(bed12) head(exons)
convert a data frame read-in from a bed file to a GRanges object for introns
convertBed2Introns(bed.df) ## S4 method for signature 'data.frame' convertBed2Introns(bed.df)
convertBed2Introns(bed.df) ## S4 method for signature 'data.frame' convertBed2Introns(bed.df)
bed.df |
a data.frame where column order and content resembles a bed file with 12 columns |
GRanges
object
one bed track per file is only accepted, the bed files with multiple tracks will cause en error
file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation') bed12 = read.table(file) introns = convertBed2Introns(bed12) head(introns)
file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation') bed12 = read.table(file) introns = convertBed2Introns(bed12) head(introns)
convert a data frame read-in from a bed file to a GRanges object
convertBedDf(bed) ## S4 method for signature 'data.frame' convertBedDf(bed)
convertBedDf(bed) ## S4 method for signature 'data.frame' convertBedDf(bed)
bed |
a data.frame where column order and content resembles a bed file with 12 columns |
GRanges
object
one bed track per file is only accepted, the bed files with multiple tracks will cause en error
bed files are expected not to have header lines
CpG islands of hg19 assembly of human genome on chr21 and chr22. Downloaded from UCSC genome browser.
GRanges
object
This is an enrichmentMatrix
function for ScoreMatrix
objects,
that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.
\S4method{enrichmentMatrix}{ScoreMatrix,ScoreMatrix}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrix,ScoreMatrix}(IP, control)
IP |
|
control |
|
ScoreMatrix
object
The function computes an enrichment of IP over control as follow: Suppose both IP and control are ScoreMatrix objects that have same dimensions. Then, the enrichment is calculated usign a formula: log2((IP + 1) / (control + 1)).
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') bam.file_IP <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") data(promoters) IP <- ScoreMatrix(target = bam.file_IP, windows = promoters, type = 'bam') control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # compute an enrichment of IP over control enrichmentMatrix(IP, control)
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') bam.file_IP <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") data(promoters) IP <- ScoreMatrix(target = bam.file_IP, windows = promoters, type = 'bam') control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # compute an enrichment of IP over control enrichmentMatrix(IP, control)
This is an enrichmentMatrix
function for IP ScoreMatrixList
object and control ScoreMatrix
object,
that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrix}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrix}(IP, control)
IP |
|
control |
|
ScoreMatrixList
object
The function computes an enrichment of IP over control as follow: Suppose both IP and control are ScoreMatrix objects that have same dimensions. Then, the enrichment is calculated usign a formula: log2((IP + 1) / (control + 1)).
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') data(promoters) bam.file_IP_1 <- system.file("extdata", "wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData") IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam') bam.file_IP_2 <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam') bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # create a ScoreMatrixList object storing IP ScoreMatrix objects sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2)) # compute an enrichment of IP over control enrichmentMatrix(sml_IP, control)
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') data(promoters) bam.file_IP_1 <- system.file("extdata", "wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData") IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam') bam.file_IP_2 <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam') bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # create a ScoreMatrixList object storing IP ScoreMatrix objects sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2)) # compute an enrichment of IP over control enrichmentMatrix(sml_IP, control)
This is an enrichmentMatrix
function for ScoreMatrixList
objects,
that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrixList}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrixList}(IP, control)
IP |
|
control |
|
ScoreMatrixList
object
The function computes an enrichment of IP over control as follow: Suppose both IP and control are ScoreMatrix objects that have same dimensions. Then, the enrichment is calculated usign a formula: log2((IP + 1) / (control + 1)).
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') data(promoters) bam.file_IP_1 <- system.file("extdata", "wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData") IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam') bam.file_IP_2 <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam') bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # create a ScoreMatrixList object storing IP ScoreMatrix objects sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2)) # create a ScoreMatrixList object storing control ScoreMatrix objects sml_control <- ScoreMatrixList(list(c1 = control, c2 = control)) # compute an enrichment of IP over control enrichmentMatrix(sml_IP, sml_control)
#load IP and control BAM files and create ScoreMatrix objects library('genomationData') data(promoters) bam.file_IP_1 <- system.file("extdata", "wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData") IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam') bam.file_IP_2 <- system.file("extdata", "wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData") IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam') bam.file_c <- system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData") control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam') # create a ScoreMatrixList object storing IP ScoreMatrix objects sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2)) # create a ScoreMatrixList object storing control ScoreMatrix objects sml_control <- ScoreMatrixList(list(c1 = control, c2 = control)) # compute an enrichment of IP over control enrichmentMatrix(sml_IP, sml_control)
Provided a GRangesList, finds the combinations of sets of ranges. It is mostly used to look at the combinatorics of transcription factor binding. The function works by, firstly, constructing a union of all ranges in the list, which are then designated by the combinatorics of overlap with the original sets. A caveat of this approach is that the number of possible combinations increases exponentially, so we would advise you to use it with up to 6 data sets. If you wish to take a look at a greater number of factors, methods like self organizing maps or ChromHMM might be more appropriate.
findFeatureComb(gl, width=0, use.names=FALSE, collapse.char=':') ## S4 method for signature 'GRangesList' findFeatureComb(gl, width = 0, use.names = FALSE, collapse.char = ":")
findFeatureComb(gl, width=0, use.names=FALSE, collapse.char=':') ## S4 method for signature 'GRangesList' findFeatureComb(gl, width = 0, use.names = FALSE, collapse.char = ":")
gl |
a |
width |
|
use.names |
a boolean which tells the function whether to return the resulting ranges with a numeric vector which designates each class (the default), or to construct the names of each class using the names from the GRangesList |
collapse.char |
a character which will be used to separate the class names if use.names=TRUE. The default is ':' |
a GRanges object
library(GenomicRanges) g = GRanges(paste('chr',rep(1:2, each=3), sep=''), IRanges(rep(c(1,5,9), times=2), width=3)) gl = GRangesList(g1=g, g2=g[2:5], g3=g[3:4]) findFeatureComb(gl) findFeatureComb(gl, use.names=TRUE)
library(GenomicRanges) g = GRanges(paste('chr',rep(1:2, each=3), sep=''), IRanges(rep(c(1,5,9), times=2), width=3)) gl = GRangesList(g1=g, g2=g[2:5], g3=g[3:4]) findFeatureComb(gl) findFeatureComb(gl, use.names=TRUE)
RefSeq genes of hg19 assembly of human genome on chr21 and chr22. Downloaded from UCSC genome browser.
GRanges
object
This accessor function gets the nearest TSS, its distance to target feature, strand and name of TSS/gene from AnnotationByGeneParts object
getAssociationWithTSS(x) ## S4 method for signature 'AnnotationByGeneParts' getAssociationWithTSS(x)
getAssociationWithTSS(x) ## S4 method for signature 'AnnotationByGeneParts' getAssociationWithTSS(x)
x |
a |
RETURNS a data.frame containing row number of the target features, distance of target to nearest TSS, TSS/Gene name, TSS strand
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) head(getAssociationWithTSS(cage.annot))
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) head(getAssociationWithTSS(cage.annot))
This function retrieves percentage/number of
annotation features overlapping with targets.
For example, if AnnotationByFeature
object is containing
statistics of differentially methylated
regions overlapping with gene annotation. This function will return
number/percentage of introns,exons and promoters
overlapping with differentially methylated regions.
getFeatsWithTargetsStats(x,percentage=TRUE) ## S4 method for signature 'AnnotationByFeature' getFeatsWithTargetsStats(x, percentage = TRUE)
getFeatsWithTargetsStats(x,percentage=TRUE) ## S4 method for signature 'AnnotationByFeature' getFeatsWithTargetsStats(x, percentage = TRUE)
x |
a |
percentage |
TRUE|FALSE. If TRUE percentage of annotation features will be returned. If FALSE, number of annotation features will be returned |
RETURNS a vector of percentages or counts showing quantity of annotation features overlapping with target features
data(cage) bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) getFeatsWithTargetsStats(cage.annot)
data(cage) bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) getFeatsWithTargetsStats(cage.annot)
Function to get upstream and downstream adjecent regions to a genomic feature such as CpG islands
getFlanks(grange,flank=2000,clean=TRUE) ## S4 method for signature 'GRanges' getFlanks(grange, flank = 2000, clean = TRUE)
getFlanks(grange,flank=2000,clean=TRUE) ## S4 method for signature 'GRanges' getFlanks(grange, flank = 2000, clean = TRUE)
grange |
GRanges object for the feature |
flank |
number of basepairs for the flanking regions |
clean |
If set to TRUE, flanks overlapping with other main features will be trimmed, and overlapping flanks will be removed. This will remove multiple counts when other features overlap with flanks |
GRanges object for flanking regions
data(cpgi) cpgi.flanks = getFlanks(cpgi) head(cpgi.flanks)
data(cpgi) cpgi.flanks = getFlanks(cpgi) head(cpgi.flanks)
Membership slot defines the overlap of target features with annotation features For example, if a target feature overlaps with an exon
getMembers(x) ## S4 method for signature 'AnnotationByFeature' getMembers(x)
getMembers(x) ## S4 method for signature 'AnnotationByFeature' getMembers(x)
x |
a |
matrix showing overlap of target features with annotation features. 1 for overlap, 0 for non-overlap
This function measures the association between two genomic features by randomizing one feature and counting the overlaps in randomized sets.
That is to say, query
feature will be randomly distributed over the genome (constrained by provided options), and the overlap of target
with these randomized features will be measured.
getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL, ...) ## S4 method for signature 'GRanges,GRanges' getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL, ...)
getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL, ...) ## S4 method for signature 'GRanges,GRanges' getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL, ...)
target |
a |
query |
a |
randomizations |
number of times the features to be shuffled |
rand.set |
instead of randomly placing features in |
... |
other parameters to be passed to |
returns a RandomEnrichment
object
data(cage) data(cpgi) enr = getRandomEnrichment(cage, cpgi, randomizations=50)
data(cage) data(cpgi) enr = getRandomEnrichment(cage, cpgi, randomizations=50)
This function retrieves percentage/number of target features overlapping with annotation
getTargetAnnotationStats(x,percentage=TRUE,precedence=TRUE) ## S4 method for signature 'AnnotationByFeature' getTargetAnnotationStats(x, percentage = TRUE, precedence = TRUE)
getTargetAnnotationStats(x,percentage=TRUE,precedence=TRUE) ## S4 method for signature 'AnnotationByFeature' getTargetAnnotationStats(x, percentage = TRUE, precedence = TRUE)
x |
a |
percentage |
TRUE|FALSE. If TRUE percentage of target features will be returned. If FALSE, number of target features will be returned |
precedence |
TRUE|FALSE. If TRUE there will be a hierachy of annotation features when calculating numbers (with promoter>exon>intron precedence) That means if a feature overlaps with a promoter it will be counted as promoter overlapping only, or if it is overlapping with a an exon but not a promoter, #' it will be counted as exon overlapping only whether or not it overlaps with an intron. |
a vector of percentages or counts showing quantity of target features overlapping with annotation
data(cage) bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot=annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) getTargetAnnotationStats(cage.annot)
data(cage) bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) cage.annot=annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) getTargetAnnotationStats(cage.annot)
Converts a gff formated data.frame into a GenomicRanges object. The GenomicRanges object needs to be properly formated for the function to work.
gffToGRanges(gff.file, filter = NULL, zero.based = FALSE, ensembl = FALSE)
gffToGRanges(gff.file, filter = NULL, zero.based = FALSE, ensembl = FALSE)
gff.file |
path to a gff formatted file.
The file can end in |
filter |
a character designating which elements to retain from the gff file (e.g. exon, CDS, ...) |
zero.based |
|
ensembl |
|
returns a GenomicRanges
object
gff.file = system.file('extdata/chr21.refseq.hg19.gtf', package='genomation') gff = gffToGRanges(gff.file)
gff.file = system.file('extdata/chr21.refseq.hg19.gtf', package='genomation') gff = gffToGRanges(gff.file)
The function makes a heatmap out of given ScoreMatrix
object. If desired
it can use clustering using given clustering function
(e.g. k-means) and plot cluster color codes as a sidebar.
In addition, user can define groups of rows using 'group' argument.
heatMatrix(mat, grid = FALSE, col = NULL, xcoords = NULL, group = NULL, group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0, 100), clustfun = NULL, main = "", legend.name = NULL, cex.legend = 1, xlab = NULL, cex.main = 1, cex.lab = 1, cex.axis = 1, newpage = TRUE)
heatMatrix(mat, grid = FALSE, col = NULL, xcoords = NULL, group = NULL, group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0, 100), clustfun = NULL, main = "", legend.name = NULL, cex.legend = 1, xlab = NULL, cex.main = 1, cex.lab = 1, cex.axis = 1, newpage = TRUE)
mat |
a |
grid |
if TRUE, grid graphics will be used. if FALSE, base graphics will be used on the top level, so users can use par(mfrow) or par(mfcol) prior to calling the function. Default:FALSE |
col |
a vector of colors, such as the ones created by heat.colors(10). If NULL (which is default), jet color scheme (common in matlab plots) will be used. |
xcoords |
a vector of numbers showing relative positions of the bases or
windows. It must match the number of columns in the |
group |
a list of vectors of row numbers or a factor. This grouping is
used for rowside colors of the heatmap. If it is a list,
each element of the list must be a vector of row numbers. Names
of the elements of the list will be used as names of groups.
If |
group.col |
a vector of color names to be used at the rowside colors if
|
order |
Logical indicating if the rows should be ordered or not
(Default:FALSE). If |
user.order |
a numerical vector indicating the order of groups/clusters (it works only
when |
winsorize |
Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers. |
clustfun |
a function for clustering
rows of |
main |
a character string for the plot title |
legend.name |
a character label plotted next to the legend |
cex.legend |
A numerical value giving the amount by which legend axis marks should be magnified relative to the default |
xlab |
label a character string for x-axis of the heatmap |
cex.main |
A numerical value giving the amount by which plot title should be magnified |
cex.lab |
A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default. |
cex.axis |
A numerical value giving the amount by which axis marks should be magnified relative to the default |
newpage |
logical indicating if |
returns clustering result invisibly, if clustfun is definied
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE, weight.col="tpm") set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE) ## examples using clustering functions ## k-means cl1 <- function(x) kmeans(x, centers=3)$cluster set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl1, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE, user.order=c(1,3,2)) ## hierarchical clustering cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=3) set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl2, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE)
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE, weight.col="tpm") set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE) ## examples using clustering functions ## k-means cl1 <- function(x) kmeans(x, centers=3)$cluster set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl1, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE, user.order=c(1,3,2)) ## hierarchical clustering cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=3) set.seed(1000) heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl2, cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1, cex.axis=0.9,grid=FALSE)
Function calculates meta-profile(s) from a ScoreMatrix or a ScoreMatrixList, then produces a heatmap or a set of stacked heatmaps for meta-region profiles
heatMeta(mat, centralTend = "mean", profile.names = NULL, xcoords = NULL, col = NULL, meta.rescale = FALSE, winsorize = c(0, 100), legend.name = NULL, cex.legend = 1, xlab = NULL, main = "", cex.lab = 1, cex.axis = 1)
heatMeta(mat, centralTend = "mean", profile.names = NULL, xcoords = NULL, col = NULL, meta.rescale = FALSE, winsorize = c(0, 100), legend.name = NULL, cex.legend = 1, xlab = NULL, main = "", cex.lab = 1, cex.axis = 1)
mat |
|
centralTend |
a character that determines central tendency of meta-profile(s). It takes "mean" (default) or "median". |
profile.names |
a character vector for names of profiles. If NULL,
the names
will be taken from names(mat) if mat is a
|
xcoords |
a vector of numbers showing relative positions of the bases or
windows. It must match the number of columns in the |
col |
a vector of color pallete. color scheme to be used. If NULL, a version of jet colors will be used. |
meta.rescale |
if TRUE meta-region profiles are scaled to 0 to 1 range by subracting the min from profiles and dividing them by max-min. |
winsorize |
Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile. This is useful for visualization of matrices that have outliers. |
legend.name |
a character label plotted next to the legend |
cex.legend |
A numerical value giving the amount by which legend axis marks should be magnified relative to the default |
xlab |
label a character string for x-axis |
main |
a character string for the plot title |
cex.lab |
A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default. |
cex.axis |
A numerical value giving the amount by which axis marks should be magnified relative to the default |
returns meta-profile matrix invisibly.
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) x=new("ScoreMatrixList",list(scores1,scores2)) heatMeta(mat=x,legend.name="fg",cex.legend=0.8,main="fdf",cex.lab=6, cex.axis=0.9)
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) x=new("ScoreMatrixList",list(scores1,scores2)) heatMeta(mat=x,legend.name="fg",cex.legend=0.8,main="fdf",cex.lab=6, cex.axis=0.9)
This function plots a heatmap for percentage of overlapping ranges
with provided genomic features. The input object is a list of
AnnotationByFeature
objects, which contains necessary information
about overlap statistics to make the plot.
heatTargetAnnotation(l, cluster = FALSE, col = c("white", "blue"), precedence = FALSE, plot = TRUE)
heatTargetAnnotation(l, cluster = FALSE, col = c("white", "blue"), precedence = FALSE, plot = TRUE)
l |
a |
cluster |
TRUE/FALSE. If TRUE the heatmap is going to be clustered using a default hierarchical clustering scheme. |
col |
a vector of two colors that will be used for interpolation. The first color is the lowest one, the second is the highest one |
precedence |
TRUE|FALSE. If TRUE the precedence of annotation features will be used when plotting. The precedence will be taken from the GRangesList used as annotation. The first GRanges will be treated as most important, and the second as the second most important and so on. Such that, if an interval overlaps with features on that is part of the first GRanges object in the annotation GRangesList, the rest of its overlaps with other elements in the annotation GRangesList will be ignored. This feature is important to have if the user desires that percentage of overlaps adds up to 100. This is only possible when the annotation features are non-overlaping with eachother or there is a hierarchy/precedence among them such as (with promoter>exon>intron precedence). In this case, anything that overlaps with a promoter annotation will only be counted as promoter even if it overlaps with exons or introns. |
plot |
If FALSE, does not plot the heatmap just returns the matrix used to make the heatmap |
returns the matrix used to make the heatmap when plot FALSE, otherwise returns ggplot2 object which can be modified further.
see getMembers
,
annotateWithFeatures
library(GenomicRanges) data(cage) data(cpgi) cage$tpm=NULL gl = GRangesList(cage=cage, cpgi=cpgi) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE) heatTargetAnnotation(annot)
library(GenomicRanges) data(cage) data(cpgi) cage$tpm=NULL gl = GRangesList(cage=cage, cpgi=cpgi) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE) heatTargetAnnotation(annot)
Returns a intersection of rows for each matrix in a ScoreMatrixList object. This is done using the rownames of each element in the list.
intersectScoreMatrixList(sml, reorder = FALSE) ## S4 method for signature 'ScoreMatrixList' intersectScoreMatrixList(sml, reorder = FALSE)
intersectScoreMatrixList(sml, reorder = FALSE) ## S4 method for signature 'ScoreMatrixList' intersectScoreMatrixList(sml, reorder = FALSE)
sml |
a |
reorder |
if TRUE |
ScoreMatrixList
object
library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7)) windows1 = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) windows2 = windows1[c(1,3)] sml = as(list(ScoreMatrix(target, windows1), ScoreMatrix(target, windows2)), 'ScoreMatrixList') sml intersectScoreMatrixList(sml)
library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7)) windows1 = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) windows2 = windows1[c(1,3)] sml = as(list(ScoreMatrix(target, windows1), ScoreMatrix(target, windows2)), 'ScoreMatrixList') sml intersectScoreMatrixList(sml)
The function plots multiple heatmaps for a ScoreMatrixList object side by side. Each matrix can have different color schemes but it is essential that each matrix is obtained from same regions or neighbouring regions.
multiHeatMatrix(sml, grid = TRUE, col = NULL, xcoords = NULL, group = NULL, group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0, 100), clustfun = FALSE, clust.matrix = NULL, column.scale = TRUE, matrix.main = NULL, common.scale = FALSE, legend = TRUE, legend.name = NULL, cex.legend = 0.8, xlab = NULL, cex.lab = 1, cex.main = 1, cex.axis = 0.8, newpage = TRUE)
multiHeatMatrix(sml, grid = TRUE, col = NULL, xcoords = NULL, group = NULL, group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0, 100), clustfun = FALSE, clust.matrix = NULL, column.scale = TRUE, matrix.main = NULL, common.scale = FALSE, legend = TRUE, legend.name = NULL, cex.legend = 0.8, xlab = NULL, cex.lab = 1, cex.main = 1, cex.axis = 0.8, newpage = TRUE)
sml |
a |
grid |
if TRUE, grid graphics will be used. if FALSE, base graphics will be used on the top level, so users can use par(mfrow) or par(mfcol) prior to calling the function. Default:FALSE |
col |
a color palette or list of color palettes, such as list(heat.colors(10),topo.colors(10)). If it is a list, it is length must match the number of matrices to be plotted. If it is a single palette every heatmap will have the same colors. |
xcoords |
a vector of numbers showing relative positions of the bases or
windows or a list of vectors.
The elements of the list must match the number of columns in the
corresponding |
group |
a list of vectors of row numbers or a factor. The rows will be
reordered to match their grouping. The grouping is
used for rowside colors of the heatmap. If it is a list,
each element of the list must be a vector of row numbers. Names
of the elements of the list will be used as names of groups.
If |
group.col |
a vector of color names to be used at the rowside colors if
|
order |
Logical indicating if the rows should be ordered or not
(Default:FALSE). If |
user.order |
a numerical vector indicating the order of groups/clusters (it works only
when |
winsorize |
Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile for a matrix, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers. |
clustfun |
a function for clustering
rows of |
clust.matrix |
a numerical vector of indexes or a character vector of names
of the |
column.scale |
Logical indicating if matrices should be scaled or not,
prior to clustering or ordering. Setting this
to TRUE scales the columns of the
matrices using,
|
matrix.main |
a vector of strings for the titles of the heatmaps. If NULL
titles will be obtained from names of the |
common.scale |
if TRUE (Default:FALSE) all the heatmap colors will be coming from the same score scale, although each heatmap color scale can be different. The color intensities will be coming from the same scale. The scale will be determined by minimum of all matrices and maximum of all matrices. This is useful when all matrices are on the same score scale. If FALSE, the color scale will be determined by minimum and maximum of each matrix individually. |
legend |
if TRUE and color legend for the heatmap is drawn. |
legend.name |
a vector of legend labels to be plotted with legends of each heatmap. If it is a length 1 vector, all heatmaps will have the same legend label. |
cex.legend |
A numerical value giving the amount by which legend axis marks should be magnified relative to the default |
xlab |
a vector of character strings for x-axis labels of the heatmaps. if it is length 1, all heatmaps will have the same label. |
cex.lab |
A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default. |
cex.main |
A numerical value giving the amount by which plot title should be magnified |
cex.axis |
A numerical value giving the amount by which axis marks should be magnified relative to the default |
newpage |
logical indicating if |
invisibly returns the order of rows, if clustfun is provided and/or order=TRUE
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) sml=new("ScoreMatrixList",list(a=scores1,b=scores2)) # use with k-means multiHeatMatrix(sml, clustfun=function(x) kmeans(x, centers=2)$cluster, cex.axis=0.8,xcoords=c(-1000,1000), winsorize=c(0,99), legend.name=c("tpm","coverage"),xlab="region around TSS") # use with hierarchical clustering cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=2) multiHeatMatrix(sml,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl2, cex.legend=0.8,cex.lab=1, cex.axis=0.9,grid=FALSE) # use different colors require(RColorBrewer) col.cage= brewer.pal(9,"Blues") col.cpgi= brewer.pal(9,"YlGn") multiHeatMatrix(sml, clustfun=function(x) kmeans(x, centers=2)$cluster, cex.axis=0.8,xcoords=c(-1000,1000), winsorize=c(0,99),col=list(col.cage,col.cpgi), legend.name=c("tpm","coverage"),xlab="region around TSS")
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) sml=new("ScoreMatrixList",list(a=scores1,b=scores2)) # use with k-means multiHeatMatrix(sml, clustfun=function(x) kmeans(x, centers=2)$cluster, cex.axis=0.8,xcoords=c(-1000,1000), winsorize=c(0,99), legend.name=c("tpm","coverage"),xlab="region around TSS") # use with hierarchical clustering cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=2) multiHeatMatrix(sml,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS", xcoords=-1000:1000,clustfun=cl2, cex.legend=0.8,cex.lab=1, cex.axis=0.9,grid=FALSE) # use different colors require(RColorBrewer) col.cage= brewer.pal(9,"Blues") col.cpgi= brewer.pal(9,"YlGn") multiHeatMatrix(sml, clustfun=function(x) kmeans(x, centers=2)$cluster, cex.axis=0.8,xcoords=c(-1000,1000), winsorize=c(0,99),col=list(col.cage,col.cpgi), legend.name=c("tpm","coverage"),xlab="region around TSS")
Arithmetic method for ScoreMatrixList
## S4 method for signature 'numeric,ScoreMatrixList' Ops(e1, e2)
## S4 method for signature 'numeric,ScoreMatrixList' Ops(e1, e2)
e1 |
the numeric value |
e2 |
the |
ScoreMatrixList
Arithmetic method for ScoreMatrix and ScoreMatrixList classes
## S4 method for signature 'ScoreMatrix,ScoreMatrix' Ops(e1, e2)
## S4 method for signature 'ScoreMatrix,ScoreMatrix' Ops(e1, e2)
e1 |
the |
e2 |
the |
ScoreMatrix
Arithmetic method for ScoreMatrixList
## S4 method for signature 'ScoreMatrixList,numeric' Ops(e1, e2)
## S4 method for signature 'ScoreMatrixList,numeric' Ops(e1, e2)
e1 |
the |
e2 |
the numeric value |
ScoreMatrixList
Arithmetic methods for ScoreMatrixList
## S4 method for signature 'ScoreMatrixList,ScoreMatrixList' Ops(e1, e2)
## S4 method for signature 'ScoreMatrixList,ScoreMatrixList' Ops(e1, e2)
e1 |
the |
e2 |
the |
ScoreMatrixList
Reorder all elements of a ScoreMatrixList to a given ordering vector
orderBy(sml, ord.vec) ## S4 method for signature 'ScoreMatrixList' orderBy(sml, ord.vec)
orderBy(sml, ord.vec) ## S4 method for signature 'ScoreMatrixList' orderBy(sml, ord.vec)
sml |
|
ord.vec |
an integer vector |
ScoreMatrixList
object
library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10) kmeans.clust = kmeans(sml$cage,3) sml.ordered = orderBy(sml, kmeans.clust$cluster) multiHeatMatrix(sml.ordered)
library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10) kmeans.clust = kmeans(sml$cage,3) sml.ordered = orderBy(sml, kmeans.clust$cluster) multiHeatMatrix(sml.ordered)
The function produces a base-pair resolution matrix or matrices of scores that correspond to k-mer or PWM matrix occurrence over predefined windows that have equal width. It finds either positions of pattern hits above a specified threshold and creates score matrix filled with 1 (presence of pattern) and 0 (its absence) or matrix with scores themselves. If pattern is a character of length 1 or PWM matrix then the function returns a ScoreMatrix object, if character of length more than 1 or list of PWMs then ScoreMatrixList.
patternMatrix(pattern, windows, genome = NULL, min.score = 0.8, asPercentage = FALSE, cores = 1) \S4method{patternMatrix}{character,DNAStringSet}(pattern, windows, asPercentage, cores) \S4method{patternMatrix}{character,GRanges,BSgenome}(pattern, windows, genome, cores) \S4method{patternMatrix}{matrix,DNAStringSet}(pattern, windows, min.score, asPercentage, cores) \S4method{patternMatrix}{matrix,GRanges,BSgenome}(pattern, windows, genome, min.score, asPercentage, cores) \S4method{patternMatrix}{list,DNAStringSet}(pattern, windows, min.score, asPercentage, cores) \S4method{patternMatrix}{list,GRanges,BSgenome}(pattern, windows, genome, min.score, asPercentage, cores)
patternMatrix(pattern, windows, genome = NULL, min.score = 0.8, asPercentage = FALSE, cores = 1) \S4method{patternMatrix}{character,DNAStringSet}(pattern, windows, asPercentage, cores) \S4method{patternMatrix}{character,GRanges,BSgenome}(pattern, windows, genome, cores) \S4method{patternMatrix}{matrix,DNAStringSet}(pattern, windows, min.score, asPercentage, cores) \S4method{patternMatrix}{matrix,GRanges,BSgenome}(pattern, windows, genome, min.score, asPercentage, cores) \S4method{patternMatrix}{list,DNAStringSet}(pattern, windows, min.score, asPercentage, cores) \S4method{patternMatrix}{list,GRanges,BSgenome}(pattern, windows, genome, min.score, asPercentage, cores)
pattern |
matrix (a PWM matrix), list of matrices or a character vector of length 1 or more. A matrix is a PWM matrix that needs to have one row for each nucleotide ("A","C","G" and "T" respectively). IUPAC ambiguity codes can be used and it will match any letter in the subject that is associated with the code. |
windows |
|
genome |
|
min.score |
numeric or character indicating minimum score to count a match.
It can be given as a character string containing
a percentage of the highest possible score or a single number
(by default "80%" or 0.8). If min.score is set to NULL
then |
asPercentage |
boolean telling whether scores represent percentage of the maximal motif PWM score (default: TRUE) or raw scores (FALSE). |
cores |
the number of cores to use (default: 1). It is supported only on Unix-like platforms. |
patternMatrix
is based on functions from the seqPattern package:
getPatternOccurrenceList function to find position of pattern that is a character vector in
a list of sequences (a DNAStringSet object)
and adapted function motifScanHits to find pattern that is a PWM matrix
in sequences (a DNAStringSet object).
If cores > 1 is provided then for every window occurrence of pattern is counted in paralallel.
returns a scoreMatrix
object or a scoreMatrixList
object
library(Biostrings) # consensus sequence of the ctcf motif motif = "CCGCGNGGNGGCAG" # Creates 10 random DNA sequences seqs = sapply(1:10, function(x) paste(sample(c("A","T","G","C"), 180, replace=TRUE), collapse="")) windows = DNAStringSet(seqs) p = patternMatrix(pattern=motif, windows=windows, min.score=0.8) p
library(Biostrings) # consensus sequence of the ctcf motif motif = "CCGCGNGGNGGCAG" # Creates 10 random DNA sequences seqs = sapply(1:10, function(x) paste(sample(c("A","T","G","C"), 180, replace=TRUE), collapse="")) windows = DNAStringSet(seqs) p = patternMatrix(pattern=motif, windows=windows, min.score=0.8) p
Function calculates meta-profile(s) from a ScoreMatrix or a ScoreMatrixList, then produces a line plot or a set of line plots for meta-region profiles
plotMeta(mat, centralTend = "mean", overlay = TRUE, winsorize = c(0, 100), profile.names = NULL, xcoords = NULL, meta.rescale = FALSE, smoothfun = NULL, line.col = NULL, dispersion = NULL, dispersion.col = NULL, ylim = NULL, ylab = "average score", xlab = "bases", ...)
plotMeta(mat, centralTend = "mean", overlay = TRUE, winsorize = c(0, 100), profile.names = NULL, xcoords = NULL, meta.rescale = FALSE, smoothfun = NULL, line.col = NULL, dispersion = NULL, dispersion.col = NULL, ylim = NULL, ylab = "average score", xlab = "bases", ...)
mat |
|
centralTend |
a character that determines central tendency of meta-profile(s). It takes "mean" (default) or "median". |
overlay |
If TRUE multiple profiles will be overlayed in the same plot (Default:TRUE). If FALSE, and mat is a ScoreMatrixList, consider using par(mfrow=c(1,length(mat))) to see the plots from all matrices at once. |
winsorize |
Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers. |
profile.names |
a character vector for names of the profiles. The order should be same as the as the order of ScoreMatrixList. |
xcoords |
a numeric vector which designates relative base positions of the meta-region profiles. For example, for a 2001 column ScoreMatrix, xcoord=-1000:1000 indicates relative positions of each column in the score matrix. If NULL (Default), xcoords equals to 1:ncol(mat) |
meta.rescale |
if TRUE meta-region profiles are scaled to 0 to 1 range by subtracting the min from profiles and dividing them by max-min. If dispersion is not NULL, then dispersion will be scaled as well. |
smoothfun |
a function to smooth central tendency and dispersion bands (Default: NULL), e.g. stats::lowess. |
line.col |
color of lines for |
dispersion |
shows dispersion interval bands around
|
dispersion.col |
color of bands of |
ylim |
same as |
ylab |
same as |
xlab |
same as |
... |
other options to |
returns the meta-region profiles invisibly as a matrix.
Score matrices are plotted according to ScoreMatrixList order. If ScoreMatrixList contains more than one matrix then they will overlap each other on a plot, i.e. the first one is plotted first and every next one overlays previous one(s) and the last one is the topmost.
Missing values in data slow down plotting dispersion around central tendency.
The reason is that dispersion is plotted only for non-missing values,
for each segment that
contains numerical values graphics::polygon
function is used to plot dispersion bands.
There might be a situation, when in a column of ScoreMatrix is only one
numeric number and the rest are NAs, then at corresponding position
only central tendency will be plotted.
Notches show the 95 percent confidence interval for the median according to an approximation based on the normal distribution. They are used to compare groups - if notches corresponding to adjacent base pairs on the plot do not overlap, this is strong evidence that medians differ. Small sample sizes (5-10) can cause notches to extend beyond the interquartile range (IQR) (Martin Krzywinski et al. Nature Methods 11, 119-120 (2014))
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) # create a new ScoreMatrixList x=new("ScoreMatrixList",list(scores1,scores2)) plotMeta(mat=x,overlay=TRUE,main="my plotowski") # plot dispersion nd smooth central tendency and variation interval bands plotMeta(mat=x, centralTend="mean", dispersion="se", winsorize=c(0,99), main="Dispersion as interquartile band", lwd=4, smoothfun=function(x) stats::lowess(x, f = 1/5))
data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE) data(cpgi) scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE) # create a new ScoreMatrixList x=new("ScoreMatrixList",list(scores1,scores2)) plotMeta(mat=x,overlay=TRUE,main="my plotowski") # plot dispersion nd smooth central tendency and variation interval bands plotMeta(mat=x, centralTend="mean", dispersion="se", winsorize=c(0,99), main="Dispersion as interquartile band", lwd=4, smoothfun=function(x) stats::lowess(x, f = 1/5))
This function plots a pie or bar chart for showing percentages of targets annotated by genic parts or other query features
plotTargetAnnotation(x, precedence = TRUE, col = getColors(length(x@annotation)), cex.legend = 1, ...) ## S4 method for signature 'AnnotationByFeature' plotTargetAnnotation(x, precedence = TRUE, col = getColors(length(x@annotation)), cex.legend = 1, ...)
plotTargetAnnotation(x, precedence = TRUE, col = getColors(length(x@annotation)), cex.legend = 1, ...) ## S4 method for signature 'AnnotationByFeature' plotTargetAnnotation(x, precedence = TRUE, col = getColors(length(x@annotation)), cex.legend = 1, ...)
x |
a |
precedence |
TRUE|FALSE. If TRUE there will be a hierachy of annotation
features when calculating numbers
(with promoter>exon>intron precedence).
This option is only valid when x is a
|
col |
a vector of colors for piechart or the bar plot |
cex.legend |
a numeric value of length 1 to specify the size of the legend. By default 1. |
... |
graphical parameters to be passed to |
plots a piechart or a barplot for percentage of the target features overlapping with annotation
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) plotTargetAnnotation(annot)
data(cage) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE) plotTargetAnnotation(annot)
promoters of hg19 assembly of human genome on chr21 and chr22. Promoter set is derived from refseq TSS.
GRanges
object
getRandomEnrichment
function resultsThe resulting object stores the results of getRandomEnrichment
function
orig.cnt
:number of features overlapping with query at getRandomEnrichment
rand.olap.dist
:set of number of features overlapping with randomized queries at getRandomEnrichment
log2fc
: log2 fold change calculated by dividing orig.cnt
by mean(rand.olap.dist
) and taking log2 of that result
p.value
:P-value assuming rand.olap.dist
has a normal distribution and comparing orig.cnt
with that distribution
rand.p.value
:p-value from randomization by calculation the proportion of how many times a random number of overlap exceeds the original number of overlap
This function randomly distributes the coordinates of genomic features which
is stored in a GRanges
object. The randomization can be constrained by
supplied arguments.
The function is still in Beta mode - the regions can overlap excluded regions, and the randomized regions are not disjoint. Please take care that the excluded and included regions are not too strict when compared to the total width of the ranges.
randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1) ## S4 method for signature 'GRanges' randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1)
randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1) ## S4 method for signature 'GRanges' randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL, nrand = 1)
feature |
a GRanges object to be randomized |
chrom.sizes |
sizes of chromosomes as a named vector (names are chromsomes names and elements of the vectors are lengths). , if not given sizes in GRanges object will be used if no sizes there the end of each chr will be the end last feature on each chr |
stranded |
if FALSE, all of the returned features will be strandless (will have "*" in the strand slot) |
keep.strand.prop |
If TRUE strands will have the same proportion as the features |
keep.chrom |
If TRUE, number of features and randomized features for a chromosome will match. Currently seeting this to FALSE is not supported. |
exclude |
A GRanges object where no randomized feature should overlap, can be gaps or unmappable regions in the genome as an example. |
include |
A GRanges object which defines the boundaries of randomized features. If not provided the whole genome is used, as defined using the chrom.sizes parameter. |
seed |
random number generator seed |
nrand |
number of randomizations (default:1) |
returns a GRanges object which is randomized version of the feature, along with a "set" column in the metadata which designates to which iteration of the randomization the range belong.
The function reads a BED file that contains location and other information
on genomic features and returns a GRanges
object.
The minimal information that the BED file has to have is chromosome,
start and end columns. it can handle all BED formats up to 12 columns.
readBed(file, track.line = FALSE, remove.unusual = FALSE, zero.based = TRUE)
readBed(file, track.line = FALSE, remove.unusual = FALSE, zero.based = TRUE)
file |
location of the file, a character string such as: "/home/user/my.bed"
or the input itself as a string (containing at least one \n).
The file can end in |
track.line |
the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines |
remove.unusual |
if TRUE remove the chromosomes with unsual names, such as chrX_random (Default:FALSE) |
zero.based |
a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE) |
GRanges
object
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation") refseq = readBed(my.file,track.line=FALSE,remove.unusual=FALSE) head(refseq)
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation") refseq = readBed(my.file,track.line=FALSE,remove.unusual=FALSE) head(refseq)
A function to read the Encode formatted broad peak file into a GRanges object
readBroadPeak(file, track.line=FALSE, zero.based=TRUE)
readBroadPeak(file, track.line=FALSE, zero.based=TRUE)
file |
an absolute or relative path to a bed file formatted by the Encode broadPeak standard.
The file can end in |
track.line |
the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines |
zero.based |
a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE) |
a GRanges object
broad.peak.file = system.file('extdata',"ex.broadPeak", package='genomation') broad.peak = readBroadPeak(broad.peak.file) head(broad.peak)
broad.peak.file = system.file('extdata',"ex.broadPeak", package='genomation') broad.peak = readBroadPeak(broad.peak.file) head(broad.peak)
A function to read-in genomic features and their upstream and downstream adjecent regions such as CpG islands and their shores
readFeatureFlank(location,remove.unusual=TRUE,flank=2000, clean=TRUE,feature.flank.name=NULL) ## S4 method for signature 'character' readFeatureFlank(location, remove.unusual = TRUE, flank = 2000, clean = TRUE, feature.flank.name = NULL)
readFeatureFlank(location,remove.unusual=TRUE,flank=2000, clean=TRUE,feature.flank.name=NULL) ## S4 method for signature 'character' readFeatureFlank(location, remove.unusual = TRUE, flank = 2000, clean = TRUE, feature.flank.name = NULL)
location |
for the bed file of the feature. |
remove.unusual |
remove chromsomes with unsual names random, Un and antyhing with "_" character |
flank |
number of basepairs for the flanking regions |
clean |
If set to TRUE, flanks overlapping with other main features will be trimmed |
feature.flank.name |
the names for feature and flank ranges, it should be a character vector of length 2. example: c("CpGi","shores") |
a GRangesList object containing one GRanges object for flanks and one for GRanges object for the main feature. NOTE: This can not return a CompressedGRangesList at the moment because flanking regions do not have to have the same column name as the feature. CompressedGRangesList elements should resemble each other in the column content. We can not satisfy that criteria for the flanks
cgi.path = system.file('extdata/chr21.CpGi.hg19.bed', package='genomation') cgi.shores = readFeatureFlank(cgi.path) cgi.shores
cgi.path = system.file('extdata/chr21.CpGi.hg19.bed', package='genomation') cgi.shores = readFeatureFlank(cgi.path) cgi.shores
The function reads a tabular text file that contains location and other information
on genomic features and returns a GRanges
object.
The minimal information that the file has to have is chromosome,
start and end columns. Strand information is not compulsory.
readGeneric(file, chr = 1, start = 2, end = 3, strand = NULL, meta.cols = NULL, keep.all.metadata = FALSE, zero.based = FALSE, remove.unusual = FALSE, header = FALSE, skip = 0, sep = "\t")
readGeneric(file, chr = 1, start = 2, end = 3, strand = NULL, meta.cols = NULL, keep.all.metadata = FALSE, zero.based = FALSE, remove.unusual = FALSE, header = FALSE, skip = 0, sep = "\t")
file |
location of the file, a character string such as: "/home/user/my.bed" or the input itself as a string (containing at least one \n). |
chr |
number of the column that has chromsomes information in the table (Def:1) |
start |
number of the column that has start coordinates in the table (Def:2) |
end |
number of the column that has end coordinates in the table (Def:3) |
strand |
number of the column that has strand information, only -/+ is accepted (Default:NULL) |
meta.cols |
named |
keep.all.metadata |
|
zero.based |
a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: FALSE) |
remove.unusual |
if TRUE(default) remove the chromosomes with unsual names, such as chrX_random (Default:FALSE) |
header |
whether the original file contains a header line
which designates the column names. If |
skip |
number of lines to skip. If there is a header line(s) you do not wish to include you can use skip argument to skip that line. |
sep |
a single character which designates the separator in the file. The default value is tab. |
GRanges
object
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation") refseq = readGeneric(my.file,chr=1,start=2,end=3,strand=NULL, meta.cols=list(score=5,name=4), keep.all.metadata=FALSE, zero.based=TRUE) head(refseq)
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation") refseq = readGeneric(my.file,chr=1,start=2,end=3,strand=NULL, meta.cols=list(score=5,name=4), keep.all.metadata=FALSE, zero.based=TRUE) head(refseq)
A function to read the Encode formatted narrowPeak file into a GRanges object
readNarrowPeak(file, track.line=FALSE, zero.based=TRUE)
readNarrowPeak(file, track.line=FALSE, zero.based=TRUE)
file |
an absolute or relative path to a bed file formatted by the Encode
narrowPeak standard. The file can end in |
track.line |
the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines |
zero.based |
a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE) |
a GRanges object
narrow.peak.file = system.file('extdata',"ex.narrowPeak", package='genomation') narrow.peak = readBroadPeak(narrow.peak.file) head(narrow.peak)
narrow.peak.file = system.file('extdata',"ex.narrowPeak", package='genomation') narrow.peak = readBroadPeak(narrow.peak.file) head(narrow.peak)
Function for reading exon intron and promoter structure from a given bed file
readTranscriptFeatures(location,remove.unusual=TRUE, up.flank=1000,down.flank=1000,unique.prom=TRUE) ## S4 method for signature 'character' readTranscriptFeatures(location, remove.unusual = TRUE, up.flank = 1000, down.flank = 1000, unique.prom = TRUE)
readTranscriptFeatures(location,remove.unusual=TRUE, up.flank=1000,down.flank=1000,unique.prom=TRUE) ## S4 method for signature 'character' readTranscriptFeatures(location, remove.unusual = TRUE, up.flank = 1000, down.flank = 1000, unique.prom = TRUE)
location |
location of the bed file with 12 or more columns.
The file can end in |
remove.unusual |
remove the chromomesomes with unsual names, mainly random chromsomes etc |
up.flank |
up-stream from TSS to detect promoter boundaries |
down.flank |
down-stream from TSS to detect promoter boundaries |
unique.prom |
get only the unique promoters, promoter boundaries will not have a gene name if you set this option to be TRUE |
a GRangesList
containing locations of exon/intron/promoter/TSS
one bed track per file is only accepted, the bed files with multiple tracks will cause en error
my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") my.bed12.file feats = readTranscriptFeatures(my.bed12.file) names(feats) sapply(feats, head)
my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") my.bed12.file feats = readTranscriptFeatures(my.bed12.file) names(feats) sapply(feats, head)
Scales the values in the matrix by rows and/or columns
scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL) ## S4 method for signature 'ScoreMatrix' scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL)
scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL) ## S4 method for signature 'ScoreMatrix' scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL)
mat |
|
columns |
|
rows |
|
scalefun |
function object that takes as input a matrix and returns a matrix. By default the argument is set to (x - mean(x))/(max(x)-min(x)+1) |
ScoreMatrix
object
# scale the rows of a scoreMatrix object library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7), strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+')) windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) sm = ScoreMatrix(target, windows) ssm = scaleScoreMatrix(sm, rows=TRUE)
# scale the rows of a scoreMatrix object library(GenomicRanges) target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), weight = rep(c(1,2),each=7), strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+')) windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), strand=c('-','+','-','+')) sm = ScoreMatrix(target, windows) ssm = scaleScoreMatrix(sm, rows=TRUE)
Scales each ScoreMatrix in the ScoreMatrixList object, by rows and/or columns
scaleScoreMatrixList(sml, columns, rows, scalefun) ## S4 method for signature 'ScoreMatrixList' scaleScoreMatrixList(sml, columns = FALSE, rows = TRUE, scalefun = NULL)
scaleScoreMatrixList(sml, columns, rows, scalefun) ## S4 method for signature 'ScoreMatrixList' scaleScoreMatrixList(sml, columns = FALSE, rows = TRUE, scalefun = NULL)
sml |
a |
columns |
a |
rows |
a |
scalefun |
a function object that takes as input a matrix and returns a matrix. By default the argument is set to the R scale function with center=TRUE and scale=TRUE |
ScoreMatrixList
object
library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE) sml.scaled = scaleScoreMatrixList(sml, rows=TRUE) sml.scaled multiHeatMatrix(sml)
library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE) sml.scaled = scaleScoreMatrixList(sml, rows=TRUE) sml.scaled multiHeatMatrix(sml)
The funcion produces a base-pair resolution matrix of scores for given equal
width windows of interest. The returned matrix can be used to
draw meta profiles or heatmap of read coverage or wig track-like data.
The windows
argument can be a predefined region around transcription start sites
or other regions of interest that have equal lengths
The function removes all window that fall off the Rle object -
have the start coordinate < 1 or end coordinate > length(Rle)
The function takes the intersection of names in the Rle and GRanges objects.
On Windows OS the function will give an error if the target is a file in .bigWig format.
ScoreMatrix(target, windows, strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL) \S4method{ScoreMatrix}{RleList,GRanges}(target,windows,strand.aware) \S4method{ScoreMatrix}{GRanges,GRanges}(target, windows, strand.aware, weight.col, is.noCovNA) \S4method{ScoreMatrix}{character,GRanges}(target, windows, strand.aware, weight.col=NULL,is.noCovNA=FALSE, type='auto', rpm=FALSE, unique=FALSE, extend=0, param=NULL, bam.paired.end=FALSE, library.size=NULL)
ScoreMatrix(target, windows, strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL) \S4method{ScoreMatrix}{RleList,GRanges}(target,windows,strand.aware) \S4method{ScoreMatrix}{GRanges,GRanges}(target, windows, strand.aware, weight.col, is.noCovNA) \S4method{ScoreMatrix}{character,GRanges}(target, windows, strand.aware, weight.col=NULL,is.noCovNA=FALSE, type='auto', rpm=FALSE, unique=FALSE, extend=0, param=NULL, bam.paired.end=FALSE, library.size=NULL)
target |
|
windows |
|
strand.aware |
If TRUE (default: FALSE), the strands of the
windows will be taken into account in the resulting
|
weight.col |
if the object is |
is.noCovNA |
(Default:FALSE) if TRUE,and if 'target' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values. |
type |
(Default:"auto") if target is a character vector of file paths, then type designates the type of the corresponding files (bam or bigWig). |
rpm |
boolean telling whether to normalize the coverage to per milion
reads. FALSE by default. See |
unique |
boolean which tells the function to remove duplicated reads based on chr, start, end and strand |
extend |
numeric which tells the function to extend the reads to width=extend |
param |
ScanBamParam object |
bam.paired.end |
boolean indicating whether given BAM file contains paired-end reads (default:FALSE). Paired-reads will be treated as fragments. |
library.size |
numeric indicating total number of mapped reads in a BAM file
( |
returns a ScoreMatrix
object
We assume that a paired-end BAM file contains reads with unique ids and we remove
both mates of reads if they are repeated. Due to the fact that ScoreMatrix
uses the GenomicAlignments:readGAlignmentPairs function to read paired-end BAM files
a duplication of reads occurs when mates of one pair map into two different windows.
Strands of reads in a paired-end BAM are inferred depending on strand of first alignment from the pair. This is a default setting in the GenomicAlignments:readGAlignmentPairs function (see a strandMode argument). This mode should be used when the paired-end data was generated using one of the following stranded protocols: Directional Illumina (Ligation), Standard SOLiD.
# When target is GRanges data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE, weight.col="tpm") # When target is RleList library(GenomicRanges) covs = coverage(cage) scores2 = ScoreMatrix(target=covs,windows=promoters,strand.aware=TRUE) scores2 # When target is a bam file bam.file = system.file('unitTests/test.bam', package='genomation') windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5)) scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam') scores3
# When target is GRanges data(cage) data(promoters) scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE, weight.col="tpm") # When target is RleList library(GenomicRanges) covs = coverage(cage) scores2 = ScoreMatrix(target=covs,windows=promoters,strand.aware=TRUE) scores2 # When target is a bam file bam.file = system.file('unitTests/test.bam', package='genomation') windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5)) scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam') scores3
ScoreMatrix
function resultsThe resulting object is an extension of a matrix
object, and stores values (typically genome-wide scores) for a predefined set of regions
Each row on the ScoreMatrix is a predefined region (Ex: CpG islands, promoters) and columns are values across those regions.
see ScoreMatrix
as(from, "matrix"): Creates a matrix from ScoreMatrix
object. You can also use S3Part() function to extract the matrix from ScoreMatrix
object.
In the code snippets below, x is a ScoreMatrix object.
'x[i,j]'
: Get or set elements from row i
and column j
and return a subset ScoreMatrix object.
The function first bins each window to equal number of bins, and calculates
the a summary matrix for scores of each bin (currently, mean, max and min supported)
A scoreMatrix object can be used to draw average profiles or heatmap of read coverage
or wig track-like data. windows
can be a predefined region such as CpG islands,
gene bodies, transcripts or CDS (coding sequences) that are not necessarily equi-width.
Each window will be chopped to equal number of bins based on bin.num
option.
ScoreMatrixBin(target, windows, bin.num = 10, bin.op = "mean", strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL) \S4method{ScoreMatrixBin}{RleList,GRanges}(target, windows, bin.num, bin.op, strand.aware) \S4method{ScoreMatrixBin}{GRanges,GRanges}(target,windows, bin.num,bin.op, strand.aware,weight.col, is.noCovNA) \S4method{ScoreMatrixBin}{character,GRanges}(target, windows, bin.num=10, bin.op='mean',strand.aware, is.noCovNA=FALSE, type='auto', rpm, unique, extend, param, bam.paired.end=FALSE, library.size=NULL) \S4method{ScoreMatrixBin}{RleList,GRangesList}(target,windows, bin.num, bin.op, strand.aware) \S4method{ScoreMatrixBin}{GRanges,GRangesList}(target,windows, bin.num,bin.op, strand.aware,weight.col, is.noCovNA) \S4method{ScoreMatrixBin}{character,GRangesList}(target, windows, bin.num=10, bin.op='mean',strand.aware, weight.col=NULL, is.noCovNA=FALSE, type='auto', rpm, unique, extend, param, bam.paired.end=FALSE, library.size=NULL)
ScoreMatrixBin(target, windows, bin.num = 10, bin.op = "mean", strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL) \S4method{ScoreMatrixBin}{RleList,GRanges}(target, windows, bin.num, bin.op, strand.aware) \S4method{ScoreMatrixBin}{GRanges,GRanges}(target,windows, bin.num,bin.op, strand.aware,weight.col, is.noCovNA) \S4method{ScoreMatrixBin}{character,GRanges}(target, windows, bin.num=10, bin.op='mean',strand.aware, is.noCovNA=FALSE, type='auto', rpm, unique, extend, param, bam.paired.end=FALSE, library.size=NULL) \S4method{ScoreMatrixBin}{RleList,GRangesList}(target,windows, bin.num, bin.op, strand.aware) \S4method{ScoreMatrixBin}{GRanges,GRangesList}(target,windows, bin.num,bin.op, strand.aware,weight.col, is.noCovNA) \S4method{ScoreMatrixBin}{character,GRangesList}(target, windows, bin.num=10, bin.op='mean',strand.aware, weight.col=NULL, is.noCovNA=FALSE, type='auto', rpm, unique, extend, param, bam.paired.end=FALSE, library.size=NULL)
target |
|
windows |
|
bin.num |
single |
bin.op |
bin operation that is either one of the following strings: "max","min","mean","median","sum". The operation is applied on the values in the bin. Defaults to "mean" |
strand.aware |
If TRUE (default: FALSE), the strands of the windows will
be taken into account in the resulting |
weight.col |
if the object is |
is.noCovNA |
(Default:FALSE) if TRUE,and if 'target' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values. |
type |
(Default:"auto") if target is a character vector of file paths, then type designates the type of the corresponding files (bam or bigWig) |
rpm |
boolean telling whether to normalize the coverage to per milion reads.
FALSE by default. See |
unique |
boolean which tells the function to remove duplicated reads based on chr, start, end and strand |
extend |
numeric which tells the function to extend the reads to width=extend |
param |
ScanBamParam object |
bam.paired.end |
boolean indicating whether given BAM file contains paired-end reads (default:FALSE). Paired-reads will be treated as fragments. |
library.size |
numeric indicating total number of mapped reads in a BAM file
( |
returns a scoreMatrix
object
data(cage) data(cpgi) data(promoters) myMat=ScoreMatrixBin(target=cage, windows=cpgi,bin.num=10,bin.op="mean",weight.col="tpm") plot(colMeans(myMat,na.rm=TRUE),type="l") myMat2=ScoreMatrixBin(target=cage, windows=promoters,bin.num=10,bin.op="mean", weight.col="tpm",strand.aware=TRUE) plot(colMeans(myMat2,na.rm=TRUE),type="l") # Compute transcript coverage of a set of exons. library(GenomicRanges) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) transcripts = split(gene.parts$exons, gene.parts$exons$name) transcripts = transcripts[] myMat3 = ScoreMatrixBin(target=cage, windows=transcripts[1:250], bin.num=10) myMat3
data(cage) data(cpgi) data(promoters) myMat=ScoreMatrixBin(target=cage, windows=cpgi,bin.num=10,bin.op="mean",weight.col="tpm") plot(colMeans(myMat,na.rm=TRUE),type="l") myMat2=ScoreMatrixBin(target=cage, windows=promoters,bin.num=10,bin.op="mean", weight.col="tpm",strand.aware=TRUE) plot(colMeans(myMat2,na.rm=TRUE),type="l") # Compute transcript coverage of a set of exons. library(GenomicRanges) bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation") gene.parts = readTranscriptFeatures(bed.file) transcripts = split(gene.parts$exons, gene.parts$exons$name) transcripts = transcripts[] myMat3 = ScoreMatrixBin(target=cage, windows=transcripts[1:250], bin.num=10) myMat3
The function constructs a list of ScoreMatrix
objects in the form
of ScoreMatrixList
object. This object can be visualized using
multiHeatMatrix
, heatMeta
or plotMeta
ScoreMatrixList(targets, windows = NULL, bin.num = NULL, bin.op = "mean", strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, library.size = NULL, cores = 1)
ScoreMatrixList(targets, windows = NULL, bin.num = NULL, bin.op = "mean", strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE, extend = 0, param = NULL, library.size = NULL, cores = 1)
targets |
can be a list of |
windows |
|
bin.num |
an integer telling the number of bins to bin the score matrix |
bin.op |
an name of the function that will be used for smoothing windows of ranges |
strand.aware |
a boolean telling the function whether to reverse the coverage of ranges that come from - strand (e.g. when plotting enrichment around transcription start sites) |
weight.col |
if the object is |
is.noCovNA |
(Default:FALSE) if TRUE,and if 'targets' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values. |
type |
(Default:"auto")
if |
rpm |
boolean telling whether to normalize the coverage to per milion reads.
FALSE by default. See |
unique |
boolean which tells the function to remove duplicated reads based on chr, start, end and strand |
extend |
numeric which tells the function to extend the features ( i.e aligned reads) to total length ofwidth+extend |
param |
ScanBamParam object |
library.size |
a numeric vector of the same length as |
cores |
the number of cores to use (default: 1) |
returns a ScoreMatrixList
object
# visualize the distribution of cage clusters and cpg islands around promoters library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE) sml multiHeatMatrix(sml)
# visualize the distribution of cage clusters and cpg islands around promoters library(GenomicRanges) data(cage) data(cpgi) data(promoters) cage$tpm = NULL targets = GRangesList(cage=cage, cpgi=cpgi) sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE) sml multiHeatMatrix(sml)
ScoreMatrixList
The resulting object is an extension of a list
object, where each element corresponds to a score matrix object
see ScoreMatrixList
as(from, "ScoreMatrixList"): Creates a ScoreMatrixList
object from a list containing ScoreMatrix
or ScoreMatrixBin
objects.
In the code snippets below, x is a ScoreMatrixList object.
x[[i]]
,x[[i]]
: Get or set elements i
, where i
is a numeric or character vector of length 1.
x$name
, x$name
: value: Get or set element name
, where name
is a name or character vector of length 1.
show method for some of the genomation classes
## S4 method for signature 'RandomEnrichment' show(object) ## S4 method for signature 'AnnotationByGeneParts' show(object) ## S4 method for signature 'AnnotationByFeature' show(object) ## S4 method for signature 'ScoreMatrix' show(object) ## S4 method for signature 'ScoreMatrixList' show(object)
## S4 method for signature 'RandomEnrichment' show(object) ## S4 method for signature 'AnnotationByGeneParts' show(object) ## S4 method for signature 'AnnotationByFeature' show(object) ## S4 method for signature 'ScoreMatrix' show(object) ## S4 method for signature 'ScoreMatrixList' show(object)
object |
object of class RandomEnrichment |
Shows the dimension of the ScoreMatrix
Shows the number of matrices and their sizes