Package 'genomation'

Title:	Summary, annotation and visualization of genomic data
Description:	A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, It can use BAM or BigWig files as input.
Authors:	Altuna Akalin [aut, cre], Vedran Franke [aut, cre], Katarzyna Wreczycka [aut], Alexander Gosdschan [ctb], Liz Ing-Simmons [ctb], Bozena Mika-Gospodorz [ctb]
Maintainer:	Altuna Akalin <[email protected]>, Vedran Franke <[email protected]>, Katarzyna Wreczycka <[email protected]>
License:	Artistic-2.0
Version:	1.39.0
Built:	2025-03-29 05:19:01 UTC
Source:	https://github.com/bioc/genomation

Help Index

Extract method for a ScoreMatrix object.
Extract method for a ScoreMatrixList object.
Function to annotate given GRanges object with a given genomic feature
Function to annotate a given GRanges object with promoter,exon,intron & intergenic values
Annotate given ranges with genomic features
Annotate given object with promoter, exon, intron and intergenic regions
An S4 class that information on overlap of target features with annotation features
An S4 class that information on overlap of target features with annotation features
Bins the columns of a matrix using a user provided function
c.ScoreMatrix
c.ScoreMatrixList
Example CAGE data set.
function that calculates the significance of overlaps of two sets of features using randomization
convert a data frame read-in from a bed file to a GRanges object for exons
convert a data frame read-in from a bed file to a GRanges object for introns
convert a data frame read-in from a bed file to a GRanges object
Example CpG island data set.
Compute an enrichment of IP over control both stored in ScoreMatrix objects
Compute an enrichment of IP (stored in ScoreMatrixList object) over control (stored in ScoreMatrix object)
Compute an enrichment of IP over control both stored in ScoreMatrixList objects
Find combitations of genomic features
Example RefSeq genes data set.
Get distance to nearest TSS and gene id from AnnotationByGeneParts
Get the percentage/count of annotation features overlapping with target features from AnnotationByFeature
Function to get upstream and downstream adjecent regions to a genomic feature such as CpG islands
Get the membership slot of AnnotationByFeature
get enrichment based on randomized feature overlap
Get the percentage of target features overlapping with annotation from AnnotationByFeature
Converts a gff formated data.frame into a GenomicRanges object. The GenomicRanges object needs to be properly formated for the function to work.
Draw a heatmap of a given ScoreMatrix object
Heatmap for meta-region profiles
Plots the percentage of overlapping intervals with genomic features in a heatmap
Get common rows from all matrices in a ScoreMatrixList object
Draw multiple heatmaps from a ScoreMatrixList object
Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.
Ops method for a ScoreMatrix object. It enables to use arithmetic, indicator and logic operations on ScoreMatrix objects.
Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.
Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.
Reorder all elements of a ScoreMatrixList to a given ordering vector
Get scores that correspond to k-mer or PWM matrix occurrence for bases in each window
Line plot(s) for meta-region profiles
Plot annotation categories from AnnotationByGeneParts or AnnotationByFeature
Example promoter data set.
An S4 class for storing getRandomEnrichment function results
function that randomizes the genomic coordinates
Read a BED file and convert it to GRanges.
A function to read the Encode formatted broad peak file into a GRanges object
A function to read-in genomic features and their upstream and downstream adjecent regions such as CpG islands and their shores
Read a tabular file and convert it to GRanges.
A function to read the Encode formatted narrowPeak file into a GRanges object
Function for reading exon intron and promoter structure from a given bed file
Scales the values in the matrix by rows and/or columns
Scale the ScoreMatrixList
Get base-pair score for bases in each window
An S4 class for storing ScoreMatrix function results
Get bin score for bins on each window
Make ScoreMatrixList from multiple targets
An S4 class for storing a set of ScoreMatrixList
show method for some of the genomation classes

Extract method for a ScoreMatrix object.

Description

Extract method for a ScoreMatrix object.

Usage

## S4 method for signature 'ScoreMatrix,ANY,ANY,ANY'
x[i, j]
## S4 method for signature 'ScoreMatrix,ANY,ANY,ANY'
x[i, j]

Arguments

`x`	the `ScoreMatrix` object
`i`	numeric value
`j`	numeric value

Extract method for a ScoreMatrixList object.

Description

Extract method for a ScoreMatrixList object.

Usage

## S4 method for signature 'ScoreMatrixList,ANY,ANY,ANY'
x[i]
## S4 method for signature 'ScoreMatrixList,ANY,ANY,ANY'
x[i]

Arguments

`x`	the `ScoreMatrixList` object
`i`	numeric value

Function to annotate given GRanges object with a given genomic feature

Description

Function to annotate given GRanges object with a given genomic feature

Usage

annotateWithFeature(target, feature, strand = FALSE, extend = 0,
  feature.name = NULL, intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRanges'
annotateWithFeature(target, feature,
  strand = FALSE, extend = 0, feature.name = NULL,
  intersect.chr = FALSE)
annotateWithFeature(target, feature, strand = FALSE, extend = 0,
  feature.name = NULL, intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRanges'
annotateWithFeature(target, feature,
  strand = FALSE, extend = 0, feature.name = NULL,
  intersect.chr = FALSE)

Arguments

`target`	a GRanges object storing chromosome locations to be annotated
`feature`	a GRanges object storing chromosome locations of a feature (can be CpG islands, ChIP-seq peaks, etc)
`strand`	If set to TRUE, annotation features and target features will be overlapped based on strand (def:FAULT)
`extend`	specifiying a positive value will extend the feature on both sides as much as `extend`
`feature.name`	name of the annotation feature. For example: H3K4me1,CpGisland etc. by default the name is taken from the given variable
`intersect.chr`	boolean, whether to select only chromosomes that are common to feature and target. FALSE by default

Value

returns an AnnotationByFeature object

Examples

data(cpgi)
data(promoters)
annot = annotateWithFeature(cpgi, promoters)


data(cpgi)
data(promoters)
annot = annotateWithFeature(cpgi, promoters)

Function to annotate a given GRanges object with promoter,exon,intron & intergenic values

Description

Function to annotate a given GRanges object with promoter,exon,intron & intergenic values

Usage

annotateWithFeatureFlank(target, feature, flank, feature.name = NULL,
  flank.name = "flank", strand = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRanges,GRanges'
annotateWithFeatureFlank(target, feature,
  flank, feature.name = NULL, flank.name = "flank", strand = FALSE,
  intersect.chr = FALSE)
annotateWithFeatureFlank(target, feature, flank, feature.name = NULL,
  flank.name = "flank", strand = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRanges,GRanges'
annotateWithFeatureFlank(target, feature,
  flank, feature.name = NULL, flank.name = "flank", strand = FALSE,
  intersect.chr = FALSE)

Arguments

`target`	a granges object storing chromosome locations to be annotated
`feature`	a granges object storing chromosome locations of a feature (can be CpG islands, ChIP-seq peaks, etc)
`flank`	a granges object storing chromosome locations of the flanks of the feature
`feature.name`	string for the name of the feature
`flank.name`	string for the name of the flanks
`strand`	If set to TRUE, annotation features and target features will be overlapped based on strand (def:FAULT)
`intersect.chr`	boolean, whether to select only chromosomes that are common to feature and target. FALSE by default

Value

returns an AnnotationByFeature object

Examples

data(cpgi)
data(cage)
cpgi.flanks = getFlanks(cpgi)
flank.annot = annotateWithFeatureFlank(cage, cpgi, cpgi.flanks)

data(cpgi)
data(cage)
cpgi.flanks = getFlanks(cpgi)
flank.annot = annotateWithFeatureFlank(cage, cpgi, cpgi.flanks)

Annotate given ranges with genomic features

Description

The function annotates a target GRangesList or GRanges object as overlapping or not with the elements of a named GRangesList. This is useful to annotate your regions of interest with genomic features with arbitrary categories such as repeat classes or families, or output from genome segmentation alogorithms such as chromHMM.

Usage

annotateWithFeatures(target, features, strand.aware = FALSE,
  intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRangesList'
annotateWithFeatures(target, features,
  strand.aware = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRangesList,GRangesList'
annotateWithFeatures(target, features,
  strand.aware = FALSE, intersect.chr = FALSE)
annotateWithFeatures(target, features, strand.aware = FALSE,
  intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRangesList'
annotateWithFeatures(target, features,
  strand.aware = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRangesList,GRangesList'
annotateWithFeatures(target, features,
  strand.aware = FALSE, intersect.chr = FALSE)

Arguments

`target`	GRanges or GRangesList object storing chromosome locations to be annotated (e.g. chipseq peaks)
`features`	a named GRangesList object containing GRanges objects different set of features. The function calculates percent overlaps with and without precendence at the same time. The order of objects in GRangesList defines their precedence. If a range in `target` overlaps with a more precedent range in an element of `features`, the other overlaps from other less precedent elments will be discarded. This is useful for getting piecharts where percentages should add up to 100.
`strand.aware`	if set to TRUE, annotation features and target features will be overlapped based on strand (def:FALSE)
`intersect.chr`	logical value, whether to select only chromosomes that are common to feature and target. FALSE by default

Value

returns an AnnotationByFeature object or if target is a GRangesList, a list of AnnotationByFeature objects.

Examples

library(GenomicRanges)
data(cage)
data(cpgi)
cage$tpm=NULL
gl = GRangesList(cage=cage, cpgi=cpgi)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)

library(GenomicRanges)
data(cage)
data(cpgi)
cage$tpm=NULL
gl = GRangesList(cage=cage, cpgi=cpgi)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)

Annotate given object with promoter, exon, intron and intergenic regions

Description

The function annotates GRangesList or GRanges object as overlapping with promoter,exon,intron or intergenic regions.

Usage

annotateWithGeneParts(target, feature, strand = FALSE,
  intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRangesList'
annotateWithGeneParts(target, feature,
  strand = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRangesList,GRangesList'
annotateWithGeneParts(target, feature,
  strand = FALSE, intersect.chr = FALSE)
annotateWithGeneParts(target, feature, strand = FALSE,
  intersect.chr = FALSE)

## S4 method for signature 'GRanges,GRangesList'
annotateWithGeneParts(target, feature,
  strand = FALSE, intersect.chr = FALSE)

## S4 method for signature 'GRangesList,GRangesList'
annotateWithGeneParts(target, feature,
  strand = FALSE, intersect.chr = FALSE)

Arguments

`target`	`GRanges` or `GRangesList` object storing chromosome locations to be annotated (e.g. chipseq peaks)
`feature`	GRangesList object containing GRanges object for promoter, exons, introns and transcription start sites, or simply output of readTranscriptFeatures function
`strand`	If set to TRUE, annotation features and target features will be overlapped based on strand (def:FALSE)
`intersect.chr`	boolean, whether to select only chromosomes that are common to feature and target. FALSE by default

Value

AnnotationByGeneParts object or a list of AnnotationByGeneParts objects if target is a GRangesList object.

Examples

data(cage)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
cage.annot
data(cage)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
cage.annot

An S4 class that information on overlap of target features with annotation features

Description

This object is desgined to hold statistics and information about genomic feature overlaps

Slots

members: a matrix showing overlap of target features with annotation genomic features
annotation: a named vector of percentages
precedence: a named vector of percentages
num.annotation: vector
num.precedence: vector
no.of.OlapFeat: vector
perc.of.OlapFeat: vector

An S4 class that information on overlap of target features with annotation features

Description

This object is desgined to hold statistics and information about genomic feature overlaps

Slots

members: a matrix showing overlap of target features with annotation genomic features
annotation: a named vector of percentages
precedence: a named vector of percentages
num.annotation: vector
num.precedence: vector
no.of.OlapFeat: vector
perc.of.OlapFeat: vector
dist.to.TSS: a data frame showing distances to TSS and gene/TSS names and strand

Bins the columns of a matrix using a user provided function

Description

Bins the columns of a matrix using a user provided function

Usage

binMatrix(x, bin.num = NULL, fun = "mean")

## S4 method for signature 'ScoreMatrix'
binMatrix(x, bin.num = NULL, fun = "mean")

## S4 method for signature 'ScoreMatrixList'
binMatrix(x, bin.num = NULL, fun = "mean")
binMatrix(x, bin.num = NULL, fun = "mean")

## S4 method for signature 'ScoreMatrix'
binMatrix(x, bin.num = NULL, fun = "mean")

## S4 method for signature 'ScoreMatrixList'
binMatrix(x, bin.num = NULL, fun = "mean")

Arguments

`x`	`ScoreMatrix` or a `ScoreMatrixList` object
`bin.num`	`integer` number of bins in the final matrix
`fun`	`character` vector or an anonymous function that will be used for binning

Value

ScoreMatrix or ScoreMatrixList object

Examples


# binning the columns in a ScoreMatrix object
library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
weight = rep(c(1,2),each=7), 
strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+'))
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), 
strand=c('-','+','-','+'))
sm = ScoreMatrix(target, windows)
bin = binMatrix(sm, bin.num=2)

# binning the columns in a ScoreMatrix object
library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
weight = rep(c(1,2),each=7), 
strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+'))
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), 
strand=c('-','+','-','+'))
sm = ScoreMatrix(target, windows)
bin = binMatrix(sm, bin.num=2)

c.ScoreMatrix

Description

Combine a scoreMatrix into a scoreMatrixList object - when a ScoreMatrix is a first argument

Usage

## S3 method for class 'ScoreMatrix'
c(..., recursive = FALSE, use.names = TRUE)
## S3 method for class 'ScoreMatrix'
c(..., recursive = FALSE, use.names = TRUE)

Arguments

`...`	contais scoreMatrix and scoreMatrixList objects
`recursive`	logical
`use.names`	logical

Value

returns a scoreMatrixList object

c.ScoreMatrixList

Description

Combine a scoreMatrix into a scoreMatrixList object - when a ScoreMatrixList is a first argument

Usage

## S3 method for class 'ScoreMatrixList'
c(..., recursive = FALSE, use.names = TRUE)
## S3 method for class 'ScoreMatrixList'
c(..., recursive = FALSE, use.names = TRUE)

Arguments

`...`	contais scoreMatrix and scoreMatrixList objects
`recursive`	logical
`use.names`	logical

Value

returns a scoreMatrixList object

Example CAGE data set.

Description

Location and tag per million values for CAGE TSS clusters on chr21 and chr22 of human genome (hg19 assembly). The clusters are dowloaded from ENCODE project downloads for NHEK cells.

Format

GRanges object

function that calculates the significance of overlaps of two sets of features using randomization

Description

This function calculates the significance of overlaps of two sets of features using randomization. #' It returns a distributon of overlaps of a target set with a given randomized feature set. The randomization can be constrained by supplied arguments. The function is still in Beta mode - the regions can overlap excluded regions, and the randomized regions are not disjoint. Please take care that the excluded and included regions are not too strict when compared to the total width of the ranges.

Usage

calculateOverlapSignificance(target, feature, chrom.sizes = NULL,
  stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE,
  exclude = NULL, include = NULL, seed = NULL, nrand = 1)

## S4 method for signature 'GRanges,GRanges'
calculateOverlapSignificance(target, feature,
  chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE,
  keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL,
  nrand = 1)
calculateOverlapSignificance(target, feature, chrom.sizes = NULL,
  stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE,
  exclude = NULL, include = NULL, seed = NULL, nrand = 1)

## S4 method for signature 'GRanges,GRanges'
calculateOverlapSignificance(target, feature,
  chrom.sizes = NULL, stranded = TRUE, keep.strand.prop = TRUE,
  keep.chrom = TRUE, exclude = NULL, include = NULL, seed = NULL,
  nrand = 1)

Arguments

`target`	a GRanges object for which the overlap needs to be calculates
`feature`	a GRanges object to be randomized
`chrom.sizes`	sizes of chromosomes as a named vector (names are chromsomes names and elements of the vectors are lengths). , if not given sizes in GRanges object will be used if no sizes there the end of each chr will be the end last feature on each chr
`stranded`	if FALSE, all of the returned features will be strandless (will have "*" in the strand slot)
`keep.strand.prop`	If TRUE strands will have the same proportion as the features
`keep.chrom`	If TRUE, number of features and randomized features for a chromosome will match. Currently seeting this to FALSE is not supported.
`exclude`	A GRanges object where no randomized feature should overlap, can be gaps or unmappable regions in the genome as an example.
`include`	A GRanges object which defines the boundaries of randomized features
`seed`	random number generator seed
`nrand`	number of randomizations (default:1)

Value

returns a GRanges object which is randomized version of the feature

convert a data frame read-in from a bed file to a GRanges object for exons

Description

convert a data frame read-in from a bed file to a GRanges object for exons

Usage

convertBed2Exons(bed.df)

## S4 method for signature 'data.frame'
convertBed2Exons(bed.df)
convertBed2Exons(bed.df)

## S4 method for signature 'data.frame'
convertBed2Exons(bed.df)

Arguments

bed.df

a data.frame where column order and content resembles a bed file with 12 columns

Value

GRanges object

Note

one bed track per file is only accepted, the bed files with multiple tracks will cause en error

Examples

file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation')
bed12 = read.table(file)
exons = convertBed2Exons(bed12)
head(exons)

file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation')
bed12 = read.table(file)
exons = convertBed2Exons(bed12)
head(exons)

convert a data frame read-in from a bed file to a GRanges object for introns

Description

convert a data frame read-in from a bed file to a GRanges object for introns

Usage

convertBed2Introns(bed.df)

## S4 method for signature 'data.frame'
convertBed2Introns(bed.df)
convertBed2Introns(bed.df)

## S4 method for signature 'data.frame'
convertBed2Introns(bed.df)

Arguments

bed.df

a data.frame where column order and content resembles a bed file with 12 columns

Value

GRanges object

Note

one bed track per file is only accepted, the bed files with multiple tracks will cause en error

Examples

file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation')
bed12 = read.table(file)
introns = convertBed2Introns(bed12)
head(introns)

file = system.file('extdata/chr21.refseq.hg19.bed', package='genomation')
bed12 = read.table(file)
introns = convertBed2Introns(bed12)
head(introns)

convert a data frame read-in from a bed file to a GRanges object

Description

convert a data frame read-in from a bed file to a GRanges object

Usage

convertBedDf(bed)

## S4 method for signature 'data.frame'
convertBedDf(bed)
convertBedDf(bed)

## S4 method for signature 'data.frame'
convertBedDf(bed)

Arguments

bed

a data.frame where column order and content resembles a bed file with 12 columns

Value

GRanges object

Note

one bed track per file is only accepted, the bed files with multiple tracks will cause en error

bed files are expected not to have header lines

Example CpG island data set.

Description

CpG islands of hg19 assembly of human genome on chr21 and chr22. Downloaded from UCSC genome browser.

Format

GRanges object

Compute an enrichment of IP over control both stored in ScoreMatrix objects

Description

This is an enrichmentMatrix function for ScoreMatrix objects, that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.

Usage

\S4method{enrichmentMatrix}{ScoreMatrix,ScoreMatrix}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrix,ScoreMatrix}(IP, control)

Arguments

`IP`	`ScoreMatrix` object storing an IP sample
`control`	`ScoreMatrix` object storing a control sample

Value

ScoreMatrix object

Note

The function computes an enrichment of IP over control as follow: Suppose both IP and control are ScoreMatrix objects that have same dimensions. Then, the enrichment is calculated usign a formula: log2((IP + 1) / (control + 1)).

Examples

 
#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
bam.file_IP <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
data(promoters)
IP <- ScoreMatrix(target = bam.file_IP, windows = promoters, type = 'bam')
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# compute an enrichment of IP over control
enrichmentMatrix(IP, control)

#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
bam.file_IP <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
data(promoters)
IP <- ScoreMatrix(target = bam.file_IP, windows = promoters, type = 'bam')
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# compute an enrichment of IP over control
enrichmentMatrix(IP, control)

Compute an enrichment of IP (stored in ScoreMatrixList object) over control (stored in ScoreMatrix object)

Description

This is an enrichmentMatrix function for IP ScoreMatrixList object and control ScoreMatrix object, that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.

Usage

\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrix}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrix}(IP, control)

Arguments

`IP`	`ScoreMatrixList` object storing IP samples
`control`	`ScoreMatrix` storing control sample

Value

ScoreMatrixList object

Note

Examples

#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
data(promoters)
bam.file_IP_1 <- system.file("extdata", 
"wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData")
IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam')

bam.file_IP_2 <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam')

bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# create a ScoreMatrixList object storing IP ScoreMatrix objects
sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2))
 
# compute an enrichment of IP over control
enrichmentMatrix(sml_IP, control)

#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
data(promoters)
bam.file_IP_1 <- system.file("extdata", 
"wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData")
IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam')

bam.file_IP_2 <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam')

bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# create a ScoreMatrixList object storing IP ScoreMatrix objects
sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2))
 
# compute an enrichment of IP over control
enrichmentMatrix(sml_IP, control)

Compute an enrichment of IP over control both stored in ScoreMatrixList objects

Description

This is an enrichmentMatrix function for ScoreMatrixList objects, that enables to normalize ChIP-seq signals with respect to IgG or input DNA control.

Usage

\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrixList}(IP, control)
\S4method{enrichmentMatrix}{ScoreMatrixList,ScoreMatrixList}(IP, control)

Arguments

`IP`	`ScoreMatrixList` object storing IP samples
`control`	`ScoreMatrixList` storing control samples

Value

ScoreMatrixList object

Note

Examples

#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
data(promoters)
bam.file_IP_1 <- system.file("extdata", 
"wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData")
IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam')

bam.file_IP_2 <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam')

bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# create a ScoreMatrixList object storing IP ScoreMatrix objects
sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2))

# create a ScoreMatrixList object storing control ScoreMatrix objects
sml_control <- ScoreMatrixList(list(c1 = control, c2 = control))

# compute an enrichment of IP over control
enrichmentMatrix(sml_IP, sml_control)

#load IP and control BAM files and create ScoreMatrix objects
library('genomationData')
data(promoters)
bam.file_IP_1 <- system.file("extdata", 
"wgEncodeSydhTfbsH1hescZnf143IggrabAlnRep1.chr21.bam", package = "genomationData")
IP_1 <- ScoreMatrix(target = bam.file_IP_1, windows = promoters, type = 'bam')

bam.file_IP_2 <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescSuz12051317AlnRep1.chr21.bam", package = "genomationData")
IP_2 <- ScoreMatrix(target=bam.file_IP_2, windows = promoters, type = 'bam')

bam.file_c <- system.file("extdata", 
"wgEncodeBroadHistoneH1hescCtcfStdAlnRep1.chr21.bam", package = "genomationData")
control <- ScoreMatrix(target = bam.file_c, windows = promoters, type = 'bam')

# create a ScoreMatrixList object storing IP ScoreMatrix objects
sml_IP <- ScoreMatrixList(list(IP1 = IP_1, IP2 = IP_2))

# create a ScoreMatrixList object storing control ScoreMatrix objects
sml_control <- ScoreMatrixList(list(c1 = control, c2 = control))

# compute an enrichment of IP over control
enrichmentMatrix(sml_IP, sml_control)

Find combitations of genomic features

Description

Provided a GRangesList, finds the combinations of sets of ranges. It is mostly used to look at the combinatorics of transcription factor binding. The function works by, firstly, constructing a union of all ranges in the list, which are then designated by the combinatorics of overlap with the original sets. A caveat of this approach is that the number of possible combinations increases exponentially, so we would advise you to use it with up to 6 data sets. If you wish to take a look at a greater number of factors, methods like self organizing maps or ChromHMM might be more appropriate.

Usage

findFeatureComb(gl, width=0, use.names=FALSE, collapse.char=':')

## S4 method for signature 'GRangesList'
findFeatureComb(gl, width = 0, use.names = FALSE,
  collapse.char = ":")
findFeatureComb(gl, width=0, use.names=FALSE, collapse.char=':')

## S4 method for signature 'GRangesList'
findFeatureComb(gl, width = 0, use.names = FALSE,
  collapse.char = ":")

Arguments

`gl`	a `GRangesList` object, containing ranges for which represent regions enriched for transcription factor binding
`width`	`integer` is the requested width of each enriched region. If 0 the ranges are not resized, if a positive integer, the width of all ranges is set to that number. Ranges are resized relative to the center of original ranges.
`use.names`	a boolean which tells the function whether to return the resulting ranges with a numeric vector which designates each class (the default), or to construct the names of each class using the names from the GRangesList
`collapse.char`	a character which will be used to separate the class names if use.names=TRUE. The default is ':'

Value

a GRanges object

Examples

library(GenomicRanges)
g = GRanges(paste('chr',rep(1:2, each=3), sep=''), IRanges(rep(c(1,5,9), times=2), width=3))
gl = GRangesList(g1=g, g2=g[2:5], g3=g[3:4])
findFeatureComb(gl)
findFeatureComb(gl, use.names=TRUE)

library(GenomicRanges)
g = GRanges(paste('chr',rep(1:2, each=3), sep=''), IRanges(rep(c(1,5,9), times=2), width=3))
gl = GRangesList(g1=g, g2=g[2:5], g3=g[3:4])
findFeatureComb(gl)
findFeatureComb(gl, use.names=TRUE)

Example RefSeq genes data set.

Description

RefSeq genes of hg19 assembly of human genome on chr21 and chr22. Downloaded from UCSC genome browser.

Format

GRanges object

Get distance to nearest TSS and gene id from AnnotationByGeneParts

Description

This accessor function gets the nearest TSS, its distance to target feature, strand and name of TSS/gene from AnnotationByGeneParts object

Usage

getAssociationWithTSS(x)

## S4 method for signature 'AnnotationByGeneParts'
getAssociationWithTSS(x)
getAssociationWithTSS(x)

## S4 method for signature 'AnnotationByGeneParts'
getAssociationWithTSS(x)

Arguments

`x`	a `AnnotationByGeneParts` object

Value

RETURNS a data.frame containing row number of the target features, distance of target to nearest TSS, TSS/Gene name, TSS strand

Examples

data(cage)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)        
head(getAssociationWithTSS(cage.annot))
 
data(cage)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)        
head(getAssociationWithTSS(cage.annot))

Get the percentage/count of annotation features overlapping with target features from AnnotationByFeature

Description

This function retrieves percentage/number of annotation features overlapping with targets. For example, if AnnotationByFeature object is containing statistics of differentially methylated regions overlapping with gene annotation. This function will return number/percentage of introns,exons and promoters overlapping with differentially methylated regions.

Usage

getFeatsWithTargetsStats(x,percentage=TRUE)

## S4 method for signature 'AnnotationByFeature'
getFeatsWithTargetsStats(x, percentage = TRUE)
getFeatsWithTargetsStats(x,percentage=TRUE)

## S4 method for signature 'AnnotationByFeature'
getFeatsWithTargetsStats(x, percentage = TRUE)

Arguments

`x`	a `AnnotationByFeature` object
`percentage`	TRUE\|FALSE. If TRUE percentage of annotation features will be returned. If FALSE, number of annotation features will be returned

Value

RETURNS a vector of percentages or counts showing quantity of annotation features overlapping with target features

Examples

data(cage)
bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
getFeatsWithTargetsStats(cage.annot)

data(cage)
bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
getFeatsWithTargetsStats(cage.annot)

Function to get upstream and downstream adjecent regions to a genomic feature such as CpG islands

Description

Function to get upstream and downstream adjecent regions to a genomic feature such as CpG islands

Usage

getFlanks(grange,flank=2000,clean=TRUE)

## S4 method for signature 'GRanges'
getFlanks(grange, flank = 2000, clean = TRUE)
getFlanks(grange,flank=2000,clean=TRUE)

## S4 method for signature 'GRanges'
getFlanks(grange, flank = 2000, clean = TRUE)

Arguments

`grange`	GRanges object for the feature
`flank`	number of basepairs for the flanking regions
`clean`	If set to TRUE, flanks overlapping with other main features will be trimmed, and overlapping flanks will be removed. This will remove multiple counts when other features overlap with flanks

Value

GRanges object for flanking regions

Examples

data(cpgi)
cpgi.flanks = getFlanks(cpgi)
head(cpgi.flanks)

data(cpgi)
cpgi.flanks = getFlanks(cpgi)
head(cpgi.flanks)

Get the membership slot of AnnotationByFeature

Description

Membership slot defines the overlap of target features with annotation features For example, if a target feature overlaps with an exon

Usage

getMembers(x)

## S4 method for signature 'AnnotationByFeature'
getMembers(x)
getMembers(x)

## S4 method for signature 'AnnotationByFeature'
getMembers(x)

Arguments

`x`	a `AnnotationByFeature` object

Value

matrix showing overlap of target features with annotation features. 1 for overlap, 0 for non-overlap

get enrichment based on randomized feature overlap

Description

This function measures the association between two genomic features by randomizing one feature and counting the overlaps in randomized sets. That is to say, query feature will be randomly distributed over the genome (constrained by provided options), and the overlap of target with these randomized features will be measured.

Usage

getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL,
  ...)

## S4 method for signature 'GRanges,GRanges'
getRandomEnrichment(target, query,
  randomizations = 1000, rand.set = NULL, ...)
getRandomEnrichment(target, query, randomizations = 1000, rand.set = NULL,
  ...)

## S4 method for signature 'GRanges,GRanges'
getRandomEnrichment(target, query,
  randomizations = 1000, rand.set = NULL, ...)

Arguments

`target`	a `GRanges` object to be overlapped with `query`
`query`	a `GRanges` object that will be randomly placed across the genome and overlap of these random regions with `target` will be the background distribution of association between `target` and `query`.
`randomizations`	number of times the features to be shuffled
`rand.set`	instead of randomly placing features in `query` one can supply an already shuffled set of `query` genomic features.
`...`	other parameters to be passed to `randomizeFeature` function. These parameters ccontrol how randomization is done.

Value

returns a RandomEnrichment object

Examples

data(cage)
data(cpgi)
 
enr = getRandomEnrichment(cage, cpgi, randomizations=50)

data(cage)
data(cpgi)
 
enr = getRandomEnrichment(cage, cpgi, randomizations=50)

Get the percentage of target features overlapping with annotation from AnnotationByFeature

Description

This function retrieves percentage/number of target features overlapping with annotation

Usage

getTargetAnnotationStats(x,percentage=TRUE,precedence=TRUE)

## S4 method for signature 'AnnotationByFeature'
getTargetAnnotationStats(x, percentage = TRUE,
  precedence = TRUE)
getTargetAnnotationStats(x,percentage=TRUE,precedence=TRUE)

## S4 method for signature 'AnnotationByFeature'
getTargetAnnotationStats(x, percentage = TRUE,
  precedence = TRUE)

Arguments

x

a AnnotationByFeature object

percentage

TRUE|FALSE. If TRUE percentage of target features will be returned. If FALSE, number of target features will be returned

precedence

TRUE|FALSE. If TRUE there will be a hierachy of annotation features when calculating numbers (with promoter>exon>intron precedence)

That means if a feature overlaps with a promoter it will be counted as promoter overlapping only, or if it is overlapping with a an exon but not a promoter, #' it will be counted as exon overlapping only whether or not it overlaps with an intron.

Value

a vector of percentages or counts showing quantity of target features overlapping with annotation

Examples

data(cage)
bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot=annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
getTargetAnnotationStats(cage.annot)

data(cage)
bed.file=system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
cage.annot=annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)
getTargetAnnotationStats(cage.annot)

Converts a gff formated data.frame into a GenomicRanges object. The GenomicRanges object needs to be properly formated for the function to work.

Description

Converts a gff formated data.frame into a GenomicRanges object. The GenomicRanges object needs to be properly formated for the function to work.

Usage

gffToGRanges(gff.file, filter = NULL, zero.based = FALSE, ensembl = FALSE)
gffToGRanges(gff.file, filter = NULL, zero.based = FALSE, ensembl = FALSE)

Arguments

`gff.file`	path to a gff formatted file. The file can end in `.gz`, `.bz2`, `.xz`, or `.zip` and/or start with `http://` or `ftp://`. If the file is not compressed it can also start with `https://` or `ftps://`.
`filter`	a character designating which elements to retain from the gff file (e.g. exon, CDS, ...)
`zero.based`	`boolean` whether the coordinates are 0 or 1 based. 0 is the default
`ensembl`	`boolean` if TRUE, add the chr prefix to seqlevels. FALSE by default

Value

returns a GenomicRanges object

Examples

gff.file = system.file('extdata/chr21.refseq.hg19.gtf', package='genomation')
gff = gffToGRanges(gff.file)

gff.file = system.file('extdata/chr21.refseq.hg19.gtf', package='genomation')
gff = gffToGRanges(gff.file)

Draw a heatmap of a given ScoreMatrix object

Description

The function makes a heatmap out of given ScoreMatrix object. If desired it can use clustering using given clustering function (e.g. k-means) and plot cluster color codes as a sidebar. In addition, user can define groups of rows using 'group' argument.

Usage

heatMatrix(mat, grid = FALSE, col = NULL, xcoords = NULL, group = NULL,
  group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0,
  100), clustfun = NULL, main = "", legend.name = NULL, cex.legend = 1,
  xlab = NULL, cex.main = 1, cex.lab = 1, cex.axis = 1,
  newpage = TRUE)
heatMatrix(mat, grid = FALSE, col = NULL, xcoords = NULL, group = NULL,
  group.col = NULL, order = FALSE, user.order = FALSE, winsorize = c(0,
  100), clustfun = NULL, main = "", legend.name = NULL, cex.legend = 1,
  xlab = NULL, cex.main = 1, cex.lab = 1, cex.axis = 1,
  newpage = TRUE)

Arguments

`mat`	a `ScoreMatrix` object
`grid`	if TRUE, grid graphics will be used. if FALSE, base graphics will be used on the top level, so users can use par(mfrow) or par(mfcol) prior to calling the function. Default:FALSE
`col`	a vector of colors, such as the ones created by heat.colors(10). If NULL (which is default), jet color scheme (common in matlab plots) will be used.
`xcoords`	a vector of numbers showing relative positions of the bases or windows. It must match the number of columns in the `ScoreMatrix`. Alternatively, it could be a numeric vector of two elements. Such as c(0,100) showing the relative start and end coordinates of the first and last column of the `ScoreMatrix` object.
`group`	a list of vectors of row numbers or a factor. This grouping is used for rowside colors of the heatmap. If it is a list, each element of the list must be a vector of row numbers. Names of the elements of the list will be used as names of groups. If `group` is a factor , it's length must match the number of rows of the matrix, and factor levels will be used as the names of the groups in the plot.
`group.col`	a vector of color names to be used at the rowside colors if `group` argument is given or `clustfun` function is given.
`order`	Logical indicating if the rows should be ordered or not (Default:FALSE). If `order=TRUE` the matrix will be ordered with rowSums(mat) values in descending order. If `group` argument is provided, first the groups will be ordered in descending order of sums of rows then, everything within the clusters will be ordered by sums of rows. If `clustfun` is given then rows within clusters will be order in descending order of sums of rows.
`user.order`	a numerical vector indicating the order of groups/clusters (it works only when `group` or `clustfun` argument is given).
`winsorize`	Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers.
`clustfun`	a function for clustering rows of `mat` that returns a vector of integers indicating the cluster to which each point is allocated (a vector of cluster membership), e.g. k-means algorithm with 3 centers: function(x) kmeans(x, centers=3)$cluster. By default FALSE.
`main`	a character string for the plot title
`legend.name`	a character label plotted next to the legend
`cex.legend`	A numerical value giving the amount by which legend axis marks should be magnified relative to the default
`xlab`	label a character string for x-axis of the heatmap
`cex.main`	A numerical value giving the amount by which plot title should be magnified
`cex.lab`	A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default.
`cex.axis`	A numerical value giving the amount by which axis marks should be magnified relative to the default
`newpage`	logical indicating if `grid.newpage()` function should be invoked if `grid=TRUE`.

Value

returns clustering result invisibly, if clustfun is definied

Examples


data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE,
                   weight.col="tpm")

set.seed(1000)

heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
           xcoords=-1000:1000,
           cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
           cex.axis=0.9,grid=FALSE)
           
## examples using clustering functions
## k-means
cl1 <- function(x) kmeans(x, centers=3)$cluster
set.seed(1000)
heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl1,
         cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
         cex.axis=0.9,grid=FALSE,
         user.order=c(1,3,2))

## hierarchical clustering
cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=3)
set.seed(1000)
heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl2,
         cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
         cex.axis=0.9,grid=FALSE)



data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE,
                   weight.col="tpm")

set.seed(1000)

heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
           xcoords=-1000:1000,
           cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
           cex.axis=0.9,grid=FALSE)
           
## examples using clustering functions
## k-means
cl1 <- function(x) kmeans(x, centers=3)$cluster
set.seed(1000)
heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl1,
         cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
         cex.axis=0.9,grid=FALSE,
         user.order=c(1,3,2))

## hierarchical clustering
cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=3)
set.seed(1000)
heatMatrix(mat=scores1,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl2,
         cex.legend=0.8,main="CAGE clusters on promoters",cex.lab=1,
         cex.axis=0.9,grid=FALSE)

Heatmap for meta-region profiles

Description

Function calculates meta-profile(s) from a ScoreMatrix or a ScoreMatrixList, then produces a heatmap or a set of stacked heatmaps for meta-region profiles

Usage

heatMeta(mat, centralTend = "mean", profile.names = NULL, xcoords = NULL,
  col = NULL, meta.rescale = FALSE, winsorize = c(0, 100),
  legend.name = NULL, cex.legend = 1, xlab = NULL, main = "",
  cex.lab = 1, cex.axis = 1)
heatMeta(mat, centralTend = "mean", profile.names = NULL, xcoords = NULL,
  col = NULL, meta.rescale = FALSE, winsorize = c(0, 100),
  legend.name = NULL, cex.legend = 1, xlab = NULL, main = "",
  cex.lab = 1, cex.axis = 1)

Arguments

`mat`	`ScoreMatrix` or `ScoreMatrixList` to be plotted
`centralTend`	a character that determines central tendency of meta-profile(s). It takes "mean" (default) or "median".
`profile.names`	a character vector for names of profiles. If NULL, the names will be taken from names(mat) if mat is a `ScoreMatrixList` object.
`xcoords`	a vector of numbers showing relative positions of the bases or windows. It must match the number of columns in the `ScoreMatrix` For example: if there are 2001 elements in the matrices which are base-pair resolution and they are centered around an anchor point like TSS, the xcoords argument should be -1000:1000. This argument is used to plot accurate x-axis labels for the plots.If NULL it will be equal to 1:ncol(mat).
`col`	a vector of color pallete. color scheme to be used. If NULL, a version of jet colors will be used.
`meta.rescale`	if TRUE meta-region profiles are scaled to 0 to 1 range by subracting the min from profiles and dividing them by max-min.
`winsorize`	Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile. This is useful for visualization of matrices that have outliers.
`legend.name`	a character label plotted next to the legend
`cex.legend`	A numerical value giving the amount by which legend axis marks should be magnified relative to the default
`xlab`	label a character string for x-axis
`main`	a character string for the plot title
`cex.lab`	A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default.
`cex.axis`	A numerical value giving the amount by which axis marks should be magnified relative to the default

Value

returns meta-profile matrix invisibly.

Examples

 data(cage)
 data(promoters)
 scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)
 data(cpgi)
 scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

 x=new("ScoreMatrixList",list(scores1,scores2))
 
 heatMeta(mat=x,legend.name="fg",cex.legend=0.8,main="fdf",cex.lab=6,
          cex.axis=0.9)
 
data(cage)
 data(promoters)
 scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)
 data(cpgi)
 scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

 x=new("ScoreMatrixList",list(scores1,scores2))
 
 heatMeta(mat=x,legend.name="fg",cex.legend=0.8,main="fdf",cex.lab=6,
          cex.axis=0.9)

Plots the percentage of overlapping intervals with genomic features in a heatmap

Description

This function plots a heatmap for percentage of overlapping ranges with provided genomic features. The input object is a list of AnnotationByFeature objects, which contains necessary information about overlap statistics to make the plot.

Usage

heatTargetAnnotation(l, cluster = FALSE, col = c("white", "blue"),
  precedence = FALSE, plot = TRUE)
heatTargetAnnotation(l, cluster = FALSE, col = c("white", "blue"),
  precedence = FALSE, plot = TRUE)

Arguments

`l`	a `list` of `AnnotationByFeature` objects. This could be returned by `annotateWithFeatures` function.
`cluster`	TRUE/FALSE. If TRUE the heatmap is going to be clustered using a default hierarchical clustering scheme.
`col`	a vector of two colors that will be used for interpolation. The first color is the lowest one, the second is the highest one
`precedence`	TRUE\|FALSE. If TRUE the precedence of annotation features will be used when plotting. The precedence will be taken from the GRangesList used as annotation. The first GRanges will be treated as most important, and the second as the second most important and so on. Such that, if an interval overlaps with features on that is part of the first GRanges object in the annotation GRangesList, the rest of its overlaps with other elements in the annotation GRangesList will be ignored. This feature is important to have if the user desires that percentage of overlaps adds up to 100. This is only possible when the annotation features are non-overlaping with eachother or there is a hierarchy/precedence among them such as (with promoter>exon>intron precedence). In this case, anything that overlaps with a promoter annotation will only be counted as promoter even if it overlaps with exons or introns.
`plot`	If FALSE, does not plot the heatmap just returns the matrix used to make the heatmap

Value

returns the matrix used to make the heatmap when plot FALSE, otherwise returns ggplot2 object which can be modified further.

Examples


library(GenomicRanges)
data(cage)
data(cpgi)
cage$tpm=NULL
gl = GRangesList(cage=cage, cpgi=cpgi)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)

            
  
heatTargetAnnotation(annot)


library(GenomicRanges)
data(cage)
data(cpgi)
cage$tpm=NULL
gl = GRangesList(cage=cage, cpgi=cpgi)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithFeatures(gl, gene.parts, intersect.chr=TRUE)

            
  
heatTargetAnnotation(annot)

Get common rows from all matrices in a ScoreMatrixList object

Description

Returns a intersection of rows for each matrix in a ScoreMatrixList object. This is done using the rownames of each element in the list.

Usage

intersectScoreMatrixList(sml, reorder = FALSE)

## S4 method for signature 'ScoreMatrixList'
intersectScoreMatrixList(sml, reorder = FALSE)
intersectScoreMatrixList(sml, reorder = FALSE)

## S4 method for signature 'ScoreMatrixList'
intersectScoreMatrixList(sml, reorder = FALSE)

Arguments

`sml`	a `ScoreMatrixList` object
`reorder`	if TRUE `ScoreMatrix` objects in the list are sorted based on their common row ids.

Value

ScoreMatrixList object

Examples

library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), 
                  IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
                  weight = rep(c(1,2),each=7))
                 
windows1 = GRanges(rep(c(1,2),each=2), 
                    IRanges(rep(c(1,2), times=2), width=5), 
                    strand=c('-','+','-','+'))
windows2 = windows1[c(1,3)]
sml = as(list(ScoreMatrix(target, windows1),
               ScoreMatrix(target, windows2)), 'ScoreMatrixList')
sml
intersectScoreMatrixList(sml)

library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), 
                  IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
                  weight = rep(c(1,2),each=7))
                 
windows1 = GRanges(rep(c(1,2),each=2), 
                    IRanges(rep(c(1,2), times=2), width=5), 
                    strand=c('-','+','-','+'))
windows2 = windows1[c(1,3)]
sml = as(list(ScoreMatrix(target, windows1),
               ScoreMatrix(target, windows2)), 'ScoreMatrixList')
sml
intersectScoreMatrixList(sml)

Draw multiple heatmaps from a ScoreMatrixList object

Description

The function plots multiple heatmaps for a ScoreMatrixList object side by side. Each matrix can have different color schemes but it is essential that each matrix is obtained from same regions or neighbouring regions.

Usage

multiHeatMatrix(sml, grid = TRUE, col = NULL, xcoords = NULL,
  group = NULL, group.col = NULL, order = FALSE, user.order = FALSE,
  winsorize = c(0, 100), clustfun = FALSE, clust.matrix = NULL,
  column.scale = TRUE, matrix.main = NULL, common.scale = FALSE,
  legend = TRUE, legend.name = NULL, cex.legend = 0.8, xlab = NULL,
  cex.lab = 1, cex.main = 1, cex.axis = 0.8, newpage = TRUE)
multiHeatMatrix(sml, grid = TRUE, col = NULL, xcoords = NULL,
  group = NULL, group.col = NULL, order = FALSE, user.order = FALSE,
  winsorize = c(0, 100), clustfun = FALSE, clust.matrix = NULL,
  column.scale = TRUE, matrix.main = NULL, common.scale = FALSE,
  legend = TRUE, legend.name = NULL, cex.legend = 0.8, xlab = NULL,
  cex.lab = 1, cex.main = 1, cex.axis = 0.8, newpage = TRUE)

Arguments

`sml`	a `ScoreMatrixList` object
`grid`	if TRUE, grid graphics will be used. if FALSE, base graphics will be used on the top level, so users can use par(mfrow) or par(mfcol) prior to calling the function. Default:FALSE
`col`	a color palette or list of color palettes, such as list(heat.colors(10),topo.colors(10)). If it is a list, it is length must match the number of matrices to be plotted. If it is a single palette every heatmap will have the same colors.
`xcoords`	a vector of numbers showing relative positions of the bases or windows or a list of vectors. The elements of the list must match the number of columns in the corresponding `ScoreMatrix`. Alternatively, the elements could be a numeric vector of two elements. Such as c(0,100) showing the relative start and end coordinates of the first and last column of the `ScoreMatrix` object. The remaining coordinates will be automatically matched in this case. If the argument is not a list but a single vector, then all heatmaps will have the same coordinate on their x-axis.
`group`	a list of vectors of row numbers or a factor. The rows will be reordered to match their grouping. The grouping is used for rowside colors of the heatmap. If it is a list, each element of the list must be a vector of row numbers. Names of the elements of the list will be used as names of groups. If `group` is a factor , it's length must match the number of rows of the matrix, and factor levels will be used as the names of the groups in the plot.
`group.col`	a vector of color names to be used at the rowside colors if `group` and `clustfun` arguments are given
`order`	Logical indicating if the rows should be ordered or not (Default:FALSE). If `order=TRUE` the matrix will be ordered with rowSums(mat) values in descending order. If `group` argument is provided, first the groups will be ordered in descending order of sums of rows then, everything within the clusters will be ordered by sums of rows. If `clustfun` is given then rows within clusters will be order in descending order by sums of rows.
`user.order`	a numerical vector indicating the order of groups/clusters (it works only when `group` or `clustfun` argument is given).
`winsorize`	Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile for a matrix, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers.
`clustfun`	a function for clustering rows of `mat` that returns a vector of integers indicating the cluster to which each point is allocated (a vector of cluster membership), e.g. k-means algorithm with 3 centers: function(x) kmeans(x, centers=3)$cluster. By default FALSE.
`clust.matrix`	a numerical vector of indexes or a character vector of names of the `ScoreMatrix` objects in 'sml' to be used in clustering (if `clustfun` argument is provided). By default all matrices are clustered. Matrices that are not indicated in clust.matrix are ordered according to result of clustering algorithm.
`column.scale`	Logical indicating if matrices should be scaled or not, prior to clustering or ordering. Setting this to TRUE scales the columns of the matrices using, `scale()` function. scaled columns are only used for clustering or ordering. Original scores are displayed for heatmaps.
`matrix.main`	a vector of strings for the titles of the heatmaps. If NULL titles will be obtained from names of the `ScoreMatrix` objects in the `ScoreMatrixList` objects.
`common.scale`	if TRUE (Default:FALSE) all the heatmap colors will be coming from the same score scale, although each heatmap color scale can be different. The color intensities will be coming from the same scale. The scale will be determined by minimum of all matrices and maximum of all matrices. This is useful when all matrices are on the same score scale. If FALSE, the color scale will be determined by minimum and maximum of each matrix individually.
`legend`	if TRUE and color legend for the heatmap is drawn.
`legend.name`	a vector of legend labels to be plotted with legends of each heatmap. If it is a length 1 vector, all heatmaps will have the same legend label.
`cex.legend`	A numerical value giving the amount by which legend axis marks should be magnified relative to the default
`xlab`	a vector of character strings for x-axis labels of the heatmaps. if it is length 1, all heatmaps will have the same label.
`cex.lab`	A numerical value giving the amount by which axis labels (including 'legend.name') should be magnified relative to the default.
`cex.main`	A numerical value giving the amount by which plot title should be magnified
`cex.axis`	A numerical value giving the amount by which axis marks should be magnified relative to the default
`newpage`	logical indicating if `grid.newpage()` function should be invoked if `grid=TRUE`.

Value

invisibly returns the order of rows, if clustfun is provided and/or order=TRUE

Examples

data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)

data(cpgi)
scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

sml=new("ScoreMatrixList",list(a=scores1,b=scores2))

# use with k-means
multiHeatMatrix(sml,
                 clustfun=function(x) kmeans(x, centers=2)$cluster,
                 cex.axis=0.8,xcoords=c(-1000,1000),
                 winsorize=c(0,99),
                 legend.name=c("tpm","coverage"),xlab="region around TSS")
                 
# use with hierarchical clustering
cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=2)
multiHeatMatrix(sml,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl2,
         cex.legend=0.8,cex.lab=1,
         cex.axis=0.9,grid=FALSE)

# use different colors
require(RColorBrewer)
col.cage= brewer.pal(9,"Blues")
col.cpgi= brewer.pal(9,"YlGn")
multiHeatMatrix(sml,
                 clustfun=function(x) kmeans(x, centers=2)$cluster,
                 cex.axis=0.8,xcoords=c(-1000,1000),
                 winsorize=c(0,99),col=list(col.cage,col.cpgi),
                 legend.name=c("tpm","coverage"),xlab="region around TSS")




data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)

data(cpgi)
scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

sml=new("ScoreMatrixList",list(a=scores1,b=scores2))

# use with k-means
multiHeatMatrix(sml,
                 clustfun=function(x) kmeans(x, centers=2)$cluster,
                 cex.axis=0.8,xcoords=c(-1000,1000),
                 winsorize=c(0,99),
                 legend.name=c("tpm","coverage"),xlab="region around TSS")
                 
# use with hierarchical clustering
cl2 <- function(x) cutree(hclust(dist(x), method="complete"), k=2)
multiHeatMatrix(sml,legend.name="tpm",winsorize=c(0,99),xlab="region around TSS",
         xcoords=-1000:1000,clustfun=cl2,
         cex.legend=0.8,cex.lab=1,
         cex.axis=0.9,grid=FALSE)

# use different colors
require(RColorBrewer)
col.cage= brewer.pal(9,"Blues")
col.cpgi= brewer.pal(9,"YlGn")
multiHeatMatrix(sml,
                 clustfun=function(x) kmeans(x, centers=2)$cluster,
                 cex.axis=0.8,xcoords=c(-1000,1000),
                 winsorize=c(0,99),col=list(col.cage,col.cpgi),
                 legend.name=c("tpm","coverage"),xlab="region around TSS")

Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.

Description

Arithmetic method for ScoreMatrixList

Usage

## S4 method for signature 'numeric,ScoreMatrixList'
Ops(e1, e2)
## S4 method for signature 'numeric,ScoreMatrixList'
Ops(e1, e2)

Arguments

`e1`	the numeric value
`e2`	the `ScoreMatrixList` object

Value

ScoreMatrixList

Ops method for a ScoreMatrix object. It enables to use arithmetic, indicator and logic operations on ScoreMatrix objects.

Description

Arithmetic method for ScoreMatrix and ScoreMatrixList classes

Usage

## S4 method for signature 'ScoreMatrix,ScoreMatrix'
Ops(e1, e2)
## S4 method for signature 'ScoreMatrix,ScoreMatrix'
Ops(e1, e2)

Arguments

`e1`	the `ScoreMatrix` object or numeric value
`e2`	the `ScoreMatrix` object or numeric value

Value

ScoreMatrix

Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.

Description

Arithmetic method for ScoreMatrixList

Usage

## S4 method for signature 'ScoreMatrixList,numeric'
Ops(e1, e2)
## S4 method for signature 'ScoreMatrixList,numeric'
Ops(e1, e2)

Arguments

`e1`	the `ScoreMatrixList` object
`e2`	the numeric value

Value

ScoreMatrixList

Ops method for a ScoreMatrixList object. It enables to use arithmetic, indicator and logic operations on ScoreMatrixList objects.

Description

Arithmetic methods for ScoreMatrixList

Usage

## S4 method for signature 'ScoreMatrixList,ScoreMatrixList'
Ops(e1, e2)
## S4 method for signature 'ScoreMatrixList,ScoreMatrixList'
Ops(e1, e2)

Arguments

`e1`	the `ScoreMatrixList` object
`e2`	the `ScoreMatrixList` object

Value

ScoreMatrixList

Reorder all elements of a ScoreMatrixList to a given ordering vector

Description

Reorder all elements of a ScoreMatrixList to a given ordering vector

Usage

orderBy(sml, ord.vec)

## S4 method for signature 'ScoreMatrixList'
orderBy(sml, ord.vec)
orderBy(sml, ord.vec)

## S4 method for signature 'ScoreMatrixList'
orderBy(sml, ord.vec)

Arguments

`sml`	`ScoreMatrixList` object
`ord.vec`	an integer vector

Value

ScoreMatrixList object

Examples

library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)

cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10)
kmeans.clust = kmeans(sml$cage,3)
 
sml.ordered = orderBy(sml, kmeans.clust$cluster)

multiHeatMatrix(sml.ordered)

library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)

cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10)
kmeans.clust = kmeans(sml$cage,3)
 
sml.ordered = orderBy(sml, kmeans.clust$cluster)

multiHeatMatrix(sml.ordered)

Get scores that correspond to k-mer or PWM matrix occurrence for bases in each window

Description

The function produces a base-pair resolution matrix or matrices of scores that correspond to k-mer or PWM matrix occurrence over predefined windows that have equal width. It finds either positions of pattern hits above a specified threshold and creates score matrix filled with 1 (presence of pattern) and 0 (its absence) or matrix with scores themselves. If pattern is a character of length 1 or PWM matrix then the function returns a ScoreMatrix object, if character of length more than 1 or list of PWMs then ScoreMatrixList.

Usage

patternMatrix(pattern, windows, genome = NULL, min.score = 0.8,
  asPercentage = FALSE, cores = 1)

\S4method{patternMatrix}{character,DNAStringSet}(pattern, windows,
                                                          asPercentage, cores)

\S4method{patternMatrix}{character,GRanges,BSgenome}(pattern, windows, genome,
                                                              cores)

\S4method{patternMatrix}{matrix,DNAStringSet}(pattern, windows,
                                                       min.score, asPercentage, 
                                                       cores)

\S4method{patternMatrix}{matrix,GRanges,BSgenome}(pattern, windows, genome,
                                                           min.score, asPercentage, 
                                                           cores)

\S4method{patternMatrix}{list,DNAStringSet}(pattern, windows,
                                                     min.score, asPercentage, 
                                                     cores)

\S4method{patternMatrix}{list,GRanges,BSgenome}(pattern, windows, genome, 
                                                         min.score, asPercentage, 
                                                         cores)
patternMatrix(pattern, windows, genome = NULL, min.score = 0.8,
  asPercentage = FALSE, cores = 1)

\S4method{patternMatrix}{character,DNAStringSet}(pattern, windows,
                                                          asPercentage, cores)

\S4method{patternMatrix}{character,GRanges,BSgenome}(pattern, windows, genome,
                                                              cores)

\S4method{patternMatrix}{matrix,DNAStringSet}(pattern, windows,
                                                       min.score, asPercentage, 
                                                       cores)

\S4method{patternMatrix}{matrix,GRanges,BSgenome}(pattern, windows, genome,
                                                           min.score, asPercentage, 
                                                           cores)

\S4method{patternMatrix}{list,DNAStringSet}(pattern, windows,
                                                     min.score, asPercentage, 
                                                     cores)

\S4method{patternMatrix}{list,GRanges,BSgenome}(pattern, windows, genome, 
                                                         min.score, asPercentage, 
                                                         cores)

Arguments

`pattern`	matrix (a PWM matrix), list of matrices or a character vector of length 1 or more. A matrix is a PWM matrix that needs to have one row for each nucleotide ("A","C","G" and "T" respectively). IUPAC ambiguity codes can be used and it will match any letter in the subject that is associated with the code.
`windows`	`GRanges` object or `DNAStringSet` object that have equal width of ranges or sequences.
`genome`	`BSgenome` object
`min.score`	numeric or character indicating minimum score to count a match. It can be given as a character string containing a percentage of the highest possible score or a single number (by default "80%" or 0.8). If min.score is set to NULL then `patternMatrix` returns scores themselves (default).
`asPercentage`	boolean telling whether scores represent percentage of the maximal motif PWM score (default: TRUE) or raw scores (FALSE).
`cores`	the number of cores to use (default: 1). It is supported only on Unix-like platforms.

Details

patternMatrix is based on functions from the seqPattern package: getPatternOccurrenceList function to find position of pattern that is a character vector in a list of sequences (a DNAStringSet object) and adapted function motifScanHits to find pattern that is a PWM matrix in sequences (a DNAStringSet object).

If cores > 1 is provided then for every window occurrence of pattern is counted in paralallel.

Value

returns a scoreMatrix object or a scoreMatrixList object

Examples

library(Biostrings)

# consensus sequence of the ctcf motif
motif = "CCGCGNGGNGGCAG"
# Creates 10 random DNA sequences
seqs = sapply(1:10, 
       function(x) paste(sample(c("A","T","G","C"), 180, replace=TRUE), collapse=""))
windows = DNAStringSet(seqs)
p = patternMatrix(pattern=motif, windows=windows, min.score=0.8)
p

library(Biostrings)

# consensus sequence of the ctcf motif
motif = "CCGCGNGGNGGCAG"
# Creates 10 random DNA sequences
seqs = sapply(1:10, 
       function(x) paste(sample(c("A","T","G","C"), 180, replace=TRUE), collapse=""))
windows = DNAStringSet(seqs)
p = patternMatrix(pattern=motif, windows=windows, min.score=0.8)
p

Line plot(s) for meta-region profiles

Description

Function calculates meta-profile(s) from a ScoreMatrix or a ScoreMatrixList, then produces a line plot or a set of line plots for meta-region profiles

Usage

plotMeta(mat, centralTend = "mean", overlay = TRUE, winsorize = c(0, 100),
  profile.names = NULL, xcoords = NULL, meta.rescale = FALSE,
  smoothfun = NULL, line.col = NULL, dispersion = NULL,
  dispersion.col = NULL, ylim = NULL, ylab = "average score",
  xlab = "bases", ...)
plotMeta(mat, centralTend = "mean", overlay = TRUE, winsorize = c(0, 100),
  profile.names = NULL, xcoords = NULL, meta.rescale = FALSE,
  smoothfun = NULL, line.col = NULL, dispersion = NULL,
  dispersion.col = NULL, ylim = NULL, ylab = "average score",
  xlab = "bases", ...)

Arguments

`mat`	`ScoreMatrix` or `ScoreMatrixList` object. If it is a `ScoreMatrixList` object, all matrices in the ScoreMatrixList should have the same number of columns.
`centralTend`	a character that determines central tendency of meta-profile(s). It takes "mean" (default) or "median".
`overlay`	If TRUE multiple profiles will be overlayed in the same plot (Default:TRUE). If FALSE, and mat is a ScoreMatrixList, consider using par(mfrow=c(1,length(mat))) to see the plots from all matrices at once.
`winsorize`	Numeric vector of two, defaults to c(0,100). This vector determines the upper and lower percentile values to limit the extreme values. For example, c(0,99) will limit the values to only 99th percentile, everything above the 99 percentile will be equalized to the value of 99th percentile.This is useful for visualization of matrices that have outliers.
`profile.names`	a character vector for names of the profiles. The order should be same as the as the order of ScoreMatrixList.
`xcoords`	a numeric vector which designates relative base positions of the meta-region profiles. For example, for a 2001 column ScoreMatrix, xcoord=-1000:1000 indicates relative positions of each column in the score matrix. If NULL (Default), xcoords equals to 1:ncol(mat)
`meta.rescale`	if TRUE meta-region profiles are scaled to 0 to 1 range by subtracting the min from profiles and dividing them by max-min. If dispersion is not NULL, then dispersion will be scaled as well.
`smoothfun`	a function to smooth central tendency and dispersion bands (Default: NULL), e.g. stats::lowess.
`line.col`	color of lines for `centralTend` of meta-region profiles. Defaults to colors from `rainbow()` function.
`dispersion`	shows dispersion interval bands around `centralTend` (default:NULL). It takes one of the character: "se"shows standard error of the mean and 95 percent confidence interval for the mean "sd"shows standard deviation and 2(standard deviation) "IQR"shows 1st and 3rd quartile and confidence interval around the median based on the median +/- 1.57 IQR/sqrt(n) (notches)
`dispersion.col`	color of bands of `dispersion`. Defaults to colors from `rainbow()` and transparency is set to 0.5 (rainbow(length(mat), alpha = 0.5)).
`ylim`	same as `ylim` at `plot` function. if NULL ylim is estimated from all meta-region profiles.
`ylab`	same as `ylab` at `plot` function. Default: "average score"
`xlab`	same as `xlab` at `plot` function. Default: "bases"
`...`	other options to `plot`

Value

returns the meta-region profiles invisibly as a matrix.

Note

Score matrices are plotted according to ScoreMatrixList order. If ScoreMatrixList contains more than one matrix then they will overlap each other on a plot, i.e. the first one is plotted first and every next one overlays previous one(s) and the last one is the topmost.

Missing values in data slow down plotting dispersion around central tendency. The reason is that dispersion is plotted only for non-missing values, for each segment that contains numerical values graphics::polygon function is used to plot dispersion bands. There might be a situation, when in a column of ScoreMatrix is only one numeric number and the rest are NAs, then at corresponding position only central tendency will be plotted.

Notches show the 95 percent confidence interval for the median according to an approximation based on the normal distribution. They are used to compare groups - if notches corresponding to adjacent base pairs on the plot do not overlap, this is strong evidence that medians differ. Small sample sizes (5-10) can cause notches to extend beyond the interquartile range (IQR) (Martin Krzywinski et al. Nature Methods 11, 119-120 (2014))

Examples

 data(cage)
 data(promoters)
 scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)

 data(cpgi)
 scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

 # create a new ScoreMatrixList
 x=new("ScoreMatrixList",list(scores1,scores2))
 
 plotMeta(mat=x,overlay=TRUE,main="my plotowski")

 # plot dispersion nd smooth central tendency and variation interval bands
 plotMeta(mat=x, centralTend="mean", dispersion="se", winsorize=c(0,99), 
         main="Dispersion as interquartile band", lwd=4, 
         smoothfun=function(x) stats::lowess(x, f = 1/5))
 
data(cage)
 data(promoters)
 scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)

 data(cpgi)
 scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)

 # create a new ScoreMatrixList
 x=new("ScoreMatrixList",list(scores1,scores2))
 
 plotMeta(mat=x,overlay=TRUE,main="my plotowski")

 # plot dispersion nd smooth central tendency and variation interval bands
 plotMeta(mat=x, centralTend="mean", dispersion="se", winsorize=c(0,99), 
         main="Dispersion as interquartile band", lwd=4, 
         smoothfun=function(x) stats::lowess(x, f = 1/5))

Plot annotation categories from AnnotationByGeneParts or AnnotationByFeature

Description

This function plots a pie or bar chart for showing percentages of targets annotated by genic parts or other query features

Usage

plotTargetAnnotation(x, precedence = TRUE,
  col = getColors(length(x@annotation)), cex.legend = 1, ...)

## S4 method for signature 'AnnotationByFeature'
plotTargetAnnotation(x, precedence = TRUE,
  col = getColors(length(x@annotation)), cex.legend = 1, ...)
plotTargetAnnotation(x, precedence = TRUE,
  col = getColors(length(x@annotation)), cex.legend = 1, ...)

## S4 method for signature 'AnnotationByFeature'
plotTargetAnnotation(x, precedence = TRUE,
  col = getColors(length(x@annotation)), cex.legend = 1, ...)

Arguments

`x`	a `AnnotationByFeature` or `AnnotationByGeneParts` object
`precedence`	TRUE\|FALSE. If TRUE there will be a hierachy of annotation features when calculating numbers (with promoter>exon>intron precedence). This option is only valid when x is a `AnnotationByGeneParts` object
`col`	a vector of colors for piechart or the bar plot
`cex.legend`	a numeric value of length 1 to specify the size of the legend. By default 1.
`...`	graphical parameters to be passed to `pie` or `barplot` functions

Value

plots a piechart or a barplot for percentage of the target features overlapping with annotation

Examples

data(cage)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)

plotTargetAnnotation(annot)


data(cage)

bed.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
annot = annotateWithGeneParts(cage, gene.parts, intersect.chr=TRUE)

plotTargetAnnotation(annot)

Example promoter data set.

Description

promoters of hg19 assembly of human genome on chr21 and chr22. Promoter set is derived from refseq TSS.

Format

GRanges object

An S4 class for storing `getRandomEnrichment` function results

Description

The resulting object stores the results of getRandomEnrichment function

Slots

orig.cnt:: number of features overlapping with query at getRandomEnrichment
rand.olap.dist:: set of number of features overlapping with randomized queries at getRandomEnrichment
log2fc:: log2 fold change calculated by dividing orig.cnt by mean(rand.olap.dist) and taking log2 of that result
p.value:: P-value assuming rand.olap.dist has a normal distribution and comparing orig.cnt with that distribution
rand.p.value:: p-value from randomization by calculation the proportion of how many times a random number of overlap exceeds the original number of overlap

function that randomizes the genomic coordinates

Description

This function randomly distributes the coordinates of genomic features which is stored in a GRanges object. The randomization can be constrained by supplied arguments. The function is still in Beta mode - the regions can overlap excluded regions, and the randomized regions are not disjoint. Please take care that the excluded and included regions are not too strict when compared to the total width of the ranges.

Usage

randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE,
  keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL,
  include = NULL, seed = NULL, nrand = 1)

## S4 method for signature 'GRanges'
randomizeFeature(feature, chrom.sizes = NULL,
  stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE,
  exclude = NULL, include = NULL, seed = NULL, nrand = 1)
randomizeFeature(feature, chrom.sizes = NULL, stranded = TRUE,
  keep.strand.prop = TRUE, keep.chrom = TRUE, exclude = NULL,
  include = NULL, seed = NULL, nrand = 1)

## S4 method for signature 'GRanges'
randomizeFeature(feature, chrom.sizes = NULL,
  stranded = TRUE, keep.strand.prop = TRUE, keep.chrom = TRUE,
  exclude = NULL, include = NULL, seed = NULL, nrand = 1)

Arguments

`feature`	a GRanges object to be randomized
`chrom.sizes`	sizes of chromosomes as a named vector (names are chromsomes names and elements of the vectors are lengths). , if not given sizes in GRanges object will be used if no sizes there the end of each chr will be the end last feature on each chr
`stranded`	if FALSE, all of the returned features will be strandless (will have "*" in the strand slot)
`keep.strand.prop`	If TRUE strands will have the same proportion as the features
`keep.chrom`	If TRUE, number of features and randomized features for a chromosome will match. Currently seeting this to FALSE is not supported.
`exclude`	A GRanges object where no randomized feature should overlap, can be gaps or unmappable regions in the genome as an example.
`include`	A GRanges object which defines the boundaries of randomized features. If not provided the whole genome is used, as defined using the chrom.sizes parameter.
`seed`	random number generator seed
`nrand`	number of randomizations (default:1)

Value

returns a GRanges object which is randomized version of the feature, along with a "set" column in the metadata which designates to which iteration of the randomization the range belong.

Read a BED file and convert it to GRanges.

Description

The function reads a BED file that contains location and other information on genomic features and returns a GRanges object. The minimal information that the BED file has to have is chromosome, start and end columns. it can handle all BED formats up to 12 columns.

Usage

readBed(file, track.line = FALSE, remove.unusual = FALSE,
  zero.based = TRUE)
readBed(file, track.line = FALSE, remove.unusual = FALSE,
  zero.based = TRUE)

Arguments

`file`	location of the file, a character string such as: "/home/user/my.bed" or the input itself as a string (containing at least one \n). The file can end in `.gz`, `.bz2`, `.xz`, or `.zip` and/or start with `http://` or `ftp://`. If the file is not compressed it can also start with `https://` or `ftps://`.
`track.line`	the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines
`remove.unusual`	if TRUE remove the chromosomes with unsual names, such as chrX_random (Default:FALSE)
`zero.based`	a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE)

Value

GRanges object

Examples

my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation")
refseq = readBed(my.file,track.line=FALSE,remove.unusual=FALSE)
head(refseq)

my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation")
refseq = readBed(my.file,track.line=FALSE,remove.unusual=FALSE)
head(refseq)

A function to read the Encode formatted broad peak file into a GRanges object

Description

A function to read the Encode formatted broad peak file into a GRanges object

Usage

readBroadPeak(file, track.line=FALSE, zero.based=TRUE)
readBroadPeak(file, track.line=FALSE, zero.based=TRUE)

Arguments

`file`	an absolute or relative path to a bed file formatted by the Encode broadPeak standard. The file can end in `.gz`, `.bz2`, `.xz`, or `.zip` and/or start with `http://` or `ftp://`. If the file is not compressed it can also start with `https://` or `ftps://`.
`track.line`	the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines
`zero.based`	a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE)

Value

a GRanges object

Examples

broad.peak.file = system.file('extdata',"ex.broadPeak", package='genomation')
                           
broad.peak = readBroadPeak(broad.peak.file)
head(broad.peak)

broad.peak.file = system.file('extdata',"ex.broadPeak", package='genomation')
                           
broad.peak = readBroadPeak(broad.peak.file)
head(broad.peak)

A function to read-in genomic features and their upstream and downstream adjecent regions such as CpG islands and their shores

Description

A function to read-in genomic features and their upstream and downstream adjecent regions such as CpG islands and their shores

Usage

readFeatureFlank(location,remove.unusual=TRUE,flank=2000,
                        clean=TRUE,feature.flank.name=NULL)

## S4 method for signature 'character'
readFeatureFlank(location, remove.unusual = TRUE,
  flank = 2000, clean = TRUE, feature.flank.name = NULL)
readFeatureFlank(location,remove.unusual=TRUE,flank=2000,
                        clean=TRUE,feature.flank.name=NULL)

## S4 method for signature 'character'
readFeatureFlank(location, remove.unusual = TRUE,
  flank = 2000, clean = TRUE, feature.flank.name = NULL)

Arguments

`location`	for the bed file of the feature.
`remove.unusual`	remove chromsomes with unsual names random, Un and antyhing with "_" character
`flank`	number of basepairs for the flanking regions
`clean`	If set to TRUE, flanks overlapping with other main features will be trimmed
`feature.flank.name`	the names for feature and flank ranges, it should be a character vector of length 2. example: c("CpGi","shores")

Value

a GRangesList object containing one GRanges object for flanks and one for GRanges object for the main feature. NOTE: This can not return a CompressedGRangesList at the moment because flanking regions do not have to have the same column name as the feature. CompressedGRangesList elements should resemble each other in the column content. We can not satisfy that criteria for the flanks

Examples

cgi.path = system.file('extdata/chr21.CpGi.hg19.bed', package='genomation')
cgi.shores = readFeatureFlank(cgi.path)
cgi.shores
cgi.path = system.file('extdata/chr21.CpGi.hg19.bed', package='genomation')
cgi.shores = readFeatureFlank(cgi.path)
cgi.shores

Read a tabular file and convert it to GRanges.

Description

The function reads a tabular text file that contains location and other information on genomic features and returns a GRanges object. The minimal information that the file has to have is chromosome, start and end columns. Strand information is not compulsory.

Usage

readGeneric(file, chr = 1, start = 2, end = 3, strand = NULL,
  meta.cols = NULL, keep.all.metadata = FALSE, zero.based = FALSE,
  remove.unusual = FALSE, header = FALSE, skip = 0, sep = "\t")
readGeneric(file, chr = 1, start = 2, end = 3, strand = NULL,
  meta.cols = NULL, keep.all.metadata = FALSE, zero.based = FALSE,
  remove.unusual = FALSE, header = FALSE, skip = 0, sep = "\t")

Arguments

`file`	location of the file, a character string such as: "/home/user/my.bed" or the input itself as a string (containing at least one \n).
`chr`	number of the column that has chromsomes information in the table (Def:1)
`start`	number of the column that has start coordinates in the table (Def:2)
`end`	number of the column that has end coordinates in the table (Def:3)
`strand`	number of the column that has strand information, only -/+ is accepted (Default:NULL)
`meta.cols`	named `list` that maps column numbers to meta data columns. e.g. list(name=5, score=10), which means 5th column will be named "name", and 10th column will be named "score" and their contents will be a part of the returned GRanges object. If header = TRUE, meta.cols parameter will over-write the column names given by the header line of the data frame.
`keep.all.metadata`	`logical` determining if the extra columns ( the ones that are not designated by chr,start,end,strand and meta.cols arguments ) should be kept or not. (Default:FALSE)
`zero.based`	a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: FALSE)
`remove.unusual`	if TRUE(default) remove the chromosomes with unsual names, such as chrX_random (Default:FALSE)
`header`	whether the original file contains a header line which designates the column names. If `TRUE` header will be used to construct column names. These names can be over written by meta.cols argument.
`skip`	number of lines to skip. If there is a header line(s) you do not wish to include you can use skip argument to skip that line.
`sep`	a single character which designates the separator in the file. The default value is tab.

Value

GRanges object

Examples

my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation")
refseq = readGeneric(my.file,chr=1,start=2,end=3,strand=NULL,
                      meta.cols=list(score=5,name=4),
                      keep.all.metadata=FALSE, zero.based=TRUE)
head(refseq)
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation")
refseq = readGeneric(my.file,chr=1,start=2,end=3,strand=NULL,
                      meta.cols=list(score=5,name=4),
                      keep.all.metadata=FALSE, zero.based=TRUE)
head(refseq)

A function to read the Encode formatted narrowPeak file into a GRanges object

Description

A function to read the Encode formatted narrowPeak file into a GRanges object

Usage

readNarrowPeak(file, track.line=FALSE, zero.based=TRUE)
readNarrowPeak(file, track.line=FALSE, zero.based=TRUE)

Arguments

`file`	an absolute or relative path to a bed file formatted by the Encode narrowPeak standard. The file can end in `.gz`, `.bz2`, `.xz`, or `.zip` and/or start with `http://` or `ftp://`. If the file is not compressed it can also start with `https://` or `ftps://`.
`track.line`	the number of track lines to skip, "auto" to detect them automatically or FALSE(default) if the bed file doesn't have track lines
`zero.based`	a boolean which tells whether the ranges in the bed file are 0 or 1 base encoded. (Default: TRUE)

Value

a GRanges object

Examples

narrow.peak.file = system.file('extdata',"ex.narrowPeak", package='genomation')
                 
narrow.peak = readBroadPeak(narrow.peak.file)
head(narrow.peak)
narrow.peak.file = system.file('extdata',"ex.narrowPeak", package='genomation')
                 
narrow.peak = readBroadPeak(narrow.peak.file)
head(narrow.peak)

Function for reading exon intron and promoter structure from a given bed file

Description

Function for reading exon intron and promoter structure from a given bed file

Usage

readTranscriptFeatures(location,remove.unusual=TRUE,
                              up.flank=1000,down.flank=1000,unique.prom=TRUE)

## S4 method for signature 'character'
readTranscriptFeatures(location, remove.unusual = TRUE,
  up.flank = 1000, down.flank = 1000, unique.prom = TRUE)
readTranscriptFeatures(location,remove.unusual=TRUE,
                              up.flank=1000,down.flank=1000,unique.prom=TRUE)

## S4 method for signature 'character'
readTranscriptFeatures(location, remove.unusual = TRUE,
  up.flank = 1000, down.flank = 1000, unique.prom = TRUE)

Arguments

`location`	location of the bed file with 12 or more columns. The file can end in `.gz`, `.bz2`, `.xz`, or `.zip` and/or start with `http://` or `ftp://`. If the file is not compressed it can also start with `https://` or `ftps://`.
`remove.unusual`	remove the chromomesomes with unsual names, mainly random chromsomes etc
`up.flank`	up-stream from TSS to detect promoter boundaries
`down.flank`	down-stream from TSS to detect promoter boundaries
`unique.prom`	get only the unique promoters, promoter boundaries will not have a gene name if you set this option to be TRUE

Value

a GRangesList containing locations of exon/intron/promoter/TSS

Note

one bed track per file is only accepted, the bed files with multiple tracks will cause en error

Examples

my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
my.bed12.file
feats = readTranscriptFeatures(my.bed12.file) 
names(feats)
sapply(feats, head)
my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
my.bed12.file
feats = readTranscriptFeatures(my.bed12.file) 
names(feats)
sapply(feats, head)

Scales the values in the matrix by rows and/or columns

Description

Scales the values in the matrix by rows and/or columns

Usage

scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL)

## S4 method for signature 'ScoreMatrix'
scaleScoreMatrix(mat, columns = FALSE, rows = TRUE,
  scalefun = NULL)
scaleScoreMatrix(mat, columns = FALSE, rows = TRUE, scalefun = NULL)

## S4 method for signature 'ScoreMatrix'
scaleScoreMatrix(mat, columns = FALSE, rows = TRUE,
  scalefun = NULL)

Arguments

`mat`	`ScoreMatrix` object
`columns`	`columns` whether to scale the matrix by columns. Set by default to FALSE.
`rows`	`rows` Whether to scale the matrix by rows. Set by default to TRUE
`scalefun`	function object that takes as input a matrix and returns a matrix. By default the argument is set to (x - mean(x))/(max(x)-min(x)+1)

Value

ScoreMatrix object

Examples


# scale the rows of a scoreMatrix object
library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
         weight = rep(c(1,2),each=7), 
         strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+'))
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), 
           strand=c('-','+','-','+'))
sm = ScoreMatrix(target, windows)
ssm = scaleScoreMatrix(sm, rows=TRUE)

# scale the rows of a scoreMatrix object
library(GenomicRanges)
target = GRanges(rep(c(1,2),each=7), IRanges(rep(c(1,1,2,3,7,8,9), times=2), width=5), 
         weight = rep(c(1,2),each=7), 
         strand=c('-', '-', '-', '-', '+', '-', '+', '-', '-', '-', '-', '-', '-', '+'))
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5), 
           strand=c('-','+','-','+'))
sm = ScoreMatrix(target, windows)
ssm = scaleScoreMatrix(sm, rows=TRUE)

Scale the ScoreMatrixList

Description

Scales each ScoreMatrix in the ScoreMatrixList object, by rows and/or columns

Usage

scaleScoreMatrixList(sml, columns, rows, scalefun)

## S4 method for signature 'ScoreMatrixList'
scaleScoreMatrixList(sml, columns = FALSE,
  rows = TRUE, scalefun = NULL)
scaleScoreMatrixList(sml, columns, rows, scalefun)

## S4 method for signature 'ScoreMatrixList'
scaleScoreMatrixList(sml, columns = FALSE,
  rows = TRUE, scalefun = NULL)

Arguments

`sml`	a `ScoreMatrixList` object
`columns`	a `columns` whether to scale the matrix by columns. Set by default to FALSE
`rows`	a `rows` Whether to scale the matrix by rows. Set by default to TRUE
`scalefun`	a function object that takes as input a matrix and returns a matrix. By default the argument is set to the R scale function with center=TRUE and scale=TRUE

Value

ScoreMatrixList object

Examples

library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)
 
cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE)
sml.scaled = scaleScoreMatrixList(sml, rows=TRUE)
sml.scaled

multiHeatMatrix(sml) 


library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)
 
cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE)
sml.scaled = scaleScoreMatrixList(sml, rows=TRUE)
sml.scaled

multiHeatMatrix(sml)

Get base-pair score for bases in each window

Description

The funcion produces a base-pair resolution matrix of scores for given equal width windows of interest. The returned matrix can be used to draw meta profiles or heatmap of read coverage or wig track-like data. The windows argument can be a predefined region around transcription start sites or other regions of interest that have equal lengths The function removes all window that fall off the Rle object - have the start coordinate < 1 or end coordinate > length(Rle) The function takes the intersection of names in the Rle and GRanges objects. On Windows OS the function will give an error if the target is a file in .bigWig format.

Usage

ScoreMatrix(target, windows, strand.aware = FALSE, weight.col = NULL,
  is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE,
  extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL)

\S4method{ScoreMatrix}{RleList,GRanges}(target,windows,strand.aware)

\S4method{ScoreMatrix}{GRanges,GRanges}(target, windows, strand.aware,
                                                weight.col, is.noCovNA)

\S4method{ScoreMatrix}{character,GRanges}(target, windows, strand.aware,
                                                  weight.col=NULL,is.noCovNA=FALSE, 
                                                  type='auto', rpm=FALSE,
                                                  unique=FALSE, extend=0, param=NULL, 
                                                  bam.paired.end=FALSE,
                                                  library.size=NULL)
ScoreMatrix(target, windows, strand.aware = FALSE, weight.col = NULL,
  is.noCovNA = FALSE, type = "auto", rpm = FALSE, unique = FALSE,
  extend = 0, param = NULL, bam.paired.end = FALSE, library.size = NULL)

\S4method{ScoreMatrix}{RleList,GRanges}(target,windows,strand.aware)

\S4method{ScoreMatrix}{GRanges,GRanges}(target, windows, strand.aware,
                                                weight.col, is.noCovNA)

\S4method{ScoreMatrix}{character,GRanges}(target, windows, strand.aware,
                                                  weight.col=NULL,is.noCovNA=FALSE, 
                                                  type='auto', rpm=FALSE,
                                                  unique=FALSE, extend=0, param=NULL, 
                                                  bam.paired.end=FALSE,
                                                  library.size=NULL)

Arguments

`target`	`RleList` , `GRanges`, a BAM file or a BigWig to be overlapped with ranges in `windows`
`windows`	`GRanges` object that contains the windows of interest. It could be promoters, CpG islands, exons, introns. However the sizes of windows have to be equal.
`strand.aware`	If TRUE (default: FALSE), the strands of the windows will be taken into account in the resulting `ScoreMatrix`. If the strand of a window is -, the values of the bins for that window will be reversed
`weight.col`	if the object is `GRanges` object a numeric column in meta data part can be used as weights. This is particularly useful when genomic regions have scores other than their coverage values, such as percent methylation, conservation scores, GC content, etc.
`is.noCovNA`	(Default:FALSE) if TRUE,and if 'target' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values.
`type`	(Default:"auto") if target is a character vector of file paths, then type designates the type of the corresponding files (bam or bigWig).
`rpm`	boolean telling whether to normalize the coverage to per milion reads. FALSE by default. See `library.size`.
`unique`	boolean which tells the function to remove duplicated reads based on chr, start, end and strand
`extend`	numeric which tells the function to extend the reads to width=extend
`param`	ScanBamParam object
`bam.paired.end`	boolean indicating whether given BAM file contains paired-end reads (default:FALSE). Paired-reads will be treated as fragments.
`library.size`	numeric indicating total number of mapped reads in a BAM file (`rpm` has to be set to TRUE). If is not given (default: NULL) then library size is calculated using the Rsamtools idxstatsBam function: sum(idxstatsBam(target)$mapped).

Value

returns a ScoreMatrix object

Note

We assume that a paired-end BAM file contains reads with unique ids and we remove both mates of reads if they are repeated. Due to the fact that ScoreMatrix uses the GenomicAlignments:readGAlignmentPairs function to read paired-end BAM files a duplication of reads occurs when mates of one pair map into two different windows.

Strands of reads in a paired-end BAM are inferred depending on strand of first alignment from the pair. This is a default setting in the GenomicAlignments:readGAlignmentPairs function (see a strandMode argument). This mode should be used when the paired-end data was generated using one of the following stranded protocols: Directional Illumina (Ligation), Standard SOLiD.

Examples

# When target is GRanges
data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE,
                                weight.col="tpm")                                
                                 

# When target is RleList
library(GenomicRanges)
covs = coverage(cage)
scores2 = ScoreMatrix(target=covs,windows=promoters,strand.aware=TRUE)   
scores2 

# When target is a bam file
bam.file = system.file('unitTests/test.bam', package='genomation')
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5))
scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam') 
scores3

# When target is GRanges
data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE,
                                weight.col="tpm")                                
                                 

# When target is RleList
library(GenomicRanges)
covs = coverage(cage)
scores2 = ScoreMatrix(target=covs,windows=promoters,strand.aware=TRUE)   
scores2 

# When target is a bam file
bam.file = system.file('unitTests/test.bam', package='genomation')
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5))
scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam') 
scores3

An S4 class for storing `ScoreMatrix` function results

Description

The resulting object is an extension of a matrix object, and stores values (typically genome-wide scores) for a predefined set of regions Each row on the ScoreMatrix is a predefined region (Ex: CpG islands, promoters) and columns are values across those regions.

Constructors

see ScoreMatrix

Coercion

as(from, "matrix"): Creates a matrix from ScoreMatrix object. You can also use S3Part() function to extract the matrix from ScoreMatrix object.

Subsetting

In the code snippets below, x is a ScoreMatrix object. 'x[i,j]': Get or set elements from row i and column j and return a subset ScoreMatrix object.

Get bin score for bins on each window

Description

The function first bins each window to equal number of bins, and calculates the a summary matrix for scores of each bin (currently, mean, max and min supported) A scoreMatrix object can be used to draw average profiles or heatmap of read coverage or wig track-like data. windows can be a predefined region such as CpG islands, gene bodies, transcripts or CDS (coding sequences) that are not necessarily equi-width. Each window will be chopped to equal number of bins based on bin.num option.

Usage

ScoreMatrixBin(target, windows, bin.num = 10, bin.op = "mean",
  strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE,
  type = "auto", rpm = FALSE, unique = FALSE, extend = 0,
  param = NULL, bam.paired.end = FALSE, library.size = NULL)

\S4method{ScoreMatrixBin}{RleList,GRanges}(target, windows, bin.num, bin.op,
                                                   strand.aware)

\S4method{ScoreMatrixBin}{GRanges,GRanges}(target,windows,
                                                   bin.num,bin.op,
                                                   strand.aware,weight.col,
                                                   is.noCovNA)

\S4method{ScoreMatrixBin}{character,GRanges}(target, windows, bin.num=10,
                                                     bin.op='mean',strand.aware, 
                                                     is.noCovNA=FALSE, type='auto',
                                                     rpm, unique, extend, param,
                                                     bam.paired.end=FALSE, 
                                                     library.size=NULL)

\S4method{ScoreMatrixBin}{RleList,GRangesList}(target,windows,
                                                        bin.num, bin.op,
                                                        strand.aware)

\S4method{ScoreMatrixBin}{GRanges,GRangesList}(target,windows,
                                                   bin.num,bin.op,
                                                   strand.aware,weight.col,
                                                   is.noCovNA)

\S4method{ScoreMatrixBin}{character,GRangesList}(target, windows, bin.num=10,
                                                     bin.op='mean',strand.aware, 
                                                     weight.col=NULL,
                                                     is.noCovNA=FALSE, type='auto',
                                                     rpm, unique, extend, param,
                                                     bam.paired.end=FALSE, 
                                                     library.size=NULL)
ScoreMatrixBin(target, windows, bin.num = 10, bin.op = "mean",
  strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE,
  type = "auto", rpm = FALSE, unique = FALSE, extend = 0,
  param = NULL, bam.paired.end = FALSE, library.size = NULL)

\S4method{ScoreMatrixBin}{RleList,GRanges}(target, windows, bin.num, bin.op,
                                                   strand.aware)

\S4method{ScoreMatrixBin}{GRanges,GRanges}(target,windows,
                                                   bin.num,bin.op,
                                                   strand.aware,weight.col,
                                                   is.noCovNA)

\S4method{ScoreMatrixBin}{character,GRanges}(target, windows, bin.num=10,
                                                     bin.op='mean',strand.aware, 
                                                     is.noCovNA=FALSE, type='auto',
                                                     rpm, unique, extend, param,
                                                     bam.paired.end=FALSE, 
                                                     library.size=NULL)

\S4method{ScoreMatrixBin}{RleList,GRangesList}(target,windows,
                                                        bin.num, bin.op,
                                                        strand.aware)

\S4method{ScoreMatrixBin}{GRanges,GRangesList}(target,windows,
                                                   bin.num,bin.op,
                                                   strand.aware,weight.col,
                                                   is.noCovNA)

\S4method{ScoreMatrixBin}{character,GRangesList}(target, windows, bin.num=10,
                                                     bin.op='mean',strand.aware, 
                                                     weight.col=NULL,
                                                     is.noCovNA=FALSE, type='auto',
                                                     rpm, unique, extend, param,
                                                     bam.paired.end=FALSE, 
                                                     library.size=NULL)

Arguments

`target`	`RleList`, `GRanges`, a BAM file or a bigWig file object to be overlapped with ranges in `windows`
`windows`	`GRanges` or `GRangesList` object that contains the windows of interest. It could be promoters, CpG islands, exons, introns as GRanges object or GrangesList object representing exons of each transcript. Exons must be ordered by ascending rank by their position in transcript. The sizes of windows does NOT have to be equal.
`bin.num`	single `integer` value denoting how many bins there should be for each window
`bin.op`	bin operation that is either one of the following strings: "max","min","mean","median","sum". The operation is applied on the values in the bin. Defaults to "mean"
`strand.aware`	If TRUE (default: FALSE), the strands of the windows will be taken into account in the resulting `scoreMatrix`. If the strand of a window is -, the values of the bins for that window will be reversed
`weight.col`	if the object is `GRanges` object a numeric column in meta data part can be used as weights. This is particularly useful when genomic regions have scores other than their coverage values, such as percent methylation, conservation scores, GC content, etc.
`is.noCovNA`	(Default:FALSE) if TRUE,and if 'target' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values.
`type`	(Default:"auto") if target is a character vector of file paths, then type designates the type of the corresponding files (bam or bigWig)
`rpm`	boolean telling whether to normalize the coverage to per milion reads. FALSE by default. See `library.size`.
`unique`	boolean which tells the function to remove duplicated reads based on chr, start, end and strand
`extend`	numeric which tells the function to extend the reads to width=extend
`param`	ScanBamParam object
`bam.paired.end`	boolean indicating whether given BAM file contains paired-end reads (default:FALSE). Paired-reads will be treated as fragments.
`library.size`	numeric indicating total number of mapped reads in a BAM file (`rpm` has to be set to TRUE). If is not given (default: NULL) then library size is calculated using the Rsamtools idxstatsBam function: sum(idxstatsBam(target)$mapped).

Value

returns a scoreMatrix object

Examples

data(cage)
data(cpgi)
data(promoters)
myMat=ScoreMatrixBin(target=cage,
                      windows=cpgi,bin.num=10,bin.op="mean",weight.col="tpm")

plot(colMeans(myMat,na.rm=TRUE),type="l")

  
myMat2=ScoreMatrixBin(target=cage,
                       windows=promoters,bin.num=10,bin.op="mean",
                       weight.col="tpm",strand.aware=TRUE)

plot(colMeans(myMat2,na.rm=TRUE),type="l")


# Compute transcript coverage of a set of exons.
library(GenomicRanges)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", 
                       package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
transcripts = split(gene.parts$exons, gene.parts$exons$name)
transcripts = transcripts[]
myMat3 = ScoreMatrixBin(target=cage, windows=transcripts[1:250], 
                    bin.num=10)
myMat3                     
data(cage)
data(cpgi)
data(promoters)
myMat=ScoreMatrixBin(target=cage,
                      windows=cpgi,bin.num=10,bin.op="mean",weight.col="tpm")

plot(colMeans(myMat,na.rm=TRUE),type="l")

  
myMat2=ScoreMatrixBin(target=cage,
                       windows=promoters,bin.num=10,bin.op="mean",
                       weight.col="tpm",strand.aware=TRUE)

plot(colMeans(myMat2,na.rm=TRUE),type="l")


# Compute transcript coverage of a set of exons.
library(GenomicRanges)
bed.file = system.file("extdata/chr21.refseq.hg19.bed", 
                       package = "genomation")
gene.parts = readTranscriptFeatures(bed.file)
transcripts = split(gene.parts$exons, gene.parts$exons$name)
transcripts = transcripts[]
myMat3 = ScoreMatrixBin(target=cage, windows=transcripts[1:250], 
                    bin.num=10)
myMat3

Make ScoreMatrixList from multiple targets

Description

The function constructs a list of ScoreMatrix objects in the form of ScoreMatrixList object. This object can be visualized using multiHeatMatrix, heatMeta or plotMeta

Usage

ScoreMatrixList(targets, windows = NULL, bin.num = NULL, bin.op = "mean",
  strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE,
  type = "auto", rpm = FALSE, unique = FALSE, extend = 0,
  param = NULL, library.size = NULL, cores = 1)
ScoreMatrixList(targets, windows = NULL, bin.num = NULL, bin.op = "mean",
  strand.aware = FALSE, weight.col = NULL, is.noCovNA = FALSE,
  type = "auto", rpm = FALSE, unique = FALSE, extend = 0,
  param = NULL, library.size = NULL, cores = 1)

Arguments

`targets`	can be a list of `scoreMatrix` objects, that are coerced to the `ScoreMatrixList`, a list of `RleList` objects, or a character vector specifying the locations of mulitple bam files or bigWig files that are used to construct the `scoreMatrixList`. If it is either a RleList object or a character vector of files, it is obligatory to give a windows argument.
`windows`	`GenomicRanges` containing viewpoints for the scoreMatrix or ScoreMatrixList functions
`bin.num`	an integer telling the number of bins to bin the score matrix
`bin.op`	an name of the function that will be used for smoothing windows of ranges
`strand.aware`	a boolean telling the function whether to reverse the coverage of ranges that come from - strand (e.g. when plotting enrichment around transcription start sites)
`weight.col`	if the object is `GRanges` object a numeric column in meta data part can be used as weights. This is particularly useful when genomic regions have scores other than their coverage values, such as percent methylation, conservation scores, GC content, etc.
`is.noCovNA`	(Default:FALSE) if TRUE,and if 'targets' is a GRanges object with 'weight.col' provided, the bases that are uncovered will be preserved as NA in the returned object. This useful for situations where you can not have coverage all over the genome, such as CpG methylation values.
`type`	(Default:"auto") if `targets` is a character vector of file paths, then type designates the type of the corresponding files (bam or bigWig)
`rpm`	boolean telling whether to normalize the coverage to per milion reads. FALSE by default. See `library.size`.
`unique`	boolean which tells the function to remove duplicated reads based on chr, start, end and strand
`extend`	numeric which tells the function to extend the features ( i.e aligned reads) to total length ofwidth+extend
`param`	ScanBamParam object
`library.size`	a numeric vector of the same length as `targets` indicating total number of mapped reads in BAM files (`targets`). If is not given (default: NULL) then library sizes for every target is calculated using the Rsamtools idxstatsBam function: sum(idxstatsBam(target)$mapped). `rpm` argument has to be set to TRUE.
`cores`	the number of cores to use (default: 1)

Value

returns a ScoreMatrixList object

Examples


# visualize the distribution of cage clusters and cpg islands around promoters
library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)
 
cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE)
sml

multiHeatMatrix(sml)

# visualize the distribution of cage clusters and cpg islands around promoters
library(GenomicRanges)
data(cage)
data(cpgi)
data(promoters)
 
cage$tpm = NULL
targets = GRangesList(cage=cage, cpgi=cpgi)
sml = ScoreMatrixList(targets, promoters, bin.num=10, strand.aware=TRUE)
sml

multiHeatMatrix(sml)

An S4 class for storing a set of `ScoreMatrixList`

Description

The resulting object is an extension of a list object, where each element corresponds to a score matrix object

Constructors

see ScoreMatrixList

Coercion

as(from, "ScoreMatrixList"): Creates a ScoreMatrixList object from a list containing ScoreMatrix or ScoreMatrixBin objects.

Subsetting

In the code snippets below, x is a ScoreMatrixList object.

x[[i]],x[[i]]: Get or set elements i, where i is a numeric or character vector of length 1.

x$name, x$name: value: Get or set element name, where name is a name or character vector of length 1.

show method for some of the genomation classes

Description

show method for some of the genomation classes

Usage

## S4 method for signature 'RandomEnrichment'
show(object)

## S4 method for signature 'AnnotationByGeneParts'
show(object)

## S4 method for signature 'AnnotationByFeature'
show(object)

## S4 method for signature 'ScoreMatrix'
show(object)

## S4 method for signature 'ScoreMatrixList'
show(object)
## S4 method for signature 'RandomEnrichment'
show(object)

## S4 method for signature 'AnnotationByGeneParts'
show(object)

## S4 method for signature 'AnnotationByFeature'
show(object)

## S4 method for signature 'ScoreMatrix'
show(object)

## S4 method for signature 'ScoreMatrixList'
show(object)

Arguments

object

object of class RandomEnrichment

Value

Shows the dimension of the ScoreMatrix

Shows the number of matrices and their sizes

Package 'genomation'

Help Index

Extract method for a ScoreMatrix object.

Description

Usage

Arguments

Extract method for a ScoreMatrixList object.

Description

Usage

Arguments

Function to annotate given GRanges object with a given genomic feature

Description

Usage

Arguments

Value

Examples

Function to annotate a given GRanges object with promoter,exon,intron & intergenic values

Description

Usage

Arguments

Value

Examples

Annotate given ranges with genomic features

Description

Usage

Arguments

Value

See Also

Examples

Annotate given object with promoter, exon, intron and intergenic regions

Description

Usage

Arguments

Value

Examples

An S4 class that information on overlap of target features with annotation features

Description

Slots

An S4 class that information on overlap of target features with annotation features

Description

Slots

Bins the columns of a matrix using a user provided function

Description

Usage

Arguments

Value

Examples

c.ScoreMatrix

Description

Usage

Arguments

Value

c.ScoreMatrixList

Description

Usage

Arguments

Value

Example CAGE data set.

Description

Format

function that calculates the significance of overlaps of two sets of features using randomization

Description

Usage

Arguments

Value

convert a data frame read-in from a bed file to a GRanges object for exons

Description

Usage

Arguments

Value

Note

Examples

convert a data frame read-in from a bed file to a GRanges object for introns

Description

Usage

Arguments

Value

Note

Examples

convert a data frame read-in from a bed file to a GRanges object