Package 'groHMM' reference manual

Title:	GRO-seq Analysis Pipeline
Description:	A pipeline for the analysis of GRO-seq data.
Authors:	Charles G. Danko, Minho Chae, Andre Martins, W. Lee Kraus
Maintainer:	Tulip Nandu <[email protected]>, W. Lee Kraus <[email protected]>
License:	GPL-3
Version:	1.41.1
Built:	2025-03-18 02:50:31 UTC
Source:	https://github.com/bioc/groHMM

groHMM: GRO-seq Analysis Pipeline

Description

groHMM was developed for analysis of GRO-seq data, which provides a genome wide 'map' of the position and orientation of all transcriptionally active RNA polymerases. groHMM predicts the boundaries of transcriptional activity across the genome de novo using a two-state hidden Markov model (HMM). The model essentially divides the genome into 'transcribed' and 'non-transcribed' regions in a strand specific manner.

We also use HMMs to identify the leading edge of Pol II at genes activated by a stimulus in GRO-seq time course data. This approach allows the genome-wide interrogation of transcription rates in cells.

In addition to these advanced features, groHMM provides wrapper functions for counting raw reads, generating wiggle files for visualization, and creating metagene (averaging) plots. Although groHMM is tailored towards GRO-seq data, the same functions and analytical methodologies can, in principal, be applied to a wide variety of other short read data sets.

Details

Package:	groHMM
Type:	Package
Version:	0.99.0
Date:	2014-04-02
License:	GPL (>=3)
LazyLoad:	yes
Depends:	R (>= 2.14.0), MASS, GenomicRanges, rtracklayer, parallel

Author(s)

Charles G. Danko, Minho Chae, Andre Martins

Maintainer: Minho Chae<[email protected]>

References

Luo, X., Chae, M., Krishnakumar, R,, Danko, C., Kraus, L. Dynamic reorganization of the AC16 cardiomyocyte transcriptome in response to TNFa signaling revealed by integrated genomic analyses. BMC Genomics. 2014 Feb 24;15(1):155

Hah, N., Danko, C., Core, L., Waterfall, J., Siepel, A., Lis, J., Kraus, L. A Rapid, Extensive, and Transient Transcriptional Response to Estrogen Signaling in Breast Cancer Cells. Cell. 2011 May 13;145(4):622-34

Returns the average profile of tiling array probe intensity values or wiggle-like count data centered on a set of genomic positions (specified by 'Peaks').

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Usage

averagePlot(ProbeData, Peaks, size = 50, bins = seq(-1000, 1000, size))
averagePlot(ProbeData, Peaks, size = 50, bins = seq(-1000, 1000, size))

Arguments

`ProbeData`	Data.frame representing chromosome, window center, and a value.
`Peaks`	Data.frame representing chromosome, and window center.
`size`	Numeric. The size of the moving window. Default: 50 bp.
`bins`	The bins of the meta gene – i.e. the number of moving windows to break it into. Default +/- 1kb from center.

Value

A vector representing the 'typical' signal centered on the peaks of interest.

Author(s)

Charles G. Danko and Minho Chae

breakTranscriptsOnGenes Breaks transcripts on genes

Description

Breaks transcripts when they are overlapped with multiple well annotated genes.

Usage

breakTranscriptsOnGenes(tx, annox, strand = "+", geneSize = 5000,
  threshold = 0.8, gap = 5, plot = FALSE)
breakTranscriptsOnGenes(tx, annox, strand = "+", geneSize = 5000,
  threshold = 0.8, gap = 5, plot = FALSE)

Arguments

`tx`	GRanges of transcripts.
`annox`	GRanges of non-overlapping annotations for reference.
`strand`	Takes "+" or "-" Default: "+"
`geneSize`	Numeric. Minimum gene size in annox to be used as reference. Default: 5000
`threshold`	Numeric. Ratio of overlapped region relative to a gene width. Transcripts only greater than this threshold are subjected to be broken. Default: 0.8
`gap`	Numeric. Gap (bp) between broken transcripts. Default: 5
`plot`	Logical. If set to TRUE, show each step in a plot. Default: FALSE

Value

Returns GRanges object of broken transcripts.

Author(s)

Minho Chae and Charles G. Danko

Examples

tx <- GRanges("chr7", IRanges(1000, 30000), strand="+")
annox <- GRanges("chr7", IRanges(start=c(1000, 20000),
             width=c(10000,10000)), strand="+")
bPlus <- breakTranscriptsOnGenes(tx, annox, strand="+")
tx <- GRanges("chr7", IRanges(1000, 30000), strand="+")
annox <- GRanges("chr7", IRanges(start=c(1000, 20000),
             width=c(10000,10000)), strand="+")
bPlus <- breakTranscriptsOnGenes(tx, annox, strand="+")

combineTranscripts Combines transnscipts.

Description

Combines transcripts that are within the same gene annotation, combining smaller transcripts for genes with low regulation into a single transcript representing the gene.

Usage

combineTranscripts(tx, annox, geneSize = 1000, threshold = 0.8,
  plot = FALSE)
combineTranscripts(tx, annox, geneSize = 1000, threshold = 0.8,
  plot = FALSE)

Arguments

`tx`	GRanges of transcripts.
`annox`	GRanges of non-overlapping annotations for reference.
`geneSize`	Numeric. Minimum gene size in annotations to be used as reference. Default: 1000
`threshold`	Numeric. Ratio of overlapped region relative to transcript width. Transcripts only greater than this threshold are subjected to be combined. Default: 0.8
`plot`	Logical. If set to TRUE, show easch step in a plot. Default: FALSE

Value

Returns GRanges object of combined transcripts.

Author(s)

Minho Chae and Charles G. Danko

Examples

tx <- GRanges("chr7", IRanges(start=c(1000, 20000), width=c(10000,10000)),
strand="+")
annox <- GRanges("chr7", IRanges(1000, 30000), strand="+")
combined <- combineTranscripts(tx, annox)
tx <- GRanges("chr7", IRanges(start=c(1000, 20000), width=c(10000,10000)),
strand="+")
annox <- GRanges("chr7", IRanges(1000, 30000), strand="+")
combined <- combineTranscripts(tx, annox)

countMappableReadsInInterval counts the number of mappable reads in a set of genomic features.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Usage

countMappableReadsInInterval(features, UnMap, debug = FALSE, ...)
countMappableReadsInInterval(features, UnMap, debug = FALSE, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates. The meta-plot will be centered on the start position.
`UnMap`	List object representing the position of un-mappable reads. Default: not used.
`debug`	If set to TRUE, provides additional print options. Default: FALSE
`...`	Extra argument passed to mclapply

Value

Returns a vector of counts, each representing the number of reads inside each genomic interval.

Author(s)

Charles G. Danko and Minho Chae

detectTranscripts detects transcripts de novo using a two-state hidden Markov model (HMM).

Description

Read counts can be specified as either a GRanges object (reads), or using a fixed-step wiggle-format passed in a list (Fp and Fm). Either reads or BOTH Fp and Fm must be specified.

Usage

detectTranscripts(reads = NULL, Fp = NULL, Fm = NULL, LtProbA = -5,
  LtProbB = -200, UTS = 5, size = 50, threshold = 0.1, debug = TRUE,
  ...)
detectTranscripts(reads = NULL, Fp = NULL, Fm = NULL, LtProbA = -5,
  LtProbB = -200, UTS = 5, size = 50, threshold = 0.1, debug = TRUE,
  ...)

Arguments

`reads`	A GRanges object representing a set of mapped reads.
`Fp`	Wiggle-formatted read counts on "+" strand. Optionally, Fp and Fm represent list() filled with a vector of counts for each chromosome. Can detect transcripts starting from a fixed-step wiggle.
`Fm`	Wiggle-formatted read counts on "-" strand.
`LtProbA`	Log probability of t... . Default: -5. One of these is just an initialization, and the final value is set by EM. The other is a holdout parameter.
`LtProbB`	Log probability of t... . Default: -200.
`UTS`	Varience in read counts of the untranscribed sequence. Default: 5.
`size`	Log probability of t... . Default: -5.
`threshold`	Threshold change in total likelihood, below which EM exits.
`debug`	If set to TRUE, provides additional print options. Default: FALSE
`...`	Extra argument passed to mclapply

Details

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors set the option 'mc.cores'.

Reference: Hah N, Danko CG, Core L, Waterfall JJ, Siepel A, Lis JT, Kraus WL. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell. 2011 May 13;145(4):622-34. doi: 10.1016/j.cell.2011.03.042.

Value

Returns a list of emisParams, trnasParams, viterbiStates, and transcripts. The transcript element is a GRanges object representing the predicted genomic coordinates of transcripts on both the + and - strand.

Author(s)

Charles G. Danko and Minho Chae

Examples

S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
               package="groHMM")), "GRanges")
## Not run:
# hmmResult <- detectTranscripts(S0mR1, LtProbB=-200, UTS=5, threshold=1)
# txHMM <- hmmResult$transcripts
S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
               package="groHMM")), "GRanges")
## Not run:
# hmmResult <- detectTranscripts(S0mR1, LtProbB=-200, UTS=5, threshold=1)
# txHMM <- hmmResult$transcripts

evaluateHMM Evaluates HMM calling.

Description

Evaluates HMM calling of transripts compared to known annotations.

Usage

evaluateHMMInAnnotations(tx, annox)
evaluateHMMInAnnotations(tx, annox)

Arguments

`tx`	GRanges of transcripts predicted by HMM.
`annox`	GRanges of non-overlapping annotatoins.

Value

a list of error information; merged annotations, dissociated annotation, total, and rate.

Author(s)

Minho Chae

Examples

tx <- GRanges("chr7", IRanges(start=seq(100, 1000, by=200),
width=seq(100, 1000, by=100)), strand="+")
annox <- GRanges("chr7", IRanges(start=seq(110, 1100, by=150),
width=seq(100, 1000, by=150)), strand="+")
error <- evaluateHMMInAnnotations(tx, annox)
tx <- GRanges("chr7", IRanges(start=seq(100, 1000, by=200),
width=seq(100, 1000, by=100)), strand="+")
annox <- GRanges("chr7", IRanges(start=seq(110, 1100, by=150),
width=seq(100, 1000, by=150)), strand="+")
error <- evaluateHMMInAnnotations(tx, annox)

Function identifies expressed features using the methods introduced in Core, Waterfall, Lis; Science, Dec. 2008.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors use the argument 'mc.cores'.

Usage

expressedGenes(features, reads, Lambda = NULL, UnMap = NULL,
  debug = FALSE, ...)
expressedGenes(features, reads, Lambda = NULL, UnMap = NULL,
  debug = FALSE, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates. The meta-plot will be centered on the start position. There can be optional "ID" column for gene ids.
`reads`	A GRanges object representing a set of mapped reads.
`Lambda`	Measurement of assay noise. Default: 0.04 reads/ kb in a library of 10,751,533 mapped reads. (background computed in Core, Waterfall, Lis. (2008) Science.).
`UnMap`	List object representing the position of un-mappable reads. Default: not used.
`debug`	If set to true, returns the number of positions. Default: FALSE.
`...`	Extra argument passed to mclapply

Value

Returns a data.frame representing the expression p.values for features of interest.

Author(s)

Charles G. Danko

Returns the number of cores.

Description

Returns the number of cores.

Usage

getCores(cores)
getCores(cores)

Arguments

cores

the number of cores, it is 1 in windows platform.

Examples

cores <- getCores(2L)
cores <- getCores(2L)

getTxDensity Calculates transcript density.

Description

Calculates transcript density for transcripts which overlapps with annotations. For 'run genes together' or 'broken up a single annotation' errors, best overlapped transcripts or annotations are used.

Usage

getTxDensity(tx, annox, plot = TRUE, scale = 1000L, nSampling = 0L,
  samplingRatio = 0.1, ...)
getTxDensity(tx, annox, plot = TRUE, scale = 1000L, nSampling = 0L,
  samplingRatio = 0.1, ...)

Arguments

`tx`	GRanges of transcripts.
`annox`	GRanges of non-overlapping annotatoins.
`plot`	Logical. If TRUE, plot transcript density. Default: TRUE
`scale`	Numeric. Scaled size of a gene for transcript density calculation. Default: 1000L
`nSampling`	Numeric. Number of subsampling. Default: 0L
`samplingRatio`	Numeric. Ratio of sampling for annotations. Default: 0.1
`...`	Extra argument passed to mclapply.

Details

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors set the option 'mc.cores'.

Value

Returns a list of FTD, TTD, PostTTS, and AUC.

Author(s)

Minho Chae

Examples

tx <- GRanges("chr7", IRanges(start=seq(1000,4000, by=1000),
width=seq(1000, 1300, by=100)), strand=rep("+", 4))
annox <- GRanges("chr7", IRanges(start=seq(1100,4100, by=1000),
width=seq(900, 1200, by=100)), strand=rep("+", 4))
## Not run:
# density <- getTxDensity(tx, annox)
tx <- GRanges("chr7", IRanges(start=seq(1000,4000, by=1000),
width=seq(1000, 1300, by=100)), strand=rep("+", 4))
annox <- GRanges("chr7", IRanges(start=seq(1100,4100, by=1000),
width=seq(900, 1200, by=100)), strand=rep("+", 4))
## Not run:
# density <- getTxDensity(tx, annox)

limitToXkb truncates a set of genomic itnervals at a constant, maximum size.

Description

limitToXkb truncates a set of genomic itnervals at a constant, maximum size.

Usage

limitToXkb(features, offset = 1000, size = 13000)
limitToXkb(features, offset = 1000, size = 13000)

Arguments

`features`	A GRanges object representing a set of genomic coordinates. The meta-plot will be centered on the start position.
`offset`	Starts the interval from this position relative to the start of each genomic features.
`size`	Specifies the size of the window.

Value

Returns GRanges object with new genomic coordiates.

Author(s)

Minho Chae and Charles G. Danko

Examples

tx <- GRanges("chr7", IRanges(1000, 30000), strand="+")
newTX <- limitToXkb(tx)
tx <- GRanges("chr7", IRanges(1000, 30000), strand="+")
newTX <- limitToXkb(tx)

makeConsensusAnnotations Makes a consensus annotation

Description

Makes a non-overlapping consensus annotation. Gene annotations are often overalpping due to #' multiple isoforms for a gene. In consensus annotation, isoforms are first reduced so that only redundant intervals are used to represent a genomic interval for a gene, i.e., a gene id. Remaining unresolved annotations are further reduced by truncating 3' end of annotations.

Usage

makeConsensusAnnotations(ar, minGap = 1L, minWidth = 1000L, ...)
makeConsensusAnnotations(ar, minGap = 1L, minWidth = 1000L, ...)

Arguments

`ar`	GRanges of annotations to be collapsed.
`minGap`	Minimun gap between overlapped annotations after truncated. Default: 1L
`minWidth`	Minimun width of consensus annotations. Default: 1000L
`...`	Extra argument passed to mclapply.

Details

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Value

Returns GRanges object of annotations.

Author(s)

Minho Chae

Examples

## Not run:
# library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
# tx <- transcripts(txdb, columns=c("gene_id", "tx_id", "tx_name"),
#                         filter=list(tx_chrom="chr7"))
# tx <- tx[grep("random", as.character(seqnames(tx)), invert=TRUE),]
# ca <- makeConsensusAnnotations(tx)
## Not run:
# library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
# tx <- transcripts(txdb, columns=c("gene_id", "tx_id", "tx_name"),
#                         filter=list(tx_chrom="chr7"))
# tx <- tx[grep("random", as.character(seqnames(tx)), invert=TRUE),]
# ca <- makeConsensusAnnotations(tx)

Returns a histogram of the number of reads in each section of a moving window centered on a certain feature.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, set the option 'mc.cores'.

Usage

metaGene(features, reads = NULL, plusCVG = NULL, minusCVG = NULL,
  size = 100L, up = 10000L, down = NULL, ...)
metaGene(features, reads = NULL, plusCVG = NULL, minusCVG = NULL,
  size = 100L, up = 10000L, down = NULL, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates. The meta-plot will be centered on the transcription start site (TSS)
`reads`	A GRanges object representing a set of mapped reads. Instead of 'reads', 'plusCVG' and 'minusCVG' can be used Default: NULL
`plusCVG`	An IntegerRangesList object for reads with '+' strand.
`minusCVG`	An IntegerRangesList object for reads with '-' strand.
`size`	The size of the moving window.
`up`	Distance upstream of each features to align and histogram. Default: 10 kb.
`down`	Distance downstream of each features to align and histogram. If NULL, same as up. Default: NULL.
`...`	Extra argument passed to mclapply

Value

Returns a integer-Rle representing the 'typical' signal centered on a point of interest.

Author(s)

Charles G. Danko and Minho Chae

Examples

features <- GRanges("chr7", IRanges(1000, 1000), strand="+")
reads <- GRanges("chr7", IRanges(start=c(1000:1004, 1100),
 width=rep(1, 6)), strand="+")
mg <- metaGene(features, reads, size=4, up=10)
features <- GRanges("chr7", IRanges(1000, 1000), strand="+")
reads <- GRanges("chr7", IRanges(start=c(1000:1004, 1100),
 width=rep(1, 6)), strand="+")
mg <- metaGene(features, reads, size=4, up=10)

Returns a histogram of the number of reads in each section of a moving window of #' variable size across genes.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Usage

metaGene_nL(features, reads, n_windows = 1000, debug = FALSE, ...)
metaGene_nL(features, reads, n_windows = 1000, debug = FALSE, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates.
`reads`	A GRanges object representing a set of mapped reads.
`n_windows`	The number of windows to break genes into.
`debug`	If set to TRUE, provides additional print options. Default: FALSE
`...`	Extra argument passed to mclapply

Value

Returns a vector representing the 'typical' signal across genes of different length.

Author(s)

Charles G. Danko and Minho Chae

Returns a matrix, with rows representing read counts across a specified gene, or other features of interest.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Usage

metaGeneMatrix(features, reads, size = 50, up = 1000, down = up,
  debug = FALSE, ...)
metaGeneMatrix(features, reads, size = 50, up = 1000, down = up,
  debug = FALSE, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates.
`reads`	A GRanges object representing a set of mapped reads.
`size`	The size of the moving window.
`up`	Distance upstream of each f to align and histogram Default: 1 kb.
`down`	Distance downstream of each f to align and histogram Default: same as up.
`debug`	If set to TRUE, provides additional print options. Default: FALSE
`...`	Extra argument passed to mclapply

Value

Returns a vector representing the 'typical' signal across genes of different length.

Author(s)

Charles G. Danko and Minho Chae

Returns the pausing index for different genes. TODO: DESCRIBE THE PAUSING INDEX.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, use the argument 'mc.cores'.

Usage

pausingIndex(features, reads, size = 50, up = 1000, down = 1000,
  UnMAQ = NULL, debug = FALSE, ...)
pausingIndex(features, reads, size = 50, up = 1000, down = 1000,
  UnMAQ = NULL, debug = FALSE, ...)

Arguments

`features`	A GRanges object representing a set of genomic coordinates.
`reads`	A GRanges object representing a set of mapped reads.
`size`	The size of the moving window.
`up`	Distance upstream of each f to align and histogram.
`down`	Distance downstream of each f to align and histogram (NULL).
`UnMAQ`	Data structure representing the coordinates of all un-mappable regions in the genome.
`debug`	If set to TRUE, provides additional print options. Default: FALSE
`...`	Extra argument passed to mclapply

Value

Returns a data.frame of the pausing indices for the input genes.

Returns the pausing index for different genes.

Author(s)

Charles G. Danko and Minho Chae.

Examples

features <- GRanges("chr7", IRanges(2394474,2420377), strand="+")
 reads <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
             package="groHMM")), "GRanges")
 ## Not run:
 # pi <- pausingIndex(features, reads)
features <- GRanges("chr7", IRanges(2394474,2420377), strand="+")
 reads <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
             package="groHMM")), "GRanges")
 ## Not run:
 # pi <- pausingIndex(features, reads)

Given GRO-seq data, identifies the location of the polymerase 'wave' in up- or down- regulated genes.

Description

The model is a three state hidden Markov model (HMM). States represent: (1) the 5' end of genes upstream of the transcription start site, (2) upregulated sequence, and (3) the 3' end of the gene through the polyadenylation site.

Usage

polymeraseWave(reads1, reads2, genes, approxDist, size = 50,
  upstreamDist = 10000, TSmooth = NA, NonMap = NULL, prefix = NULL,
  emissionDistAssumption = "gamma", finterWindowSize = 10000,
  limitPCRDups = FALSE, returnVal = "simple", debug = TRUE)
polymeraseWave(reads1, reads2, genes, approxDist, size = 50,
  upstreamDist = 10000, TSmooth = NA, NonMap = NULL, prefix = NULL,
  emissionDistAssumption = "gamma", finterWindowSize = 10000,
  limitPCRDups = FALSE, returnVal = "simple", debug = TRUE)

Arguments

`reads1`	Mapped reads in time point 1.
`reads2`	Mapped reads in time point 2.
`genes`	A set of genes in which to search for the wave.
`approxDist`	The approximate position of the wave. Suggest using 2000 [bp/ min] * time [min], for mammalian data.
`size`	The size of the moving window. Suggest using 50 for direct ligation data, and 200 for circular ligation data. Default: 50.
`upstreamDist`	The amount of upstream sequence to include Default: 10 kb.
`TSmooth`	Optimonally, outlying windows are set a maximum value over the inter-quantile interval, specified by TSmooth. Reasonable value: 20; Default: NA (for no smoothing). Users are encouraged to use this parameter ONLY in combination with the normal distribution assumptions.
`NonMap`	Optionally, un-mappable positions are trated as missing data. NonMap passes in the list() structure for un-mappable regions.
`prefix`	Optionally, writes out png images of each gene examined for a wave. 'Prefix' denotes the file prefix for image names written to disk. Users are encouraged to create a new directory and write in a full path.
`emissionDistAssumption`	Takes values "norm", "normExp", and "gamma". Specifies the functional form of the 'emission' distribution for states I and II (i.e. 5' of the gene, and inside of the wave). In our experience, "gamma" works best for highly-variable 'spikey' data, and "norm" works for smooth data. As a general rule of thumb, "gamma" is used for libraries made using the direct ligation method, and "norm" for circular ligation data. Default: "gamma".
`finterWindowSize`	Method returns 'quality' information for each gene to which a wave was fit. Included in these metrics are several that define a moving window. The moving window size is specified by filterWindowSize. Default: 10 kb.
`limitPCRDups`	If true, counts only 1 read at each position with >= 1 read. NOT recommended to set this to TRUE. Defulat: FALSE.
`returnVal`	Takes value "simple" (default) or "alldata". "simple" returns a data.frame with Pol II wave end positions. "alldata" returns all of the availiable data from each gene, including the full posterior distribution of the model after EM.
`debug`	If TRUE, prints error messages.

Details

The model computes differences in read counts between the two conditions. Differences are assumed fit a functional form which can be specified by the user (using the emissionDistAssumption argument). Currently supported functional forms include a normal distribution (good for GRO-seq data prepared using the circular ligation protocol), a gamma distribution (good for 'spikey' ligation based GRO-seq data), and a long-tailed normal+exponential distribution was implemented, but never deployed.

Initial parameter estimates are based on initial assumptions of transcription rates taken from the literature. Subsequently all parameters are fit using Baum-Welch expetation maximization.

Reference: Danko CG, Hah N, Luo X, Martins AL, Core L, Lis JT, Siepel A, Kraus WL. Signaling Pathways Differentially Affect RNA Polymerase II Initiation, Pausing, and Elongation Rate in Cells. Mol Cell. 2013 Mar 19. doi:pii: S1097-2765(13)00171-8. 10.1016/j.molcel.2013.02.015.

Arguments:

Value

Returns either a data.frame with Pol II wave end positions, or a List() structure with additional data, as specified by returnVal.

Author(s)

Charles G. Danko

Examples

genes <- GRanges("chr7", IRanges(2394474,2420377), strand="+",
 SYMBOL="CYP2W1", ID="54905")
 reads1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
                             package="groHMM")), "GRanges")
 reads2 <- as(readGAlignments(system.file("extdata", "S40mR1.bam",
                             package="groHMM")), "GRanges")
 approxDist <- 2000*10
 # Not run:
 # pw <- polymeraseWave(reads1, reads2, genes, approxDist)
genes <- GRanges("chr7", IRanges(2394474,2420377), strand="+",
 SYMBOL="CYP2W1", ID="54905")
 reads1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
                             package="groHMM")), "GRanges")
 reads2 <- as(readGAlignments(system.file("extdata", "S40mR1.bam",
                             package="groHMM")), "GRanges")
 approxDist <- 2000*10
 # Not run:
 # pw <- polymeraseWave(reads1, reads2, genes, approxDist)

readBed Returns a GenomicRanges object constrcuted from the specified bed file.

Description

Bed file format is assumed to be either four column: seqnames, start, end, strand columns; or six column: seqnames, start, end, name, score, and strand. Three column format is also possible when there is no strand information.

Usage

readBed(file, ...)
readBed(file, ...)

Arguments

`file`	Path to the input file.
`...`	Extra argument passed to read.table

Details

Any additional arguments availiable to read.table can be specified.

Value

Returns GRanges object representing mapped reads.

Author(s)

Minho Chae and Charles G. Danko.

RgammaMLE fits a gamma distribution to a specified data vector using maximum likelihood.

Description

RgammaMLE fits a gamma distribution to a specified data vector using maximum likelihood.

Usage

RgammaMLE(X)
RgammaMLE(X)

Arguments

`X`	A vector of observations, assumed to be real numbers in the inveraval [0,+Inf).

Value

Returns a list of parameters for the best-fit gamma distribution (shape and scale).

Author(s)

Charles G. Danko

Rnorm fits a normal distribution to a specified data vector using maximum likelihood.

Description

Rnorm fits a normal distribution to a specified data vector using maximum likelihood.

Usage

Rnorm(X)
Rnorm(X)

Arguments

`X`	A vector of observations, assumed to be real numbers in the inveraval (-Inf,+Inf).

Value

Returns a list of parameters for the best-fit normal distribution (mean and varience).

Author(s)

Charles G. Danko

Rnorm.exp fits a normal+exponential distribution to a specified data vector using maximum likelihood.

Description

Distrubtion function devined by: alpha*Normal(mean, varience)+(1-alpha) *Exponential(lambda).

Usage

Rnorm.exp(xi, wi = rep(1, NROW(xi)), guess = c(0.5, 0, 1, 1),
  tol = sqrt(.Machine$double.eps), maxit = 10000)
Rnorm.exp(xi, wi = rep(1, NROW(xi)), guess = c(0.5, 0, 1, 1),
  tol = sqrt(.Machine$double.eps), maxit = 10000)

Arguments

`xi`	A vector of observations, assumed to be real numbers in the inveraval (-Inf,+Inf).
`wi`	A vector of weights. Default: vector of repeating 1; indicating all observations are weighted equally. (Are these normalized internally?! Or do they have to be [0,1]?)
`guess`	Initial guess for paremeters. Default: c(0.5, 0, 1, 1).
`tol`	Convergence tolerance. Default: sqrt(.Machine$double.eps).
`maxit`	Maximum number of iterations. Default: 10,000.

Details

Fits nicely with data types that look normal overall, but have a long tail starting for positive values.

Value

Returns a list of parameters for the best-fit normal distribution (alpha, mean, varience, and lambda).

Author(s)

Charles G. Danko

Runs metagene analysis for sense and antisense direction.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, set the option 'mc.cores'.

Usage

runMetaGene(features, reads, anchorType = "TSS", size = 100L,
  normCounts = 1L, up = 10000L, down = NULL, sampling = FALSE,
  nSampling = 1000L, samplingRatio = 0.1, ...)
runMetaGene(features, reads, anchorType = "TSS", size = 100L,
  normCounts = 1L, up = 10000L, down = NULL, sampling = FALSE,
  nSampling = 1000L, samplingRatio = 0.1, ...)

Arguments

`features`	GRanges A GRanges object representing a set of genomic coordinates, i.e., set of genes.
`reads`	GRanges of reads.
`anchorType`	Either 'TSS' or 'TTS'. Metagene will be centered on the transcription start site(TSS) or transcription termination site(TTS). Default: TSS.
`size`	Numeric. The size of the moving window. Default: 100L
`normCounts`	Numeric. Normalization vector such as average reads. Default: 1L
`up`	Numeric. Distance upstream of each feature to align and histogram. Default: 1 kb
`down`	Numeric. Distance downstream of each feature to align and histogram. If NULL, down is same as up. Default: NULL
`sampling`	Logical. If TRUE, subsampling of Metagene is used. Default: FALSE
`nSampling`	Numeric. Number of subsampling. Default: 1000L
`samplingRatio`	Numeric. Ratio of sampling for features. Default: 0.1
`...`	Extra argument passed to mclapply.

Value

A list of integer-Rle for sense and antisene.

Author(s)

Minho Chae

Examples

features <- GRanges("chr7", IRanges(start=1000:1001, width=rep(1,2)),
 strand=c("+", "-"))
reads <- GRanges("chr7", IRanges(start=c(1000:1003, 1100:1101),
 width=rep(1, 6)), strand=rep(c("+","-"), 3))
## Not run:
# mg <- runMetaGene(features, reads, size=4, up=10)
features <- GRanges("chr7", IRanges(start=1000:1001, width=rep(1,2)),
 strand=c("+", "-"))
reads <- GRanges("chr7", IRanges(start=c(1000:1003, 1100:1101),
 width=rep(1, 6)), strand=rep(c("+","-"), 3))
## Not run:
# mg <- runMetaGene(features, reads, size=4, up=10)

A 'total least squares' implementation using demming regression.

Description

A 'total least squares' implementation using demming regression.

Usage

tlsDeming(x, y, d = 1)
tlsDeming(x, y, d = 1)

Arguments

`x`	X values.
`y`	Y values.
`d`	Ratio of variences. Default: 1, for orthogonal regression.

Value

Parameters for the linear model.

Author(s)

Charles G. Danko

A 'total least squares'-like hack for LOESS. Works by rotating points 45 degrees, fitting LOESS, and rotating back.

Description

A 'total least squares'-like hack for LOESS. Works by rotating points 45 degrees, fitting LOESS, and rotating back.

Usage

tlsLoess(x, y, theta = -pi/4, span = 1)
tlsLoess(x, y, theta = -pi/4, span = 1)

Arguments

`x`	X values.
`y`	Y values.
`theta`	Amount to rotate, sets the ratio of variences that are assumed by the hack. Default: -pi/4 radians (45 degrees) for orthogonal regression.
`span`	The LOESS span parameter. Default: 1

Value

List of input values and LOESS predictions.

Author(s)

Charles G. Danko

A 'total least squares' implementation using singular value demposition.

Description

A 'total least squares' implementation using singular value demposition.

Usage

tlsSvd(x, y)
tlsSvd(x, y)

Arguments

`x`	X values.
`y`	Y values.

Value

Parameters for the linear model Y~a*X+e.

Author(s)

Charles G. Danko

windowAnalysis Returns a vector of integers representing the counts of reads in a moving window.

Description

Supports parallel processing using mclapply in the 'parallel' package. To change the number of processors, set the option 'mc.cores'.

Usage

windowAnalysis(reads, strand = "*", windowSize = stepSize,
  stepSize = windowSize, chrom = NULL, limitPCRDups = FALSE, ...)
windowAnalysis(reads, strand = "*", windowSize = stepSize,
  stepSize = windowSize, chrom = NULL, limitPCRDups = FALSE, ...)

Arguments

`reads`	GenomicRanges object representing the position of reads mapping in the genome.
`strand`	Takes values of "+", "-", or "". "" denotes collapsing reads on both strands. Default: "*".
`windowSize`	Size of the moving window. Either windowSize or stepSize must be specified.
`stepSize`	The number of bp moved with each step.
`chrom`	Chromosome for which to return data. Default: returns all avaliable data.
`limitPCRDups`	Counts only one read mapping to each start site. NOTE: If set to TRUE, assumes that all reads are the same length (don't use for paired-end data). Default: FALSE.
`...`	Extra argument passed to mclapply

Value

Returns a list object, each element of which represents a chromosome.

Author(s)

Charles G. Danko and Minho Chae

Examples

S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
     package="groHMM")), "GRanges")
## Not run:
# Fp <- windowAnalysis(S0mR1, strand="+", windowSize=50)
S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
     package="groHMM")), "GRanges")
## Not run:
# Fp <- windowAnalysis(S0mR1, strand="+", windowSize=50)

writeWiggle writes a wiggle track or BigWig file suitable for uploading to the UCSC genome browser.

Description

writeWiggle writes a wiggle track or BigWig file suitable for uploading to the UCSC genome browser.

Usage

writeWiggle(reads, file, strand = "*", fileType = "wig", size = 50,
  normCounts = NULL, reverse = FALSE, seqinfo = NULL,
  track.type.line = FALSE, ...)
writeWiggle(reads, file, strand = "*", fileType = "wig", size = 50,
  normCounts = NULL, reverse = FALSE, seqinfo = NULL,
  track.type.line = FALSE, ...)

Arguments

`reads`	GenomicRanges object representing the position of reads mapping in the genome.
`file`	Specifies the filename for output.
`strand`	Takes values of "+", "-", or "". Computes Writes a wiggle on the speicified strand. "" denotes collapsing reads on both strands. Default: "*".
`fileType`	Takes values of "wig" or "BigWig". Default: "wig".
`size`	Size of the moving window.
`normCounts`	A normalization factor correcting for library size or other effects. For example, total mappible read counts might be a reasonable value. Default: 1 (i.e. no normalization).
`reverse`	If set to TRUE, multiplies values by -1. Used for reversing GRO-seq data on the negative (-) strand. Default: FALSE
`seqinfo`	Seqinfo object for reads. Default: NULL.
`track.type.line`	If set to TRUE, prints a header identifying the file as a wiggle. Necessary to upload a custom track to the UCSC genome browser. Default: TRUE
`...`	Extra argument passed to mclapply.

Author(s)

Minho Chae and Charles G. Danko

Examples

S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
package="groHMM")), "GRanges")
## Not run:
# writeWiggle(reads=S0mR1, file="S0mR1_Plus.wig", fileType="wig",
# strand="+", reverse=FALSE)
S0mR1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
package="groHMM")), "GRanges")
## Not run:
# writeWiggle(reads=S0mR1, file="S0mR1_Plus.wig", fileType="wig",
# strand="+", reverse=FALSE)

Package 'groHMM'

Help Index

groHMM: GRO-seq Analysis Pipeline

Description

Details

Author(s)

References

Returns the average profile of tiling array probe intensity values or wiggle-like count data centered on a set of genomic positions (specified by 'Peaks').

Description

Usage

Arguments

Value

Author(s)

breakTranscriptsOnGenes Breaks transcripts on genes

Description

Usage

Arguments

Value

Author(s)

Examples

combineTranscripts Combines transnscipts.

Description

Usage

Arguments

Value

Author(s)

Examples

countMappableReadsInInterval counts the number of mappable reads in a set of genomic features.

Description

Usage

Arguments

Value

Author(s)

detectTranscripts detects transcripts de novo using a two-state hidden Markov model (HMM).

Description

Usage

Arguments

Details

Value

Author(s)

Examples

evaluateHMM Evaluates HMM calling.

Description

Usage

Arguments

Value

Author(s)

Examples

Function identifies expressed features using the methods introduced in Core, Waterfall, Lis; Science, Dec. 2008.

Description

Usage

Arguments

Value

Author(s)

Returns the number of cores.

Description

Usage

Arguments

Examples

getTxDensity Calculates transcript density.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

limitToXkb truncates a set of genomic itnervals at a constant, maximum size.

Description

Usage

Arguments

Value

Author(s)

Examples

makeConsensusAnnotations Makes a consensus annotation

Description

Usage

Arguments

Details

Value