Title: | Plots data (continuous/discrete) along chromosomal ideogram |
---|---|
Description: | Plots data associated with arbitrary genomic intervals along chromosomal ideogram. |
Authors: | Shraddha Pai <[email protected]>, Jingliang Ren |
Maintainer: | Shraddha Pai <[email protected]> |
License: | GPL-2 |
Version: | 1.43.0 |
Built: | 2024-10-30 07:32:11 UTC |
Source: | https://github.com/bioc/IdeoViz |
Plotting discrete or continuous dataseries in the context of chromosomal location has several useful applications in genomic analysis. Examples of possible metrics include RNA expression levels, densities of epigenetic marks or genomic variation, while applications could range from the analysis of a single variable in a single context, to multiple measurements in several biological contexts (e.g. age/sex/tissue/disease context). Visualization of metrics against the chromosome could identify:
could identify distinctive spatial distribution that could further hypotheses about the functional role of the metric (e.g. telocentric or pericentromeric enrichment)
could highlight distribution differences between different groups of samples, suggesting different regulatory mechanisms; in extreme cases, visualization may identify large genomic foci of differences
could confirm that a quantitative difference measured between groups of interest is consistent throughout the genome (i.e. that there are no foci, and that the change is global).
This package provides a method to plot one or several dataseries against the chromosomal ideogram. It provides some simple options (vertical/horizontal orientation, display in bars or linegraphs). Data are expected to be binned; IdeoViz provides a function for user-specified bin widths. Ideograms for the genome of choice can also be automatically downloaded from UCSC using the getIdeo() function.
Package: | IdeoViz |
Type: | Package |
Title: | Plots data (continuous/discrete) along chromosomal ideogram |
Version: | 0.99.1 |
Date: | 2013-06-26 |
Author: | Shraddha Pai <[email protected]>, Jingliang Ren |
Maintainer: | Shraddha Pai <[email protected]> |
Depends: | Biobase, IRanges, GenomicRanges, RColorBrewer, rtracklayer |
biocViews: | Visualization,Microarray |
License: | GPL-2 |
Shraddha Pai <[email protected]>, Jingliang Ren
Aggregates data by genomic bins
avgByBin(xpr, featureData, target_GR, justReturnBins = FALSE, getBinCountOnly = FALSE, FUN = mean, doSampleCor = FALSE, verbose = FALSE)
avgByBin(xpr, featureData, target_GR, justReturnBins = FALSE, getBinCountOnly = FALSE, FUN = mean, doSampleCor = FALSE, verbose = FALSE)
xpr |
(data.frame or matrix) Locus-wise values. Rows correspond to genomic intervals (probes, genes, etc.,) while columns correspond to individual samples |
featureData |
(data.frame or GRanges) Locus coordinates. Row order must match |
target_GR |
(GRanges) Target intervals, with coordinate system matching that of featureData. |
justReturnBins |
(logical) when TRUE, returns the coordinates of the bin to which each row belongs. Does not aggregate data in any way. This output can be used as input for more complex functions with data from each bin. |
getBinCountOnly |
(logical) when TRUE, does not aggregate or expect xpr. Only returns number of overlapping subject ranges per bin. Speeds up computation. |
FUN |
(function) function to aggregate data in bin |
doSampleCor |
(logical) set to TRUE to compute mean pairwise
sample correlation (Pearson correlation) for each bin; when TRUE,
this function overrides |
verbose |
(logical) print status messages |
Computed mean value of binned data. This function assumes
that all elements in featureData
have identical width.
If provided with elements of disparate widths, the respective widths
are not weighted averaging. This behaviour may change in future
versions of IdeoViz.
This function allows the user to bin data if this hasn't already
been done, and is a step involved in preparing the data for
plotOnIdeo()
. This function computes binned within-sample
average of probes overlapping the same range. Where a range
overlaps multiple bins, it gets counted in all.
(GRanges) Binned data or binning statistics; information
returned for non-empty bins only.
The default for this function is to return binned data; alternately,
if justReturnBins=TRUE
or getBinCountOnly=TRUE
the
function will return statistics on bin counts. The latter may be
useful to plot spatial density of the input metric.
The flags
and output types are presented in order of evaluation precedence:
If getBinCountOnly=TRUE
, returns a list with a single
entry: bin_ID: (data.frame) bin information: chrom, start,
end, width, strand, index, and count. "index" is the row number of
target_GR to which this bin corresponds
If justReturnBins=TRUE
and
getBinCountOnly=FALSE
, returns a list with three entries:
bin_ID: same as bin_ID in output 1 above
xpr:(data.frame) B-by-n columns where B is total number of [target_GR, featureData] overlaps (see next entry, binmap_idx) and n is number of columns in xpr; column order matches xpr. Contains sample-wise data "flattened" so that each [target,subject] pair is presented. More formally, entry [i,j] contains expression for overlap of row i from binmap_idx for sample j (where 1 <= i <= B, 1 <= j <= n)
binmap_idx:(matrix) two-column matrix:
1) target_GR row, 2) row of featureData which overlaps with index
in column 1. (matrix output of
GenomicRanges::findOverlaps())
)
Default: If justReturnBins=FALSE
and getBinCountOnly=FALSE
, returns a GRanges object. Results are contained in the elementMetadata
slot. For a dataset with n samples, the table would have (n+1) columns; the first column is bin_count, and indicates number of units contained in that bin. Columns (2:(n+1)) contain binned values for each sample in column order corresponding to that of xpr.
For doSampleCor=TRUE
, result is in a metadata column with name "mean_pairwise"cor". Bins with a single datapoint per sample get a value of NA.
getIdeo()
, getBins()
ideo_hg19 <- getIdeo("hg19") data(GSM733664_broadPeaks) chrom_bins <- getBins(c("chr1","chr2","chrX"), ideo_hg19,stepSize=5*100*1000) # default binning mean_peak <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins) # custom function median_peak <- avgByBin( data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins, FUN=median) # mean pairwise sample correlation data(binned_multiSeries) bins2 <- getBins(c("chr1"), ideo_hg19, stepSize=5e6) samplecor <- avgByBin(mcols(binned_multiSeries)[,1:3], binned_multiSeries, bins2, doSampleCor=TRUE) # just get bin count binstats <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins, getBinCountOnly=TRUE)
ideo_hg19 <- getIdeo("hg19") data(GSM733664_broadPeaks) chrom_bins <- getBins(c("chr1","chr2","chrX"), ideo_hg19,stepSize=5*100*1000) # default binning mean_peak <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins) # custom function median_peak <- avgByBin( data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins, FUN=median) # mean pairwise sample correlation data(binned_multiSeries) bins2 <- getBins(c("chr1"), ideo_hg19, stepSize=5e6) samplecor <- avgByBin(mcols(binned_multiSeries)[,1:3], binned_multiSeries, bins2, doSampleCor=TRUE) # just get bin count binstats <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]), GSM733664_broadPeaks[,1:3], chrom_bins, getBinCountOnly=TRUE)
Simulated data spanning all autosomes and X,Y chromosomes of the human genome (build hg18). Values consist of a single dataseries of random uniform distribution between -1 and + 1. The chromosomes are tiled in 1Mb bins and coordinates are one-based.
data(binned_fullGenome)
data(binned_fullGenome)
Simulated data, generated by Shraddha Pai
data(binned_fullGenome) head(binned_fullGenome) seqlevels(binned_fullGenome)
data(binned_fullGenome) head(binned_fullGenome) seqlevels(binned_fullGenome)
A simulated dataseries spanning chr1,chrX,chrY of the human genome (build hg18). Values consist of five series constructed to show mostly random behaviour with the exception of elevated signal in a few regions. The chromosomes are tiled in 1Mb bins and coordinates are one-based.
data(binned_multiSeries)
data(binned_multiSeries)
Simulated data, generated by Shraddha Pai
data(binned_multiSeries) head(binned_multiSeries)
data(binned_multiSeries) head(binned_multiSeries)
Simulated data spanning 3 human chromosomes and varying in a random uniform distribution between -1 and +1.
data(binned_singleSeries)
data(binned_singleSeries)
Simulated data by Shraddha Pai
data(binned_singleSeries) head(binned_singleSeries)
data(binned_singleSeries) head(binned_singleSeries)
getBins
getBins(chroms, ideo, binLim = NULL, stepSize)
getBins(chroms, ideo, binLim = NULL, stepSize)
chroms |
(character) chromosomes to generate bins for |
ideo |
(data.frame) ideogram table as generated by |
stepSize |
(integer) bin size in bases |
binLim |
(numeric, |
length 2) [start, end] of genomic range to generate bins for. A value of NULL results in binning of entire chromosome
Get uniformly-sized bins of specified width
This is a helper function used to generate binned data for plotOnIdeo()
. It takes the chromosome-wide extents from ideo, which is essentially the cytoBandIdeo table from UCSC browser with the header as the first row. A use case is to generate bins using this function and supply the output to avgByBin()
to bin the data.
(GRanges) bin ranges in 1-base coordinates
getIdeo()
,avgByBin()
ideo_hg19 <- getIdeo("hg19") chrom_bins <- getBins(c("chr1","chr2","chrX"), ideo_hg19,stepSize=5*100*1000)
ideo_hg19 <- getIdeo("hg19") chrom_bins <- getBins(c("chr1","chr2","chrX"), ideo_hg19,stepSize=5*100*1000)
Download ideogram table from UCSC
getIdeo(ideoSource)
getIdeo(ideoSource)
ideoSource |
(character) Genome build for data (e.g. mm10). |
Download table containing chromosomal extent and band locations from the UCSC genome browser
Uses rtracklayer
to retrieve the cytoBandIdeo.
table from the UCSC genome browser. The cytoBandIdeo table
contains chromosomal ideogram information and is used to graph the
chromosomal bands in plotOnIdeo()
. This table is provided as
input to plotOnIdeo()
. In the case where the user bins the
data, the output of this function can also be used as input to
generate bin coordinates for binning the data (see
avgByBin()
).
(data.frame) ideogram table
avgByBin()
,getBins()
getIdeo("mm9")
getIdeo("mm9")
Broadpeaks file mapping H3K9me3 marks in human lymphoblastoid cells (peaks from chr1, chr2, and chrX).
data(GSM733664_broadPeaks)
data(GSM733664_broadPeaks)
GEO accession GSM733664, subset containing chr1,chr2,and chrX peaks.
ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
data(GSM733664_broadPeaks) head(GSM733664_broadPeaks)
data(GSM733664_broadPeaks) head(GSM733664_broadPeaks)
Cytoband information for all chromosomes in human genome build hg18. Used for vignette examples.
data(hg18_ideo)
data(hg18_ideo)
UCSC genome browser.
data(hg18_ideo) head(hg18_ideo)
data(hg18_ideo) head(hg18_ideo)
Base function which plots the ideogram and superimposed data for a single chromosome. plotOnIdeo() calls this function and stacks the resulting output.
plotChromValuePair(chrom, cytoTable, bpLim, vertical, values_GR,val_range, col, value_cols = "values", default_margins, addScale, ablines_y, smoothVals, span=0.03, verbose = FALSE, ...)
plotChromValuePair(chrom, cytoTable, bpLim, vertical, values_GR,val_range, col, value_cols = "values", default_margins, addScale, ablines_y, smoothVals, span=0.03, verbose = FALSE, ...)
... |
arguments to |
chrom(character) |
chromosome(s) to create ideograms for |
cytoTable(data.frame) |
loaded ideogram table. (see ideoTable argument to plotOnIdeo()) |
bpLim(numeric) |
(aka xlim); display only a section of the chromosome and the corresponding values |
vertical(logical) |
if TRUE, chromosomes will be plotted vertically |
values_GR(list |
or GenomicRanges) If plotType is "lines" or "rect", the function expects this to be a GRanges() object with data series in metadata columns. If plotType is "seg_tracks" this is a list of GRanges(), each entry of which represents a track. |
val_range(numeric) |
(aka ylim); y-axis scale for data series |
col(character) |
colour for series |
value_cols(character) |
column name for series to plot |
default_margins(numeric) |
page inner margins (in inches) |
addScale(logical) |
if FALSE, bp positions will be hidden |
ablines_y(numeric) |
when specified, will draw reference lines on the y-axis |
smoothVals(logical) |
when T applies loess() to each series |
span(numeric) |
loess::span() |
verbose(logical) |
print messages |
Plots one unit of chromosome ideogram with dataseries superimposed. Usually, the user can avoid this function and directly call plotOnIdeo(). However, this function may be used in cases where further plot customization is required.
plotOnIdeo()
data(hg18_ideo) data(binned_multiSeries) layout(matrix(1:2, byrow=TRUE,ncol=1),heights=c(2.5,1)) plotChromValuePair("chr1",hg18_ideo, values_GR=binned_multiSeries, value_cols=colnames(mcols(binned_multiSeries)),plotType='lines', col=1:5,val_range=c(0,10),bpLim=NULL,vertical=FALSE,addScale=TRUE, ablines_y=NULL,smoothVals=FALSE,default_margins=c(0.5,.5,.1,.1))
data(hg18_ideo) data(binned_multiSeries) layout(matrix(1:2, byrow=TRUE,ncol=1),heights=c(2.5,1)) plotChromValuePair("chr1",hg18_ideo, values_GR=binned_multiSeries, value_cols=colnames(mcols(binned_multiSeries)),plotType='lines', col=1:5,val_range=c(0,10),bpLim=NULL,vertical=FALSE,addScale=TRUE, ablines_y=NULL,smoothVals=FALSE,default_margins=c(0.5,.5,.1,.1))
Plot data superimposed on chromosomal ideogram
plotOnIdeo(chrom = stop("enter chromosome(s) to plot"), ideoTable, values_GR, value_cols = "values", plotType = "lines", col = "orange", bpLim = NULL, val_range = NULL, addScale = TRUE, scaleChrom = TRUE, vertical = FALSE, addOnetoStart = TRUE, smoothVals = FALSE, cex.axis = 1,plot_title = NULL, ablines_y = NULL, cex.main=1, ...)
plotOnIdeo(chrom = stop("enter chromosome(s) to plot"), ideoTable, values_GR, value_cols = "values", plotType = "lines", col = "orange", bpLim = NULL, val_range = NULL, addScale = TRUE, scaleChrom = TRUE, vertical = FALSE, addOnetoStart = TRUE, smoothVals = FALSE, cex.axis = 1,plot_title = NULL, ablines_y = NULL, cex.main=1, ...)
chrom |
(character) chromosome(s) to create ideograms for |
ideoTable |
(data.frame) ideogram table. See getIdeo() |
values_GR |
(GenomicRanges) data to be plotted must be in metadata columns |
value_cols |
(character) which series to plot. Should be column names of the mcols() slot of values_GR |
plotType |
(character) Plot type for each series. Values can be "lines" or "rect" to plot lines or barplots respectively. The latter is not recommended when several series are to be plotted on the same axis.) |
col |
(character) vector of colors for data series. If provided as a named vector, will use the metadata column "group" to code points by colour. This option is available for plotType="seg_tracks" only. |
bpLim |
(numeric) (xlim); display only a section of the chromosome and the corresponding values |
val_range |
(numeric) (ylim); y-axis scale for data series |
addScale |
(logical) if TRUE, bp positions will be shown along the chromosomes. This feature should be turned off if numerous chromosomes' worth of data are being plotted and all objects don't fit on the final graphics device. |
scaleChrom |
(logical) if FALSE, all chroms will display as the same size. scaleChrom will be ignored if bpLim is not NULL |
vertical |
(logical) if TRUE, chromosomes will be plotted vertically |
addOnetoStart |
(logical) if TRUE, adds 1 to chromStart. Useful to convert data in half-open coordinates - which is all data from the UCSC genome browser, including cytoBandIdeo, into 1-base. |
smoothVals |
(logical) if T, smoothes each trendline. Currently hard-coded to lowess smoothing with span=0.03 |
cex.axis |
(integer) axis font size |
plot_title |
(character) title for overall graph |
ablines_y |
(numeric) when supplied, draws reference lines on the y-axis |
cex.main |
(numeric) font size for plot title. |
... |
other graphing options for barplot (i.e. main="Values", to title bar plot "Values") |
Main function to plot binned data alongside chromosomal ideogram. plotOnIdeo() is the main function of this package. It is the one the end-user is expected to call to generate plots. Input is provided as a GRanges object (values_GR), with data to be plotted contained in its metadata slot. The user is responsible for providing pre-binned data, if binning is required. Data can also be binned using the avgByBin() function in this package. The ideogram table (ideoTable) is the same as the cytoBandIdeo table available from the UCSC genome browser database for a given genome is a can be either automatically downloaded from UCSC (see getIdeo()) or read in from a local-file and passed to this function.
There are numerous arguments which control the appearance of the plot. The main decision points are:
vertical: Whether the entire plot should have a horizontal or vertical orientation
plotType: One of [rect|lines|seg_tracks]. Type of plot, trendline ("lines"), barplot ("rect") or tracks of GenomicRanges (seg_tracks). "rect" only works when there is a single data series (single set of values) to be plotted on the same axis.
Other considerations:
The size of the graphics device limits the number of chromosomes that can be plotted. A simple solution may be to set addScale=FALSE. However, it is recommended to call plotOnIdeo() multiple times, and plotting a fewer number of chromosomes on each page.
The code expects coordinates of values_GR to be in 1-base. Set addOneToStart=TRUE if supplied coordinates are in 0-base.
data(binned_multiSeries) data(hg18_ideo) plotOnIdeo(chrom=seqlevels(binned_multiSeries), ideoTable=hg18_ideo, values_GR=binned_multiSeries, value_cols=colnames(mcols(binned_multiSeries)), col=1:5)
data(binned_multiSeries) data(hg18_ideo) plotOnIdeo(chrom=seqlevels(binned_multiSeries), ideoTable=hg18_ideo, values_GR=binned_multiSeries, value_cols=colnames(mcols(binned_multiSeries)), col=1:5)
A simulated dataseries spanning three chromosomes, and containing five series. The chromosomes are tiled in 1Mb windows.
data(wins)
data(wins)
Simulation by Shraddha Pai
data(wins) head(wins)
data(wins) head(wins)
Simulated data spanning 3 human chromosomes and varying in a random uniform distribution between -1 and +1.
data(wins_discrete)
data(wins_discrete)
Simulated data by Shraddha Pai
data(wins_discrete) head(wins_discrete)
data(wins_discrete) head(wins_discrete)
Simulated data spanning all human chromosomes. Values follow random uniform distribution between -1 nd + 1.
data(wins_entiregenome)
data(wins_entiregenome)
Simulated data, generated by Shraddha Pai
data(wins_entiregenome) head(wins_entiregenome) seqlevels(wins_entiregenome)
data(wins_entiregenome) head(wins_entiregenome) seqlevels(wins_entiregenome)