Title: | Visualization Tool for GWAS Result |
---|---|
Description: | Manhattan plot and QQ Plot are commonly used to visualize the end result of Genome Wide Association Study. The "ggmanh" package aims to keep the generation of these plots simple while maintaining customizability. Main functions include manhattan_plot, qqunif, and thinPoints. |
Authors: | John Lee [aut, cre], John Lee [aut] (AbbVie), Xiuwen Zheng [ctb, dtc] |
Maintainer: | John Lee <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.9.7 |
Built: | 2024-07-21 03:16:46 UTC |
Source: | https://github.com/bioc/ggmanh |
Calculate the actual x-positions of each point used for
the manhattan plot. MPdata
object contains the unscaled positions
that has not been positioned according to the relative position and width
of each chromosome.
calc_new_pos(mpdata)
calc_new_pos(mpdata)
mpdata |
an |
This is used calculate the actual positions used for the
inside manhattan_plot
function. It was designed this way should the
scaling and relative positioning of each chromosome be changed (e.g. gap
between the )
a numeric
vector containing the scaled x-positions.
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) mpdata <- manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) calc_new_pos(mpdata)
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) mpdata <- manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) calc_new_pos(mpdata)
Find path to the default gds file.
default_gds_path()
default_gds_path()
A character vector.
default_gds_path()
default_gds_path()
Retrieve variant annotation stored in a GDS file with chromosome location or rs.id.
gds_annotate( x, gdsfile = NULL, annot.method = "position", chr = NULL, pos = NULL, ref = NULL, alt = NULL, rs.id = NULL, concat_char = "/", verbose = TRUE, annotation_names = c("annotation/info/symbol", "annotation/info/consequence", "annotation/info/LoF") )
gds_annotate( x, gdsfile = NULL, annot.method = "position", chr = NULL, pos = NULL, ref = NULL, alt = NULL, rs.id = NULL, concat_char = "/", verbose = TRUE, annotation_names = c("annotation/info/symbol", "annotation/info/consequence", "annotation/info/LoF") )
x |
a |
gdsfile |
a character for GDS filename. If |
annot.method |
a method for searching variants. "position" requires |
chr , pos , ref , alt , rs.id
|
column names of |
concat_char |
a character used to separate multiple annotations returned from the gds file. |
verbose |
output messages. |
annotation_names |
a character vector of nodes of the |
A character vector the length of nrow(x)
if concat_char
is a character.
A data frame with nrow(x)
rows and length(annotation_names)
if concat_char
is null.
vardata <- data.frame( chr = c(11,20,14), pos = c(12261002, 10033792, 23875025), ref = c("G", "G", "CG"), alt = c("A", "A", "C") ) annotations <- gds_annotate( x = vardata, annot.method = "position", chr = "chr", pos = "pos", ref = "ref", alt = "alt" ) print(annotations)
vardata <- data.frame( chr = c(11,20,14), pos = c(12261002, 10033792, 23875025), ref = c("G", "G", "CG"), alt = c("A", "A", "C") ) annotations <- gds_annotate( x = vardata, annot.method = "position", chr = "chr", pos = "pos", ref = "ref", alt = "alt" ) print(annotations)
ggmanh provides flexible tools for visualizing GWAS result for downstream analysis.
Manhattan plot is commonly used to display significant Single Nucleotide Polymorphisms (SNPs)
in Genome Wide Association Study (GWAS)
This package comes with features useful for manhattan plot creation, including annotation with ggrepel
,
truncating data for faster plot generation, and manual rescaling of the y-axis.
The manhattan plot is generated in two steps: data preprocessing and plotting. This allows the user to iteratively
customize the plot without having the process the GWAS summary data over and over again.
Currently, data.frame
and GRanges
from GenomicRanges
are supported.
A vignette detailing the usage of the package is accessible by vignette("ggmanh")
Maintainer: John Lee [email protected]
Authors:
John Lee [email protected] (AbbVie)
Other contributors:
Xiuwen Zheng [email protected] [contributor, data contributor]
ggmanh
provides a GDS file whose path is accessible by default_gds_path
.
The original annotation file is from the gnomAD browser v2.1.1 release, available in this link: https://gnomad.broadinstitute.org/downloads.
This gds file contains variants in the exome with the global minor allele frequency 0.0002, and has been manually curated
to fit the file size requirement for R Bioconductor packages.
A GDS file with 1015430 variants with chromosome, position, allele, gene symbol, Ensembl VEP Consequence, and predicted LoF.
Preprocesses a result from Genome Wide Association Study
before making a manhattan plot.
It accepts a data.frame
, which at bare minimum should
contain a chromosome, position, and p-value.
Additional options, such as chromosome color, label colum names,
and colors for specific variants, are provided here.
manhattan_data_preprocess(x, ...) ## Default S3 method: manhattan_data_preprocess(x, ...) ## S3 method for class 'data.frame' manhattan_data_preprocess( x, chromosome = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", chr.colname = "chr", pos.colname = "pos", highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, chr.gap.scaling = 1, ... ) ## S4 method for signature 'GRanges' manhattan_data_preprocess( x, chromosome = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, preserve.position = FALSE, thin = NULL, thin.n = 100, thin.bins = 200, pval.log.transform = TRUE, chr.gap.scaling = 1, ... )
manhattan_data_preprocess(x, ...) ## Default S3 method: manhattan_data_preprocess(x, ...) ## S3 method for class 'data.frame' manhattan_data_preprocess( x, chromosome = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", chr.colname = "chr", pos.colname = "pos", highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, chr.gap.scaling = 1, ... ) ## S4 method for signature 'GRanges' manhattan_data_preprocess( x, chromosome = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, preserve.position = FALSE, thin = NULL, thin.n = 100, thin.bins = 200, pval.log.transform = TRUE, chr.gap.scaling = 1, ... )
x |
a data frame or any other extension of data frame (e.g. a tibble). At bare minimum, it should contain chromosome, position, and p-value. |
... |
Additional arguments for manhattan_data_preprocess. |
chromosome |
a character. This is supplied if a manhattan plot of a single chromosome is
desired. If |
signif |
a numeric vector. Significant p-value thresholds to be drawn for manhattan plot. At least one value should be provided. Default value is c(5e-08, 1e-5) |
pval.colname |
a character. Column name of |
chr.colname |
a character. Column name of |
pos.colname |
a character. Column name of |
highlight.colname |
a character. If you desire to color certain points
(e.g. significant variants) rather than color by chromosome, you can specify the
category in this column, and provide the color mapping in |
chr.order |
a character vector. Order of chromosomes presented in manhattan plot. |
signif.col |
a character vector of equal length as |
chr.col |
a character vector of equal length as chr.order. It contains colors
for the chromosomes. Name of the vector should match |
highlight.col |
a character vector. It contains color mapping for the values from
|
preserve.position |
a logical. If |
thin |
a logical. If |
thin.n |
an integer. Number of max points per horizontal partitions of the plot. Defaults to 1000. |
thin.bins |
an integer. Number of bins to partition the data. Defaults to 200. |
pval.log.transform |
a logical. If |
chr.gap.scaling |
scaling factor for gap between chromosome if you desire to change it.
This can also be set in |
manhattan_data_preprocess
gathers information needed to plot a manhattan plot
and organizes the information as MPdata
S3 object.
New positions for each points are calculated, and stored in the data.frame as
"new_pos"
. By default, all chromosomes will have the same width, with each
point being equally spaced. This behavior is changed when preserve.position = TRUE
.
The width of each chromosome will scale to the number of points and the points will
reflect the original positions.
chr.col
and highlight.col
, maps the data values to colors. If they are
an unnamed vector, then the function will try its best to match the values of
chr.colname
or highlight.colname
to the colors. If they are a named vector,
then they are expected to map all values to a color. If highlight.colname
is
supplied, then chr.col
is ignored.
While feeding a data.frame
directly into manhattan_plot
does preprocessing & plotting in one step. If you plan on making multiple plots
with different graphic options, you have the choice to preprocess separately and
then generate plots.
a MPdata object. This object contains all the necessary info for constructing a manhattan plot.
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) )
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) )
A generic function for manhattan plot.
manhattan_plot(x, ...) manhattan_plot.default(x, ...) ## S3 method for class 'data.frame' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", chr.colname = "chr", pos.colname = "pos", label.colname = NULL, highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = 1, color.by.highlight = FALSE, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, point.size = 0.75, label.font.size = 2, max.overlaps = 20, x.label = "Chromosome", y.label = expression(-log[10](p)), ... ) ## S3 method for class 'MPdata' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = NULL, signif.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = NULL, color.by.highlight = FALSE, label.colname = NULL, x.label = "Chromosome", y.label = expression(-log[10](p)), point.size = 0.75, label.font.size = 2, max.overlaps = 20, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, ... ) ## S4 method for signature 'GRanges' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", label.colname = NULL, highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = 1, color.by.highlight = FALSE, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, point.size = 0.75, label.font.size = 2, max.overlaps = 20, x.label = "Chromosome", y.label = expression(-log[10](p)), ... )
manhattan_plot(x, ...) manhattan_plot.default(x, ...) ## S3 method for class 'data.frame' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", chr.colname = "chr", pos.colname = "pos", label.colname = NULL, highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = 1, color.by.highlight = FALSE, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, point.size = 0.75, label.font.size = 2, max.overlaps = 20, x.label = "Chromosome", y.label = expression(-log[10](p)), ... ) ## S3 method for class 'MPdata' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = NULL, signif.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = NULL, color.by.highlight = FALSE, label.colname = NULL, x.label = "Chromosome", y.label = expression(-log[10](p)), point.size = 0.75, label.font.size = 2, max.overlaps = 20, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, ... ) ## S4 method for signature 'GRanges' manhattan_plot( x, chromosome = NULL, outfn = NULL, signif = c(5e-08, 1e-05), pval.colname = "pval", label.colname = NULL, highlight.colname = NULL, chr.order = NULL, signif.col = NULL, chr.col = NULL, highlight.col = NULL, rescale = TRUE, rescale.ratio.threshold = 5, signif.rel.pos = 0.2, chr.gap.scaling = 1, color.by.highlight = FALSE, preserve.position = FALSE, thin = NULL, thin.n = 1000, thin.bins = 200, pval.log.transform = TRUE, plot.title = ggplot2::waiver(), plot.subtitle = ggplot2::waiver(), plot.width = 10, plot.height = 5, point.size = 0.75, label.font.size = 2, max.overlaps = 20, x.label = "Chromosome", y.label = expression(-log[10](p)), ... )
x |
a |
... |
additional arguments to be passed onto |
chromosome |
a character. This is supplied if a manhattan plot of a single chromosome is
desired. If |
outfn |
a character. File name to save the Manhattan Plot. If |
signif |
a numeric vector. Significant p-value thresholds to be drawn for manhattan plot.
At least one value should be provided. Default value is c(5e-08, 1e-5).
If |
pval.colname |
a character. Column name of |
chr.colname |
a character. Column name of |
pos.colname |
a character. Column name of |
label.colname |
a character. Name of the column in |
highlight.colname |
a character. If you desire to color certain points
(e.g. significant variants) rather than color by chromosome, you can specify the
category in this column, and provide the color mapping in |
chr.order |
a character vector. Order of chromosomes presented in manhattan plot. |
signif.col |
a character vector of equal length as |
chr.col |
a character vector of equal length as chr.order. It contains colors
for the chromosomes. Name of the vector should match |
highlight.col |
a character vector. It contains color mapping for the values from
|
rescale |
a logical. If |
rescale.ratio.threshold |
a numeric. Threshold of that triggers the rescale. |
signif.rel.pos |
a numeric between 0.1 and 0.9. If the plot is rescaled, where should the significance threshold be positioned? |
chr.gap.scaling |
a numeric. scaling factor for gap between chromosome if you desire to change it
if x is an |
color.by.highlight |
a logical. Should the points be colored based on a highlight column? |
preserve.position |
a logical. If |
thin |
a logical. If |
thin.n |
an integer. Number of max points per horizontal partitions of the plot. Defaults to 1000. |
thin.bins |
an integer. Number of bins to partition the data. Defaults to 200. |
pval.log.transform |
a logical. If |
plot.title |
a character. Plot title |
plot.subtitle |
a character. Plot subtitle |
plot.width |
a numeric. Plot width in inches. |
plot.height |
a numeric. Plot height in inches. |
point.size |
a numeric. Size of the points. |
label.font.size |
a numeric. Size of the labels. |
max.overlaps |
an integer. Exclude text labels that overlaps too many things. |
x.label |
a character. x-axis label |
y.label |
a character. y-axis label |
This generic function accepts a result of a GWAS in the form of data.frame
or a MPdata
object produced by manhattan_data_preprocess
. The
function will throw an error if another type of object is passed.
Having rescale = TRUE
is useful when there are points with very
high -log10(p.value). In this case, the function attempts to split
the plot into two different scales, with the split happening near the strictest
significance threshold. More precisely, the plot is rescaled when
If you wish to add annotation to the points, provide the name of the column to
label.colname
. The labels are added with ggrepel
.
Be careful though: if the annotation column contains
a large number of variants, then the plotting could take a long time, and the
labels will clutter up the plot. For those points with no annotation, you have the
choice to set them as NA
or ""
.
gg
object if is.null(outfn)
, NULL
if !is.null(outf)
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) manhattan_plot( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) mpdata <- manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) manhattan_plot(mpdata)
gwasdat <- data.frame( "chromosome" = rep(1:5, each = 30), "position" = c(replicate(5, sample(1:300, 30))), "pvalue" = rbeta(150, 1, 1)^5 ) manhattan_plot( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) mpdata <- manhattan_data_preprocess( gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position", chr.order = as.character(1:5) ) manhattan_plot(mpdata)
Plot Quantile-Quantile Plot of p-values against uniform distribution.
qqunif( x, outfn = NULL, conf.int = 0.95, plot.width = 5, plot.height = 5, thin = TRUE, thin.n = 500, zero.pval = "replace" )
qqunif( x, outfn = NULL, conf.int = 0.95, plot.width = 5, plot.height = 5, thin = TRUE, thin.n = 500, zero.pval = "replace" )
x |
a numeric vector of p-values. All values should be between 0 and 1. |
outfn |
a character. File name to save the QQ Plot. If |
conf.int |
a numeric between 0 and 1. Confidence band to draw around reference line. Set to |
plot.width |
a numeric. Plot width in inches. |
plot.height |
a numeric. Plot height in inches. |
thin |
a logical. Reduce number of data points when they are cluttered? |
thin.n |
an integer. Number of max points per horizontal partitions of the plot. Defaults to 500. |
zero.pval |
a character. Determine how to treat 0 pvals. "replace" will replace the p-value of zero with the non-zero minimum. "remove" will remove the p-value of zero. |
a ggplot object
x <- rbeta(1000, 1, 1) qqunif(x)
x <- rbeta(1000, 1, 1) qqunif(x)
Reduce the number of cluttered data points.
thinPoints(dat, value, n = 1000, nbins = 200, groupBy = NULL)
thinPoints(dat, value, n = 1000, nbins = 200, groupBy = NULL)
dat |
a data frame |
value |
column name of |
n |
number of points to sample for each partition |
nbins |
number of partitions |
groupBy |
column name of |
The result of Genome Wide Association Study can be very large, with the majority
of points being being clustered below significance threshold. This unnecessarily increases the time
to plot while making almost no difference. This function reduces the number of points by partitioning the points
by a numeric column value
into nbins
and sampling n
points.
a data.frame
dat <- data.frame( A1 = c(1:20, 20, 20), A2 = c(rep(1, 12), rep(1,5), rep(20, 3), 20, 20) , B = rep(c("a", "b", "c", "d"), times = c(5, 7, 8, 2)) ) # partition "A1" into 2 bins and then sample 6 data points thinPoints(dat, value = "A1", n = 6, nbins = 2) # partition "A2" into 2 bins and then sample 6 data points thinPoints(dat, value = "A2", n = 6, nbins = 2) # group by "B", partition "A2" into 2 bins and then sample 3 data points thinPoints(dat, value = "A2", n = 3, nbins = 2, groupBy = "B")
dat <- data.frame( A1 = c(1:20, 20, 20), A2 = c(rep(1, 12), rep(1,5), rep(20, 3), 20, 20) , B = rep(c("a", "b", "c", "d"), times = c(5, 7, 8, 2)) ) # partition "A1" into 2 bins and then sample 6 data points thinPoints(dat, value = "A1", n = 6, nbins = 2) # partition "A2" into 2 bins and then sample 6 data points thinPoints(dat, value = "A2", n = 6, nbins = 2) # group by "B", partition "A2" into 2 bins and then sample 3 data points thinPoints(dat, value = "A2", n = 3, nbins = 2, groupBy = "B")