Title: | Make read coverage plots from BigWig files |
---|---|
Description: | Tools to visualise read coverage from sequencing experiments together with genomic annotations (genes, transcripts, peaks). Introns of long transcripts can be rescaled to a fixed length for better visualisation of exonic read coverage. |
Authors: | Kaur Alasoo [aut, cre] |
Maintainer: | Kaur Alasoo <[email protected]> |
License: | Apache License 2.0 |
Version: | 1.31.0 |
Built: | 2024-12-17 03:29:52 UTC |
Source: | https://github.com/bioc/wiggleplotr |
Returns a three-colour palette suitable for visualising read coverage stratified by genotype
getGenotypePalette(old = FALSE)
getGenotypePalette(old = FALSE)
old |
Return old colour palette (now deprecated). |
Vector of three colours.
getGenotypePalette()
getGenotypePalette()
The Manhattan plots is compatible with wiggpleplotr read coverage and transcript strucutre plots. Can be appended to those using the cowplot::plot_grid() function.
makeManhattanPlot(pvalues_df, region_coords, color_R2 = FALSE, data_track = TRUE)
makeManhattanPlot(pvalues_df, region_coords, color_R2 = FALSE, data_track = TRUE)
pvalues_df |
Data frame of association p-values (required columns: track_id, p_nominal, pos) |
region_coords |
Start and end coordinates of the region to plot. |
color_R2 |
Color the points according to R2 from the lead variant. Require R2 column in the pvalues_df data frame. |
data_track |
If TRUE, then remove all information from x-axis. Makes it easy to append to read coverage or transcript strcture plots using cowplot::plot_grid(). |
gglot2 object
data = dplyr::data_frame(track_id = "GWAS", pos = sample(c(1:1000), 200), p_nominal = runif(200, min = 0.0000001, 1)) makeManhattanPlot(data, c(1,1000), data_track = FALSE)
data = dplyr::data_frame(track_id = "GWAS", pos = sample(c(1:1000), 200), p_nominal = runif(200, min = 0.0000001, 1)) makeManhattanPlot(data, c(1,1000), data_track = FALSE)
A dataset containing start and end coordinates of coding sequences (CDS) from nine protein coding transcripts of NCOA7.
ncoa7_cdss
ncoa7_cdss
A GRangesList object with 9 elements:
CDS start and end coordinates for a single transcript (GRanges object)
...
A dataset containing start and end coordinates of exons from nine protein coding transcripts of NCOA7.
ncoa7_exons
ncoa7_exons
A GRangesList object with 9 elements:
Exon start and end coordinates for a single transcript (GRanges object)
...
A a list of transcripts for NCOA7.
ncoa7_metadata
ncoa7_metadata
A data.frame object with 4 columns:
Ensembl transcript id.
Ensembl gene id.
Human readable gene name.
Strand of the transcript (either +1 or -1).
...
Paste two factors together and preserved their joint order.
pasteFactors(factor1, factor2)
pasteFactors(factor1, factor2)
factor1 |
First factor |
factor2 |
Second factor |
Factors factor1 and factor2 pasted together.
Also supports rescaling introns to constant length. Does not work on Windows, because rtracklayer cannot read BigWig files on Windows.
plotCoverage(exons, cdss = NULL, transcript_annotations = NULL, track_data, rescale_introns = TRUE, new_intron_length = 50, flanking_length = c(50, 50), plot_fraction = 0.1, heights = c(0.75, 0.25), alpha = 1, fill_palette = c("#a1dab4", "#41b6c4", "#225ea8"), mean_only = TRUE, connect_exons = TRUE, transcript_label = TRUE, return_subplots_list = FALSE, region_coords = NULL, coverage_type = "area")
plotCoverage(exons, cdss = NULL, transcript_annotations = NULL, track_data, rescale_introns = TRUE, new_intron_length = 50, flanking_length = c(50, 50), plot_fraction = 0.1, heights = c(0.75, 0.25), alpha = 1, fill_palette = c("#a1dab4", "#41b6c4", "#225ea8"), mean_only = TRUE, connect_exons = TRUE, transcript_label = TRUE, return_subplots_list = FALSE, region_coords = NULL, coverage_type = "area")
exons |
list of GRanges objects, each object containing exons for one transcript. The list must have names that correspond to transcript_id column in transript_annotations data.frame. |
cdss |
list of GRanges objects, each object containing the coding regions (CDS) of a single transcript. The list must have names that correspond to transcript_id column in transript_annotations data.frame. If cdss is not specified then exons list will be used for both arguments. (default: NULL). |
transcript_annotations |
Data frame with at least three columns: transcript_id, gene_name, strand. Used to construct transcript labels. (default: NULL) |
track_data |
data.frame with the metadata for the bigWig read coverage files. Must contain the following columns:
|
rescale_introns |
Specifies if the introns should be scaled to fixed length or not. (default: TRUE) |
new_intron_length |
length (bp) of introns after scaling. (default: 50) |
flanking_length |
Lengths of the flanking regions upstream and downstream of the gene. (default: c(50,50)) |
plot_fraction |
Size of the random sub-sample of points used to plot coverage (between 0 and 1). Smaller values make plotting significantly faster. (default: 0.1) |
heights |
Specifies the proportion of the height that is dedicated to coverage plots (first value) relative to transcript annotations (second value). (default: c(0.75,0.25)) |
alpha |
Transparency (alpha) value for the read coverage tracks. Useful to set to something < 1 when overlaying multiple tracks (see track_id). (default: 1) |
fill_palette |
Vector of fill colours used for the coverage tracks. Length must be equal to the number of unique values in track_data$colour_group column. |
mean_only |
Plot only mean coverage within each combination of track_id and colour_group values. Useful for example for plotting mean coverage stratified by genotype (which is specified in the colour_group column) (default: TRUE). |
connect_exons |
Print lines that connect exons together. Set to FALSE when plotting peaks (default: TRUE). |
transcript_label |
If TRUE then transcript labels are printed above each transcript. (default: TRUE). |
return_subplots_list |
Instead of a joint plot return a list of subplots that can be joined together manually. |
region_coords |
Start and end coordinates of the region to plot, overrides flanking_length parameter. |
coverage_type |
Specifies if the read coverage is represented by either 'line', 'area' or 'both'. The 'both' option tends to give better results for wide regions. (default: area). |
Either object from cow_plot::plot_grid() function or a list of subplots (if return_subplots_list == TRUE)
require("dplyr") require("GenomicRanges") sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) selected_transcripts = c("ENST00000438495", "ENST00000392477") #Plot only two transcripts of the gens ## Not run: plotCoverage(ncoa7_exons[selected_transcripts], ncoa7_cdss[selected_transcripts], ncoa7_metadata, track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
require("dplyr") require("GenomicRanges") sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) selected_transcripts = c("ENST00000438495", "ENST00000392477") #Plot only two transcripts of the gens ## Not run: plotCoverage(ncoa7_exons[selected_transcripts], ncoa7_cdss[selected_transcripts], ncoa7_metadata, track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
A wrapper around the plotCoverage function. See the documentation for (plotCoverage
)
for more information.
plotCoverageFromEnsembldb(ensembldb, gene_names, transcript_ids = NULL, ...)
plotCoverageFromEnsembldb(ensembldb, gene_names, transcript_ids = NULL, ...)
ensembldb |
ensembldb object. |
gene_names |
List of gene names to be plotted. |
transcript_ids |
Optional list of transcript ids to be plotted. |
... |
Additional parameters to be passed to plotCoverage. |
ggplot2 object
require("EnsDb.Hsapiens.v86") require("dplyr") require("GenomicRanges") sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) ## Not run: plotCoverageFromEnsembldb(EnsDb.Hsapiens.v86, "NCOA7", transcript_ids = c("ENST00000438495", "ENST00000392477"), track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
require("EnsDb.Hsapiens.v86") require("dplyr") require("GenomicRanges") sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) ## Not run: plotCoverageFromEnsembldb(EnsDb.Hsapiens.v86, "NCOA7", transcript_ids = c("ENST00000438495", "ENST00000392477"), track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
A wrapper around the plotCoverage function. See the documentation for (plotCoverage
)
for more information.
plotCoverageFromUCSC(orgdb, txdb, gene_names, transcript_ids = NULL, ...)
plotCoverageFromUCSC(orgdb, txdb, gene_names, transcript_ids = NULL, ...)
orgdb |
UCSC OrgDb object. |
txdb |
UCSC TxDb obejct. |
gene_names |
List of gene names to be plotted. |
transcript_ids |
Optional list of transcript ids to be plotted. |
... |
Additional parameters to be passed to plotCoverage. |
ggplot2 object
require("dplyr") require("GenomicRanges") require("org.Hs.eg.db") require("TxDb.Hsapiens.UCSC.hg38.knownGene") orgdb = org.Hs.eg.db txdb = TxDb.Hsapiens.UCSC.hg38.knownGene sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) ## Not run: #Note: This example does not work, becasue UCSC and Ensembl use different chromosome names plotCoverageFromUCSC(orgdb, txdb, "NCOA7", transcript_ids = c("ENST00000438495.6", "ENST00000368357.7"), track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
require("dplyr") require("GenomicRanges") require("org.Hs.eg.db") require("TxDb.Hsapiens.UCSC.hg38.knownGene") orgdb = org.Hs.eg.db txdb = TxDb.Hsapiens.UCSC.hg38.knownGene sample_data = dplyr::data_frame(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), scaling_factor = 1) %>% dplyr::mutate(bigWig = system.file("extdata", paste0(sample_id, ".str2.bw"), package = "wiggleplotr")) track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition) ## Not run: #Note: This example does not work, becasue UCSC and Ensembl use different chromosome names plotCoverageFromUCSC(orgdb, txdb, "NCOA7", transcript_ids = c("ENST00000438495.6", "ENST00000368357.7"), track_data, heights = c(2,1), fill_palette = getGenotypePalette()) ## End(Not run)
Quickly plot transcript structure without read coverage tracks
plotTranscripts(exons, cdss = NULL, transcript_annotations = NULL, rescale_introns = TRUE, new_intron_length = 50, flanking_length = c(50, 50), connect_exons = TRUE, transcript_label = TRUE, region_coords = NULL)
plotTranscripts(exons, cdss = NULL, transcript_annotations = NULL, rescale_introns = TRUE, new_intron_length = 50, flanking_length = c(50, 50), connect_exons = TRUE, transcript_label = TRUE, region_coords = NULL)
exons |
list of GRanges objects, each object containing exons for one transcript. The list must have names that correspond to transcript_id column in transript_annotations data.frame. |
cdss |
list of GRanges objects, each object containing the coding regions (CDS) of a single transcript. The list must have names that correspond to transcript_id column in transript_annotations data.frame. If cdss is not specified then exons list will be used for both arguments. (default: NULL) |
transcript_annotations |
Data frame with at least three columns: transcript_id, gene_name, strand. Used to construct transcript labels. (default: NULL) |
rescale_introns |
Specifies if the introns should be scaled to fixed length or not. (default: TRUE) |
new_intron_length |
length (bp) of introns after scaling. (default: 50) |
flanking_length |
Lengths of the flanking regions upstream and downstream of the gene. (default: c(50,50)) |
connect_exons |
Print lines that connect exons together. Set to FALSE when plotting peaks (default: TRUE). |
transcript_label |
If TRUE then transcript labels are printed above each transcript. (default: TRUE). |
region_coords |
Start and end coordinates of the region to plot, overrides flanking_length parameter. |
ggplot2 object
plotTranscripts(ncoa7_exons, ncoa7_cdss, ncoa7_metadata, rescale_introns = FALSE)
plotTranscripts(ncoa7_exons, ncoa7_cdss, ncoa7_metadata, rescale_introns = FALSE)
A wrapper around the plotTranscripts function. See the documentation for (plotTranscripts
)
for more information.
plotTranscriptsFromEnsembldb(ensembldb, gene_names, transcript_ids = NULL, ...)
plotTranscriptsFromEnsembldb(ensembldb, gene_names, transcript_ids = NULL, ...)
ensembldb |
ensembldb object. |
gene_names |
List of gene names to be plotted. |
transcript_ids |
Optional list of transcript ids to be plotted. |
... |
Additional parameters to be passed to plotTranscripts |
ggplot2 object
require("EnsDb.Hsapiens.v86") plotTranscriptsFromEnsembldb(EnsDb.Hsapiens.v86, "NCOA7", transcript_ids = c("ENST00000438495", "ENST00000392477"))
require("EnsDb.Hsapiens.v86") plotTranscriptsFromEnsembldb(EnsDb.Hsapiens.v86, "NCOA7", transcript_ids = c("ENST00000438495", "ENST00000392477"))
A wrapper around the plotTranscripts function. See the documentation for (plotTranscripts
)
for more information. Note that this function is much slower than (plotTranscripts
) or
(plotTranscriptsFromEnsembldb
) functions, because indivudally extracting exon
coordinates from txdb objects is quite inefficient.
plotTranscriptsFromUCSC(orgdb, txdb, gene_names, transcript_ids = NULL, ...)
plotTranscriptsFromUCSC(orgdb, txdb, gene_names, transcript_ids = NULL, ...)
orgdb |
UCSC OrgDb object. |
txdb |
UCSC TxDb obejct. |
gene_names |
List of gene genaes to be plot. |
transcript_ids |
Optional list of transcript ids to be plot. (default = NULL) |
... |
Additional parameters to be passed to plotTranscripts |
Transcript plot.
#Load OrgDb and TxDb objects with UCSC gene annotations require("org.Hs.eg.db") require("TxDb.Hsapiens.UCSC.hg38.knownGene") orgdb = org.Hs.eg.db txdb = TxDb.Hsapiens.UCSC.hg38.knownGene plotTranscriptsFromUCSC(orgdb, txdb, "NCOA7", transcript_ids = c("ENST00000438495.6", "ENST00000368357.7"))
#Load OrgDb and TxDb objects with UCSC gene annotations require("org.Hs.eg.db") require("TxDb.Hsapiens.UCSC.hg38.knownGene") orgdb = org.Hs.eg.db txdb = TxDb.Hsapiens.UCSC.hg38.knownGene plotTranscriptsFromUCSC(orgdb, txdb, "NCOA7", transcript_ids = c("ENST00000438495.6", "ENST00000368357.7"))
wiggleplotr package provides tools to visualise transcript annotations (plotTranscripts
) and plot
sequencing read coverage over annotated transcripts (plotCoverage
).
You can also use covenient wrapper functions
(plotTranscriptsFromEnsembldb
), (plotCoverageFromEnsembldb
),
(plotTranscriptsFromUCSC
) and (plotCoverageFromUCSC
).
To learn more about wiggleplotr, start with the vignette:
browseVignettes(package = "wiggleplotr")