Title: | Optimized Functional Annotation Of ChIP-seq Data |
---|---|
Description: | geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to see peak summary statistics for the first-closest gene, second-closest gene, ..., n-closest gene whilst ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions on the original boundaries of each gene's coordinates. Since different ChIP-seq peak callers produce different differentially enriched peaks with a large variance in peak length distribution and total peak count, annotating peak lists with their nearest genes can often be a noisy process. As such, the goal of geneXtendeR is to robustly link differentially enriched peaks with their respective genes, thereby aiding experimental follow-up and validation in designing primers for a set of prospective gene candidates during qPCR. |
Authors: | Bohdan Khomtchouk [aut, cre], William Koehler [aut] |
Maintainer: | Bohdan Khomtchouk |
License: | GPL (>= 3) |
Version: | 1.33.0 |
Built: | 2024-11-27 04:44:28 UTC |
Source: | https://github.com/bioc/geneXtendeR |
Makes boxplots of all peak lengths (within a peaks input file) to show how lengths of individual peaks are distributed across the entire peak set.
allPeakLengths(filename)
filename | Name of peaks input file. |
Returns a box-and-whisker plot of peak length distribution across a peaks file.
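As a rough illustration of what that plot summarizes, the base-R sketch below computes peak lengths from a small in-memory three-column peak table and their five-number summary. It assumes that a peak's length is `end - start`; whether geneXtendeR adds 1 bp, and exactly how it reads the input file, are not stated here. The toy peak coordinates are hypothetical.

```r
# Toy three-column peak table (chromosome, start, end); values are hypothetical.
peaks <- data.frame(
  chr   = c(1, 1, 2, 3),
  start = c(100, 1000, 500, 2000),
  end   = c(400, 1600, 650, 2100)
)

# Assumption: peak length = end - start.
peak_lengths <- peaks$end - peaks$start

# Five-number summary (min, lower hinge, median, upper hinge, max);
# boxplot() draws its box from the hinges and the median.
summary_stats <- fivenum(peak_lengths)
print(summary_stats)

# boxplot(peak_lengths, ylab = "Peak length (bp)") would draw the plot itself.
```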
myfile <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
allPeakLengths(myfile)
Annotate a user's peaks file (which has been preprocessed with the peaksInput() command) with gene information based on an optimally chosen geneXtendeR upstream-extension file. This command requires a preprocessed "peaks.txt" file (generated using peaksInput()) to be present in the user's working directory; otherwise, the user is prompted to rerun the peaksInput() command to regenerate it.
annotate(organism, extension)
organism | Object name assigned from readGFF() command. |
extension | Desired upstream extension. |
The gene coordinates are extended by 'extension' at the 5-prime end and by 500 bp at the 3-prime end. The peaks file is then overlaid on these new gene coordinates, producing a file of peaks annotated with gene ID, gene name, and gene-to-peak genomic distance (in bp). Distance is calculated between the 5-prime end of the gene and the 3-prime end of the peak.
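The coordinate extension described above can be sketched in base R as follows. This is a minimal sketch, not geneXtendeR's implementation: `extend_genes()` is a hypothetical helper, and it assumes strand-aware extension (the 5-prime end is `start` on the + strand and `end` on the - strand) plus clamping at position 1. The gene table is made up.

```r
# Hypothetical helper (not the geneXtendeR API): extend each gene by
# `extension` bp at its 5' end and by `downstream` bp at its 3' end.
# Assumption: the 5' end is `start` on the + strand and `end` on the - strand.
extend_genes <- function(genes, extension, downstream = 500) {
  plus <- genes$strand == "+"
  new_start <- ifelse(plus, genes$start - extension, genes$start - downstream)
  new_end   <- ifelse(plus, genes$end + downstream, genes$end + extension)
  genes$start <- pmax(1, new_start)  # do not extend past the chromosome start
  genes$end   <- new_end
  genes
}

genes <- data.frame(
  gene_id = c("geneA", "geneB"),
  chr     = c(1, 1),
  start   = c(10000, 50000),
  end     = c(20000, 60000),
  strand  = c("+", "-"),
  stringsAsFactors = FALSE
)

extended <- extend_genes(genes, extension = 2500)
print(extended)
```

Peaks would then be overlaid on `extended` rather than on the original gene coordinates.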
A data.table formatted version of the annotated file for checking or further calculations.
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
annotate(rat, 2500)
Annotate a user's peaks file (which has been preprocessed with the peaksInput() command) with gene information based on an optimally chosen geneXtendeR upstream-extension file. This command requires a preprocessed "peaks.txt" file (generated using peaksInput()) to be present in the user's working directory; otherwise, the user is prompted to rerun the peaksInput() command to regenerate it.
annotate_n(organism, extension, n = 2)
organism | Object name assigned from readGFF() command. |
extension | Desired upstream extension. |
n | Number of closest genes to report for each peak. |
The gene coordinates are extended by 'extension' at the 5-prime end and by 500 bp at the 3-prime end. The peaks file is then overlaid on these new gene coordinates, producing a file of peaks annotated with gene ID, gene name, and gene-to-peak genomic distance (in bp). Distance is calculated between the 5-prime end of the gene and the 3-prime end of the peak. The output file is named "annotated_'extension'_'n'.txt".
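The idea of ranking several nearby genes per peak, rather than keeping only the closest one, can be sketched in base R. `n_closest_genes()` is a hypothetical helper, not part of geneXtendeR. As a simplification, it measures distance as the absolute gap between each gene's 5-prime end (TSS) and the peak's end coordinate and ignores genes on other chromosomes. All data are made up.

```r
# Hypothetical helper: the n genes whose 5' ends (TSS) lie closest to a peak.
# Simplification: distance = |TSS - peak end|; other chromosomes are ignored.
n_closest_genes <- function(peak, genes, n = 2) {
  same_chr <- genes[genes$chr == peak$chr, ]
  tss <- ifelse(same_chr$strand == "+", same_chr$start, same_chr$end)
  same_chr$distance <- abs(tss - peak$end)
  head(same_chr[order(same_chr$distance), ], n)
}

genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  chr     = c(1, 1, 1, 2),
  start   = c(1000, 8000, 3000, 2000),
  end     = c(5000, 9000, 4000, 6000),
  strand  = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)
peak <- list(chr = 1, start = 2500, end = 2600)

closest <- n_closest_genes(peak, genes, n = 2)
print(closest[, c("gene_id", "distance")])
```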
A data.table formatted version of the annotated file for checking or further calculations.
## Not run:
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
annotate_n(rat, 2500, n=3)
## End(Not run)
Makes bar graphs showing the number of genes under peaks at various upstream extension levels.
barChart(organism, start, end, by)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
by | Interval between consecutive extensions. |
Creates bar charts.
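The quantity behind each bar (how many genes overlap at least one peak at a given upstream extension) can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's code: `genes_under_peaks()` is a hypothetical helper, extension is assumed to be strand-aware with a fixed 500 bp at the 3-prime end, and the genes and peaks are made up.

```r
# Hypothetical helper: how many genes overlap at least one peak once genes are
# extended by `extension` bp upstream (5') and `downstream` bp at the 3' end.
genes_under_peaks <- function(genes, peaks, extension, downstream = 500) {
  plus <- genes$strand == "+"
  g_start <- ifelse(plus, genes$start - extension, genes$start - downstream)
  g_end   <- ifelse(plus, genes$end + downstream, genes$end + extension)
  hit <- vapply(seq_len(nrow(genes)), function(i) {
    any(peaks$chr == genes$chr[i] &
          peaks$start <= g_end[i] & peaks$end >= g_start[i])
  }, logical(1))
  sum(hit)
}

genes <- data.frame(chr = c(1, 1), start = c(10000, 50000),
                    end = c(20000, 60000), strand = c("+", "-"),
                    stringsAsFactors = FALSE)
peaks <- data.frame(chr = c(1, 1), start = c(8000, 61500), end = c(8200, 61600))

extensions <- seq(1000, 3000, by = 500)
counts <- sapply(extensions, function(e) genes_under_peaks(genes, peaks, e))
names(counts) <- extensions
print(counts)
# barplot(counts, xlab = "Upstream extension (bp)", ylab = "Genes under peaks")
```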
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
barChart(rat, 1000, 3000, 100)
Makes cumulative differential line plots showing the cumulative sums of the number of genes under peaks at consecutive upstream extension levels.
cumlinePlot(organism, start, end, by)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
by | Interval between consecutive extensions. |
Creates cumulative differential line plots.
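One plausible reading of a "cumulative differential" curve is the running sum of the level-to-level changes in the gene-under-peak count. The base-R sketch below computes that from a hypothetical vector of counts. The exact quantity that cumlinePlot() plots may differ, so treat this as an illustration of the arithmetic only.

```r
# Hypothetical gene-under-peak counts at upstream extensions 1000, 1100, ..., 1500 bp.
extensions <- seq(1000, 1500, by = 100)
counts <- c(120, 126, 129, 135, 136, 140)

# Differences between consecutive extension levels ...
deltas <- diff(counts)
# ... and their running (cumulative) sum, indexed by the upper extension of each step.
cumulative <- cumsum(deltas)
names(cumulative) <- extensions[-1]
print(cumulative)
# plot(extensions[-1], cumulative, type = "b") would draw the line plot.
```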
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
cumlinePlot(rat, 1000, 3000, 100)
Determines gene ontology terms for each category (biological process (BP), cellular component (CC), molecular function (MF)) of genes-under-peaks that are unique between two different upstream extension levels.
diffGO(organism, start, end, GOcategory, GOspecies)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
GOcategory | Either BP, CC, or MF. |
GOspecies | Either org.Ag.eg.db (mosquito), org.Bt.eg.db (bovine), org.Ce.eg.db (worm), org.Cf.eg.db (canine), org.Dm.eg.db (fly), org.Dr.eg.db (zebrafish), org.Gg.eg.db (chicken), org.Hs.eg.db (human), org.Mm.eg.db (mouse), org.Mmu.eg.db (rhesus), org.Pt.eg.db (chimpanzee), org.Rn.eg.db (rat), org.Sc.sgd.db (yeast), org.Ss.eg.db (pig), or org.Xl.eg.db (frog). |
A data frame of gene symbol, gene ontology ID, and gene ontology term for either a BP, CC, or MF category. This data frame displays the annotations of all unique genes (i.e., genes that are located under peaks between two upstream extension levels) with their respective gene ontology information.
library(rtracklayer)
library(org.Rn.eg.db)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
diffGO(rat, 0, 500, BP, org.Rn.eg.db)
Determines which genes directly under peaks are unique between two different upstream extension levels.
distinct(organism, start, end)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
V1-V3 denote the chromosome/start/end positions of the peaks, V4-V6 denote the respective values of the genes, V7 is the gene ID (e.g., Ensembl ID), V8 is the gene name, and V9 is the distance of peak to nearest gene.
A data.table of unique genes located under peaks between two upstream extension levels.
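Conceptually, "unique between two extension levels" can be read as a set difference: the genes that fall under peaks at the larger extension but not at the smaller one. The base-R sketch below shows that reading on hypothetical gene IDs. It assumes that every gene captured at the smaller extension is also captured at the larger one; it does not reproduce the V1-V9 table layout that distinct() returns.

```r
# Hypothetical gene IDs found under peaks at two upstream extension levels.
genes_at_2000 <- c("geneA", "geneB", "geneC")
genes_at_3000 <- c("geneA", "geneB", "geneC", "geneD", "geneE")

# Genes picked up only after widening the extension from 2000 to 3000 bp.
newly_captured <- setdiff(genes_at_3000, genes_at_2000)
print(newly_captured)
```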
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
distinct(rat, 2000, 3000)
Annotate a user's peaks file (which has been preprocessed with the peaksInput() command) with gene information based on an optimally chosen geneXtendeR upstream-extension file, then compress the annotations by gene. This command requires a preprocessed "peaks.txt" file (generated using peaksInput()) to be present in the user's working directory; otherwise, the user is prompted to rerun the peaksInput() command to regenerate it.
gene_annotate(organism, extension)
organism | Object name assigned from readGFF() command. |
extension | Desired upstream extension. |
The gene coordinates are extended by 'extension' at the 5-prime end and by 500 bp at the 3-prime end. The peaks file is then overlaid on these new gene coordinates, producing a file of peaks annotated with gene ID, gene name, gene location, the mean and standard deviation of peak-to-gene distances, the number of peaks assigned to each gene, and the peak-to-gene genomic distance (in bp). Distance is calculated between the 5-prime end of the gene and the 3-prime end of the peak.
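The "compress by gene" step can be sketched as a per-gene summary of peak-level rows. This is a hedged illustration only: `summarise_by_gene()` is a hypothetical helper, and it assumes the per-gene statistics are the peak count and the mean and standard deviation of peak-to-gene distances. The annotated table is made up.

```r
# Hypothetical per-peak annotation: one row per peak, with its assigned gene
# and peak-to-gene distance in bp.
annotated <- data.frame(
  gene_id  = c("geneA", "geneA", "geneB"),
  distance = c(100, 300, 50),
  stringsAsFactors = FALSE
)

# Collapse the peak-level rows into one row per gene.
summarise_by_gene <- function(annotated) {
  per_gene <- split(annotated$distance, annotated$gene_id)
  data.frame(
    gene_id       = names(per_gene),
    n_peaks       = vapply(per_gene, length, integer(1), USE.NAMES = FALSE),
    mean_distance = vapply(per_gene, mean, numeric(1), USE.NAMES = FALSE),
    # sd() of a single value is NA, so single-peak genes get NA here
    sd_distance   = vapply(per_gene, sd, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

by_gene <- summarise_by_gene(annotated)
print(by_gene)
```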
A data.table formatted version of the gene-annotated file for checking or further calculations.
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
gene_annotate(rat, 3400)
Looks up the peaks closest to a specified gene in the peaks file (which has been preprocessed with the peaksInput() command), based either on the most recently accessed .bed file or on a specified extension. This command requires a preprocessed "peaks.txt" file (generated using peaksInput()) to be present in the user's working directory; otherwise, the user is prompted to rerun the peaksInput() command to regenerate it.
gene_lookup(organism, gene_name, n = 2, extension = NA, cutoff = Inf)
organism | Object name assigned from readGFF() command. |
gene_name | Gene names or gene IDs, supplied as a character vector. |
n | Number of closest peaks to 'gene_name' to return on each chromosome. Default = 2. |
extension | Desired upstream extension. If left unspecified, the most recent geneXtendeR .bed file is used; if no extension is specified and no .bed file can be found, a default extension of 500 is used. |
cutoff | Optional maximum distance (in bp) to search around 'gene_name'. Default = Inf. |
A data.table of the peaks (up to 'n' per chromosome) located within 'cutoff' bp of 'gene_name', on every chromosome that contains 'gene_name'.
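The lookup logic (nearest peaks to one gene, capped at `n` and at a `cutoff` distance) can be sketched in base R. `closest_peaks()` is a hypothetical helper: unlike gene_lookup(), it takes gene coordinates directly instead of resolving a gene name through the GTF object. It also assumes that distance means the gap between the gene and peak intervals, with overlapping peaks at distance 0. The data are made up.

```r
# Hypothetical helper: up to n peaks closest to a gene, within `cutoff` bp.
# Assumption: distance is the gap between gene and peak intervals (0 if overlapping).
closest_peaks <- function(gene, peaks, n = 2, cutoff = Inf) {
  on_chr <- peaks[peaks$chr == gene$chr, ]
  on_chr$distance <- pmax(0, on_chr$start - gene$end, gene$start - on_chr$end)
  hits <- on_chr[on_chr$distance <= cutoff, ]
  head(hits[order(hits$distance), ], n)
}

gene  <- list(chr = 1, start = 10000, end = 12000)
peaks <- data.frame(chr   = c(1, 1, 1, 2),
                    start = c(9000, 12300, 30000, 10000),
                    end   = c(9500, 12800, 30500, 10500))

nearby <- closest_peaks(gene, peaks, n = 2, cutoff = 5000)
print(nearby)
```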
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
closest <- gene_lookup(rat, c("Vom2r3", "Vom2r5"), n = 7, extension = 1000)
closest
Makes line plots showing the ratio of statistically significant peaks to the total number of peaks at each genomic interval (e.g., 0-500 bp upstream of every gene in the genome, 500-1000 bp upstream of every gene in the genome, etc.).
hotspotPlot(totalpeaksfile, significantpeaksfile, organism, start, end, by)
totalpeaksfile | Filename in user's working directory (or full path to filename) containing all peaks. |
significantpeaksfile | Filename in user's working directory (or full path to filename) containing only significant peaks. |
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
by | Interval between consecutive extensions. |
Line plot showing the ratio of significant to total peaks at each interval across the genome.
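The per-interval ratio can be sketched in base R once each peak's distance upstream of its nearest gene is known. Computing those distances from the peak files and GTF is omitted. `hotspot_ratio()` and the distance vectors are hypothetical, and the half-open bins `[start, start + by)` are an assumption about how intervals are delimited.

```r
# Hypothetical distances (bp upstream of the nearest gene) for all peaks and
# for the significant subset.
total_dist <- c(100, 200, 600, 700, 800, 1200)
sig_dist   <- c(100, 700)

hotspot_ratio <- function(total_dist, sig_dist, start, end, by) {
  breaks  <- seq(start, end, by = by)
  bin     <- function(d) table(cut(d, breaks, right = FALSE, dig.lab = 6))
  total_n <- as.vector(bin(total_dist))
  sig_n   <- as.vector(bin(sig_dist))
  ratio   <- ifelse(total_n > 0, sig_n / total_n, NA)  # NA where no peaks fall
  names(ratio) <- names(bin(total_dist))
  ratio
}

ratio <- hotspot_ratio(total_dist, sig_dist, 0, 1500, 500)
print(ratio)
# plot(ratio, type = "b") would give a line plot analogous to hotspotPlot().
```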
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
allpeaks <- system.file("extdata", "totalpeaksfile.txt", package="geneXtendeR")
sigpeaks <- system.file("extdata", "significantpeaksfile.txt", package="geneXtendeR")
hotspotPlot(allpeaks, sigpeaks, rat, 0, 10000, 500)
Makes differential line plots showing the differences in the number of genes under peaks at consecutive upstream extension levels.
linePlot(organism, start, end, by)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
by | Interval between consecutive extensions. |
Creates differential line plots.
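The "differential" values are the changes in the gene-under-peak count from one extension level to the next. The base-R sketch below computes them from a hypothetical vector of counts. It illustrates the arithmetic only; the counts are made up and nothing here calls geneXtendeR.

```r
# Hypothetical gene-under-peak counts at upstream extensions 1000, 1100, ..., 1500 bp.
extensions <- seq(1000, 1500, by = 100)
counts <- c(120, 126, 129, 135, 136, 140)

# Change in the number of genes under peaks between consecutive extension levels.
deltas <- diff(counts)
names(deltas) <- paste(head(extensions, -1), extensions[-1], sep = "-")
print(deltas)
# plot(extensions[-1], deltas, type = "b") would draw a differential line plot.
```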
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
linePlot(rat, 1000, 3000, 100)
Creates dynamic and interactive networks of genes linked to their respective gene ontology terms for each category (biological process (BP), cellular component (CC), molecular function (MF)) of genes-under-peaks that are unique between two different upstream extension levels.
makeNetwork(organism, start, end, GOcategory, GOspecies)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
GOcategory | Either BP, CC, or MF. |
GOspecies | Either org.Ag.eg.db (mosquito), org.Bt.eg.db (bovine), org.Ce.eg.db (worm), org.Cf.eg.db (canine), org.Dm.eg.db (fly), org.Dr.eg.db (zebrafish), org.Gg.eg.db (chicken), org.Hs.eg.db (human), org.Mm.eg.db (mouse), org.Mmu.eg.db (rhesus), org.Pt.eg.db (chimpanzee), org.Rn.eg.db (rat), org.Sc.sgd.db (yeast), org.Ss.eg.db (pig), or org.Xl.eg.db (frog). |
An interactive network of gene names linked to their respective gene ontology terms for either a BP, CC, or MF category.
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
library(networkD3)
library(dplyr)
library(org.Rn.eg.db)
makeNetwork(rat, 0, 500, BP, org.Rn.eg.db)
Creates a word cloud from gene ontology terms derived from either biological process (BP), cellular component (CC), or molecular function (MF) of genes-under-peaks that are unique between two different upstream extension levels.
makeWordCloud(organism, start, end, GOcategory, GOspecies)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
GOcategory | Either BP, CC, or MF. |
GOspecies | Either org.Ag.eg.db (mosquito), org.Bt.eg.db (bovine), org.Ce.eg.db (worm), org.Cf.eg.db (canine), org.Dm.eg.db (fly), org.Dr.eg.db (zebrafish), org.Gg.eg.db (chicken), org.Hs.eg.db (human), org.Mm.eg.db (mouse), org.Mmu.eg.db (rhesus), org.Pt.eg.db (chimpanzee), org.Rn.eg.db (rat), org.Sc.sgd.db (yeast), org.Ss.eg.db (pig), or org.Xl.eg.db (frog). |
A word cloud composed of words gathered from gene ontology terms of either a BP, CC, or MF category.
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
library(org.Rn.eg.db)
makeWordCloud(rat, 0, 500, BP, org.Rn.eg.db)
Determines the average peak length of all peaks found within some genomic interval (e.g., 0-500 bp upstream of nearest gene for all genes throughout the genome).
meanPeakLength(organism, start, end)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
A numeric vector of length one: the average length of the peaks found within the genomic interval.
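Given each peak's distance upstream of its nearest gene, the mean peak length for one interval is a filter followed by a mean. The base-R sketch below assumes a precomputed `distance` column, a half-open interval `[lower, upper)`, and peak length `end - start`. All three are assumptions, and `mean_peak_length()` and the toy data are hypothetical.

```r
# Hypothetical peaks with a precomputed distance (bp upstream of the nearest gene).
peaks <- data.frame(
  start    = c(1000, 5000, 9000, 20000),
  end      = c(1200, 5400, 9600, 20100),
  distance = c(100, 450, 700, 1200)
)

# Mean length (end - start) of peaks whose distance falls in [lower, upper).
mean_peak_length <- function(peaks, lower, upper) {
  keep <- peaks$distance >= lower & peaks$distance < upper
  mean(peaks$end[keep] - peaks$start[keep])
}

avg <- mean_peak_length(peaks, 0, 500)
print(avg)
```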
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
sigpeaks <- system.file("extdata", "significantpeaksfile.txt", package="geneXtendeR")
peaksInput(sigpeaks)
meanPeakLength(rat, 0, 500)
Makes line plots of mean peak lengths to show the average length of individual peaks within any genomic interval (e.g., 0-500 bp upstream of nearest gene for all genes throughout the genome).
meanPeakLengthPlot(organism, start, end, by)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
by | Interval between consecutive extensions. |
Creates mean peak length line plots.
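The values behind such a line plot can be sketched by binning peaks by distance and averaging their lengths per bin. This is a hedged illustration, not the package's code. It assumes a precomputed `distance` column, half-open bins of width `by`, and peak length `end - start`; the peaks are made up.

```r
# Hypothetical peaks with a precomputed distance (bp upstream of the nearest gene).
peaks <- data.frame(
  start    = c(1000, 5000, 9000, 20000),
  end      = c(1200, 5400, 9600, 20100),
  distance = c(100, 450, 700, 1200)
)

# Bin peaks into [0, 500), [500, 1000), [1000, 1500) and average lengths per bin.
breaks <- seq(0, 1500, by = 500)
bins <- cut(peaks$distance, breaks, right = FALSE, dig.lab = 6)
mean_lengths <- tapply(peaks$end - peaks$start, bins, mean)  # NA for empty bins
print(mean_lengths)
# plot(as.vector(mean_lengths), type = "b") would draw the line plot.
```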
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
allpeaks <- system.file("extdata", "totalpeaksfile.txt", package="geneXtendeR")
peaksInput(allpeaks)
meanPeakLengthPlot(rat, 0, 10000, 500)
Makes boxplots of peak lengths to show how lengths of individual peaks are distributed within any genomic interval (e.g., 0-500 bp upstream of nearest gene for all genes throughout the genome).
peakLengthBoxplot(organism, start, end)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
Creates boxplots showing how lengths of peaks are distributed within any given genomic interval. Also returns a character vector of the individual peak lengths.
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
allpeaks <- system.file("extdata", "totalpeaksfile.txt", package="geneXtendeR")
peaksInput(allpeaks)
peakLengthBoxplot(rat, 0, 500)
Takes your tab-delimited 3-column (chromosome number, peak start, and peak end) input file (see ?samplepeaksinput) consisting of peaks called from a peak caller (e.g., MACS2 or SICER) and sorts the file by chromosome and start position, thereby creating a preprocessed file for downstream geneXtendeR analysis. This file (called "peaks.txt") is a preprocessed file of the original input and is deposited in the user's working directory and used for the remainder of the analysis. In this "peaks.txt" file, peaks are sorted by chromosome number and start position, where the X chromosome is designated by the integer 100, the Y chromosome by the integer 200, and the mitochondrial chromosome by the integer 300.
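The chromosome encoding and sort order described for "peaks.txt" can be sketched in base R. `encode_chromosome()` is a hypothetical helper, and it assumes the mitochondrial chromosome is labeled "MT" (as in Ensembl) and that autosomes are given as plain numbers. The peaks are made up, and writing the result to "peaks.txt" is left out because the exact file layout is not specified here.

```r
# Hypothetical helper: encode chromosome labels as integers
# (X = 100, Y = 200, mitochondrial = 300).
# Assumption: the mitochondrial chromosome is labeled "MT" (as in Ensembl).
encode_chromosome <- function(chr) {
  chr  <- as.character(chr)
  code <- suppressWarnings(as.integer(chr))  # non-numeric labels become NA
  code[chr == "X"]  <- 100L
  code[chr == "Y"]  <- 200L
  code[chr == "MT"] <- 300L
  code
}

peaks <- data.frame(
  chr   = c("X", "2", "1", "MT", "1", "Y"),
  start = c(500, 300, 900, 10, 100, 50),
  end   = c(800, 600, 1200, 40, 400, 90),
  stringsAsFactors = FALSE
)

# Encode, then sort by chromosome code and start position.
peaks$chr <- encode_chromosome(peaks$chr)
sorted <- peaks[order(peaks$chr, peaks$start), ]
print(sorted)
```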
peaksInput(filename)
filename | Name of file containing peaks that have been generated from a peak caller (e.g., MACS2, SICER). See ?samplepeaksinput for an example of such an input file. |
Writes a formatted file (called "peaks.txt") to the user's working directory, preprocessed for use with barChart(), linePlot(), distinct(), and other downstream commands.
?samplepeaksinput  # Documentation of some exemplary sample input
data(samplepeaksinput)
head(samplepeaksinput)
tail(samplepeaksinput)
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
Takes your tab-delimited 3-column (chromosome number, peak start, and peak end) input file (see ?samplepeaksinput) consisting of peaks called from a peak caller (e.g., MACS2 or SICER) and transforms this file into a file of merged peaks. This file (called "peaks.txt") is a preprocessed file of the original input transformed into merged peaks, and it is deposited in the user's working directory and used for the remainder of the analysis. In this "peaks.txt" file, peaks are sorted by chromosome number and start position, where the X chromosome is designated by the integer 100, the Y chromosome by the integer 200, and the mitochondrial chromosome by the integer 300.
peaksMerge(filename, mergeby)
filename | Name of file containing peaks that have been generated from a peak caller (e.g., MACS2, SICER). See ?samplepeaksinput for an example of such an input file. |
mergeby | Integer giving the maximum distance (in bp) between two adjacent sorted peaks for them to be merged into one peak. |
Returns a formatted file (called "peaks.txt"), deposited in the user's working directory, which has been preprocessed to transform individual peaks into merged peaks in preparation for usage with barChart(), linePlot(), distinct(), and other downstream commands.
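Peak merging of this kind is a standard sorted-interval merge, and it can be sketched in base R. `merge_peaks()` is a hypothetical helper, not peaksMerge() itself. It assumes that two peaks on the same chromosome are merged when the next peak's start minus the current merged peak's end is at most `mergeby`. The peaks are made up.

```r
# Hypothetical helper: merge adjacent peaks (after sorting) on the same
# chromosome when the gap between them is at most `mergeby` bp.
# Assumption: gap = next peak's start - current merged peak's end.
merge_peaks <- function(peaks, mergeby) {
  if (nrow(peaks) == 0) return(peaks)
  peaks <- peaks[order(peaks$chr, peaks$start), ]
  merged <- list()
  current <- peaks[1, ]
  for (i in seq_len(nrow(peaks))[-1]) {
    nxt <- peaks[i, ]
    if (nxt$chr == current$chr && nxt$start - current$end <= mergeby) {
      current$end <- max(current$end, nxt$end)  # extend the merged peak
    } else {
      merged[[length(merged) + 1]] <- current   # close the current merged peak
      current <- nxt
    }
  }
  merged[[length(merged) + 1]] <- current
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

peaks <- data.frame(chr   = c(1, 1, 1, 2),
                    start = c(100, 500, 1300, 100),
                    end   = c(200, 600, 1400, 200))
merged <- merge_peaks(peaks, mergeby = 500)
print(merged)
```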
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksMerge(fpath, 500)
Creates barplots of word frequencies from gene ontology terms derived from either biological process (BP), cellular component (CC), or molecular function (MF) of genes-under-peaks that are unique between two different upstream extension levels.
plotWordFreq(organism, start, end, GOcategory, GOspecies, word_count)
organism | Object name assigned from readGFF() command. |
start | Lower bound of upstream extension. |
end | Upper bound of upstream extension. |
GOcategory | Either BP, CC, or MF. |
GOspecies | Either org.Ag.eg.db (mosquito), org.Bt.eg.db (bovine), org.Ce.eg.db (worm), org.Cf.eg.db (canine), org.Dm.eg.db (fly), org.Dr.eg.db (zebrafish), org.Gg.eg.db (chicken), org.Hs.eg.db (human), org.Mm.eg.db (mouse), org.Mmu.eg.db (rhesus), org.Pt.eg.db (chimpanzee), org.Rn.eg.db (rat), org.Sc.sgd.db (yeast), org.Ss.eg.db (pig), or org.Xl.eg.db (frog). |
word_count | Number of top words to display. |
A barplot of words, sorted by frequency of occurrence, drawn from gene ontology terms of either a BP, CC, or MF category.
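The counting behind a word-frequency barplot can be sketched in base R: lowercase the GO term descriptions, split them into words, drop stop words, tabulate, and keep the top `word_count`. This is a simplified stand-in. The package's examples load tm and SnowballC, which suggests stemming and fuller stop-word handling that this sketch omits. `word_freq()`, the stop-word list, and the GO terms are hypothetical.

```r
# Hypothetical GO term descriptions for a set of genes.
go_terms <- c("regulation of transcription",
              "positive regulation of transcription",
              "signal transduction",
              "regulation of signal transduction")

# Count words, drop a minimal (hypothetical) stop-word list, keep the top ones.
word_freq <- function(terms, word_count,
                      stop_words = c("of", "the", "and", "in", "to")) {
  words <- unlist(strsplit(tolower(terms), "[^a-z]+"))
  words <- words[nzchar(words) & !(words %in% stop_words)]
  head(sort(table(words), decreasing = TRUE), word_count)
}

top <- word_freq(go_terms, word_count = 3)
print(top)
# barplot(top, las = 2) would draw a word-frequency bar chart.
```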
library(rtracklayer)
rat <- readGFF("ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz")
fpath <- system.file("extdata", "somepeaksfile.txt", package="geneXtendeR")
peaksInput(fpath)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
library(org.Rn.eg.db)
plotWordFreq(rat, 0, 500, BP, org.Rn.eg.db, 10)
A dataset downloaded from Ensembl that contains the entries of a GTF file for Rattus norvegicus.
data(rat)
A data frame with 748514 rows and 28 variables corresponding to the entries of a GTF file. Column names are standardized and can be found here: http://www.ensembl.org/info/website/upload/gff.html.
Demonstrates a rat GTF file downloaded from: ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz.
head(rat)
tail(rat)
A dataset containing the chromosome number, start and stop positions of ChIP-seq peaks along the Rattus norvegicus genome (rn6 assembly). A dataset like this may be used as input to the peaksInput() command, which will sort the dataset by chromosome number and start position.
data(samplepeaksinput)
A data frame with 25089 rows and 3 variables:
Chromosome number
Peak start position [in units of base pairs]
Peak end position [in units of base pairs]
Demonstrates a sample peaks file used as input.
head(samplepeaksinput)
tail(samplepeaksinput)