| Title: | epiSeeker: an R package for Annotation, Comparison and Visualization of multi-omics epigenetic data |
|---|---|
| Description: | This package implements functions to analyze multi-omics epigenetic data. Data of fragment type and base type are supported by epiSeeker. It provides functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statistical methods to estimate the significance of overlap among peak data sets, and motif analysis. It incorporates the GEO database for users to compare their own dataset with those deposited in the database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, overlap of peaks or genes, and the single-base resolution epigenetic data by considering the strand, motif, and additional information. |
| Authors: | Guangchuang Yu [aut, cre, fnd] (ORCID: <https://orcid.org/0000-0002-6485-8781>), Ming Li [ctb], Qianwen Wang [ctb], Yun Yan [ctb], Hervé Pagès [ctb], Michael Kluge [ctb], Thomas Schwarzl [ctb], Zhougeng Xu [ctb], Chun-Hui Gao [ctb] |
| Maintainer: | Guangchuang Yu <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.1.0 |
| Built: | 2026-05-19 10:25:42 UTC |
| Source: | https://github.com/bioc/epiSeeker |
capture name of variable
.(..., .env = parent.frame()).(..., .env = parent.frame())
... |
expression |
.env |
environment |
expression
x <- 1 eval(.(x)[[1]])x <- 1 eval(.(x)[[1]])
Env function for epiSeeker
.epiSeekerEnv(TxDb, item = "epiSeekerEnv", force = FALSE).epiSeekerEnv(TxDb, item = "epiSeekerEnv", force = FALSE)
TxDb |
TxDb object |
item |
item name |
force |
force to update txdb item in cache or not. |
Returns 'invisible(NULL)' invisibly. The primary purpose of this function is to manage the TXDB cache through side effects (creating, updating, or removing cached objects), rather than returning a value.
Annotate peaks
annotateSeq( peak, tssRegion = c(-3000, 3000), TxDb = NULL, level = "transcript", assignGenomicAnnotation = TRUE, genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", verbose = TRUE, columns = c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME") )annotateSeq( peak, tssRegion = c(-3000, 3000), TxDb = NULL, level = "transcript", assignGenomicAnnotation = TRUE, genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"), annoDb = NULL, addFlankGeneInfo = FALSE, flankDistance = 5000, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS", verbose = TRUE, columns = c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME") )
peak |
peak file or GRanges object |
tssRegion |
Region Range of TSS |
TxDb |
TxDb or EnsDb annotation object |
level |
one of transcript and gene |
assignGenomicAnnotation |
logical, assign peak genomic annotation or not |
genomicAnnotationPriority |
genomic annotation priority |
annoDb |
annotation package |
addFlankGeneInfo |
logical, add flanking gene information from the peaks |
flankDistance |
distance of flanking sequence |
sameStrand |
logical, whether find nearest/overlap gene in the same strand |
ignoreOverlap |
logical, whether ignore overlap of TSS with peak |
ignoreUpstream |
logical, if True only annotate gene at the 3' of the peak. |
ignoreDownstream |
logical, if True only annotate gene at the 5' of the peak. |
overlap |
one of 'TSS' or 'all', if overlap="all", then gene overlap with peak will be reported as nearest gene, no matter the overlap is at TSS region or not. |
verbose |
print message or not |
columns |
names of columns to be obtained from database |
data.frame or GRanges object with columns of:
all columns provided by input.
annotation: genomic feature of the peak, for instance if the peak is located in 5'UTR, it will annotated by 5'UTR. Possible annotation is Promoter-TSS, Exon, 5' UTR, 3' UTR, Intron, and Intergenic.
geneChr: Chromosome of the nearest gene
geneStart: gene start
geneEnd: gene end
geneLength: gene length
geneStrand: gene strand
geneId: entrezgene ID
distanceToTSS: distance from peak to gene TSS
if annoDb is provided, extra column will be included:
ENSEMBL: ensembl ID of the nearest gene
SYMBOL: gene symbol
GENENAME: full gene name
G Yu
[plotAnnoBar()] [plotAnnoPie()] [plotDistToTSS()]
data(peakAnno) peakAnnodata(peakAnno) peakAnno
Arrange GRanges object
## S3 method for class 'GRanges' arrange(.data, ..., .by_group = FALSE)## S3 method for class 'GRanges' arrange(.data, ..., .by_group = FALSE)
.data |
granges object |
... |
additional parameters |
.by_group |
If TRUE, will sort first by grouping variable. Applies to grouped data frames only. |
grange object
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::arrange(peak, seqnames)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::arrange(peak, seqnames)
convert csAnno object to data.frame
## S3 method for class 'csAnno' as.data.frame(x, row.names = NULL, optional = FALSE, ...)## S3 method for class 'csAnno' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
csAnno object |
row.names |
row names |
optional |
should be omitted. |
... |
additional parameters |
data.frame
Guangchuang Yu <https://guangchuangyu.github.io>
convert csAnno object to GRanges
as.GRanges(x)as.GRanges(x)
x |
csAnno object |
GRanges object
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) as.GRanges(peakAnno)data(peakAnno) as.GRanges(peakAnno)
bin vector function
bin_vector(vec, nbin = 800)bin_vector(vec, nbin = 800)
vec |
vector. |
nbin |
number of bin. |
bin list
This is constructor fo bmData objects.
bmData( value1 = NULL, value2 = NULL, pos = NULL, chr = NULL, gr = NULL, sampleNames = NULL, valueNames = NULL, ... )bmData( value1 = NULL, value2 = NULL, pos = NULL, chr = NULL, gr = NULL, sampleNames = NULL, valueNames = NULL, ... )
value1 |
the first value to be stored, a matrix-like object |
value2 |
the second value to be stored, a matrix-like object |
pos |
A vector of locations |
chr |
A vector of chromosomes |
gr |
An object of type [GenomicRanges::GRanges] |
sampleNames |
A vector of sample names |
valueNames |
the name of value1 or value2 or both. The order maps to the value. |
... |
other parameters from [bsseq::BSseq] |
bmData object
data(demo_bmdata)data(demo_bmdata)
This class added extra data to [bsseq::BSseq-class]. Change the assays by storing M/Cov to any value1/2
bmData object
bmData class inherits [SummarizedExperiment::RangedSummarizedExperiment-class], other slots see [SummarizedExperiment::RangedSummarizedExperiment]
check bin parameter method
check_bin(nbin, windows, verbose)check_bin(nbin, windows, verbose)
nbin |
numbers of bin. |
windows |
a list of region in granges. |
verbose |
show details or not |
message or nothing
check upstream and downstream extension
check_extension(upstream, downstream, type)check_extension(upstream, downstream, type)
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
type |
one of "start_site", "end_site", "body". |
message or null
check windows function
check_windows(windows)check_windows(windows)
windows |
windows |
message or null
Combine csAnno Object
combine_csAnno(x, ...)combine_csAnno(x, ...)
x |
csAnno object |
... |
csAnno objects |
https://github.com/YuLab-SMU/ChIPseeker/issues/157
csAnno object
data(peakAnno) combine_csAnno(peakAnno, peakAnno)data(peakAnno) combine_csAnno(peakAnno, peakAnno)
create regex patterns in negative strand
create_regex_patterns_negative(motif)create_regex_patterns_negative(motif)
motif |
the motif(e.g C:CG/CH, A:GAGG/AGG) of the base modification |
regex pattern
create regex patterns in positive strand
create_regex_patterns_positive(motif)create_regex_patterns_positive(motif)
motif |
the motif(e.g C:CG/CH, A:GAGG/AGG) of the base modification |
regex pattern
Class "csAnno" This class represents the output of epiSeeker Annotation
annotation object
annoannotation
tssRegionTSS region
leveltranscript or gene
hasGenomicAnnotationlogical
detailGenomicAnnotationGenomic Annotation in detail
annoStatannotation statistics
peakNumnumber of peaks
Guangchuang Yu <https://guangchuangyu.github.io>
[annotateSeq()]
A small example bmData object representing cytosine methylation
measurements from Bisulfite-Seq data.
This dataset is intended for demonstrating base-modification visualization,
regional methylation profiling, and epiSeeker workflows operating on
bmData objects.
See data-raw/example_data.R
A bmData object containing one sample.
bmData object
The example dataset was constructed from publicly available Bisulfite-Seq
data (GEO accession: GSM6940395, genome build: hg38).
The raw methylation coverage file (*.bismark.cov.gz) was imported
using data.table::fread().
A small genomic window on chromosome 22
([10525991, 10526342]) was selected to create a lightweight example
dataset. The data were processed as follows:
Filter records where chrom == 22 and positions fall within
the chosen window.
Convert chromosome name to UCSC style ("chr22").
Compute total coverage as: Cov = methylated + unmethylated.
Extract columns: chromosome, position, coverage, and methylation percentage.
Convert methylation percentage to a fraction.
A bmData S4 object containing one sample ("acinar_methyl").
Each entry stores:
chrChromosome in UCSC format (e.g. "chr22").
posGenomic coordinate of the cytosine.
CovTotal read coverage at the site.
MethylationMethylation level as a fraction (0–1).
Peak in Grange object. See data-raw/example_data.R
A GRanges object with 200 rows and 5 metadata columns.
Grange object
The demo peaks were extracted from GSM6418464 in the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6418464).
A GRanges object with 220 genomic ranges and the following metadata columns:
seqnameschr name
rangesPeak ranges
strandstrand information
mcoloutput from MACS2
Download all BED files of a particular genome version
downloadGEObedFiles(genome, destDir = getwd())downloadGEObedFiles(genome, destDir = getwd())
genome |
genome version |
destDir |
destination folder |
GEO files
G Yu
gse <- "GSE11431"gse <- "GSE11431"
Download BED supplementary files of a list of GSM accession numbers
downloadGSMbedFiles(GSM, destDir = getwd())downloadGSMbedFiles(GSM, destDir = getwd())
GSM |
GSM accession numbers |
destDir |
destination folder |
GEO data
G Yu
gsm <- "GSM288348"gsm <- "GSM288348"
dropAnno
dropAnno(csAnno, distanceToTSS_cutoff = 10000)dropAnno(csAnno, distanceToTSS_cutoff = 10000)
csAnno |
output of annotateSeq |
distanceToTSS_cutoff |
distance to TSS cutoff |
drop annotation exceeding distanceToTSS_cutoff
csAnno object
Guangchuang Yu
data(peakAnno) dropAnno(peakAnno)data(peakAnno) dropAnno(peakAnno)
Calculate overlap significance of ChIP experiments based on their nearest gene annotation
enrichAnnoOverlap( queryPeak, targetPeak, TxDb = NULL, pAdjustMethod = "BH", chainFile = NULL, distanceToTSS_cutoff = NULL )enrichAnnoOverlap( queryPeak, targetPeak, TxDb = NULL, pAdjustMethod = "BH", chainFile = NULL, distanceToTSS_cutoff = NULL )
queryPeak |
query bed file |
targetPeak |
target bed file(s) or folder containing bed files |
TxDb |
TxDb |
pAdjustMethod |
pvalue adjustment method |
chainFile |
chain file for liftOver |
distanceToTSS_cutoff |
restrict nearest gene annotation by distance cutoff |
data.frame
G Yu
if (interactive()) { require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene peakfile <- system.file("extdata", "demo_peak.txt", package = "epiSeeker") enrichAnnoOverlap(peakfile, peakfile, txdb) }if (interactive()) { require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene peakfile <- system.file("extdata", "demo_peak.txt", package = "epiSeeker") enrichAnnoOverlap(peakfile, peakfile, txdb) }
calculate overlap significant of ChIP experiments based on the genome coordinations
enrichPeakOverlap( queryPeak, targetPeak, TxDb = NULL, pAdjustMethod = "BH", nShuffle = 1000, chainFile = NULL, pool = TRUE, mc.cores = detectCores() - 1, verbose = TRUE )enrichPeakOverlap( queryPeak, targetPeak, TxDb = NULL, pAdjustMethod = "BH", nShuffle = 1000, chainFile = NULL, pool = TRUE, mc.cores = detectCores() - 1, verbose = TRUE )
queryPeak |
query bed file or GRanges object |
targetPeak |
target bed file(s) or folder that containing bed files or a list of GRanges objects |
TxDb |
TxDb |
pAdjustMethod |
pvalue adjustment method |
nShuffle |
shuffle numbers |
chainFile |
chain file for liftOver |
pool |
logical, whether pool target peaks |
mc.cores |
number of cores, see mclapply |
verbose |
logical |
data.frame
G Yu
require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene peakfile <- system.file("extdata", "demo_peak.txt", package = "epiSeeker") peak <- readPeakFile(peakfile)[1:10] enrichPeakOverlap(peak, peakfile, txdb, mc.cores = 1, nShuffle = 20)require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene peakfile <- system.file("extdata", "demo_peak.txt", package = "epiSeeker") peak <- readPeakFile(peakfile)[1:10] enrichPeakOverlap(peak, peakfile, txdb, mc.cores = 1, nShuffle = 20)
Name of the epiSeeker cache environment (internal static variable)
epiSeekerCacheepiSeekerCache
character vector
Extend regions functions
extend_gr(regions, upstream, downstream, by, type)extend_gr(regions, upstream, downstream, by, type)
regions |
GRanges object |
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
by |
one of 'gene', 'transcript', 'exon', 'intron' , '3UTR' , '5UTR', 'UTR'. |
type |
one of "start_site", "end_site", "body". |
GRanges object
Extend filter to Peak (GRanges class object)
## S3 method for class 'GRanges' filter(.data, ..., .by = NULL, .preserve = FALSE)## S3 method for class 'GRanges' filter(.data, ..., .by = NULL, .preserve = FALSE)
.data |
granges object |
... |
additional parameters |
.by |
Optional grouping variable(s) (column name or variable expression) specifying which columns to group by when applying filters |
.preserve |
Logical value indicating whether to preserve the original grouping structure when .by is specified. If TRUE, group order and identities are maintained |
A filtered GRanges object containing only rows that meet the specified criteria
grange object
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::filter(peak, fold_enrichment > 20)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::filter(peak, fold_enrichment > 20)
getting status of annotation
getAnnoStat(x)getAnnoStat(x)
x |
csAnno object |
data frame
data(peakAnno) getAnnoStat(peakAnno)data(peakAnno) getAnnoStat(peakAnno)
Prepare a bioregion of selected feature
getBioRegion( TxDb = NULL, upstream = 1000, downstream = 1000, by = "gene", type = "start_site" )getBioRegion( TxDb = NULL, upstream = 1000, downstream = 1000, by = "gene", type = "start_site" )
TxDb |
TxDb object or self-made granges object. |
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
by |
one of 'gene', 'transcript', 'exon', 'intron' , '3UTR' , '5UTR', 'UTR'. |
type |
one of "start_site", "end_site", "body". |
this function combined previous functions getPromoters(), getBioRegion() and getGeneBody() in order to solve the following issues.
(1) <https://github.com/GuangchuangYu/ChIPseeker/issues/16>
(2) <https://github.com/GuangchuangYu/ChIPseeker/issues/87>
1. function can provide a region of interest from txdb object. 2. function can make region from granges object. txdb object do not contain insulator or enhancer regions. Users can provide these regions through self-made granges object https://github.com/YuLab-SMU/ChIPseeker/issues/189.
There are three kinds of way to extend regions: start_site, end_site and body. We take transcript region to expain the differences of these three regions (tx: chr1 1000 1400).
(1) body region refers to the 1000 ~ 1400 bp.
(2) start_site region with (upstream = upstream = 100) refers to 900-1100bp.
(3) end_site region with (upstream = upstream = 100) refers to 1300-1500bp.
GRanges object
Guangchuang Yu
require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene getBioRegion(txdb)require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene getBioRegion(txdb)
getBmMatrix method for [bsseq::BSseq]
getBmMatrix method for bmData
getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, ... ) ## S4 method for signature 'ANY,BSseq' getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, cover_depth = TRUE, ... ) ## S4 method for signature 'ANY,bmData' getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, ... )getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, ... ) ## S4 method for signature 'ANY,BSseq' getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, cover_depth = TRUE, ... ) ## S4 method for signature 'ANY,bmData' getBmMatrix( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL, ... )
region |
base modification region in the form of dataframe, having columns of "chr","start" and "end" |
input |
the input data stored in BSseq objects or BSseqExtra objects |
BSgenome |
genome reference |
base |
one of A/T/G/C/U |
motif |
the motif(e.g C:CG/CH, A:GAGG/AGG) of the base modification |
position_bias |
1-base bias. e.g position_bias = 1("C" in "CHH"), position_bias = 2("A" in "GAGG") |
... |
other parameters |
cover_depth |
take the depth of cover into account or not |
data.frame
dataframe
require(BSgenome.Hsapiens.UCSC.hg38) data(demo_bmdata) bmMatrix <- getBmMatrix( region = data.frame(chr = "chr22", start = 10525991, end = 10526342), BSgenome = BSgenome.Hsapiens.UCSC.hg38, input = demo_bmdata, base = "C", motif = c("CG") )require(BSgenome.Hsapiens.UCSC.hg38) data(demo_bmdata) bmMatrix <- getBmMatrix( region = data.frame(chr = "chr22", start = 10525991, end = 10526342), BSgenome = BSgenome.Hsapiens.UCSC.hg38, input = demo_bmdata, base = "C", motif = c("CG") )
get the information of base modification
getBmMatrix.bmData( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL )getBmMatrix.bmData( region, input, BSgenome, base = NULL, motif = NULL, position_bias = NULL )
region |
base modification region in the form of dataframe, having columns of "chr","start" and "end" |
input |
the input data stored in bmData objects |
BSgenome |
genome reference |
base |
one of A/T/G/C/U |
motif |
the motif(e.g C:CG/CH, A:GAGG/AGG) of the base modification |
position_bias |
1-base bias. e.g position_bias = 1("C" in "CHH"), position_bias = 2("A" in "GAGG") |
This function retrieve the information of each base, requiring bmData object as input. Then organized it to dataframe.
dataframe
Get the information of base modification
getBmMatrix.BSseq( region, input, BSgenome, cover_depth = TRUE, base = NULL, motif = NULL, position_bias = NULL )getBmMatrix.BSseq( region, input, BSgenome, cover_depth = TRUE, base = NULL, motif = NULL, position_bias = NULL )
region |
base modification region in the form of data.frame, having columns of "chr","start" and "end" |
input |
the input data stored in [bsseq::BSseq] objects |
BSgenome |
genome reference |
cover_depth |
take the depth of cover into account or not |
base |
one of A/T/G/C/U |
motif |
the motif(e.g C:CG/CH, A:GAGG/AGG) of the base modification |
position_bias |
1-base bias. e.g position_bias = 1("C" in "CHH"), position_bias = 2("A" in "GAGG") |
This function retrieve the information of each base, requiring [bsseq::BSseq] object as input. Then organized it to data.frame.
data.frame
Get gene annotation, symbol, gene name etc.
getGeneAnno(annoDb, geneID, type, columns)getGeneAnno(annoDb, geneID, type, columns)
annoDb |
annotation package |
geneID |
query geneID |
type |
gene ID type |
columns |
names of columns to be obtained from database |
data.frame
G Yu
Get Genomic Annotation of peaks
getGenomicAnnotation( peaks, distance, tssRegion = c(-3000, 3000), TxDb, level, genomicAnnotationPriority, sameStrand = FALSE )getGenomicAnnotation( peaks, distance, tssRegion = c(-3000, 3000), TxDb, level, genomicAnnotationPriority, sameStrand = FALSE )
peaks |
peaks in GRanges object |
distance |
distance of peak to TSS |
tssRegion |
tssRegion, default is -3kb to +3kb |
TxDb |
TxDb object |
level |
one of gene or transcript |
genomicAnnotationPriority |
genomic Annotation Priority |
sameStrand |
whether annotate gene in same strand |
character vector
G Yu
Get genome version statistics collecting from GEO ChIPseq data
getGEOgenomeVersion()getGEOgenomeVersion()
data.frame
G Yu
getGEOgenomeVersion()getGEOgenomeVersion()
Get subset of GEO information by genome version keyword
getGEOInfo(genome, simplify = TRUE)getGEOInfo(genome, simplify = TRUE)
genome |
genome version |
simplify |
simplify result or not |
data.frame
G Yu
hg19 <- getGEOInfo(genome = "hg19", simplify = TRUE)hg19 <- getGEOInfo(genome = "hg19", simplify = TRUE)
Accessing species statistics collecting from GEO database
getGEOspecies()getGEOspecies()
data.frame
G Yu
getGEOspecies()getGEOspecies()
Get the information of motif in a range
getMotifMatrix(region, pwm, ref_obj, by = "name")getMotifMatrix(region, pwm, ref_obj, by = "name")
region |
region object in granges. |
pwm |
PFMatrixList. |
ref_obj |
seq reference object. e.g. BSgenome object. |
by |
show the motif by name or ID. |
score matrix
require(BSgenome.Hsapiens.UCSC.hg38) data(pwm_obj) region_oi <- GRanges( seqnames = "chr22", ranges = IRanges(start = 10525891, end = 10525991) ) motifMatrix <- getMotifMatrix( region = region_oi, pwm = pwm_obj[c(45, 120, 170)], ref_obj = BSgenome.Hsapiens.UCSC.hg38 )require(BSgenome.Hsapiens.UCSC.hg38) data(pwm_obj) region_oi <- GRanges( seqnames = "chr22", ranges = IRanges(start = 10525891, end = 10525991) ) motifMatrix <- getMotifMatrix( region = region_oi, pwm = pwm_obj[c(45, 120, 170)], ref_obj = BSgenome.Hsapiens.UCSC.hg38 )
Get index of features that closest to peak and calculate distance
getNearestFeatureIndicesAndDistances( peaks, features, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS" )getNearestFeatureIndicesAndDistances( peaks, features, sameStrand = FALSE, ignoreOverlap = FALSE, ignoreUpstream = FALSE, ignoreDownstream = FALSE, overlap = "TSS" )
peaks |
peak in GRanges |
features |
features in GRanges |
sameStrand |
logical, whether find nearest gene in the same strand |
ignoreOverlap |
logical, whether ignore overlap of TSS with peak |
ignoreUpstream |
logical, if True only annotate gene at the 3' of the peak. |
ignoreDownstream |
logical, if True only annotate gene at the 5' of the peak. |
overlap |
one of "TSS" or "all" |
list
G Yu
Get promoter region in GRanges format
getPromoters(TxDb = NULL, upstream = 1000, downstream = 1000, by = "gene")getPromoters(TxDb = NULL, upstream = 1000, downstream = 1000, by = "gene")
TxDb |
TxDb object |
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
by |
one of 'gene', 'transcript'. |
GRanges object
Guangchuang Yu
require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene promoters <- getPromoters(TxDb = txdb, upstream = 1000, downstream = 1000)require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene promoters <- getPromoters(TxDb = txdb, upstream = 1000, downstream = 1000)
get filenames of sample files
getSampleFiles()getSampleFiles()
list of file names
G Yu
files <- getSampleFiles()files <- getSampleFiles()
getTagMatrix
getTagMatrix( peak, upstream = 0, downstream = 0, windows = NULL, type = NULL, by = NULL, TxDb = NULL, weightCol = NULL, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )getTagMatrix( peak, upstream = 0, downstream = 0, windows = NULL, type = NULL, by = NULL, TxDb = NULL, weightCol = NULL, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )
peak |
(1) a peak file or GRanges object. (2) a list of peak file or GRanges object. |
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
windows |
a collection of region |
type |
one of "start_site", "end_site", "body" |
by |
one of 'gene', 'transcript', 'exon', 'intron', '3UTR' , '5UTR', or specified by users |
TxDb |
TxDb or self-made granges object, served as txdb |
weightCol |
column name of weight, default is NULL.This column acts as a weight vaule. Details see https://github.com/YuLab-SMU/ChIPseeker/issues/15 |
nbin |
the amount of nbines. Calculate the tagMatrix by binning method. Idea is derived from the function of deeptools(https://deeptools.readthedocs.io/en/develop/content/tools/computeMatrix.html) |
verbose |
print message or not |
ignore_strand |
ignore the strand information or not |
getTagMatrix() function can produce the matrix for visualization. Matrix represents the peak count in a windows and there are two ways to specify the 'windows':
(1) use getPromoters and getBioRegion to get 'windows' and
put it into windows parameter in getTagMatrix().
(2) use getTagMatrix() to call getPromoters()/getBioRegion(). In this way users do not need to input 'windows' parameter but need to input 'TxDb' parameter. 'TxDb' can accept a set of packages contained annotation of regions of different genomes(e.g. TxDb.Hsapiens.UCSC.hg38.knownGene). Users can get the regions of interest through specific functions. These specific functions are built in getPromoters()/getBioRegion().
However, many regions can not be gain through txdb(e.g. insulator and enhancer regions), Users can provide these regions in the form of granges object. These self-made granges object will be passed to 'TxDb' and they will be passed to makeBioRegionFromGranges() to produce the 'windows'.
In a word, 'TxDb' parameter getTagMatrix() is a reference information. Users can pass txdb object or self-made granges into it.
tagMatrix
G Yu
if (interactive()) { require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene data(demo_peak) tagMatrix <- getTagMatrix(demo_peak, type = "start_site", by = "gene", upstream = 500, downstream = 500, TxDb = txdb, weightCol = "V7" ) }if (interactive()) { require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene data(demo_peak) tagMatrix <- getTagMatrix(demo_peak, type = "start_site", by = "gene", upstream = 500, downstream = 500, TxDb = txdb, weightCol = "V7" ) }
getTagMatrix function for region of body
getTagMatrix_body( peak.cov, windows, nbin, verbose = TRUE, ignore_strand = FALSE )getTagMatrix_body( peak.cov, windows, nbin, verbose = TRUE, ignore_strand = FALSE )
peak.cov |
peak coverage. |
windows |
a collection of region. |
nbin |
the amount of nbines |
verbose |
print message or not |
ignore_strand |
ignore the strand information or not |
tagMatrix
get tagmatrix internal function
getTagMatrix_body_internal(peak.cov, windows, nbin, chr.idx)getTagMatrix_body_internal(peak.cov, windows, nbin, chr.idx)
peak.cov |
peak coverage. |
windows |
a collection of region. |
nbin |
the amount of nbines. |
chr.idx |
idx of chr. |
matrix
getTagMatrix function for region of site
getTagMatrix_site( peak.cov, windows, chr.idx, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )getTagMatrix_site( peak.cov, windows, chr.idx, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )
peak.cov |
peak coverage. |
windows |
a collection of region. |
chr.idx |
idx of chr. |
nbin |
the amount of nbines |
verbose |
print message or not |
ignore_strand |
ignore the strand information or not |
tagMatrix
getTagMatrix internal function
getTagMatrix.internal( peak, upstream = 0, downstream = 0, windows = NULL, type = NULL, by = NULL, TxDb = NULL, weightCol = NULL, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )getTagMatrix.internal( peak, upstream = 0, downstream = 0, windows = NULL, type = NULL, by = NULL, TxDb = NULL, weightCol = NULL, nbin = NULL, verbose = TRUE, ignore_strand = FALSE )
peak |
peak file or GRanges object |
upstream |
upstream extension. One of actual number or rel() object. |
downstream |
downstream extension. One of actual number or rel() object. |
windows |
a collection of region |
type |
one of "start_site", "end_site", "body" |
by |
one of 'gene', 'transcript', 'exon', 'intron', '3UTR' , '5UTR', or specified by users |
TxDb |
TxDb or self-made granges object, served as txdb |
weightCol |
column name of weight, default is NULL. |
nbin |
the amount of nbines. |
verbose |
print message or not |
ignore_strand |
ignore the strand information or not |
matrix
change a list grange object to matrix
grange2mt(gr_list, weightCol = NULL)grange2mt(gr_list, weightCol = NULL)
gr_list |
grange list object |
weightCol |
weight column of peak. |
matrix
data(demo_peak) grange2mt(list(a = demo_peak, b = demo_peak), "V5")data(demo_peak) grange2mt(list(a = demo_peak, b = demo_peak), "V5")
ucsc genome version, precalculated data and gsm information
A data frame with 'n' rows (GSM samples) and 14 columns.
data frame
The 'gsminfo' dataset was constructed programmatically from public resources in the NCBI GEO and UCSC Genome Browser databases. The data generation pipeline is implemented in 'data-raw/' (see 'prepareGSMInfo()' in the package source).
Briefly, GEO metadata were retrieved using the 'GEOmetadb' SQLite database and 'GEOquery'. The latest GEOmetadb SQLite file was downloaded via 'getSQLiteFile()' or, if unavailable, directly from <http://starbuck1.s3.amazonaws.com/sradb/GEOmetadb.sqlite.gz>. Platform (GPL) records were queried to identify platforms associated with high-throughput sequencing experiments. For each sequencing platform, the corresponding GSM records were obtained using 'Meta(getGEO())'. Supplementary BED-like files for each GSM were collected using 'getGSMsuppFile()' and 'batchGetGSMsuppFile()'.
Additional metadata fields (title, organism, extract protocol, characteristics, data processing description, submission date, and supplementary file URLs) were extracted from GSM SOFT files downloaded using 'GEOquery'. Genome assembly versions for each GSM were inferred using the function 'getGenomicVersion()', which matches UCSC genome labels to either the data processing description or the supplementary file names, using the reference table provided in the internal dataset 'ucsc_release'.
PubMed IDs associated with each GEO series (GSE) were obtained from the 'gse' table in GEOmetadb. All GSM-level metadata were merged, cleaned, and converted to ASCII using 'iconv()' to remove non-ASCII characters.
Finally, newly processed GSM entries were appended to any preexisting 'gsminfo' object stored in the package, deduplicated, and saved as 'gsminfo.rda' with 'compress="xz"'.
Thus, 'gsminfo' represents a curated, reproducibly constructed metadata table summarizing GEO high-throughput sequencing samples, including organism, platform, experimental descriptions, processing information, genome versions, supplementary BED file locations, and associated PubMed IDs.
A data frame with one row per GSM sample and the following columns:
GEO series accession (GSE).
GEO sample accession (GSM).
GEO platform accession (GPL).
Organism name (e.g., *Mus musculus*).
Sample title as provided in GEO.
Experiment-specific metadata such as cell type, treatment, or antibody.
Source material for sequencing, typically cell or tissue type.
Detailed wet-lab protocol for chromatin extraction, immunoprecipitation, and library preparation as reported in GEO.
Antibody information or additional sample description.
Bioinformatics processing description including aligner, genome build, peak calling method, and filtering steps.
Date when the sample was submitted to GEO.
URL to supplementary processed files (e.g., BED).
Genome assembly used in the processed data (e.g., mm8, hg19).
PMID of the reference publication associated with the dataset.
load defaulst txdb
loadTxDb(TxDb)loadTxDb(TxDb)
TxDb |
txdb. |
txdb object
makeBmDataFromData method generics
makeBmDataFromData method for 'CompressedGRangesList' objects
makeBmDataFromData method for 'GRanges' objects
makeBmDataFromData method for 'list' objects
makeBmDataFromData method for data.frame objects
makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'CompressedGRangesList' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'GRanges' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'list' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'data.frame' makeBmDataFromData(data, sampleNames = NULL)makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'CompressedGRangesList' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'GRanges' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'list' makeBmDataFromData(data, sampleNames = NULL) ## S4 method for signature 'data.frame' makeBmDataFromData(data, sampleNames = NULL)
data |
lists object |
sampleNames |
the name of each samples |
The objects in 'data' must have specific forms. Colunms should be features, which should be organized in the order of "chr", "pos", "value1", "value2(optional)". chr stands for chromosome. pos stands for position on chromosome, also known as coordinates. value1/2 stands for the value on each base. The colnames can be any character but must be in the order. Rows stands for each observation.
The objects in data must have specific forms. Colunms should be
features, which should be organized in the order of "chr", "pos", "value1",
"value2(optional)". chr stands for chromosome. pos stands for position on
chromosome, also known as coordinates. value1/2 stands for the value on each base.
The colnames can be any character but must be in the order. Rows stands for each
observation.
bmData
demo_bisseq_file <- system.file("extdata", "demo_bisseq.txt", package = "epiSeeker" ) demo_bisseq <- read.table(demo_bisseq_file, header = TRUE) demo_bmdata <- makeBmDataFromData( data = list(acinar_methyl = demo_bisseq), sampleNames = "acinar_methyl" )demo_bisseq_file <- system.file("extdata", "demo_bisseq.txt", package = "epiSeeker" ) demo_bisseq <- read.table(demo_bisseq_file, header = TRUE) demo_bmdata <- makeBmDataFromData( data = list(acinar_methyl = demo_bisseq), sampleNames = "acinar_methyl" )
make dmData object from data
makeBmDataFromData.internal(data, sampleNames = NULL)makeBmDataFromData.internal(data, sampleNames = NULL)
data |
lists object |
sampleNames |
the name of each samples |
This internal function was inspired by DSS::makeBSseqData.
The objects in data must have specific forms. Colunms should be
features, which should be organized in the order of "chr", "pos", "value1",
"value2(optional)". chr stands for chromosome. pos stands for position on
chromosome, also known as coordinates. value1/2 stands for the value on each base.
The colnames can be any character but must be in the order. Rows stands for each
observation.
dmData object
This function makes bmData object from files. Users can input the name of a file or a file folder.
makeBmDataFromFiles(name, sampleNames = NULL, variablesNames = NULL)makeBmDataFromFiles(name, sampleNames = NULL, variablesNames = NULL)
name |
the name of files or file folder |
sampleNames |
the name for each file |
variablesNames |
the names of the first two columns will be assigned c("chr","pos"), the names of the following columns will be assigned by variablesNames |
bed files and txt files are supported. Bed files can only contain no more than two metadata, as it stands for value1/2. Txt files should organize the columns as chr, pos, value1, value2(optional).
bmData
demo_bisseq_file <- system.file("extdata", "demo_bisseq.txt", package = "epiSeeker") data <- makeBmDataFromFiles(demo_bisseq_file, sampleNames = "acinar_methyl", variablesNames = c("Cov", "Methylation") )demo_bisseq_file <- system.file("extdata", "demo_bisseq.txt", package = "epiSeeker") data <- makeBmDataFromFiles(demo_bisseq_file, sampleNames = "acinar_methyl", variablesNames = c("Cov", "Methylation") )
Extend mutate to Peak (GRanges class object)
## S3 method for class 'GRanges' mutate( .data, ..., .by = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL )## S3 method for class 'GRanges' mutate( .data, ..., .by = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL )
.data |
granges object |
... |
additional parameters |
.by |
Optional grouping variable(s) (column name or variable expression) specifying which columns to group by for operations |
.keep |
Character vector specifying which columns to retain. Possible values: "all" (retain all columns, default), "used" (retain only columns used in calculations), "unused" (retain only columns not used in calculations), "none" (retain only newly created columns) |
.before |
Column name or position index specifying where to insert new columns before |
.after |
Column name or position index specifying where to insert new columns after |
A processed GRanges object containing the added or modified columns
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::mutate(peak, score = tags)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::mutate(peak, score = tags)
calculate the overlap matrix, which is useful for vennplot
overlap(Sets)overlap(Sets)
Sets |
a list of objects |
data.frame
G Yu
parse peak str
parse_peak(peak_str)parse_peak(peak_str)
peak_str |
peak str |
data frame
parse_peak("chr1:150235946-150236624")parse_peak("chr1:150235946-150236624")
A 'csAnno' object representing the annotation result of the example peak set 'demo_peak'. Peaks were annotated using the function 'annotateSeq()' in 'epiSeeker'.
A 'csAnno' object containing 220 annotated peaks.
csAnno object
Input peaks were taken from the example dataset 'demo_peak'. Annotation was generated using 'epiSeeker::annotateSeq()'.
A 'csAnno' S4 object with the following slots:
A 'GRanges' object containing the annotated peaks, including peak coordinates, basic peak metrics, and gene-based annotation fields.
Numeric vector of length two defining the upstream and downstream window used for TSS annotation.
Character string indicating whether annotation was performed at the '"transcript"' or '"gene"' level.
Logical value indicating whether detailed genomic annotation (promoter, exon, intron, etc.) was computed.
A data frame providing per-peak binary indicators for genomic categories.
A data frame summarizing annotation category frequencies across the annotated peak set.
Total number of annotated peaks.
A list of csAnno objects obtained by annotating multiple peak
files using epiSeeker::annotateSeq().
See data-raw/example_data.R
A a list of csAnno objects.
list of csAnno object
The example peak annotation list was generated using several example peak
files returned by getSampleFiles(). Each peak file was annotated
using epiSeeker::annotateSeq().
A named list where each element is a csAnno S4 object produced by annotateSeq().
plotAnnoBar method for 'csAnno' instance
plotAnnoBar( x, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", ... ) ## S4 method for signature 'list' plotAnnoBar( x, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", ... ) plotAnnoBar(x, xlab="", ylab='Percentage(%)',title="Feature Distribution", ...)plotAnnoBar( x, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", ... ) ## S4 method for signature 'list' plotAnnoBar( x, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", ... ) plotAnnoBar(x, xlab="", ylab='Percentage(%)',title="Feature Distribution", ...)
x |
'csAnno' instance |
xlab |
xlab |
ylab |
ylab |
title |
title |
... |
additional paramter |
plot
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) plotAnnoBar(peakAnno)data(peakAnno) plotAnnoBar(peakAnno)
Plot feature distribution based on their chromosome region
plotAnnoBar.data.frame( anno.df, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", categoryColumn )plotAnnoBar.data.frame( anno.df, xlab = "", ylab = "Percentage(%)", title = "Feature Distribution", categoryColumn )
anno.df |
annotation stats |
xlab |
xlab |
ylab |
ylab |
title |
plot title |
categoryColumn |
category column |
plot chromosome region features
bar plot that summarize genomic features of peaks
Guangchuang Yu <https://yulab-smu.top>
[annotateSeq()] [plotAnnoPie()]
plotAnnoPie method for 'csAnno' instance
plotAnnoPie( x, ndigit = 2, cex = 0.9, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ... ) plotAnnoPie(x, ndigit = 2, cex = 0.9, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ...)plotAnnoPie( x, ndigit = 2, cex = 0.9, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ... ) plotAnnoPie(x, ndigit = 2, cex = 0.9, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ...)
x |
'csAnno' instance |
ndigit |
number of digit to round |
cex |
label cex |
col |
color |
legend.position |
topright or other. |
pie3D |
plot in 3D or not |
radius |
radius of the pie |
... |
extra parameter |
plot
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) plotAnnoPie(peakAnno)data(peakAnno) plotAnnoPie(peakAnno)
pieplot from peak genomic annotation
plotAnnoPie.csAnno( x, ndigit = 2, cex = 0.8, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ... )plotAnnoPie.csAnno( x, ndigit = 2, cex = 0.8, col = NA, legend.position = "rightside", pie3D = FALSE, radius = 0.8, ... )
x |
csAnno object |
ndigit |
number of digit to round |
cex |
label cex |
col |
color |
legend.position |
topright or other. |
pie3D |
plot in 3D or not |
radius |
radius of Pie |
... |
extra parameter |
pie plot of peak genomic feature annotation
Guangchuang Yu <https://yulab-smu.top>
[annotateSeq()] [plotAnnoBar()]
data(peakAnno) plotAnnoPie(peakAnno)data(peakAnno) plotAnnoPie(peakAnno)
Plot base modification profile
plotBmProf( df, motif_color = NULL, title = NULL, xlim = NULL, interactive = FALSE, width_svg = 10, height_svg = 6, highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, xlab = "Genomic Region(5'->3')", ylab = NULL, second_ylab = NULL, switch_y_value = TRUE, legend_lab_motif = NULL, legend_lab_value2 = NULL, strip_placement = "outside", angle_of_facet_label = 360, alpha = 0.6, y_ticks_length = 0.25, x_ticks_length = 0.25, auto_x_axis = TRUE, strip_border = FALSE, facet_label_text_size = 10, axis_title_text_size = 17, title_text_size = 20, right_y_axis_text_size = 10, left_y_axis_text_size = 10, x_axis_text_size = 10, depth_heatmap = TRUE, nrow = NULL, ncol = NULL, panel_spacing = 1, legend_box_spacing = 3, legend_position = "right" )plotBmProf( df, motif_color = NULL, title = NULL, xlim = NULL, interactive = FALSE, width_svg = 10, height_svg = 6, highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, xlab = "Genomic Region(5'->3')", ylab = NULL, second_ylab = NULL, switch_y_value = TRUE, legend_lab_motif = NULL, legend_lab_value2 = NULL, strip_placement = "outside", angle_of_facet_label = 360, alpha = 0.6, y_ticks_length = 0.25, x_ticks_length = 0.25, auto_x_axis = TRUE, strip_border = FALSE, facet_label_text_size = 10, axis_title_text_size = 17, title_text_size = 20, right_y_axis_text_size = 10, left_y_axis_text_size = 10, x_axis_text_size = 10, depth_heatmap = TRUE, nrow = NULL, ncol = NULL, panel_spacing = 1, legend_box_spacing = 3, legend_position = "right" )
df |
the base modification data.frame |
motif_color |
the color for different motifs(CHH,CHG,CG) |
title |
the title of the plot, can also be a list of title. |
xlim |
the specified interval of region, must be the sub-interval of the dmR. list for list df |
interactive |
produce interactive fig or not. |
width_svg |
width_svg. |
height_svg |
height_svg. |
highlight |
a region or a list of region to highlight. |
highlight_color |
colors of highlight rect. Default "#c6c3c3" |
highlight_alpha |
alpha of highlight rect. |
xlab |
the x label, can also be a list of x label |
ylab |
the y label, can also be a list of y label |
second_ylab |
the ylab for second y-axis |
switch_y_value |
switch the value from left y-axis to right y-axis |
legend_lab_motif |
the label of legend for motif |
legend_lab_value2 |
the label of legend for the second value(ylab is the label for the first value) |
strip_placement |
strip.placement |
angle_of_facet_label |
the angle of facet label, e.g. 0 is horizontal |
alpha |
transparency for the depth information line |
y_ticks_length |
the length of y-axis ticks |
x_ticks_length |
the length of x-axis ticks |
auto_x_axis |
use auto x axis or not. |
strip_border |
add border to the facet label or not |
facet_label_text_size |
the size of facet label text |
axis_title_text_size |
the size of axis title text |
title_text_size |
the size of the title text |
right_y_axis_text_size |
the size of the left y axis text,this work when depth information is taken into account |
left_y_axis_text_size |
the size of the left y axis text |
x_axis_text_size |
the size of x axis text |
depth_heatmap |
draw the heatmap of depth information or not |
nrow |
the nrow of plotting a list of dmR |
ncol |
the ncol of plotting a list of dmR |
panel_spacing |
the distance between panels |
legend_box_spacing |
the distance between legend and plotting area,"cm" |
legend_position |
the position of legend |
ggplot object
require(BSgenome.Hsapiens.UCSC.hg38) data(demo_bmdata) bmMatrix <- getBmMatrix( region = data.frame(chr = "chr22", start = 10525991, end = 10526342), BSgenome = BSgenome.Hsapiens.UCSC.hg38, input = demo_bmdata, # base = "C", motif = c("CG") ) plotBmProf(bmMatrix)require(BSgenome.Hsapiens.UCSC.hg38) data(demo_bmdata) bmMatrix <- getBmMatrix( region = data.frame(chr = "chr22", start = 10525991, end = 10526342), BSgenome = BSgenome.Hsapiens.UCSC.hg38, input = demo_bmdata, # base = "C", motif = c("CG") ) plotBmProf(bmMatrix)
plotCov
plotCov( peak, weightCol = NULL, facet_level = NULL, highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, xlab = "Chromosome Size (bp)", ylab = "", interactive = FALSE, width_svg = 10, height_svg = 6, title = "ChIP Peaks over Chromosomes", x_text_size = 10, y_text_size = 10, facet_label_text_size = 10, chrs = NULL, xlim = NULL, facet_var = NULL, facet_scales = "free", lower = 1, fill_color = "black", add_cluster_tree = FALSE, cluster_dist_method = "euclidean", cluster_hclust_methond = "complete", legend_position = NULL, add_coaccess = FALSE, curvature = 0.3, coaccess_top_n = NULL, coaccess_cor_threshold = NULL, design = NULL, coaccess_legend_pos = c(0.9, 0.5), coaccess_legend_text_size = 10, coaccess_legend_title_size = 12 )plotCov( peak, weightCol = NULL, facet_level = NULL, highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, xlab = "Chromosome Size (bp)", ylab = "", interactive = FALSE, width_svg = 10, height_svg = 6, title = "ChIP Peaks over Chromosomes", x_text_size = 10, y_text_size = 10, facet_label_text_size = 10, chrs = NULL, xlim = NULL, facet_var = NULL, facet_scales = "free", lower = 1, fill_color = "black", add_cluster_tree = FALSE, cluster_dist_method = "euclidean", cluster_hclust_methond = "complete", legend_position = NULL, add_coaccess = FALSE, curvature = 0.3, coaccess_top_n = NULL, coaccess_cor_threshold = NULL, design = NULL, coaccess_legend_pos = c(0.9, 0.5), coaccess_legend_text_size = 10, coaccess_legend_title_size = 12 )
peak |
peak file or GRanges object. |
weightCol |
weight column of peak. |
facet_level |
facet_level. |
highlight |
a region or a list of region to highlight. |
highlight_color |
colors of highlight rect. Default "#c6c3c3" |
highlight_alpha |
alpha of highlight rect. |
xlab |
xlab. |
ylab |
ylab. |
interactive |
produce interactive fig or not. |
width_svg |
width_svg |
height_svg |
height_svg |
title |
title. |
x_text_size |
the size of x text. |
y_text_size |
the size of y text. |
facet_label_text_size |
the size of facet label text. |
chrs |
selected chromosomes to plot, all chromosomes by default. |
xlim |
ranges to plot, default is whole chromosome. |
facet_var |
how to facet. one of c("chr~.", ".~ chr", ".~.id", ".id~.", ".id~chr", "chr~.id") |
facet_scales |
how to scale facet data. Default: "free". |
lower |
lower cutoff of coverage signal. |
fill_color |
specify the color/palette for the plot. Order matters. |
add_cluster_tree |
add cluster tree for samples or not. |
cluster_dist_method |
method for calculate cluster tree. Details see [stats::dist()] |
cluster_hclust_methond |
method for hclust. Details see [stats::hclust()] |
legend_position |
legend_position |
add_coaccess |
add co-accessibility or not |
curvature |
curvature. |
coaccess_top_n |
top n co-accessibility to show, default: 3. |
coaccess_cor_threshold |
co-access peak cor threshold. |
design |
the design layout of figure. |
coaccess_legend_pos |
the legend position of co-accessibiliy plot legend. |
coaccess_legend_text_size |
the legend position of co-accessibiliy plot legend text size. |
coaccess_legend_title_size |
the legend position of co-accessibiliy plot legend title size. |
Plot peak coverage
ggplot2 object
G Yu
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) plotCov(peak)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) plotCov(peak)
plotDistToTSS method for 'csAnno' instance
plotDistToTSS( x, distanceColumn = "distanceToTSS", xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", ... ) ## S4 method for signature 'list' plotDistToTSS( x, distanceColumn = "distanceToTSS", xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", distanceBreaks = c(0, 1000, 3000, 5000, 10000, 1e+05), palette = NULL, ... ) plotDistToTSS(x,distanceColumn="distanceToTSS", xlab="", ylab="Binding sites (%) (5'->3')", title="Distribution of transcription factor-binding loci relative to TSS",...)plotDistToTSS( x, distanceColumn = "distanceToTSS", xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", ... ) ## S4 method for signature 'list' plotDistToTSS( x, distanceColumn = "distanceToTSS", xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", distanceBreaks = c(0, 1000, 3000, 5000, 10000, 1e+05), palette = NULL, ... ) plotDistToTSS(x,distanceColumn="distanceToTSS", xlab="", ylab="Binding sites (%) (5'->3')", title="Distribution of transcription factor-binding loci relative to TSS",...)
x |
'csAnno' instance |
distanceColumn |
distance column name |
xlab |
xlab |
ylab |
ylab |
title |
title |
... |
additional parameter |
distanceBreaks |
breaks of distance, default is 'c(0, 1000, 3000, 5000, 10000, 100000)' |
palette |
palette name for coloring different distances. Run 'RColorBrewer::display.brewer.all()' to see all applicable values. |
plot
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) plotDistToTSS(peakAnno)data(peakAnno) plotDistToTSS(peakAnno)
Plot feature distribution based on the distances to the TSS
plotDistToTSS.data.frame( peakDist, distanceColumn = "distanceToTSS", distanceBreaks = c(0, 1000, 3000, 5000, 10000, 1e+05), palette = NULL, xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", categoryColumn = ".id" )plotDistToTSS.data.frame( peakDist, distanceColumn = "distanceToTSS", distanceBreaks = c(0, 1000, 3000, 5000, 10000, 1e+05), palette = NULL, xlab = "", ylab = "Binding sites (%) (5'->3')", title = "Distribution of transcription factor-binding loci relative to TSS", categoryColumn = ".id" )
peakDist |
peak annotation |
distanceColumn |
column name of the distance from peak to nearest gene |
distanceBreaks |
default is 'c(0, 1000, 3000, 5000, 10000, 100000)' |
palette |
palette name for coloring different distances. Run 'RColorBrewer::display.brewer.all()' to see all applicable values. |
xlab |
x label |
ylab |
y label |
title |
figure title |
categoryColumn |
category column, default is ".id" |
bar plot that summarize distance from peak to TSS of the nearest gene.
Guangchuang Yu https://guangchuangyu.github.io
Plot gene track
plotGeneTrack( txdb, chr, start_pos, end_pos, xlab = "", ylab = "", x_text_size = 10, y_text_size = 10, select_gene = "all", palette = NULL, fromType = "ENTREZID", highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, OrgDb = NULL, show_legend = FALSE, auto_x_axis = TRUE )plotGeneTrack( txdb, chr, start_pos, end_pos, xlab = "", ylab = "", x_text_size = 10, y_text_size = 10, select_gene = "all", palette = NULL, fromType = "ENTREZID", highlight = NULL, highlight_color = "#c6c3c3", highlight_alpha = 0.2, OrgDb = NULL, show_legend = FALSE, auto_x_axis = TRUE )
txdb |
TxDb object, providing gene annotation. |
chr |
chromosome id. |
start_pos |
start coordinate of windows. |
end_pos |
end coordinate of windows. |
xlab |
x lab. |
ylab |
y lab. |
x_text_size |
the size of x text. |
y_text_size |
the size of y text. |
select_gene |
show all gene or specifc gene. (1)"all", show all genes. (2) gene symbol, e.g. c("SKAP1", "EFCAB13"). (3) gene id, e.g. c(4831, 55316) |
palette |
palette, default "Set3". |
fromType |
from which type of gene name to change gene id. Default: ENTREZID. See [clusterProfiler::bitr()] |
highlight |
a region or a list of region to highlight. |
highlight_color |
colors of highlight rect. Default "#c6c3c3" |
highlight_alpha |
alpha of highlight rect. |
OrgDb |
OrgDb for change gene id to gene symbol. |
show_legend |
show legend or not. |
auto_x_axis |
use auto x axis or not. |
ggplot object
require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene plotGeneTrack(txdb = txdb, chr = "chr8", start_pos = 126712193, end_pos = 126712293)require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene plotGeneTrack(txdb = txdb, chr = "chr8", start_pos = 126712193, end_pos = 126712293)
Plot the profile of motif of specific peak
plotMotifProf( df, legend_lab = "motif", y_lab = "motif score", x_lab = NULL, interactive = FALSE, width_svg = 10, height_svg = 6 )plotMotifProf( df, legend_lab = "motif", y_lab = "motif score", x_lab = NULL, interactive = FALSE, width_svg = 10, height_svg = 6 )
df |
motif information data.frame. |
legend_lab |
legend lab. |
y_lab |
y axis label. |
x_lab |
x axis label. |
interactive |
produce interactive fig or not. |
width_svg |
width_svg |
height_svg |
height_svg |
ggplot object
require(BSgenome.Hsapiens.UCSC.hg38) data(pwm_obj) region_oi <- GRanges( seqnames = "chr22", ranges = IRanges(start = 10525891, end = 10525991) ) motifMatrix <- getMotifMatrix( region = region_oi, pwm = pwm_obj[c(45, 120, 170)], ref_obj = BSgenome.Hsapiens.UCSC.hg38 ) plotMotifProf(motifMatrix)require(BSgenome.Hsapiens.UCSC.hg38) data(pwm_obj) region_oi <- GRanges( seqnames = "chr22", ranges = IRanges(start = 10525891, end = 10525991) ) motifMatrix <- getMotifMatrix( region = region_oi, pwm = pwm_obj[c(45, 120, 170)], ref_obj = BSgenome.Hsapiens.UCSC.hg38 ) plotMotifProf(motifMatrix)
plotPeakHeatmap function
plotPeakHeatmap( tagMatrix, plot_prof = TRUE, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12, nrow = NULL, ncol = NULL, conf = NULL, statistic_method = "mean", missingDataAsZero = TRUE, facet = "none", free_y = TRUE, height_proportion = 4, ... )plotPeakHeatmap( tagMatrix, plot_prof = TRUE, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12, nrow = NULL, ncol = NULL, conf = NULL, statistic_method = "mean", missingDataAsZero = TRUE, facet = "none", free_y = TRUE, height_proportion = 4, ... )
tagMatrix |
output from getTagMatrix(). |
plot_prof |
combine prof or not. Default: TRUE |
xlab |
xlab. |
ylab |
ylab. |
palette |
palette to be filled in,details see scale_colour_brewer. |
title |
title. |
facet_label_text_size |
the size of facet label text |
nrow |
nrow to place a number of fig. |
ncol |
ncol to place a number of fig. |
conf |
confidence interval. |
statistic_method |
method to do statistic. one of "mean", "median", "min", "max", "sum", "std" |
missingDataAsZero |
set missing data as zero or not. |
facet |
one of 'none', 'row' and 'column'. |
free_y |
if TRUE, y will be scaled. |
height_proportion |
the proportion of profiling picture and heatmap |
... |
additional parameters |
ggplot object
data(tagMatrix) plotPeakHeatmap(tagMatrix)data(tagMatrix) plotPeakHeatmap(tagMatrix)
Plot peak heatmap sub function
plotPeakHeatmap_sub( tagMatrix, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12, nrow = NULL, ncol = NULL )plotPeakHeatmap_sub( tagMatrix, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12, nrow = NULL, ncol = NULL )
tagMatrix |
output from getTagMatrix(). |
xlab |
xlab. |
ylab |
ylab. |
palette |
palette to be filled in,details see [ggplot2::scale_colour_brewer()]. |
title |
title. |
facet_label_text_size |
the size of facet label text |
nrow |
nrow to place a number of fig. |
ncol |
ncol to place a number of fig. |
ggplot object
internal function of plotPeakHeatmap
plotPeakHeatmap_sub.internal( tagMatrix, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12 )plotPeakHeatmap_sub.internal( tagMatrix, xlab = "", ylab = "", palette = NULL, title = NULL, facet_label_text_size = 12 )
tagMatrix |
output from getTagMatrix(). |
xlab |
xlab. |
ylab |
ylab. |
palette |
palette to be filled in,details see [ggplot2::scale_colour_brewer()]. |
title |
title. |
facet_label_text_size |
the size of facet label text |
ggplot object
plot peak profile
plotPeakProf( tagMatrix, xlab = "Genomic Region (5'->3')", ylab = "Peak Count Frequency", conf = NULL, title = "", facet = "none", free_y = TRUE, statistic_method = "mean", missingDataAsZero = TRUE, ... )plotPeakProf( tagMatrix, xlab = "Genomic Region (5'->3')", ylab = "Peak Count Frequency", conf = NULL, title = "", facet = "none", free_y = TRUE, statistic_method = "mean", missingDataAsZero = TRUE, ... )
tagMatrix |
output from getTagMatrix(). |
xlab |
xlab. |
ylab |
ylab. |
conf |
confidence interval. |
title |
title. |
facet |
one of 'none', 'row' and 'column'. |
free_y |
if TRUE, y will be scaled. |
statistic_method |
method to do statistic. one of "mean", "median", "min", "max", "sum", "std" |
missingDataAsZero |
set missing data as zero or not. |
... |
additional parameters |
ggplot object
G Yu; Y Yan
data(tagMatrix) plotPeakProf(tagMatrix)data(tagMatrix) plotPeakProf(tagMatrix)
A collection of transcription factor position weight matrices (PWMs) retrieved from the JASPAR 2024 database. This dataset is used to demonstrate motif enrichment, motif scanning, and peak–motif association analyses in epiSeeker. See data-raw/example_data.R
A PFMatrixList object containing PWMs for multiple human
transcription factors from the JASPAR 2024 CORE collection.
pwm_obj
The PWM set was obtained using the JASPAR 2024 SQLite database bundled in the JASPAR2024 package. Matrices were retrieved using TFBSTools with the following parameters:
collection = "CORE"
all_versions = FALSE
species = "Homo sapiens"
tax_group = "vertebrates"
A TFBSTools::PWMatrixList (or PFMatrixList) object containing
one PWM per transcription factor.
Each matrix stores nucleotide position weights across the TF binding motif,
with rows representing A, C, G, T and columns representing motif positions.
Read peak file and store in data.frame or GRanges object
readPeakFile(peakfile, as = "GRanges", ...)readPeakFile(peakfile, as = "GRanges", ...)
peakfile |
peak file |
as |
output format, one of GRanges or data.frame |
... |
additional parameter (pass to 'utils::read.delim()') |
peak information, in GRanges or data.frame object
G Yu
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak.gr <- readPeakFile(peakfile, as = "GRanges") peak.grpeakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak.gr <- readPeakFile(peakfile, as = "GRanges") peak.gr
Rename columns of a GRanges object
## S3 method for class 'GRanges' rename(.data, ...)## S3 method for class 'GRanges' rename(.data, ...)
.data |
A GRanges object. |
... |
Rename expressions in the form new_name = old_name. |
A GRanges object with renamed metadata columns.
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::rename(peak, tag = tags)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") peak <- readPeakFile(peakfile) dplyr::rename(peak, tag = tags)
Annotate genomic regions to genes in many-to-many mapping
seq2gene(seq, tssRegion, flankDistance, TxDb, sameStrand = FALSE)seq2gene(seq, tssRegion, flankDistance, TxDb, sameStrand = FALSE)
seq |
genomic regions in GRanges object |
tssRegion |
TSS region |
flankDistance |
flanking search radius |
TxDb |
TxDb object |
sameStrand |
logical, whether find nearest/overlap gene in the same strand |
This function associates genomic regions with coding genes in a many-to-many mapping. It first maps genomic regions to host genes (either located in exon or intron), proximal genes (located in promoter regions) and flanking genes (located in upstream and downstream within user-specified distance).
gene vector
Guangchuang Yu
data(seq2gene_result) seq2gene_resultdata(seq2gene_result) seq2gene_result
A character vector of gene IDs returned by seq2gene(), representing
genes associated with a subset of peaks.
This dataset is used to illustrate peak-to-gene mapping and regulatory
region annotation workflows in epiSeeker.
See data-raw/example_data.R
A character vector of gene IDs generated by
seq2gene() from the subset of peaks derived from demo_peak.
vector of gene names
A character vector of gene identifiers (ENTREZ IDs) representing genes linked to the example peak set via TSS proximity or flanking-gene search.
The example peak set demo_peak was constructed by sampling up to
10 peaks per autosome (chr1–chr22) from the ChIP-seq dataset
GSM6418464.
Peaks were imported using readPeakFile(), subset by chromosome,
and combined into a single GRanges object.
The gene-level associations were then computed directly using:
seq2gene_result <- seq2gene(
demo_peak,
tssRegion = c(-1000, 1000),
flankDistance = 3000,
txdb
)
The resulting character vector of gene IDs was saved via
data-raw/example_data.R.
show method for 'csAnno' instance
show(object)show(object)
object |
A 'csAnno' instance |
message
Guangchuang Yu <https://guangchuangyu.github.io>
peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") show(peakfile)peakfile <- system.file("extdata", "sample_peaks.txt", package = "epiSeeker") show(peakfile)
shuffle the position of peak
shuffle(peak.gr, TxDb)shuffle(peak.gr, TxDb)
peak.gr |
GRanges object |
TxDb |
TxDb |
GRanges object
G Yu
require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene p <- GRanges( seqnames = c("chr1", "chr3"), ranges = IRanges(start = c(1, 100), end = c(50, 130)) ) shuffle(p, TxDb = txdb)require(TxDb.Hsapiens.UCSC.hg38.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene p <- GRanges( seqnames = c("chr1", "chr3"), ranges = IRanges(start = c(1, 100), end = c(50, 130)) ) shuffle(p, TxDb = txdb)
tagMatrix result used to demonstrate TSS enrichment visualization and tag distribution plotting functions in epiSeeker. See data-raw/example_data.R
A numeric matrix with n genes × 500 bins.
matrix
The tag matrix was generated using a sample peak file obtained from
getSampleFiles()[[4]].
Peaks were imported via readPeakFile() and processed using
epiSeeker::getTagMatrix() with the following settings:
Transcript database: TxDb.Hsapiens.UCSC.hg19.knownGene
Annotation mode: type = "start_site", by = "gene"
TSS window: upstream 3000 bp, downstream 3000 bp
Peak weight: column "V5" of the peak file
Number of bins: nbin = 500
A numeric matrix in which:
Represent individual genes contributing tags around their TSS.
Represent evenly spaced bins across the TSS window from -3000 bp to +3000 bp (500 bins total).
upsetplot method generics
upsetplot(x, ...)upsetplot(x, ...)
x |
A 'csAnno' instance |
... |
additional parameter |
plot
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) upsetplot(peakAnno)data(peakAnno) upsetplot(peakAnno)
vennpie method generics
vennpie(x, r = 0.2, cex = 1.2, ...) vennpie(x, r = 0.2, cex=1.2, ...)vennpie(x, r = 0.2, cex = 1.2, ...) vennpie(x, r = 0.2, cex=1.2, ...)
x |
A 'csAnno' instance |
r |
initial radius |
cex |
value to adjust legend |
... |
additional parameter |
plot
Guangchuang Yu <https://guangchuangyu.github.io>
data(peakAnno) vennpie(peakAnno)data(peakAnno) vennpie(peakAnno)
Plot the overlap of a list of object
vennplot(Sets, ...)vennplot(Sets, ...)
Sets |
a list of object, can be vector or GRanges object. |
... |
extra parameters using ggVennDiagram. Details see [ggVennDiagram::ggVennDiagram] |
venn plot produced through this way has colors which can be defined by users using ggplot2 grammar e.g.(scale_fill_distiller()). And users can specify any details, like digital number, text size and showing percentage or not, by inputting '...' extra parameters.
venn plot that summarize the overlap of peaks from different experiments or gene annotation from different peak files.
G Yu
data(peakAnnoList) genes <- lapply(peakAnnoList, function(i) as.data.frame(i)$geneId) vennplot(genes)data(peakAnnoList) genes <- lapply(peakAnnoList, function(i) as.data.frame(i)$geneId) vennplot(genes)
Vennplot for peak files
vennplot.peakfile(files, labels = NULL)vennplot.peakfile(files, labels = NULL)
files |
peak files |
labels |
labels for peak files |
figure
G Yu
files <- list( system.file("extdata", "sample_peaks.txt", package = "epiSeeker"), system.file("extdata", "sample_peaks.txt", package = "epiSeeker") ) vennplot.peakfile(files)files <- list( system.file("extdata", "sample_peaks.txt", package = "epiSeeker"), system.file("extdata", "sample_peaks.txt", package = "epiSeeker") ) vennplot.peakfile(files)