Title: | Transcription Factor Enrichment Analysis for ATAC-seq |
---|---|
Description: | Assay for Transpose-Accessible Chromatin using sequencing (ATAC-seq) is a technique to assess genome-wide chromatin accessibility by probing open chromatin with hyperactive mutant Tn5 Transposase that inserts sequencing adapters into open regions of the genome. ATACseqTFEA is an improvement of the current computational method that detects differential activity of transcription factors (TFs). ATACseqTFEA not only uses the difference of open region information, but also (or emphasizes) the difference of TFs footprints (cutting sites or insertion sites). ATACseqTFEA provides an easy, rigorous way to broadly assess TF activity changes between two conditions. |
Authors: | Jianhong Ou [aut, cre] |
Maintainer: | Jianhong Ou <[email protected]> |
License: | GPL-3 |
Version: | 1.9.0 |
Built: | 2024-10-30 03:35:49 UTC |
Source: | https://github.com/bioc/ATACseqTFEA |
Assay for Transpose-Accessible Chromatin using sequencing (ATAC-seq) is a technique to assess genome-wide chromatin accessibility by probing open chromatin with hyperactive mutant Tn5 Transposase that inserts sequencing adapters into open regions of the genome. ATACseqTFEA is an improvement of the current computational method that detects differential activity of transcription factors (TFs). ATACseqTFEA not only uses the difference of open region information, but also (or emphasizes) the difference of TFs footprints (cutting sites or insertion sites). ATACseqTFEA provides an easy, rigorous way to broadly assess TF activity changes between two conditions.
Maintainer: Jianhong Ou [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/jianhong/ATACseqTFEA/issues
Use open score to calculate the weights for the binding score. The open score is calculated by the counts of the proximal region divided by the counts of the distal region. And the counts RangedSummarizedExperiment will be filtered by the Z-score of the open score. The weight is calculated by converting the Z score to the range of 0-1 following the normal distribution.
calWeights(se, openscoreZcutoff = 0, ...)
calWeights(se, openscoreZcutoff = 0, ...)
se |
An RangedSummarizedExperiment object. Outputs of countsNormalization. |
openscoreZcutoff |
Open score Z value cutoff value. Default is 0. Open score is calculated by the count ratio of proximal site and distal site. |
... |
Not used. |
A RangedSummarizedExperiment object with assays of count matrix with bindingSites, proximalRegion and distalRegion as column names and bindingSites GRanges object with the weights as rowRanges.
Jianhong Ou
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## calculate the weights calWeights(se)
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## calculate the weights calWeights(se)
Prepare the counts matrix by 5'end of reads.
count5ends( bam, index = bam, yieldSize = 1e+05, positive = 4L, negative = 5L, bindingSites, bindingSitesWithGap, bindingSitesWithProximal, bindingSitesWithProximalAndGap, bindingSitesWithDistal )
count5ends( bam, index = bam, yieldSize = 1e+05, positive = 4L, negative = 5L, bindingSites, bindingSitesWithGap, bindingSitesWithProximal, bindingSitesWithProximalAndGap, bindingSitesWithDistal )
bam |
A character vector indicates the file names of the bams or an object of BamFile. |
index |
The names of the index file of the 'BAM' file being processed; This is given without the '.bai' extension. |
yieldSize |
Number of records to yield each time the file is read. See BamFile for details. |
positive , negative
|
integer(1). the size to be shift for positive/negative strand. If the bam file is 5'end shifed files, please set the parameter to 0. |
bindingSites |
A object of GenomicRanges indicates candidate binding sites. The prepareBindingSites function is a helper function to generate the binding sites. Users can also use other software for example fimo to generate the list. |
bindingSitesWithGap |
bindingSites with gaps and in both ends, |
bindingSitesWithProximal |
bindingSites with gaps and proximal region in both ends, |
bindingSitesWithProximalAndGap |
bindingSites with gaps, and then proximal and gaps in both ends, |
bindingSitesWithDistal |
bindingSites with gap, proximal, gap and distal regions. |
A RangedSummarizedExperiment object with assays of count matrix with bindingSites, proximalRegion and distalRegion as column names and bindingSites GRanges object as rowRanges.
Jianhong Ou
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) head(res)
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) head(res)
Do normalization by width for counts in binding sites, proximal and distal regions.
countsNormalization(se, proximal, distal)
countsNormalization(se, proximal, distal)
se |
An RangedSummarizedExperiment object. Outputs of count5ends or eventsFilter. |
proximal , distal
|
numeric(1) or integer(1). bases for open region from binding sites (proximal) and extended region for background (distal) of the binding region for aggregate ATAC-seq footprint. |
A RangedSummarizedExperiment object with assays of count matrix with bindingSites, proximalRegion and distalRegion as column names and bindingSites GRanges object as rowRanges.
Jianhong Ou
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region countsNormalization(se, proximal=40, distal=40)
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region countsNormalization(se, proximal=40, distal=40)
Use limma to do differential binding analysis for binding scores.
DBscore(se, design, coef, ...)
DBscore(se, design, coef, ...)
se |
An RangedSummarizedExperiment object. Outputs of getWeightedBindingScore. |
design |
Design table for lmFit. |
coef |
column number or column name specifying which coefficient or contrast of the linear model is of interest. See topTable. |
... |
Parameters can be used by lmFit. |
A RangedSummarizedExperiment object with the dataframe returned by topTable as appendence of the origin rowData.
Jianhong Ou
library(SummarizedExperiment) set.seed(1) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6) design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1)) y[1,4:6] <- y[1,4:6] + 1 se <- SummarizedExperiment(assays=list(counts=y)) DBscore(se, design, coef=1)
library(SummarizedExperiment) set.seed(1) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6) design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1)) y[1,4:6] <- y[1,4:6] + 1 se <- SummarizedExperiment(assays=list(counts=y)) DBscore(se, design, coef=1)
Transcription factor enrichment analysis for the filtered output of DBscore
doTFEA(se, ...)
doTFEA(se, ...)
se |
An RangedSummarizedExperiment object. Filtered outputs of DBscore. |
... |
Not used. |
A TFEAresults object.
Jianhong Ou
bamExp <- system.file("extdata", c("KD.shift.rep1.bam", "KD.shift.rep2.bam"), package="ATACseqTFEA") bamCtl <- system.file("extdata", c("WT.shift.rep1.bam", "WT.shift.rep2.bam"), package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(c(bamExp, bamCtl), positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## get the weighted binding scores se <- getWeightedBindingScore(se) design <- cbind(CTL=1, EXPvsCTL=c(1, 1, 0, 0)) rownames(design) <- colnames(se) counts <- DBscore(se, design=design, coef="EXPvsCTL") doTFEA(counts)
bamExp <- system.file("extdata", c("KD.shift.rep1.bam", "KD.shift.rep2.bam"), package="ATACseqTFEA") bamCtl <- system.file("extdata", c("WT.shift.rep1.bam", "WT.shift.rep2.bam"), package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(c(bamExp, bamCtl), positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## get the weighted binding scores se <- getWeightedBindingScore(se) design <- cbind(CTL=1, EXPvsCTL=c(1, 1, 0, 0)) rownames(design) <- colnames(se) counts <- DBscore(se, design=design, coef="EXPvsCTL") doTFEA(counts)
Plot GSEA style enrichment score curve.
ESvolcanoplot( TFEAresults, xlab = "Enrichment Score", ylab = "-log10(p value)", TFnameToShow = 20, significantCutoff = 0.05, col = c("red", "blue", "gray"), ... )
ESvolcanoplot( TFEAresults, xlab = "Enrichment Score", ylab = "-log10(p value)", TFnameToShow = 20, significantCutoff = 0.05, col = c("red", "blue", "gray"), ... )
TFEAresults |
A TFEAresults object. Output of TFEA. |
xlab , ylab
|
character string giving label for x-axis/y-axis. |
TFnameToShow |
Transcription factor names to be drawn. |
significantCutoff |
Cutoff value for significant. |
col |
Color sets for the points. |
... |
parameter passed to pdf. |
ggplot object.
res <- system.file("extdata", "res.rds", package="ATACseqTFEA") res <- readRDS(res) ESvolcanoplot(TFEAresults=res)
res <- system.file("extdata", "res.rds", package="ATACseqTFEA") res <- readRDS(res) ESvolcanoplot(TFEAresults=res)
A helper function to subset the counts object outputed by count5ends.
eventsFilter(se, filter)
eventsFilter(se, filter)
se |
An RangedSummarizedExperiment object. Outputs of count5ends. |
filter |
An expression which, when evaluated in the context of assays(se), is a logical vector indicating elements or rows to keep. The expression results for each assay will be combined and use 'or' operator to filter the counts assays. |
A RangedSummarizedExperiment object with assays of count matrix with bindingSites, proximalRegion and distalRegion as column names and bindingSites GRanges object as rowRanges.
Jianhong Ou
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) eventsFilter(res, proximalRegion>0) eventsFilter(res, seqnames(res)=="chr1") eventsFilter(res, sample(c(TRUE, FALSE), length(res), replace=TRUE)) eventsFilter(res, "proximalRegion>0") filter <- "proximalRegion>0" eventsFilter(res, filter) filter <- sample(c(TRUE, FALSE), length(res), replace=TRUE) eventsFilter(res, filter)
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) eventsFilter(res, proximalRegion>0) eventsFilter(res, seqnames(res)=="chr1") eventsFilter(res, sample(c(TRUE, FALSE), length(res), replace=TRUE)) eventsFilter(res, "proximalRegion>0") filter <- "proximalRegion>0" eventsFilter(res, filter) filter <- sample(c(TRUE, FALSE), length(res), replace=TRUE) eventsFilter(res, filter)
Create multiple GRanges objects for downstream counting. The GRanges objects including bindingSitesWithGap: bindingSites with gaps and in both ends, bindingSitesWithProximal: bindingSites with gaps and proximal region in both ends, bindingSitesWithProximalAndGap: bindingSites with gaps, and then proximal and gaps in both ends, and bindingSitesWithDistal: bindingSites with gaps, proximal, gaps and distal regions.
expandBindingSites(bindingSites, proximal = 40L, distal = proximal, gap = 10L)
expandBindingSites(bindingSites, proximal = 40L, distal = proximal, gap = 10L)
bindingSites |
A object of GenomicRanges indicates candidate binding sites. The prepareBindingSites function is a helper function to generate the binding sites. Users can also use other software for example fimo to generate the list. |
proximal , distal
|
numeric(1) or integer(1). bases for open region from binding sites (proximal) and extended region for background (distal) of the binding region for aggregate ATAC-seq footprint. |
gap |
numeric(1) or integer(1). bases for gaps among binding sites, proximal, and distal. default is 10L. |
an GRangesList object with elements bindingSitesWithGap, bindingSitesWithProximal, bindingSitesWithProximalAndGap, and bindingSitesWithDistal for count5ends
Jianhong Ou
bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) bs <- expandBindingSites(bindingSites) length(bs) names(bs) lengths(bs)
bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) bs <- expandBindingSites(bindingSites) length(bs) names(bs) lengths(bs)
The list of data saved in extdata folder.
The 'PWMatrixList' is a collection of jasper2018, jolma2013 and cisbp_1.02 from package motifDB (v 1.28.0) and merged by distance smaller than 1e-9 calculated by MotIV::motifDistances function (v 1.42.0). The merged motifs were exported by motifStack (v 1.30.0).
The 'cluster_PWMs' is a list of non-redundant TF motifs downloaded from [DeepSTARR](https://github.com/bernardo-de-almeida/motif-clustering). There are 6502 motifs in the data set.
The 'best_curated_Human' is a list of TF motifs downloaded from [TFEA github](https://github.com/Dowell-Lab/TFEA). There are 1279 human motifs in the data set.
motifs <- readRDS(system.file("extdata", "PWMatrixList.rds", package="ATACseqTFEA")) motifs2 <- readRDS(system.file("extdata", "cluster_PWMs.rds", package="ATACseqTFEA")) motifs3 <- readRDS(system.file("extdata", "best_curated_Human.rds", package="ATACseqTFEA"))
motifs <- readRDS(system.file("extdata", "PWMatrixList.rds", package="ATACseqTFEA")) motifs2 <- readRDS(system.file("extdata", "cluster_PWMs.rds", package="ATACseqTFEA")) motifs3 <- readRDS(system.file("extdata", "best_curated_Human.rds", package="ATACseqTFEA"))
The assessment and replacement methods for TFEAresults-class
getEnrichmentScore(x) getBindingSites(x, TF) getMotifID(x) ## S4 method for signature 'TFEAresults' show(object) ## S4 method for signature 'TFEAresults' x$name ## S4 replacement method for signature 'TFEAresults' x$name <- value ## S4 method for signature 'TFEAresults,ANY,ANY' x[[i, j, ..., exact = TRUE]] ## S4 replacement method for signature 'TFEAresults,ANY,ANY' x[[i, j, ...]] <- value ## S4 method for signature 'TFEAresults' getEnrichmentScore(x) ## S4 method for signature 'TFEAresults' getBindingSites(x, TF) ## S4 method for signature 'TFEAresults' getMotifID(x)
getEnrichmentScore(x) getBindingSites(x, TF) getMotifID(x) ## S4 method for signature 'TFEAresults' show(object) ## S4 method for signature 'TFEAresults' x$name ## S4 replacement method for signature 'TFEAresults' x$name <- value ## S4 method for signature 'TFEAresults,ANY,ANY' x[[i, j, ..., exact = TRUE]] ## S4 replacement method for signature 'TFEAresults,ANY,ANY' x[[i, j, ...]] <- value ## S4 method for signature 'TFEAresults' getEnrichmentScore(x) ## S4 method for signature 'TFEAresults' getBindingSites(x, TF) ## S4 method for signature 'TFEAresults' getMotifID(x)
x |
TFEAresults object. |
TF |
Transcription factor |
object |
an object of TFEAresults |
name |
A literal character string or a name (possibly backtick quoted). |
value |
value to replace. |
i , j
|
indices specifying elements to extract or replace. |
... |
Named or unnamed arguments to form a signature. |
exact |
see Extract |
The 'getEnrichmentScore' method will return the enrichment score matrix.
The 'getBindingSites' method will return a GRanges object indicates binding sites.
The method 'getMotifID' will return A list of positions of the binding sites for the motifs.
res <- readRDS(system.file("extdata", "res.rds", package="ATACseqTFEA")) as(res, "data.frame") res head(res$resultsTable) head(res[["resultsTable"]]) head(getEnrichmentScore(res))
res <- readRDS(system.file("extdata", "res.rds", package="ATACseqTFEA")) as(res, "data.frame") res head(res$resultsTable) head(res[["resultsTable"]]) head(getEnrichmentScore(res))
Use user predefined weight to get the weighted binding score or use open score to weight the binding score. The open score is calculated by the counts of proximal region divided by the counts of distal region. The binding score is calculated by the counts of proximal region divided by the counts of binding region. This value is the measure of avoidance of reads in the binding sites.
getWeightedBindingScore(se, weight = NA, ...)
getWeightedBindingScore(se, weight = NA, ...)
se |
An RangedSummarizedExperiment object. Outputs of countsNormalization. |
weight |
If NA, the weight will be calculated by the open score. See calWeights. User can define the weight by a matrix or numeric vector. |
... |
The parameters will be passed to calWeights. |
A RangedSummarizedExperiment object with assays of count matrix with bindingSites, proximalRegion and distalRegion as column names and bindingSites GRanges object as rowRanges.
Jianhong Ou
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## get the weighted binding scores getWeightedBindingScore(se)
bam <- system.file("extdata", "KD.shift.rep1.bam", package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) ## get the count regions bsEx <- expandBindingSites(bindingSites) ## count reads by 5'ends res <- count5ends(bam, positive=0L, negative=0L, bindingSites=bindingSites, bindingSitesWithGap=bsEx$bindingSitesWithGap, bindingSitesWithProximal=bsEx$bindingSitesWithProximal, bindingSitesWithProximalAndGap= bsEx$bindingSitesWithProximalAndGap, bindingSitesWithDistal=bsEx$bindingSitesWithDistal) ## filter 0 counts in proximal se <- eventsFilter(res, proximalRegion>0) ## normalize counts by width of count region se <- countsNormalization(se, proximal=40, distal=40) ## get the weighted binding scores getWeightedBindingScore(se)
Prepare binding sites by given fimo gff files
importFimoBindingSites( fimoGFFfiles, maximalBindingWidth = 40L, mergeBindingSitesByPercentage = 0.8, ignore.strand = TRUE, ... )
importFimoBindingSites( fimoGFFfiles, maximalBindingWidth = 40L, mergeBindingSitesByPercentage = 0.8, ignore.strand = TRUE, ... )
fimoGFFfiles |
Filenames of gff files of fimo output. |
maximalBindingWidth |
A numeric vector(length=1). Maximal binding site width. Default is 40. |
mergeBindingSitesByPercentage |
A numeric vector (length=1). The percentage of overlapping region of binding sites to merge as one binding site. |
ignore.strand |
When set to TRUE, the strand information is ignored in the calculations. |
... |
Parameter to be passed to import.gff |
A GenomicRanges
with
all the positions of matches.
Jianhong Ou
extdata <- system.file('extdata', package='ATACseqTFEA') fimoGFFfiles <- dir(extdata, 'fimo.*.gff', full.names=TRUE) mts <- importFimoBindingSites(fimoGFFfiles)
extdata <- system.file('extdata', package='ATACseqTFEA') fimoGFFfiles <- dir(extdata, 'fimo.*.gff', full.names=TRUE) mts <- importFimoBindingSites(fimoGFFfiles)
Plot GSEA style enrichment score curve.
plotES( TFEAresults, TF, outfolder = ".", xlab = "rank", ylab = "Enrichment", resolution = 500L, device = "pdf", ... )
plotES( TFEAresults, TF, outfolder = ".", xlab = "rank", ylab = "Enrichment", resolution = 500L, device = "pdf", ... )
TFEAresults |
A TFEAresults object. Output of TFEA. |
TF |
A character vector. The transcription factor names. |
outfolder |
character(1). Output file path. |
xlab , ylab
|
character string giving label for x-axis/y-axis. |
resolution |
integer(1). The number of bars plotted in the bottom of figure to show the density of occurrence of events. |
device |
Device to use. Can be one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). |
... |
parameter passed to ggsave. |
NULL if outfolder is set or ggplot object.
res <- system.file("extdata", "res.rds", package="ATACseqTFEA") res <- readRDS(res) g <- plotES(res, TF="KLF9", outfolder=NA) g
res <- system.file("extdata", "res.rds", package="ATACseqTFEA") res <- readRDS(res) g <- plotES(res, TF="KLF9", outfolder=NA) g
Prepare binding sites by given position weight matrix and genome.
prepareBindingSites( pwms, genome, seqlev = seqlevels(genome), p.cutoff = 1e-05, w = 7, grange, maximalBindingWidth = 40L, mergeBindingSitesByPercentage = 0.8, ignore.strand = TRUE )
prepareBindingSites( pwms, genome, seqlev = seqlevels(genome), p.cutoff = 1e-05, w = 7, grange, maximalBindingWidth = 40L, mergeBindingSitesByPercentage = 0.8, ignore.strand = TRUE )
pwms |
either |
genome |
|
seqlev |
A character vector. Sequence levels to be searched. |
p.cutoff |
p-value cutoff for returning motifs; default is 1e-05 |
w |
parameter controlling size of window for filtration; default is 7 |
grange |
GRanges for motif search. If it is set, function will only search the binding site within the grange. Usually a peak list should be supplied. |
maximalBindingWidth |
A numeric vector(length=1). Maximal binding site width. Default is 40. |
mergeBindingSitesByPercentage |
A numeric vector (length=1). The percentage of overlapping region of binding sites to merge as one binding site. |
ignore.strand |
When set to TRUE, the strand information is ignored in the calculations. |
A GenomicRanges
with
all the positions of matches.
Jianhong Ou
library(TFBSTools) motifs <- readRDS(system.file("extdata", "PWMatrixList.rds", package="ATACseqTFEA")) library(BSgenome.Drerio.UCSC.danRer10) seqlev <- "chr1" #paste0("chr", 1:25) mts <- prepareBindingSites(motifs, Drerio, seqlev, grange=GRanges("chr1", IRanges(5000, 100000)))
library(TFBSTools) motifs <- readRDS(system.file("extdata", "PWMatrixList.rds", package="ATACseqTFEA")) library(BSgenome.Drerio.UCSC.danRer10) seqlev <- "chr1" #paste0("chr", 1:25) mts <- prepareBindingSites(motifs, Drerio, seqlev, grange=GRanges("chr1", IRanges(5000, 100000)))
Merge the ranges by percentage of overlaps to avoid broad ranges of continues ranges overlapped with limit bases.
reduceByPercentage( query, percentage, ignore.strand = TRUE, colnToKeep = c("score", "motif") )
reduceByPercentage( query, percentage, ignore.strand = TRUE, colnToKeep = c("score", "motif") )
query |
An object of GRanges |
percentage |
A numeric vector (length=1). The percentage of overlapping region of binding sites to merge as one range. |
ignore.strand |
When set to TRUE, the strand information is ignored in the calculations. |
colnToKeep |
The metadata colnums should be kept for reduced GRanges |
An object of GRanges.
library(GenomicRanges) gr <- GRanges("chr1", IRanges(c(1, 5, 10), width=c(10, 5, 2))) reduceByPercentage(gr, 0.5, colnToKeep=NULL)
library(GenomicRanges) gr <- GRanges("chr1", IRanges(c(1, 5, 10), width=c(10, 5, 2))) reduceByPercentage(gr, 0.5, colnToKeep=NULL)
Transcription factor enrichment analysis for ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing). We treat all the binding sites for one TF as a TF set and all the open regions as features for random walking.
TFEA( bamExp, bamCtl, indexExp = bamExp, indexCtl = bamCtl, positive = 4L, negative = 5L, bindingSites, proximal = 40L, distal = proximal, gap = 10L, filter = "proximalRegion>0", openscoreZcutoff = 0, bindingScoreLog2FCcutoff = 0, bindingScorePvalCutoff = 1 )
TFEA( bamExp, bamCtl, indexExp = bamExp, indexCtl = bamCtl, positive = 4L, negative = 5L, bindingSites, proximal = 40L, distal = proximal, gap = 10L, filter = "proximalRegion>0", openscoreZcutoff = 0, bindingScoreLog2FCcutoff = 0, bindingScorePvalCutoff = 1 )
bamExp |
A vector of characters indicates the file names of experiment bams. The bam file must be the one with shifted reads. |
bamCtl |
A vector of characters indicates the file names of control bams. The bam file must be the one with shifted reads. |
indexExp , indexCtl
|
The names of the index file of the 'BAM' file being processed; This is given without the '.bai' extension. |
positive , negative
|
integer(1). the size to be shift for positive/negative strand. If the bam file is 5'end shifed files, please set the parameter to 0. |
bindingSites |
A object of GenomicRanges indicates candidate binding sites. The prepareBindingSites function is a helper function to generate the binding sites. Users can also use other software for example fimo to generate the list. |
proximal , distal
|
numeric(1) or integer(1). bases for open region from binding sites (proximal) and extended region for background (distal) of the binding region for aggregate ATAC-seq footprint. |
gap |
numeric(1) or integer(1). bases for gaps among binding sites, proximal, and distal. default is 10L. |
filter |
An expression which, when evaluated in the context of assays(se), is a logical vector indicating elements or rows to keep. The expression results for each assay will be combined and use 'or' operator to filter the counts assays. |
openscoreZcutoff |
Open score Z value cutoff value. Default is 0. Open score is calculated by the count ratio of proximal site and distal site. |
bindingScorePvalCutoff , bindingScoreLog2FCcutoff
|
Binding score cutoff values. Default is 1 and 0. Binding score is calculated by the count ratio of proximal site and binding site. The cutoff values are used to decrease the total number of binding site for ranking. Increasing the 'log2FCcutoff' value and decreasing the P-value cutoff value can greatly decrease the memory cost and computing time by decreasing the total binding sites. |
A TFEAresults object.
Jianhong Ou
bamExp <- system.file("extdata", c("KD.shift.rep1.bam", "KD.shift.rep2.bam"), package="ATACseqTFEA") bamCtl <- system.file("extdata", c("WT.shift.rep1.bam", "WT.shift.rep2.bam"), package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) res <- TFEA(bamExp, bamCtl, bindingSites=bindingSites, positive=0, negative=0) res
bamExp <- system.file("extdata", c("KD.shift.rep1.bam", "KD.shift.rep2.bam"), package="ATACseqTFEA") bamCtl <- system.file("extdata", c("WT.shift.rep1.bam", "WT.shift.rep2.bam"), package="ATACseqTFEA") bsl <- system.file("extdata", "bindingSites.rds", package="ATACseqTFEA") bindingSites <- readRDS(bsl) res <- TFEA(bamExp, bamCtl, bindingSites=bindingSites, positive=0, negative=0) res
"TFEAresults"
An object of class "TFEAresults"
represents the results of TFEA.
TFEAresults(...)
TFEAresults(...)
... |
Each argument in ... becomes an slot in the new
|
A TFEAresults object.
enrichmentScore
"numeric Matrix"
, specify the
enrichment score for each transcription factor (TF). Every row represents
a TF. The columns represents the accumulated enrichment score for that rank.
bindingSites
GenomicRanges
object. It is keep same length and order as the columns in enrichmentScore.
motifID
"list"
. The ranks of binding sites for each TF.
resultsTable
"data.frame"
. The data frame contains the
summarized enrichment score, the p-value, and adjuct p-value for each TF.
res <- readRDS(system.file("extdata", "res.rds", package="ATACseqTFEA")) res
res <- readRDS(system.file("extdata", "res.rds", package="ATACseqTFEA")) res