Title: | Quantitative comparison of multiple ChIP-seq datasets |
---|---|
Description: | ChIPComp detects differentially bound sharp binding sites across multiple conditions, taking matching controls into account. |
Authors: | Hao Wu, Li Chen, Zhaohui S. Qin, Chi Wang |
Maintainer: | Li Chen <[email protected]> |
License: | GPL |
Version: | 1.37.0 |
Built: | 2024-10-30 04:39:09 UTC |
Source: | https://github.com/bioc/ChIPComp |
ChIPComp is an R package that performs differential binding analysis of ChIP-seq count data. Unlike similar packages such as DBChIP and DIME, ChIPComp takes the control samples into account when detecting differential binding sites. Extensive simulations showed that ChIPComp performs favorably compared with DBChIP and DIME, which do not use the control samples. At present, ChIPComp only supports two-group comparisons, that is, detecting differential binding sites for one transcription factor (or histone modification) between two conditions (e.g., cell lines). We plan to extend its functionality to more general experimental designs in the future.
Hao Wu <[email protected]>, Li Chen <[email protected]>
Perform hypothesis testing to detect differential binding sites
ChIPComp(countSet, A, threshold = 1)
Argument | Description |
---|---|
countSet | A ChIPComp object, such as the output of makeCountSet. |
A | User-specified regions in which to fit the model: a BED file with three columns named "chr", "start" and "end", separated by spaces or tabs. |
threshold | User-specified posterior probability threshold. Default is 1. |
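Argument A expects a three-column region file. As a minimal base-R sketch, such a file can be written and read back as follows. The file name and coordinates are made up, and the header line carrying the column names is an assumption based on the description of A.

```r
# Hypothetical regions for argument A: columns chr, start, end.
regions <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(10000L, 25000L, 5000L),
  end = c(11000L, 26500L, 6200L),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header line (assumption) and no quotes.
region_file <- file.path(tempdir(), "regions.bed")
write.table(regions, region_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# read.table's default sep = "" splits on any whitespace, so a
# space-separated file would be read the same way.
A <- read.table(region_file, header = TRUE, stringsAsFactors = FALSE)
str(A)
```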
A ChIPComp object containing the following columns:

Column | Description |
---|---|
chr, start, end | Genomic coordinates of the binding site. |
ip_c(#condition)_r(#replicate) | ChIP counts for replicate #replicate in condition #condition. |
ct_c(#condition)_r(#replicate) | Smoothed control counts for replicate #replicate in condition #condition. |
commonPeak | A value of 1 indicates a common binding site. |
prob.post | Posterior probability for each binding site. |
pvalue.wald | P-value of the Wald test for each binding site. |
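The count columns encode condition and replicate in their names. The sketch below parses names that follow this scheme into sample type, condition and replicate, which is convenient for selecting all replicates of one condition. The column vector and the helper parse_count_cols are hypothetical illustrations, not part of ChIPComp.

```r
# Hypothetical column names following the documented naming scheme:
# ip_c<condition>_r<replicate> holds ChIP counts and
# ct_c<condition>_r<replicate> holds smoothed control counts.
cols <- c("chr", "start", "end",
          "ip_c1_r1", "ip_c1_r2", "ip_c2_r1", "ip_c2_r2",
          "ct_c1_r1", "ct_c1_r2", "ct_c2_r1", "ct_c2_r2",
          "commonPeak", "prob.post", "pvalue.wald")

# Split each count column name into sample type, condition and replicate.
parse_count_cols <- function(cols) {
  pattern <- "^(ip|ct)_c([0-9]+)_r([0-9]+)$"
  hits <- grep(pattern, cols, value = TRUE)
  data.frame(
    column = hits,
    type = sub(pattern, "\\1", hits),
    condition = as.integer(sub(pattern, "\\2", hits)),
    replicate = as.integer(sub(pattern, "\\3", hits)),
    stringsAsFactors = FALSE
  )
}

info <- parse_count_cols(cols)
# ChIP count columns of condition 2, e.g. to average its replicates.
ip_cond2 <- info$column[info$type == "ip" & info$condition == 2]
```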
Hao Wu<[email protected]>, Li Chen <[email protected]>
library(ChIPComp)
data(seqData)
seqData <- ChIPComp(seqData)
Create a list with two elements: a data frame containing the two-group comparison study information, and the corresponding design matrix.
makeConf(sampleSheet)
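Argument | Description |
---|---|
sampleSheet | A CSV file describing the design of the ChIP experiments. It contains 6 columns: SampleID, condition, factor (the ChIP target), ipReads, ctReads and peaks (paths to the ChIP read, control read and peak files, respectively). |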
A list with two elements: conf, a data frame containing the two-group comparison study information, and design, the design matrix.
Hao Wu<[email protected]>, Li Chen <[email protected]>
library(ChIPComp)
confs <- makeConf(system.file("extdata", "conf.csv", package = "ChIPComp"))
conf <- confs$conf
design <- confs$design
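makeConf turns a sample sheet into the conf data frame and a design matrix. The base-R sketch below only illustrates the expected shape of the two objects for a two-condition study; it is not the makeConf implementation, and the file names in the sheet are placeholders, not real files.

```r
# Hypothetical sample sheet with the six columns used in the
# makeCountSet example (file names are placeholders).
sheet <- data.frame(
  SampleID = 1:4,
  condition = c("Helas3", "Helas3", "K562", "K562"),
  factor = rep("H3k27ac", 4),
  ipReads = c("Helas3.ip1.bed", "Helas3.ip2.bed", "K562.ip1.bed", "K562.ip2.bed"),
  ctReads = c("Helas3.ct.bed", "Helas3.ct.bed", "K562.ct.bed", "K562.ct.bed"),
  peaks = c("Helas3.peak.bed", "Helas3.peak.bed", "K562.peak.bed", "K562.peak.bed"),
  stringsAsFactors = FALSE
)
sheet_file <- file.path(tempdir(), "conf.csv")
write.csv(sheet, sheet_file, row.names = FALSE)

# Read the sheet back and build a two-group design matrix:
# an intercept column of 1s and a 0/1 indicator for the second condition.
conf <- read.csv(sheet_file, stringsAsFactors = FALSE)
conf$condition <- factor(conf$condition)
design <- as.data.frame(model.matrix(~ condition, data = conf))
design
```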
This is a utility function that creates a data frame containing the candidate binding sites obtained by merging the peaks from the two conditions, the ChIP read counts and smoothed control counts for each candidate region, and an indicator of the peaks common to both conditions.
makeCountSet(conf, design, filetype, species, peak.center = FALSE, peak.ext = 0, binsize = 50, mva.span = c(1000, 5000, 10000))
Argument | Description |
---|---|
conf | A data frame describing the ChIP experiments. It contains 6 columns: SampleID, condition, factor (the ChIP target), ipReads, ctReads and peaks (paths to the ChIP read, control read and peak files, respectively). |
design | A two-column design matrix with one row per ChIP sample from the two conditions. The first column is all 1s (the intercept of the regression model); the second column is 1 for samples in one condition and 0 for samples in the other. |
filetype | Sequence file type. Two types are supported: bed and bam. |
species | Two species are supported (hg19 or mm9); other species are supported by specifying "other". |
peak.center | This argument is coupled with peak.ext. |
peak.ext | This argument is coupled with peak.center. |
binsize | Bin size in bp used to calculate the smoothed local lambda of the Poisson distribution. Default is 50 bp. |
mva.span | Window sizes in bp, centered at the peak location in the control sample. The default, c(1000, 5000, 10000), gives 1 kb, 5 kb and 10 kb windows. |
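The binsize and mva.span arguments suggest smoothing binned control counts with moving averages over several window sizes. The following is a hypothetical sketch of that idea, not ChIPComp's code: the helper names, the edge handling and, in particular, combining the windows by their maximum (as MACS does for its local lambda) are all assumptions.

```r
# HYPOTHETICAL sketch, not ChIPComp's code: smooth control reads on one
# chromosome by binning them into `binsize` bp bins and taking a centered
# moving average for each window size in `mva.span`.

# Mean of x over the bins within `half` bins on either side of each bin;
# windows are truncated at the chromosome ends.
moving_average <- function(x, half) {
  n <- length(x)
  cs <- c(0, cumsum(x))
  lo <- pmax(1, seq_len(n) - half)
  hi <- pmin(n, seq_len(n) + half)
  (cs[hi + 1] - cs[lo]) / (hi - lo + 1)
}

smooth_control <- function(read_pos, chrom_len, binsize = 50,
                           mva.span = c(1000, 5000, 10000)) {
  nbin <- ceiling(chrom_len / binsize)
  # Control read counts per bin (read_pos are 0-based bp positions).
  bins <- tabulate(floor(read_pos / binsize) + 1, nbins = nbin)
  # One smoothed track per window size, roughly `span` bp wide
  # (2 * half + 1 bins).
  tracks <- lapply(mva.span, function(span) {
    moving_average(bins, half = round(span / binsize / 2))
  })
  # ASSUMPTION: combine the windows by their maximum, in the spirit of
  # the MACS local lambda; ChIPComp may combine them differently.
  Reduce(pmax, tracks)
}

# Three control reads on a 300 bp toy chromosome with 100/200 bp windows.
lambda <- smooth_control(c(0, 10, 120), chrom_len = 300,
                         binsize = 50, mva.span = c(100, 200))
```

Taking the maximum keeps the background estimate conservative where the control has a local pile-up; a mean over windows would be an equally simple alternative.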
A ChIPComp object containing the following columns:

Column | Description |
---|---|
chr, start, end | Genomic coordinates of the binding site. |
ip_c(#condition)_r(#replicate) | ChIP counts for replicate #replicate in condition #condition. |
ct_c(#condition)_r(#replicate) | Smoothed control counts for replicate #replicate in condition #condition. |
commonPeak | Indicates the common binding sites. |
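The candidate regions come from merging the peaks of the two conditions, with common peaks flagged. A hypothetical base-R sketch of such a merge follows; it assumes that overlapping peaks are unioned into one region and that a region is common when it contains peaks from both conditions. ChIPComp's own rule may differ, and merge_peaks is not a package function.

```r
# HYPOTHETICAL sketch, not ChIPComp's implementation: merge the peaks of
# two conditions into candidate regions and flag regions that contain
# peaks from both conditions as common (commonPeak = 1).
merge_peaks <- function(peaks1, peaks2) {
  p <- rbind(data.frame(peaks1[, c("chr", "start", "end")], cond = 1L),
             data.frame(peaks2[, c("chr", "start", "end")], cond = 2L))
  p$chr <- as.character(p$chr)
  p <- p[order(p$chr, p$start), ]

  # Assign a region id; a new region starts on a new chromosome or when a
  # peak starts at or after the running end (BED intervals are half-open).
  grp <- integer(nrow(p))
  g <- 0L
  run_end <- -Inf
  prev_chr <- ""
  for (i in seq_len(nrow(p))) {
    if (p$chr[i] != prev_chr || p$start[i] >= run_end) {
      g <- g + 1L
      run_end <- p$end[i]
      prev_chr <- p$chr[i]
    } else {
      run_end <- max(run_end, p$end[i])
    }
    grp[i] <- g
  }

  data.frame(
    chr = as.vector(tapply(p$chr, grp, `[`, 1)),
    start = as.vector(tapply(p$start, grp, min)),
    end = as.vector(tapply(p$end, grp, max)),
    commonPeak = as.integer(tapply(p$cond, grp,
                                   function(x) length(unique(x)) == 2)),
    stringsAsFactors = FALSE
  )
}

# Made-up peaks: the first chr1 peaks of both conditions overlap.
peaks1 <- data.frame(chr = c("chr1", "chr1"), start = c(100, 1000),
                     end = c(200, 1100))
peaks2 <- data.frame(chr = c("chr1", "chr2"), start = c(150, 50),
                     end = c(300, 80))
merged <- merge_peaks(peaks1, peaks2)
```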
library(ChIPComp)
conf <- data.frame(
  SampleID = 1:4,
  condition = c("Helas3", "Helas3", "K562", "K562"),
  factor = c("H3k27ac", "H3k27ac", "H3k27ac", "H3k27ac"),
  ipReads = system.file("extdata", c("Helas3.ip1.bed", "Helas3.ip2.bed", "K562.ip1.bed", "K562.ip2.bed"), package = "ChIPComp"),
  ctReads = system.file("extdata", c("Helas3.ct.bed", "Helas3.ct.bed", "K562.ct.bed", "K562.ct.bed"), package = "ChIPComp"),
  peaks = system.file("extdata", c("Helas3.peak.bed", "Helas3.peak.bed", "K562.peak.bed", "K562.peak.bed"), package = "ChIPComp")
)
conf$condition <- factor(conf$condition)
conf$factor <- factor(conf$factor)
# Encode condition as 0/1 and build the intercept + condition design matrix
design <- as.data.frame(lapply(conf[, c("condition", "factor")], as.numeric)) - 1
design <- as.data.frame(model.matrix(~condition, design))
countSet <- makeCountSet(conf, design, filetype = "bed", species = "hg19", binsize = 1000)
Plot the correlation between log ChIP counts and smoothed control counts at common binding sites.
## S3 method for class 'ChIPComp'
plot(x, ...)
Argument | Description |
---|---|
x | A ChIPComp object. |
... | Other graphical parameters passed to plot. |
Plot the correlation between the ChIP and control samples.
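The relationship drawn by the plot method can be summarized numerically. This hypothetical sketch uses plain numeric vectors standing in for one ChIP count column, the matching control column and commonPeak; the pseudocount of 1 inside log() is an assumption made here to avoid log(0), not something taken from ChIPComp.

```r
# HYPOTHETICAL sketch: correlation between log ChIP counts and log control
# counts at common binding sites. The pseudocount of 1 is an assumption.
log_count_cor <- function(ip, ct, common, do_plot = FALSE) {
  keep <- common == 1
  x <- log(ct[keep] + 1)
  y <- log(ip[keep] + 1)
  if (do_plot) {
    plot(x, y, xlab = "log control count", ylab = "log ChIP count")
  }
  cor(x, y)
}

# Made-up counts for five sites; the last one is not a common site.
ip <- c(10, 20, 40, 80, 5)
ct <- c(20, 40, 80, 160, 500)
common <- c(1, 1, 1, 1, 0)
r <- log_count_cor(ip, ct, common)
```

Restricting to common sites matters here: including the non-common site, whose control count is far out of line with its ChIP count, lowers the correlation sharply.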
Hao Wu<[email protected]>, Li Chen <[email protected]>
library(ChIPComp)
data(seqData)
plot(seqData)
Print the top differential binding sites, ranked by posterior probability in decreasing order.
## S3 method for class 'ChIPComp'
print(x, topK = 10, ...)
Argument | Description |
---|---|
x | A ChIPComp object. |
topK | Number of top differential binding sites to print. Default is 10. |
... | Other parameters passed to print. |
Print differential binding sites ranked by posterior probability
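The ranking used by the print method can be sketched on a plain data frame with a prob.post column. The data are made up and top_sites is a hypothetical helper, not part of ChIPComp.

```r
# HYPOTHETICAL helper (not part of ChIPComp): order sites by posterior
# probability, highest first, and keep the first topK rows.
top_sites <- function(sites, topK = 10) {
  ranked <- sites[order(sites$prob.post, decreasing = TRUE), ]
  head(ranked, topK)
}

# Made-up example results.
sites <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr3"),
  start = c(100, 5000, 300, 7000),
  end = c(600, 5400, 900, 7600),
  prob.post = c(0.20, 0.95, 0.60, 0.99),
  stringsAsFactors = FALSE
)
top <- top_sites(sites, topK = 2)
```

If topK exceeds the number of sites, head() simply returns all of them.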
Hao Wu<[email protected]>, Li Chen <[email protected]>
library(ChIPComp)
data(seqData)
seqData <- ChIPComp(seqData)
print(seqData)
ChIPComp object.
The object contains a sample of 50 H3K27ac binding sites common to the Helas3 and K562 cell lines, plus 5 binding sites unique to each cell line.
data(seqData)
A "ChIPComp" class object
data(seqData)