Title: | Statistical tools for the analysis of ChIP-seq data |
---|---|
Description: | Statistical tools for ChIP-seq data analysis. The package includes the statistical method described in Kaufmann et al. (2009) PLoS Biology: 7(4):e1000090. Briefly, Taking the average DNA fragment size subjected to sequencing into account, the software calculates genomic single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutation. |
Authors: | Jose M Muino |
Maintainer: | Jose M Muino <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.59.0 |
Built: | 2024-10-30 05:18:04 UTC |
Source: | https://github.com/bioc/CSAR |
Statistical tools for ChIP-seq data analysis.
The package is oriented to plant organisms, and compatible with standard file formats in the plant research field.
Package: | CSAR |
Type: | Package |
Version: | 1.0 |
Date: | 2009-11-09 |
License: | Artistic-2.0 |
LazyLoad: | yes |
Jose M Muino
Maintainer: Jose M Muino <[email protected]>
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We generate a wig file of the results to visualize tehm in a genome browser score2wig(test,file="test.wig") ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test) ##We calculate table of genes with read-enriched regions, and their location genes<-genesWithPeaks(d) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2) ##From this distribution, several cut-off values can be calculated to control the error of our test. ##Several functions in R can be used for this purpose. ##In this package we had implemented a simple method for the control of the error based on FDR" getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We generate a wig file of the results to visualize tehm in a genome browser score2wig(test,file="test.wig") ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test) ##We calculate table of genes with read-enriched regions, and their location genes<-genesWithPeaks(d) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2) ##From this distribution, several cut-off values can be calculated to control the error of our test. ##Several functions in R can be used for this purpose. ##In this package we had implemented a simple method for the control of the error based on FDR" getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
Calculate read-enrichment scores for each nucleotide position
ChIPseqScore(control, sample, backg = -1, file = NA, norm = 3 * 10^9, test = "Ratio",times=1e6,digits=2)
ChIPseqScore(control, sample, backg = -1, file = NA, norm = 3 * 10^9, test = "Ratio",times=1e6,digits=2)
control |
data.frame structure obtained by mappedReads2Nhits |
sample |
data.frame structure obtained by mappedReads2Nhits |
backg |
Due low coverage in the control, there could be regions with no hits. Any region with a hit value lower than |
file |
Name of the file where you wan to save the results (if desired) |
norm |
Integer value. Number of hits will be reported by number of hits per |
test |
Use a score based on the poisson distribution ("Poisson") or in the ratio ("Ratio") |
times |
To be memory efficient, CSAR will only upload to the RAM memory fragments of length |
digits |
Number of decimal digits used to report the score values |
Different sequencing efforts yield different number of sequenced reads, for this reason the "number of hits" at each nucleotide position is normalized by the total number of nucleotides sequenced.
Subsequently, the number of hits for the sample is normalize to have the same mean and variance than the control, for each chromosome independently or for the whole set of chromosomes (depending of the value of normEachChrInd
).
Due low coverage, there could be regions with no hits. Any region with a hit value lower than backg
in the control
will be set to the value of backg
For each nucleotide position, a read-enrichment score will be calculated with the Poisson test, or with the ratio.
A list to be used for other functions of the CSAR package
chr |
Chromosme names |
chrL |
Chromosme length (bp) |
filenames |
Name of the files where the score values are storaged |
digits |
Score values storaged on the files need to be divided by 10^ |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistical detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
Calculate relative positions of read-enrichment regions regarding gene position
distance2Genes(win, gff, t = 1, d1 = -3000, d2 = 1000)
distance2Genes(win, gff, t = 1, d1 = -3000, d2 = 1000)
win |
GRange structure obtained with the function |
gff |
Data.frame structure obtained after loading a desired gff file |
t |
Integer. Only distances of read-enriched regions with a score bigger than |
d1 |
Negative integer. Minimum relative position regarding the start of the gene to be considered |
d2 |
Positive integer. Maximum relative position regarding the end of the gene to be considered |
data.frame structure where each row represents one relative position, and each column being:
peakName |
read-enriched region name |
p1 |
relative position regarding the start of the |
p2 |
relative position regarding the end of the |
gene |
name of the gene |
le |
length (bp) of the gene |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
genesWithPeaks, CSAR-package
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test)
Provide table of genes with read-enriched regions, and their location
genesWithPeaks(distances)
genesWithPeaks(distances)
distances |
data.frame structure obtained by |
This function report for each gene, the maximum peak score in different regions near of the gene. The input of the function is the distances between genes and peaks calculated by distance2Genes
data.frame structure with each coloumn being:
name |
name of the gene |
max3kb1kb |
maximum score value for the region 3Kb upstream to 1Kb dowstream |
u3000 |
maximum score value for the region 3Kb upstream to 2Kb upstream |
u2000 |
maximum score value for the region 2Kb upstream to 1Kb upstream |
u1000 |
maximum score value for the region 1Kb upstream to 0Kb upstream |
d0 |
maximum score value for the region 0Kb upstream to 0Kb dowstream |
d1000 |
maximum score value for the region 0Kb dowstream to 1Kb dowstream |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
distance2Genes,CSAR-package
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test) ##We calculate table of genes with read-enriched regions, and their location genes<-genesWithPeaks(d)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate relative positions of read-enriched regions regarding gene position d<-distance2Genes(win=win,gff=TAIR8_genes_test) ##We calculate table of genes with read-enriched regions, and their location genes<-genesWithPeaks(d)
Obtain the read-enrichment score distribution under the null hypothesis
getPermutatedWinScores(file, nn)
getPermutatedWinScores(file, nn)
file |
Name of the file generated by permutatedWinScores |
nn |
ID for the multiple permutation process |
Numeric vector of score values under permutation
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package, permutatedWinScores
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2)
Calculate the threshold value corresponding to control FDR at a desired level
getThreshold(winscores, permutatedScores, FDR)
getThreshold(winscores, permutatedScores, FDR)
winscores |
Numeric vector with score values obtained from the |
permutatedScores |
Numeric vector with the permutated read-enrichment score values |
FDR |
Numeric value with the desired FDR control |
This is a very simple function to obtain the threshold value of our test statistic controlling FDR at a desired level. Other functions implemented in R (eg: multtest
) could be more sophisticated.
Basically, for each possible threshold value, the proportion of error type I is calculated assuming that the permutated score distribution is a optimal estimation of the score distribution under the null hypothesis.
This is, the proportion of permutated scores exceding the considered threshold value is used as an estimation of the error type I of our statisitic.
FDR is obtained as the ratio of the proportion of error type I by the proportion of significant tests.
A table with the columns being:
threshold |
The threshold value |
p-value |
The p-value obtained from the permutated score ditribution |
FDR |
The FDR control obtained using |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package,getPermutatedWinScores, sigWin
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2) ##From this distribution, several cut-off values can be calculated to control the error of our test. ##Several functions in R can be used for this purpose. ##In this package we had implemented a simple method for the control of the error based on FDR" getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) ###Next function will get all permutated score values generated by permutatedWinScores function. ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test. nulldist<-getPermutatedWinScores(file="test",nn=1:2) ##From this distribution, several cut-off values can be calculated to control the error of our test. ##Several functions in R can be used for this purpose. ##In this package we had implemented a simple method for the control of the error based on FDR" getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
This function load the output file of a read mapping software (eg:SOAP)
loadMappedReads(file, format = "SOAP", header = FALSE)
loadMappedReads(file, format = "SOAP", header = FALSE)
file |
File name to load |
format |
Format of the file. "SOAP" for the output of the soap software and "MAQ" for the maq software. Other user formats can be provided as a character vector for the |
header |
Logical value indicating if the first line of the file should be skipped (TRUE) or not (FALSE) |
data.frame structure that can be used by mappedReads2Nhits
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package
##We load the mapped reads: #sample<-loadMappedReads(file=file,format="SOAP",w=300,header=F) ##where file is the name and path of the output file of the mapping process.
##We load the mapped reads: #sample<-loadMappedReads(file=file,format="SOAP",w=300,header=F) ##where file is the name and path of the output file of the mapping process.
Calculate number of overlapped extended reads per nucleotide position
mappedReads2Nhits(input, file , chr = c("chr1", "chr2", "chr3", "chr4", "chr5"), chrL = "TAIR9", w = 300L, considerStrand = "Minimum", uniquelyMapped = TRUE, uniquePosition = FALSE)
mappedReads2Nhits(input, file , chr = c("chr1", "chr2", "chr3", "chr4", "chr5"), chrL = "TAIR9", w = 300L, considerStrand = "Minimum", uniquelyMapped = TRUE, uniquePosition = FALSE)
input |
data loaded with loadMappedReads or an AlignedRead object from the ShortRead package |
file |
Name of the file where the results will be saved. If NA the results will not be saved in a file. |
chr |
Character vector containing the chromosome names as identified on |
chrL |
Numeric vector containing the length (bp) of the chromosomes. It should be in the same order than |
w |
Integer corresponding to the desired length of the extended reads. An advised value will be the average fragment length of the DNA submitted to sequence (usually 300 bp). |
considerStrand |
Character value. "Minimum"=>Default value. Report the minimum number of hits at each nucleotide position for both strands. "Foward"=> Report the number of hits at each nucleotide position for the "foward" strands (the one denoted as "+" in "Reverse"=>Report the number of hits at each nucleotide position for the "reverse" strands (the one denoted as "-" in "Sum"=>Report the sum of number of hits at each nucleotide position for both strands. |
uniquelyMapped |
Logic value, If TRUE, only consider uniquely mapped reads. |
uniquePosition |
Logic value. If TRUE, only consider reads mapped in different positions. |
A list to be used for other functions of the CSAR package
chr |
Chromosme names |
chrL |
Chromosme length (bp) |
chrL_0 |
Number of nucleotide positions with at least one extended read |
chrL_0 |
Number of nucleotide positions with at least one extended read |
filenames |
Name of the files where the Nhits values are storaged |
c1 |
Sum of all the Nhits values for each chromosome |
c2 |
Sum of all the Nhits square values for each chromosome |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistical detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package
#For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); #We calculate the number of hits for each nucleotide posotion for the sample. We do that just for chromosome chr1, and for positions from 1 bp to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
#For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); #We calculate the number of hits for each nucleotide posotion for the sample. We do that just for chromosome chr1, and for positions from 1 bp to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
Calculate scores for permutated read-enriched regions
permutatedWinScores(nn = 1, control, sample, fileOutput, chr = c("chr1", "chr2", "chr3", "chr4", "chr5"), chrL = "TAIR9", w = 300L, considerStrand = "Minimum", uniquelyMapped = TRUE, uniquePosition = FALSE, norm = 3 * 10^9, backg = -1, t = 1, g = 100,times=1e6,digits=2,test="Ratio")
permutatedWinScores(nn = 1, control, sample, fileOutput, chr = c("chr1", "chr2", "chr3", "chr4", "chr5"), chrL = "TAIR9", w = 300L, considerStrand = "Minimum", uniquelyMapped = TRUE, uniquePosition = FALSE, norm = 3 * 10^9, backg = -1, t = 1, g = 100,times=1e6,digits=2,test="Ratio")
nn |
ID to identify each permutation |
control |
data.frame structure obtained by loading the mapped reads with the function LoadMappedReads() |
sample |
data.frame structure obtained by loading the mapped reads with the function LoadMappedReads() |
fileOutput |
Name of the file were the results will be written |
chr |
Character vector containing the chromosome names as identified on |
chrL |
Numeric vector containing the length (bp) of the chromosomes. It should be in the same order than |
w |
Integer corresponding to the desired length of the extended reads. |
considerStrand |
Character value. "Minimum"=>Default value. Report the minimum number of hits at each nucleotide position for both strands. "Foward"=> Report the number of hits at each nucleotide position for the "foward" strands (the one denoted as "+" in "Reverse"=>Report the number of hits at each nucleotide position for the "reverse" strands (the one denoted as "-" in "Sum"=>Report the sum of number of hits at each nucleotide position for both strands. |
uniquelyMapped |
Logic value, If TRUE, only consider unquely mapped reads. |
uniquePosition |
Logic value. If TRUE, only consider reads mapped in different positions. |
norm |
Integer value. Number of hits will be reported by number of hits per |
backg |
Any region with a hit value lower than |
t |
Numeric value. Read-enriched regions are calculated as genomic regions with score values bigger than |
g |
Integer value. The maximum gap allowed between regions. Regions that are less than |
times |
To be memory efficient, CSAR will only upload to the RAM memory fragments of length |
digits |
Number of decimal digits used to report the score values |
test |
Use a score based on the poisson distribution ("Poisson") or in the ratio ("Ratio") |
The parameter values should be the same than the one used in sigWin
, ChIPseqScore
, and mappedReads2Nhits
.
The label "control" and "sample" is asigned to each read to identify from which group they came. Labels are randomly permutated, and read-enriched regions for this new permuated dataset are calculated.
The file filePutput
is created with its values being the permuated score values.
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package,getPermutatedWinScores
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate two sets of read-enrichment scores through permutation permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000)) permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
Partial dataset of a Solexa DNA library obtained from a ChIP-seq experiment in Arabidopsis
Kaufmann et al. (2009) Target Genes of the MADS Transcription Factor SEPALLATA3: Integration of Developmental and Hormonal Pathways in the $Arabidopsis$ Flower. PLoS Biol 7:e1000090
data(CSAR-dataset)
data(CSAR-dataset)
Save the read-enrichment scores at each nucleotide position in a .wig file format that can be visualize by a genome browser (eg: Integrated Genome Browser)
score2wig(experiment, file, t = 2, times = 1e6,description="", name="")
score2wig(experiment, file, t = 2, times = 1e6,description="", name="")
experiment |
Output of the function |
file |
Name of the output .wig file |
t |
Only nucleotide positions with a read-enrichment score bigger than |
times |
To be memory efficient, CSAR will only upload to the RAM memory fragments of length |
description |
Character. It adds a description to the wig file. The description will be shown by the genome browser used to visualize the wig file. |
name |
Character. It adds a wig to the wig file. The name will be shown by the genome browser used to visualize the wig file. |
None. Results are printed in a file
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide position for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##Since we will not need the raw data anymore, we could delete it from the RAM memory rm(sampleSEP3_test,controlSEP3_test);gc(verbose=FALSE) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We generate a wig file of the results to visualize them in a genome browser score2wig(test,file="test.wig")
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide position for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##Since we will not need the raw data anymore, we could delete it from the RAM memory rm(sampleSEP3_test,controlSEP3_test);gc(verbose=FALSE) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We generate a wig file of the results to visualize them in a genome browser score2wig(test,file="test.wig")
Calculate regions of read-enrichment
sigWin(experiment, t = 1, g = 100)
sigWin(experiment, t = 1, g = 100)
experiment |
Output of the function |
t |
Numeric value. Read-enriched regions are calculated as genomic regions with score values bigger than |
g |
Integer value. The maximum gap allowed between regions. Regions that are less than |
An object of type'GRange' with its values being:
seqnames |
Chromosome name |
ranges |
An IRanges object indicating start and end of the read-enriched region |
posPeak |
Position of the maximum score value on the read-enriched region |
score |
Maximum score value on the read-enriched region |
Jose M Muino, [email protected]
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
CSAR-package
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test)
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009) data("CSAR-dataset"); ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000)) ##We calculate a score for each nucleotide position test<-ChIPseqScore(control=nhitsC,sample=nhitsS) ##We calculate the candidate read-enriched regions win<-sigWin(test)