Title: | BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes |
---|---|
Description: | The package offers functions to process multiple ChIP-seq BAM files and detect allele-specific events. It computes allele counts at individual variants (SNPs/SNVs), implements extensive QC steps to remove problematic variants, and uses a Bayesian framework to identify statistically significant allele-specific events. BaalChIP can account for copy number differences between the two alleles, a known phenotypic feature of cancer samples. |
Authors: | Ines de Santiago, Wei Liu, Ke Yuan, Martin O'Reilly, Chandra SR Chilamakuri, Bruce Ponder, Kerstin Meyer, Florian Markowetz |
Maintainer: | Ines de Santiago <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-10-30 03:35:55 UTC |
Source: | https://github.com/bioc/BaalChIP |
Produces a density plot of the distribution of allelic ratios (REF/TOTAL) before and after BaalChIP adjustment for RM and RAF biases.
adjustmentBaalPlot(.Object, col = c("green3", "gray50"))
## S4 method for signature 'BaalChIP'
adjustmentBaalPlot(.Object, col = c("green3", "gray50"))
.Object | An object of the BaalChIP class. |
col | A character vector indicating the colours for the density plot (default is c('green3', 'gray50')). |
A plot
Ines de Santiago
data('BaalObject')
adjustmentBaalPlot(BaalObject)
adjustmentBaalPlot(BaalObject, col=c('blue','pink'))
Generates allele-specific read count data from each BAM ChIP-seq dataset for each variant.
alleleCounts(.Object, min_base_quality = 10, min_mapq = 15, verbose = TRUE)
## S4 method for signature 'BaalChIP'
alleleCounts(.Object, min_base_quality = 10, min_mapq = 15, verbose = TRUE)
.Object | An object of the BaalChIP class. |
min_base_quality | A numeric value indicating the minimum read base quality below which the base is ignored when summarizing pileup information (default 10). |
min_mapq | A numeric value indicating the minimum mapping quality (MAPQ) below which the entire read is ignored (default 15). |
verbose | Logical. If TRUE, reports extra information on the process. |
Uses the information within the samples slot of a BaalChIP object. It first finds all variants overlapping peaks. Then, for each variant, it computes the number of reads carrying the reference (REF) and alternative (ALT) alleles.
An updated BaalChIP object with the slot alleleCounts containing a list of GRanges objects.
BaalChIP computes allelic counts at each variant position with the Rsamtools pileup function. The algorithm follows Rsamtools::pileup by automatically excluding reads flagged as unmapped, secondary, duplicate, or not passing quality controls.
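The counting step described above can be illustrated with a small base-R sketch. It does not call Rsamtools or BaalChIP; the pileup-style table, its column names, and the helper `count_allele` are hypothetical and only show how per-nucleotide read counts might be reduced to REF and ALT counts per variant.

```r
# Toy pileup-style table: one row per (variant, nucleotide) with a read count.
# Column names are illustrative, not the package's internal format.
pile <- data.frame(
  ID         = c("rs1", "rs1", "rs2", "rs2", "rs2"),
  nucleotide = c("A", "G", "C", "T", "G"),
  count      = c(12, 8, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Heterozygous variants with their reference and alternative alleles.
hets <- data.frame(
  ID  = c("rs1", "rs2"),
  REF = c("A", "C"),
  ALT = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum reads carrying a given allele at each variant
# (0 if that allele was never observed).
count_allele <- function(ids, alleles) {
  mapply(function(id, al) {
    sum(pile$count[pile$ID == id & pile$nucleotide == al])
  }, ids, alleles, USE.NAMES = FALSE)
}

counts <- data.frame(
  ID         = hets$ID,
  REF.counts = count_allele(hets$ID, hets$REF),
  ALT.counts = count_allele(hets$ID, hets$ALT),
  stringsAsFactors = FALSE
)
counts$Total.counts <- counts$REF.counts + counts$ALT.counts
print(counts)
```

Reads carrying a base that is neither REF nor ALT (the single "G" read at rs2 here) are simply not counted in this sketch.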
Ines de Santiago
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
#retrieve alleleCounts:
counts <- BaalChIP.get(res, 'alleleCountsPerBam')
#alleleCounts are grouped by bam_name and group_name:
names(counts)
names(counts[['MCF7']])
#check out the result for one of the bam files:
counts[['MCF7']][[1]]
This S4 class includes a series of methods for detecting allele-specific events from multiple ChIP-seq datasets.
BaalChIP(samplesheet = NULL, hets = NULL, CorrectWithgDNA = list())
samplesheet | A character string indicating the filename for a |
hets | A named vector with filenames for the |
CorrectWithgDNA | An optional named list with complete file paths for the |
.Object An object of the BaalChIP class.
Ines de Santiago, Wei Liu, Ke Yuan, Florian Markowetz
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- new("BaalChIP", samplesheet=samplesheet, hets=hets)
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
Get information from individual slots in a BaalChIP object.
BaalChIP.get(
  .Object,
  what = c("samples", "param", "alleleCountsPerBam", "mergedCounts", "assayedVar", "biasTable")
)
## S4 method for signature 'BaalChIP'
BaalChIP.get(
  .Object,
  what = c("samples", "param", "alleleCountsPerBam", "mergedCounts", "assayedVar")
)
.Object | An object of the BaalChIP class. |
what | A single character value specifying which information should be retrieved. Options: 'samples', 'param', 'alleleCountsPerBam', 'mergedCounts', 'assayedVar', 'biasTable'. |
The slot content from an object of the BaalChIP class.
Ines de Santiago
data('BaalObject')
#samples data spreadsheet and hets:
BaalChIP.get(BaalObject,what='samples')
#parameters used within run:
BaalChIP.get(BaalObject,what='param')
#retrieve a GRanges list with allele-specific read counts per BAM file:
counts <- BaalChIP.get(BaalObject,what='alleleCountsPerBam')
counts[['MCF7']][[1]]
#retrieve a data.frame with allele-specific read counts per group:
counts <- BaalChIP.get(BaalObject,what='mergedCounts')
head(counts[[1]])
Generates a data.frame per group with all variants and a label for all identified allele-specific binding (ASB) variants.
BaalChIP.report(.Object)
## S4 method for signature 'BaalChIP'
BaalChIP.report(.Object)
.Object | An object of the BaalChIP class. |
The reported data frame contains the following columns:
ID: unique identifier string per analysed variant.
CHROM: chromosome identifier from the reference genome per variant.
POS: the reference position (1-based).
REF: reference base. Each base must be one of A,C,G,T in uppercase.
ALT: alternate non-reference base. Each base must be one of A,C,G,T in uppercase.
REF.counts: pooled counts of all reads with the reference allele.
ALT.counts: pooled counts of all reads with the non-reference allele.
Total.counts: pooled counts of all reads (REF + ALT).
AR: allelic ratio calculated directly from sequencing reads (REF / TOTAL).
RMbias: numerical value indicating the value estimated and applied by BaalChIP for the reference mapping bias. A value between 0.5 and 1 denotes a bias to the reference allele, and a value between 0 and 0.5 a bias to the alternative allele.
RAF: numerical value indicating the value applied by BaalChIP for the relative allele frequency (RAF) bias correction. A value between 0.5 and 1 denotes a bias to the reference allele, and a value between 0 and 0.5 a bias to the alternative allele.
Bayes_lower: lower interval for the estimated allelic ratio (allelic ratio is given by REF / TOTAL).
Bayes_upper: upper interval for the estimated allelic ratio (allelic ratio is given by REF / TOTAL).
Corrected.AR: average estimated allelic ratio (average between Bayes_lower and Bayes_upper). A value between 0.5 and 1 denotes a bias to the reference allele, and a value between 0 and 0.5 a bias to the alternative allele.
isASB: logical value indicating whether BaalChIP classified the variant as allele-specific.
A named list, with a data.frame per group.
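As a hedged illustration of how some of the reported columns relate to each other, the sketch below builds a toy data frame with made-up values and derives AR and Corrected.AR from the definitions given above. The extra `direction` column is hypothetical and is not produced by BaalChIP.report.

```r
# Toy report with invented values; column names follow the list above.
report <- data.frame(
  ID          = c("rs1", "rs2", "rs3"),
  REF.counts  = c(30, 10, 18),
  ALT.counts  = c(10, 30, 22),
  Bayes_lower = c(0.62, 0.15, 0.38),
  Bayes_upper = c(0.86, 0.37, 0.58),
  isASB       = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Total.counts is REF + ALT; AR is REF / TOTAL.
report$Total.counts <- report$REF.counts + report$ALT.counts
report$AR <- report$REF.counts / report$Total.counts

# Corrected.AR is the average of Bayes_lower and Bayes_upper.
report$Corrected.AR <- (report$Bayes_lower + report$Bayes_upper) / 2

# Keep only allele-specific variants and label the favoured allele
# (hypothetical helper column): above 0.5 means REF, below 0.5 means ALT.
asb <- report[report$isASB, ]
asb$direction <- ifelse(asb$Corrected.AR > 0.5, "REF", "ALT")
print(asb)
```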
Ines de Santiago
data('BaalObject')
report <- BaalChIP.report(BaalObject)
#the reported list is grouped by group_name:
names(report)
#check out the report for one of the groups:
head(report[['MCF7']])
BaalChIP.run is a convenience wrapper that computes allele counts, applies quality-control filters, and identifies allele-specific binding events in one step, using the package's defaults.
BaalChIP.run(.Object, cores = 4, verbose = TRUE)
## S4 method for signature 'BaalChIP'
BaalChIP.run(.Object, cores = 4, verbose = TRUE)
.Object | An object of the BaalChIP class. |
cores | Number of cores for parallel computing (default is 4). |
verbose | Logical. If TRUE, reports extra information on the process. |
This function is a wrapper of the following functions: alleleCounts, QCfilter, mergePerGroup, filter1allele, getASB.
An object of the BaalChIP class.
Ines de Santiago
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- BaalChIP.run(res, cores=2)
#summary of the QC step
summaryQC(res)
#summary of the ASB step
summaryASB(res)
BaalObject example dataset
A BaalChIP-class object
Ines de Santiago [email protected]
A GRanges object containing blacklisted regions identified by the ENCODE and modENCODE consortia. These correspond to artifact regions that tend to show artificially high signal (excessive unstructured anomalous reads mapping). Selected from mappability track of the UCSC genome browser (hg19, wgEncodeDacMapabilityConsensusExcludable and wgEncodeDukeMapabilityRegionsExcludable tables).
Code used to retrieve these regions:
curl http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed > hg19_DACExcludable.txt
cat hg19_DUKEExcludable.txt hg19_DACExcludable.txt | grep -v "^#" | cut -f 2,3,4,5,6,7 | sort -k1,1 -k2,2n | mergeBed -nms -i stdin > hg19_DUKE_DAC.bed
Used as 'RegionsToFilter' within the QCfilter function so that variants overlapping these regions will be removed.
A GRanges object of 1378 ranges.
Note that these blacklists are applicable to functional genomic data (e.g. ChIP-seq, MNase-seq, DNase-seq, FAIRE-seq) of short reads (20-100bp reads). These are not applicable to RNA-seq or other transcriptome data types.
Ines de Santiago [email protected]
ENCODEexample example dataset
A BaalChIP-class object
Ines de Santiago [email protected]
Downloaded from supplementary data of BaalChIP paper
FAIREexample example dataset
A BaalChIP-class object
Ines de Santiago [email protected]
Downloaded from supplementary data of BaalChIP paper
Filters the data frame available within a BaalChIP object (slot mergedCounts). This filter ignores variants for which only one allele is observed after pooling ChIP-seq reads from all datasets.
filter1allele(.Object)
## S4 method for signature 'BaalChIP'
filter1allele(.Object)
.Object | An object of the BaalChIP class. |
An updated BaalChIP object with the slot mergedCounts containing a data.frame of merged samples per group with variants that pass the filter.
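The one-allele filter can be sketched in base R as follows. This is an illustration of the rule stated above, not the package's code; the column naming scheme (`s1.REF`, `s1.ALT`, ...) is an assumption made for the example.

```r
# Toy merged counts for one group: two ChIP-seq datasets (s1, s2),
# with NA where a variant is missing from a dataset.
merged <- data.frame(
  ID     = c("v1", "v2", "v3", "v4"),
  s1.REF = c(5, 4, 0, NA),
  s1.ALT = c(3, 0, 2, 1),
  s2.REF = c(9, 5, 0, 7),
  s2.ALT = c(8, 0, 4, 4),
  stringsAsFactors = FALSE
)

# Pool reads across datasets, ignoring missing values.
ref_cols <- grep("\\.REF$", names(merged), value = TRUE)
alt_cols <- grep("\\.ALT$", names(merged), value = TRUE)
pooled_ref <- rowSums(merged[ref_cols], na.rm = TRUE)
pooled_alt <- rowSums(merged[alt_cols], na.rm = TRUE)

# Keep only variants where both alleles are observed after pooling.
filtered <- merged[pooled_ref > 0 & pooled_alt > 0, ]
print(filtered$ID)
```

In this toy input, v2 has no ALT reads and v3 has no REF reads in any dataset, so both are dropped.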
Ines de Santiago
BaalChIP.get, plotQC, summaryQC
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
data('blacklist_hg19')
data('pickrell2011cov1_hg19')
data('UniqueMappability50bp_hg19')
res <- QCfilter(res,
                RegionsToFilter=list('blacklist'=blacklist_hg19, 'highcoverage'=pickrell2011cov1_hg19),
                RegionsToKeep=list('UniqueMappability'=UniqueMappability50bp_hg19))
res <- mergePerGroup(res)
res <- filter1allele(res)
#retrieve mergedCounts:
counts <- BaalChIP.get(res, 'mergedCounts')
#mergedCounts are grouped by group_name:
names(counts)
sapply(counts, dim)
#check out the result for one of the groups:
head(counts[[1]])
Filters the data frame available within a BaalChIP object (slot alleleCounts). This filter simulates reads of the same length as the original ChIP-seq reads, aligns the simulated reads to the genome, calculates the allelic ratio for each variant, and finally ignores those variants for which the allelic ratio (REF/TOTAL) differs from 0.5.
filterIntbias(
  .Object,
  simul_output = NULL,
  tmpfile_prefix = NULL,
  simulation_script = "local",
  alignmentSimulArgs = NULL,
  skipScriptRun = FALSE,
  verbose = TRUE
)
## S4 method for signature 'BaalChIP'
filterIntbias(
  .Object,
  simul_output = NULL,
  tmpfile_prefix = NULL,
  simulation_script = "local",
  alignmentSimulArgs = NULL,
  skipScriptRun = FALSE,
  verbose = TRUE
)
.Object | An object of the BaalChIP class. |
simul_output | A non-empty character vector giving the directory in which to save the FASTQ and BAM files generated by the function. If NULL, a random directory under the current working directory will be generated. |
tmpfile_prefix | An optional character vector giving the initial part of the name of the FASTQ and BAM files generated by the function. If NULL, a random name will be generated. |
simulation_script | The file path for the simulation script containing the simulation and alignment commands. If NULL, the default simulation script distributed with BaalChIP ('extra/simulation_run.sh') will be used. |
alignmentSimulArgs | A character vector with arguments to the simulation script. If NULL, no arguments are passed. |
skipScriptRun | A logical value indicating whether generation of the simulation BAM files should be skipped. If TRUE, BaalChIP will look for the BAM files in 'simul_output/tmpfile_prefix' (default is FALSE). |
verbose | Logical. If TRUE, reports extra information on the process. |
An updated BaalChIP object with the slot alleleCounts containing a list of GRanges objects that pass filters.
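The final step of this filter (discarding variants whose simulated allelic ratio is not 0.5) can be sketched in base R. The simulation and alignment themselves are not reproduced here; the simulated and observed counts below are invented for illustration, and the small numerical tolerance is an assumption of this sketch.

```r
# Hypothetical allele counts obtained from simulated reads at three variants.
simul <- data.frame(
  ID         = c("v1", "v2", "v3"),
  REF.counts = c(50, 50, 38),
  ALT.counts = c(50, 41, 50),
  stringsAsFactors = FALSE
)
simul$AR <- simul$REF.counts / (simul$REF.counts + simul$ALT.counts)

# Variants whose simulated allelic ratio departs from 0.5 show intrinsic bias.
# A tiny tolerance guards against floating-point noise.
biased_ids <- simul$ID[abs(simul$AR - 0.5) > 1e-8]

# Observed ChIP-seq counts (also hypothetical); drop the biased variants.
observed <- data.frame(
  ID         = c("v1", "v2", "v3"),
  REF.counts = c(21, 30, 4),
  ALT.counts = c(19, 12, 15),
  stringsAsFactors = FALSE
)
filtered <- observed[!observed$ID %in% biased_ids, ]
print(filtered)
```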
Ines de Santiago
BaalChIP.get, plotQC, summaryQC
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
skipScriptRun=TRUE #For demonstration purposes only (read details in vignette)
res <- filterIntbias(res,
                     simul_output=system.file('test/simuloutput',package='BaalChIP'),
                     tmpfile_prefix='c67c6ec6c433',
                     skipScriptRun=TRUE)
#check results
plotSimul(res)
summaryQC(res)
getASB identifies allele-specific binding events using a Bayesian framework.
getASB(
  .Object,
  Iter = 5000,
  conf_level = 0.95,
  cores = 4,
  RMcorrection = TRUE,
  RAFcorrection = TRUE,
  verbose = TRUE
)
## S4 method for signature 'BaalChIP'
getASB(
  .Object,
  Iter = 5000,
  conf_level = 0.95,
  cores = 4,
  RMcorrection = TRUE,
  RAFcorrection = TRUE,
  verbose = TRUE
)
.Object | An object of the BaalChIP class. |
Iter | Maximum number of iterations (default 5000). |
conf_level | Confidence level of the interval for the estimated allelic ratio (default 0.95). |
cores | Number of cores for parallel computing (default is 4). |
RMcorrection | Logical value indicating if reference mapping (RM) bias correction should be applied (default TRUE). If FALSE, will not correct for reference allele mapping bias. If TRUE, will estimate the RM bias from the overall reference allele proportion. |
RAFcorrection | Logical value indicating if relative allele frequency (RAF) bias correction should be applied (default TRUE). If TRUE will read RAF values for each variant from |
verbose | Logical. If TRUE, reports extra information on the process. |
An updated BaalChIP object with the slot ASB containing variants identified as allele-specific.
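To give a feel for interval-based calling, here is a deliberately simplified base-R sketch. It is not BaalChIP's model: it swaps in a conjugate Beta posterior with a uniform prior, applies no RM or RAF correction, and assumes (as an illustration only) that a variant is called allele-specific when its interval excludes 0.5. The counts are invented.

```r
# Invented pooled counts for three variants.
ref <- c(40, 22, 8)
alt <- c(12, 20, 30)

# Interval level, named after the getASB argument.
conf_level <- 0.95
tail_prob  <- (1 - conf_level) / 2

# Stand-in model: Beta(1, 1) prior + binomial likelihood gives a
# Beta(ref + 1, alt + 1) posterior for the allelic ratio REF / TOTAL.
lower <- qbeta(tail_prob,     ref + 1, alt + 1)
upper <- qbeta(1 - tail_prob, ref + 1, alt + 1)

# Assumed decision rule for this sketch: the interval must exclude 0.5.
is_asb <- lower > 0.5 | upper < 0.5

print(data.frame(ref, alt, lower, upper, is_asb))
```

Raising `conf_level` widens the intervals, so fewer variants are called in this toy setting.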
Wei Liu, Ke Yuan, Ines de Santiago
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
res <- mergePerGroup(res)
res <- getASB(res, cores=2)
#summary - number of significant ASB variants
summaryASB(res)
#report result (stored separately so the BaalChIP object is not overwritten)
report <- BaalChIP.report(res)
Merges all ChIP-seq datasets within a group of samples creating a data.frame that contains allele-specific read count data for all variants that need to be analysed.
mergePerGroup(.Object)
## S4 method for signature 'BaalChIP'
mergePerGroup(.Object)
.Object | An object of the BaalChIP class. |
If QCfilter has been applied, mergePerGroup uses the most up-to-date variant set available for each individual BAM file (after QC). Missing values are allowed for heterozygous variants that are not available (e.g. that do not pass a filter for a particular ChIP-seq dataset).
An updated BaalChIP object with the slot mergedCounts containing a data.frame of merged samples per group.
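The merge with missing values can be pictured with a base-R outer join. This sketch uses plain data frames rather than the GRanges objects stored in the package, and the dataset names and column prefixes are hypothetical.

```r
# Toy per-BAM allele counts for one group (after QC, variant sets differ).
per_bam <- list(
  s1 = data.frame(ID = c("v1", "v2", "v3"), REF = c(5, 4, 2), ALT = c(3, 1, 2),
                  stringsAsFactors = FALSE),
  s2 = data.frame(ID = c("v1", "v3", "v4"), REF = c(9, 0, 7), ALT = c(8, 4, 4),
                  stringsAsFactors = FALSE)
)

# Prefix the count columns with the dataset name so they stay distinguishable.
renamed <- lapply(names(per_bam), function(nm) {
  df <- per_bam[[nm]]
  names(df)[-1] <- paste(nm, names(df)[-1], sep = ".")
  df
})

# Outer join on the variant ID: variants absent from a dataset get NA.
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), renamed)
print(merged)
```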
Ines de Santiago
BaalChIP.get, plotQC, summaryQC
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
data('blacklist_hg19')
data('pickrell2011cov1_hg19')
data('UniqueMappability50bp_hg19')
res <- QCfilter(res,
                RegionsToFilter=list('blacklist'=blacklist_hg19, 'highcoverage'=pickrell2011cov1_hg19),
                RegionsToKeep=list('UniqueMappability'=UniqueMappability50bp_hg19))
res <- mergePerGroup(res)
#retrieve mergedCounts:
counts <- BaalChIP.get(res, 'mergedCounts')
#mergedCounts are grouped by group_name:
names(counts)
sapply(counts, dim)
#check out the result for one of the groups:
head(counts[[1]])
A GRanges object containing collapsed repeat regions at the 0.1% threshold (hg19 reference). Used as 'RegionsToFilter' within the QCfilter function so that variants overlapping these regions will be removed.
A GRanges object of 34359 ranges.
Ines de Santiago [email protected]
File available as supplementary data: Pickrell2011_seq.cov1.bed (http://www.ncbi.nlm.nih.gov/pubmed/21690102)
Pickrell et al., 2011 (http://www.ncbi.nlm.nih.gov/pubmed/21690102)
Produces different plots of QC results.
plotQC(.Object, what = "barplot_per_group", addlegend = TRUE, plot = TRUE)
## S4 method for signature 'BaalChIP'
plotQC(.Object, what = "barplot_per_group", addlegend = TRUE, plot = TRUE)
.Object | An object of the BaalChIP class. |
what | A single character value indicating the type of plot. Options: |
addlegend | A logical value indicating if a legend should be included in the plot (default TRUE). |
plot | A logical value indicating whether it should plot (TRUE) or not (FALSE). Default is TRUE. |
A plot
Ines de Santiago
data('BaalObject')
plotQC(BaalObject, what = 'overall_pie')
plotQC(BaalObject, what = 'boxplot_per_filter', addlegend=FALSE)
plotQC(BaalObject, what = 'barplot_per_group')
Produces a plot of the proportion of variants that displayed the correct number of mapped simulated reads.
plotSimul(.Object, plot = TRUE)
## S4 method for signature 'BaalChIP'
plotSimul(.Object, plot = TRUE)
.Object | An object of the BaalChIP class. |
plot | A logical value indicating whether it should plot (TRUE) or not (FALSE). Default is TRUE. |
A plot
Ines de Santiago
data('BaalObject')
plotSimul(BaalObject)
Quality control step for removing variants that may be problematic for identification of allele-specific events.
QCfilter(.Object, RegionsToFilter = NULL, RegionsToKeep = NULL, verbose = TRUE)
## S4 method for signature 'BaalChIP'
QCfilter(.Object, RegionsToFilter = NULL, RegionsToKeep = NULL, verbose = TRUE)
.Object | An object of the BaalChIP class. |
RegionsToFilter | A named list of GRanges objects. Variants overlapping these regions will be removed. |
RegionsToKeep | A named list of GRanges objects. Works in the opposite way to 'RegionsToFilter': variants NOT overlapping these regions will be removed. |
verbose | Logical. If TRUE, reports extra information on the process. |
An updated BaalChIP object with the slot alleleCounts containing a list of GRanges objects that pass filters.
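The complementary roles of RegionsToFilter and RegionsToKeep can be sketched without GenomicRanges, using plain data frames of intervals. The helper `overlaps_any`, the region names, and all coordinates are hypothetical; the real function operates on GRanges objects.

```r
# Toy variants and region sets (closed intervals, made-up coordinates).
variants <- data.frame(
  ID    = c("v1", "v2", "v3", "v4"),
  chrom = c("chr1", "chr1", "chr2", "chr2"),
  pos   = c(150, 900, 300, 5000),
  stringsAsFactors = FALSE
)
regions_to_filter <- list(
  blacklist = data.frame(chrom = "chr1", start = 100, end = 200,
                         stringsAsFactors = FALSE)
)
regions_to_keep <- list(
  mappable = data.frame(chrom = c("chr1", "chr2"), start = c(1, 1),
                        end = c(1000, 1000), stringsAsFactors = FALSE)
)

# Hypothetical helper: TRUE for each variant lying inside any region.
overlaps_any <- function(variants, regions) {
  vapply(seq_len(nrow(variants)), function(i) {
    any(regions$chrom == variants$chrom[i] &
        regions$start <= variants$pos[i] &
        regions$end   >= variants$pos[i])
  }, logical(1))
}

pass <- rep(TRUE, nrow(variants))
# RegionsToFilter: remove variants that overlap any listed region.
for (r in regions_to_filter) pass <- pass & !overlaps_any(variants, r)
# RegionsToKeep: remove variants that do NOT overlap the listed regions.
for (r in regions_to_keep) pass <- pass & overlaps_any(variants, r)

kept <- variants[pass, ]
print(kept$ID)
```

Here v1 falls in the blacklist and v4 lies outside the regions to keep, so only v2 and v3 remain.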
Ines de Santiago
BaalChIP.get, plotQC, summaryQC
samplesheet <- system.file("test", "exampleChIP.tsv", package = "BaalChIP")
hets <- c("MCF7" = system.file("test", "MCF7_hetSNP.txt", package = "BaalChIP"),
          "GM12891" = system.file("test", "GM12891_hetSNP.txt", package = "BaalChIP"))
res <- BaalChIP(samplesheet=samplesheet, hets=hets)
res <- alleleCounts(res, min_base_quality=10, min_mapq=15)
data('blacklist_hg19')
data('pickrell2011cov1_hg19')
data('UniqueMappability50bp_hg19')
res <- QCfilter(res,
                RegionsToFilter=list('blacklist'=blacklist_hg19, 'highcoverage'=pickrell2011cov1_hg19),
                RegionsToKeep=list('UniqueMappability'=UniqueMappability50bp_hg19))
#check results
plotQC(res, 'barplot_per_group')
summaryQC(res)
Generates summary of ASB test result.
summaryASB(.Object)
## S4 method for signature 'BaalChIP'
summaryASB(.Object)
.Object | An object of the BaalChIP class. |
A matrix containing the total number of allele-specific variants (TOTAL) and the number of variants allele-specific for the reference (REF) and alternate alleles (ALT).
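A matrix of this shape can be derived from report-style data frames, as in the base-R sketch below. The input values are invented, and deciding REF versus ALT by comparing Corrected.AR with 0.5 is an assumption of this sketch (based on the column description in BaalChIP.report), not a statement about summaryASB's internals.

```r
# Toy per-group reports with only the columns needed here.
report <- list(
  MCF7    = data.frame(isASB        = c(TRUE, TRUE, FALSE, TRUE),
                       Corrected.AR = c(0.8, 0.2, 0.5, 0.7)),
  GM12891 = data.frame(isASB        = c(FALSE, TRUE),
                       Corrected.AR = c(0.45, 0.1))
)

# One row per group: total ASB variants and how many favour each allele.
summary_mat <- t(sapply(report, function(df) {
  asb <- df[df$isASB, ]
  c(TOTAL = nrow(asb),
    REF   = sum(asb$Corrected.AR > 0.5),
    ALT   = sum(asb$Corrected.AR < 0.5))
}))
print(summary_mat)
```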
Ines de Santiago
data('BaalObject')
summaryASB(BaalObject)
Generates summary of QC result.
summaryQC(.Object)
## S4 method for signature 'BaalChIP'
summaryQC(.Object)
.Object | An object of the BaalChIP class. |
A list with two elements:
filtering_stats: the number of variants that were filtered out in each filter category and the total number that 'pass' all filters.
average_stats: the average number and average percentage of variants in each filter category, averaged across all analysed groups.
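The relationship between the two elements can be sketched in base R. The counts, the filter names, and the exact way percentages are averaged are assumptions made for illustration; the structure returned by summaryQC may differ in layout.

```r
# Toy per-group counts: variants removed by each filter and variants passing.
filtering_stats <- rbind(
  MCF7    = c(blacklist = 4, highcoverage = 6, pass = 90),
  GM12891 = c(blacklist = 2, highcoverage = 8, pass = 40)
)

# Percentage of each group's variants falling in each category.
perc <- 100 * filtering_stats / rowSums(filtering_stats)

# Average number and average percentage per category across groups.
average_stats <- data.frame(
  filter   = colnames(filtering_stats),
  variants = unname(colMeans(filtering_stats)),
  perc     = unname(colMeans(perc)),
  stringsAsFactors = FALSE
)

qc_summary <- list(filtering_stats = filtering_stats,
                   average_stats   = average_stats)
print(qc_summary$average_stats)
```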
Ines de Santiago
data('BaalObject')
summaryQC(BaalObject)
A GRanges object containing unique regions with genomic mappability score equal to 1. Selected from DUKE uniqueness mappability track of the UCSC genome browser (hg19, wgEncodeCrgMapabilityAlign50mer table).
Code used to retrieve these regions:
curl http://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/wgEncodeCrgMapabilityAlign50mer.bw > wgEncodeCrgMapabilityAlign50mer.bw
./bigWigToBedGraph wgEncodeCrgMapabilityAlign50mer.bw wgEncodeCrgMapabilityAlign50mer.bedgraph
awk '{ if ($4 >= 1) print $0 }' wgEncodeCrgMapabilityAlign50mer.bedgraph > wgEncodeCrgMapabilityAlign50mer_UNIQUEregions.bedgraph
Used as 'RegionsToKeep' within the QCfilter function so that variants NOT overlapping these regions will be removed.
A GRanges object of 9831690 ranges.
These regions are not applicable to longer reads (> 50bp).
Ines de Santiago [email protected]
Downloaded from http://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/wgEncodeCrgMapabilityAlign50mer.bw.