Title: | Chip-seq Signal Quantifier Pipeline |
---|---|
Description: | This package is designed to perform statistical analysis to identify statistically significant differentially bound regions between multiple groups of ChIP-seq datasets. |
Authors: | Ashwath Kumar [aut], Michael Y Hu [aut], Yajun Mei [aut], Yuhong Fan [aut] |
Maintainer: | Fan Lab at Georgia Institute of Technology <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.19.0 |
Built: | 2024-11-21 06:08:17 UTC |
Source: | https://github.com/bioc/CSSQ |
This function performs the Anscombe transformation on the input count data. Returns a RangedSummarizedExperiment-class object with the transformed counts as the assay.
ansTransform(countData, noNeg = TRUE, plotDataToPDF = FALSE)
countData |
A
|
noNeg |
A logical parameter indicating how to deal with negative values. When TRUE, all negative values are set to 0 before transforming. When FALSE, the transformation is applied to the absolute value and the sign is maintained. (default: TRUE) |
plotDataToPDF |
A logical parameter indicating whether to make plots of the data distribution to a separate PDF file for each sample. When TRUE, a histogram will be plotted for the data before and after transformation. When FALSE, no plots will be made. (default: FALSE) |
A RangedSummarizedExperiment-class object containing the Anscombe-transformed count data as the assay.
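The effect of noNeg can be illustrated with a small base R sketch. It assumes the classic Anscombe formula, 2 * sqrt(x + 3/8); this page does not state the exact formula the package applies, and anscombe_sketch is a hypothetical helper rather than a CSSQ function.

```r
# Hypothetical helper, not part of CSSQ. Assumes the classic Anscombe
# transform 2 * sqrt(x + 3/8).
anscombe_sketch <- function(x, noNeg = TRUE) {
  if (noNeg) {
    # Move negative values to 0 before transforming.
    x <- pmax(x, 0)
    2 * sqrt(x + 3 / 8)
  } else {
    # Transform the absolute value, then put the original sign back.
    signs <- ifelse(x < 0, -1, 1)
    signs * 2 * sqrt(abs(x) + 3 / 8)
  }
}

# Background-subtracted counts can be negative.
bgSubCounts <- c(-4, 0, 1, 10)
anscombe_sketch(bgSubCounts, noNeg = TRUE)
anscombe_sketch(bgSubCounts, noNeg = FALSE)
```

With noNeg = TRUE the value -4 is treated like 0; with noNeg = FALSE it maps to the negative of the transformed value of 4.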
library(CSSQ)
exRange <- GRanges(seqnames=c("chr1","chr2","chr3","chr4"),
    ranges=IRanges(start=c(1000,2000,3000,4000),end=c(1500,2500,3500,4500)))
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
exCount <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),nrow=4,ncol=4)
exData <- SummarizedExperiment(assays = list(countData=exCount),
    rowRanges=exRange,colData=sampleInfo)
ansExData <- ansTransform(exData)
assays(ansExData)$ansCount
This calculates the adjusted P-values for the regions using the column permutation method followed by Benjamini-Hochberg correction.
calculatePvalue(trueTstat, compare_tstats)
trueTstat |
The T-statistics value calculated using
|
compare_tstats |
The T-statistics value calculated using
|
A vector of adjusted P-values for the intended comparison.
DBAnalyze, which calls this function.
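As a rough illustration of permutation P-values followed by Benjamini-Hochberg correction, the sketch below compares each region's observed statistic with the statistics obtained from relabelled samples. The per-region, two-sided comparison and the +1 correction are assumptions made for this sketch; the package may compute its empirical P-values differently, and perm_pvalue_sketch is a hypothetical helper.

```r
# Hypothetical helper, not the CSSQ implementation.
# trueTstat:      one statistic per region, from the true labels.
# compare_tstats: regions x permutations matrix of statistics from
#                 relabelled samples.
perm_pvalue_sketch <- function(trueTstat, compare_tstats) {
  compare_tstats <- as.matrix(compare_tstats)
  # Count permuted statistics at least as extreme as the observed one.
  # The comparison recycles trueTstat down the rows, so each row is
  # compared with its own observed value.
  exceed <- rowSums(abs(compare_tstats) >= abs(trueTstat))
  # Empirical two-sided P-value with a +1 correction to avoid zeros.
  rawP <- (exceed + 1) / (ncol(compare_tstats) + 1)
  # Benjamini-Hochberg adjustment from base R.
  p.adjust(rawP, method = "BH")
}

trueT <- c(5, 0.1, -3)
permT <- matrix(c(0.5, -0.2, 1,
                  0.3, 0.4, -0.8,
                  -1, 0.2, 0.1), nrow = 3)
perm_pvalue_sketch(trueT, permT)
```

With only a handful of permutations the smallest attainable P-value is coarse, which is why the number of possible label combinations matters for small sample sizes.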
This calculates the modified T-statistics for the given comparison.
calculateTvalue(preprocessedData, label, comparison, numSamples)
preprocessedData |
A
|
label |
A vector containing the labels to use for the samples in preprocessedData. |
comparison |
A vector containing the comparison to be made. Names here need to correspond to the sample groups in the sample file (e.g. c("G1","G2") means the comparison G1/G2). |
numSamples |
Number of samples in the dataset. |
A vector of modified T-statistics for the comparison and labels given.
DBAnalyze, which calls this function.
This is a wrapper function that performs the different parts of differential binding analysis. Returns a GRanges-class object with a calculated P-value and fold change for each region.
DBAnalyze(preprocessedData, comparison)
preprocessedData |
A
|
comparison |
A vector containing the comparison to be made. Names here need to correspond to the sample groups in the sample file (e.g. c("G1","G2") means the comparison G1/G2). |
A GRanges-class object containing the regions along with their P-values and fold change for the comparison.
library(CSSQ)
exRange <- GRanges(seqnames=c("chr1","chr2","chr3","chr4"),
    ranges=IRanges(start=c(1000,2000,3000,4000),end=c(1500,2500,3500,4500)))
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
exCount <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),nrow=4,ncol=4)
exData <- SummarizedExperiment(assays = list(ansCount=exCount),
    rowRanges=exRange,colData=sampleInfo)
normExData <- normalizeData(exData,numClusters=2)
res <- DBAnalyze(normExData,comparison=c("HSMM","HESC"))
res
This function quantifies each region for a sample and performs background correction and normalization as instructed. Returns a vector of count information for the input regions.
getBgSubVal( analysisInfo, sampleIndex, normalizeReadDepth = TRUE, normalizeLength = FALSE, backgroundSubtract = TRUE, countMode = "Union", ignore.strand = TRUE, inter.feature = FALSE )
analysisInfo |
A
|
sampleIndex |
Index of the sample to process. |
normalizeReadDepth |
Logical indicating if count data should be normalized for library sequencing depth. When TRUE (default), counts will be normalized for sequencing depth for each library. When FALSE, no such normalization will be performed and raw counts will be used. (default: TRUE) |
normalizeLength |
Logical indicating if count data should be normalized to the length of the regions. When TRUE, count data will be normalized for the length of the region being analyzed. When FALSE (default), no such normalization will be performed. (default: FALSE) |
backgroundSubtract |
Logical indicating if background correction should be performed. When TRUE (default), background subtraction will be performed after length and depth normalization if applicable. When FALSE, no background subtraction will be performed. (default: TRUE) |
countMode |
Count method passed on to summarizeOverlaps from GenomicAlignments package. (default: "Union") |
ignore.strand |
A logical indicating if strand should be considered when matching. (default: TRUE) Passed on to summarizeOverlaps from GenomicAlignments package. |
inter.feature |
A logical indicating if the 'countMode' should be aware of overlapping features. When TRUE, reads mapping to multiple features are dropped (i.e., not counted). When FALSE, these reads are retained and a count is assigned to each feature they map to. Passed on to summarizeOverlaps from GenomicAlignments package. (default: FALSE) |
A vector containing the counts for all the regions.
getRegionCounts, which calls this function.
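The order of operations described by the arguments above (depth normalization, optional length normalization, then background subtraction) can be sketched in base R. The scaling to reads per million and per kilobase is an assumption made for illustration, since this page does not give the package's scaling factors; bg_subtract_sketch and its argument names are hypothetical.

```r
# Hypothetical helper, not the CSSQ implementation.
# ipCounts / inCounts: raw read counts per region for the IP and control.
# ipReads / inReads:   aligned reads in each library.
bg_subtract_sketch <- function(ipCounts, inCounts, ipReads, inReads,
                               regionWidth = NULL,
                               normalizeReadDepth = TRUE,
                               normalizeLength = FALSE,
                               backgroundSubtract = TRUE) {
  if (normalizeReadDepth) {
    # Assumed scaling: counts per million aligned reads.
    ipCounts <- ipCounts / ipReads * 1e6
    inCounts <- inCounts / inReads * 1e6
  }
  if (normalizeLength) {
    stopifnot(!is.null(regionWidth))
    # Assumed scaling: counts per kilobase of region.
    ipCounts <- ipCounts / regionWidth * 1e3
    inCounts <- inCounts / regionWidth * 1e3
  }
  # Subtract the control signal only after the normalizations above.
  if (backgroundSubtract) ipCounts - inCounts else ipCounts
}

bg_subtract_sketch(ipCounts = c(200, 50), inCounts = c(20, 60),
                   ipReads = 2e6, inReads = 1e6)
```

Regions where the control signal exceeds the IP signal come out negative, which is the case the noNeg argument of ansTransform deals with.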
library(CSSQ)
regionBed <- read.table(system.file("extdata", "chr19_regions.bed",
    package="CSSQ",mustWork = TRUE))
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
sampleInfo[,3] <- sapply(sampleInfo[,3], function(x)
    system.file("extdata", x, package="CSSQ"))
sampleInfo[,5] <- sapply(sampleInfo[,5], function(x)
    system.file("extdata", x, package="CSSQ"))
regionRange <- GRanges(seqnames=regionBed$V1,
    ranges=IRanges(start=regionBed$V2,end=regionBed$V3))
analysisInfo <- SummarizedExperiment(rowRanges=regionRange,
    colData=sampleInfo)
NormbgSubCounts <- data.frame(sapply(c(1:nrow(colData(analysisInfo))),
    function(x) getBgSubVal(analysisInfo,sampleIndex = x,
        backgroundSubtract=TRUE, normalizeReadDepth=TRUE,
        normalizeLength=FALSE,countMode="Union",
        ignore.strand=TRUE,inter.feature=FALSE)))
NormbgSubCounts
This function creates a data frame of all possible combinations of sample labels. This information is utilized by calculatePvalue to calculate the P-value using the column permutation method.
getComparisons(trueLabel, comparison, numSamples)
trueLabel |
The true labels for the samples. |
comparison |
A vector containing the comparison to be made. Names here need to correspond to the sample groups in the sample file (e.g. c("G1","G2") means the comparison G1/G2). |
numSamples |
Number of samples in the dataset. |
A data frame with the possible combinations of samples other than the true intended comparison.
DBAnalyze, which calls this function, and getNewLabels, which this function calls.
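A minimal base R sketch of enumerating alternative labellings for a two-group comparison follows. It assumes exactly two groups and keeps the group sizes fixed; whether the package also removes mirrored labellings is not stated on this page. label_combinations_sketch is a hypothetical helper, not the package code.

```r
# Hypothetical helper, not the CSSQ implementation. Assumes two groups.
label_combinations_sketch <- function(trueLabel, comparison) {
  numSamples <- length(trueLabel)
  groupSize <- sum(trueLabel == comparison[1])
  # Every way of choosing which samples carry the first group's label.
  combns <- combn(numSamples, groupSize)
  # Turn each combination of sample indices into a full label vector.
  labels <- apply(combns, 2, function(idx) {
    newLabel <- rep(comparison[2], numSamples)
    newLabel[idx] <- comparison[1]
    newLabel
  })
  # Drop the column that reproduces the true labelling.
  isTrue <- apply(labels, 2, function(l) all(l == trueLabel))
  as.data.frame(labels[, !isTrue, drop = FALSE], stringsAsFactors = FALSE)
}

trueLabel <- c("G1", "G1", "G2", "G2")
label_combinations_sketch(trueLabel, comparison = c("G1", "G2"))
```

Each column of the result is one relabelling of the four samples; a statistic recomputed under each relabelling gives the reference distribution for the permutation P-value.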
This function labels the samples according to the combinations generated by getComparisons.
getNewLabels(trueLabel, comparison, numSamples, combns, index)
trueLabel |
The true labels for the samples. |
comparison |
A vector containing the comparison to be made. Names here need to correspond to the sample groups in the sample file (e.g. c("G1","G2") means the comparison G1/G2). |
numSamples |
Number of samples in the dataset. |
combns |
Possible combinations of sample index generated in
|
index |
Index of the combination to use for labeling. |
A vector with labels.
getComparisons, which calls this function.
The input is the set of regions and the sample information. It will calculate the number of reads falling in each region for each sample. Returns a RangedSummarizedExperiment-class object with regions, sample information and counts for all samples.
getRegionCounts( regionBed, sampleInfo, sampleDir = ".", backgroundSubtract = TRUE, ... )
regionBed |
A bed file containing the list of regions that are being analyzed. |
sampleInfo |
Object from |
sampleDir |
Location of the input sample files in 'sampleInfo' file. (default: ".") |
backgroundSubtract |
Logical indicating if background correction should be performed. (default: TRUE) |
... |
Additional arguments passed on to |
A RangedSummarizedExperiment-class object containing the regions, sample information and counts for all samples.
getBgSubVal, which this function calls.
library(CSSQ)
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
countData <- getRegionCounts(system.file("extdata", "chr19_regions.bed",
    package="CSSQ"),sampleInfo,
    sampleDir = system.file("extdata", package="CSSQ"))
countData
head(assays(countData)$countData)
colData(countData)
rowRanges(countData)
This function normalizes the Anscombe-transformed data by clustering it with the k-means algorithm and using the information from the clusters. It returns a DataFrame object with the normalized counts, the cluster information and the variance of that cluster for that sample.
kmeansNormalize(ansDataVec, numClusters = 4)
ansDataVec |
Anscombe transformed count data for a sample. |
numClusters |
A number indicating the number of clusters to use for k-means clustering. (default: 4) |
DataFrame containing the normalized counts, cluster information and the variance of the cluster in the sample.
normalizeData, which iterates over this function.
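The clustering and per-cluster variance bookkeeping can be sketched with kmeans from base R's stats package. The sketch does not reproduce the package's normalization formula, which this page does not describe; cluster_variance_sketch and its output column names are hypothetical.

```r
# Hypothetical helper, not the CSSQ implementation. It only clusters the
# values and reports the variance of each value's cluster.
cluster_variance_sketch <- function(ansDataVec, numClusters = 4) {
  # nstart > 1 reruns k-means from several random starts and keeps the best.
  fit <- kmeans(ansDataVec, centers = numClusters, nstart = 10)
  # Variance of the values assigned to each cluster.
  clusterVar <- tapply(ansDataVec, fit$cluster, var)
  data.frame(
    value = ansDataVec,
    cluster = fit$cluster,
    clusterVariance = as.numeric(clusterVar[as.character(fit$cluster)])
  )
}

set.seed(1)  # k-means starts from random centers
cluster_variance_sketch(c(1, 2, 3, 101, 102, 103), numClusters = 2)
```

Cluster numbering is arbitrary between runs, so downstream code should rely on cluster membership rather than on specific cluster ids.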
library(CSSQ)
exCount <- c(1,2,3,4,5,6,7,8,9,10)
kmeansEx <- kmeansNormalize(exCount,numClusters=2)
kmeansEx
It converts an input count file and a bed file of regions into a RangedSummarizedExperiment-class object.
loadCountData(countFile, regionBed, sampleInfo)
countFile |
A path to the file containing the count data for the dataset. This should be a tab-separated file with sample names as the header. |
regionBed |
A bed file containing the list of regions that are being analyzed. |
sampleInfo |
Object from |
A RangedSummarizedExperiment-class object containing the region information, sample information and the count data.
library(CSSQ)
countData <- loadCountData(system.file("extdata", "sample_count_data.txt",
    package="CSSQ",mustWork = TRUE),
    system.file("extdata", "chr19_regions.bed", package="CSSQ"),
    read.table(system.file("extdata", "sample_info.txt", package="CSSQ",
        mustWork = TRUE), sep="\t",header=TRUE))
countData
This function iterates over kmeansNormalize to perform normalization for all samples in the dataset. It returns a RangedSummarizedExperiment-class object with the normalized counts, the cluster information and the variance of that cluster for that sample.
normalizeData(ansData, numClusters = 4)
ansData |
|
numClusters |
A number indicating the number of clusters to use for k-means clustering. (default: 4) |
A RangedSummarizedExperiment-class object containing the normalized counts, cluster information and the variance of the cluster in the sample.
kmeansNormalize, which this function calls.
library(CSSQ)
exRange <- GRanges(seqnames=c("chr1","chr2","chr3","chr4"),
    ranges=IRanges(start=c(1000,2000,3000,4000),end=c(1500,2500,3500,4500)))
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
exCount <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),nrow=4,ncol=4)
exData <- SummarizedExperiment(assays = list(ansCount=exCount),
    rowRanges=exRange,colData=sampleInfo)
normExData <- normalizeData(exData,numClusters=2)
assays(normExData)$normCount
This function plots histograms of the data distribution before and after Anscombe transformation.
plotDist(countData, ansCount, sampleName, plotDataToPDF = FALSE)
countData |
A
|
ansCount |
A
|
sampleName |
Name of the sample being plotted. |
plotDataToPDF |
A logical parameter indicating whether to make plots of the data distribution to a separate PDF file for each sample. When TRUE, a histogram will be plotted for the data before and after transformation. When FALSE, no plots will be made. (default: FALSE) |
A list of histograms of the count data before and after Anscombe transformation if plotDataToPDF == FALSE. Nothing is returned if plotDataToPDF == TRUE.
library(CSSQ)
exRange <- GRanges(seqnames=c("chr1","chr2","chr3","chr4"),
    ranges=IRanges(start=c(1000,2000,3000,4000),end=c(1500,2500,3500,4500)))
sampleInfo <- read.table(system.file("extdata", "sample_info.txt",
    package="CSSQ",mustWork = TRUE),sep="\t",header=TRUE)
exCount <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),nrow=4,ncol=4)
exData <- SummarizedExperiment(assays = list(countData=exCount),
    rowRanges=exRange,colData=sampleInfo)
ansExData <- ansTransform(exData)
plotEx <- plotDist(exData,ansExData,"HESC_R1")
plotEx[[1]]
This is a wrapper function that calls the functions to preprocess the data. It results in a RangedSummarizedExperiment-class object with normalized counts and metadata that can be used by DBAnalyze.
preprocessData( inputRegions, sampleInfoFile, sampleDir = ".", inputCountData, numClusters = 4, noNeg = TRUE, plotDataToPDF = FALSE, ... )
inputRegions |
A bed file with the regions to analyze. |
sampleInfoFile |
A tab-separated file with all sample information. The file contains the following columns. * Sample Name : Names for the samples. * Group : The group the sample belongs to. * IP : The name of the sample bam file. * IP_aligned_reads : The number of aligned reads in the sample. This is used in the depth normalization process. * IN : The name of the sample's control bam file. * IN_aligned_reads : The number of aligned reads in the control file. This is used in the depth normalization process. |
sampleDir |
Location of the input sample files listed in the 'sampleInfoFile' file. (default: ".") |
inputCountData |
The path to the file with count data. This parameter is used when directly loading count data from a file. This should be a tab-separated file with sample names as the header. |
numClusters |
A numerical parameter indicating the number of clusters
to use in the normalization step. Passed on to |
noNeg |
A logical parameter indicating how to deal with negative
values. It is passed to |
plotDataToPDF |
A logical parameter indicating whether to make plots of the
data distribution to a separate PDF file for each sample.
It is passed on to |
... |
Additional arguments passed on to |
A RangedSummarizedExperiment-class object containing the normalized counts, cluster information, the variance of the cluster in the sample and metadata.
getRegionCounts, ansTransform and normalizeData, which this function calls.
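A sample information file with the six columns listed for sampleInfoFile could be written as in the base R sketch below. All sample names, bam file names and read counts are made up for illustration, and the exact header strings the package expects are an assumption; the getBgSubVal example on this page refers to the bam file columns by position (3 and 5), so column order is likely what matters.

```r
# All values below are hypothetical placeholders.
sampleInfo <- data.frame(
  "Sample Name" = c("G1_R1", "G1_R2", "G2_R1", "G2_R2"),
  Group = c("G1", "G1", "G2", "G2"),
  IP = c("G1_R1.bam", "G1_R2.bam", "G2_R1.bam", "G2_R2.bam"),
  IP_aligned_reads = c(21000000, 19500000, 20300000, 22100000),
  IN = c("G1_input.bam", "G1_input.bam", "G2_input.bam", "G2_input.bam"),
  IN_aligned_reads = c(18000000, 18000000, 17500000, 17500000),
  check.names = FALSE  # keep the space in "Sample Name"
)

# Write a tab-separated file with a header row and no row names.
sampleInfoFile <- tempfile(fileext = ".txt")
write.table(sampleInfo, sampleInfoFile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back the same way the examples on this page read sample_info.txt.
readBack <- read.table(sampleInfoFile, sep = "\t", header = TRUE)
readBack
```

The Group values are the names that the comparison argument of DBAnalyze has to match.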
library(CSSQ)
processedData <- preprocessData(system.file("extdata", "chr19_regions.bed",
    package="CSSQ"),system.file("extdata", "sample_info.txt", package="CSSQ"),
    sampleDir = system.file("extdata", package="CSSQ"),
    numClusters=4,noNeg=TRUE,plotDataToPDF=FALSE)
processedData