Title: | Differential Expressed Windows Based on Negative Binomial Distribution |
---|---|
Description: | DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions eCLIP or iCLIP next generation sequencing data. |
Authors: | Sudeep Sahadevan [aut], Thomas Schwarzl [aut], bioinformatics team Hentze [aut, cre] |
Maintainer: | bioinformatics team Hentze <[email protected]> |
License: | LGPL (>= 3) |
Version: | 1.21.0 |
Built: | 2024-10-30 05:23:43 UTC |
Source: | https://github.com/bioc/DEWSeq |
create DESeq data object from sliding window counts, phenotype data and annotation data
DESeqDataSetFromSlidingWindows( countData, colData, annotObj, design, tidy = FALSE, ignoreRank = FALSE, start0based = TRUE )
DESeqDataSetFromSlidingWindows( countData, colData, annotObj, design, tidy = FALSE, ignoreRank = FALSE, start0based = TRUE )
countData |
|
colData |
|
annotObj |
|
design |
|
tidy |
|
ignoreRank |
|
start0based |
|
If annotObj
is a file name, the input file MUST be <TAB> separated, and supports reading in .gz files.
If annotObj
is a data.frame, colnames(annotObj)
MUST not be empty.
This function checks for the following columns after reading in the file or on data.frame:
chromosome
: chromosome name
unique_id
: unique id of the window, rownames(object)
must match this column
begin
: window start co-ordinate, see parameter start0based
end
: window end co-ordinate
strand
: strand
gene_id
: gene id
gene_name
: gene name
gene_type
: gene type annotation
gene_region
: gene region
Nr_of_region
: number of the current region
Total_nr_of_region
: total number of regions
window_number
: window number
This function creates a DESeqDataSet
using supplied countData, phenotype data
and annotation data. The chromosomal locations and annotations of the sliding windows
(parsed from annotObj
) can be accessed from the returned object using: rowRanges(object)
DESeq object
data("SLBP_K562_w50s20") slbpDat <- counts(SLBP_K562_w50s20) phenoDat <- DataFrame(conditions=as.factor(c(rep('IP',2),'SMI')), row.names = colnames(slbpDat)) phenoDat$conditions <- relevel(phenoDat$conditions,ref='SMI') annotDat <- as.data.frame(rowRanges(SLBP_K562_w50s20)) # by default chromsome column is 'seqnames' # and begin co-ordinate column is 'start' # rename these columns colnames(annotDat)[1:2] <- c('chromosome','begin') slbpDds <- DESeqDataSetFromSlidingWindows(countData = slbpDat, colData = phenoDat,annotObj = annotDat,design=~conditions)
data("SLBP_K562_w50s20") slbpDat <- counts(SLBP_K562_w50s20) phenoDat <- DataFrame(conditions=as.factor(c(rep('IP',2),'SMI')), row.names = colnames(slbpDat)) phenoDat$conditions <- relevel(phenoDat$conditions,ref='SMI') annotDat <- as.data.frame(rowRanges(SLBP_K562_w50s20)) # by default chromsome column is 'seqnames' # and begin co-ordinate column is 'start' # rename these columns colnames(annotDat)[1:2] <- c('chromosome','begin') slbpDds <- DESeqDataSetFromSlidingWindows(countData = slbpDat, colData = phenoDat,annotObj = annotDat,design=~conditions)
extract significant windows from output of
resultsDEWSeq
using the supplied padj and
log2FoldChange cut-offs and merge these significant windows
to regions and create the following columns for each
significant region:
padj_min
: min. padj value in the region
padj_mean
: average padj value in the region
padj_max
: max. padj value in the region
log2FoldChange_min
: min. log 2 fold change in the region
log2FoldChange_mean
: average log 2 fold change in the region
log2FoldChange_max
: max. log 2 fold change in the region
extractRegions( windowRes, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, start0based = TRUE )
extractRegions( windowRes, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, start0based = TRUE )
windowRes |
|
padjCol |
|
padjThresh |
|
log2FoldChangeCol |
|
log2FoldChangeThresh |
|
start0based |
|
The output data.frame from this function will have the following columns:
chromosome
: chromosome name
regionStartId
: unique_id
of the left most window,
where an enriched region begins
region_begin
: starting position of the enriched region
region_end
: ending position of the enriched region
strand
: strand info
windows_in_region
: total number of windows
that make up the enriched region
region_length
: length of the enrched region
gene_id
: gene id
gene_name
: gene name
gene_type
: gene type annotation
gene_region
: gene region
Nr_of_region
: number of the current region
Total_nr_of_region
: total number of regions
window_number
: window number
padj_min
: min. padj value in the region
padj_mean
: average padj value in the region
padj_max
: max. padj value in the region
log2FoldChange_min
: min. log 2 fold change in the region
log2FoldChange_max
: max. log 2 fold change in the region
log2FoldChange_mean
: average log 2 fold change in the region
data.frame
data("slbpWindows") # using default cut-off thresholds, # 'pSlidingWindows.adj' padj value columns slbpRegions <- extractRegions(slbpWindows, padjCol = 'pSlidingWindows.adj')
data("slbpWindows") # using default cut-off thresholds, # 'pSlidingWindows.adj' padj value columns slbpRegions <- extractRegions(slbpWindows, padjCol = 'pSlidingWindows.adj')
In addition to count data matrix, htseq-clip also creates a max count matrix.
For each window, this file contains the maximum crosslink site count (height) calculated
per nucleotide. This function uses this file to filter the count data file instead of the
default prefiltering on rowSums
. Windows failing the threshold rowSums(maxWindowCount>=countThresh)>=nSamples
will be removed from the object.
filterCounts(object, maxCountFile, countThresh, nsamples)
filterCounts(object, maxCountFile, countThresh, nsamples)
object |
|
maxCountFile |
|
countThresh |
|
nsamples |
|
DESeq object
This is a modified version of the
results
function from DESeq2 package.
This function uses chromosomal positions given in the rowRanges(dds)
to identify overlapping windows in dds
object. For each window,
the number of overlapping windows are counted, and the p-value is
adjusted for FWER using bonferroni correction.
For further details, please refer documentation for
results
function in DESeq2 package
resultsDEWSeq( object, contrast, name, listValues = c(1, -1), cooksCutoff, test, addMLE = FALSE, tidy = FALSE, parallel = FALSE, BPPARAM = bpparam(), minmu = 0.5, start0based = TRUE )
resultsDEWSeq( object, contrast, name, listValues = c(1, -1), cooksCutoff, test, addMLE = FALSE, tidy = FALSE, parallel = FALSE, BPPARAM = bpparam(), minmu = 0.5, start0based = TRUE )
object |
|
contrast |
|
name |
|
listValues |
|
cooksCutoff |
|
test |
|
addMLE |
|
tidy |
|
parallel |
|
BPPARAM |
|
minmu |
|
start0based |
|
For a detailed description of the column use mcols(output)$description
DESeqResults object
data("slbpDds") slbpDds <- estimateSizeFactors(slbpDds) slbpDds <- estimateDispersions(slbpDds) slbpDds <- nbinomWaldTest(slbpDds) slbpWindows <- resultsDEWSeq(slbpDds) ## Not run: # for a description of the columns in slbpWindows use mcols(slbpWindows)$description ## End(Not run)
data("slbpDds") slbpDds <- estimateSizeFactors(slbpDds) slbpDds <- estimateDispersions(slbpDds) slbpDds <- nbinomWaldTest(slbpDds) slbpWindows <- resultsDEWSeq(slbpDds) ## Not run: # for a description of the columns in slbpWindows use mcols(slbpWindows)$description ## End(Not run)
This is ENCODE eCLIP data which was quantified by htseq-clip in sliding-windows of max. length 50nt, the step size was 20. This is not ideal data for DEWSeq since it is lacking replicates, however was small enough for the inclusion of the package.
data(SLBP_K562_w50s20)
data(SLBP_K562_w50s20)
An object of class "DESeq"
;
data(SLBP_K562_w50s20) SLBP_K562_w50s20
data(SLBP_K562_w50s20) SLBP_K562_w50s20
This is a DESeq dataset object for ENCODE eCLIP data: SLBP in K562 cell lines
This is used as an example dataset for a runnable example. This dataset is the output
from running the example code for the function DESeqDataSetFromSlidingWindows
data(slbpDds)
data(slbpDds)
An object of class "DESeq"
;
data(slbpDds) slbpDds
data(slbpDds) slbpDds
This is a DESeq results object for ENCODE eCLIP data: SLBP in K562 cell lines
This is used as an example dataset for a runnable example. This dataset is the output
from running the example code for the function extractRegions
data(slbpRegions)
data(slbpRegions)
data.frame;
data(slbpRegions) head(slbpRegions)
data(slbpRegions) head(slbpRegions)
This is a DESeq normalized sliding window count matrix ENCODE eCLIP data: SLBP in K562 cell lines
This is used as an example dataset for a runnable example. This dataset is the output
from running the example code for the function vst
data(slbpVst)
data(slbpVst)
matrix;
data(slbpVst) head(slbpVst)
data(slbpVst) head(slbpVst)
This is a DESeq results object for ENCODE eCLIP data: SLBP in K562 cell lines
This is used as an example dataset for a runnable example. This dataset is the output
from running the example code for the function resultsDEWSeq
data(slbpWindows)
data(slbpWindows)
data.frame;
data(slbpWindows) head(slbpWindows)
data(slbpWindows) head(slbpWindows)
given output of extractRegions
, resultsDEWSeq
and significance thresholds,
extract significant windows, create regions by merging adjacent significant windows.
Finally, write the output as a BED file for visualization.
toBED( windowRes, regionRes, fileName, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, trackName = "sliding windows", description = "sliding windows" )
toBED( windowRes, regionRes, fileName, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, trackName = "sliding windows", description = "sliding windows" )
windowRes |
|
regionRes |
|
fileName |
|
padjCol |
|
padjThresh |
|
log2FoldChangeCol |
|
log2FoldChangeThresh |
|
trackName |
|
description |
|
write to file
data(slbpRegions) data(slbpWindows) outFile <- tempfile('SLBP_visualization.bed') # the results are written to a temp file in this example toBED(slbpWindows,slbpRegions,outFile,padjCol='pSlidingWindows.adj')
data(slbpRegions) data(slbpWindows) outFile <- tempfile('SLBP_visualization.bed') # the results are written to a temp file in this example toBED(slbpWindows,slbpRegions,outFile,padjCol='pSlidingWindows.adj')
given window resutls and normalized counts, combine significant overlapping windows into regions and for each region, pick two candidate winodws:
with highest log2FoldChange and
with highest normalized mean in treatment samples (see parameter treatmentCols
)
Return a data.frame with region information and stats, and for the selected windows, the following information:
unique_id
of the window
start and end co-ordinates
log2FoldChange
normalized mean expression in treatment and control samples and
individual normalized expression in replicates
topWindowStats( windowRes, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, start0based = TRUE, normalizedCounts, treatmentCols, treatmentName = "treatment", controlName = "control", op = "max" )
topWindowStats( windowRes, padjCol = "padj", padjThresh = 0.05, log2FoldChangeCol = "log2FoldChange", log2FoldChangeThresh = 1, start0based = TRUE, normalizedCounts, treatmentCols, treatmentName = "treatment", controlName = "control", op = "max" )
windowRes |
|
padjCol |
|
padjThresh |
|
log2FoldChangeCol |
|
log2FoldChangeThresh |
|
start0based |
|
normalizedCounts |
|
treatmentCols |
|
treatmentName |
|
controlName |
|
op |
|
The output data.frame of this function has the following columns:
chromosome
: chromosome name
gene_id
: gene id
gene_name
: gene name
gene_region
: gene region
gene_type
: gene type annotation
regionStartId
: unique_id
of the left most window, where a enriched region begins
region_begin
: start position of the enriched region
region_end
: end position of the enriched region
region_length
: length of the enrched region
strand
: strand info
Nr_of_region
: number of the current region
Total_nr_of_region
: total number of regions
log2FoldChange_min
: min. log 2 fold change in the region
log2FoldChange_mean
: average log 2 fold change in the region
log2FoldChange_max
: max. log 2 fold change in the region
unique_id.log2FCWindow
: unique_id of the window with largest log2FoldChange
begin.log2FCWindow
: start position of the window with largest log2FoldChange
end.log2FCWindow
: end of the window with largest log2FoldChange
log2FoldChange.log2FCWindow
: log2FoldChange of the window with largest log2FoldChange
treatmentName.mean.log2FCWindow
: mean of the normalized expression of the treatment samples for log2FCWindow, names in treatmentCols
are used to calculate mean and treatmentName is from the parameter treatmentName
controlName.mean.log2FCWindow
: mean of the normalized expression of the control samples for log2FCWindow, colnames(normalizedCounts)
not found in treatmentCols
are used to calculate mean and controlName is from the parameter controlName
the next columns will be normalized expression values of the log2FCWindow from individual treatment and control samples.
unique_id.meanWindow
: unique_id of the window with largest mean in all treatment samples from treatmentCols
begin.meanWindow
: start position of the mean window
end.meanWindow
: end position of the mean window
log2FoldChange.meanWindow
:log2FoldChange of the mean window
treatmentName.mean.meanWindow
: mean of the normalized expression of the treatment samples for meanWindow, names in treatmentCols
are used to calculate mean and treatmentName is from the parameter treatmentName
controlName.mean.meanWindow
: mean of the normalized expression of the control samples for log2FCWindow, colnames(normalizedCounts)
not found in treatmentCols
are used to calculate mean and controlName is from the parameter controlName
the next columns will be normalized expression values of the meanWindow from individual treatment and control samples
data.frame
data(slbpWindows) data(slbpVst) slbpList <- topWindowStats(slbpWindows,padjCol = 'pSlidingWindows.adj', normalizedCounts = slbpVst, treatmentCols = c('IP1','IP2'), treatmentName = 'SLBP',controlName = 'SMI')
data(slbpWindows) data(slbpVst) slbpList <- topWindowStats(slbpWindows,padjCol = 'pSlidingWindows.adj', normalizedCounts = slbpVst, treatmentCols = c('IP1','IP2'), treatmentName = 'SLBP',controlName = 'SMI')