Title: | Quantitative DNA Sequencing for Chromosomal Aberrations |
---|---|
Description: | Quantitative DNA sequencing for chromosomal aberrations. The genome is divided into non-overlapping fixed-sized bins, number of sequence reads in each counted, adjusted with a simultaneous two-dimensional loess correction for sequence mappability and GC content, and filtered to remove spurious regions in the genome. Downstream steps of segmentation and calling are also implemented via packages DNAcopy and CGHcall, respectively. |
Authors: | Ilari Scheinin [aut], Daoud Sie [aut, cre], Henrik Bengtsson [aut], Erik van Dijk [ctb] |
Maintainer: | Daoud Sie <[email protected]> |
License: | GPL |
Version: | 1.43.0 |
Built: | 2024-11-27 04:58:04 UTC |
Source: | https://github.com/bioc/QDNAseq |
Quantitative DNA sequencing for chromosomal aberrations. The genome is divided into non-overlapping fixed-sized bins, number of sequence reads in each counted, adjusted with a simultaneous two-dimensional loess correction for sequence mappability and GC content, and filtered to remove spurious regions in the genome. Downstream steps of segmentation and calling are also implemented via packages DNAcopy and CGHcall, respectively.
A package to detect chromosomal aberrations from whole-genome sequencing
data. QDNAseqReadCounts
and QDNAseqCopyNumbers
classes are
used as the main data structures.
Whenever using this package, please cite: Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, van Essen HF, Eijk PP, Rustenburg F, Meijer GA, Reijneveld JC, Wesseling P, Pinkel D, Albertson DG, Ylstra B (2014). "DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly." _Genome Research_, *24*, 2022-2032.
This package is licensed under GPL.
Ilari Scheinin
Adds phenotype data from a file to a QDNAseqReadCounts or a QDNAseqCopyNumbers object.
addPhenodata(object, phenofile)
addPhenodata(object, phenofile)
object |
A |
phenofile |
A file name with phenotypic data for samples in
|
Returns a QDNAseqReadCounts
or QDNAseqCopyNumbers
object
with phenotype data added.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 ## Not run: readCounts <- addPhenodata(readCounts, "phenodata.txt") ## End(Not run)
data(LGG150) readCounts <- LGG150 ## Not run: readCounts <- addPhenodata(readCounts, "phenodata.txt") ## End(Not run)
Adjusts the filtering on which bins are used.
applyFilters(object, residual=TRUE, blacklist=TRUE, mappability=NA, bases=NA, chromosomes=c("X", "Y"), verbose=getOption("QDNAseq::verbose", TRUE))
applyFilters(object, residual=TRUE, blacklist=TRUE, mappability=NA, bases=NA, chromosomes=c("X", "Y"), verbose=getOption("QDNAseq::verbose", TRUE))
object |
A |
residual |
Either a |
blacklist |
Either a |
mappability |
A |
bases |
A |
chromosomes |
A |
verbose |
If |
Returns a QDNAseqReadCounts
object with updated filtering.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts)
Calculate binned read counts from a set of BAM files.
binReadCounts(bins, bamfiles=NULL, path=NULL, ext="bam", bamnames=NULL, phenofile=NULL, chunkSize=NULL, cache=getOption("QDNAseq::cache", FALSE), force=!cache, isPaired=NA, isProperPair=NA, isUnmappedQuery=FALSE, hasUnmappedMate=NA, isMinusStrand=NA, isMateMinusStrand=NA, isFirstMateRead=NA, isSecondMateRead=NA, isSecondaryAlignment=NA, isNotPassingQualityControls=FALSE, isDuplicate=FALSE, minMapq=37, pairedEnds=NULL, verbose=getOption("QDNAseq::verbose", TRUE))
binReadCounts(bins, bamfiles=NULL, path=NULL, ext="bam", bamnames=NULL, phenofile=NULL, chunkSize=NULL, cache=getOption("QDNAseq::cache", FALSE), force=!cache, isPaired=NA, isProperPair=NA, isUnmappedQuery=FALSE, hasUnmappedMate=NA, isMinusStrand=NA, isMateMinusStrand=NA, isFirstMateRead=NA, isSecondMateRead=NA, isSecondaryAlignment=NA, isNotPassingQualityControls=FALSE, isDuplicate=FALSE, minMapq=37, pairedEnds=NULL, verbose=getOption("QDNAseq::verbose", TRUE))
bins |
A data.frame or an |
bamfiles |
A character vector of (BAM) file names. If NULL (default), all files with extension ext, are read from directory path. |
path |
If bamfiles is NULL, directory path to read input files from. Defaults to the current working directory. |
ext |
File name extension of input files to read, default is "bam". |
bamnames |
An optional character vector of sample names. Defaults to file names with extension ext removed. |
phenofile |
An optional character(1) specifying a file name for phenotype data. |
chunkSize |
An optional integer specifying the chunk size (nt) by which to process the bam file. |
cache |
Whether to read and write intermediate cache files, which speeds up subsequent analyses of the same files. Requires packages R.cache and digest (both available on CRAN) to be installed. Defaults to getOption("QDNAseq::cache", FALSE). |
force |
When using the cache, whether to force reading input data from the BAM files even when an intermediate cache file is present. |
isPaired |
A logical(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA, default) read should be returned. |
isProperPair |
A logical(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA, default) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance. |
isUnmappedQuery |
A logical(1) indicating whether unmapped (TRUE), mapped (FALSE, default), or any (NA) read should be returned. |
hasUnmappedMate |
A logical(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA, default) mate should be returned. |
isMinusStrand |
A logical(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned. |
isMateMinusStrand |
A logical(1) indicating whether mate reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned. |
isFirstMateRead |
A logical(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default). |
isSecondMateRead |
A logical(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default). |
isSecondaryAlignment |
A logical(1) indicating whether alignments that are primary (FALSE), are not primary (TRUE) or whose primary status does not matter (NA, default) should be returned. A non-primary alignment ("secondary alignment" in the SAM specification) might result when a read aligns to multiple locations. One alignment is designated as primary and has this flag set to FALSE; the remainder, for which this flag is TRUE, are designated by the aligner as secondary. |
isNotPassingQualityControls |
A logical(1) indicating whether reads passing quality controls (FALSE, default), reads not passing quality controls (TRUE), or any (NA) read should be returned. |
isDuplicate |
A logical(1) indicating that un-duplicated (FALSE, default), duplicated (TRUE), or any (NA) reads should be returned. 'Duplicated' reads may represent PCR or optical duplicates. |
minMapq |
If quality scores exists, the minimum quality score required in order to keep a read, otherwise all reads are kept. |
pairedEnds |
A boolean value or vector specifying whether the BAM files contain paired-end data or not. Only affects the calculation of the expected variance. |
verbose |
If |
Returns a QDNAseqReadCounts
object with assay data element
counts
containing the binned read counts as non-negative integer
s.
Ilari Scheinin, Daoud Sie
## Not run: # read all files from the current directory with names ending in .bam bins <- getBinAnnotations(15) readCounts <- binReadCounts(bins) ## End(Not run)
## Not run: # read all files from the current directory with names ending in .bam bins <- getBinAnnotations(15) readCounts <- binReadCounts(bins) ## End(Not run)
Call aberrations from segmented copy number data.
callBins(object, organism=c("human", "other"), method=c("CGHcall", "cutoff"), cutoffs=log2(c(deletion = 0.5, loss = 1.5, gain = 2.5, amplification = 10)/2), ...)
callBins(object, organism=c("human", "other"), method=c("CGHcall", "cutoff"), cutoffs=log2(c(deletion = 0.5, loss = 1.5, gain = 2.5, amplification = 10)/2), ...)
object |
An object of class QDNAseqCopyNumbers |
organism |
Either “human” or “other”, see manual page
for |
method |
Calling method to use. Options currently implemented are: “CGHcall” or “cutoff”. |
cutoffs |
When method=“cutoff”, a numeric vector of (log2-transformed) thresholds to use for calling. At least one positive and one negative value must be provided. The smallest positive value is used as the cutoff for calling gains, and the negative value closest to zero is used as the cutoff for losses. If a second positive value is provided, it is used as the cutoff for amplifications. And if a second negative value is provided, it is used as the cutoff for homozygous deletions. |
... |
Additional arguments passed to |
By default, chromosomal aberrations are called with CGHcall. It has been developed for the analysis of series of cancer samples, and uses a model that contains both gains and losses. If used on a single sample, or especially only on a subset of chromosomes, or especially on a single non-cancer sample, it may fail, but method “cutoff” can be used instead.
When using method “cutoff”, the default values assume a uniform cell population and correspond to thresholds of (assuming a diploid genome) 0.5, 1.5, 2.5, and 10 copies to distinguish between homozygous deletions, (hemizygous) losses, normal copy number, gains, and amplifications, respectively. When using with cancer samples, these values might require adjustments to account for tumor cell percentage.
Returns an object of class QDNAseqCopyNumbers
with calling
results added.
Ilari Scheinin
Internally, CGHcall
and ExpandCGHcall
of
the CGHcall package are used when method=“CGHcall”.
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented) copyNumbersCalled <- callBins(copyNumbersSegmented)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented) copyNumbersCalled <- callBins(copyNumbersSegmented)
Divide binned read counts with those of reference samples.
compareToReference(object, references, force=FALSE)
compareToReference(object, references, force=FALSE)
object |
An object of class |
references |
A numeric vector of indexes of the reference sample.
Must be the same length as there are samples in object. When |
force |
Whether to force the operation even when downstream data will be lost. |
Returns a QDNAseqCopyNumbers
object with the desired samples
divided by the signal of their reference samples.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) # Note: the following command will compare the sample to itself, which # does not really make sense: tumorVsNormal <- compareToReference(copyNumbersSmooth, c(1))
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) # Note: the following command will compare the sample to itself, which # does not really make sense: tumorVsNormal <- compareToReference(copyNumbersSmooth, c(1))
Correct binned read counts for GC content and mappability.
correctBins(object, fit=NULL, method="ratio", adjustIncompletes=TRUE, ...)
correctBins(object, fit=NULL, method="ratio", adjustIncompletes=TRUE, ...)
object |
An |
fit |
An optional matrix of values to use for the correction. If
NULL (default), assay data |
method |
A character(1) string specifying the correction method.
|
adjustIncompletes |
A boolean(1) specifying whether |
... |
Additional arguments passed to |
Returns a QDNAseqCopyNumbers
object with assay data element
copynumber
.
Ilari Scheinin
Internally, loess
is used to fit the regression model.
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered)
Builds bin annotation data for a particular bin size.
createBins(bsgenome, binSize, ignoreMitochondria=TRUE, excludeSeqnames=NULL, verbose=getOption("QDNAseq::verbose", TRUE))
createBins(bsgenome, binSize, ignoreMitochondria=TRUE, excludeSeqnames=NULL, verbose=getOption("QDNAseq::verbose", TRUE))
bsgenome |
A BSgenome package. |
binSize |
A |
ignoreMitochondria |
Whether to ignore the mitochondria, defined as chromosomes named 'chrM', 'chrMT', 'M', or 'MT'. |
excludeSeqnames |
Character vector of seqnames which should be ignored. |
verbose |
If |
Returns a data.frame
with columns chromosome, start, end, bases,
and gc
, which correspond to the chromosome name, positions of the first
and last base pair in the bin, the percentage of characterized nucleotides
(A, C, G, or T, i.e. non-N), and GC content (percentage of C and G
nucleotides of non-N nucleotides).
The future is used parallelize the following functions:
createBins()
- parallelizes binned GC content across chromosomes
calculateBlacklist()
- parallelizes overlap counts across bins)
Ilari Scheinin
## Not run: # NOTE: These take a very long time to run. library(BSgenome.Hsapiens.UCSC.hg19) bins <- createBins(BSgenome.Hsapiens.UCSC.hg19, 15) bins$mappability <- calculateMappability(bins, bigWigFile='/path/to/wgEncodeCrgMapabilityAlign50mer.bigWig', bigWigAverageOverBed='/path/to/bigWigAverageOverBed') bins$blacklist <- calculateBlacklist(bins, bedFiles=c('/path/to/wgEncodeDacMapabilityConsensusExcludable.bed', '/path/to/wgEncodeDukeMapabilityRegionsExcludable.bed')) bins$residual <- iterateResiduals(readCountsG1K) ## End(Not run)
## Not run: # NOTE: These take a very long time to run. library(BSgenome.Hsapiens.UCSC.hg19) bins <- createBins(BSgenome.Hsapiens.UCSC.hg19, 15) bins$mappability <- calculateMappability(bins, bigWigFile='/path/to/wgEncodeCrgMapabilityAlign50mer.bigWig', bigWigAverageOverBed='/path/to/bigWigAverageOverBed') bins$blacklist <- calculateBlacklist(bins, bedFiles=c('/path/to/wgEncodeDacMapabilityConsensusExcludable.bed', '/path/to/wgEncodeDukeMapabilityRegionsExcludable.bed')) bins$residual <- iterateResiduals(readCountsG1K) ## End(Not run)
Estimate correction to read counts for GC content and mappability.
estimateCorrection(object, span=0.65, family="symmetric", adjustIncompletes=TRUE, maxIter=1, cutoff=4, variables=c("gc", "mappability"), ...)
estimateCorrection(object, span=0.65, family="symmetric", adjustIncompletes=TRUE, maxIter=1, cutoff=4, variables=c("gc", "mappability"), ...)
object |
An |
span |
For |
family |
For |
adjustIncompletes |
A boolean(1) specifying whether |
maxIter |
An integer(1) specifying the maximum number of iterations
to perform, default is 1. If larger, after the first loess fit, bins
with median residuals larger than |
cutoff |
A numeric(1) specifying the number of standard deviations
(as estimated with |
variables |
A character vector specifying which variables to include
in the correction. Can be |
... |
Additional arguments passed to |
Returns a QDNAseqReadCounts
object with the assay data element
fit
added.
This function uses future to calculate the QDNAseq model across samples in parallel.
Ilari Scheinin
Internally, loess
is used to fit the regression model.
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered)
Exports to a file.
exportBins(object, file, format=c("tsv", "igv", "bed", "vcf", "seg"), type=c("copynumber", "segments", "calls"), filter=TRUE, logTransform=TRUE, digits=3, chromosomeReplacements=c(`23` = "X", `24` = "Y", `25` = "MT"), ...)
exportBins(object, file, format=c("tsv", "igv", "bed", "vcf", "seg"), type=c("copynumber", "segments", "calls"), filter=TRUE, logTransform=TRUE, digits=3, chromosomeReplacements=c(`23` = "X", `24` = "Y", `25` = "MT"), ...)
object |
A |
file |
Filename. For formats that support only one sample per file, such as BED, '%s' can be used as a placeholder for sample name or '%d' for sample number. |
format |
Format to export in. Currently supported ones are "tsv" (tab separated values), "igv" (Integrative Genomics Viewer), and "bed" (BED file format). |
type |
Type of data to export, options are "copynumber" (corrected or uncorrected read counts), "segments", or "calls". |
filter |
If |
logTransform |
If |
digits |
The number of digits to round to. If not |
chromosomeReplacements |
A named character vector of chromosome name
replacements to be done. Only used when |
... |
Additional arguments passed to |
Exports object
to a file.
Returns the pathnames of the files written.
Ilari Scheinin
## Not run: data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) exportBins(copyNumbersSmooth, file="LGG150.igv", format="igv") ## End(Not run)
## Not run: data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) exportBins(copyNumbersSmooth, file="LGG150.igv", format="igv") ## End(Not run)
Plot copy number aberration frequencies.
frequencyPlot(x, y, ...)
frequencyPlot(x, y, ...)
x |
A |
y |
missing |
... |
... |
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented) copyNumbersCalled <- callBins(copyNumbersSegmented) frequencyPlot(copyNumbersCalled)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented) copyNumbersCalled <- callBins(copyNumbersSegmented) frequencyPlot(copyNumbersCalled)
Gets bin annotation data for a particular bin size.
getBinAnnotations(binSize, genome="hg19", type="SR50", path=getOption("QDNAseq::binAnnotationPath", NULL), verbose=getOption("QDNAseq::verbose", TRUE))
getBinAnnotations(binSize, genome="hg19", type="SR50", path=getOption("QDNAseq::binAnnotationPath", NULL), verbose=getOption("QDNAseq::verbose", TRUE))
binSize |
A |
genome |
A |
type |
A |
path |
A |
verbose |
If |
Gets bin annotation data for a particular bin size.
Returns a AnnotatedDataFrame
object.
Ilari Scheinin
createBins
().
## Not run: bins <- getBinAnnotations(15) ## End(Not run)
## Not run: bins <- getBinAnnotations(15) ## End(Not run)
Highlights data points in a plotted profile to evaluate filtering.
highlightFilters(object, col="red", residual=NA, blacklist=NA, mappability=NA, bases=NA, type="union", ...)
highlightFilters(object, col="red", residual=NA, blacklist=NA, mappability=NA, bases=NA, type="union", ...)
object |
A |
col |
The color used for highlighting. |
residual |
Either a |
blacklist |
Either a |
mappability |
A |
bases |
A |
type |
When specifying multiple filters ( |
... |
Further arguments to |
Ilari Scheinin
data(LGG150) readCounts <- LGG150 plot(readCounts) highlightFilters(readCounts, residual=TRUE, blacklist=TRUE)
data(LGG150) readCounts <- LGG150 plot(readCounts) highlightFilters(readCounts, residual=TRUE, blacklist=TRUE)
Plot median read counts as a function of GC content and mappability.
isobarPlot(x, y, ...)
isobarPlot(x, y, ...)
x |
A |
y |
missing |
... |
... |
Ilari Scheinin
data(LGG150) readCounts <- LGG150 isobarPlot(readCounts)
data(LGG150) readCounts <- LGG150 isobarPlot(readCounts)
An example data set of read counts from chromosomes 7-10 of sample
LGG150, contained within a QDNAseqReadCounts
object
Ilari Scheinin
Constructs a 'cghRaw', 'cghSeg', or 'cghCall' object.
makeCgh(object, filter=TRUE, chromosomeReplacements=c(X = 23, Y = 24, MT = 25), ...)
makeCgh(object, filter=TRUE, chromosomeReplacements=c(X = 23, Y = 24, MT = 25), ...)
object |
A |
filter |
If |
chromosomeReplacements |
A named integer vector of chromosome name
replacements to be done. QDNAseq stores chromosome names as
characters, but CGHcall expects them to be integers. Defaults to
|
... |
Not used. |
Returns a cghRaw
if the object has not been segmented,
a cghSeg
if it has been segmented but not called,
or cghCall
if it has been called as well.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) cgh <- makeCgh(copyNumbersSmooth)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) cgh <- makeCgh(copyNumbersSmooth)
Plot noise as a function of sequence depth.
noisePlot(x, y, ...)
noisePlot(x, y, ...)
x |
A |
y |
missing |
... |
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) noisePlot(readCountsFiltered)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) noisePlot(readCountsFiltered)
Normalizes binned read counts.
normalizeBins(object, method="median", force=FALSE, verbose=getOption("QDNAseq::verbose", TRUE))
normalizeBins(object, method="median", force=FALSE, verbose=getOption("QDNAseq::verbose", TRUE))
object |
A |
method |
A |
force |
Running this function will remove possible segmentation and
calling results. When they are present, running requires specifying
|
verbose |
If |
Returns a QDNAseqCopyNumbers
object with the assay data element
copynumber
scaled with the chosen method.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers)
Normalize segmented bins.
normalizeSegmentedBins(object, inter=c(-0.1, 0.1), force=FALSE)
normalizeSegmentedBins(object, inter=c(-0.1, 0.1), force=FALSE)
object |
An object of class QDNAseqCopyNumbers. |
inter |
The interval in which the function should search for the normal level. |
force |
Whether to force execution when it causes removal of downstream calling results. |
This function recursively searches for the interval containing the most segmented data, decreasing the interval length in each recursion. The recursive search makes the post-segmentation normalization robust against local maxima. This function is particularly useful for profiles for which, after segmentation, the 0-level does not coincide with many segments. It is more or less harmless to other profiles. We advise to keep the search interval (inter) small, in particular at the positive (gain) side to avoid that the 0-level is set to a common gain level.
Returns an object of class QDNAseqCopyNumbers with re-normalized data.
Ilari Scheinin
Internally, postsegnormalize
of the CGHcall package
is used.
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth) copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented)
Plot copy number profile.
plot(x, y, ...)
plot(x, y, ...)
x |
A |
y |
missing |
... |
... |
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) plot(copyNumbers)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) plot(copyNumbers)
Pools binned read counts across samples.
poolRuns(object, samples, force=FALSE)
poolRuns(object, samples, force=FALSE)
object |
A |
samples |
A character vector of new sample names. Samples with
identical names will be pooled together. Must be the same length as
there are samples in |
force |
Whether to force the operation even when downstream data will be lost. |
Returns a QDNAseqReadCounts
or QDNAseqCopyNumbers
object.
Ilari Scheinin
data(LGG150) readCounts <- LGG150 # Note: the following command will "pool" data from a single run, which # does not really make sense: pooledReadCounts <- poolRuns(readCounts, samples = "LGG150")
data(LGG150) readCounts <- LGG150 # Note: the following command will "pool" data from a single run, which # does not really make sense: pooledReadCounts <- poolRuns(readCounts, samples = "LGG150")
These functions are defunct and no longer available.
The following functions are defunct; use the replacement indicated below:
downloadBinAnnotations
: getBinAnnotations
Container for QDNAseq read count data
An object of this class contains the following elements:
copynumber
(numeric
) Corrected "count" signals in
An object with only this field is created by
correctBins
().
segmented
(numeric
; optional) Segmented data in
, added by calling
segmentBins
().
calls
(integer
; optional) Calls as -2=deletion,
-1=loss, 0=normal, 1=gain, 2=amplification, added by calling
callBins
().
probdloss
(numeric
; optional) Probabilities of
deletions in , added by calling
callBins
().
probloss
(numeric
; optional) Probabilities of losses in
, added by calling
callBins
().
probnorm
(numeric
; optional) Probabilities of normal
copy number in , added by calling
callBins
().
probgain
(numeric
; optional) Probabilities of gains in
, added by calling
callBins
().
probamp
(numeric
; optional) Probabilities of
amplifications in , added by calling
callBins
().
The bin data (assay data) may contain missing values.
Ilari Scheinin
Container for QDNAseq read count data
An object of this class contains (a subset) the following elements:
counts
(numeric
) Binned read counts as non-negative
integers in . An object with only this field is
created by
binReadCounts
().
fit
(numeric
; optional) Loess fit of "count" signals as
doubles. Normally, these should all be positive values, but a
small number of edge case bins might contain negatives, especially
when fitting unfiltered data. This element is added after calling
estimateCorrection
().
The bin data (assay data) may contain missing values.
Ilari Scheinin
A parent class for containers of QDNAseq data
Ilari Scheinin
Segments normalized copy number data.
segmentBins(object, smoothBy=FALSE, alpha=1e-10, undo.splits="sdundo", undo.SD=1, force=FALSE, transformFun="log2", ...)
segmentBins(object, smoothBy=FALSE, alpha=1e-10, undo.splits="sdundo", undo.SD=1, force=FALSE, transformFun="log2", ...)
object |
An object of class QDNAseqCopyNumbers. |
smoothBy |
An optional integer value to perform smoothing before
segmentation by taking the mean of every smoothBy bins, and then
segment those means. Default ( |
alpha |
Significance levels for the test to accept change-points. Default is 1e-10. |
undo.splits |
A character string specifying how change-points are to be undone, if at all. Default is "sdundo", which undoes splits that are not at least this many SDs apart. Other choices are "prune", which uses a sum of squares criterion, and "none". |
undo.SD |
The number of SDs between means to keep a split if undo.splits="sdundo". Default is 1.0. |
force |
Whether to force execution when it causes removal of downstream calling results. |
transformFun |
A function to transform the data with. This can be
the default "log2" for log2(x + .Machine$double.xmin),
"sqrt" for the Anscombe transform of sqrt(x * 3/8) which
stabilizes the variance, "none" for no transformation, or any
R function that performs the desired transformation and also its
inverse when called with parameter |
... |
Additional arguments passed to |
Returns an object of class QDNAseqCopyNumbers with segmentation results added.
This method make use of random number generation (RNG) via the
segment
used internally. Because of this, calling the
method with the same input data multiple times will each time give slightly
different results. To get numerically reproducible results, the random
seed must be fixed, e.g. by using 'set.seed()' at the top of the script.
This function uses future to segment samples in parallel.
Ilari Scheinin
[1] A.B. Olshen, E.S. Venkatraman (aka Venkatraman E. Seshan), R. Lucito
and M. Wigler, Circular binary segmentation for the analysis of
array-based DNA copy number data, Biostatistics, 2004
[2] E.S. Venkatraman and A.B. Olshen, A faster circular binary
segmentation algorithm for the analysis of array CGH data,
Bioinformatics, 2007
Internally, segment
of the DNAcopy package,
which implements the CBS method [1,2], is used to segment the data.
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized) copyNumbersSegmented <- segmentBins(copyNumbersSmooth)
Smooth outlier bins after normalization.
smoothOutlierBins(object, logTransform=TRUE, force=FALSE, ...)
smoothOutlierBins(object, logTransform=TRUE, force=FALSE, ...)
object |
A |
logTransform |
If |
force |
Running this function will remove possible segmentation and
calling results. When they are present, running requires specifying
|
... |
Additional arguments passed to |
Returns a QDNAseqCopyNumbers
object with the values for outliers
smoothed. See smooth.CNA
for more details. If
logTransform
is TRUE
, these signals are log2-transformed prior
to smoothing, but afterwards back-transformed..
Ilari Scheinin
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized)
data(LGG150) readCounts <- LGG150 readCountsFiltered <- applyFilters(readCounts) readCountsFiltered <- estimateCorrection(readCountsFiltered) copyNumbers <- correctBins(readCountsFiltered) copyNumbersNormalized <- normalizeBins(copyNumbers) copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized)