NEWS
PureCN 2.10.0
NEW FEATURES
- adjustLogRatio function for adjusting a tumor vs normal coverage
ratio for purity and ploidy. Useful for downstream tools that
expect ratios instead of absolute copy numbers such as GISTIC.
Thanks @tedtoal (#40).
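The purity and ploidy adjustment can be illustrated with the usual tumor/normal mixture model. The Python sketch below is not PureCN's code (the package is written in R); the function name adjust_log_ratio, its arguments and the min_ratio floor are illustrative assumptions, and a diploid normal is assumed.

```python
import math


def adjust_log_ratio(log_ratio, purity, ploidy, min_ratio=2 ** -8):
    """Rescale an observed log2 tumor/normal ratio to a pure-tumor log2 ratio.

    Assumed mixture model for the observed ratio r:
        r = (p * C + 2 * (1 - p)) / (p * ploidy + 2 * (1 - p))
    with p the purity and C the tumor copy number.
    Solving for C / ploidy gives the adjusted ratio returned here (as log2).
    """
    ratio = 2 ** log_ratio
    normal_part = 2 * (1 - purity)  # contribution of diploid normal cells
    adjusted = (ratio * (purity * ploidy + normal_part) - normal_part) / (
        purity * ploidy
    )
    # Noisy ratios can yield zero or negative values; floor them (assumed value).
    adjusted = max(adjusted, min_ratio)
    return math.log2(adjusted)


# 4 copies in a 50% pure, ploidy-2 tumor give an observed ratio of 1.5;
# the adjusted ratio should be C / ploidy = 2, i.e. a log2 ratio of about 1.
observed = math.log2(1.5)
print(round(adjust_log_ratio(observed, purity=0.5, ploidy=2), 3))
```

Tools such as GISTIC would then see ratios on the scale of a pure tumor rather than ratios diluted by normal cells.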
SIGNIFICANT USER-VISIBLE CHANGES
- Provide interval-level likelihood scores in runAbsoluteCN return
object. Thanks @tinyheero (#335).
- Documentation updates. Thanks @ddrichel (#325).
BUGFIXES
- Bugfix #296 was not merged into the developer branch and did not make
it into 2.8.0.
- Log ratios were not shifted to the median of sample medians as intended
(#356). Thanks @sleyn.
- Fixed crash with small toy examples (fewer than 2000 baits, #363)
PureCN 2.8.0
SIGNIFICANT USER-VISIBLE CHANGES
- Make processMultipleSamples temporarily defunct because the
copynumber package was removed from Bioconductor
- Make it possible to specify saveRDS version to make output files
readable by old R versions prior to 3.6.0 (#255)
BUGFIXES
- Fixed an issue with callLOH and --model-homozygous (#254)
- Fixed crash when VCF contained NAs in base quality scores (#249)
- Fixed wrong check for outdir write permissions in Coverage.R and
NormalDB.R (#258)
- Fixed inverted return codes in a couple of scripts (#284)
- Fixed an issue in callAlterations where the id argument was largely
ignored (#292)
- Fixed broken support for GenomicsDB-R from their developer branch
(#296)
- Fixed crash in plotAbs (#260)
- Fixed an issue with gene-level calls when annotation contained
non-official symbols found on multiple chromosomes (#298)
- Fixed a wrongly formatted error message when no germline database
information was found (#302)
- Fixed a crash when DB field in VCF only contains NAs (#301)
- Added --min-base-quality argument for PureCN.R (#320)
PureCN 2.2.0
NEW FEATURES
- Added chunks parameter to Coverage.R and calculateBamCoverageByInterval
to reduce memory usage (#218)
SIGNIFICANT USER-VISIBLE CHANGES
- When base quality scores are found in the VCF, they are now used to
calculate the minimum number of supporting reads (instead of assuming a
default BQ of 30). By default BQ is capped at 50 and variants below 25
are ignored. Set min.supporting.reads to 0 to turn this off (#206).
- More robust annotation of intervals with gene symbols
- Remove chromosomes not present in the centromeres GRanges object; useful
to remove altcontigs somehow present (should not happen with intervals
generated by IntervalFile.R)
BUGFIXES
- Fixed an issue with old R versions where factors were not converted to
strings, resulting in numbers instead of gene symbols
- Fix for a crash when there are no off-target reads in off-target regions
(#209).
- Fixed parsing of base quality scores in Mutect 2.2
- Fixed crash in GenomicsDB parsing when there were no variants in contig
(#225)
PureCN 2.0.0
NEW FEATURES
- Report median absolute pairwise difference (MAPD) of tumor vs normal log2
ratios in runAbsoluteCN
- Improved mapping bias estimates: variants with insufficient information
for position-specific fits (default 3-6 heterozygous variants)
are clustered and assigned to the most similar fit
- Make Cosmic.CNT INFO field name customizable
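As a rough illustration of the MAPD noise metric, the Python sketch below assumes the common definition: the median of absolute differences between log2 ratios of neighbouring intervals in genomic order. The function name mapd is hypothetical and this is not the runAbsoluteCN implementation.

```python
from statistics import median


def mapd(log_ratios):
    """Median absolute pairwise difference of neighbouring log2 ratios.

    log_ratios must be ordered by genomic position.
    """
    if len(log_ratios) < 2:
        raise ValueError("need at least two log ratios")
    # Absolute difference between each interval and its right-hand neighbour
    diffs = [abs(b - a) for a, b in zip(log_ratios, log_ratios[1:])]
    return median(diffs)


print(mapd([0.0, 0.1, -0.1, 0.3, 0.2]))
```

Because only neighbouring intervals are compared, true copy number changes affect few of the differences, so the value mostly reflects interval-level noise.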
SIGNIFICANT USER-VISIBLE CHANGES
- Cleanup of naming of command line arguments (will throw lots of
deprecation warnings, but was long overdue)
- More robust alignment of on- and off-target tumor vs normal log2 ratios.
Ratios are shifted so that median difference of neighboring on/off-target
pairs is 0. This should fix spurious segments consisting of only on- or
off-target regions in high quality samples where those minor offsets
sometimes exceeded the noise.
- Added min.variants argument to runAbsoluteCN
- Added PureCN version to runAbsoluteCN results object (ret$version)
- Addressed observed over-segmentations in very clean data:
- Do not attempt two-step segmentation in PSCBS when off-target noise is
still very small (< 0.15, min.logr.sdev in runAbsoluteCN)
- Increase automatically determined undo.SD in all segmentation functions
when noise is very small (< min.logr.sdev)
- min.logr.sdev is now accessible in PureCN.R via --min-logr-sdev
- Added pairwise sample distances to normalDB output object, helpful for
finding noisy samples or batches in normal databases
- Do not error out readCurationFile when CSV is missing and directory
is not writable when re-generating it (#196)
- Add segmentation parameters as attributes to segmentation data.frame
- Added min.betafit.rho and max.betafit.rho to calculateMappingBias*
- Made --normal_panel in PureCN.R defunct
- Added GATK/Picard header with sequence lengths to interval file,
added readIntervalFile function to parse it
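The on/off-target alignment listed above can be sketched in Python as follows. This is a simplified illustration for a single chromosome, not the package's code: the function name align_off_target is hypothetical, and shifting the off-target ratios (rather than the on-target ones) is an assumption.

```python
from statistics import median


def align_off_target(log_ratios, on_target):
    """Shift off-target log2 ratios so that the median difference of
    neighbouring on/off-target pairs becomes 0.

    log_ratios and on_target are parallel lists in genomic order;
    on_target holds True for on-target and False for off-target intervals.
    """
    # On-target minus off-target difference for every neighbouring mixed pair
    diffs = [
        (log_ratios[i] - log_ratios[i + 1])
        if on_target[i]
        else (log_ratios[i + 1] - log_ratios[i])
        for i in range(len(log_ratios) - 1)
        if on_target[i] != on_target[i + 1]
    ]
    if not diffs:
        return list(log_ratios)  # nothing to align
    shift = median(diffs)
    # Assumption: only off-target intervals are moved
    return [lr if on else lr + shift for lr, on in zip(log_ratios, on_target)]


ratios = [0.2, 0.0, 0.25, 0.05, 0.2]
flags = [True, False, True, False, True]
print(align_off_target(ratios, flags))
```

After the shift, a segmentation algorithm no longer sees a systematic step between on- and off-target intervals that it could mistake for a copy number change.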
BUGFIXES
- Fix for crash when --normal_panel in NormalDB.R contained no variants
(#180).
- Fix for crash when rtracklayer failed to parse --infile in
FilterCallableLoci.R (#182)
- More robust parsing of VCF with missing GT field (#184)
- Fix for bug and crash when mapping bias RDS file contains variants with
multiple alt alleles (#184)
- Added missing dependency 'markdown'
- Fix for crash when only a small number of off-target intervals pass
filters (#190)
- Fix for crash when PSCBS segmentation was selected without VCF file
(#190)
- Fix for crash when Hclust segmentation was selected without segmentation
file (#190)
- Fix for crashes when only few variants pass filters (#192, #195)
- Fix for crash when provided segmentation does not have chromosomes
in common with VCF (#192) or does not provide all chromosomes present in
the coverage file (#192)
PureCN 1.22.0
NEW FEATURES
- calculateNormalDatabase now suggests an off-target interval width that
minimizes noise while keeping the resolution as high as possible
- Added support for GATK4 CollectAllelicCounts output as alternative
to Mutect
- Added segmentationGATK4 to use GATK4's segmentation function
ModelSegments
SIGNIFICANT USER-VISIBLE CHANGES
- Added min.total.counts filter to filterIntervals to remove
intervals with low number of read counts in combined tumor and normal.
Useful especially for off-target filtering in highly efficient assays
where standard filters keep too many high variance regions.
- Changed default of min.mappability in preprocessIntervals for on-target
intervals to 0.6 (from 0.5)
- Added min.mappability also to filterIntervals so that more conservative
cutoffs can be tested after normalDB generation
- PSCBS: 1.20.0 two-step segmentation slightly tweaked in that only
high quality on-target intervals (high mappability and low PoN noise)
are used in the first segmentation
- Added --skipgcnorm flag to Coverage.R to skip GC-normalization
- Added AF.info.field option to calculateMappingBiasGatk4 for non-standard
GenomicsDB imports
- If segmentation functions add breakpoints within baits, these
breakpoints are now moved to the beginning or end of that bait so that
a single bait is never assigned to two segments
- Dx.R now always generates a _signatures.csv file with --signatures, even
if the number of mutations is insufficient
- Removed defunct calculateIntervalWeights function
BUGFIXES
- Fix for nonsensical error message when VCF does not contain germline
variants (#166).
- Fix for various issues related to the seqlevelsStyle function (e.g.
#171)
- Fix for crash in calculateMappingBiasGatk4 when not all samples had
a single variant call on a particular chromosome (chrY)
- Fix related to annotating mapping bias with triallelic sites and
GenomicsDB
- Fixed an issue in Mutect 1.1.7 data in which good SNPs were ignored
(#174)
PureCN 1.20.0
NEW FEATURES
- Support for GATK4 GenomicsDB import for mapping bias calculation
- Added --additionaltumors to PureCN.R to provide coverage files
from additional biopsies from the same patient when available
- PSCBS segmentation now identifies on-target breakpoints first when
off-target is noisy, thus boosting sensitivity in on-target regions
- Beta-binomial model in runAbsoluteCN now uses the fits in mapping bias
database. We plan to set this as default in upcoming versions and
appreciate feedback.
SIGNIFICANT USER-VISIBLE CHANGES
- We now check if POP_AF or POPAF is -log10 scaled, as is the case in new
Mutect2 versions.
- Added support for GERMQ info field containing Phred-scaled germline
probabilities.
- Detect Mutect2 VCF more reliably
- Updated Mutect2 failure flags: "strand_bias", "slippage", "weak_evidence",
"orientation", "haplotype"
- Removed defunct normal.panel.vcf.file from setMappingBiasVcf
- Removed defunct interval.weight.file from segmentationPSCBS,
segmentationCBS and processMultipleSamples
- Made calculateIntervalWeights defunct
- Changed default of min.normals in calculateMappingBiasVcf/Gatk4 to 1
from 2
- Changed default of --signature_databases to
"signatures.exome.cosmic.v3.may2019" (v3 instead of v2)
- Now warn if recommended --funsegmentation is not used
- Added parallel option for callAmplificationsInLowPurity
- callMutationBurden now uses all non-filtered targets as callable region
when callable is not provided
- plotAbs in chromosome mode now displays wider range of log2 ratios
(makes it possible to examine outliers)
- Moved vcf.field.prefix from predictSomatic to runAbsoluteCN since it now
adds more fields like prior somatic and mapping bias to the VCF
- Changed default of runAbsoluteCN min.ploidy to 1.4
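The -log10 and Phred scales mentioned for POP_AF/POPAF and GERMQ can be converted back to plain probabilities as in the Python sketch below. The helper names are hypothetical, and the detection rule (any value above 1 implies -log10 scaling) is an assumption made for illustration; the actual check used by the package may differ.

```python
def from_neg_log10(value):
    """Convert a -log10 scaled value back to a frequency or probability."""
    return 10 ** -value


def from_phred(quality):
    """Convert a Phred-scaled value Q to the probability 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def population_af(values):
    """Return allele frequencies, undoing -log10 scaling when detected.

    Assumed heuristic: an allele frequency cannot exceed 1, so any value
    above 1 indicates that the field is -log10 scaled.
    """
    if any(v > 1 for v in values):
        return [from_neg_log10(v) for v in values]
    return list(values)


print(population_af([2.0, 3.0]))   # treated as -log10 scaled
print(population_af([0.01, 0.5]))  # treated as plain frequencies
print(from_phred(30))
```

A heuristic like this fails when every variant in a -log10 scaled file has a value of 1 or below (frequencies of 10% or more), which is one reason to treat it as a sketch only.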
BUGFIXES
- Fix for crash with CNVkit input when log-ratio contained highly negative
outliers
- Fixed a bug in preprocessIntervals/IntervalFile.R when input contained
overlapping and stranded intervals
- Fix for crash when GC-correction is attempted on empty coverage (for
example off-target region without any off-target reads)
- Fix for crash when VCF FA field contained missing values
- Fix for a bug in callAmplificationsInLowPurity that can cause a wrong
chromosome percentile
PureCN 1.18.0
SIGNIFICANT USER-VISIBLE CHANGES
- callAlterations: columns C and seg.mean now provide the values of the
segment listed in seg.id. This changes the behaviour in cases where the
gene contains breakpoints and thus multiple segments overlap (#112)
BUGFIXES
- Fix for bug that can result in crash when candidates were provided in
runAbsoluteCN and test.purity, max.ploidy and/or min.ploidy were set to
non-default values
PureCN 1.16.0
NEW FEATURES
- Flag segments in poor quality regions
- predictSomatic now provides log-likelihood of allelic balance
(ALLELIC.IMBALANCE column) for each variant
- Added readLogRatioFile function to read GATK4 DenoiseReadCounts
output files containing log2 tumor/normal ratios
- Added readSegmentationFile function to read GATK4 ModelSegments
output files containing segmented log2 tumor/normal ratios
- Added callAmplificationsInLowPurity to call gene-level
amplifications in samples < 10% purity
- Dx.R now reports chromosomal instability scores
(available also via callCIN function)
- Dx.R supports deconstructSigs 1.9.0 and COSMIC signatures v3.
To run both v2 and v3, simply add --signature_databases
signatures.exome.cosmic.v3.may2019:signatures.cosmic to Dx.R
SIGNIFICANT USER-VISIBLE CHANGES
- Made filterTargets and createTargetWeights defunct
- setMappingBiasVcf now returns a data.frame
- Best practices vignette now HTML-based
- Renamed normal.panel.vcf.file in setMappingBiasVcf to mapping.bias.file;
in 1.18, setMappingBiasVcf will not accept a VCF anymore but requires
a precomputed mapping bias RDS file.
- calculateIntervalWeights now directly called by createNormalDatabase and
information included in the normalDB RDS object. This function is thus
deprecated.
- Column gene.mean in callAlterations output now weighted by interval
weights when available
- Changed default of min.target.width in preprocessIntervals from 10 to 100
(#73)
- replaced write.table with data.table::fwrite to automatically support
producing gzipped output (requires data.table 1.12.4, #106)
- Coverage.R now gzips BAM file coverage (requires data.table 1.12.4, #106)
- Output coverage files now code FALSE as 0 and TRUE as 1
- PureCN.R now bgzips and tabix indexes VCFs when --vcf is provided
BUGFIXES
- Fix for bug in CCF calculation resulting in NAs (happens in high
coverage samples, early mutations with > 1 allele copy number)
- Fix for a bug in preprocessIntervals when small targets
(< min.target.width) were present
- Fix for a bug in callMutationBurden when VCF contained indels
(#82)
- Die with helpful error message when snp.blacklist import failed
- Check input segmentation files for missing values resulting in crash
- Fixed a crash with Varscan2 produced VCFs when the ALT field was missing
ref counts (#109)
PureCN 1.14.0
NEW FEATURES
- support for copynumber package and its multisample segmentation
- beta support for PSCBS weighting
- support for gene symbol filtering in FilterCallableLoci.R
(e.g. --exclude "^HLA")
- added segmentationHclust function that clusters provided segmentation
using log2-ratio and B-allele frequencies
- min.target.width and small.targets in preprocessIntervals to
automatically deal with too small targets
- calculate confidence intervals for cellular fractions
- throw additional warning when sample is flagged as NON-ABERRANT and
pick the diploid solution with lowest purity as best
SIGNIFICANT USER-VISIBLE CHANGES
- significant runtime improvements
- callLOH now reports all segments, even if there are no informative
SNPs since some users were not aware that segments are missing from
this output. Use keep.no.snp.segments = FALSE to restore old behaviour.
- more detailed output of callLOH
- renamed num.snps.segment to num.snps in callAlterations output
BUGFIXES
- fixed crash in PureCN.R when gene symbols are missing from
interval file
- fixed crash in runAbsoluteCN with matched normals and high test.purity
minimum (#74)
PureCN 1.12.0
NEW FEATURES
- normalDB does not need input normal coverage files anymore after creation
(so the resulting normalDB.rds file can be moved)
- base quality filtering can be turned off by setting min.base.quality to 0
or NULL
- possible to change the POP_AF info field name
- possible to change POP_AF cutoff to set a high germline prior
- possible to change min.cosmic.cnt and max.homozygous.loss in PureCN.R
- set number of cores in PureCN.R (thanks Brad)
SIGNIFICANT USER-VISIBLE CHANGES
- renamed reptimingbinsize to reptimingwidth in IntervalFile.R, added
this feature to preprocessIntervals
- clarified "targets" vs. "intervals"; whenever something affects both
on-target and off-target, it is now called "intervals". When only targets,
e.g. in annotateTargets, "targets" was kept.
- made gc.gene.file defunct
- new default for min.cosmic.cnt = 6 (instead of 4)
BUGFIXES
- catch various input problems and provide better error messages instead
of crashing
- stranded input BED files do not cause problems anymore
- fixed a bug when only a single local optimum was tested (happens only
when users copy the examples that restrict the search space to avoid
long runtimes)
- added missing QC flag to predictSomatic VCF annotation
PureCN 1.10.0
SIGNIFICANT USER-VISIBLE CHANGES
- New normal database format
- Runtime performance improvements (skip unlikely local optima, support for
BiocParallel in runAbsoluteCN, pre-calculation of mapping bias)
- Support for replication timing scores in coverage normalization
- More accurate confidence intervals in callMutationBurden
- More accurate copy numbers for high-level amplifications
- Very low or high coverage samples are now by default dropped in normal
database creation (less than 25% or more than 4 times the median sample
coverage)
- Improved support for third-party upstream tools like GATK4 (experimental)
- More checks for wrong or sub-optimal input and providing suggestions for
fixing those issues
- Gibbs sampling of log tumor/normal coverage error rate
- Better imputation of mapping bias (instead of smoothing
over neighboring variants in the sample, smooth over neighboring SNPs
in the pool of normals - only available when pre-calculated)
- Experimental support for indels
- Code cleanups (switch to testthat, removed several obsolete and minor
features)
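The default coverage filter for normal database creation described above (samples with less than 25% or more than 4 times the median sample coverage are dropped) amounts to a simple range check. The Python sketch below uses hypothetical names (keep_normals, min_fraction, max_fold) and made-up coverage values; it only mirrors the thresholds stated in this entry.

```python
from statistics import median


def keep_normals(mean_coverages, min_fraction=0.25, max_fold=4.0):
    """Return the names of samples whose mean coverage lies between
    min_fraction and max_fold times the median sample coverage.

    mean_coverages maps sample name -> mean coverage.
    """
    med = median(mean_coverages.values())
    return [
        name
        for name, cov in mean_coverages.items()
        if min_fraction * med <= cov <= max_fold * med
    ]


# Illustrative values only; the median coverage here is 100
coverages = {"n1": 100, "n2": 110, "n3": 20, "n4": 500, "n5": 90}
print(keep_normals(coverages))
```

With these values, n3 falls below a quarter of the median and n4 exceeds four times the median, so both would be left out.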
API CHANGES
- renamed gc.gene.file to interval.file since it now provides more than
GC-content and gene symbols
- plotAbs ids changed to id (this function now only plots a single
purity/ploidy solution)
- changed default of runAbsoluteCN max.logr.sdev to 0.6 (from 0.75)
- createTargetWeights does not require tumor coverages anymore
- calculateGCContentByInterval was renamed to preprocessIntervals
- renamed plot.gc.bias to plot.bias in correctCoverageBias since it now
also includes replication timing
- added calculateMappingBiasVcf to pre-compute mapping bias from a
panel of normal VCF, thus avoiding time spent loading and parsing
huge VCFs
- max.homozygous.loss now defines the maximum fraction of a chromosome
lost, not the whole genome, to avoid wrong maximum likelihood solutions with
completely deleted chromosome arms
PureCN 1.8.0
NEW FEATURES
- Support for off-target reads in copy number normalization and segmentation
- Added mutation burden calculation
- More robust mapping bias estimation
- Added support for CNVkit coverage files (*.cnn, *.cnr)
- IntervalFile.R can annotate targets with gene symbols and automatically
convert chromosome naming styles
- Better artifact filtering by using normalDB more efficiently
- Support for mappability scores
- Coverage calculation can now include duplicates
- calculateBamCoverageByInterval now provides fragment counts and
duplication rates
- findBestNormal pooling now fragment count based, not coverage based
- Experimental support for GATK4
- predictSomatic now reports posterior probabilities of minor segment copy
numbers, flags segments if copy numbers are unreliable
- Targets can be annotated with multiple gene symbols (comma separated)
- Code cleanups (switch to GRanges where possible, switch to optparse in
command line tools)
API CHANGES
- Due to novel optimizations of provided bait intervals, we highly recommend
regenerating the interval files and normal databases and recalculating all
coverages from BAM files
- New functions: annotateTargets, callMutationBurden
- Defunct functions: createSNPBlacklist, getDiploid, autoCurateResults,
readCoverageGatk
- min.normals defaults to 2 (changed from 4) in setMappingBiasVcf
- normalDB.min.coverage defaults to 0.25 (changed from 0.2) in filterTargets
- log.ratio.calibration defaults to 0.1 (from 0.25) in runAbsoluteCN; now
relative to purity, not log-ratio noise
- Removed gc.data from filterTargets since gc_bias is now added to tumor
coverage
- dropped purecn.output from correctCoverageBias (no two-pass anymore)
- Coverage.R argument --gatkcoverage renamed to --coverage
- Dropped GC-normalization functionality in NormalDB, since this is
now conveniently done in Coverage.R
- Renamed PureCN.R --outdir argument to --out. Can now specify a file
prefix as in GATK. Filenames are thus not forced to sample id anymore.
If --out is a directory, it will behave like before and will use
out/sampleid_suffix as filename.