PureCN is
backward compatible with input generated by versions 1.16 and later. For
versions 1.8 to 1.14, please re-run NormalDB.R
(see also
below):
$ Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--coverage-files example_normal_coverages.list \
--genome hg19 --normal-panel $NORMAL_PANEL --assay agilent_v6
When using --model betabin
in PureCN.R
, we
recommend for all previous versions re-creating the mapping bias
database by re-running NormalDB.R
:
# only re-creating the mapping bias file
$ Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--genome hg19 --normal-panel $NORMAL_PANEL --assay agilent_v6
For upgrades from version 1.6, we highly recommend starting from scratch following this tutorial.
For the command line scripts described in this tutorial, we will need to install PureCN with suggested dependencies:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("PureCN", dependencies = TRUE)
Alternatively, manually install the packages required by the command line scripts:
BiocManager::install(c("PureCN", "optparse", "R.utils",
"TxDb.Hsapiens.UCSC.hg19.knownGene", "org.Hs.eg.db"))
(Replace hg19
with your genome version).
To use the alternative and in many cases recommended PSCBS segmentation:
# default PSCBS without support of interval weights
BiocManager::install("PSCBS")
# patched PSCBS with support of interval weights
BiocManager::install("lima1/PSCBS", ref="add_dnacopy_weighting")
To call mutational signatures, install the GitHub version of the deconstructSigs package:
BiocManager::install("raerose01/deconstructSigs")
For the experimental support of importing variant calls from GATK4 GenomicsDB, follow the installations instructions from GenomicsDB-R.
The GATK4 segmentation requires the gatk
binary
in path. Versions 4.1.7.0 and newer are supported.
## [1] "/tmp/RtmpbKPkWC/Rinst1cc917cd5976/PureCN/extdata"
$ export PURECN="/path/to/PureCN/extdata"
$ Rscript $PURECN/PureCN.R --help
Usage: /path/to/PureCN/inst/extdata/PureCN.R [options] ...
# specify path where PureCN should store reference files
$ export OUT_REF="reference_files"
$ Rscript $PURECN/IntervalFile.R --in-file baits_hg19.bed \
--fasta hg19.fa --out-file $OUT_REF/baits_hg19_intervals.txt \
--off-target --genome hg19 \
--export $OUT_REF/baits_optimized_hg19.bed \
--mappability wgEncodeCrgMapabilityAlign100mer.bigWig \
--reptiming wgEncodeUwRepliSeqK562WaveSignalRep1.bigWig
Internally, this script uses rtracklayer
to parse the --in-file
. Make sure that the file format
matches the file extension. See the rtracklayer
documentation for problems loading the file. Check that the genome
version of the baits file matches the reference. Do not include chrM
baits in case the capture kit includes some.
We do not recommend padding the baits file manually unless the
coverages are very low (<30X) where the increased counts from the
padded regions might decrease sampling variance slightly. Note that we
do however strongly recommend running the variant caller with a padding
of at least 50bp to increase the number of informative SNPs, see below
in the VCF section. Double check that the genome version of the
--in-file
is correct - many assays are still designed using
older references and might need to be lifted over to the pipeline
reference. If possible, do NOT use a BED file that contains the targeted
exons, instead use the coordinates of the baits. These are optimized for
GC-content and mappability and will produce cleaner coverage
profiles.
The --off-target
flag will include off-target reads.
Including them is recommended except for Amplicon data. For whole-exome
data, the benefit is usually also limited unless the assay is
inefficient with a high fraction of off-target reads (>10-15%).
The --genome
version is needed to annotate exons with
gene symbols. Use hg19/hg38 for human genomes, not b37/b38. You might
get a warning that an annotation package is missing. For hg19, install
TxDb.Hsapiens.UCSC.hg19.knownGene
in R.
The --export
argument is optional. If provided, this
script will store the modified intervals as BED file for example (again
every rtracklayer
format is supported). This is useful when the coverages are calculated
with third-party tools like GATK.
The --mappability
argument should provide a rtracklayer
parsable file with a mappability score in the first meta data column. If
provided, off-target regions will be restricted to regions specified in
this file. On-target regions with low mappability will be excluded. For
hg19, download the file from the UCSC website. Choose the kmer size that
best fits your average mapped read length. For hg38, download
recommended 76-kmer or 100-kmer mappability files through the courtesy
of the Waldron lab from:
See the FAQ section of the main vignette for instruction how to generate such a file for other references.
Similarly, the --reptiming
argument takes a replication
timing score in the same format. If provided, GC-normalized and
log-transformed coverage is tested for a linear relationship with this
score and normalized accordingly. This is optional and provides only a
minor benefit for coverage normalization, but can identify samples with
high proliferation. Requires --off-target
to be useful.
PureCN does not ship with a variant caller. Use a third-party tool to generate a VCF for each sample.
Important recommendations:
Use MuTect 1.1.7 if possible; Mutect 2 from GATK 4.1.7+ is now out of alpha and VCFs generated following the best practices somatic workflow should work (earlier Mutect 2 versions are not supported and will not work).
VCFs from most other tumor-only callers such as VarScan2 and FreeBayes are supported, but only very limited artifact filtering will be performed for these callers. Make sure to provide filtered VCFs. See the FAQ section in the main vignette for common problems and questions related to input data.
Since germline SNPs are needed to infer allele-specific copy
numbers, the provided VCF needs to contain both somatic and germline
variants. Make sure that upstream filtering does not remove high quality
SNPs, in particular due to presence in germline databases. Mutect
1.1.7 automatically calls SNPs, but Mutect 2 does not.
Make sure to run Mutect 2 with
--genotype-germline-sites true --genotype-pon-sites true
.
You will not get usuable output without those flags. Since Mutect
2 from GATK 4.2.0+, average base quality scores can be
very low and variants will be too aggressively removed by
PureCN. You will need to set
--min-base-quality 20
in PureCN.R to keep
them.
Run the variant caller with a 50-75 base pair interval padding to
increase the number of heterozygous SNPs (for example
--interval_padding
and --interval-padding
in
Mutect 1.1.7 and Mutect 2, respectively). For very
high coverages beyond 1000X, it is safe to increase this value up to
200bp.
The following describes PureCN runs with internal copy number normalization and segmentation.
What you will need:
The interval file generated above
BAM files of tumor samples.
BAM files of normal samples (see main vignette for recommendations). These normal samples are not required to be patient-matched to the tumor samples, but they need to be processed-matched (same assay run through the same alignment pipeline, ideally sequenced in the same lab)
VCF files generated above for all tumor and normal BAM files
For each sample, tumor and normal, calculate GC-normalized coverages:
# Calculate and GC-normalize coverage from a BAM file
$ Rscript $PURECN/Coverage.R --out-dir $OUT/$SAMPLEID \
--bam ${SAMPLEID}.bam \
--intervals $OUT_REF/baits_hg19_intervals.txt
Similar to GATK, this script also takes a text file containing a list
of BAM or coverage file names (one per line). The file extension must be
.list
:
# Calculate and GC-normalize coverage from a list of BAM files
$ Rscript $PURECN/Coverage.R --out-dir $OUT/normals \
--bam normals.list \
--intervals $OUT_REF/baits_hg19_intervals.txt \
--cores 4
Important recommendations:
Only provide --keep-duplicates
or
--remove-mapq0
if you know what you are doing and always
use the same command line arguments for tumor and the normals
It can be safe to skip the GC-normalization with
--skip-gc-norm
when tumor and normal samples are expected
to exhibit similar biases and a sufficient number of normal samples is
available. A good example would be plasma sequencing. In contrast, old
FFPE samples normalized against blood controls will more likely benefit
from GC-normalization.
A potential negative impact of GC-normalization is much more likely in very small targeted panels (< 0.5Mb) and worth benchmarking.
When supported third-party tools are used to calculate coverage (currently CNVkit, GATK3 and GATK4), it is possible to GC-normalize those coverages with a matching interval file:
# GC-normalize coverage from a GATK DepthOfCoverage file
Rscript $PURECN/Coverage.R --out-dir $OUT/$SAMPLEID \
--coverage ${SAMPLEID}.coverage.sample_interval_summary \
--intervals $OUT_REF/baits_hg19_intervals.txt
To build a normal database for coverage normalization, copy the paths to all (GC-normalized) normal coverage files in a single text file, line-by-line:
ls -a $OUT/normals/*_loess.txt.gz | cat > example_normal_coverages.list
# In case no GC-normalization is performed:
# ls -a $OUT/normals/*_coverage.txt.gz | cat > example_normal_coverages.list
$ Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--coverage-files example_normal_coverages.list \
--genome hg19 --assay agilent_v6
# When normal panel VCF is available (highly recommended for
# unmatched samples)
$ Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--coverage-files example_normal_coverages.list \
--normal-panel $NORMAL_PANEL \
--genome hg19 \
--assay agilent_v6
# For a Mutect2/GATK4 normal panel GenomicsDB (beta)
$ Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--coverage-files example_normal_coverages.list \
--normal-panel $GENOMICSDB-WORKSPACE-PATH/pon_db \
--genome hg19 \
--assay agilent_v6
Important recommendations:
Consider generating different databases when differences are significant, e.g. for samples with different read lengths or insert size distributions
In particular, do not mix normal data obtained with different capture kits (e.g. Agilent SureSelect v4 and v6)
Provide a normal panel VCF here to precompute mapping bias for
faster runtimes. The only requirement for the VCF is an AD
format field containing the number of reference and alt reads for all
samples. See the example file
$PURECN/normalpanel.vcf.gz
.
For ideal results, examine the interval_weights.png
file to find good off-target bin widths. You will need to re-run
IntervalFile.R
with the
--average-off-target-width
parameter and re-calculate the
coverages. NormalDB.R
will also give a suggestion for a
good minimum width. We do not recommend going lower than this estimate;
setting --average-off-target-width
to value larger than
this value can decrease noise at the cost of loss in resolution. Setting
it to 1.2-1.5x the minimum recommendation (that should be ideally <
250kb) is a good starting point.
The --assay
argument is optional and is only used to
add the provided assay name to all output files
A warning pointing to the likely use of a wrong baits file means
that more than 5% of targets have close to 0 coverage in all normal
samples. A BED file with the low coverage targets will be generated in
--out-dir
. If for any reason there is no access to the
correct file, it is recommended to re-run the
IntervalFile.R
command and provide this BED file with
--exclude
.
Now that the assay-specific files are created and all coverages calculated, we run PureCN to normalize, segment and determine purity and ploidy:
mkdir $OUT/$SAMPLEID
# Without a matched normal (minimal test run)
$ Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}_coverage_loess.txt.gz \
--sampleid $SAMPLEID \
--vcf ${SAMPLEID}_mutect.vcf \
--normaldb $OUT_REF/normalDB_hg19.rds \
--intervals $OUT_REF/baits_hg19_intervals.txt \
--genome hg19
# Production pipeline run
$ Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}_coverage_loess.txt.gz \
--sampleid $SAMPLEID \
--vcf ${SAMPLEID}_mutect.vcf \
--stats-file ${SAMPLEID}_mutect_stats.txt \
--fun-segmentation PSCBS \
--normaldb $OUT_REF/normalDB_hg19.rds \
--mapping-bias-file $OUT_REF/mapping_bias_hg19.rds \
--intervals $OUT_REF/baits_hg19_intervals.txt \
--snp-blacklist hg19_simpleRepeats.bed \
--genome hg19 \
--model betabin \
--force --post-optimize --seed 123
# With a matched normal (test run; for production pipelines we recommend the
# unmatched workflow described above)
$ Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}_coverage_loess.txt.gz \
--normal $OUT/$SAMPLEID/${SAMPLEID_NORMAL}_coverage_loess.txt.gz \
--sampleid $SAMPLEID \
--vcf ${SAMPLEID}_mutect.vcf \
--normaldb $OUT_REF/normalDB_hg19.rds \
--intervals $OUT_REF/baits_hg19_intervals.txt \
--genome hg19
# Recreate output after manual curation of ${SAMPLEID}.csv
$ Rscript $PURECN/PureCN.R --rds $OUT/$SAMPLEID/${SAMPLEID}.rds
Important recommendations:
Even if matched normals are available, it is often better to use
the normal database for coverage normalization. When a matched
normal coverage is provided with --normal
then the pool of
normal coverage normalization and denoising steps are
skipped!
Always provide the normal coverage database to ignore low quality regions in the segmentation and to increase the sensitivity for homozygous deletions in high purity samples.
Double check that in --tumor
and
--normaldb
, GC-normalization is either used in both
(*_loess.txt.gz
) or skipped in both
(*_coverage.txt.gz
).
The normal panel VCF file is useful for mapping bias correction and especially recommended without matched normals. See the FAQ of the main vignette how to generate this file. It is not essential for test runs.
The MuTect 1.1.7 stats file (the main output file besides the VCF) should be provided for better artifact filtering. If the VCF was generated by a pipeline that performs good artifact filtering, this file is not needed. Do NOT provide this file for Mutect 2.
The --post-optimize
flag defines that purity should
be optimized using both variant allelic fractions and copy number
instead of copy number only. This results in a significant runtime
increase for whole-exome data.
If --out
is a directory, it will use the sample id
as file prefix for all output files. Otherwise PureCN
will use --out
as prefix.
The --parallel
flag will enable the parallel fitting
of local optima. See BiocParallel
for details. This script will use the default backend.
--cores
is a short cut to use the specified number of CPUs
instead of the default backend. Only specify one of the two arguments.
Note that memory usage can increase linearly with number of
cores and insufficient memory can result in random
crashes.
--fun-segmentation PSCBS
is the new recommendation
in 1.22. Support for interval weights currently requires a patch (see
Section @ref(installation)). See below for some more details on the best
choice of the method.
--model betabin
is the new recommendation in 1.22
with larger panel of normals (more than 10-15 normal samples).
Defaults are well calibrated and should produce close to ideal results for most samples. A few common cases where changing defaults makes sense:
High purity and high quality: For cancer types with a high
expected purity, such as ovarian cancer, AND when quality is expected to
be very good (high coverage, young samples),
--max-copy-number 8
. (PureCN
reports copy numbers greater than this value, but will stop fitting SNP
allelic fractions to the exact allele-specific copy number because this
will get impossible very quickly with high copy numbers - and
computationally expensive.)
Small panels with high coverage:
--interval-padding 100
(or higher), requires running the
variant caller with this padding or without interval file. Use the same
settings for the panel of normals VCF so that SNPs in the flanking
regions have reliable mapping bias estimates. The
--max-homozygous-loss
parameter might also need some
adjustment for very small panels with large gaps around captured
deletions.
Cell lines: Safely skip the search for low purity solutions in
cell lines: --max-copy-number 8
,
--min-purity 0.9
, --max-purity 0.99
. Add
--model-homozygous
to find regions of LOH in samples
without normal contamination (do not provide this flag when matched
normal data are available in the VCF).
cfDNA: --min-purity 0.1
, --min-af 0.01
(or lower) and --error 0.0005
(or lower, when there is
UMI-based error correction). Note that the estimated purity can be very
wrong when the true purity is below 5-7%; these samples are usually
flagged as non-aberrant.
All assays: --max-segments
should set to a value so
that with few exceptions only poor quality samples exceed this cutoff.
For cancer types with high heterogenity, it is also recommended to
increase --max-non-clonal
to 0.3-0.4 (this will increase
the runtime significantly for whole-exome data).
The choice of the segmentation function can also make a significant difference and unfortunately there is not yet a universal method that works best in all scenarios.
PSCBS: A good and safe starting point, especially with off-target regions that might exhibit different noise profiles compared to on-target.
GATK4: Most recent addition. Not yet well tested in PureCN, but theoretically best choice with a larger number of SNPs per intervals, for example assays with copy number backbones. We appreciate feedback.
CBS: Simple, fast and well tested. Does not fully support SNP information, so only recommended for settings with a very small SNPs/intervals ratio, for example small targeted panels (<1Mb) with healthy off-target coverage (<150kb resolution and similar log ratio noise compared to on-target).
copynumber: For cases with multiple time points or biopsies. This
is
automatically chosen with --additional-tumors
and currently
not supported in a single-sample analysis.
Hclust/none: For third-party segmentations. Hclust
clusters segments in an attempt to calibrate log-ratios across
chromosomes, none
largely keeps everything as
provided.
A few recommendations for checks whether the PureCN setup is correct:
The “Mean standard deviation of log-ratios” reported in the log file should be fairly low for high quality data. Older FFPE data can be around 0.4, but high coverage, relatively recent samples should approach the 0.15 minimum. If off-target is consistently noisier than on-target, it is probably worth increasing the off-target bin size and start from scratch (or in case of whole-exome sequencing, ignore off-target reads since they do not provide much additional information when bins are large and/or noisy).
Related to that, a warning is thrown when less than 10% of all intervals passing filters are off-target intervals. Whole-exome sequencing is usually around that value. If the log-ratio standard deviation is similar or even lower than the one for on-target, it is worth keeping off-target regions. Otherwise off-target might add more noise than signal. Off-target information is automatically ignored when the passing rate falls below 5% of all intervals.
The fraction of targets with SNPs should be between 10 and 15
percent. If it is significantly lower, make sure that the variant caller
was used with 50-100bp interval padding or no interval file at all. Also
check that the interval file was generated using the baits coordinates,
not the targets (the baits BED file should have a more even size
distribution, e.g. 120bp and multiples of it). If many variants are
removed by the default 25 base quality feature, you might be using
Mutect 2 and need to re-run PureCN.R with
--min-base-quality 20
.
“Initial testing for significant sample cross-contamination” in the log file should not have many false positives, i.e. should be “unlikely” for most samples, not “maybe”. Insufficient artifact removal can result in too many false SNPs calls with low allelic fractions, confusing the contamination caller.
Read all warnings.
Our internal PureCN normalization combined with the PSCBS or GATK4 segmentation should produce highly competitive results and we encourage users to try it and compare it to their existing pipelines. However, we realize that often it is not an option to change tools in production pipelines and we therefore made it relatively easy to use PureCN with third-party tools. We provide examples for CNVkit and GATK4 and it should be straightforward to adapt those for other tools.
What you will need:
Output of third-party tools (see details below)
VCF files for all tumor samples and some normal files (see main vignette for questions related to required normal samples)
If you already have a segmentation from third-party tools (for example CNVkit, GATK4, EXCAVATOR2). For a minimal test run:
Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--sampleid $SAMPLEID \
--seg-file $OUT/$SAMPLEID/${SAMPLEID}.cnvkit.seg \
--vcf ${SAMPLEID}_mutect.vcf \
--intervals $OUT_REF/baits_hg19_intervals.txt \
--genome hg19
See the main vignette for more details and file formats.
For a production pipeline run we provide again more information about the assay and genome. Here an CNVkit example:
# Recommended: Provide a normal panel VCF to remove mapping biases, pre-compute
# position-specific bias for much faster runtimes with large panels
# This needs to be done only once for each assay
Rscript $PURECN/NormalDB.R --out-dir $OUT_REF --normal-panel $NORMAL_PANEL \
--assay agilent_v6 --genome hg19 --force
# Export the segmentation in DNAcopy format
cnvkit.py export seg $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cns --enumerate-chroms \
-o $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg
# Run PureCN by providing the *.cnr and *.seg files
Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--sampleid $SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cnr \
--seg-file $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg \
--mapping-bias-file $OUT_REF/mapping_bias_agilent_v6_hg19.rds \
--vcf ${SAMPLEID}_mutect.vcf \
--stats-file ${SAMPLEID}_mutect_stats.txt \
--snp-blacklist hg19_simpleRepeats.bed \
--genome hg19 \
--fun-segmentation Hclust \
--force --post-optimize --seed 123
Important recommendations:
The --fun-segmentation
argument controls if the data
should to be re-segmented using germline BAFs (default). Set this value
to none
if the provided segmentation should be used as is.
The recommended Hclust
will only cluster provided
segments.
Since CNVkit provides all necessary information in the
*.cnr
output files, the --intervals
argument
is not required.
In test runs, especially when the input VCF contains matched
normal information, --mapping-bias-file
can be
skipped
CNVkit runs without normal reference samples are not recommended
The --stats-file
is only supported for Mutect
1.1.7. Mutect 2 provides the filter flags directly in the
VCF.
# Recommended: Provide a normal panel GenomicsDB to remove mapping
# biases, pre-compute position-specific bias for much faster runtimes
# with large panels. This needs to be done only once for each assay.
Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
--normal-panel $GENOMICSDB-WORKSPACE-PATH/pon_db \
--assay agilent_v6 --genome hg19 --force
Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--sampleid $SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}.hdf5 \
--log-ratio-file $OUT/$SAMPLEID/${SAMPLEID}.denoisedCR.tsv \
--seg-file $OUT/$SAMPLEID/${SAMPLEID}.modelFinal.seg \
--mapping-bias-file $OUT_REF/mapping_bias_agilent_v6_hg19.rds \
--vcf ${SAMPLEID}_mutect2_filtered.vcf \
--snp-blacklist hg19_simpleRepeats.bed \
--genome hg19 \
--fun-segmentation Hclust \
--force --post-optimize --seed 123
Important recommendations:
The --fun-segmentation
can be set to none in most
cases. This will keep the segmentation largely as provided.
Hclust
clusters segments to avoid over-segmentation and to
calibrate log-ratios across chromosomes. This will thus alter the GATK4
segmentation, which might not be desired.
Beta support for providing CollectAllelicCounts output
instead of Mutect is available. Use
--vcf ${SAMPLEID}.allelicCounts.tsv
to automatically import
the SNP counts and convert them into a supported VCF. Note that this
will not use any somatic SNV and indel information available in
Mutect VCFs and thus will also not provide any clonality
annotation.
Dx.R
provides copy number and mutation metrics commonly
used as biomarkers, most importantly tumor mutational burden (TMB),
chromosomal instability (CIN) and mutational signatures.
# Provide a BED file with callable regions, for examples obtained by
# GATK CallableLoci. Useful to calculate mutations per megabase and
# to exclude low quality regions.
grep CALLABLE ${SAMPLEID}_callable_status.bed > \
${SAMPLEID}_callable_status_filtered.bed
# Only count mutations in callable regions, also subtract what was
# ignored in PureCN.R via --snp-blacklist, like simple repeats, from the
# mutation per megabase calculation
# Also search for the COSMIC mutation signatures
# (http://cancer.sanger.ac.uk/cosmic/signatures)
Rscript $PureCN/Dx.R --out $OUT/$SAMPLEID/$SAMPLEID \
--rds $OUT/SAMPLEID/${SAMPLEID}.rds \
--callable ${SAMPLEID}_callable_status_filtered.bed \
--exclude hg19_simpleRepeats.bed \
--signatures
# Restrict mutation burden calculation to coding sequences
Rscript $PureCN/FilterCallableLoci.R --genome hg19 \
--in-file ${SAMPLEID}_callable_status_filtered.bed \
--out-file ${SAMPLEID}_callable_status_filtered_cds.bed \
--exclude '^HLA'
Rscript $PureCN/Dx.R --out $OUT/$SAMPLEID/${SAMPLEID}_cds \
--rds $OUT/SAMPLEID/${SAMPLEID}.rds \
--callable ${SAMPLEID}_callable_status_filtered_cds.bed \
--exclude hg19_simpleRepeats.bed
Important recommendations:
Run GATK CallableLoci with --minDepth N
where N is roughly 20% of the mean target coverage of all
samples.
If --callable
is missing, all intervals passing
filters are assumed to be callable.
Argument name | Corresponding PureCN argument | PureCN function |
---|---|---|
--fasta |
reference.file |
preprocessIntervals |
--in-file |
interval.file |
preprocessIntervals |
--off-target |
off.target |
preprocessIntervals |
--average-target-width |
average.target.width |
preprocessIntervals |
--min-target-width |
min.target.width |
preprocessIntervals |
--small-targets |
small.targets |
preprocessIntervals |
--average-off-target-width |
average.off.target.width |
preprocessIntervals |
--off-target-seqlevels |
off.target.seqlevels |
preprocessIntervals |
--mappability |
mappability |
preprocessIntervals |
--min-mappability |
min.mappability |
preprocessIntervals |
--reptiming |
reptiming |
preprocessIntervals |
--average-reptiming-width |
average.reptiming.width |
preprocessIntervals |
--genome |
txdb , org |
annotateTargets |
--out-file |
||
--export |
rtracklayer::export |
|
--version -v |
||
--force -f |
||
--help -h |
Argument name | Corresponding PureCN argument | PureCN function |
---|---|---|
--bam |
bam.file |
calculateBamCoverageByInterval |
--bai |
index.file |
calculateBamCoverageByInterval |
--coverage |
coverage.file |
correctCoverageBias |
--intervals |
interval.file |
correctCoverageBias |
--method |
method |
correctCoverageBias |
--keep-duplicates |
keep.duplicates |
calculateBamCoverageByInterval |
--chunks |
chunks |
calculateBamCoverageByInterval |
--remove-mapq0 |
mapqFilter |
ScanBamParam |
--skip-gc-norm |
correctCoverageBias |
|
--out-dir |
||
--cores |
Number of CPUs to use when multiple BAMs are provided | |
--parallel |
Use default BiocParallel backend when multiple BAMs are provided | |
--seed |
||
--version -v |
||
--force -f |
||
--help -h |
Argument name | Corresponding PureCN argument | PureCN function |
---|---|---|
--coverage-files |
normal.coverage.files |
createNormalDatabase |
--normal-panel |
normal.panel.vcf.file |
calculateMappingBiasVcf |
--assay -a |
Optional assay name | Used in output file names. |
--genome -g |
Optional genome version | Used in output file names. |
--genomicsdb-af-field |
For GenomicsDB import, allelic fraction field | calculateMappingBiasGatk4 |
--min-normals-position-specific-fit |
min.normals.position.specific.fit |
calculateMappingBiasVcf ,
calculateMappingBiasGatk4 |
--out-dir -o |
||
--version -v |
||
--force -f |
||
--help -h |
Argument name | Corresponding PureCN argument | PureCN function |
---|---|---|
--sampleid -i |
sampleid |
runAbsoluteCN |
--normal |
normal.coverage.file |
runAbsoluteCN |
--tumor |
tumor.coverage.file |
runAbsoluteCN |
--vcf |
vcf.file |
runAbsoluteCN |
--rds |
file.rds |
readCurationFile |
--mapping-bias-file |
mapping.bias.file |
setMappingBiasVcf |
--normaldb |
normalDB (serialized with saveRDS ) |
calculateTangentNormal , filterTargets |
--seg-file |
seg.file |
runAbsoluteCN |
--log-ratio-file |
log.ratio |
runAbsoluteCN |
--additional-tumors |
tumor.coverage.files |
processMultipleSamples |
--sex |
sex |
runAbsoluteCN |
--genome |
genome |
runAbsoluteCN |
--intervals |
interval.file |
runAbsoluteCN |
--stats-file |
stats.file |
filterVcfMuTect |
--min-af |
af.range |
filterVcfBasic |
--snp-blacklist |
snp.blacklist |
filterVcfBasic |
--error |
error |
runAbsoluteCN |
--db-info-flag |
DB.info.flag |
runAbsoluteCN |
--popaf-info-field |
POPAF.info.field |
runAbsoluteCN |
--cosmic-cnt-info-field |
Cosmic.CNT.info.field |
runAbsoluteCN |
--min-cosmic-cnt |
min.cosmic.cnt |
setPriorVcf |
--interval-padding |
interval.padding |
filterVcfBasic |
--min-total-counts |
min.total.counts |
filterIntervals |
--min-fraction-offtarget |
min.fraction.offtarget |
filterIntervals |
--fun-segmentation |
fun.segmentation |
runAbsoluteCN |
--alpha |
alpha |
segmentationCBS |
--undo-sd |
undo.SD |
segmentationCBS |
--changepoints-penalty |
changepoints.penalty |
segmentationGATK4 |
--additional-cmd-args |
additional.cmd.args |
segmentationGATK4 |
--max-segments |
max.segments |
runAbsoluteCN |
--min-logr-sdev |
min.logr.sdev |
runAbsoluteCN |
--min-purity |
test.purity |
runAbsoluteCN |
--max-purity |
test.purity |
runAbsoluteCN |
--min-ploidy |
min.ploidy |
runAbsoluteCN |
--max-ploidy |
max.ploidy |
runAbsoluteCN |
--max-copy-number |
test.num.copy |
runAbsoluteCN |
--post-optimize |
post.optimize |
runAbsoluteCN |
--bootstrap-n |
n |
bootstrapResults |
--speedup-heuristics |
speedup.heuristics |
runAbsoluteCN |
--model-homozygous |
model.homozygous |
runAbsoluteCN |
--model |
model |
runAbsoluteCN |
--log-ratio-calibration |
log.ratio.calibration |
runAbsoluteCN |
--max-non-clonal |
max.non.clonal |
runAbsoluteCN |
--max-homozygous-loss |
max.homozygous.loss |
runAbsoluteCN |
--out-vcf |
return.vcf |
predictSomatic |
--out -o |
||
--parallel |
BPPARAM |
runAbsoluteCN |
--cores |
BPPARAM |
runAbsoluteCN |
--seed |
||
--version -v |
||
--force -f |
||
--help -h |
Argument name | Corresponding PureCN argument | PureCN function |
---|---|---|
--rds |
file.rds |
readCurationFile |
--callable |
callable |
callMutationBurden |
--exclude |
exclude |
callMutationBurden |
--max-prior-somatic |
max.prior.somatic |
callMutationBurden |
--signatures |
deconstructSigs::whichSignatures |
|
--signature-databases |
deconstructSigs::whichSignatures |
|
--out |
||
--version -v |
||
--force -f |
||
--help -h |