Title: | Copy number calling and SNV classification using targeted short read sequencing |
---|---|
Description: | This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples. |
Authors: | Markus Riester [aut, cre] , Angad P. Singh [aut] |
Maintainer: | Markus Riester <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.13.0 |
Built: | 2024-10-31 03:48:25 UTC |
Source: | https://github.com/bioc/PureCN |
This function can be used to adjust the log ratio for tumor purity and ploidy for downstream tools that expect a log2 ratio (for example GISTIC).
adjustLogRatio(ratio, purity, ploidy, is.log2 = TRUE, min.ratio = 2^-8)
adjustLogRatio(ratio, purity, ploidy, is.log2 = TRUE, min.ratio = 2^-8)
ratio |
Vector of log2 tumor vs normal coverage ratios. |
purity |
Purity of sample. |
ploidy |
Ploidy of sample. |
is.log2 |
|
min.ratio |
Minimum (non-log2-transformed) ratio. Set to approx -8
|
numeric(length(log.ratio))
, log.ratio
adjusted
for purity
and ploidy
Markus Riester
Nature Biotechnology. * Toal (2018), https://github.com/lima1/PureCN/issues/40
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") normal <- readCoverageFile(normal.coverage.file) tumor <- readCoverageFile(tumor.coverage.file) log.ratio <- calculateLogRatio(normal, tumor) log.ratio.adjusted <- adjustLogRatio(log.ratio, 0.65, 1.73)
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") normal <- readCoverageFile(normal.coverage.file) tumor <- readCoverageFile(tumor.coverage.file) log.ratio <- calculateLogRatio(normal, tumor) log.ratio.adjusted <- adjustLogRatio(log.ratio, 0.65, 1.73)
This function can be used to add a ‘Gene’ meta column containing
gene symbols to a GRanges
object.
It applies heuristics to find the protein coding genes that were
likely meant to target in the assay design in case transcripts
overlap.
annotateTargets(x, txdb, org)
annotateTargets(x, txdb, org)
x |
A |
txdb |
A |
org |
A |
A GRanges
object.
Markus Riester
library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(org.Hs.eg.db) normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") x <- head(readCoverageFile(normal.coverage.file), 100) x <- annotateTargets(x,TxDb.Hsapiens.UCSC.hg19.knownGene, org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(org.Hs.eg.db) normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") x <- head(readCoverageFile(normal.coverage.file), 100) x <- annotateTargets(x,TxDb.Hsapiens.UCSC.hg19.knownGene, org.Hs.eg.db)
This function bootstraps variants, then optionally re-ranks solutions by using the bootstrap estimate of the likelihood score, and then optionally removes solutions that never ranked high in any bootstrap replicate.
bootstrapResults(res, n = 500, top = NULL, reorder = FALSE)
bootstrapResults(res, n = 500, top = NULL, reorder = FALSE)
res |
Return object of the |
n |
Number of bootstrap replicates. |
top |
Include solution if it appears in the top |
reorder |
Reorder results by bootstrap value. |
Returns a runAbsoluteCN
object with added bootstrap
value to each solution. This value
is the fraction of bootstrap replicates in which the solution ranked first.
Markus Riester
data(purecn.example.output) ret.boot <- bootstrapResults(purecn.example.output, n=100) plotAbs(ret.boot, type="overview")
data(purecn.example.output) ret.boot <- bootstrapResults(purecn.example.output, n=100) plotAbs(ret.boot, type="overview")
Takes a BAM file and an interval file as input and returns coverage for each
interval. Coverage should be then GC-normalized using the
correctCoverageBias
function before determining purity and
ploidy with runAbsoluteCN
. Uses the scanBam
function
and applies low quality, duplicate reads as well as secondary alignment
filters.
calculateBamCoverageByInterval( bam.file, interval.file, output.file = NULL, index.file = bam.file, keep.duplicates = FALSE, chunks = 20, ... )
calculateBamCoverageByInterval( bam.file, interval.file, output.file = NULL, index.file = bam.file, keep.duplicates = FALSE, chunks = 20, ... )
bam.file |
Filename of a BAM file. |
interval.file |
File specifying the intervals. Interval is expected in first column in format CHR:START-END. |
output.file |
Optionally, write minimal coverage file. Can be read with
the |
index.file |
The bai index. This is expected without the .bai file
suffix, see |
keep.duplicates |
Keep or remove duplicated reads. |
chunks |
Split |
... |
Additional parameters passed to |
Returns total and average coverage by intervals.
Markus Riester
preprocessIntervals
correctCoverageBias runAbsoluteCN
bam.file <- system.file("extdata", "ex1.bam", package = "PureCN", mustWork = TRUE) interval.file <- system.file("extdata", "ex1_intervals.txt", package = "PureCN", mustWork = TRUE) # Calculate raw coverage from BAM file. These need to be corrected for # GC-bias using the correctCoverageBias function before determining purity # and ploidy. coverage <- calculateBamCoverageByInterval(bam.file = bam.file, interval.file = interval.file)
bam.file <- system.file("extdata", "ex1.bam", package = "PureCN", mustWork = TRUE) interval.file <- system.file("extdata", "ex1_intervals.txt", package = "PureCN", mustWork = TRUE) # Calculate raw coverage from BAM file. These need to be corrected for # GC-bias using the correctCoverageBias function before determining purity # and ploidy. coverage <- calculateBamCoverageByInterval(bam.file = bam.file, interval.file = interval.file)
This function is automatically called by runAbsoluteCN
when
normal and tumor coverage are provided (and not a segmentation file or
target-level log-ratios). This function is therefore normally not called by
the user.
calculateLogRatio(normal, tumor)
calculateLogRatio(normal, tumor)
normal |
Normal coverage read in by the |
tumor |
Tumor coverage read in by the |
numeric(length(tumor))
, tumor vs. normal copy number log-ratios
for all targets.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") normal <- readCoverageFile(normal.coverage.file) tumor <- readCoverageFile(tumor.coverage.file) log.ratio <- calculateLogRatio(normal, tumor)
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") normal <- readCoverageFile(normal.coverage.file) tumor <- readCoverageFile(tumor.coverage.file) log.ratio <- calculateLogRatio(normal, tumor)
Function calculate mapping bias for each variant in the provided panel of normals GenomicsDB.
calculateMappingBiasGatk4( workspace, reference.genome, min.normals = 1, min.normals.betafit = 7, min.normals.assign.betafit = 3, min.normals.position.specific.fit = 10, min.median.coverage.betafit = 5, num.betafit.clusters = 9, min.betafit.rho = 1e-04, max.betafit.rho = 0.2, AF.info.field = "AF" )
calculateMappingBiasGatk4( workspace, reference.genome, min.normals = 1, min.normals.betafit = 7, min.normals.assign.betafit = 3, min.normals.position.specific.fit = 10, min.median.coverage.betafit = 5, num.betafit.clusters = 9, min.betafit.rho = 1e-04, max.betafit.rho = 0.2, AF.info.field = "AF" )
workspace |
Path to the GenomicsDB created by |
reference.genome |
Reference FASTA file. |
min.normals |
Minimum number of normals with heterozygous SNP for calculating position-specific mapping bias. |
min.normals.betafit |
Minimum number of normals with heterozygous SNP fitting a beta distribution |
min.normals.assign.betafit |
Minimum number of normals with heterozygous SNPs to assign to a beta binomal fit cluster |
min.normals.position.specific.fit |
Minimum normals to use position-specific beta-binomial fits. Otherwise only clustered fits are used. |
min.median.coverage.betafit |
Minimum median coverage of normals with heterozygous SNP for fitting a beta distribution |
num.betafit.clusters |
Maximum number of beta binomial fit clusters |
min.betafit.rho |
Minimum dispersion factor rho |
max.betafit.rho |
Maximum dispersion factor rho |
AF.info.field |
Field in the |
A GRanges
object with mapping bias and number of normal
samples with this variant.
Markus Riester
## Not run: resources_file <- system.file("extdata", "gatk4_pon_db.tgz", package = "PureCN") tmp_dir <- tempdir() untar(resources_file, exdir = tmp_dir) workspace <- file.path(tmp_dir, "gatk4_pon_db") bias <- calculateMappingBiasGatk4(workspace, "hg19") saveRDS(bias, "mapping_bias.rds") unlink(tmp_dir, recursive=TRUE) ## End(Not run)
## Not run: resources_file <- system.file("extdata", "gatk4_pon_db.tgz", package = "PureCN") tmp_dir <- tempdir() untar(resources_file, exdir = tmp_dir) workspace <- file.path(tmp_dir, "gatk4_pon_db") bias <- calculateMappingBiasGatk4(workspace, "hg19") saveRDS(bias, "mapping_bias.rds") unlink(tmp_dir, recursive=TRUE) ## End(Not run)
Function calculate mapping bias for each variant in the provided panel of normals VCF.
calculateMappingBiasVcf( normal.panel.vcf.file, min.normals = 1, min.normals.betafit = 7, min.normals.assign.betafit = 3, min.normals.position.specific.fit = 10, min.median.coverage.betafit = 5, num.betafit.clusters = 9, min.betafit.rho = 1e-04, max.betafit.rho = 0.2, yieldSize = 50000, genome )
calculateMappingBiasVcf( normal.panel.vcf.file, min.normals = 1, min.normals.betafit = 7, min.normals.assign.betafit = 3, min.normals.position.specific.fit = 10, min.median.coverage.betafit = 5, num.betafit.clusters = 9, min.betafit.rho = 1e-04, max.betafit.rho = 0.2, yieldSize = 50000, genome )
normal.panel.vcf.file |
|
min.normals |
Minimum number of normals with heterozygous SNP for calculating position-specific mapping bias. |
min.normals.betafit |
Minimum number of normals with heterozygous SNP fitting a beta binomial distribution |
min.normals.assign.betafit |
Minimum number of normals with heterozygous SNPs to assign to a beta binomal fit cluster |
min.normals.position.specific.fit |
Minimum normals to use position-specific beta-binomial fits. Otherwise only clustered fits are used. |
min.median.coverage.betafit |
Minimum median coverage of normals with heterozygous SNP for fitting a beta binomial distribution |
num.betafit.clusters |
Maximum number of beta binomial fit clusters |
min.betafit.rho |
Minimum dispersion factor rho |
max.betafit.rho |
Maximum dispersion factor rho |
yieldSize |
See |
genome |
See |
A GRanges
object with mapping bias and number of normal
samples with this variant.
Markus Riester
normal.panel.vcf <- system.file("extdata", "normalpanel.vcf.gz", package = "PureCN") bias <- calculateMappingBiasVcf(normal.panel.vcf, genome = "h19") saveRDS(bias, "mapping_bias.rds")
normal.panel.vcf <- system.file("extdata", "normalpanel.vcf.gz", package = "PureCN") bias <- calculateMappingBiasVcf(normal.panel.vcf, genome = "h19") saveRDS(bias, "mapping_bias.rds")
This function calculates the probability of correctly rejecting the null hypothesis that an alt allele is a sequencing error rather than a true (mono-)clonal mutation.
calculatePowerDetectSomatic( coverage, f = NULL, purity = NULL, ploidy = NULL, cell.fraction = 1, error = 0.001, fpr = 5e-07, verbose = TRUE )
calculatePowerDetectSomatic( coverage, f = NULL, purity = NULL, ploidy = NULL, cell.fraction = 1, error = 0.001, fpr = 5e-07, verbose = TRUE )
coverage |
Mean sequencing coverage. |
f |
Mean expected allelic fraction. If |
purity |
Purity of sample. Only required when |
ploidy |
Ploidy of sample. Only required when |
cell.fraction |
Fraction of cells harboring mutation. Ignored if
|
error |
Estimated sequencing error rate. |
fpr |
Required false positive rate for mutation vs. sequencing error. |
verbose |
Verbose output. |
A list with elements
power |
Power to detect somatic mutations. |
k |
Minimum number of supporting reads. |
f |
Expected allelic fraction. |
Markus Riester
Carter et al. (2012), Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology.
purity <- c(0.1, 0.15, 0.2, 0.25, 0.4, 0.6, 1) coverage <- seq(5, 35, 1) power <- lapply(purity, function(p) sapply(coverage, function(cv) calculatePowerDetectSomatic(coverage = cv, purity = p, ploidy = 2, verbose = FALSE)$power)) # Figure S7b in Carter et al. plot(coverage, power[[1]], col = 1, xlab = "Sequence coverage", ylab = "Detection power", ylim = c(0, 1), type = "l") for (i in 2:length(power)) lines(coverage, power[[i]], col = i) abline(h = 0.8, lty = 2, col = "grey") legend("bottomright", legend = paste("Purity", purity), fill = seq_along(purity)) # Figure S7c in Carter et al. coverage <- seq(5, 350, 1) power <- lapply(purity, function(p) sapply(coverage, function(cv) calculatePowerDetectSomatic(coverage = cv, purity = p, ploidy = 2, cell.fraction = 0.2, verbose = FALSE)$power)) plot(coverage, power[[1]], col = 1, xlab = "Sequence coverage", ylab = "Detection power", ylim = c(0, 1), type = "l") for (i in 2:length(power)) lines(coverage, power[[i]], col = i) abline(h = 0.8, lty = 2, col = "grey") legend("bottomright", legend = paste("Purity", purity), fill = seq_along(purity))
purity <- c(0.1, 0.15, 0.2, 0.25, 0.4, 0.6, 1) coverage <- seq(5, 35, 1) power <- lapply(purity, function(p) sapply(coverage, function(cv) calculatePowerDetectSomatic(coverage = cv, purity = p, ploidy = 2, verbose = FALSE)$power)) # Figure S7b in Carter et al. plot(coverage, power[[1]], col = 1, xlab = "Sequence coverage", ylab = "Detection power", ylim = c(0, 1), type = "l") for (i in 2:length(power)) lines(coverage, power[[i]], col = i) abline(h = 0.8, lty = 2, col = "grey") legend("bottomright", legend = paste("Purity", purity), fill = seq_along(purity)) # Figure S7c in Carter et al. coverage <- seq(5, 350, 1) power <- lapply(purity, function(p) sapply(coverage, function(cv) calculatePowerDetectSomatic(coverage = cv, purity = p, ploidy = 2, cell.fraction = 0.2, verbose = FALSE)$power)) plot(coverage, power[[1]], col = 1, xlab = "Sequence coverage", ylab = "Detection power", ylim = c(0, 1), type = "l") for (i in 2:length(power)) lines(coverage, power[[i]], col = i) abline(h = 0.8, lty = 2, col = "grey") legend("bottomright", legend = paste("Purity", purity), fill = seq_along(purity))
Reimplementation of GATK4 denoising. Please cite the relevant GATK publication if you use this in a publication.
calculateTangentNormal( tumor.coverage.file, normalDB, num.eigen = 20, ignore.sex = FALSE, sex = NULL )
calculateTangentNormal( tumor.coverage.file, normalDB, num.eigen = 20, ignore.sex = FALSE, sex = NULL )
tumor.coverage.file |
Coverage file or data of a tumor sample. |
normalDB |
Database of normal samples, created with
|
num.eigen |
Number of eigen vectors used. |
ignore.sex |
If |
sex |
Sex of sample. If |
Markus Riester
tumor.coverage.file <- system.file('extdata', 'example_tumor.txt.gz', package = 'PureCN') normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) pool <- calculateTangentNormal(tumor.coverage.file, normalDB)
tumor.coverage.file <- system.file('extdata', 'example_tumor.txt.gz', package = 'PureCN') normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) pool <- calculateTangentNormal(tumor.coverage.file, normalDB)
Function to extract major copy number alterations from a
runAbsoluteCN
return object.
callAlterations( res, id = 1, cutoffs = c(0.5, 6, 7), log.ratio.cutoffs = c(-0.9, 0.9), failed = NULL, all.genes = FALSE )
callAlterations( res, id = 1, cutoffs = c(0.5, 6, 7), log.ratio.cutoffs = c(-0.9, 0.9), failed = NULL, all.genes = FALSE )
res |
Return object of the |
id |
Candidate solutions to be used. |
cutoffs |
Copy numbers cutoffs to call losses, focal amplifications and broad amplifications. |
log.ratio.cutoffs |
Copy numbers log-ratio cutoffs to call losses and amplifications in failed samples. |
failed |
Indicates whether sample was failed. If |
all.genes |
If |
A data.frame
with gene-level amplification and deletion
calls.
Markus Riester
data(purecn.example.output) callAlterations(purecn.example.output) callAlterations(purecn.example.output, all.genes=TRUE)["ESR2",]
data(purecn.example.output) callAlterations(purecn.example.output) callAlterations(purecn.example.output, all.genes=TRUE)["ESR2",]
This function can be used to obtain gene-level copy number calls from segmentations. This is useful for comparing PureCN's segmentations with segmentations obtained by different tools on the gene-level. Segmentation file can contain multiple samples.
callAlterationsFromSegmentation( sampleid, chr, start, end, num.mark = NA, seg.mean, C, interval.file, fun.focal = findFocal, args.focal = list(), ... )
callAlterationsFromSegmentation( sampleid, chr, start, end, num.mark = NA, seg.mean, C, interval.file, fun.focal = findFocal, args.focal = list(), ... )
sampleid |
The sampleid column in the segmentation file. |
chr |
The chromosome column. |
start |
The start positions of the segments. |
end |
The end positions of the segments. |
num.mark |
Optionally, the number of probes or markers in each segment. |
seg.mean |
The segment mean. |
C |
The segment integer copy number. |
interval.file |
A mapping file that assigns GC content and gene symbols
to each exon in the coverage files. Used for generating gene-level calls.
First column in format CHR:START-END. Second column GC content (0 to 1).
Third column gene symbol. This file is generated with the
|
fun.focal |
Function for identifying focal amplifications. Defaults to
|
args.focal |
Arguments for focal amplification function. |
... |
Arguments passed to |
A list of callAlterations
data.frame
objects,
one for each sample.
Markus Riester
data(purecn.example.output) seg <- purecn.example.output$results[[1]]$seg interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") calls <- callAlterationsFromSegmentation(sampleid = seg$ID, chr = seg$chrom, start = seg$loc.start, end = seg$loc.end, num.mark = seg$num.mark, seg.mean = seg$seg.mean, C = seg$C, interval.file = interval.file)
data(purecn.example.output) seg <- purecn.example.output$results[[1]]$seg interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") calls <- callAlterationsFromSegmentation(sampleid = seg$ID, chr = seg$chrom, start = seg$loc.start, end = seg$loc.end, num.mark = seg$num.mark, seg.mean = seg$seg.mean, C = seg$C, interval.file = interval.file)
Function to extract amplification from a
runAbsoluteCN
return object in samples of too low purity
for the standard callAlterations
.
callAmplificationsInLowPurity( res, normalDB, pvalue.cutoff = 0.001, percentile.cutoff = 90, min.width = 3, all.genes = FALSE, purity = NULL, BPPARAM = NULL )
callAmplificationsInLowPurity( res, normalDB, pvalue.cutoff = 0.001, percentile.cutoff = 90, min.width = 3, all.genes = FALSE, purity = NULL, BPPARAM = NULL )
res |
Return object of the |
normalDB |
Normal database, created with
|
pvalue.cutoff |
Copy numbers log-ratio cutoffs to call
amplifications as calculating using the log-ratios observed in
|
percentile.cutoff |
Only report genes with log2-ratio mean exceeding this sample-wise cutoff. |
min.width |
Minimum number of targets |
all.genes |
If |
purity |
If not |
BPPARAM |
|
A data.frame
with gene-level amplification calls.
Markus Riester
data(purecn.example.output) normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) callAmplificationsInLowPurity(purecn.example.output, normalDB)["EIF2A", ]
data(purecn.example.output) normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) callAmplificationsInLowPurity(purecn.example.output, normalDB)["EIF2A", ]
This function provides detailed CIN information.
callCIN( res, id = 1, allele.specific = TRUE, reference.state = c("dominant", "normal") )
callCIN( res, id = 1, allele.specific = TRUE, reference.state = c("dominant", "normal") )
res |
Return object of the |
id |
Candidate solution to extract CIN from. |
allele.specific |
Use allele-specific or only total copy number for
detecting abnormal regions. Copy-number neutral LOH would be ignored when
this parameter is set to |
reference.state |
Copy number regions different from the reference
state are counted as abnormal. Default is |
Returns double(1)
with CIN value.
Markus Riester
data(purecn.example.output) head(callCIN(purecn.example.output))
data(purecn.example.output) head(callCIN(purecn.example.output))
This function provides detailed LOH information by region.
callLOH(res, id = 1, arm.cutoff = 0.9, keep.no.snp.segments = TRUE)
callLOH(res, id = 1, arm.cutoff = 0.9, keep.no.snp.segments = TRUE)
res |
Return object of the |
id |
Candidate solution to extract LOH from. |
arm.cutoff |
Min fraction LOH on a chromosome arm to call whole arm events. |
keep.no.snp.segments |
Segments without heterozygous SNPs have no LOH information. This defines whether these segments should be reported anyways. |
Returns data.frame
with LOH regions.
Markus Riester
data(purecn.example.output) head(callLOH(purecn.example.output))
data(purecn.example.output) head(callLOH(purecn.example.output))
This function provides detailed mutation burden information.
callMutationBurden( res, id = 1, remove.flagged = TRUE, min.prior.somatic = 0.1, max.prior.somatic = 1, min.cellfraction = 0, fun.countMutation = function(vcf) width(vcf) == 1, callable = NULL, exclude = NULL )
callMutationBurden( res, id = 1, remove.flagged = TRUE, min.prior.somatic = 0.1, max.prior.somatic = 1, min.cellfraction = 0, fun.countMutation = function(vcf) width(vcf) == 1, callable = NULL, exclude = NULL )
res |
Return object of the |
id |
Candidate solution to extract mutation burden from.
|
remove.flagged |
Remove variants flagged by
|
min.prior.somatic |
Exclude variants with somatic prior probability lower than this cutoff. |
max.prior.somatic |
Exclude variants with somatic prior probability higher than this cutoff. This is useful for removing hotspot mutations in small panels that might inflate the mutation burden. |
min.cellfraction |
Exclude variants with cellular fraction lower than this cutoff. These are sub-clonal mutations or artifacts with very low allelic fraction. |
fun.countMutation |
Function that can be used to filter the
input VCF further for filtering, for example to only keep missense
mutations. Expects a |
callable |
|
exclude |
|
Returns data.frame
with mutation counts and sizes
of callable regions.
Markus Riester
data(purecn.example.output) callMutationBurden(purecn.example.output) # To calculate exact mutations per megabase, we can provide a BED # file containing all callable regions callableBed <- import(system.file("extdata", "example_callable.bed.gz", package = "PureCN")) # We can exclude some regions for mutation burden calculation, # for example intronic regions. exclude <- GRanges(seqnames = "chr1", IRanges(start = 1, end = max(end(callableBed)))) # We can also exclude specific mutations by filtering the input VCF myVcfFilter <- function(vcf) seqnames(vcf)!="chr2" callsCallable <- callMutationBurden(purecn.example.output, callable = callableBed, exclude = exclude, fun.countMutation = myVcfFilter)
data(purecn.example.output) callMutationBurden(purecn.example.output) # To calculate exact mutations per megabase, we can provide a BED # file containing all callable regions callableBed <- import(system.file("extdata", "example_callable.bed.gz", package = "PureCN")) # We can exclude some regions for mutation burden calculation, # for example intronic regions. exclude <- GRanges(seqnames = "chr1", IRanges(start = 1, end = max(end(callableBed)))) # We can also exclude specific mutations by filtering the input VCF myVcfFilter <- function(vcf) seqnames(vcf)!="chr2" callsCallable <- callMutationBurden(purecn.example.output, callable = callableBed, exclude = exclude, fun.countMutation = myVcfFilter)
A list of data.frames containing centromere positions for hg18, hg19 and hg38. Downloaded from the UCSC genome browser.
data(centromeres)
data(centromeres)
A list with three data frames, "hg18", "hg19", and "hg38". Each containes three columns
chrom
a factor with levels chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY
chromStart
a numeric vector
chromEnd
a numeric vector
The script downloadCentromeres.R in the extdata directory was used to generate the data.frames.
data(centromeres)
data(centromeres)
Takes as input coverage data and a mapping file for GC content and optionally replication timing. Will then normalize coverage data for GC-bias. Plots the pre and post normalization GC profiles.
correctCoverageBias( coverage.file, interval.file, output.file = NULL, plot.bias = FALSE, plot.max.density = 50000, output.qc.file = NULL )
correctCoverageBias( coverage.file, interval.file, output.file = NULL, plot.bias = FALSE, plot.max.density = 50000, output.qc.file = NULL )
coverage.file |
Coverage file or coverage data parsed with the
|
interval.file |
File providing GC content for each exon in the coverage
files. First column in format CHR:START-END. Additional optional columns
provide gene symbols, mappability and replication timing. This file is
generated with the |
output.file |
Optionally, write file with GC corrected coverage. Can be
read with the |
plot.bias |
Optionally, plot profiles of the pre-normalized and post-normalized coverage. Provides a quick visual check of coverage bias. |
plot.max.density |
By default, if the number of intervals in the
probe-set is > 50000, uses a kernel density estimate to plot the coverage
distribution. This uses the |
output.qc.file |
Write miscellaneous coverage QC metrics to file. |
Angad Singh, Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") coverage <- correctCoverageBias(normal.coverage.file, interval.file)
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") coverage <- correctCoverageBias(normal.coverage.file, interval.file)
Function to create a CSV file that can be used to mark the correct solution
in the output of a runAbsoluteCN
run.
createCurationFile( file.rds, overwrite.uncurated = TRUE, overwrite.curated = FALSE )
createCurationFile( file.rds, overwrite.uncurated = TRUE, overwrite.curated = FALSE )
file.rds |
Output of the |
overwrite.uncurated |
Overwrite existing files unless flagged as ‘Curated’. |
overwrite.curated |
Overwrite existing files even if flagged as ‘Curated’. |
A data.frame
with the tumor purity and ploidy of the maximum
likelihood solution.
Markus Riester
data(purecn.example.output) file.rds <- "Sample1_PureCN.rds" saveRDS(purecn.example.output, file = file.rds) createCurationFile(file.rds)
data(purecn.example.output) file.rds <- "Sample1_PureCN.rds" saveRDS(purecn.example.output, file = file.rds) createCurationFile(file.rds)
Function to create a database of normal samples, used to normalize tumor coverages.
createNormalDatabase( normal.coverage.files, sex = NULL, coverage.outliers = c(0.25, 4), min.coverage = 0.25, max.missing = 0.03, low.coverage = 15, optimal.off.target.counts = 120, plot = FALSE, ... )
createNormalDatabase( normal.coverage.files, sex = NULL, coverage.outliers = c(0.25, 4), min.coverage = 0.25, max.missing = 0.03, low.coverage = 15, optimal.off.target.counts = 120, plot = FALSE, ... )
normal.coverage.files |
Vector with file names pointing to coverage files of normal samples. |
sex |
|
coverage.outliers |
Exclude samples with coverages below or above the specified cutoffs (fractions of the normal sample coverages median). Only for databases with more than 5 samples. |
min.coverage |
Exclude intervals with coverage lower than the specified fraction of the chromosome median in the pool of normals. |
max.missing |
Exclude intervals with zero coverage in the specified fraction of normal samples. |
low.coverage |
Specifies the maximum number of total reads (NOT average coverage) to call a target low coverage. |
optimal.off.target.counts |
Used to suggest an optimal off-target interval width (BETA). |
plot |
Diagnostics plot, useful to tune parameters. |
... |
Arguments passed to the |
A normal database that can be used in the
calculateTangentNormal
function to retrieve a coverage
normalization sample for a given tumor sample.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files)
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files)
This function determines which intervals in the coverage files should be
included or excluded in the segmentation. It is called via the
fun.filterIntervals
argument of runAbsoluteCN
. The
arguments are passed via args.filterIntervals
.
filterIntervals( normal, tumor, log.ratio, seg.file, filter.lowhigh.gc = 0.001, min.coverage = 15, min.total.counts = 100, min.targeted.base = 5, min.mappability = c(0.6, 0.1), min.fraction.offtarget = 0.05, normalDB = NULL )
filterIntervals( normal, tumor, log.ratio, seg.file, filter.lowhigh.gc = 0.001, min.coverage = 15, min.total.counts = 100, min.targeted.base = 5, min.mappability = c(0.6, 0.1), min.fraction.offtarget = 0.05, normalDB = NULL )
normal |
Coverage data for normal sample. |
tumor |
Coverage data for tumor sample. |
log.ratio |
Copy number log-ratios, one for each interval in the coverage file. |
seg.file |
If not |
filter.lowhigh.gc |
Quantile q (defines lower q and upper 1-q) for
removing intervals with outlier GC profile. Assuming that GC correction might
not have been worked on those. Requires |
min.coverage |
Minimum coverage in both normal and tumor. Intervals with
lower coverage are ignored. If a |
min.total.counts |
Exclude intervals with fewer than that many reads in combined tumor and normal. |
min.targeted.base |
Exclude intervals with targeted base (size in bp) smaller than this cutoff. This is useful when the same interval file was used to calculate GC content. For such small targets, the GC content is likely very different from the true GC content of the probes. |
min.mappability |
|
min.fraction.offtarget |
Skip off-target regions when less than the specified fraction of all intervals passes all filters |
normalDB |
Normal database, created with
|
logical(length(log.ratio))
specifying which intervals should be
used in segmentation.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, genome = "hg19", vcf.file = vcf.file, normalDB = normalDB, sampleid = "Sample1", interval.file = interval.file, args.filterIntervals = list(min.targeted.base = 10), max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, genome = "hg19", vcf.file = vcf.file, normalDB = normalDB, sampleid = "Sample1", interval.file = interval.file, args.filterIntervals = list(min.targeted.base = 10), max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)
Function to remove artifacts and low confidence/quality variant calls.
filterVcfBasic( vcf, tumor.id.in.vcf = NULL, use.somatic.status = TRUE, snp.blacklist = NULL, af.range = c(0.03, 0.97), contamination.range = c(0.01, 0.075), min.coverage = 15, min.base.quality = 25, max.base.quality = 50, base.quality.offset = 1, min.supporting.reads = NULL, error = 0.001, target.granges = NULL, remove.off.target.snvs = TRUE, model.homozygous = FALSE, interval.padding = 50, DB.info.flag = "DB" )
filterVcfBasic( vcf, tumor.id.in.vcf = NULL, use.somatic.status = TRUE, snp.blacklist = NULL, af.range = c(0.03, 0.97), contamination.range = c(0.01, 0.075), min.coverage = 15, min.base.quality = 25, max.base.quality = 50, base.quality.offset = 1, min.supporting.reads = NULL, error = 0.001, target.granges = NULL, remove.off.target.snvs = TRUE, model.homozygous = FALSE, interval.padding = 50, DB.info.flag = "DB" )
vcf |
|
tumor.id.in.vcf |
The tumor id in the |
use.somatic.status |
If somatic status and germline data is available, then use this information to remove non-heterozygous germline SNPs or germline SNPS with biased allelic fractions. |
snp.blacklist |
A file with blacklisted genomic regions. Must
be parsable by |
af.range |
Exclude variants with allelic fraction smaller or greater than the two values, respectively. The higher value removes homozygous SNPs, which potentially have allelic fractions smaller than 1 due to artifacts or contamination. If a matched normal is available, this value is ignored, because homozygosity can be confirmed in the normal. |
contamination.range |
Count variants in germline databases with allelic fraction in the specified range. If the number of these putative contamination variants exceeds an expected value and if they are found on almost all chromosomes, the sample is flagged as potentially contaminated and extra contamination estimation steps will be performed later on. |
min.coverage |
Minimum coverage in tumor. Variants with lower coverage are ignored. |
min.base.quality |
Minimium base quality in tumor. Requires a |
max.base.quality |
Maximum base quality in tumor. Requires a |
base.quality.offset |
Subtracts the specified value from the base quality score.
Useful to add some cushion for too optimistically calibrated scores.
Requires a |
min.supporting.reads |
Minimum number of reads supporting the alt
allele. If |
error |
Estimated sequencing error rate. Used to calculate minimum
number of supporting reads using |
target.granges |
|
remove.off.target.snvs |
If set to a true value, will remove all SNVs outside the covered regions. |
model.homozygous |
If set to |
interval.padding |
Include variants in the interval flanking regions of
the specified size in bp. Requires |
DB.info.flag |
Flag in INFO of VCF that marks presence in common
germline databases. Defaults to |
A list with elements
vcf |
The filtered |
flag |
A flag ( |
flag_comment |
A comment describing the flagging. |
Markus Riester
# This function is typically only called by runAbsolute via # fun.filterVcf and args.filterVcf. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfBasic(vcf)
# This function is typically only called by runAbsolute via # fun.filterVcf and args.filterVcf. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfBasic(vcf)
Function to remove artifacts and low confidence/quality calls from a MuTect
generated VCF file. Also applies filters defined in filterVcfBasic
.
This function will only keep variants listed in the stats file and those not
matching the specified failure reasons.
filterVcfMuTect( vcf, tumor.id.in.vcf = NULL, stats.file = NULL, ignore = c("clustered_read_position", "fstar_tumor_lod", "nearby_gap_events", "poor_mapping_region_alternate_allele_mapq", "poor_mapping_region_mapq0", "possible_contamination", "strand_artifact", "seen_in_panel_of_normals"), ... )
filterVcfMuTect( vcf, tumor.id.in.vcf = NULL, stats.file = NULL, ignore = c("clustered_read_position", "fstar_tumor_lod", "nearby_gap_events", "poor_mapping_region_alternate_allele_mapq", "poor_mapping_region_mapq0", "possible_contamination", "strand_artifact", "seen_in_panel_of_normals"), ... )
vcf |
|
tumor.id.in.vcf |
The tumor id in the VCF file, optional. |
stats.file |
MuTect stats file. If |
ignore |
MuTect flags that mark variants for exclusion. |
... |
Additional arguments passed to |
A list with elements vcf
, flag
and
flag_comment
. vcf
contains the filtered CollapsedVCF
,
flag
a logical(1)
flag if problems were identified, further
described in flag_comment
.
Markus Riester
### This function is typically only called by runAbsolute via the ### fun.filterVcf and args.filterVcf comments. library(VariantAnnotation) vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfMuTect(vcf)
### This function is typically only called by runAbsolute via the ### fun.filterVcf and args.filterVcf comments. library(VariantAnnotation) vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfMuTect(vcf)
Function to remove artifacts and low confidence/quality calls from a
GATK4/MuTect2 generated VCF file. Also applies filters defined in
filterVcfBasic
.
filterVcfMuTect2( vcf, tumor.id.in.vcf = NULL, ignore = c("clustered_events", "t_lod", "str_contraction", "read_position", "position", "fragment_length", "multiallelic", "clipping", "strand_artifact", "strand_bias", "slippage", "weak_evidence", "orientation", "haplotype"), ... )
filterVcfMuTect2( vcf, tumor.id.in.vcf = NULL, ignore = c("clustered_events", "t_lod", "str_contraction", "read_position", "position", "fragment_length", "multiallelic", "clipping", "strand_artifact", "strand_bias", "slippage", "weak_evidence", "orientation", "haplotype"), ... )
vcf |
|
tumor.id.in.vcf |
The tumor id in the VCF file, optional. |
ignore |
MuTect2 flags that mark variants for exclusion. |
... |
Additional arguments passed to |
A list with elements vcf
, flag
and
flag_comment
. vcf
contains the filtered CollapsedVCF
,
flag
a logical(1)
flag if problems were identified, further
described in flag_comment
.
Markus Riester
### This function is typically only called by runAbsolute via the ### fun.filterVcf and args.filterVcf comments. library(VariantAnnotation) vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfMuTect(vcf)
### This function is typically only called by runAbsolute via the ### fun.filterVcf and args.filterVcf comments. library(VariantAnnotation) vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.filtered <- filterVcfMuTect(vcf)
Function to find focal amplifications in segmented data. This is
automatically called in runAbsoluteCN
.
findFocal(seg, max.size = 3e+06, cn.diff = 2, min.amp.cn = 5)
findFocal(seg, max.size = 3e+06, cn.diff = 2, min.amp.cn = 5)
seg |
Segmentation data. |
max.size |
Cutoff for focal in base pairs. |
cn.diff |
Minimum copy number delta between neighboring segments. |
min.amp.cn |
Minimum amplification integer copy number. Segments with lower copy number are not tested. |
logical(n)
, indicating for all n segments whether they are
focally amplified or not.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, genome="hg19", sampleid = "Sample1", interval.file = interval.file, max.candidate.solutions = 1, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), args.focal=list(max.size = 2e+06), fun.focal = findFocal)
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, genome="hg19", sampleid = "Sample1", interval.file = interval.file, max.candidate.solutions = 1, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), args.focal=list(max.size = 2e+06), fun.focal = findFocal)
This function determines the sex of a sample by the coverage ratio of chrX and chrY. Loss of chromosome Y (LOY) can result in a wrong female call. For small targeted panels, this will only work when sufficient sex marker genes such as AMELY are covered. For optimal results, parameters might need to be tuned for the assay.
getSexFromCoverage( coverage.file, min.ratio = 25, min.ratio.na = 20, remove.outliers = TRUE )
getSexFromCoverage( coverage.file, min.ratio = 25, min.ratio.na = 20, remove.outliers = TRUE )
coverage.file |
Coverage file or data read with
|
min.ratio |
Min chrX/chrY coverage ratio to call sample as female. |
min.ratio.na |
Min chrX/chrY coverage ratio to call sample as
|
remove.outliers |
Removes coverage outliers before calculating mean chromosome coverages. |
Returns a character(1)
with M
for male, F
for
female, or NA
if unknown.
Markus Riester
tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") sex <- getSexFromCoverage(tumor.coverage.file)
tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") sex <- getSexFromCoverage(tumor.coverage.file)
This function detects non-random distribution of homozygous variants on chromosome X compared to all other chromosomes. A non-significant Fisher's exact p-value indicates more than one chromosome X copy. This function is called in runAbsoluteCN as sanity check when a VCF is provided. It is also useful for determining sex when no sex marker genes on chrY (e.g. AMELY) are available.
getSexFromVcf( vcf, tumor.id.in.vcf = NULL, min.or = 4, min.or.na = 2.5, max.pv = 0.001, homozygous.cutoff = 0.95, af.cutoff = 0.2, min.coverage = 15, use.somatic.status = TRUE )
getSexFromVcf( vcf, tumor.id.in.vcf = NULL, min.or = 4, min.or.na = 2.5, max.pv = 0.001, homozygous.cutoff = 0.95, af.cutoff = 0.2, min.coverage = 15, use.somatic.status = TRUE )
vcf |
CollapsedVCF object, read in with the |
tumor.id.in.vcf |
The tumor id in the CollapsedVCF (optional). |
min.or |
Minimum odds-ratio to call sample as male. If p-value is not significant due to a small number of SNPs on chromosome X, sample will be called as NA even when odds-ratio exceeds this cutoff. |
min.or.na |
Minimum odds-ratio to not call a sample. Odds-ratios in the
range |
max.pv |
Maximum Fisher's exact p-value to call sample as male. |
homozygous.cutoff |
Minimum allelic fraction to call position homozygous. |
af.cutoff |
Remove all SNVs with allelic fraction lower than the specified value. |
min.coverage |
Minimum coverage in tumor. Variants with lower coverage are ignored. |
use.somatic.status |
If somatic status and germline data is available, then exclude somatic variants. |
Returns a character(1)
with M
for male, F
for
female, or NA
if unknown.
Markus Riester
vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") vcf <- readVcf(vcf.file, "hg19") # This example vcf is filtered and contains no homozygous calls, # which are necessary for determining sex from chromosome X. getSexFromVcf(vcf)
vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") vcf <- readVcf(vcf.file, "hg19") # This example vcf is filtered and contains no homozygous calls, # which are necessary for determining sex from chromosome X. getSexFromVcf(vcf)
This function provides various plots for finding correct purity and ploidy
combinations in the results of a runAbsoluteCN
call.
plotAbs( res, id = 1, type = c("hist", "overview", "BAF", "AF", "all"), chr = NULL, germline.only = TRUE, show.contour = FALSE, purity = NULL, ploidy = NULL, alpha = TRUE, show.segment.means = c("SNV", "segments", "both"), max.mapping.bias = 0.8, palette.name = "Paired", col.snps = "#2b6391", col.chr.shading = "#f0f0f0", ... )
plotAbs( res, id = 1, type = c("hist", "overview", "BAF", "AF", "all"), chr = NULL, germline.only = TRUE, show.contour = FALSE, purity = NULL, ploidy = NULL, alpha = TRUE, show.segment.means = c("SNV", "segments", "both"), max.mapping.bias = 0.8, palette.name = "Paired", col.snps = "#2b6391", col.chr.shading = "#f0f0f0", ... )
res |
Return object of the |
id |
Candidate solutions to be plotted. |
type |
Different types of plots. |
chr |
If |
germline.only |
If |
show.contour |
For |
purity |
Display expected integer copy numbers for purity, defaults to
purity of the solution ( |
ploidy |
Display expected integer copy numbers for ploidy, defaults to
ploidy of the solution ( |
alpha |
Add transparency to the plot if VCF contains many variants
(>2000, |
show.segment.means |
Show segment means in germline allele frequency
plot? If |
max.mapping.bias |
Exclude variants with high mapping bias from
plotting. Note that bias is reported on an inverse scale; a variant with
mapping bias of 1 has no bias. ( |
palette.name |
The default |
col.snps |
The color used for germline SNPs. |
col.chr.shading |
The color used for shading alternate chromosomes. |
... |
Additonal parameters passed to the |
Returns NULL
.
Markus Riester
data(purecn.example.output) plotAbs(purecn.example.output, type="overview") # plot details for the maximum likelihood solution (rank 1) plotAbs(purecn.example.output, 1, type="hist") plotAbs(purecn.example.output, 1, type="BAF") plotAbs(purecn.example.output, 1, type = "BAF", chr="chr2")
data(purecn.example.output) plotAbs(purecn.example.output, type="overview") # plot details for the maximum likelihood solution (rank 1) plotAbs(purecn.example.output, 1, type="hist") plotAbs(purecn.example.output, 1, type="BAF") plotAbs(purecn.example.output, 1, type = "BAF", chr="chr2")
Averages the coverage of a list of samples.
poolCoverage(all.data, remove.chrs = c(), w = NULL)
poolCoverage(all.data, remove.chrs = c(), w = NULL)
all.data |
List of normals, read with |
remove.chrs |
Remove these chromosomes from the pool. |
w |
|
A data.frame
with the averaged coverage over all normals.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) pool <- poolCoverage(lapply(normal.coverage.files, readCoverageFile), remove.chrs = c("chrX", "chrY"))
normal.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file) pool <- poolCoverage(lapply(normal.coverage.files, readCoverageFile), remove.chrs = c("chrX", "chrY"))
This function takes as input the output of a runAbsoluteCN
run
and provides SNV posterior probabilities for all possible states.
predictSomatic(res, id = 1, return.vcf = FALSE)
predictSomatic(res, id = 1, return.vcf = FALSE)
res |
Return object of the |
id |
Candidate solutions to be analyzed. |
return.vcf |
Returns an annotated |
A data.frame
or CollapsedVCF
with SNV state posterior
probabilities.
Markus Riester
data(purecn.example.output) # the output data was created using a matched normal sample, but in case # no matched normal is available, this will help predicting somatic vs. # germline status purecnSnvs <- predictSomatic(purecn.example.output) # Prefer GRanges? purecnSnvs <- GRanges(predictSomatic(purecn.example.output)) # write a VCF file purecnVcf <- predictSomatic(purecn.example.output, return.vcf=TRUE) writeVcf(purecnVcf, file = "Sample1_PureCN.vcf")
data(purecn.example.output) # the output data was created using a matched normal sample, but in case # no matched normal is available, this will help predicting somatic vs. # germline status purecnSnvs <- predictSomatic(purecn.example.output) # Prefer GRanges? purecnSnvs <- GRanges(predictSomatic(purecn.example.output)) # write a VCF file purecnVcf <- predictSomatic(purecn.example.output, return.vcf=TRUE) writeVcf(purecnVcf, file = "Sample1_PureCN.vcf")
Optimize intervals for copy number calling by tiling long intervals and by
including off-target regions. Uses scanFa
from the Rsamtools package
to retrieve GC content of intervals in a reference FASTA file. If provided,
will annotate intervals with mappability and replication timing scores.
preprocessIntervals( interval.file, reference.file, output.file = NULL, off.target = FALSE, average.target.width = 400, min.target.width = 100, min.off.target.width = 20000, average.off.target.width = 2e+05, off.target.padding = -500, mappability = NULL, min.mappability = c(0.6, 0.1, 0.7), reptiming = NULL, average.reptiming.width = 1e+05, exclude = NULL, off.target.seqlevels = c("targeted", "all"), small.targets = c("resize", "drop") )
preprocessIntervals( interval.file, reference.file, output.file = NULL, off.target = FALSE, average.target.width = 400, min.target.width = 100, min.off.target.width = 20000, average.off.target.width = 2e+05, off.target.padding = -500, mappability = NULL, min.mappability = c(0.6, 0.1, 0.7), reptiming = NULL, average.reptiming.width = 1e+05, exclude = NULL, off.target.seqlevels = c("targeted", "all"), small.targets = c("resize", "drop") )
interval.file |
File specifying the intervals. Interval is expected in
first column in format CHR:START-END. Instead of a file, a |
reference.file |
Reference FASTA file. |
output.file |
Optionally, write GC content file. |
off.target |
Include off-target regions. |
average.target.width |
Split large targets to approximately this size. |
min.target.width |
Make sure that target regions are of at least
this specified width. See |
min.off.target.width |
Only include off-target regions of that size |
average.off.target.width |
Split off-target regions to that |
off.target.padding |
Pad off-target regions. |
mappability |
Annotate intervals with mappability score. Assumed on a scale
from 0 to 1, with score being 1/(number alignments). Expected as |
min.mappability |
|
reptiming |
Annotate intervals with replication timing score. Expected as
|
average.reptiming.width |
Tile |
exclude |
Any target that overlaps with this |
off.target.seqlevels |
Controls how to deal with chromosomes/contigs
found in the |
small.targets |
Strategy to deal with targets smaller than
|
Returns GC content by interval as GRanges
object.
Markus Riester
Talevich et al. (2016). CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol.
reference.file <- system.file("extdata", "ex2_reference.fa", package = "PureCN", mustWork = TRUE) interval.file <- system.file("extdata", "ex2_intervals.txt", package = "PureCN", mustWork = TRUE) bed.file <- system.file("extdata", "ex2_intervals.bed", package = "PureCN", mustWork = TRUE) preprocessIntervals(interval.file, reference.file, output.file = "gc_file.txt") intervals <- import(bed.file) preprocessIntervals(intervals, reference.file, output.file = "gc_file.txt")
reference.file <- system.file("extdata", "ex2_reference.fa", package = "PureCN", mustWork = TRUE) interval.file <- system.file("extdata", "ex2_intervals.txt", package = "PureCN", mustWork = TRUE) bed.file <- system.file("extdata", "ex2_intervals.bed", package = "PureCN", mustWork = TRUE) preprocessIntervals(interval.file, reference.file, output.file = "gc_file.txt") intervals <- import(bed.file) preprocessIntervals(intervals, reference.file, output.file = "gc_file.txt")
This function performs normalization and segmentation when multiple for the same patient are available.
processMultipleSamples( tumor.coverage.files, sampleids, normalDB, num.eigen = 20, genome, plot.cnv = TRUE, w = NULL, min.interval.weight = 1/3, max.segments = NULL, chr.hash = NULL, centromeres = NULL, ... )
processMultipleSamples( tumor.coverage.files, sampleids, normalDB, num.eigen = 20, genome, plot.cnv = TRUE, w = NULL, min.interval.weight = 1/3, max.segments = NULL, chr.hash = NULL, centromeres = NULL, ... )
tumor.coverage.files |
Coverage data for tumor samples. |
sampleids |
Sample ids, used in output files. |
normalDB |
Database of normal samples, created with
|
num.eigen |
Number of eigen vectors used. |
genome |
Genome version, for example hg19. Needed to get centromere positions. |
plot.cnv |
Segmentation plots. |
w |
Weight of samples. Can be used to downweight poor quality samples.
If |
min.interval.weight |
Can be used to ignore intervals with low weights. |
max.segments |
If not |
chr.hash |
Mapping of non-numerical chromsome names to numerical names
(e.g. chr1 to 1, chr2 to 2, etc.). If |
centromeres |
A |
... |
Arguments passed to the segmentation function. |
CURRENTLY DEFUNCT BECAUSE IT DEPENDS ON THE DEFUNCT COPYNUMBER PACKAGE. We are working on a replacement.
data.frame
containing the segmentation.
Markus Riester
Nilsen G., Liestol K., Van Loo P., Vollan H., Eide M., Rueda O., Chin S., Russell R., Baumbusch L., Caldas C., Borresen-Dale A., Lingjaerde O. (2012). "Copynumber: Efficient algorithms for single- and multi-track copy number segmentation." BMC Genomics, 13(1), 591.
normal1.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") tumor1.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") tumor2.coverage.file <- system.file("extdata", "example_tumor2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal1.coverage.file, normal2.coverage.file) tumor.coverage.files <- c(tumor1.coverage.file, tumor2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) # seg <- processMultipleSamples(tumor.coverage.files, # sampleids = c("Sample1", "Sample2"), # normalDB = normalDB, # genome = "hg19")
normal1.coverage.file <- system.file("extdata", "example_normal.txt.gz", package = "PureCN") normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz", package = "PureCN") tumor1.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") tumor2.coverage.file <- system.file("extdata", "example_tumor2.txt.gz", package = "PureCN") normal.coverage.files <- c(normal1.coverage.file, normal2.coverage.file) tumor.coverage.files <- c(tumor1.coverage.file, tumor2.coverage.file) normalDB <- createNormalDatabase(normal.coverage.files) # seg <- processMultipleSamples(tumor.coverage.files, # sampleids = c("Sample1", "Sample2"), # normalDB = normalDB, # genome = "hg19")
These functions are defunct and no longer available.
The following functions are defunct; use the replacement indicated below:
autoCurateResults: no replacement
calculateGCContentByInterval: preprocessIntervals
calculateIntervalWeights: createNormalDatabase
createExonWeightFile: createNormalDatabase
createSNPBlacklist: setMappingBiasVcf
createTargetWeights: createNormalDatabase
filterTargets: filterIntervals
findBestNormal: calculateTangentNormal
getDiploid: no replacement
plotBestNormal: no replacement
readCoverageGatk: readCoverageFile
These functions are provided for compatibility with older versions of ‘PureCN’ only, and will be defunct at the next release.
The following functions are deprecated and will be made defunct; use the replacement indicated below:
This provides the output of the DNAcopy::getbdry
call using segmentationCBS
default parameters.
data(purecn.DNAcopy.bdry)
data(purecn.DNAcopy.bdry)
Output of the DNAcopy::getbdry
call.
This provides the output of the runAbsoluteCN
call used in the
vignette and examples.
data(purecn.example.output)
data(purecn.example.output)
Output of the runAbsoluteCN
call used in the vignette.
Read file containing counts of ref and alt alleles of common Toolkit 4.
readAllelicCountsFile(file, format, zero = NULL)
readAllelicCountsFile(file, format, zero = NULL)
file |
Input file containing counts of ref and alt alleles |
format |
File format. If missing, derived from the file extension. Currently only GATK4 CollectAllelicCounts (tsv) format supported. |
zero |
Start position is 0-based. Default is |
A CollapsedVCF
with the parsed allelic counts.
Markus Riester
ac.file <- system.file("extdata", "example_allelic_counts.tsv", package="PureCN") vcf_ac <- readAllelicCountsFile(ac.file)
ac.file <- system.file("extdata", "example_allelic_counts.tsv", package="PureCN") vcf_ac <- readAllelicCountsFile(ac.file)
Read coverage file produced by external tools like The Genome Analysis
Toolkit or by calculateBamCoverageByInterval
.
readCoverageFile(file, format, zero = NULL, read.length = 100)
readCoverageFile(file, format, zero = NULL, read.length = 100)
file |
Target coverage file. |
format |
File format. If missing, derived from the file extension. Currently GATK3 DepthofCoverage, GATK4 CollectFragmentCounts (hdf5), and CNVkit formats supported. |
zero |
Start position is 0-based. Default is |
read.length |
For output formats which do not provide both counts and total coverages, approximate them using the specified read length. |
A data.frame
with the parsed coverage information.
Markus Riester
calculateBamCoverageByInterval
tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") coverage <- readCoverageFile(tumor.coverage.file)
tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz", package = "PureCN") coverage <- readCoverageFile(tumor.coverage.file)
Function that can be used to read the curated output of the
runAbsoluteCN
function.
readCurationFile( file.rds, file.curation = gsub(".rds$", ".csv", file.rds), remove.failed = FALSE, report.best.only = FALSE, min.ploidy = NULL, max.ploidy = NULL )
readCurationFile( file.rds, file.curation = gsub(".rds$", ".csv", file.rds), remove.failed = FALSE, report.best.only = FALSE, min.ploidy = NULL, max.ploidy = NULL )
file.rds |
Output of the |
file.curation |
Filename of a curation file that points to the correct tumor purity and ploidy solution. |
remove.failed |
Do not return solutions that failed. |
report.best.only |
Only return correct/best solution (useful on low memory machines when lots of samples are loaded). |
min.ploidy |
Minimum ploidy to be considered. If |
max.ploidy |
Maximum ploidy to be considered. If |
The return value of the corresponding runAbsoluteCN
call, but with the results array manipulated according the curation CSV file
and arguments of this function.
Markus Riester
runAbsoluteCN createCurationFile
data(purecn.example.output) file.rds <- "Sample1_PureCN.rds" createCurationFile(file.rds) # User can change the maximum likelihood solution manually in the generated # CSV file. The correct solution is then loaded with readCurationFile. purecn.curated.example.output <-readCurationFile(file.rds)
data(purecn.example.output) file.rds <- "Sample1_PureCN.rds" createCurationFile(file.rds) # User can change the maximum likelihood solution manually in the generated # CSV file. The correct solution is then loaded with readCurationFile. purecn.curated.example.output <-readCurationFile(file.rds)
Read file containing coordinates of on- and off-target intervals
generated by preprocessIntervals
.
readIntervalFile(interval.file, strict = TRUE, verbose = TRUE)
readIntervalFile(interval.file, strict = TRUE, verbose = TRUE)
interval.file |
A mapping file that assigns GC content and gene symbols
to each exon in the coverage files. Used for generating gene-level calls.
First column in format CHR:START-END. Second column GC content (0 to 1).
Third column gene symbol. This file is generated with the
|
strict |
Error out with missing columns |
verbose |
Verbose output |
A GRanges
object with the parsed intervals.
Markus Riester
interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") x <- readIntervalFile(interval.file)
interval.file <- system.file("extdata", "example_intervals.txt", package = "PureCN") x <- readIntervalFile(interval.file)
Read log2 ratio file produced by external tools like The Genome Analysis Toolkit version 4.
readLogRatioFile(file, format, zero = NULL)
readLogRatioFile(file, format, zero = NULL)
file |
Log2 coverage file. |
format |
File format. If missing, derived from the file extension. Currently GATK4 DenoiseReadCounts format supported. A simple GATK3-style format, two columns with coordinates as string in format chr:start-stop in first and log2-ratio in second is also supported. |
zero |
Start position is 0-based. Default is |
A GRange
with the log2 ratio.
Markus Riester
logratio.file <- system.file("extdata", "example_gatk4_denoised_cr.tsv.gz", package = "PureCN") logratio <- readLogRatioFile(logratio.file)
logratio.file <- system.file("extdata", "example_gatk4_denoised_cr.tsv.gz", package = "PureCN") logratio <- readLogRatioFile(logratio.file)
Read segmentation files produced by DNAcopy, CNVkit or GATK4.
readSegmentationFile( seg.file, sampleid, model.homozygous = FALSE, format, zero = FALSE, verbose = TRUE )
readSegmentationFile( seg.file, sampleid, model.homozygous = FALSE, format, zero = FALSE, verbose = TRUE )
seg.file |
File with segmentation |
sampleid |
Sampleid, for segmentation files containing multiple samples |
model.homozygous |
Unless |
format |
File format. If missing, derived from the file extension. Currently DNAcopy, and GATK4 (ModelSegments) format supported. CNVkit uses DNAcopy format. |
zero |
Start position is 0-based. Default is |
verbose |
Verbose output. |
A data.frame
.
Markus Riester
seg.file <- system.file("extdata", "example_seg.txt", package = "PureCN") seg <- readSegmentationFile(seg.file, "Sample1")
seg.file <- system.file("extdata", "example_seg.txt", package = "PureCN") seg <- readSegmentationFile(seg.file, "Sample1")
This function takes as input tumor and normal control coverage data and a VCF containing allelic fractions of germline variants and somatic mutations. Normal control does not need to be from the same patient. In case VCF does not contain somatic status, it should contain either dbSNP or population allele frequencies, and optionally COSMIC annotation. Returns purity and ploidy combinations, sorted by likelihood score. Provides copy number and LOH data, by both gene and genomic region.
runAbsoluteCN( normal.coverage.file = NULL, tumor.coverage.file = NULL, log.ratio = NULL, seg.file = NULL, seg.file.sdev = 0.4, vcf.file = NULL, normalDB = NULL, genome, centromeres = NULL, sex = c("?", "F", "M", "diploid"), fun.filterVcf = filterVcfMuTect, args.filterVcf = list(), fun.setPriorVcf = setPriorVcf, args.setPriorVcf = list(), fun.setMappingBiasVcf = setMappingBiasVcf, args.setMappingBiasVcf = list(), fun.filterIntervals = filterIntervals, args.filterIntervals = list(), fun.segmentation = segmentationCBS, args.segmentation = list(), fun.focal = findFocal, args.focal = list(), sampleid = NULL, min.ploidy = 1.4, max.ploidy = 6, test.num.copy = 0:7, test.purity = seq(0.15, 0.95, by = 0.01), prior.purity = NULL, prior.K = 0.999, prior.contamination = 0.01, max.candidate.solutions = 20, candidates = NULL, min.coverage = 15, max.coverage.vcf = 300, max.non.clonal = 0.2, max.homozygous.loss = c(0.05, 1e+07), non.clonal.M = 1/3, max.mapping.bias = 0.8, max.pon = 3, iterations = 30, min.variants.segment = 5, log.ratio.calibration = 0.1, smooth.log.ratio = TRUE, model.homozygous = FALSE, error = 0.001, interval.file = NULL, max.dropout = c(0.95, 1.1), min.logr.sdev = 0.15, max.logr.sdev = 0.6, max.segments = 300, min.gof = 0.8, min.variants = 20, plot.cnv = TRUE, vcf.field.prefix = "", cosmic.vcf.file = NULL, DB.info.flag = "DB", POPAF.info.field = "POP_AF", Cosmic.CNT.info.field = "Cosmic.CNT", min.pop.af = 0.001, model = c("beta", "betabin"), post.optimize = FALSE, speedup.heuristics = 2, BPPARAM = NULL, log.file = NULL, verbose = TRUE )
runAbsoluteCN( normal.coverage.file = NULL, tumor.coverage.file = NULL, log.ratio = NULL, seg.file = NULL, seg.file.sdev = 0.4, vcf.file = NULL, normalDB = NULL, genome, centromeres = NULL, sex = c("?", "F", "M", "diploid"), fun.filterVcf = filterVcfMuTect, args.filterVcf = list(), fun.setPriorVcf = setPriorVcf, args.setPriorVcf = list(), fun.setMappingBiasVcf = setMappingBiasVcf, args.setMappingBiasVcf = list(), fun.filterIntervals = filterIntervals, args.filterIntervals = list(), fun.segmentation = segmentationCBS, args.segmentation = list(), fun.focal = findFocal, args.focal = list(), sampleid = NULL, min.ploidy = 1.4, max.ploidy = 6, test.num.copy = 0:7, test.purity = seq(0.15, 0.95, by = 0.01), prior.purity = NULL, prior.K = 0.999, prior.contamination = 0.01, max.candidate.solutions = 20, candidates = NULL, min.coverage = 15, max.coverage.vcf = 300, max.non.clonal = 0.2, max.homozygous.loss = c(0.05, 1e+07), non.clonal.M = 1/3, max.mapping.bias = 0.8, max.pon = 3, iterations = 30, min.variants.segment = 5, log.ratio.calibration = 0.1, smooth.log.ratio = TRUE, model.homozygous = FALSE, error = 0.001, interval.file = NULL, max.dropout = c(0.95, 1.1), min.logr.sdev = 0.15, max.logr.sdev = 0.6, max.segments = 300, min.gof = 0.8, min.variants = 20, plot.cnv = TRUE, vcf.field.prefix = "", cosmic.vcf.file = NULL, DB.info.flag = "DB", POPAF.info.field = "POP_AF", Cosmic.CNT.info.field = "Cosmic.CNT", min.pop.af = 0.001, model = c("beta", "betabin"), post.optimize = FALSE, speedup.heuristics = 2, BPPARAM = NULL, log.file = NULL, verbose = TRUE )
normal.coverage.file |
Coverage file of normal control (optional
if log.ratio is provided - then it will be only used to filter low coverage
exons). Should be already GC-normalized with
|
tumor.coverage.file |
Coverage file of tumor. If |
log.ratio |
Copy number log-ratios for all exons in the coverage files.
If |
seg.file |
Segmented data. Optional, to support third-pary
segmentation tools. If |
seg.file.sdev |
If |
vcf.file |
VCF file.
Optional, but typically needed to select between local optima of similar
likelihood. Can also be a |
normalDB |
Normal database, created with
|
genome |
Genome version, for example hg19. See |
centromeres |
A |
sex |
Sex of sample. If |
fun.filterVcf |
Function for filtering variants. Expected output is a
list with elements |
args.filterVcf |
Arguments for variant filtering function. Arguments
|
fun.setPriorVcf |
Function to set prior for somatic status for each
variant in the VCF. Defaults to |
args.setPriorVcf |
Arguments for somatic prior function. |
fun.setMappingBiasVcf |
Function to set mapping bias for each variant
in the VCF. Defaults to |
args.setMappingBiasVcf |
Arguments for mapping bias function. |
fun.filterIntervals |
Function for filtering low-quality intervals in the
coverage files. Needs to return a |
args.filterIntervals |
Arguments for target filtering function. Arguments
|
fun.segmentation |
Function for segmenting the copy number log-ratios.
Expected return value is a |
args.segmentation |
Arguments for segmentation function. Arguments
|
fun.focal |
Function for identifying focal amplifications. Defaults to
|
args.focal |
Arguments for focal amplification function. |
sampleid |
Sample id, provided in output files etc. |
min.ploidy |
Minimum ploidy to be considered. |
max.ploidy |
Maximum ploidy to be considered. |
test.num.copy |
Copy numbers tested in the grid search. Note that focal amplifications can have much higher copy numbers, but they will be labeled as subclonal (because they do not fit the integer copy numbers). |
test.purity |
Considered tumor purity values. |
prior.purity |
|
prior.K |
This defines the prior probability that the multiplicity of a SNV corresponds to either the maternal or the paternal copy number (for somatic variants additionally to a multiplicity of 1). For perfect segmentations, this value would be 1; values smaller than 1 thus may provide some robustness against segmentation errors. |
prior.contamination |
The prior probability that a known SNP is from a different individual. |
max.candidate.solutions |
Number of local optima considered in optimization and variant fitting steps. If there are too many local optima, it will use specified number of top candidate solutions, but will also include all optima close to diploid, because silent genomes have often lots of local optima. |
candidates |
Candidates to optimize from a previous run
( |
min.coverage |
Minimum coverage in both normal and tumor. Intervals and
variants with lower coverage are ignored. This value is provided to the
|
max.coverage.vcf |
This will set the maximum number of reads in the SNV
fitting. This is to avoid that small non-reference biases that come
apparent only at high coverages have a dramatic influence on likelihood
scores. Only relevant for |
max.non.clonal |
Maximum genomic fraction assigned to a subclonal copy number state. |
max.homozygous.loss |
|
non.clonal.M |
Average expected cellular fraction of sub-clonal somatic mutations. This is to calculate expected allelic fractions of a single sub-clonal bin for variants. For all somatic variants, more accurate cellular fractions are calculated. |
max.mapping.bias |
Exclude variants with high mapping bias from the likelihood score calculation. Note that bias is reported on an inverse scale; a variant with mapping bias of 1 has no bias. |
max.pon |
Exclude variants found more than |
iterations |
Maximum number of iterations in the Simulated Annealing copy number fit optimization. Note that this an integer optimization problem that should converge quickly. Allowed range is 10 to 250. |
min.variants.segment |
Flag segments with fewer variants. The minor copy number estimation is not reliable with insufficient variants. |
log.ratio.calibration |
Re-calibrate log-ratios in the window
|
smooth.log.ratio |
Smooth |
model.homozygous |
Homozygous germline SNPs are uninformative and by
default removed. In 100 percent pure samples such as cell lines, however,
heterozygous germline SNPs appear homozygous in case of LOH. Setting this
parameter to |
error |
Estimated sequencing error rate. Used to calculate minimum
number of supporting reads for variants using
|
interval.file |
A mapping file that assigns GC content and gene symbols
to each exon in the coverage files. Used for generating gene-level calls.
First column in format CHR:START-END. Second column GC content (0 to 1).
Third column gene symbol. This file is generated with the
|
max.dropout |
Measures GC bias as ratio of coverage in AT-rich (GC <
0.5) versus GC-rich on-target regions (GC >= 0.5). High coverage drop-out might
indicate that data was not GC-normalized (optional with larger pool of
normal samples). A warning pointing to a normalized log-ratio drop-out likely
indicates that the sample quality is insufficient. For log-ratio drop-out,
a warning is thrown when half the |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data. |
max.logr.sdev |
Flag noisy samples with segment log-ratio standard deviation larger than this. Assay specific and needs to be calibrated. |
max.segments |
Flag noisy samples with a large number of segments. Assay specific and needs to be calibrated. |
min.gof |
Flag purity/ploidy solutions with poor fit. |
min.variants |
Do not attempt to fit allelic fractions for samples with fewer variants passing all filters. |
plot.cnv |
Generate segmentation plots. |
vcf.field.prefix |
Prefix all newly created VCF field names with this string. |
cosmic.vcf.file |
Add a |
DB.info.flag |
Flag in INFO of VCF that marks presence in common
germline databases. Defaults to |
POPAF.info.field |
As alternative to a flag, use an info field that
contains population allele frequencies. The |
Cosmic.CNT.info.field |
Info field containing hits in the Cosmic database |
min.pop.af |
Minimum population allele frequency in
|
model |
Use either a beta or a beta-binomial distribution for fitting
observed to expected allelic fractions of alterations in |
post.optimize |
Optimize purity using final SCNA-fit and variants. This might take a long time when lots of variants need to be fitted, but will typically result in a slightly more accurate purity, especially for rather silent genomes or very low purities. Otherwise, it will just use the purity determined via the SCNA-fit. |
speedup.heuristics |
Tries to avoid spending computation time on local optima that are unlikely correct. Set to 0 to turn this off, to 1 to only apply heuristics that in worst case will decrease accuracy slightly or to 2 to turn on all heuristics. |
BPPARAM |
|
log.file |
If not |
verbose |
Verbose output. |
A list with elements
candidates |
Results of the grid search. |
results |
All local optima, sorted by final rank. |
input |
The input data. |
Markus Riester
Riester et al. (2016). PureCN: Copy number calling and SNV classification using targeted short read sequencing. Source Code for Biology and Medicine, 11, pp. 13.
Carter et al. (2012), Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology.
correctCoverageBias
segmentationCBS
calculatePowerDetectSomatic
normal.coverage.file <- system.file('extdata', 'example_normal_tiny.txt', package = 'PureCN') tumor.coverage.file <- system.file('extdata', 'example_tumor_tiny.txt', package = 'PureCN') vcf.file <- system.file('extdata', 'example.vcf.gz', package = 'PureCN') interval.file <- system.file('extdata', 'example_intervals_tiny.txt', package = 'PureCN') # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, genome = 'hg19', vcf.file = vcf.file, sampleid = 'Sample1', interval.file = interval.file, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1) # If a high-quality segmentation was obtained with third-party tools: seg.file <- system.file('extdata', 'example_seg.txt', package = 'PureCN') # By default, PureCN will re-segment the data, for example to identify # regions of copy number neutral LOH. If this is not wanted, we can provide # a minimal segmentation function which just returns the provided one: funSeg <- function(seg, ...) return(seg) res <- runAbsoluteCN(seg.file = seg.file, fun.segmentation = funSeg, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1, genome='hg19', interval.file = interval.file)
normal.coverage.file <- system.file('extdata', 'example_normal_tiny.txt', package = 'PureCN') tumor.coverage.file <- system.file('extdata', 'example_tumor_tiny.txt', package = 'PureCN') vcf.file <- system.file('extdata', 'example.vcf.gz', package = 'PureCN') interval.file <- system.file('extdata', 'example_intervals_tiny.txt', package = 'PureCN') # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, genome = 'hg19', vcf.file = vcf.file, sampleid = 'Sample1', interval.file = interval.file, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1) # If a high-quality segmentation was obtained with third-party tools: seg.file <- system.file('extdata', 'example_seg.txt', package = 'PureCN') # By default, PureCN will re-segment the data, for example to identify # regions of copy number neutral LOH. If this is not wanted, we can provide # a minimal segmentation function which just returns the provided one: funSeg <- function(seg, ...) return(seg) res <- runAbsoluteCN(seg.file = seg.file, fun.segmentation = funSeg, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1, genome='hg19', interval.file = interval.file)
The default segmentation function. This function is called via the
fun.segmentation
argument of runAbsoluteCN
. The
arguments are passed via args.segmentation
.
segmentationCBS( normal, tumor, log.ratio, seg, plot.cnv, sampleid, weight.flag.pvalue = 0.01, alpha = 0.005, undo.SD = NULL, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, additional.cmd.args = "", centromeres = NULL )
segmentationCBS( normal, tumor, log.ratio, seg, plot.cnv, sampleid, weight.flag.pvalue = 0.01, alpha = 0.005, undo.SD = NULL, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, additional.cmd.args = "", centromeres = NULL )
normal |
Coverage data for normal sample. |
tumor |
Coverage data for tumor sample. |
log.ratio |
Copy number log-ratios, one for each target in the coverage files. |
seg |
If segmentation was provided by the user, this data structure will contain this segmentation. Useful for minimal segmentation functions. Otherwise PureCN will re-segment the data. This segmentation function ignores this user provided segmentation. |
plot.cnv |
Segmentation plots. |
sampleid |
Sample id, used in output files. |
weight.flag.pvalue |
Flag values with one-sided p-value smaller than this cutoff. |
alpha |
Alpha value for CBS, see documentation for the |
undo.SD |
|
vcf |
Optional |
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
normal.id.in.vcf |
Id of normal in in VCF. Currently not used. |
max.segments |
If not |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data. |
prune.hclust.h |
Height in the |
prune.hclust.method |
Cluster method used in the |
chr.hash |
Mapping of non-numerical chromsome names to numerical names
(e.g. chr1 to 1, chr2 to 2, etc.). If |
additional.cmd.args |
|
centromeres |
A |
data.frame
containing the segmentation.
Markus Riester
Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.
Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, genome = "hg19", sampleid = "Sample1", interval.file = interval.file, max.candidate.solutions = 1, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), fun.segmentation = segmentationCBS, args.segmentation = list(alpha = 0.001))
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, genome = "hg19", sampleid = "Sample1", interval.file = interval.file, max.candidate.solutions = 1, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), fun.segmentation = segmentationCBS, args.segmentation = list(alpha = 0.001))
A wrapper for GATK4s ModelSegmentation function, useful when normalization
is performed with other tools than GATK4, for example PureCN.
This function is called via the
fun.segmentation
argument of runAbsoluteCN
. The
arguments are passed via args.segmentation
.
segmentationGATK4( normal, tumor, log.ratio, seg, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = NULL, changepoints.penality = NULL, additional.cmd.args = "", chr.hash = NULL, ... )
segmentationGATK4( normal, tumor, log.ratio, seg, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = NULL, changepoints.penality = NULL, additional.cmd.args = "", chr.hash = NULL, ... )
normal |
Coverage data for normal sample. Ignored in this function. |
tumor |
Coverage data for tumor sample. |
log.ratio |
Copy number log-ratios, one for each exon in coverage file. |
seg |
If segmentation was provided by the user, this data structure will contain this segmentation. Useful for minimal segmentation functions. Otherwise PureCN will re-segment the data. This segmentation function ignores this user provided segmentation. |
vcf |
Optional |
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
normal.id.in.vcf |
Id of normal in in VCF. Currently not used. |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data. |
prune.hclust.h |
Ignored in this function. |
prune.hclust.method |
Ignored in this function. |
changepoints.penality |
The |
additional.cmd.args |
|
chr.hash |
Not needed here since |
... |
Currently unused arguments provided to other segmentation functions. |
data.frame
containing the segmentation.
Markus Riester
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package="PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package="PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ## Not run: ret <-runAbsoluteCN(normal.coverage.file=normal.coverage.file, tumor.coverage.file=tumor.coverage.file, vcf.file=vcf.file, sampleid="Sample1", genome="hg19", fun.segmentation = segmentationGATK4, max.ploidy=4, args.segmentation = list(additional.cmd.args = "--gcs-max-retries 19"), test.purity=seq(0.3,0.7,by=0.05), max.candidate.solutions=1) ## End(Not run)
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package="PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package="PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ## Not run: ret <-runAbsoluteCN(normal.coverage.file=normal.coverage.file, tumor.coverage.file=tumor.coverage.file, vcf.file=vcf.file, sampleid="Sample1", genome="hg19", fun.segmentation = segmentationGATK4, max.ploidy=4, args.segmentation = list(additional.cmd.args = "--gcs-max-retries 19"), test.purity=seq(0.3,0.7,by=0.05), max.candidate.solutions=1) ## End(Not run)
A minimal segmentation function useful when segmentation was performed by
third-pary tools. When a CollapsedVCF
with germline SNPs is provided,
it will cluster segments using hclust
. Otherwise it will use the
segmentation as provided.
This function is called via the
fun.segmentation
argument of runAbsoluteCN
. The
arguments are passed via args.segmentation
.
segmentationHclust( seg, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, ... )
segmentationHclust( seg, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, ... )
seg |
If segmentation was provided by the user, this data structure will contain this segmentation. Useful for minimal segmentation functions. Otherwise PureCN will re-segment the data. This segmentation function ignores this user provided segmentation. |
vcf |
Optional |
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
normal.id.in.vcf |
Id of normal in in VCF. Currently not used. |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data (currently not used in this segmentation function). |
prune.hclust.h |
Height in the |
prune.hclust.method |
Cluster method used in the |
chr.hash |
Mapping of non-numerical chromsome names to numerical names
(e.g. chr1 to 1, chr2 to 2, etc.). If |
... |
Currently unused arguments provided to other segmentation functions. |
data.frame
containing the segmentation.
Markus Riester
vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package="PureCN") seg.file <- system.file('extdata', 'example_seg.txt', package = 'PureCN') res <- runAbsoluteCN(seg.file = seg.file, fun.segmentation = segmentationHclust, max.ploidy = 4, vcf.file = vcf.file, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1, genome = 'hg19', interval.file = interval.file)
vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") interval.file <- system.file("extdata", "example_intervals_tiny.txt", package="PureCN") seg.file <- system.file('extdata', 'example_seg.txt', package = 'PureCN') res <- runAbsoluteCN(seg.file = seg.file, fun.segmentation = segmentationHclust, max.ploidy = 4, vcf.file = vcf.file, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1, genome = 'hg19', interval.file = interval.file)
Alternative segmentation function using the PSCBS
package. This
function is called via the fun.segmentation
argument of
runAbsoluteCN
. The arguments are passed via
args.segmentation
.
segmentationPSCBS( normal, tumor, log.ratio, seg, plot.cnv, sampleid, weight.flag.pvalue = 0.01, alpha = 0.005, undo.SD = NULL, flavor = "tcn&dh", tauA = 0.03, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, boost.on.target.max.size = 30, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, additional.cmd.args = "", centromeres = NULL, ... )
segmentationPSCBS( normal, tumor, log.ratio, seg, plot.cnv, sampleid, weight.flag.pvalue = 0.01, alpha = 0.005, undo.SD = NULL, flavor = "tcn&dh", tauA = 0.03, vcf = NULL, tumor.id.in.vcf = 1, normal.id.in.vcf = NULL, max.segments = NULL, boost.on.target.max.size = 30, min.logr.sdev = 0.15, prune.hclust.h = NULL, prune.hclust.method = "ward.D", chr.hash = NULL, additional.cmd.args = "", centromeres = NULL, ... )
normal |
Coverage data for normal sample. Ignored in this function. |
tumor |
Coverage data for tumor sample. |
log.ratio |
Copy number log-ratios, one for each exon in coverage file. |
seg |
If segmentation was provided by the user, this data structure will contain this segmentation. Useful for minimal segmentation functions. Otherwise PureCN will re-segment the data. This segmentation function ignores this user provided segmentation. |
plot.cnv |
Segmentation plots. |
sampleid |
Sample id, used in output files. |
weight.flag.pvalue |
Flag values with one-sided p-value smaller than this cutoff. |
alpha |
Alpha value for CBS, see documentation for the |
undo.SD |
|
flavor |
Flavor value for PSCBS. See |
tauA |
tauA argument for PSCBS. See |
vcf |
Optional VCF object with germline allelic ratios. |
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
normal.id.in.vcf |
Id of normal in in VCF. If |
max.segments |
If not |
boost.on.target.max.size |
When off-target regions are noisy compared to on-target, try to find small segments of specified maximum size that might be missed to due the increased noise. Set to 0 to turn boosting off. |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data. |
prune.hclust.h |
Height in the |
prune.hclust.method |
Cluster method used in the |
chr.hash |
Mapping of non-numerical chromsome names to numerical names
(e.g. chr1 to 1, chr2 to 2, etc.). If |
additional.cmd.args |
|
centromeres |
A |
... |
Additional parameters passed to the
|
data.frame
containing the segmentation.
Markus Riester
Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.
Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-63.
Olshen et al. (2011). Parent-specific copy number in paired tumor-normal studies using circular binary segmentation. Bioinformatics.
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, sampleid = "Sample1", genome = "hg19", fun.segmentation = segmentationPSCBS, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)
normal.coverage.file <- system.file("extdata", "example_normal_tiny.txt", package = "PureCN") tumor.coverage.file <- system.file("extdata", "example_tumor_tiny.txt", package = "PureCN") vcf.file <- system.file("extdata", "example.vcf.gz", package = "PureCN") # The max.candidate.solutions, max.ploidy and test.purity parameters are set to # non-default values to speed-up this example. This is not a good idea for real # samples. ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file, tumor.coverage.file = tumor.coverage.file, vcf.file = vcf.file, sampleid = "Sample1", genome = "hg19", fun.segmentation = segmentationPSCBS, max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)
Function to set mapping bias for each variant in the provided
CollapsedVCF
object. By default, it returns the same value for all
variants, but a mapping bias file can be provided for position-specific
mapping bias calculation.
setMappingBiasVcf( vcf, tumor.id.in.vcf = NULL, mapping.bias.file = NULL, smooth = TRUE, smooth.n = 5 )
setMappingBiasVcf( vcf, tumor.id.in.vcf = NULL, mapping.bias.file = NULL, smooth = TRUE, smooth.n = 5 )
vcf |
|
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
mapping.bias.file |
A precomputed mapping bias database
obtained by |
smooth |
Impute mapping bias of variants not found in the panel by
smoothing of neighboring SNPs. Requires |
smooth.n |
Number of neighboring variants used for smoothing. |
Adds elements to the vcf
INFO
field
bias |
A |
pon.count |
A |
shape1 , shape2
|
Fit of a beta distribution. |
Markus Riester
# This function is typically only called by runAbsoluteCN via # fun.setMappingBiasVcf and args.setMappingBiasVcf. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.bias <- setMappingBiasVcf(vcf)
# This function is typically only called by runAbsoluteCN via # fun.setMappingBiasVcf and args.setMappingBiasVcf. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf.bias <- setMappingBiasVcf(vcf)
Function to set prior for somatic mutation status for each variant in the
provided CollapsedVCF
object.
setPriorVcf( vcf, prior.somatic = c(0.5, 5e-04, 0.999, 1e-04, 0.995, 0.5), tumor.id.in.vcf = NULL, min.cosmic.cnt = 6, DB.info.flag = "DB", Cosmic.CNT.info.field = "Cosmic.CNT" )
setPriorVcf( vcf, prior.somatic = c(0.5, 5e-04, 0.999, 1e-04, 0.995, 0.5), tumor.id.in.vcf = NULL, min.cosmic.cnt = 6, DB.info.flag = "DB", Cosmic.CNT.info.field = "Cosmic.CNT" )
vcf |
|
prior.somatic |
Prior probabilities for somatic mutations. First value is for the case when no matched normals are available and the variant is not in germline databases (second value). Third value is for variants with MuTect somatic call. Different from 1, because somatic mutations in segments of copy number 0 have 0 probability and artifacts can thus have dramatic influence on likelihood score. Forth value is for variants not labeled as somatic by MuTect. Last two values are optional, if vcf contains a flag Cosmic.CNT, it will set the prior probability for variants with CNT > 6 to the first of those values in case of no matched normal available (0.995 default). Final value is for the case that variant is in both germline databases and COSMIC count > 6. |
tumor.id.in.vcf |
Id of tumor in case multiple samples are stored in VCF. |
min.cosmic.cnt |
Minimum number of hits in the COSMIC database to call variant as likely somatic. |
DB.info.flag |
Flag in INFO of VCF that marks presence in common
germline databases. Defaults to |
Cosmic.CNT.info.field |
Info field containing hits in the Cosmic database |
The vcf
with numeric(nrow(vcf))
vector with the
prior probability of somatic status for each variant in the
CollapsedVCF
added to the INFO
field PR
.
Markus Riester
# This function is typically only called by runAbsoluteCN via the # fun.setPriorVcf and args.setPriorVcf comments. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf <- setPriorVcf(vcf)
# This function is typically only called by runAbsoluteCN via the # fun.setPriorVcf and args.setPriorVcf comments. vcf.file <- system.file("extdata", "example.vcf.gz", package="PureCN") vcf <- readVcf(vcf.file, "hg19") vcf <- setPriorVcf(vcf)