Title: | Detection of subclonal SNVs in deep sequencing data. |
---|---|
Description: | This package provides provides quantitative variant callers for detecting subclonal mutations in ultra-deep (>=100x coverage) sequencing experiments. The deepSNV algorithm is used for a comparative setup with a control experiment of the same loci and uses a beta-binomial model and a likelihood ratio test to discriminate sequencing errors and subclonal SNVs. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples for precisely estimating model parameters - such as local error rates and dispersion - and prior knowledge, e.g. from variation data bases such as COSMIC. |
Authors: | Niko Beerenwinkel [ths], Raul Alcantara [ctb], David Jones [ctb], John Marshall [ctb], Inigo Martincorena [ctb], Moritz Gerstung [aut, cre] |
Maintainer: | Moritz Gerstung <[email protected]> |
License: | GPL-3 |
Version: | 1.53.0 |
Built: | 2024-11-29 07:27:51 UTC |
Source: | https://github.com/bioc/deepSNV |
Detection of subclonal SNVs in deep sequencing experiments
This packages provides algorithms for detecting subclonal single nucleotide variants (SNVs) and their frequencies from ultra-deep sequencing data. It retrieves the nucleotide counts at each position and each strand from two .bam files and tests for differences between the two experiments with a likelihood ratio test using either a binomial or and overdispersed beta-binomial model. The statistic can be tuned across genomic sites by a shared Dirichlet prior and there package provides procedures for normalizing sequencing data from different runs.
Moritz Gerstung, Wellcome Trust Sanger Institute, [email protected]
Gerstung M, Beisel C, Rechsteiner M, Wild P, Schraml P, Moch H, and Beerenwinkel N. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat Commun 3:811 (2012). DOI:10.1038/ncomms1814.
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
This function uses a C interface to read the nucleotide counts on each position of a .bam alignment. The counts of both strands are reported separately
and nucleotides below a quality cutoff are masked. It is called by deepSNV
to parse the alignments of the test and control experiments,
respectively.
bam2R( file, chr, start, stop, q = 25, mq = 0, s = 2, head.clip = 0, max.depth = 1e+06, verbose = FALSE, mask = 0, keepflag = 0, max.mismatches = NULL )
bam2R( file, chr, start, stop, q = 25, mq = 0, s = 2, head.clip = 0, max.depth = 1e+06, verbose = FALSE, mask = 0, keepflag = 0, max.mismatches = NULL )
file |
The name of the .bam file as a string. |
chr |
The chromosome as a string. |
start |
The start position (1-indexed). |
stop |
The end position (1-indexed). |
q |
An optional cutoff for the nucleotide Phred quality. Default q = 25. Nucleotides with Q < q will be masked by 'N'. |
mq |
An optional cutoff for the read mapping quality. Default mq = 0 (no filter). reads with MQ < mq will be discarded. |
s |
Optional choice of the strand. Defaults to s = 2 (both). |
head.clip |
Should n nucleotides from the head of reads be clipped? Default 0. |
max.depth |
The maximal depth for the pileup command. Default 1,000,000. |
verbose |
Boolean. Set to TRUE if you want to get additional output. |
mask |
Integer indicating which flags to filter. Default 0 (no mask). Try 3844 (UNMAP|SECONDARY|QCFAIL|DUP|SUPPLEMENTARY). |
keepflag |
Integer indicating which flags to keep. Default 0 (no mask). Try 3 (PAIRED|PROPERLY_PAIRED). |
max.mismatches |
Integer indicating maximum NM value to allow in a read. Default NULL (no filter). |
A named matrix
with rows corresponding to genomic positions and columns for the nucleotide counts (A, T, C, G, -), masked nucleotides (N), (INS)ertions, (DEL)etions, (HEAD)s and (TAIL)s that count how often a read begins and ends at the given position, respectively,
and the sum of alignment (QUAL)ities, which can be indicative of alignment problems.
Counts from matches on the reference strand (s=0) are uppercase, counts on the complement (s=1) are lowercase. The returned matrix has 11 * 2 (strands) = 22 columns and (stop - start + 1) rows.
Moritz Gerstung
## Simple example: counts <- bam2R(file = system.file("extdata", "test.bam", package="deepSNV"), chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140, q = 10, mask = 3844) show(counts) ## Not run: Requires an internet connection, but try yourself. # bam <- bam2R(file = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585, q=10, mask = 3844) # head(bam)
## Simple example: counts <- bam2R(file = system.file("extdata", "test.bam", package="deepSNV"), chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140, q = 10, mask = 3844) show(counts) ## Not run: Requires an internet connection, but try yourself. # bam <- bam2R(file = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585, q=10, mask = 3844) # head(bam)
This is the workhorse of the shearwater test. It computes the Bayes factor for each sample, nucleotide and position of the null-model vs. the alternative of a real variant.
bbb( counts, rho = NULL, alternative = "greater", truncate = 0.1, rho.min = 1e-04, rho.max = 0.1, pseudo = .Machine$double.eps, return.value = c("BF", "P0", "err"), model = c("OR", "AND", "adaptive"), min.cov = NULL, max.odds = 10, mu.min = 1e-06, mu.max = 1 - mu.min )
bbb( counts, rho = NULL, alternative = "greater", truncate = 0.1, rho.min = 1e-04, rho.max = 0.1, pseudo = .Machine$double.eps, return.value = c("BF", "P0", "err"), model = c("OR", "AND", "adaptive"), min.cov = NULL, max.odds = 10, mu.min = 1e-06, mu.max = 1 - mu.min )
counts |
An |
rho |
Disperision factor. If NULL, estimated from the data. |
alternative |
The alternative. Currently only "greater" is implemented. |
truncate |
The model uses a compound control sample which is the sum of all samples with a relative nucleotide frequency below truncate at this locus. Default = 0.1. |
rho.min |
Lower bound for the method of moment estimate of the dispersion factor rho. |
rho.max |
Upper bound for the method of moment estimate of the dispersion factor rho. |
pseudo |
A pseudo count to be added to the counts to avoid problems with zeros. |
return.value |
Return value. Either "BF" for Bayes Factor of "P0" for the posterior probability (assuming a prior of 0.5). |
model |
The null model to use. For "OR" it requires the alternative model to be violated on either of the strands, for "AND" the null is specified such that the error rates of the sample of interest and the compound control sample are identical on both strands. "AND" typically yield many more calls. The most recent addition is "adaptive", which switches from "OR" to "AND", if the coverage is less than min.cov, or if the odds of forward and reverse coverage is greater than max.odds. Default = "OR". |
min.cov |
Minimal coverage to swith from OR to AND, if model is "adaptive" |
max.odds |
Maximal odds before switching from OR to AND if model is "adaptive" and min.cov=NULL. |
mu.min |
Minimum of the error rate mu. |
mu.max |
Maximal error rate mu. |
An array
of Bayes factors
Experimental code, subject to changes
mg14
## Load data from deepSNV example regions <- GRanges("B.FR.83.HXB2_LAI_IIIB_BRU_K034", IRanges(start = 3120, end=3140)) files <- c(system.file("extdata", "test.bam", package="deepSNV"), system.file("extdata", "control.bam", package="deepSNV")) counts <- loadAllData(files, regions, q=10) ## Run (bbb) computes the Bayes factor bf <- bbb(counts, model = "OR", rho=1e-4) vcf <- bf2Vcf(bf, counts, regions, samples = files, prior = 0.5, mvcf = TRUE) ## Compare to deepSNV bf <- bbb(counts, model = "AND", rho=1e-4) dpSNV <- deepSNV(test = files[1], control = files[2], regions=regions, q=10) plot(p.val(dpSNV), bf[1,,]/(1+bf[1,,]), log="xy")
## Load data from deepSNV example regions <- GRanges("B.FR.83.HXB2_LAI_IIIB_BRU_K034", IRanges(start = 3120, end=3140)) files <- c(system.file("extdata", "test.bam", package="deepSNV"), system.file("extdata", "control.bam", package="deepSNV")) counts <- loadAllData(files, regions, q=10) ## Run (bbb) computes the Bayes factor bf <- bbb(counts, model = "OR", rho=1e-4) vcf <- bf2Vcf(bf, counts, regions, samples = files, prior = 0.5, mvcf = TRUE) ## Compare to deepSNV bf <- bbb(counts, model = "AND", rho=1e-4) dpSNV <- deepSNV(test = files[1], control = files[2], regions=regions, q=10) plot(p.val(dpSNV), bf[1,,]/(1+bf[1,,]), log="xy")
Maximum likelihood version of Shearwater producing p-values instead of Bayes factors.
betabinLRT( counts, rho = NULL, truncate = 0.05, rho.min = 1e-04, rho.max = 0.8, maxvaf = 0.3, mindepth = 10, maxtruncate = 0.5 )
betabinLRT( counts, rho = NULL, truncate = 0.05, rho.min = 1e-04, rho.max = 0.8, maxvaf = 0.3, mindepth = 10, maxtruncate = 0.5 )
counts |
The array of counts typically generated by loadAllData. |
rho |
Use this variable to fix the dispersion parameter to a value of interest. Default: NULL, rho will be estimated from the data. |
truncate |
Samples with variant allele frequencies higher than "truncate" will be excluded from the background error model. |
rho.min |
If rho=NULL, rho will be estimated from the data in the interval [rho.min,rho.max]. |
rho.max |
If rho=NULL, rho will be estimated from the data in the interval [rho.min,rho.max]. |
maxvaf |
Sites with an average rate of mimatches higher than maxvaf will not be considered (e.g. SNPs or reference sites). |
mindepth |
Minimum coverage required to test a site. |
maxtruncate |
Maximum number of samples that can be excluded from the background error model by truncate for a site to be tested. |
A list with two arrays for P- and Q-values.
Inigo Martincorena and Moritz Gerstung
Martincorena I, Roshan A, Gerstung M, et al. (2015). High burden and pervasive positive selection of somatic mutations in normal human skin. _Science_ (Under consideration).
# code to be added
# code to be added
VCF
object with variant calls from an array of Bayes factors.This function thresholds the Bayes factors computed by the shearwater algorithm and creates a VCF
object as output.
bf2Vcf( BF, counts, regions, samples = 1:nrow(counts), err = NULL, mu = NULL, cutoff = 0.05, prior = 0.5, mvcf = TRUE )
bf2Vcf( BF, counts, regions, samples = 1:nrow(counts), err = NULL, mu = NULL, cutoff = 0.05, prior = 0.5, mvcf = TRUE )
BF |
array of Bayes factors from |
counts |
array of counts from |
regions |
|
samples |
vector of samples names. |
err |
Optional matrix of error rates, otherwise recomputed from counts. |
mu |
Optional matrix of relative frequencies, otherwise recomputed from counts. |
cutoff |
Cutoff for the posterior artifact probability below which a variant is considered to be true (default = 0.05) |
prior |
matrix of prior probabilities for finding a true call, typically from |
mvcf |
boolean flag, if TRUE compute a large VCF with as many genotype columns as samples. Default TRUE. Otherwise use duplicate rows and only one genotype column. The sample is then provided by the info:PD field. Can be inefficient for large sample sizes. |
A VCF
object
Experimental code, subject to changes
mg14
This function computes the consensus sequence from a matrix of nucleotide counts, or the control slot of a deepSNV object.
consensusSequence(x, ...) ## S4 method for signature 'matrix' consensusSequence(x, vector=FALSE, haploid=TRUE, het.cut = .333) ## S4 method for signature 'deepSNV' consensusSequence(x, vector=FALSE, haploid=TRUE, het.cut = .333)
consensusSequence(x, ...) ## S4 method for signature 'matrix' consensusSequence(x, vector=FALSE, haploid=TRUE, het.cut = .333) ## S4 method for signature 'deepSNV' consensusSequence(x, vector=FALSE, haploid=TRUE, het.cut = .333)
x |
An object. Either an |
... |
Additional arguments passed to methods. |
vector |
Boolean where TRUE indicates that a character vector should be returned. |
haploid |
Should the consensus be called for a haploid control? Otherwise, also all bases larger than het.cut are rerported. Default haploid = TRUE. |
het.cut |
Heterozygous cutoff. If haploid = FALSE, report all nucleotides with relative frequency larger than het.cut. Default = 0.333. |
A DNAString
with the consensus sequence, or if vector = TRUE, a character vector.
Moritz Gerstung
data(HIVmix) seq = consensusSequence(HIVmix) consensusSequence(HIVmix, vector=TRUE)[1:10]
data(HIVmix) seq = consensusSequence(HIVmix) consensusSequence(HIVmix, vector=TRUE)[1:10]
Convenience function to obtain the control counts from a deepSNV object.
control(deepSNV, ...) ## S4 method for signature 'deepSNV' control(deepSNV, total = FALSE)
control(deepSNV, ...) ## S4 method for signature 'deepSNV' control(deepSNV, total = FALSE)
deepSNV |
a |
... |
Additional param passed to specific methods |
total |
Logical. If true the sum of both strands is returned |
A matrix with the absolute frequencies summed over both strands.
data(HIVmix) control(HIVmix)[1:10,] control(HIVmix, total=TRUE)[1:10,]
data(HIVmix) control(HIVmix)[1:10,] control(HIVmix, total=TRUE)[1:10,]
Convenience function to get the coordinates from a deepSNV object.
coordinates(deepSNV, ...) ## S4 method for signature 'deepSNV' coordinates(deepSNV)
coordinates(deepSNV, ...) ## S4 method for signature 'deepSNV' coordinates(deepSNV)
deepSNV |
a |
... |
Additional param passed to specific methods |
A data.frame
with columns "chrom(osome)" and "pos(ition)".
data(HIVmix) coordinates(HIVmix)[1:10,]
data(HIVmix) coordinates(HIVmix)[1:10,]
A table with counts of the HIVmix data set. Used for minimal unit testing.
data("counts", package="deepSNV") countsFromBam <- bam2R(file = system.file("extdata", "test.bam", package="deepSNV"), chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140, q = 10) all(counts == countsFromBam)
data("counts", package="deepSNV") countsFromBam <- bam2R(file = system.file("extdata", "test.bam", package="deepSNV"), chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140, q = 10) all(counts == countsFromBam)
Beta-binomial probability distribution
dbetabinom(x, n, mu, rho, log = FALSE)
dbetabinom(x, n, mu, rho, log = FALSE)
x |
Counts |
n |
Size |
mu |
Probability |
rho |
Dispersion. rho in (0,1) |
log |
Return logarithmic values |
d
mg14
This generic function can handle different types of inputs for the test and control experiments. It either reads from two .bam files,
uses two matrices of nucleotide counts, or re-evaluates the test results from a deepSNV-class
object. The actual test is a
likelihood ratio test of a (beta-)binomial model for the individual nucleotide counts on each position under the hypothesis that both experiments share the same parameter,
and the alternative that the parameters differ. Because the difference in degrees of freedom is 1, the test statistic
is asymptotically distributed as
. The statistic may be tuned by a nucleotide specific Dirichlet prior that is learned across all genomic sites,
see
estimateDirichlet
. If the model is beta-binomial, a global dispersion parameter is used for all sites. It can be learned with estimateDispersion
.
deepSNV(test, control, ...) ## S4 method for signature 'matrix,matrix' deepSNV(test,control, alternative = c('greater', 'less', 'two.sided'), dirichlet.prior = NULL, pseudo.count=1, combine.method = c("fisher", "max", "average"), over.dispersion = 100, model = c("bin", "betabin"), ...) ## S4 method for signature 'deepSNV,missing' deepSNV(test, control, ...) ## S4 method for signature 'character,character' deepSNV(test, control, regions, q=25, s=2, head.clip=0, ...) ## S4 method for signature 'matrix,character' deepSNV(test, control, regions, q=25, s=2, ...) ## S4 method for signature 'character,matrix' deepSNV(test, control, regions, q=25, s=2, ...)
deepSNV(test, control, ...) ## S4 method for signature 'matrix,matrix' deepSNV(test,control, alternative = c('greater', 'less', 'two.sided'), dirichlet.prior = NULL, pseudo.count=1, combine.method = c("fisher", "max", "average"), over.dispersion = 100, model = c("bin", "betabin"), ...) ## S4 method for signature 'deepSNV,missing' deepSNV(test, control, ...) ## S4 method for signature 'character,character' deepSNV(test, control, regions, q=25, s=2, head.clip=0, ...) ## S4 method for signature 'matrix,character' deepSNV(test, control, regions, q=25, s=2, ...) ## S4 method for signature 'character,matrix' deepSNV(test, control, regions, q=25, s=2, ...)
test |
The test experiment. Either a .bam file, or a matrix with nucleotide counts, or a |
control |
The control experiment. Must be of the same type as test, or missing if test is a |
... |
Additional arguments. |
alternative |
The alternative to be tested. One of greater, less, or two.sided. |
dirichlet.prior |
A base-sepecific Dirichlet prior specified as a matrix. Default NULL. |
pseudo.count |
If dirichlet.prior=NULL, a pseudocount can be used to define a flat prior. |
combine.method |
The method to combine p-values. One of "fisher" (default), "max", or "average". See |
over.dispersion |
A numeric factor for the over.dispersion, if the model is beta-binomial. Default 100. |
model |
Which model to use. Either "bin", or "betabin". Default "bin". |
regions |
The regions to be parsed if test and control are .bam files. Either a |
q |
The quality arguement passed to |
s |
The strand argument passed to |
head.clip |
The head.clip argument passed to |
A deepSNV
object
Moritz Gerstung
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
This class stores the contents of the deepSNV test. It is typically initialized with deepSNV
.
This class has the following slots:
The P-values of the test.
A matrix with the nucleotide counts in the test experiment. The column names of the nucleotide counts are A, T, C, G, - for the positivie strand and a, t, c, g, _ for the reverse.
A matrix with the nucleotide counts in the control experiment. The column names must be the same as for the test.
A data.frame
with the genomic coordinates chr and pos, and other columns, if desired.
A matrix with the nucleotide-specific Dirichlet prior
The pseudo count if used)
A string with the alternative used in the test.
A character vector with the nucleotides tested.
A data.frame
with columns chr, start, and stop.
A list with two entries test and control storing the filenames (if the object was initialized from two bam-files).
The method for combining p-values as a character string.
The statistical model, either bin for binomial, or betabin for beta-binomial
If the model is beta-binomial, the first parameter for the beta-binomial model, which is shared across sites.
The last function call to deepSNV.
The log likelihood of the data under the null hypothesis. (Excluding zeros on the opposite site under a one-sided test.)
Moritz Gerstung
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
The prior learns the parameters of a Dirichlet distribution seperately for each consensus base. The expected value of the Dirichlet distributions
is the base-substitution matrix, where rows correspond to the initial nucleotide and columns to the substituted nucleotide. The absolute values determine
the higher moments of the Dirichlet distributions. After having learned the prior the deepSNV-class
test is recomputed.
estimateDirichlet(control) ## S4 method for signature 'matrix' estimateDirichlet(control) ## S4 method for signature 'deepSNV' estimateDirichlet(control)
estimateDirichlet(control) ## S4 method for signature 'matrix' estimateDirichlet(control) ## S4 method for signature 'deepSNV' estimateDirichlet(control)
control |
Either a matrix with nucleotide counts or a |
An deepSNV-class
object.
Moritz Gerstung
data(phiX) estimateDirichlet(phiX)
data(phiX) estimateDirichlet(phiX)
This function estimates the dispersion factor in a beta-binomial model of the nucleotide counts. This model assumes that the count for nucleotide j at position i is
distributed after a beta-binomial , where
is the coverage.
The base and nucleotide specific parameter
is estimated from the local mean by the method-of-moments estimate,
is a shared
overdispersion parameter. It is estimated via a numerical optimization of the likelihood under the null-hypothesis.
estimateDispersion(test, control, ...) ## S4 method for signature 'deepSNV,missing' estimateDispersion(test, control, alternative = NULL, interval = c(0,1000)) ## S4 method for signature 'matrix,matrix' estimateDispersion(test, control, alternative = NULL, interval = c(0,1000))
estimateDispersion(test, control, ...) ## S4 method for signature 'deepSNV,missing' estimateDispersion(test, control, alternative = NULL, interval = c(0,1000)) ## S4 method for signature 'matrix,matrix' estimateDispersion(test, control, alternative = NULL, interval = c(0,1000))
test |
Either a deepSNV object, or a matrix with the test counts. |
control |
Missing if test is a deepSNV object, otherwise missing. |
... |
Additional param passed to specific methods |
alternative |
The alternative to be tested. One of "greater", "less", "two-sided" (default). If test is a deepSNV object, automatically taken from the corresponding slot if unspecified. |
interval |
The interval to be screened for the overdispersion factor. Default (0,1000). |
A deepSNV-class
object if the input was a deepSNV object. Otherwise the loglikelihood and the estimated parameter.
Moritz Gerstung
data("RCC", package="deepSNV") plot(RCC) summary(RCC)[,1:6] RCC.bb = estimateDispersion(RCC, alternative = "two.sided") summary(RCC.bb)
data("RCC", package="deepSNV") plot(RCC) summary(RCC)[,1:6] RCC.bb = estimateDispersion(RCC, alternative = "two.sided") summary(RCC.bb)
It uses a method of moments approximation to estimate rho from the variances of the relative frequencies nu across samples
estimateRho(x, mu, ix, pseudo.rho = .Machine$double.eps)
estimateRho(x, mu, ix, pseudo.rho = .Machine$double.eps)
x |
counts |
mu |
relative frequency across all samples |
ix |
index indicating the set of samples to use (typically indicating those with relative frequency smaller than 0.1). |
pseudo.rho |
a pseudo count added to each sample to avoid problems with zeros. Default = .Machine$double.eps |
rho
Experimental code, subject to changes
mg14
Subsetting for deepSNV objects.
## S4 method for signature 'deepSNV,ANY,ANY,ANY' x[i, j]
## S4 method for signature 'deepSNV,ANY,ANY,ANY' x[i, j]
x |
A |
i |
Row indeces. |
j |
Column (nucleotide) indeces. |
A deepSNV-class
object.
Moritz Gerstung
data(HIVmix) HIVmix[1:10,]
data(HIVmix) HIVmix[1:10,]
This function uses the parallel package and the bam2R interface to load all nucleotide counts from a list of bam files and a set of regions into a large array.
loadAllData(files, regions, ..., mc.cores = 1)
loadAllData(files, regions, ..., mc.cores = 1)
files |
A character vector with the paths to all bam files |
regions |
Either a GRanges or data.frame with the coordinates of interest |
... |
Arguments passed to bam2R |
mc.cores |
Number of cores used for loading, default = 1 |
counts
Experimental code, subject to changes
mg14
This function computes the prior probability of detecting a true variant from a variation data base. It assumes a VCF file with a CNT slot for the count of a given base substitution. Such a VCF file can be downloaded at ftp://ngs.sanger.ac.uk/production/cosmic/. The prior probability is simply defined as pi.mut * CNT[i]/sum(CNT). On sites with no count, a background probability of pi0 is used.
makePrior(COSMIC, regions, pi.gene = 0.1, pi.backgr = 1e-04)
makePrior(COSMIC, regions, pi.gene = 0.1, pi.backgr = 1e-04)
COSMIC |
A VCF object from COSMIC VCF export. |
regions |
A GRanges object with the regions (gene) of interest. |
pi.gene |
Probability that a gene is mutated |
pi.backgr |
Background probability of a locus being mutated. Default 1e-4, corresponding to an expected value of 1 SNV per 1e4 bases. |
A vector of prior values with length given by the length of the regions GRanges object.
Experimental code, subject to changes
mg14
## Make prior (not run) #COSMIC <- readVcf("PATHTO/CosmicCodingMuts_v64_02042013_noLimit.vcf.gz", genome="GChr37") #prior <- makePrior(COSMIC[info(COSMIC)$GENE=="TP53"], regions=GRanges(17, IRanges(7571720,7578811))) #plot(prior[,1], type="h")
## Make prior (not run) #COSMIC <- readVcf("PATHTO/CosmicCodingMuts_v64_02042013_noLimit.vcf.gz", genome="GChr37") #prior <- makePrior(COSMIC[info(COSMIC)$GENE=="TP53"], regions=GRanges(17, IRanges(7571720,7578811))) #plot(prior[,1], type="h")
This functions performs a Manhattan plot of the p-values of a deepSNV test against the position
manhattanPlot(x, col = nt.col)
manhattanPlot(x, col = nt.col)
x |
An |
col |
An optional vector of colors for the nucleotides. |
NULL.
Moritz Gerstung
data(HIVmix) manhattanPlot(HIVmix)
data(HIVmix) manhattanPlot(HIVmix)
Little helper function to split the count objects into a smaller digestible chunks and run function FUN on each subset
mcChunk(FUN, X, split = 250, mc.cores = 1, ...)
mcChunk(FUN, X, split = 250, mc.cores = 1, ...)
FUN |
The function to call on each chunk |
X |
The object to be subsetted using [,i,] |
split |
The size of each chunk |
mc.cores |
The number of cores to use |
... |
Additional arguments passed to FUN |
The value of FUN
Experimental code, subject to changes
mg14
This functions performs a loess
normalization of the nucleotide. This experimental feature can
be used to compare experiments from different libraries or sequencing runs that may have differing noise characteristics.
normalize(test, control, ...) ## S4 method for signature 'matrix,matrix' normalize(test, control, round=TRUE, ...) ## S4 method for signature 'deepSNV,missing' normalize(test, control, ...)
normalize(test, control, ...) ## S4 method for signature 'matrix,matrix' normalize(test, control, round=TRUE, ...) ## S4 method for signature 'deepSNV,missing' normalize(test, control, ...)
test |
Either an |
control |
Missing if test is an |
... |
Parameters passed to |
round |
Logical. Should normalized counts be rounded to integers? Default=TRUE |
A deepSNV-class
object.
This feature is somewhat experimental and the results should be treated with care. Sometimes it can be better to leave the data unnormalized and use a model with greater dispersion instead.
Moritz Gerstung
data(phiX, package = "deepSNV") plot(phiX) phiN <- normalize(phiX, round = TRUE) plot(phiN)
data(phiX, package = "deepSNV") plot(phiX) phiN <- normalize(phiX, round = TRUE) plot(phiN)
This function combines two P-values into a single one using a statistic defined by method.
"fisher" uses the product of the two, in this case the logarithm of the product is
distributed. If the method = "max", the resulting P-value is
.
For method = "average" the mean is used, yielding a P-value of
if
and
otherwise. "negfisher" is the negative of Fisher's method using $1-F(1-P_1, 1-P_2)$, where $F$ is the combination
function of Fisher's method; for small $P_1,P_2$, the result is very similar to method="average". Fisher's method behaves a bit like a logical AND
of the joint null-hypothesis, whereas negative Fisher is like an OR.
p.combine(p1, p2, method = c("fisher", "max", "average", "prod", "negfisher"))
p.combine(p1, p2, method = c("fisher", "max", "average", "prod", "negfisher"))
p1 |
P-value 1 |
p2 |
P-value 2 |
method |
One of "fisher" (default), "max" or "average" |
p-values
Moritz Gerstung
p1 <- runif(1000) p2 <- runif(1000) hist(p1) p.avg = p.combine(p1,p2, method="average") hist(p.avg) p.fish = p.combine(p1,p2, method="fisher") hist(p.fish) p.max = p.combine(p1,p2, method="max") hist(p.max) pairs(data.frame(p1,p2,p.fish,p.max,p.avg))
p1 <- runif(1000) p2 <- runif(1000) hist(p1) p.avg = p.combine(p1,p2, method="average") hist(p.avg) p.fish = p.combine(p1,p2, method="fisher") hist(p.fish) p.max = p.combine(p1,p2, method="max") hist(p.max) pairs(data.frame(p1,p2,p.fish,p.max,p.avg))
Convenience function to get the p-values from a deepSNV object.
p.val(deepSNV, ...) ## S4 method for signature 'deepSNV' p.val(deepSNV)
p.val(deepSNV, ...) ## S4 method for signature 'deepSNV' p.val(deepSNV)
deepSNV |
a |
... |
Additional param passed to specific methods |
A matrix with the p-values.
data(HIVmix) p.val(HIVmix)[1:10,]
data(HIVmix) p.val(HIVmix)[1:10,]
Cumulative beta-binomial probability distribution
pbetabinom(x, n, mu, rho, log = FALSE)
pbetabinom(x, n, mu, rho, log = FALSE)
x |
Counts |
n |
Sample size |
mu |
Probability |
rho |
Dispersion. rho in (0,1) |
log |
Return logarithmic values |
Probability
mg14
Data from two phiX experiments sequenced on a GAIIx.
data(phiX, package="deepSNV") plot(phiX) phiN <- normalize(phiX, round=TRUE) plot(phiN)
data(phiX, package="deepSNV") plot(phiX) phiN <- normalize(phiX, round=TRUE) plot(phiN)
Prior from COSMIC v63 for the TP53 gene
data("pi", package="deepSNV") plot(pi[,1], type="h")
data("pi", package="deepSNV") plot(pi[,1], type="h")
This function plots the relative nucleotide frequencies of the test against the control experiment on a logarithmit scale. The color of the symbols
denotes the nucleotide, and the area of the circle is proportional to the of the p-value.
## S3 method for class 'deepSNV' plot( x, sig.level = NULL, col = NULL, col.null = "grey", cex.min = 0.2, ylab = "Relative Frequency in Test", xlab = "Relative Frequency in Control", pch = 16, ... )
## S3 method for class 'deepSNV' plot( x, sig.level = NULL, col = NULL, col.null = "grey", cex.min = 0.2, ylab = "Relative Frequency in Test", xlab = "Relative Frequency in Control", pch = 16, ... )
x |
A deep SNV object. |
sig.level |
By default, p-values below sig.level are drawn as filled circles. |
col |
Color of the nucleotides. |
col.null |
Color of insignificant nucleotides. |
cex.min |
The minimal size of the points. |
ylab |
The y-axis label. |
xlab |
The x-axis label. |
pch |
The plotting symbol. Default = 16 (filled circle) |
... |
Additional arguments passed to plot. |
Moritz Gerstung
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
VCF
object with variant calls from an array of q-values.This function thresholds the q-values computed by the shearwater algorithm and creates a VCF
object as output.
qvals2Vcf( qvals, counts, regions, samples = 1:nrow(counts), err = NULL, mu = NULL, cutoff = 0.05, mvcf = TRUE )
qvals2Vcf( qvals, counts, regions, samples = 1:nrow(counts), err = NULL, mu = NULL, cutoff = 0.05, mvcf = TRUE )
qvals |
array of q-values from |
counts |
array of counts from |
regions |
|
samples |
vector of samples names. |
err |
Optional matrix of error rates, otherwise recomputed from counts. |
mu |
Optional matrix of relative frequencies, otherwise recomputed from counts. |
cutoff |
Cutoff for the q-values below which a variant is considered to be true (default = 0.05) |
mvcf |
boolean flag, if TRUE compute a large VCF with as many genotype columns as samples. Default TRUE. Otherwise use duplicate rows and only one genotype column. The sample is then provided by the info:PD field. Can be inefficient for large sample sizes. |
A VCF
object
Experimental code, subject to changes
mg14
Deep sequencing experiments of a renal cell carcinoma and healthy control tissue.
data("RCC", package="deepSNV") summary(RCC, adjust.method="bonferroni")[,1:6] plot(RCC) RCC.bb <- estimateDispersion(RCC, alternative="two.sided") summary(RCC.bb, adjust.method="bonferroni")[,1:6] plot(RCC.bb)
data("RCC", package="deepSNV") summary(RCC, adjust.method="bonferroni")[,1:6] plot(RCC) RCC.bb <- estimateDispersion(RCC, alternative="two.sided") summary(RCC.bb, adjust.method="bonferroni")[,1:6] plot(RCC.bb)
This function masks homopolymeric repeats longer than a given width. These are hot-spots of sequencing error and can confound the analysis.
repeatMask(x, ...) ## S4 method for signature 'DNAString' repeatMask(x, w=5, flank=TRUE) ## S4 method for signature 'deepSNV' repeatMask(x, w=5, flank=TRUE)
repeatMask(x, ...) ## S4 method for signature 'DNAString' repeatMask(x, w=5, flank=TRUE) ## S4 method for signature 'deepSNV' repeatMask(x, w=5, flank=TRUE)
x |
An object. Either a |
... |
Additional param passed to specific methods |
w |
Integer. The minimal length at which repeats should be masked. Default |
flank |
Boolean. Indicates whether the sites adjacent to the repeat should also be masked. |
A boolean vector where TRUE indicates a non-homopolymeric region.
Moritz Gerstung
data(HIVmix) which(repeatMask(HIVmix))
data(HIVmix) which(repeatMask(HIVmix))
Convenience function to compute the relative frequencies from a matrix with absolute counts.
RF(freq, total = FALSE)
RF(freq, total = FALSE)
freq |
A matrix with nucleotide counts. |
total |
If the nucleotide counts have columns for forward and reverse direction, return each strand sepratatelu (FALSE), or add the two (TRUE). |
A matrix with the relative frequencies.
Moritz Gerstung
data(HIVmix) RF(test(HIVmix))[1:10,] RF(test(HIVmix), total=TRUE)[1:10,]
data(HIVmix) RF(test(HIVmix))[1:10,] RF(test(HIVmix), total=TRUE)[1:10,]
Show method for deepSNV objects
## S4 method for signature 'deepSNV' show(object)
## S4 method for signature 'deepSNV' show(object)
object |
A |
Moritz Gerstung
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
Tabularize significant SNVs by evalutating the p-values of the deepSNV
test.
## S4 method for signature 'deepSNV' summary( object, sig.level = 0.05, adjust.method = "bonferroni", fold.change = 1, value = c("data.frame", "VCF") )
## S4 method for signature 'deepSNV' summary( object, sig.level = 0.05, adjust.method = "bonferroni", fold.change = 1, value = c("data.frame", "VCF") )
object |
A |
sig.level |
The desired significance level. |
adjust.method |
The adjustment method for multiple testing corrections. See |
fold.change |
The minimal fold change required of the relative frequency. Default 1. |
value |
String. The type of the returned object. Either "data.frame" for a |
If value="data.frame", a data.frame
with the following columns:
chr |
The chromosome |
pos |
The position (1-based) |
ref |
The reference (consensus) nucleotide |
var |
The variant nucleotide |
p.val |
The (corrected) p-value |
freq.var |
The relative frequency of the SNV |
sigma2.freq.var |
The estimated variance of the frequency |
n.tst.fw |
The variant counts in the test experiment, forward strand |
cov.tst.fw |
The coverage in the test experiment, forward strand |
n.tst.bw |
The variant counts in the test experiment, backward strand |
cov.tst.bw |
The coverage in the test experiment, backward strand |
n.ctrl.fw |
The variant counts in the control experiment, forward strand |
cov.ctrl.fw |
The coverage in the control experiment, forward strand |
n.ctrl.bw |
The variant counts in the control experiment, backward strand |
cov.ctrl.bw |
The coverage in the control experiment, backward strand |
raw.p.val |
The raw p-value |
If value = "VCF", this functions returns a VCF-class
object with the following entries:
FIXED:
REF |
Reference allele in control sample. Note that deletions in the control sample will be reported like insertions, e.g. if the consensus of the control is A,- at positions 1 and 2 (relative to the reference) and the test was A,A, then this would be denoted as REF="A" and VAR="AA" with coordinate IRanges(1,2). This may cause ambiguities when the VCF object is written to text with writeVcf(), which discards the width of the coordinate, and this variant remains indistinguishable from an insertion to the _reference_ genome. |
VAR |
Variant allele in test sample |
QUAL |
-10*log10(raw.p.val) |
INFO:
VF |
Variant frequency. Variant allele frequency in the test minus variant allele frequency in the control. |
VFV |
Variant frequency variance. Variance of the variant frequency; can be thought of as confidence interval. |
GENO (one column for test and one column for control):
FW |
Forward allele count |
BW |
Backward allele count |
DFW |
Forward read depth |
DBW |
Backward read depth |
Moritz Gerstung
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
## Short example with 2 SNVs at frequency ~10% regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140) ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10) show(ex) # show method plot(ex) # scatter plot summary(ex) # summary with significant SNVs ex[1:3,] # subsetting the first three genomic positions tail(test(ex, total=TRUE)) # retrieve the test counts on both strands tail(control(ex, total=TRUE)) ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself. # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585) # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10) data(HIVmix) # attach data instead.. show(HIVmix) plot(HIVmix) head(summary(HIVmix))
Convenience function to obtain the test counts from a deepSNV object.
test(deepSNV, ...) ## S4 method for signature 'deepSNV' test(deepSNV, total = FALSE)
test(deepSNV, ...) ## S4 method for signature 'deepSNV' test(deepSNV, total = FALSE)
deepSNV |
a |
... |
Additional param passed to specific methods |
total |
Logical. If true the sum of both strands is returned |
A matrix with the absolute frequencies summed over both strands.
data(HIVmix) test(HIVmix)[1:10,] test(HIVmix, total=TRUE)[1:10,]
data(HIVmix) test(HIVmix)[1:10,] test(HIVmix, total=TRUE)[1:10,]
Two .bam alignments as example data sets are downloaded remotely via http. Sequenced were a 1,512 nt fragment of the HIV genome and a mixture (90% + 10%) with another variants. The two sequences were confirmed by Sanger sequencing and stored in the table trueSNVs.
data(HIVmix) data(trueSNVs) table(p.adjust(p.val(HIVmix), method="BH") < 0.05, trueSNVs)
data(HIVmix) data(trueSNVs) table(p.adjust(p.val(HIVmix), method="BH") < 0.05, trueSNVs)