Title: | Mosaic Aneuploidy Detection and Quantification using Massive Parallel Sequencing Data |
---|---|
Description: | The MADSEQ package provides a group of hierarchical Bayeisan models for the detection of mosaic aneuploidy, the inference of the type of aneuploidy and also for the quantification of the fraction of aneuploid cells in the sample. |
Authors: | Yu Kong, Adam Auton, John Murray Greally |
Maintainer: | Yu Kong <[email protected]> |
License: | GPL(>=2) |
Version: | 1.33.0 |
Built: | 2024-12-19 03:39:22 UTC |
Source: | https://github.com/bioc/MADSEQ |
The MADSEQ package provides a group of hierarchical Bayesian models for the detection and quantification of mosaic aneuploidy using massive parallele sequencing data.
MADSEQ is a group of hierarchical Bayesian models used for the detection and quantification of mosaic aneuploidy. The package takes bam file and vcf file as input. There are functions for the calculation of the coverage for the sequencing data; the normalization of the coverage to correct GC bias; the detection and quantification of mosaic aneuploidy and the inference of the type of aneuploidy (monosomy, mitotic trisomy, meiotic trisomy, loss of heterozygosity). The package also includes function to visualize the estimated distribution for detected mosaic aneuploidy. To fully understand how to use the MADSEQ package, please check the documentation. The manual explains what data do you need, and how to process the data to be ready for the model, what steps to follow and how to interpret the output from our model.
Yu Kong
Martyn Plummer (2016). rjags: Bayesian Graphical Models using
MCMC. R package version 4-6.
http://CRAN.R-project.org/package=rjags
C. Alkan, J. Kidd, T. Marques-Bonet et al (2009).
Personalized copy number and segmental duplication maps using
next-generation sequencing. Nature Genetics, 41(10):1061-7.
An MadSeq object returned by the function runMadSeq
,
the object contains the posterior distribution and deltaBIC value of
a trisomy chromosome 18
aneuploidy_chr18
aneuploidy_chr18
An MadSeq object
MadSeq object returned from runMadSeq
function,
mitotic trisomy has been detected for the chromosome18
## to load the data data(aneuploidy_chr18) ## check statistics of the data summary(aneuploidy_chr18)
## to load the data data(aneuploidy_chr18) ## check statistics of the data summary(aneuploidy_chr18)
An S4 method to access the delta BIC values of MadSeq
object
deltaBIC(object) ## S4 method for signature 'MadSeq' deltaBIC(object)
deltaBIC(object) ## S4 method for signature 'MadSeq' deltaBIC(object)
object |
A |
A numeric
vector containing deltaBIC values between selected
model and other models
Yu Kong
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## access deltaBIC deltaBIC(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## access deltaBIC deltaBIC(aneuploidy_chr18)
An S4 class contains estimated result returned from runMadSeq
function
posterior
A matrix
contains the posterior distribution
from the selected model
deltaBIC
A numeric vector
contains the deltaBIC value between
selected model and other models. The deltaBIC between models indicate the
confidence level that selected model against other models:
deltaBIC ~ [0,2]: Not worth more than a bare mention
deltaBIC ~ [2,6]: Positive
deltaBIC ~ [6,10]: Strong
deltaBIC >10: Very Strong
In the code below, x
is a MadSeq
object. posterior(x)
: Get the matrix containing posterior distribution
of selected model. deltaBIC(x)
: Get the deltaBIC between selected model and other
models
In the code below, x
is a MadSeq
object. summary(x)
: summarize the posterior distribution
In the code below, x
is a MadSeq
object. plotMadSeq(x)
: Plot the posterior distribution of all
parameters in selected model. plotFraction(x)
: Plot the estimated distribution of the
fraction of aneuploid sample. plotMixture(x)
: Plot the distribution of AAF estimated from
the selected model.
Yu Kong
function to normalize coverage by GC content and quantile normalization
normalizeCoverage(object, ..., control = NULL, writeToFile = TRUE, destination = NULL, plot = TRUE)
normalizeCoverage(object, ..., control = NULL, writeToFile = TRUE, destination = NULL, plot = TRUE)
object |
A |
... |
additional |
control |
A |
writeToFile |
|
destination |
A |
plot |
|
If writeToFile
is set to TRUE, normalized coverage will be
written to the destination
. Otherwise, a GRangesList
object containing each of input sample will be returned.
The normalize function works better when you have multiple samples sequenced using the same protocol, namely have the same targeted regions. And if you have female sample and male sample, the best way is to normalize them separately.
Yu Kong
C. Alkan, J. Kidd, T. Marques-Bonet et al (2009). Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics, 41(10):1061-7.
##------------------------------------ ##if you deal with single sample ##------------------------------------ ## 1. prepare coverage and gc ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") ## prepare coverage data for the aneuploidy sample aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") ## normalize the coverage ##---- if not write to file ---- aneuploidy_norm = normalizeCoverage(aneuploidy_cov_gc,writeToFile=FALSE) ## check the GRangesList and subset your sample aneuploidy_norm names(aneuploidy_norm) aneuploidy_norm["aneuploidy_cov_gc"] ##---- if write to file ---- normalizeCoverage(aneuploidy_cov_gc,writeToFile=TRUE,destination=".") ##----------------------------------------------------------- ##if you deal with multiple samples without normal control ##----------------------------------------------------------- ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed=normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc,writeToFile=FALSE) names(normed) normed["aneuploidy_cov_gc"] normed["normal_cov_gc"] ## or normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc, writeToFile=TRUE,destination=".") ##----------------------------------------------------------- ##if you deal with multiple samples with a normal control ##----------------------------------------------------------- ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed = normalizeCoverage(aneuploidy_cov_gc, control=normal_cov_gc,writeToFile=FALSE) ## or normalizeCoverage(aneuploidy_cov_gc,control=normal_cov_gc, writeToFile=TRUE,destination=".")
##------------------------------------ ##if you deal with single sample ##------------------------------------ ## 1. prepare coverage and gc ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") ## prepare coverage data for the aneuploidy sample aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") ## normalize the coverage ##---- if not write to file ---- aneuploidy_norm = normalizeCoverage(aneuploidy_cov_gc,writeToFile=FALSE) ## check the GRangesList and subset your sample aneuploidy_norm names(aneuploidy_norm) aneuploidy_norm["aneuploidy_cov_gc"] ##---- if write to file ---- normalizeCoverage(aneuploidy_cov_gc,writeToFile=TRUE,destination=".") ##----------------------------------------------------------- ##if you deal with multiple samples without normal control ##----------------------------------------------------------- ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed=normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc,writeToFile=FALSE) names(normed) normed["aneuploidy_cov_gc"] normed["normal_cov_gc"] ## or normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc, writeToFile=TRUE,destination=".") ##----------------------------------------------------------- ##if you deal with multiple samples with a normal control ##----------------------------------------------------------- ## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed = normalizeCoverage(aneuploidy_cov_gc, control=normal_cov_gc,writeToFile=FALSE) ## or normalizeCoverage(aneuploidy_cov_gc,control=normal_cov_gc, writeToFile=TRUE,destination=".")
histgram of the posterior distribution of the fraction of aneuploid cells estimated by the selected model.
plotFraction(object, prob = 0.95) ## S4 method for signature 'MadSeq' plotFraction(object, prob = 0.95)
plotFraction(object, prob = 0.95) ## S4 method for signature 'MadSeq' plotFraction(object, prob = 0.95)
object |
|
prob |
A |
the histgram of posterior distribution of the fraction
If normal model has been selected by runMadSeq
function,
no fraction plot will be produced by this function.
Yu Kong
Yu Kong
runMadSeq
, plotMadSeq
,
plotMixture
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot estimated fraction of aneuploid cells plotFraction(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot estimated fraction of aneuploid cells plotFraction(aneuploidy_chr18)
plot the density plot for each of the parameters in the posterior distribution from selected model
plotMadSeq(object) ## S4 method for signature 'MadSeq' plotMadSeq(object)
plotMadSeq(object) ## S4 method for signature 'MadSeq' plotMadSeq(object)
object |
the density plot for parameters in the posterior distribution of selected model.
Yu Kong
Yu Kong
runMadSeq
, plotFraction
,
plotMixture
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot the posterior distribution plotMadSeq(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot the posterior distribution plotMadSeq(aneuploidy_chr18)
density plot presents the posterior distribution of alternative allele frequency (AAF) estimated from selected model
plotMixture(object) ## S4 method for signature 'MadSeq' plotMixture(object)
plotMixture(object) ## S4 method for signature 'MadSeq' plotMixture(object)
object |
density plot for the posterior distribution of AAF
Yu Kong
Yu Kong
runMadSeq
, plotMadSeq
,
plotFraction
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot the distribution of estimated AAF plotMixture(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## plot the distribution of estimated AAF plotMixture(aneuploidy_chr18)
An S4 method to access the posterior distribution of
MadSeq
object
posterior(object) ## S4 method for signature 'MadSeq' posterior(object)
posterior(object) ## S4 method for signature 'MadSeq' posterior(object)
object |
A |
A matrix
containing posterior distribution of selected model
Yu Kong
Yu Kong
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## access posterior distribution posterior(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## access posterior distribution posterior(aneuploidy_chr18)
Given a bam file and a bed file containing targeted regions, return sequencing coverage and GC content for each targeted region
prepareCoverageGC(target_bed, bam, genome_assembly = "hg19")
prepareCoverageGC(target_bed, bam, genome_assembly = "hg19")
target_bed |
A |
bam |
character, path to the bam file. Please make sure that bam file is sorted, and the index bam is present |
genome_assembly |
A |
a GRanges object with at least two mcols: depth and GC, each range indicating a targeted region
The bam file should be sorted and indexed.
Yu Kong
## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19")
## specify the path to the location of bed file target = system.file("extdata","target.bed",package="MADSEQ") ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19")
given the vcf file and bed file containing targeted region, generate processed heterozygous sites for furthur analysis
prepareHetero(vcffile, target_bed, genome = "hg19", writeToFile = TRUE, destination = NULL, plot = FALSE)
prepareHetero(vcffile, target_bed, genome = "hg19", writeToFile = TRUE, destination = NULL, plot = FALSE)
vcffile |
A |
target_bed |
A |
genome |
A |
writeToFile |
|
destination |
A |
plot |
A |
If writeToFile
is set to TRUE, processed table will be
written to the destination
. Otherwise, a GRanges
object containing each of input sample will be returned.
1. The vcf file you provided need to be compressed by bgzip
2. The vcf file should contain depth and allelic depth for variants in the
FORMAT field
Yu Kong
## specify the path to the vcf.gz file for the aneuploidy sample aneuploidy_vcf=system.file("extdata","aneuploidy.vcf.gz",package="MADSEQ") target = system.file("extdata","target.bed",package="MADSEQ") ##------ if not write to file ------ aneuploidy_hetero=prepareHetero(aneuploidy_vcf,target,writeToFile=FALSE) ##------ if write to file ------ prepareHetero(aneuploidy_vcf, target,writeToFile=TRUE, destination=".")
## specify the path to the vcf.gz file for the aneuploidy sample aneuploidy_vcf=system.file("extdata","aneuploidy.vcf.gz",package="MADSEQ") target = system.file("extdata","target.bed",package="MADSEQ") ##------ if not write to file ------ aneuploidy_hetero=prepareHetero(aneuploidy_vcf,target,writeToFile=FALSE) ##------ if write to file ------ prepareHetero(aneuploidy_vcf, target,writeToFile=TRUE, destination=".")
Take in the heterozygous sites and coverage information, use different models (normal, monosomy, mitotic trisomy, meiotic trisomy, loss of heterozygosity) to fit the data, and select the model fit the data best according to BIC value and return estimation of the fraction of aneuploid cells.
runMadSeq(hetero, coverage, target_chr, adapt = 10000, burnin = 10000, nChain = 2, nStep = 10000, thinSteps = 2, checkConvergence = FALSE, plot = TRUE)
runMadSeq(hetero, coverage, target_chr, adapt = 10000, burnin = 10000, nChain = 2, nStep = 10000, thinSteps = 2, checkConvergence = FALSE, plot = TRUE)
hetero |
A |
coverage |
A |
target_chr |
A |
adapt |
A |
burnin |
A |
nChain |
A |
nStep |
A |
thinSteps |
A |
checkConvergence |
A |
plot |
A |
An S4 object of class MadSeq
containing the posterior
distribution for the selected model, and deltaBIC between five models.
1.If you didn't write normalized coverage into file, please subset the
normalized coverage GRanges
object from the GRangesList
object returned from the normalizeCoverage
funciton.
2. When specify target_chr
, please make sure it consist with the
contig names in your sequencing data, example: "chr1" and "1".
3. If checkConvergence
set to TRUE, the nChain
has to be >2
4. If it shows that your chains are not converged,
helpful options are increasing the adapt and burnin steps.
5. Because the model is an MCMC sampling process, it can take a very long
time to finish. Running in the background or HPC is recommended.
Yu Kong
Martyn Plummer (2016). rjags: Bayesian Graphical Models using
MCMC. R package version 4-6.
https://CRAN.R-project.org/package=rjags
MadSeq
, plotMadSeq
,
plotFraction
, plotMixture
## ------------------------------------------------------------------------ ## The following example is for the case that normalized coverage and ## processed heterozygous sites have not been written to files. For more ## examples, please check the documentation. ## ------------------------------------------------------------------------ ##------Prepare Heterozygous Sites ## specify the path to the vcf.gz file for the aneuploidy sample aneuploidy_vcf = system.file("extdata","aneuploidy.vcf.gz",package="MADSEQ") ## specify the path to the bed file containing targeted region target = system.file("extdata","target.bed",package="MADSEQ") ## prepare heterozygous sites aneuploidy_hetero = prepareHetero(aneuploidy_vcf,target, writeToFile=FALSE) ##------Prepare Normalized Coverage ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed = normalizeCoverage(aneuploidy_cov_gc, control=normal_cov_gc,writeToFile=FALSE) ##------subset normalized coverage GRanges object aneuploidy_normed_cov = normed[["aneuploidy_cov_gc"]] ## check chromosome18 ## (to speed up the example, we only run one chain and less steps here, ## but default settings are recommended in real case) aneuploidy_chr18 = runMadSeq(aneuploidy_hetero, aneuploidy_normed_cov, target_chr="chr18", adapt=100, burnin=200, nChain =1, nStep = 1000, thinSteps=1)
## ------------------------------------------------------------------------ ## The following example is for the case that normalized coverage and ## processed heterozygous sites have not been written to files. For more ## examples, please check the documentation. ## ------------------------------------------------------------------------ ##------Prepare Heterozygous Sites ## specify the path to the vcf.gz file for the aneuploidy sample aneuploidy_vcf = system.file("extdata","aneuploidy.vcf.gz",package="MADSEQ") ## specify the path to the bed file containing targeted region target = system.file("extdata","target.bed",package="MADSEQ") ## prepare heterozygous sites aneuploidy_hetero = prepareHetero(aneuploidy_vcf,target, writeToFile=FALSE) ##------Prepare Normalized Coverage ## specify the path to the bam file aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ") normal_bam = system.file("extdata","normal.bam",package="MADSEQ") ## prepare coverage data for the samples aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19") normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19") ## normalize the coverage normed = normalizeCoverage(aneuploidy_cov_gc, control=normal_cov_gc,writeToFile=FALSE) ##------subset normalized coverage GRanges object aneuploidy_normed_cov = normed[["aneuploidy_cov_gc"]] ## check chromosome18 ## (to speed up the example, we only run one chain and less steps here, ## but default settings are recommended in real case) aneuploidy_chr18 = runMadSeq(aneuploidy_hetero, aneuploidy_normed_cov, target_chr="chr18", adapt=100, burnin=200, nChain =1, nStep = 1000, thinSteps=1)
An S4 method to summarize statistics for MadSeq
object
## S4 method for signature 'MadSeq' summary(object)
## S4 method for signature 'MadSeq' summary(object)
object |
A |
a table
containing statistics for each parameters in the
selected model
Yu Kong
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## show statistics summary(aneuploidy_chr18)
## load the example MadSeq object come with the package data("aneuploidy_chr18") ## show statistics summary(aneuploidy_chr18)