Title: | Detection, Characterization and Visualization of Kataegis in Sequencing Data |
---|---|
Description: | Kataegis refers to the occurrence of regional hypermutation and is a phenomenon observed in a wide range of malignancies. Using changepoint detection katdetectr aims to identify putative kataegis foci from common data-formats housing genomic variants. Katdetectr has shown to be a robust package for the detection, characterization and visualization of kataegis. |
Authors: | Daan Hazelaar [aut, cre] , Job van Riet [aut] , Harmen van de Werken [ths] |
Maintainer: | Daan Hazelaar <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.9.0 |
Built: | 2024-10-30 08:37:43 UTC |
Source: | https://github.com/bioc/katdetectr |
Constructor function for a KatDetect object.
constructKatdetect( genomicVariants = VariantAnnotation::VRanges(), segments = GenomicRanges::GRanges(), kataegisFoci = GenomicRanges::GRanges(), info = list() )
constructKatdetect( genomicVariants = VariantAnnotation::VRanges(), segments = GenomicRanges::GRanges(), kataegisFoci = GenomicRanges::GRanges(), info = list() )
genomicVariants |
(VRanges) |
segments |
(GRanges) |
kataegisFoci |
(GRanges) |
info |
(list) |
(KatDetect): Returns a KatDetect object.
Daan Hazelaar
constructKatdetect()
constructKatdetect()
Detection of kataegis foci using changepoint detection.
Changepoint detection is performed on the intermutation distance (IMD) of the variants using the changepoint (killick 2014) package.
Note that we recommend using the default parameters for the detection of kataegis.
detectKataegis( genomicVariants, refSeq = "hg19", minSizeKataegis = 6, IMDcutoff = 1000, test.stat = "Exponential", penalty = "BIC", pen.value = 0, method = "PELT", minseglen = 2, BPPARAM = BiocParallel::SerialParam(), aggregateRecords = FALSE )
detectKataegis( genomicVariants, refSeq = "hg19", minSizeKataegis = 6, IMDcutoff = 1000, test.stat = "Exponential", penalty = "BIC", pen.value = 0, method = "PELT", minseglen = 2, BPPARAM = BiocParallel::SerialParam(), aggregateRecords = FALSE )
genomicVariants |
(VRanges, VCF or MAF): VRanges, or path to VCF or MAF file containing genomic variants. |
refSeq |
(character or data.frame): The used reference genome for variant calling. Choose: "hg19" or "hg38". For analysis of non standard sequences: provide a data.frame containing the length of the sequences. |
minSizeKataegis |
(integer): Minimal number of variants required within a segment for classification as a kataegis foci. |
IMDcutoff |
(numberic or function): When a numeric is supplied this represents the max mean IMD within a segment for classification as a kataegis foci. When a custom function is supplied by the user a IMD cutoff value is determined for each segment. |
test.stat |
(character): Distribution that is fitted to the data (Exponential or Empirical). See cpt.meanvar. |
penalty |
(character): Penalty used to guard against overfitting (BIC or Manual). See cpt.meanvar. |
pen.value |
(integer): Only needed for manual penalty. See cpt.meanvar. |
method |
(character): The search method used in changepoint analysis. Choice of: "PELT", "AMOC", "SegNeigh" or "BinSeg". |
minseglen |
(integer): Min. size of segments (no. of variants). |
BPPARAM |
(BiocParallelParam): Can be used for parallelization purposes. |
aggregateRecords |
(logical): Aggregate multiple samples and treat as-if all records originate from a single sample. |
(KatDetect): Returns a KatDetect object including putative kataegis foci.
Daan Hazelaar
Job van Riet
Killick R, Eckley I (2014). “changepoint: An R package for changepoint analysis.” Journal of statistical software, 58(3), 1–19.
syntheticData <- generateSyntheticData() kd <- detectKataegis(syntheticData)
syntheticData <- generateSyntheticData() kd <- detectKataegis(syntheticData)
This function generates a synthetic dataset (VRanges) containing background genomic variants and a no. of desired interjected kataegis foci.
generateSyntheticData( genome = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, nBackgroundVariants = 100, seqnames = NULL, probMutationType = c(0.8, 0.1, 0.1), nKataegisFoci = 5, nKataegisVariants = 20, expectedIMD = 100, sampleName = "syntheticData", removeValidationColumns = TRUE )
generateSyntheticData( genome = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, nBackgroundVariants = 100, seqnames = NULL, probMutationType = c(0.8, 0.1, 0.1), nKataegisFoci = 5, nKataegisVariants = 20, expectedIMD = 100, sampleName = "syntheticData", removeValidationColumns = TRUE )
genome |
(BSgenome): The genome (DNA) which will be sampled for genomic variants. |
nBackgroundVariants |
(integer): The no. of generated background genomic variants. |
seqnames |
(character): The sequences on which genomic variants will be sampled. If NULL, then all human autosomes and sex chromosomes will be used. |
probMutationType |
(numeric): The probability of a generated variant being an SNV, MNV, Deletion or Insertion, respectively. |
nKataegisFoci |
(integer): The no. of generated and interjected kataegis foci. |
nKataegisVariants |
(integer): The no. of genomic variants within each kataegis foci. |
expectedIMD |
(integer): The expected mean intermutational distance (IMD) of the generated kataegis foci. |
sampleName |
(character): The name of the sample |
removeValidationColumns |
(logical): Include columns with extra information regarding mutation sampling? |
(VRanges): VRanges containing background genomic variants and pre-defined kataegis foci.
Daan Hazelaar
Job van Riet
syntheticData1 <- generateSyntheticData() syntheticData2 <- generateSyntheticData( genome = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38, nBackgroundVariants = 75, seqnames = c("chr1", "chrX"), nKataegisFoci = 1, nKataegisVariants = 25, sampleName = "testSample", removeValidationColumns = FALSE )
syntheticData1 <- generateSyntheticData() syntheticData2 <- generateSyntheticData( genome = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38, nBackgroundVariants = 75, seqnames = c("chr1", "chrX"), nKataegisFoci = 1, nKataegisVariants = 25, sampleName = "testSample", removeValidationColumns = FALSE )
Retrieve genomic variants from KatDetect object.
getGenomicVariants(x) ## S4 method for signature 'KatDetect' getGenomicVariants(x)
getGenomicVariants(x) ## S4 method for signature 'KatDetect' getGenomicVariants(x)
x |
(KatDetect): KatDetect object. |
(VRanges): Returns a VRanges with annotated genomic variants.
kd <- constructKatdetect() getGenomicVariants(kd)
kd <- constructKatdetect() getGenomicVariants(kd)
Retrieve model parameters from a KatDetect object.
getInfo(x) ## S4 method for signature 'KatDetect' getInfo(x)
getInfo(x) ## S4 method for signature 'KatDetect' getInfo(x)
x |
(KatDetect): KatDetect object. |
(list): Returns a list with all model parameters used for kataegis detection.
kd <- constructKatdetect() getInfo(kd)
kd <- constructKatdetect() getInfo(kd)
Retrieve kateagis foci from a KatDetect object.
getKataegisFoci(x) ## S4 method for signature 'KatDetect' getKataegisFoci(x)
getKataegisFoci(x) ## S4 method for signature 'KatDetect' getKataegisFoci(x)
x |
(KatDetect): KatDetect object. |
(GRanges): Returns a GRanges with annotated kataegis foci.
kd <- constructKatdetect() getKataegisFoci(kd)
kd <- constructKatdetect() getKataegisFoci(kd)
Retrieve segments from KatDetect a object.
getSegments(x) ## S4 method for signature 'KatDetect' getSegments(x)
getSegments(x) ## S4 method for signature 'KatDetect' getSegments(x)
x |
(KatDetect): KatDetect object. |
(GRanges): Returns a GRanges with annotated segments.
kd <- constructKatdetect() getSegments(kd)
kd <- constructKatdetect() getSegments(kd)
The katdetectr package introduces a new S4 object which stores all relevant information regarding kataegis detection.
(KatDetect) KatDetect object.
kataegisFoci
(GRanges): Contains all annotated putative kataegis foci.
genomicVariants
(VRanges): Contains all processed and annotated genomic variants.
segments
(GRanges): Contains all segments detected after changepoint analysis.
info
(list): Contains some general information and model parameters used for kataegis detection.
Daan Hazelaar
Job van Riet
syntheticData <- generateSyntheticData() kd <- detectKataegis(syntheticData) getKataegisFoci(kd) getGenomicVariants(kd) getSegments(kd) getInfo(kd)
syntheticData <- generateSyntheticData() kd <- detectKataegis(syntheticData) getKataegisFoci(kd) getGenomicVariants(kd) getSegments(kd) getInfo(kd)
The katdetectr package provides three main functions: detectKataegis(), rainfallPlot() and generateSyntheticData()
See the vignette for more details, a step by step explanation of these function, and the general katdetectr workflow.
Visualize the IMD, segments and putative kataegis foci using a rainfall plot.
Y-axis represents the 5' intermutation distance (IMD) of each genomic variant in a log10-scale. X-axis represent variant ID.
Variants within kataegis foci are bold with the kataegis foci shown in a blue rectangle (if showKataegis = TRUE). Color represent the mutation type whilst horizontal lines represent the mean IMD of a segments and Vertical lines depict the detected changepoints (if showSegmentation = TRUE).
rainfallPlot( kd, showSequence = "All", showKataegis = TRUE, showSegmentation = FALSE )
rainfallPlot( kd, showSequence = "All", showKataegis = TRUE, showSegmentation = FALSE )
kd |
(KatDetect): KatDetect object. |
showSequence |
(character): Which sequence(s) should be visualized? Choice of: 'All', 'Kataegis', c('seqname1', 'seqname2'). |
showKataegis |
(logical): Highlight putative kataegis foci and underlying genomic variants? |
showSegmentation |
(logical): Show changepoints and mean IMD of each segment? |
(ggplot): Returns rainfall plot.
Daan Hazelaar
Job van Riet
syntheticData <- generateSyntheticData(nBackgroundVariants = 200) kd <- detectKataegis(syntheticData) # Visualize the IMD of the genomic variants by constructing a rainfall plot katdetectr::rainfallPlot(kd) # Show the chromosomes which contain one or more kataegis foci katdetectr::rainfallPlot(kd, showSequence = "Kataegis") # Show only chromosome 1 and 2 katdetectr::rainfallPlot(kd = kd, showSequence = c("chr1", "chr2")) # Display changepoints and mean IMD per segment katdetectr::rainfallPlot(kd = kd, showSequence = c("chr1", "chr2"), showSegmentation = TRUE)
syntheticData <- generateSyntheticData(nBackgroundVariants = 200) kd <- detectKataegis(syntheticData) # Visualize the IMD of the genomic variants by constructing a rainfall plot katdetectr::rainfallPlot(kd) # Show the chromosomes which contain one or more kataegis foci katdetectr::rainfallPlot(kd, showSequence = "Kataegis") # Show only chromosome 1 and 2 katdetectr::rainfallPlot(kd = kd, showSequence = c("chr1", "chr2")) # Display changepoints and mean IMD per segment katdetectr::rainfallPlot(kd = kd, showSequence = c("chr1", "chr2"), showSegmentation = TRUE)