1 Department of Dermatology, Roswell Park Comprehensive Cancer Center, Buffalo, NY
2 Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY
3 Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY
In-depth description of the methods and application of seq.hotSPOT
Next-generation sequencing is a powerful tool for assessing mutation burden in both healthy and diseased tissues. However, sufficiently capturing mutation burden in clinically healthy tissues requires deep sequencing, because somatic mutations in these tissues are typically carried by small clones and therefore occur at low variant allele frequencies. While whole-exome and whole-genome sequencing are popular methods for sequencing cancer samples, it is not economically feasible to sequence such large genomic regions at the depth needed for healthy tissues. It is therefore important to identify relevant genomic regions when designing targeted sequencing panels.
Currently, few resources enable researchers to design their own targeted sequencing panels based on specific biological questions and tissues of interest. seq.hotSPOT may be used in combination with the Bioconductor package RTCGA.mutations, which can pull mutation datasets from the TCGA database for use as input to seq.hotSPOT functions. This allows users to identify highly mutated regions in a cancer type of interest; in addition, the package RTCGA.clinical may be used to identify highly mutated regions in subsets of patients with specific clinical features of interest.
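As a minimal sketch of that workflow, the helper below reshapes a MAF-style mutation table into the chr/pos/gene layout that seq.hotSPOT expects (described below). The function name `maf_to_hotspot_input()` is hypothetical, and its default source column names (`Chromosome`, `Start_position`, `Hugo_Symbol`) are assumptions about how RTCGA.mutations tables are labelled; check `names()` on the table you load and adjust the arguments if they differ.

```r
# Hypothetical helper: convert a MAF-style mutation table into the
# chr/pos/gene layout used as seq.hotSPOT input. The default source column
# names are assumptions; check names() of your own table before using it.
maf_to_hotspot_input <- function(maf,
                                 chr_col  = "Chromosome",
                                 pos_col  = "Start_position",
                                 gene_col = "Hugo_Symbol") {
  missing_cols <- setdiff(c(chr_col, pos_col), names(maf))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  out <- data.frame(
    chr = maf[[chr_col]],
    # as.character() first so factor columns are not converted to level codes
    pos = as.numeric(as.character(maf[[pos_col]])),
    stringsAsFactors = FALSE
  )
  # The gene column is optional in seq.hotSPOT input
  if (gene_col %in% names(maf)) {
    out$gene <- as.character(maf[[gene_col]])
  }
  out
}

# Toy MAF-like table, made up purely to illustrate the conversion
toy_maf <- data.frame(
  Hugo_Symbol    = c("TP53", "TP53", "FGFR3"),
  Chromosome     = c("17", "17", "4"),
  Start_position = c(7577538, 7577120, 1803564),
  stringsAsFactors = FALSE
)
converted <- maf_to_hotspot_input(toy_maf)
converted
```

With RTCGA.mutations installed, a call such as `data("BRCA.mutations", package = "RTCGA.mutations")` followed by `maf_to_hotspot_input(BRCA.mutations)` should give a table usable as the `data` argument of `amp_pool()`, provided the assumed column names match. To focus on patients with particular clinical features, the MAF rows could first be filtered by patient identifier using clinical data from RTCGA.clinical.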
seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("seq.hotSPOT")
Load seq.hotSPOT:
library(seq.hotSPOT)
The mutation dataset should include two columns containing the chromosome and genomic position of each mutation. The columns should be named “chr” and “pos” respectively. Optionally, the gene names for each mutation may be included under a column named “gene”.
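For illustration, a minimal input table in this layout can be built by hand. The coordinates below are made up, and storing chromosomes as numbers is an assumption based on the chromosome values that `amp_pool()` prints; the commented-out call simply reuses the arguments shown in this vignette.

```r
# Made-up example input: one row per mutation, required columns "chr" and
# "pos", optional column "gene". Coordinates are illustrative only.
my_mutations <- data.frame(
  chr  = c(17, 17, 17, 4, 4),
  pos  = c(7577538, 7577548, 7577120, 1803564, 1803568),
  gene = c("TP53", "TP53", "TP53", "FGFR3", "FGFR3"),
  stringsAsFactors = FALSE
)

# Quick structural checks before calling the package functions
stopifnot(all(c("chr", "pos") %in% names(my_mutations)))
stopifnot(is.numeric(my_mutations$pos), !anyNA(my_mutations$pos))

# Number of mutations per gene in this toy table
table(my_mutations$gene)

# With seq.hotSPOT loaded, this table could be used in place of
# mutation_data, e.g. amp_pool(data = my_mutations, amp = 100)
```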
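The amp_pool() function searches the mutational dataset (input) for mutational hotspot regions on each chromosome, pooling mutations into candidate amplicon bins of length amp (100 bp here). Each row of the output gives a bin's bounds, its chromosome, and the number of mutations it contains (count):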
amps <- amp_pool(data = mutation_data, amp = 100)
head(amps)
#>   lowerbound upperbound chromosome count id mut_lowerbound mut_upperbound
#> 1    1803511    1803610          4    17  x        1803553        1803564
#> 2    1806007    1806106          4    11  x        1806047        1806066
#> 3    1808912    1809011          4     6  x        1808958        1808970
#> 4  126329597  126329696          4    34  x      126329601      126329700
#> 5    7577035    7577134         17    10  x        7577058        7577127
#> 6    7577498    7577597         17    38  x        7577537        7577569
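To make the idea of pooling mutations into fixed-length bins concrete, here is a simplified, hypothetical illustration. It is not the amp_pool() algorithm: in the output above, bin lower bounds do not always coincide with a mutation position (for example, bin 1 starts at 1803511 while its first mutation is at 1803553), so the package evidently places bins differently. The sketch simply opens a window of `amp` bases at the first mutation not yet binned and counts the mutations inside it.

```r
# Simplified, hypothetical illustration of pooling mutations into windows of
# length `amp` on each chromosome. NOT seq.hotSPOT's amp_pool() algorithm.
naive_amp_pool <- function(data, amp = 100) {
  bins <- list()
  for (chrom in unique(data$chr)) {
    pos <- sort(data$pos[data$chr == chrom])
    i <- 1
    while (i <= length(pos)) {
      lower <- pos[i]              # window opens at the first mutation not yet binned
      upper <- lower + amp - 1     # window spans `amp` bases, both ends inclusive
      in_bin <- pos >= lower & pos <= upper
      bins[[length(bins) + 1]] <- data.frame(
        lowerbound = lower, upperbound = upper,
        chromosome = chrom, count = sum(in_bin)
      )
      i <- max(which(in_bin)) + 1  # resume after the last mutation in this window
    }
  }
  if (length(bins) == 0) return(NULL)
  out <- do.call(rbind, bins)
  rownames(out) <- NULL
  out[order(out$chromosome, out$lowerbound), , drop = FALSE]
}

# Made-up toy input
toy <- data.frame(
  chr = c(17, 17, 17, 4, 4),
  pos = c(7577538, 7577548, 7577700, 1803564, 1803568)
)
res <- naive_amp_pool(toy, amp = 100)
res
```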
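The fw_hotspot() function (forward selection) then builds a panel with a total length of len bases (1000 here) from these bins. In the output shown, the selected bins are listed in decreasing order of mutation count, together with the cumulative panel length and the cumulative number of mutations captured; with include_genes = TRUE, the gene for each bin is also reported:
fw_bins <- fw_hotspot(bins = amps, data = mutation_data, amp = 100,
                      len = 1000, include_genes = TRUE)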
head(fw_bins)
#>     Lowerbound Upperbound Chromosome Mutation Count Cumulative Panel Length
#> 6      7577498    7577597         17             38                     100
#> 4    126329597  126329696          4             34                     200
#> 1      1803511    1803610          4             17                     300
#> 102  120512189  120512288          1             13                     400
#> 2      1806007    1806106          4             11                     500
#> 5      7577035    7577134         17             10                     600
#>     Cumulative Mutations   Gene
#> 6                     38   TP53
#> 4                     72   FAT4
#> 1                     89  FGFR3
#> 102                  102 NOTCH2
#> 2                    113  FGFR3
#> 5                    123   TP53
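The cumulative columns above are consistent with adding bins one at a time: each amplicon contributes 100 bp to Cumulative Panel Length and its Mutation Count to Cumulative Mutations (38, 38 + 34 = 72, 72 + 17 = 89, and so on). The hypothetical `greedy_panel()` below sketches that bookkeeping as a plain greedy ranking; overlap handling, tie-breaking and gene annotation in the real fw_hotspot() are not modelled. It is applied only to the six bins printed by head(amps), so its result is not expected to match fw_bins, which also draws on bins outside that head (such as row 102 on chromosome 1).

```r
# Hypothetical greedy sketch: rank bins by mutation count and keep the top
# ones until the panel reaches `len` bases. Not the package's fw_hotspot().
greedy_panel <- function(bins, amp = 100, len = 1000) {
  # Rank candidate bins from most to fewest mutations
  ranked <- bins[order(bins$count, decreasing = TRUE), , drop = FALSE]
  # Number of amplicons of length `amp` that fit into a panel of `len` bases
  n_keep <- min(nrow(ranked), floor(len / amp))
  panel <- ranked[seq_len(n_keep), , drop = FALSE]
  panel$cum_panel_length <- amp * seq_len(n_keep)
  panel$cum_mutations <- cumsum(panel$count)
  panel
}

# The six bins printed by head(amps) above
amps_head <- data.frame(
  lowerbound = c(1803511, 1806007, 1808912, 126329597, 7577035, 7577498),
  upperbound = c(1803610, 1806106, 1809011, 126329696, 7577134, 7577597),
  chromosome = c(4, 4, 4, 4, 17, 17),
  count      = c(17, 11, 6, 34, 10, 38)
)
panel <- greedy_panel(amps_head, amp = 100, len = 500)
panel[, c("chromosome", "count", "cum_panel_length", "cum_mutations")]
```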
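The com_hotspot() function (comprehensive selection) takes the forward-selection panel (fw_panel), the candidate bins and the mutation data, and returns a panel for the same len; here size = 3. Its selection can differ from the forward selection: for example, bin 102 (NOTCH2, 13 mutations), fourth in fw_bins, is not among the first six rows of com_bins.
com_bins <- com_hotspot(fw_panel = fw_bins, bins = amps, data = mutation_data,
                        amp = 100, len = 1000, size = 3, include_genes = TRUE)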
head(com_bins)
#>    Lowerbound Upperbound Chromosome Mutation Count Cumulative Panel Length
#> 6     7577498    7577597         17             38                     100
#> 4   126329597  126329696          4             34                     200
#> 1     1803511    1803610          4             17                     300
#> 2     1806007    1806106          4             11                     400
#> 5     7577035    7577134         17             10                     500
#> 47  120497611  120497710          1              8                     600
#>    Cumulative Mutations   Gene
#> 6                    38   TP53
#> 4                    72   FAT4
#> 1                    89  FGFR3
#> 2                   100  FGFR3
#> 5                   110   TP53
#> 47                  118 NOTCH2
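Both panels above were built with len = 1000, so one way to compare them is to look at the final Cumulative Mutations value of each, optionally as a fraction of all mutations in the input. The helper `summarise_panel()` is hypothetical; it assumes the column names printed above are stored as numeric columns, and that each row of the input data is one mutation, so that nrow(mutation_data) would be the total.

```r
# Hypothetical helper to summarise a panel returned by fw_hotspot() or
# com_hotspot(); relies only on the column names printed above.
summarise_panel <- function(panel, total_mutations) {
  # Cumulative columns are non-decreasing, so their maxima are the panel totals
  length_bp <- max(panel[["Cumulative Panel Length"]])
  captured  <- max(panel[["Cumulative Mutations"]])
  c(panel_length = length_bp,
    mutations    = captured,
    fraction     = captured / total_mutations)
}

# Toy panel re-using the first three rows of fw_bins printed above;
# the total of 178 mutations is invented for this illustration
toy_panel <- data.frame(
  `Cumulative Panel Length` = c(100, 200, 300),
  `Cumulative Mutations`    = c(38, 72, 89),
  check.names = FALSE
)
s <- summarise_panel(toy_panel, total_mutations = 178)
s
```

On the real objects, `summarise_panel(fw_bins, nrow(mutation_data))` and `summarise_panel(com_bins, nrow(mutation_data))` would report the totals of the two selection strategies side by side.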
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] seq.hotSPOT_1.7.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] hash_2.2.6.3 digest_0.6.37 R6_2.5.1
#> [4] fastmap_1.2.0 xfun_0.49 maketools_1.3.1
#> [7] cachem_1.1.0 R.utils_2.12.3 knitr_1.49
#> [10] htmltools_0.5.8.1 rmarkdown_2.29 buildtools_1.0.0
#> [13] lifecycle_1.0.4 cli_3.6.3 R.methodsS3_1.8.2
#> [16] sass_0.4.9 jquerylib_0.1.4 compiler_4.4.2
#> [19] R.oo_1.27.0 sys_3.4.3 tools_4.4.2
#> [22] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10
#> [25] BiocManager_1.30.25 jsonlite_1.8.9 rlang_1.1.4