Uniparental disomy (UPD) is a rare genetic condition in which an individual inherits both copies of a chromosome from one parent, rather than one copy from each parent. The extent of its contribution as a causative factor in rare genetic diseases remains largely unexplored. UPDs can lead to disease either by making a pathogenic variant inherited from a carrier parent homozygous or by disrupting genomic imprinting. Nevertheless, there are currently no standardized methods available for the detection and characterization of these events.
We have developed UPDhmm, an R/Bioconductor package. The package provides a method to detect, classify and establish the location of uniparental disomy events.
“UPDhmm relies on a Hidden Markov Model (HMM) to identify regions with UPD. In our HMM, observations are the combination of the genotypes from the father, mother and proband for every genomic variant in the input data. The hidden states of the model represent five inheritance patterns: normal (Mendelian inheritance), maternal isodisomy, paternal isodisomy, maternal heterodisomy and paternal heterodisomy. Emission probabilities were defined based on the inheritance patterns. The Viterbi algorithm was employed to infer the most likely sequence of hidden states underlying a sequence of observations, thus defining the most likely inheritance pattern for each genetic variant. UPDhmm reports segments in the genome having an inheritance pattern different from normal, and thus likely harbouring a UPD.”
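The decoding step described above can be illustrated with a small base-R sketch. This is not the package's implementation: the state labels reuse the `group` codes that appear in the results, while the genotype coding (alternate-allele counts 0/1/2), the emission rule in `child_probs()`, the genotyping-error rate `eps`, the self-transition probability `stay` and the uniform starting probabilities are all assumptions chosen for illustration.

```r
# Toy Viterbi decoder for trio genotypes (illustrative assumptions throughout).
states <- c("normal", "iso_mat", "iso_fat", "het_mat", "het_fat")

# P(proband genotype = 0, 1, 2 | parental genotypes, state).
# Genotypes are alternate-allele counts; this emission rule is an assumption.
child_probs <- function(father, mother, state) {
    alleles <- function(g) list(c(0, 0), c(0, 1), c(1, 1))[[g + 1]]
    fa <- alleles(father)
    mo <- alleles(mother)
    kids <- switch(state,
        normal  = as.vector(outer(fa, mo, "+")), # one allele from each parent
        iso_mat = 2 * mo,                        # one maternal allele, duplicated
        iso_fat = 2 * fa,                        # one paternal allele, duplicated
        het_mat = sum(mo),                       # both maternal alleles
        het_fat = sum(fa)                        # both paternal alleles
    )
    tabulate(kids + 1, nbins = 3) / length(kids)
}

viterbi_trio <- function(father, mother, proband, stay = 0.999, eps = 0.01) {
    n <- length(proband)
    k <- length(states)

    # Log transition matrix: high probability of staying in the same state
    log_trans <- matrix(log((1 - stay) / (k - 1)), k, k)
    diag(log_trans) <- log(stay)

    # Log emission matrix (states x variants), smoothed by the error rate eps
    log_emit <- sapply(seq_len(n), function(i) {
        sapply(states, function(s) {
            p <- child_probs(father[i], mother[i], s)[proband[i] + 1]
            log((1 - eps) * p + eps / 3)
        })
    })

    # Forward pass: best log score of any path ending in each state
    score <- matrix(-Inf, k, n)
    back <- matrix(0L, k, n)
    score[, 1] <- log(1 / k) + log_emit[, 1]
    for (i in seq_len(n)[-1]) {
        for (j in seq_len(k)) {
            cand <- score[, i - 1] + log_trans[, j]
            back[j, i] <- which.max(cand)
            score[j, i] <- max(cand) + log_emit[j, i]
        }
    }

    # Backtrack from the best final state
    path <- integer(n)
    path[n] <- which.max(score[, n])
    for (i in rev(seq_len(n - 1))) path[i] <- back[path[i + 1], i + 1]
    states[path]
}

# Toy trio: Mendelian-consistent flanks around variants where the proband
# is homozygous for an allele carried only by a heterozygous mother.
father  <- rep(0, 15)
mother  <- c(rep(2, 5), rep(1, 5), rep(2, 5))
proband <- c(rep(1, 5), rep(2, 5), rep(1, 5))
path <- viterbi_trio(father, mother, proband)
print(rle(path))
```

Working in log space avoids numerical underflow on long chromosomes, and the small `eps` keeps a single genotyping error from forcing a state change.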
The input of the package is a multisample VCF file with the following requirements:
The UPDhmm package includes one example dataset, adapted from GIB. This dataset serves as a practical illustration and can be used to test the package and to familiarize users with its functionality.
After reading the VCF file, the vcfCheck() function is used to preprocess the input data. This function puts the VCF in a format suitable for the UPDhmm package.
calculateEvents() is the central function of the UPDhmm package for identifying uniparental disomy (UPD) events. It takes the output of vcfCheck() and systematically analyzes the genomic data, splitting the VCF into chromosomes and applying the Viterbi algorithm. The calculateEvents() function encompasses a series of subprocesses for identifying UPDs:
  seqnames     start       end   group n_snps log_likelihood      p_value n_mendelian_error
1        3 100374740 197574936 het_mat     47       75.21933 4.212224e-18                 2
2        6  32489853  32490000 iso_mat      6       38.63453 5.110683e-10                 5
3        6  32609207  32609341 iso_mat     11       95.23543 1.690378e-22                 7
4       11  55322539  55587117 iso_mat      7       40.02082 2.512703e-10                 3
5       14 105408955 105945891 het_fat     30       20.84986 4.967274e-06                 2
6       15  22382655  23000272 iso_mat     12       48.33859 3.586255e-12                 3
The calculateEvents() function returns a data.frame containing all detected events in the provided trio. If no events are found, the function will return an empty data.frame.
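To show how per-variant states can become rows of such a data.frame, here is a minimal sketch that collapses runs of identical states into blocks with `rle()`. The helper name `collapse_states()` and its arguments are hypothetical, the input is assumed to be sorted by position within a single chromosome, and the package's own block-building logic may differ.

```r
# Collapse a per-variant state vector into contiguous blocks (hypothetical helper).
# Assumes `pos` is sorted and all variants lie on the chromosome `chrom`.
collapse_states <- function(chrom, pos, state) {
    runs <- rle(state)
    last <- cumsum(runs$lengths)       # index of the last variant in each run
    first <- last - runs$lengths + 1   # index of the first variant in each run
    blocks <- data.frame(
        seqnames = rep(chrom, length(runs$lengths)),
        start = pos[first],
        end = pos[last],
        group = runs$values,
        n_snps = runs$lengths,
        stringsAsFactors = FALSE
    )
    # Keep only the blocks whose state differs from normal inheritance
    blocks <- blocks[blocks$group != "normal", , drop = FALSE]
    rownames(blocks) <- NULL
    blocks
}

# Toy input: six variants on chromosome 3 with made-up positions and states
pos <- c(100, 200, 300, 400, 500, 600)
state <- c("normal", "iso_mat", "iso_mat", "iso_mat", "normal", "het_fat")
blocks <- collapse_states("3", pos, state)
print(blocks)

# An all-normal chromosome yields an empty data.frame
empty <- collapse_states("3", pos, rep("normal", 6))
```

Downstream code can test `nrow()` of the result before plotting or filtering, which also covers the case where no events are found.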
Column name | Description |
---|---|
seqnames | Chromosome |
start | Start position of the block |
end | End position of the block |
n_snps | Number of variants within the event |
group | Predicted state |
log_likelihood | Log-likelihood ratio |
p_value | p-value |
n_mendelian_error | Count of Mendelian errors (genotypic combinations that are inconsistent with Mendelian inheritance principles) |
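The notion of a Mendelian error can be made concrete with a short sketch. It assumes biallelic autosomal variants with genotypes coded as alternate-allele counts (0, 1 or 2); the function names are hypothetical, and this is not necessarily how the package computes `n_mendelian_error`.

```r
# TRUE when the proband genotype cannot arise from one paternal and one
# maternal allele (genotypes coded as alternate-allele counts 0/1/2).
is_mendelian_error <- function(father, mother, proband) {
    gametes <- function(g) list(0, c(0, 1), 1)[[g + 1]]  # alleles a parent can transmit
    possible <- outer(gametes(father), gametes(mother), "+")
    !(proband %in% possible)
}

# Count the inconsistent trios across a set of variants
count_mendelian_errors <- function(father, mother, proband) {
    sum(mapply(is_mendelian_error, father, mother, proband))
}

# Toy genotypes for four variants (made-up values)
father  <- c(0, 0, 2, 1)
mother  <- c(0, 2, 2, 1)
proband <- c(1, 1, 0, 2)
n_errors <- count_mendelian_errors(father, mother, proband)
print(n_errors)
```

Not every variant inside a UPD region produces a Mendelian error, since many genotype combinations compatible with UPD are also compatible with normal inheritance.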
To visualize the results, the karyoploteR package can be used. Here, a custom function is provided that works directly on the output of calculateEvents() and plots the detected blocks.
In this example, we can observe how the package detects the simulated event on chromosome 3, as well as the specific type of event. The plot shows the autosomal chromosomes, and the entire q arm of chromosome 3 is colored. The legend identifies the event as a maternal heterodisomy.
library(karyoploteR)
library(regioneR)

# Plot the UPD events returned by calculateEvents() on a karyotype.
plotUPDKp <- function(updEvents) {
    # plotKaryotype(genome = "hg19") uses UCSC-style names ("chr3", not "3")
    updEvents$seqnames <- paste0("chr", updEvents$seqnames)

    # Build one set of regions per event type
    het_fat <- toGRanges(subset(
        updEvents,
        group == "het_fat"
    )[, c("seqnames", "start", "end")])
    het_mat <- toGRanges(subset(
        updEvents,
        group == "het_mat"
    )[, c("seqnames", "start", "end")])
    iso_fat <- toGRanges(subset(
        updEvents,
        group == "iso_fat"
    )[, c("seqnames", "start", "end")])
    iso_mat <- toGRanges(subset(
        updEvents,
        group == "iso_mat"
    )[, c("seqnames", "start", "end")])

    # Draw the ideograms and overlay each event type in its own color
    kp <- plotKaryotype(genome = "hg19")
    kpPlotRegions(kp, het_fat, col = "#AAF593")
    kpPlotRegions(kp, het_mat, col = "#FFB6C1")
    kpPlotRegions(kp, iso_fat, col = "#A6E5FC")
    kpPlotRegions(kp, iso_mat, col = "#E9B864")

    # Legend entries follow the same order as the colors
    colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864")
    legend("topright",
        legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"),
        fill = colors
    )
}

plotUPDKp(updEvents)
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] karyoploteR_1.33.0 regioneR_1.39.0
[3] VariantAnnotation_1.53.0 Rsamtools_2.23.1
[5] Biostrings_2.75.1 XVector_0.47.0
[7] SummarizedExperiment_1.37.0 Biobase_2.67.0
[9] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
[11] IRanges_2.41.1 S4Vectors_0.45.2
[13] MatrixGenerics_1.19.0 matrixStats_1.4.1
[15] BiocGenerics_0.53.3 generics_0.1.3
[17] dplyr_1.1.4 UPDhmm_1.3.0
loaded via a namespace (and not attached):
[1] DBI_1.2.3 bitops_1.0-9 gridExtra_2.3
[4] rlang_1.1.4 magrittr_2.0.3 biovizBase_1.55.0
[7] compiler_4.4.2 RSQLite_2.3.8 GenomicFeatures_1.59.1
[10] png_0.1-8 vctrs_0.6.5 ProtGenerics_1.39.0
[13] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
[16] fastmap_1.2.0 backports_1.5.0 utf8_1.2.4
[19] rmarkdown_2.29 UCSC.utils_1.3.0 bit_4.5.0
[22] xfun_0.49 zlibbioc_1.52.0 cachem_1.1.0
[25] jsonlite_1.8.9 blob_1.2.4 DelayedArray_0.33.2
[28] BiocParallel_1.41.0 parallel_4.4.2 cluster_2.1.6
[31] R6_2.5.1 stringi_1.8.4 bslib_0.8.0
[34] RColorBrewer_1.1-3 bezier_1.1.2 rtracklayer_1.67.0
[37] rpart_4.1.23 jquerylib_0.1.4 Rcpp_1.0.13-1
[40] knitr_1.49 base64enc_0.1-3 Matrix_1.7-1
[43] nnet_7.3-19 tidyselect_1.2.1 dichromat_2.0-0.1
[46] rstudioapi_0.17.1 abind_1.4-8 yaml_2.3.10
[49] codetools_0.2-20 curl_6.0.1 lattice_0.22-6
[52] tibble_3.2.1 KEGGREST_1.47.0 evaluate_1.0.1
[55] foreign_0.8-87 pillar_1.9.0 BiocManager_1.30.25
[58] checkmate_2.3.2 RCurl_1.98-1.16 ensembldb_2.31.0
[61] ggplot2_3.5.1 munsell_0.5.1 scales_1.3.0
[64] BiocStyle_2.35.0 glue_1.8.0 lazyeval_0.2.2
[67] Hmisc_5.2-0 maketools_1.3.1 tools_4.4.2
[70] BiocIO_1.17.1 data.table_1.16.2 sys_3.4.3
[73] BSgenome_1.75.0 GenomicAlignments_1.43.0 buildtools_1.0.0
[76] XML_3.99-0.17 grid_4.4.2 AnnotationDbi_1.69.0
[79] colorspace_2.1-1 GenomeInfoDbData_1.2.13 htmlTable_2.4.3
[82] restfulr_0.0.15 Formula_1.2-5 cli_3.6.3
[85] HMM_1.0.1 fansi_1.0.6 S4Arrays_1.7.1
[88] AnnotationFilter_1.31.0 gtable_0.3.6 sass_0.4.9
[91] digest_0.6.37 SparseArray_1.7.2 rjson_0.2.23
[94] htmlwidgets_1.6.4 memoise_2.0.1 htmltools_0.5.8.1
[97] lifecycle_1.0.4 httr_1.4.7 bit64_4.5.2
[100] bamsignals_1.39.0