--- title: "The UPDhmm's User Guide" author: - name: Marta Sevilla Porras affiliation: - Universitat Pompeu Fabra (UPF) - Centro de Investigación Biomédica en Red (CIBERER) email: marta.sevilla@upf.edu - name: Carlos Ruiz Arenas affiliation: - Universidad de Navarra (UNAV) email: cruizarenas@unav.es date: "`r Sys.Date()`" package: "`r BiocStyle::pkg_ver('UPDhmm')`" abstract: > This vignette provides an introductory guide to the UPDhmm package, developed for detecting uniparental disomies (UPDs) events on genetic data from trio experiments (i.e. involving a father, mother, and proband). UPDs are a genetic condition in which an individual inherits both copies from the same parent. UPDs can consist on a whole chromosome or a segment. These events can result in disease by leading to homozygosity or specific alleles, potentially manifesting recessive traits or contributing to genetic disorders through errors associated with genetic imprinting. The key topics covered in this document include: (1) Package installation (2) Data loading and pre-processing (3) Identification of UPDs output: BiocStyle::html_document: number_sections: true toc: yes fig_caption: yes toc_float: true vignette: > %\VignetteIndexEntry{Detection of UPDs in HTS data} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( comment = "", warning = FALSE, message = FALSE, cache = FALSE ) ``` # Introduction ## Background Uniparental disomy (UPD) is a rare genetic condition where an individual inherits both copies of a chromosome from one parent, as opposed to the typical inheritance of one copy from each parent. The extent of its contribution as a causative factor in rare genetic diseases remains largely unexplored. UPDs can lead to disease either by inheriting a carrier pathogenic mutation as homozygous from a carrier parent or by causing errors in genetic imprinting. Nevertheless, there are currently no standardized methods available for the detection and characterization of these events. We have developed `UPDhmm` R/Bioconductor package. The package provides a tool method to detect, classify and stablish the location of uniparental disomy events. ## Method overview "`UPDhmm` relies on a Hidden Markov Model (HMM) to identify regions with UPD. In our HMM model, observations are the combination of the genotypes from the father, mother and proband for every genomic variant in the input data. The hidden states of the model represent five inheritance patterns: normal (mendelian inheritance), maternal isodisomy, paternal isodisomy, maternal heterodisomy and paternal heterodisomy. Emission probabilities were defined based on the inheritance patterns. Viterbi algorithm was employed to infer the most likely combination of hidden states underlying a sequence of observations, thus defining the most likely inheritance pattern for each genetic variant. `UPDhmm` reports segments in the genome having an inheritance pattern different than normal, and thus, likely harbouring a UPD." # Set-up the packages ```{r installation, eval=FALSE, include=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("UPDhmm") BiocManager::install("regioneR") BiocManager::install("karyoploteR") ``` ## Loading libraries ```{r loading libraries, message = FALSE} library(UPDhmm) library(dplyr) ``` ## Quick start The input of the package is a multisample vcf file with the following requirements: - Only biallelic sites are only allowed - GT format allowed are: 0/1, 1/0, 0/1 and 1/1 or 0|1, 1|0, 0|1 and 1|1 # Datasets {#datasets} The UPDhmm package includes one example dataset, adapted from [GIB](https://www.nist.gov/programs-projects/genome-bottle) This dataset serves as a practical illustration and can be utilized for testing and familiarizing users with the functionality of the package. # Preprocessing After reading the VCF file, the `vcfCheck()` function is employed for preprocessing the input data. This function facilitates reading the VCF in the suitable format for the UPDhmm package. ```{r read Vcf,echo=TRUE} library(UPDhmm) library(VariantAnnotation) file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz") vcf <- readVcf(file) processedVcf <- vcfCheck( vcf, proband = "NA19675", mother = "NA19678", father = "NA19679" ) ``` # Uniparental disomy detection The principal function of `UPDhmm` package, `calculateEvents()`, is the central function for identifying Uniparental Disomy (UPD) events. It takes the output from the previous `vcfCheck()` function and systematically analyzes genomic data, splitting the VCF into chromosomes and applying the Viterbi algorithm. `calculateEvents()` function encompasses a serie of subprocesses for identifying Uniparental disomies (UPDs): (1) Split vcf into chromosomes (2) Apply Viterbi algorithm to every chromosome (3) Transform the inferred hidden states to coordinates `data.frame` (4) Create blocks of contiguous variants with the same state (5) Calculate the statistics (log-likelihood and p-value). ```{r calculateEvents function, echo=TRUE} updEvents <- calculateEvents(largeCollapsedVcf = processedVcf) head(updEvents) ``` ## Results description The `calculateEvents` function returns a `data.frame` containing all detected events in the provided trio. If no events are found, the function will return an empty data.frame. | Column name | Description | |---------------------|----------------------------------------------------| | `seqnames` | Chromosome | | `start` | Start position of the block | | `end` | End position of the block | | `n_snps` | Number of variants within the event | | `group` | Predicted state | | `log_likelihood` | log likelihood ratio | | `p_value` | p-value | | `n_mendelian_error` | Count of Mendelian errors (genotypic combinations that are inconsistent with Mendelian inheritance principles) | # Results Visualization To visualize the results, the `karyoploteR` package can be used. Here, a custom function is provided to facilitate easy implementation with the output format results. This function enables the visualization of the detected blocks by the `calculateEvents` function. In this example, we can observe how the package detects the simulated event on chromosome 3, as well as the specific type of event. In the plot, the autosomes chromosomes are represented, and we can see that the entire q arm of chromosome 3 is colored. With the legend, we can identify that the event is a maternal heterodisomy. ```{r echo=TRUE} library(karyoploteR) library(regioneR) plotUPDKp <- function(updEvents) { updEvents$seqnames <- paste0("chr", updEvents$seqnames) het_fat <- toGRanges(subset( updEvents, group == "het_fat" )[, c("seqnames", "start", "end")]) het_mat <- toGRanges(subset( updEvents, group == "het_mat" )[, c("seqnames", "start", "end")]) iso_fat <- toGRanges(subset( updEvents, group == "iso_fat" )[, c("seqnames", "start", "end")]) iso_mat <- toGRanges(subset( updEvents, group == "iso_mat" )[, c("seqnames", "start", "end")]) kp <- plotKaryotype(genome = "hg19") kpPlotRegions(kp, het_fat, col = "#AAF593") kpPlotRegions(kp, het_mat, col = "#FFB6C1") kpPlotRegions(kp, iso_fat, col = "#A6E5FC") kpPlotRegions(kp, iso_mat, col = "#E9B864") colors <- c("#AAF593", "#FFB6C1", "#A6E5FC", "#E9B864") legend("topright", legend = c("Het_Fat", "Het_Mat", "Iso_Fat", "Iso_Mat"), fill = colors ) } plotUPDKp(updEvents) ``` # Session Info ```{r} sessionInfo() ``` # References