Title: | Detecting Uniparental Disomy through NGS trio data |
---|---|
Description: | Uniparental disomy (UPD) is a genetic condition where an individual inherits both copies of a chromosome or part of it from one parent, rather than one copy from each parent. This package contains a HMM for detecting UPDs through HTS (High Throughput Sequencing) data from trio assays. By analyzing the genotypes in the trio, the model infers a hidden state (normal, father isodisomy, mother isodisomy, father heterodisomy and mother heterodisomy). |
Authors: | Marta Sevilla [aut, cre] , Carlos Ruiz-Arenas [aut] |
Maintainer: | Marta Sevilla <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-10-31 06:32:06 UTC |
Source: | https://github.com/bioc/UPDhmm |
Function to transform a large collapsed VCF into a dataframe, incorporating predicted states along with the log-likelihood ratio and p-value.
addOr(filtered_def_blocks_states, largeCollapsedVcf, hmm, genotypes)
addOr(filtered_def_blocks_states, largeCollapsedVcf, hmm, genotypes)
filtered_def_blocks_states |
data.frame object containing the blocks |
largeCollapsedVcf |
Input VCF file |
hmm |
Hidden Markov Model used to infer the events. The format should adhere to the general HMM format from HMM package with a series of elements:
|
genotypes |
Possible GT formats and its correspondence with the hmm |
data.frame containing the transformed information.
Apply the hidden Markov model using the Viterbi algorithm.
applyViterbi(largeCollapsedVcf, hmm, genotypes)
applyViterbi(largeCollapsedVcf, hmm, genotypes)
largeCollapsedVcf |
input vcf file |
hmm |
Hidden Markov Model used to infer the events |
genotypes |
Possible GT formats and its correspondence with the hmm |
largeCollapsedVcf
Function to transform a large collapsed VCF into a dataframe with predicted states, including chromosome, start position, end position and metadata.
asDfVcf(largeCollapsedVcf, genotypes)
asDfVcf(largeCollapsedVcf, genotypes)
largeCollapsedVcf |
Name of the large collapsed VCF file. |
genotypes |
Possible GT formats and its correspondence with the hmm |
dataframe
Function to simplify contiguous variants with the same state into blocks.
blocksVcf(df)
blocksVcf(df)
df |
data.frame resulting from the |
data.frame containing information on the chromosome, start #' position of the block, end position of the block, and predicted state.
This function predicts the hidden states by applying the Viterbi algorithm using the Hidden Markov Model (HMM) from the UPDhmm package. It takes the genotypes of the trio as input and includes a final step to simplify the results into blocks.
calculateEvents(largeCollapsedVcf, hmm = NULL)
calculateEvents(largeCollapsedVcf, hmm = NULL)
largeCollapsedVcf |
The VCF file in the general format
(largeCollapsedVcf) with VariantAnnotation package. Previously edited with
|
hmm |
Default = NULL. If no arguments are added, the package
will use the default HMM already implemented, based on Mendelian
inheritance. If an optional HMM is desired, it should adhere to the
general HMM format from
|
A data.frame object containing all detected events in the provided trio. If no events are found, the function will return an empty data.frame.
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz") vcf <- VariantAnnotation::readVcf(file) processedVcf <- vcfCheck(vcf, proband = "NA19675", mother = "NA19678", father = "NA19679" )
file <- system.file(package = "UPDhmm", "extdata", "test_het_mat.vcf.gz") vcf <- VariantAnnotation::readVcf(file) processedVcf <- vcfCheck(vcf, proband = "NA19675", mother = "NA19678", father = "NA19679" )
This dataset provides Hidden Markov Model (HMM) parameters for predicting uniparental disomy (UPD) events in trio genomic data.
Five different possible states.
Code symbols used for genotype combinations.
The initial probabilities of each state.
Probabilities of transitioning from one state to another.
Given a certain genotype combination, the odds of each possible state.
data(hmm)
data(hmm)
A list with 5 different elements
Created in-house based on basic Mendelian rules for calculating UPD events.
data(hmm)
data(hmm)
This function takes a VCF file and converts it into a largeCollapsedVcf object using the VariantAnnotation package. It also rename the sample for subsequent steps needed in UPDhmm package. Additionally, it features an optional parameter, quality_check, which triggers warnings when variants lack sufficient quality based on RD and GQ parameters in the input VCF.
vcfCheck(largeCollapsedVcf, father, mother, proband, check_quality = FALSE)
vcfCheck(largeCollapsedVcf, father, mother, proband, check_quality = FALSE)
largeCollapsedVcf |
The file in largeCollapsedVcf format. |
father |
Name of the father's sample. |
mother |
Name of the mother's sample. |
proband |
Name of the proband's sample. |
check_quality |
Optional argument. TRUE/FALSE. If quality parameters want to be measured. Default = FALSE |
largeCollapsedVcf (VariantAnnotation VCF format).
fl <- system.file("extdata", "test_het_mat.vcf.gz", package = "UPDhmm") vcf <- VariantAnnotation::readVcf(fl) processedVcf <- vcfCheck(vcf, proband = "Sample1", mother = "Sample3", father = "Sample2")
fl <- system.file("extdata", "test_het_mat.vcf.gz", package = "UPDhmm") vcf <- VariantAnnotation::readVcf(fl) processedVcf <- vcfCheck(vcf, proband = "Sample1", mother = "Sample3", father = "Sample2")