Title: | Get inversion status in predefined regions |
---|---|
Description: | scoreInvHap can get the samples' inversion status of known inversions. scoreInvHap uses SNP data as input and requires the following information about the inversion: genotype frequencies in the different haplotypes, R2 between the region SNPs and inversion status and heterozygote genotypes in the reference. The package include this data for 21 inversions. |
Authors: | Carlos Ruiz [aut], Dolors Pelegrà [aut], Juan R. Gonzalez [aut, cre] |
Maintainer: | Dolors Pelegri-Siso <[email protected]> |
License: | file LICENSE |
Version: | 1.29.0 |
Built: | 2024-10-31 04:47:43 UTC |
Source: | https://github.com/bioc/scoreInvHap |
Internal
adaptRefs(Refs, alleletable, haploid = FALSE)
adaptRefs(Refs, alleletable, haploid = FALSE)
Refs |
List with the allele frequencies |
alleletable |
Data.frame with the alleles per SNP (from getAlleleTable) |
haploid |
Logical. If TRUE, modify references for haploid samples |
List with the same values than Refs but adapted to imputation data
This function checks the genotype object before passing the SNPs to 'scoreInvHap'. The function removes SNPs with different alleles or different allele frequencies. Nonetheless, it is possible that these SNPs could be recovered after an examination of the results. Be aware that testing of allele frequencies might fail for small datasets.
checkSNPs(SNPobj, checkAlleleFreqs = TRUE)
checkSNPs(SNPobj, checkAlleleFreqs = TRUE)
SNPobj |
List with SNPs data from plink or |
checkAlleleFreqs |
Should allele frequencies be check (Default: TRUE) |
List containing the SNPs prepared for scoreInvHap
genos: Object with genotype data ready for scoreInvHap
wrongAlleles: Character vector with the SNPs discarded due to having alleles different to reference
wrongFreqs: Character vector with the SNPs discarded due to having allele frequencies different to reference
## Run method if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") resList <- checkSNPs(vcf) resList }
## Run method if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") resList <- checkSNPs(vcf) resList }
This function computes the similarity scores between the sample SNPs and the haplotype's reference.
classifSNPs( genos, R2, refs, alleletable, BPPARAM = BiocParallel::SerialParam() ) classifSNPsImpute(genos, R2, refs, BPPARAM = BiocParallel::SerialParam())
classifSNPs( genos, R2, refs, alleletable, BPPARAM = BiocParallel::SerialParam() ) classifSNPsImpute(genos, R2, refs, BPPARAM = BiocParallel::SerialParam())
genos |
Matrix with the samples genotypes. It is the result of |
R2 |
Vector with the R2 between the SNPs and the inversion status. |
refs |
List of matrices. Each matrix has, for an SNP, the frequencies of each genotype in the different haplotypes. |
alleletable |
Data frame with the reference alleles computed with |
BPPARAM |
A |
classifSNPs computes, for each individual, similarity scores for all the present haplotypes. For each SNP, we compute as many similarity scores as haplotypes present in the reference. We have defined the similarity score as the frequency of this genotype in the different haplotype population. To compute the global similarity score, we have computed a mean of the scores by SNP weighted by the R2 between the SNP and the haplotype classification.
classifSNPsImpute is a version of classifSNPs that works with posterior probabilities of imputed genotypes.
List with the results:
scores: Matrix with the simmilarity scores of the individuals
numSNPs: Vector with the number of SNPs used in each computation
Internal
computeScore(geno, refs, R2)
computeScore(geno, refs, R2)
geno |
Vector with the sample genotypes. It is the result of
|
refs |
List of matrices. Each matrix has, for an SNP, the frequencies of each genotype in the different haplotypes. |
R2 |
Vector with the R2 between the SNPs and the inversion status |
List with the results:
scores: Vector with the simmilarity scores of the sample
numSNPs: Numeric with the number of SNPs used in the computation
This function tries to solve discrepancies between the reference and sample genotypes. The cause of these discrepancies is that samples and references have used different strands to codify the SNP. This function get the complement genotypes for the discordant SNPs and checks if discordancies are solved.
correctAlleleTable(alleletable, hetRefs, map)
correctAlleleTable(alleletable, hetRefs, map)
alleletable |
Data.frame with the alleles per SNP (from getAlleleTable) |
hetRefs |
Character vector with the heterozygous genotypes in the reference. |
map |
Data.frame with the annotation of the SNPs (from plink format) |
alleletable without discrepancies between these genotypes and the references.
Get a data.frame that maps the numeric genotype of a SNPmatrix (0, 1, 2) into the real genotype. Heterozygous genotypes are ordered alphabetically.
getAlleleTable(map)
getAlleleTable(map)
map |
Data.frame with the annotation of the SNPs (from plink format) |
Data.frame with genotypes map
Get a matrix with the sample genotypes from all SNP.
getGenotypesTable(geno, allele)
getGenotypesTable(geno, allele)
geno |
SnpMatrix (from plink format) |
allele |
Data.frame with the alleles per SNP (from getAlleleTable) |
Character matrix with the samples genotypes
This function estimates the inversion status of the samples using the probabilities
computed in classifSNPs
getInvStatus(scores)
getInvStatus(scores)
scores |
Matrix of probabilities (from |
List with the results:
class: Vector with the most probable classification
certainty: Vector with the certainty of the most probable classification
Dataset with the heterozygote genotypes of all the SNPs used in any of the references. This dataset include all the SNPs that are present inside the inversion's region in 1000 Genomes Phase 3.
hetRefs
hetRefs
List of character vectors with the heterozygous genotypes of the SNPs present included the region of 21 inversions. Each element is named with the SNPs names.
Description of the SNPs inclued in scoreInvHap references. The description includes the coordinates in hg19, the dbSNP identifier, the reference and alternative allele and the allele frequency in the European Samples of 1000 Genomes.
info
info
data.frame
Description of the 21 human inversions whose references are included in scoreInvHap. The description includes the citogenic location, the coordinates in hg19, the number of alleles and the number of SNPs with a MAF > 5 Samples of 1000 Genomes.
inversionGR
inversionGR
GenomicRanges with the inversions' description in the metada
Internal. Modify feature data from VCF to comply with scoreInvHap requirements.
prepareMap(vcf)
prepareMap(vcf)
vcf |
|
Data.frame with the feature data
Dataset with the genotype frequencies in the different haplotype populations. These frequencies have been computed using the European samples of 1000 Genomes Phase 3 data. Real inversion status have been obtained from invFEST and 1000Genomes.
Refs
Refs
List of matrices for 20 inversions. Each matrix has the frequency of each genotype in each haplotype.
scoreInvHap can get the samples' inversion status of known inversions. scoreInvHap uses SNP data as input and requires the following information about the inversion: genotype frequencies in the different inversion groups, R2 between the region SNPs and inversion status, heterozygote genotypes in the reference, allele frequencies in the reference population and inversion frequencies. The package include this data for 21 inversions.
This is the main function of 'scoreInvHap' package. This function accepts SNPs data in a plink or a VCF format and compute the inversion prediction. The list of available inversions is included in a GenomicRanges called 'inversionGR'.
scoreInvHap( SNPlist, inv = NULL, SNPsR2, hetRefs, Refs, R2 = 0, probs = FALSE, BPPARAM = BiocParallel::SerialParam(), verbose = FALSE )
scoreInvHap( SNPlist, inv = NULL, SNPsR2, hetRefs, Refs, R2 = 0, probs = FALSE, BPPARAM = BiocParallel::SerialParam(), verbose = FALSE )
SNPlist |
List with SNPs data from plink or |
inv |
Character with the name of the inversion to genotype. The available inversions are included in a table in the main vignette. |
SNPsR2 |
Vector with the R2 of the SNPs of the region |
hetRefs |
Vector with the heterozygote form of the SNP in the inversion |
Refs |
List with the allele frequencies in the references |
R2 |
Vector with the R2 between the SNPs and the inversion status |
probs |
Logical. If TRUE, scores are computed using posterior probabilities. If FALSE, scores are computed using best guess. Only applied when SNPlist is a VCF. |
BPPARAM |
A |
verbose |
Should message be shown? |
A scoreInvHap
object
# See list of inversions data(inversionGR) inversionGR ## Run method if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") res <- scoreInvHap(vcf, inv = "inv7_005") }
# See list of inversions data(inversionGR) inversionGR ## Run method if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") res <- scoreInvHap(vcf, inv = "inv7_005") }
Container with the results of the classification pipeline
## S4 method for signature 'scoreInvHapRes' classification(object, minDiff = 0, callRate = 0, inversion = TRUE) ## S4 method for signature 'scoreInvHapRes' certainty(object) ## S4 method for signature 'scoreInvHapRes' diffscores(object) ## S4 method for signature 'scoreInvHapRes' maxscores(object) ## S4 method for signature 'scoreInvHapRes' numSNPs(object) ## S4 method for signature 'scoreInvHapRes' plotCallRate(object, callRate = 0.9, ...) ## S4 method for signature 'scoreInvHapRes' plotScores(object, minDiff = 0.1, ...) ## S4 method for signature 'scoreInvHapRes' propSNPs(object) ## S4 method for signature 'scoreInvHapRes' scores(object)
## S4 method for signature 'scoreInvHapRes' classification(object, minDiff = 0, callRate = 0, inversion = TRUE) ## S4 method for signature 'scoreInvHapRes' certainty(object) ## S4 method for signature 'scoreInvHapRes' diffscores(object) ## S4 method for signature 'scoreInvHapRes' maxscores(object) ## S4 method for signature 'scoreInvHapRes' numSNPs(object) ## S4 method for signature 'scoreInvHapRes' plotCallRate(object, callRate = 0.9, ...) ## S4 method for signature 'scoreInvHapRes' plotScores(object, minDiff = 0.1, ...) ## S4 method for signature 'scoreInvHapRes' propSNPs(object) ## S4 method for signature 'scoreInvHapRes' scores(object)
object |
|
minDiff |
Numeric with the threshold of the minimum difference between the top and the second score. Used to filter samples. |
callRate |
Numeric with the threshold of the minimum call rate of the samples. Used to filter samples. |
inversion |
Logical. If true, haplotypes classification is adapted to return inversion status. (Default: TRUE) |
... |
Further parameters passed to plot function. |
A scoreInvHapRes instance
classification
: Get classification
certainty
: Get classification certainty
diffscores
: Get maximum similarity scores
maxscores
: Get maximum similarity scores
numSNPs
: Get number of SNPs used in computation
plotCallRate
: Plot call rate based QC
plotScores
: Plot scores based QC
propSNPs
: Get proportions of SNPs used in computation
scores
: Get similarity scores
classification
Factor with the individuals classification
scores
Simmilarity scores for the different haplotypes.
numSNPs
Numeric with SNPs used to compute the scores.
certainty
Numeric with the certainty of the classification for each individual.
if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") ## Create scoreInvHapRes class from pipeline res <- scoreInvHap(vcf, inv = "inv7_005") ## Print object res ## Get haplotype classification classification(res) ## Get similiraty scores scores(res) }
if(require(VariantAnnotation)){ vcf <- readVcf(system.file("extdata", "example.vcf", package = "scoreInvHap"), "hg19") ## Create scoreInvHapRes class from pipeline res <- scoreInvHap(vcf, inv = "inv7_005") ## Print object res ## Get haplotype classification classification(res) ## Get similiraty scores scores(res) }
Dataset with R2 between the SNPs and the inversion status. This values are used to weigth similarity scores. These values have been computed using the European samples of 1000 Genomes Phase 3 data. Real inversion status have been estimated using invClust.
SNPsR2
SNPsR2
List of numeric vectors for 21 inversions