Title: | TVTB: The VCF Tool Box |
---|---|
Description: | The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE). |
Authors: | Kevin Rue-Albrecht [aut, cre] |
Maintainer: | Kevin Rue-Albrecht <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-11-18 04:44:37 UTC |
Source: | https://github.com/bioc/TVTB |
The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE).
Package: | TVTB |
Type: | Package |
Title: | TVTB: The VCF Tool Box |
Version: | 1.33.0 |
Date: | 2024-07-05 |
Authors@R: | person("Kevin", "Rue-Albrecht", role = c("aut", "cre"), email = "[email protected]") |
Description: | The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE). |
License: | Artistic-2.0 |
Depends: | R (>= 3.4), methods, utils, stats |
Imports: | AnnotationFilter, BiocGenerics (>= 0.25.1), BiocParallel, Biostrings, ensembldb, GenomeInfoDb, GenomicRanges, GGally, ggplot2, Gviz, limma, IRanges (>= 2.21.6), reshape2, Rsamtools, S4Vectors (>= 0.25.14), SummarizedExperiment, VariantAnnotation (>= 1.19.9) |
Suggests: | EnsDb.Hsapiens.v75 (>= 0.99.7), shiny (>= 0.13.2.9005), DT (>= 0.1.67), rtracklayer, BiocStyle (>= 2.5.19), knitr (>= 1.12), rmarkdown, testthat, covr, pander |
biocViews: | Software, Genetics, GeneticVariability, GenomicVariation, DataRepresentation, GUI, Genetics, DNASeq, WholeGenome, Visualization, MultipleComparison, DataImport, VariantAnnotation, Sequencing, Coverage, Alignment, SequenceMatching |
Collate: | utils.R tSVE.R AllClasses.R AllGenerics.R Genotypes-class.R TVTBparam-class.R VcfFilterRules-class.R parseCSQToGRanges.R countGenos-methods.R autodetectGenotypes.R addCountGenos-methods.R addFrequencies-methods.R addOverallFrequencies-methods.R addPhenoLevelFrequencies-methods.R dropInfo.R readVcf-methods.R variantsInSamples-methods.R vepInPhenoLevel-methods.R plotInfo.R pairsInfo.R show-methods.R |
VignetteBuilder: | knitr |
URL: | https://github.com/kevinrue/TVTB |
BugReports: | https://github.com/kevinrue/TVTB/issues |
Config/pak/sysreqs: | make libicu-dev libjpeg-dev libpng-dev libxml2-dev libssl-dev |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/TVTB |
RemoteRef: | HEAD |
RemoteSha: | 9b5085b997b0259f25f0100b2b28d2b55c829dc4 |
Author: | Kevin Rue-Albrecht [aut, cre] |
Maintainer: | Kevin Rue-Albrecht <[email protected]> |
Index of help topics:
Genotypes-class Genotypes class objects TVTB-package TVTB: The VCF Tool Box TVTBparam-class TVTBparam class objects VcfFilterRules-class VcfFilterRules class objects addCountGenos-methods Add count of genotypes to INFO field addFrequencies-methods Group-level genotypes counts and allele frequencies addOverallFrequencies-methods Overall genotypes counts and allele frequencies addPhenoLevelFrequencies-methods Genotypes and allele frequencies for a given phenotype level autodetectGenotypes-methods Define genotypes in the TVTBparam metadata slot class:VcfFixedRules VCF filters class objects sub-types countGenos-methods Count occurences of genotypes dropInfo-methods Remove INFO keys from VCF objects pairsInfo-methods Plot an INFO metric on a genomic axis. parseCSQToGRanges Parse the CSQ column of a VCF object into a GRanges object plotInfo-methods Plot an INFO metric on a genomic axis. readVcf,character,TVTBparam-method Read VCF files tSVE The Shiny Variant Explorer (tSVE) web-application variantsInSamples-methods Identify variants observed in samples vepInPhenoLevel-methods VEP predictions of variants observed in samples
Kevin Rue-Albrecht [aut, cre]
Maintainer: Kevin Rue-Albrecht <[email protected]>
Adds the total occurences of a set of genotypes as an INFO
field for each variant.
All given genotypes are counted toward a single
total (e.g. grand total of c("0/0", "0|0")
),
while other genotypes are silently ignored.
## S4 method for signature 'ExpandedVCF' addCountGenos( vcf, genos, key, description, samples = 1:ncol(vcf), force = FALSE)
## S4 method for signature 'ExpandedVCF' addCountGenos( vcf, genos, key, description, samples = 1:ncol(vcf), force = FALSE)
vcf |
|
genos |
|
key |
Name of the INFO field to create or update
( |
description |
|
samples |
|
force |
If |
In all cases, the new INFO field is inserted after the last existing field.
In other words, overwriting an existing INFO field is achieved by dropping
it from the data and header of the info
slot,
and subsequently inserting the new data after the last remaining INFO field.
ExpandedVCF
object including an additional INFO field stating
the count of genotypes.
Kevin Rue-Albrecht
countGenos,ExpandedVCF-method
and geno,VCF-method
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes(ref = "0|0", het = c("0|1", "1|0"), alt = "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addCountGenos( vcf, het(tparam), suffix(tparam)["het"], "Number of heterozygous genotypes")
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes(ref = "0|0", het = c("0|1", "1|0"), alt = "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addCountGenos( vcf, het(tparam), suffix(tparam)["het"], "Number of heterozygous genotypes")
Adds genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor)
as INFO fields in an ExpandedVCF
object.
Counts and frequencies may be calculated overall
(i.e. across all samples), or within groups of samples
(i.e. within phenotype levels).
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0")
for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF,list' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,character' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,missing' addFrequencies(vcf, force = FALSE)
## S4 method for signature 'ExpandedVCF,list' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,character' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,missing' addFrequencies(vcf, force = FALSE)
vcf |
|
phenos |
If Otherwise,
either a |
force |
If If |
The phenos
argument is central to control the behaviour of this method.
If phenos=NULL
, genotypes and frequencies are calculated across all the
samples in the ExpandedVCF
object, and stored in INFO fields
named according to settings stored in the TVTBparam
object (see below).
If phenos
is a character
vector of phenotypes present in
colnames(colData(vcf))
, counts and frequencies are calculated for each
level of those phenotypes, and stored in INFO fields prefixed with
"<phenotype>_<level>_" and suffixed with the settings stored in the
param
object (see below).
Finally, if phenos
is a named list
, names
must be
phenotypes present in colnames(colData(vcf))
, and values must be levels
of those phenotypes. In this case, counts and frequencies are calculated for
the given levels of the given phenotypes, and stored in INFO fields as
described above.
The param
object controls the key (suffix) of INFO fields as
follows:
names(ref(param))
Count of reference homozygote genotypes.
names(het(param))
Count of heterozygote genotypes.
names(alt(param))
Count of alternate homozygote genotypes.
aaf(param)
Alternate allele frequency.
maf(param)
Minor allele frequency
ExpandedVCF
object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
Kevin Rue-Albrecht
addOverallFrequencies,ExpandedVCF-method
,
addPhenoLevelFrequencies,ExpandedVCF-method
,
VCF
,
and TVTBparam
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addFrequencies(vcf, list(super_pop = "AFR"))
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addFrequencies(vcf, list(super_pop = "AFR"))
Adds dataset-wide genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor)
as INFO fields in an ExpandedVCF
object.
Counts and frequencies may be calculated across all samples.
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0")
for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF' addOverallFrequencies(vcf, force = FALSE)
## S4 method for signature 'ExpandedVCF' addOverallFrequencies(vcf, force = FALSE)
vcf |
|
force |
If If |
Genotypes and frequencies are calculated across all the
samples in the ExpandedVCF
object, and stored in INFO fields
named according to settings stored in the TVTBparam
object (see below).
The param
object controls the key of INFO fields as
follows:
names(ref(param))
Count of reference homozygote genotypes.
names(het(param))
Count of heterozygote genotypes.
names(alt(param))
Count of alternate homozygote genotypes.
aaf(param)
Alternate allele frequency.
maf(param)
Minor allele frequency
ExpandedVCF
object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
A warning
message is issued if genotypes are not fully defined in the
TVTBparam
.
Kevin Rue-Albrecht
addFrequencies,ExpandedVCF,list-method
,
addPhenoLevelFrequencies,ExpandedVCF-method
,
and VCF
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addOverallFrequencies(vcf, tparam)
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addOverallFrequencies(vcf, tparam)
Adds genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor) calculated
in a group of samples associated with a given level of a given phenotype
as INFO fields in an ExpandedVCF
object.
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0")
for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF' addPhenoLevelFrequencies( vcf, pheno, level, force = FALSE)
## S4 method for signature 'ExpandedVCF' addPhenoLevelFrequencies( vcf, pheno, level, force = FALSE)
vcf |
|
pheno |
Phenotype in |
level |
Phenotype level in |
force |
If If |
Genotypes and frequencies are calculated within the groups of samples
associated with the given level of the given phenotype,
and stored in INFO fields
named according to settings stored in
metadata(vcf)[["TVTBparam"]]
(see below).
The TVTBparam
object controls
the key suffix of INFO fields as follows:
names(ref(param))
Count of reference homozygote genotypes.
names(het(param))
Count of heterozygote genotypes.
names(alt(param))
Count of alternate homozygote genotypes.
aaf(param)
Alternate allele frequency.
maf(param)
Minor allele frequency
ExpandedVCF
object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
A warning
message is issued if genotypes are not fully defined in the
TVTBparam
.
Kevin Rue-Albrecht
addFrequencies,ExpandedVCF,list-method
,
addOverallFrequencies,ExpandedVCF-method
,
VCF
,
and TVTBparam
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addPhenoLevelFrequencies(vcf, "super_pop", "AFR")
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addPhenoLevelFrequencies(vcf, "super_pop", "AFR")
This method attempts to auto-detect genotypes (i.e.
homozygote reference, heterozygote, and homozygote alternate)
in a VCF
object, and sets or creates a TVTBparam
object accordingly, in the metadata
slot.
## S4 method for signature 'VCF' autodetectGenotypes(vcf)
## S4 method for signature 'VCF' autodetectGenotypes(vcf)
vcf |
|
VCF
object including a new or updated TVTBparam
object
in metadata(vcf)[["TVTBparam"]]
.
A warning
message is issued if genotypes cannot be fully defined.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam() # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) # warning expected vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- autodetectGenotypes(vcf)
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam() # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) # warning expected vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- autodetectGenotypes(vcf)
Counts the total occurences of a set of genotypes by row
in a matrix of genotype. All given genotypes are counted toward a single
total (e.g. grand total of c("0/0", "0|0")
),
while other genotypes are silently ignored.
## S4 method for signature 'ExpandedVCF' countGenos( x, genos, pheno = NULL, level = NULL)
## S4 method for signature 'ExpandedVCF' countGenos( x, genos, pheno = NULL, level = NULL)
x |
|
genos |
|
pheno |
If |
level |
If |
An integer
vector representing the aggregated count of the given
genotypes in each row.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- countGenos(vcf, het(tparam), "super_pop", "AFR")
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- countGenos(vcf, het(tparam), "super_pop", "AFR")
Given a character
vector of INFO keys, removes either the associated
header, data, or both from a VCF
object.
If no INFO key is given (the default), all INFO keys are checked and removed
from the given slot if they do not have a matching entry in the other slot.
## S4 method for signature 'VCF' dropInfo( vcf, key = NULL, slot = "both")
## S4 method for signature 'VCF' dropInfo( vcf, key = NULL, slot = "both")
vcf |
|
key |
If |
slot |
Should the INFO keys be removed from the "header", the "data",
or |
An integer
vector representing the aggregated count of the given
genotypes in each row.
In the future, x
should also support genotype quality (GQ) to consider
only genotypes above a given quality cut-off.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- dropInfo(vcf) dropInfo(vcf, "CSQ")
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- dropInfo(vcf) dropInfo(vcf, "CSQ")
The Genotypes
class stores genotype definitions in a convenient format.
Genotypes( ref = NA_character_, het = NA_character_, alt = NA_character_, suffix = c(ref="REF", het="HET", alt="ALT"))
Genotypes( ref = NA_character_, het = NA_character_, alt = NA_character_, suffix = c(ref="REF", het="HET", alt="ALT"))
ref |
A |
het |
A |
alt |
A |
suffix |
Set the individual INFO key suffixes used to store the statistics of homozygote reference, heterozygote, and homozygote alternate genotypes, in this order. See Details section. |
Genotypes may be initialised as NA_character_
and updated from an
imported VCF
object using the
autodetectGenotypes
method.
This may be useful if genotype encodings are not known beforehand.
For each suffix stored in the Genotypes
object, TVTB
may store data in the VCF
object under the INFO keys defined as follows:
Statistics across all samples in the ExpandedVCF
(e.g. "MAF").
Statistics across samples associated with a given level of a given phenotype (e.g. "gender_male_MAF").
Users are recommended to avoid using those INFO keys for other purposes.
A Genotypes
object that contains genotype definitions.
In the following code snippets x
is a Genotypes
object.
ref(x)
, ref(x) <- value
Gets or sets the vector that declares homozygote reference genotypes.
het(x)
, het(x) <- value
Gets or sets the vector that declares heterozygote genotypes.
alt(x)
, alt(x) <- value
Gets or sets the vector that declares homozygote alternate genotypes.
genos(x)
Gets a vector of concatenated
homozygote reference, heterozygote, and homozygote alternate genotypes.
See also ref
, het
, alt
, and carrier
accessors.
carrier(x)
Gets a vector of concatenated
heterozygote and homozygote alternate genotypes.
See also het
and alt
accessors.
suffix(x)
Gets a named character
vector that declares individual suffixes
used to store the data for each set of genotypes in the INFO field of the
VCF
object.
Names of this vector are ref
, het
, and alt
.
Kevin Rue-Albrecht
VCF
,
TVTBparam
,
and addCountGenos-methods
.
# Constructors ---- genotypes <- Genotypes("0|0", c("0|1", "1|0"), "1|1") # Accessors ---- ## Concatenated homozygote reference, heterozygote, and alternate heterozygote ## genotypes stored in the Genotypes object returned by the genos() accessor. genos(genotypes) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(genotypes) het(genotypes) alt(genotypes) ## Their individual INFO key suffixes can be extracted with suffix() accessors ## and the relevant name suffix(genotypes) suffix(genotypes)["ref"] suffix(genotypes)["het"] suffix(genotypes)["alt"] ## Concatenated heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(genotypes) names(carrier(genotypes))
# Constructors ---- genotypes <- Genotypes("0|0", c("0|1", "1|0"), "1|1") # Accessors ---- ## Concatenated homozygote reference, heterozygote, and alternate heterozygote ## genotypes stored in the Genotypes object returned by the genos() accessor. genos(genotypes) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(genotypes) het(genotypes) alt(genotypes) ## Their individual INFO key suffixes can be extracted with suffix() accessors ## and the relevant name suffix(genotypes) suffix(genotypes)["ref"] suffix(genotypes)["het"] suffix(genotypes)["alt"] ## Concatenated heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(genotypes) names(carrier(genotypes))
Make a matrix of plots that display a metric calculated in levels of a given
phenotype,
and stored in columns of the info
slot of a VCF
object.
## S4 method for signature 'VCF' pairsInfo(vcf, metric, phenotype, ..., title = metric)
## S4 method for signature 'VCF' pairsInfo(vcf, metric, phenotype, ..., title = metric)
vcf |
|
metric |
Metric to plot on the Y axis.
All columns in the |
phenotype |
Column in the |
... |
Additional arguments, passed to the |
title |
Title for the graph, passed to the |
gg
object returned by the ggpairs method.
Kevin Rue-Albrecht
ggpairs
,
addPhenoLevelFrequencies,ExpandedVCF-method
,
and VCF
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- pairsInfo(vcf, "MAF", "super_pop")
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- pairsInfo(vcf, "MAF", "super_pop")
Parse the CSQ column in a VCF object returned from the Ensembl Variant Effect Predictor (VEP).
**This method was rescued following the deprecation of the package
ensemblVEP
in the Bioconductor release 3.20
.**
## S4 method for signature 'VCF' parseCSQToGRanges(x, VCFRowID=character(), ..., info.key = "CSQ")
## S4 method for signature 'VCF' parseCSQToGRanges(x, VCFRowID=character(), ..., info.key = "CSQ")
x |
A |
VCFRowID |
A When |
info.key |
The name of the INFO key that VEP writes the consequences to in the output
(default is |
... |
Arguments passed to other methods. Currently not used. |
When ensemblVEP
returns a VCF
object, the consequence data
are returned unparsed in the 'CSQ' INFO column. parseCSQToGRanges
parses these data into a GRanges
object that is expanded to match
the dimension of the 'CSQ' data. Because each variant can have multiple
matches, the ranges in the GRanges
are repeated.
If rownames from the original VCF are provided as VCFRowID
a
metadata column is included in the result that maps back to the row
(variant) in the original VCF. This option is only applicable when the
info.key
field has data (is not empty).
If no info.key
column is found the function returns the data in
rowRanges()
.
Returns a GRanges
object with consequence data as the
metadata columns. If no 'CSQ' column is found the GRanges
from rowRanges()
is returned.
Valerie Obenchain, Kevin Rue-Albrecht
Ensembl VEP Home: http://uswest.ensembl.org/info/docs/tools/vep/index.html
library(VariantAnnotation) file <- system.file("extdata", "moderate.vcf", package = "TVTB") vep <- readVcf(file) ## The returned 'CSQ' data are unparsed. info(vep)$CSQ ## Parse into a GRanges and include the 'VCFRowID' column. vcf <- readVcf(file, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) csq[1:4]
library(VariantAnnotation) file <- system.file("extdata", "moderate.vcf", package = "TVTB") vep <- readVcf(file) ## The returned 'CSQ' data are unparsed. info(vep)$CSQ ## Parse into a GRanges and include the 'VCFRowID' column. vcf <- readVcf(file, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) csq[1:4]
Plot, on a genomic axis, a metric calculated in levels of a given phenotype,
and stored in columns of the info
slot of a VCF
object.
## S4 method for signature 'VCF' plotInfo( vcf, metric, range, annotation, phenotype, type = c("p", "heatmap"), zero.rm = FALSE)
## S4 method for signature 'VCF' plotInfo( vcf, metric, range, annotation, phenotype, type = c("p", "heatmap"), zero.rm = FALSE)
vcf |
|
metric |
Metric to plot on the Y axis.
All columns in the |
range |
A |
annotation |
An |
phenotype |
Column in the |
type |
Plotting type(s), as listed in |
zero.rm |
If |
list
returned by the plotTracks
method.
Kevin Rue-Albrecht
plotTracks
,
addPhenoLevelFrequencies,ExpandedVCF-method
,
and VCF
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- if (requireNamespace("EnsDb.Hsapiens.v75")){ plotInfo( vcf, "MAF", range(GenomicRanges::granges(vcf)), EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75, "super_pop" ) }
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- if (requireNamespace("EnsDb.Hsapiens.v75")){ plotInfo( vcf, "MAF", range(GenomicRanges::granges(vcf)), EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75, "super_pop" ) }
Read Variant Call Format (VCF) files, attaches the given TVTBparam
in the metadata
slot of the resulting VCF
object,
and attaches optional phenotype information in the phenoData
slot.
## S4 method for signature 'character,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE) ## S4 method for signature 'TabixFile,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE)
## S4 method for signature 'character,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE) ## S4 method for signature 'TabixFile,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE)
file , genome
|
See |
param |
The |
... |
Additional arguments, passed to methods. |
colData |
Phenotype information in a If supplied, only samples identifiers present in |
autodetectGT |
If |
VCF
object.
See ?VCF
for complete details of the class structure.
A warning
message is issued if genotypes cannot be fully defined,
when autodetectGT=TRUE
.
Kevin Rue-Albrecht
readVcf,TabixFile,ScanVcfParam-method
,
and VCF
.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf.gz", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Example usage ---- vcf <- readVcf(vcfFile, "b37", tparam, colData = phenotypes)
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf.gz", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Example usage ---- vcf <- readVcf(vcfFile, "b37", tparam, colData = phenotypes)
Currently unsupported — Package undergoing major updates.
This function starts the interactive tSVE
shiny web-application that
allows to interactively load and visualise genetic variants and their
Ensembl Variant Effect Predictor (VEP) predictions using the package methods.
All arguments after the ... set default values for the application
(e.g. widgets).
tSVE( ..., refGT = "0|0", hetGT = c("0|1", "1|2", "0|2", "1|0", "2|1", "2|0"), altGT = c("1|1", "2|2"), vepKey = "CSQ", refSuffix = "REF", hetSuffix = "HET", altSuffix = "ALT", aafSuffix = "AAF", mafSuffix = "MAF", genoHeatmap.height = "500px", options.width = 120, autodetectGTimport = FALSE )
tSVE( ..., refGT = "0|0", hetGT = c("0|1", "1|2", "0|2", "1|0", "2|1", "2|0"), altGT = c("1|1", "2|2"), vepKey = "CSQ", refSuffix = "REF", hetSuffix = "HET", altSuffix = "ALT", aafSuffix = "AAF", mafSuffix = "MAF", genoHeatmap.height = "500px", options.width = 120, autodetectGTimport = FALSE )
... |
Additional arguments passed to the |
refGT |
Default homozygote reference genotypes. |
hetGT |
Default heterozygote genotypes. |
altGT |
Default homozygote alternate genotypes. |
vepKey |
Default INFO key for the VEP prediction field. |
refSuffix |
Default INFO key suffix used to store the data for homozygote reference genotypes. |
hetSuffix |
Default INFO key suffix used to store the data for heterozygote genotypes. |
altSuffix |
Default INFO key suffix used to store the data for homozygote alternate genotypes. |
aafSuffix |
Default INFO key suffix used to store the data for alternate allele frequency. |
mafSuffix |
Default INFO key suffix used to store the data for minor allele frequency. |
genoHeatmap.height |
Default height (in pixels) of the heatmap that represents the genotype of each variant in each sample. |
options.width |
Sets |
autodetectGTimport |
Default checkbox value. If |
Not applicable (yet).
Kevin Rue-Albrecht
Interface to EnsDb
adapted from the ensembldb
package.
if (interactive()){ runEnsDbApp() }
if (interactive()){ runEnsDbApp() }
The TVTBparam
class stores recurrent parameters of the TVTB
package in a convenient format.
TVTBparam( genos, ranges = GRangesList(), aaf = "AAF", maf = "MAF", vep = "CSQ", bp = SerialParam(), svp = ScanVcfParam(which = reduce(unlist(ranges))))
TVTBparam( genos, ranges = GRangesList(), aaf = "AAF", maf = "MAF", vep = "CSQ", bp = SerialParam(), svp = ScanVcfParam(which = reduce(unlist(ranges))))
genos |
A |
ranges |
A |
aaf |
INFO key suffix used to store the alternate allele frequency (AAF). |
maf |
INFO key suffix used to store the minor allele frequency (MAF). |
vep |
INFO key suffix used to extract the VEP predictions. See |
bp |
A |
svp |
A |
For each suffix stored in the TVTBparam
object, TVTB
may store data in the VCF
object under the INFO keys defined as follows:
Statistics across all samples in the ExpandedVCF
(e.g. "MAF").
Statistics across samples associated with a given level of a given phenotype (e.g. "gender_male_MAF").
Users are recommended to avoid using those INFO keys for other purposes.
A TVTBparam
object that contains recurrent parameters.
In the following code snippets x
is a TVTBparam
object.
genos(x)
, genos(x) <- value
Gets or sets the Genotypes
object stored in the genos
slot.
ranges(x)
, ranges(x) <- value
List of genomic ranges to group variants during analyses and plots.
ref(x)
, ref(x) <- value
Gets or sets the character
vector
that declares homozygote reference genotypes.
het(x)
, het(x) <- value
Gets or sets the character
vector
that declares heterozygote genotypes.
alt(x)
, alt(x) <- value
Gets or sets the character
vector
that declares homozygote alternate genotypes.
carrier(x)
Gets a character
vectors of concatenated
heterozygote and homozygote alternate genotypes.
See also het
and alt
accessors.
aaf(x)
, aaf(x) <- value
Gets or sets the INFO key suffix used to store the alternate allele frequency (AAF).
maf(x)
, maf(x) <- value
Gets or sets the INFO key suffix used to store the minor allele frequency (MAF).
vep(x)
, maf(x) <- value
Gets or sets the INFO key suffix used to extract the VEP predictions.
bp(x)
, bp(x) <- value
Gets or sets the BiocParallel
parameters.
suffix(x)
Gets a named character
vector that declares individual suffixes
used to store the data for each set of genotypes in the INFO field of the
VCF
object.
Names of this vector are ref
, het
, alt
, aaf
,
and maf
.
svp(x)
, svp(x) <- value
Gets or sets the ScanVcfParam
parameters.
Kevin Rue-Albrecht
Genotypes
,
VCF
,
ExpandedVCF
,
addCountGenos-methods
vepInPhenoLevel-methods
,
variantsInSamples-methods
,
and BiocParallelParam
.
# Constructors ---- grl <- GenomicRanges::GRangesList(GenomicRanges::GRanges( "15", IRanges::IRanges(48413170, 48434757, names = "SLC24A5") )) tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1"), ranges = grl) # Accessors ---- ## The Genotypes object stored in the genos slot of the TVTBparam object ## return by the genos() accessor. genos(tparam) ## Genomic ranges stored in the TVTBparam object returned by the ranges() ## accessor. ranges(tparam) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(tparam) het(tparam) alt(tparam) ## Their individual INFO key suffixes can be extracted with suffix() applied to ## the above accessors. suffix(tparam) suffix(tparam)["ref"] suffix(tparam)["het"] suffix(tparam)["alt"] suffix(tparam)["aaf"] suffix(tparam)["maf"] ## Heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(tparam) ## INFO key suffix of alternate/minor allele frequency returned by the aaf() ## and maf() accessors. aaf(tparam) maf(tparam) ## INFO key suffix of the VEP predictions returned by the vep() accessor. vep(tparam) ## BiocParallel parameters bp(tparam) ## ScanVcfParam parameters svp(tparam)
# Constructors ---- grl <- GenomicRanges::GRangesList(GenomicRanges::GRanges( "15", IRanges::IRanges(48413170, 48434757, names = "SLC24A5") )) tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1"), ranges = grl) # Accessors ---- ## The Genotypes object stored in the genos slot of the TVTBparam object ## return by the genos() accessor. genos(tparam) ## Genomic ranges stored in the TVTBparam object returned by the ranges() ## accessor. ranges(tparam) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(tparam) het(tparam) alt(tparam) ## Their individual INFO key suffixes can be extracted with suffix() applied to ## the above accessors. suffix(tparam) suffix(tparam)["ref"] suffix(tparam)["het"] suffix(tparam)["alt"] suffix(tparam)["aaf"] suffix(tparam)["maf"] ## Heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(tparam) ## INFO key suffix of alternate/minor allele frequency returned by the aaf() ## and maf() accessors. aaf(tparam) maf(tparam) ## INFO key suffix of the VEP predictions returned by the vep() accessor. vep(tparam) ## BiocParallel parameters bp(tparam) ## ScanVcfParam parameters svp(tparam)
Identifies variants observed (uniquely) in at least one sample of a given group.
## S4 method for signature 'ExpandedVCF' variantsInSamples( vcf, samples = 1:ncol(vcf), unique = FALSE)
## S4 method for signature 'ExpandedVCF' variantsInSamples( vcf, samples = 1:ncol(vcf), unique = FALSE)
vcf |
|
samples |
|
unique |
If |
An named integer
vector of indices indicating the name and index of
variants that are (uniquely) observed in at least one non-reference genotype
in the given group of samples.
A warning
message is issued if genotypes are not fully defined in the
TVTBparam
.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- variantsInSamples( vcf, which(SummarizedExperiment::colData(vcf)[,"super_pop"] == "EUR"))
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- variantsInSamples( vcf, which(SummarizedExperiment::colData(vcf)[,"super_pop"] == "EUR"))
The VcfFixedRules
and VcfInfoRules
classes
store filters applicable to the fixed
and info
slots
of VCF
objects, respectively.
The VcfVepRules
stores filters applicable to Ensembl VEP predictions
stores in a given INFO key.
All arguments are first passed to S4Vectors::FilterRules
before re-typing the resulting as a VcfFixedRules
, VcfInfoRules
,
or VcfVepRules
class.
In the following code snippets x
is an object from any of the classes
decribed in this help page, except when specified otherwise.
active(x)
, active(x)<-
Gets or sets the active state of each filter rule in x
.
Inherited from FilterRules
vep(x)
, vep(x)<-
Gets or sets the INFO key where the Ensembl VEP predictions
to use for filtering are stored.
Returns NA_character_
for filters not applicable to VEP predictions.
type(x)
Returns "filter"
(linkS4class{FilterRules}
),
"fixed"
(linkS4class{VcfFixedRules}
),
"info"
(linkS4class{VcfInfoRules}
),
or "vep"
(linkS4class{VcfVepRules}
)
as a character
vector of length(x)
.
VcfFixedRules(exprs = list(), ..., active = TRUE)
VcfInfoRules(exprs = list(), ..., active = TRUE)
VcfVepRules(exprs = list(), ..., active = TRUE, vep = "CSQ")
All methods construct an object of the corresponding class
with the rules given in the list exprs
or in ...
.
The initial active state of the rules is given by active
,
which is recycled as necessary.
See the constructor of FilterRules
for more details.
In the following code snippets x
and value
are objects from any of the classes described in this help page.
x[i]
: Subsets the filter rules using the same interface as for
List
.
x[[i]]
: Extracts an expression or function via the same interface
as for List
.
x[i] <- value
: Replaces a filter rule by one of the
same class.
The active state(s) and name(s) are transferred from value
to
x
.
x[[i]] <- value
:
The same interface as for List
.
The default active state for new rules is TRUE.
In the following code snippets x
, values
, and ...
are objects from any of the classes described in this help page, or
VcfFilterRules
.
append(x, values, after = length(x))
:
Appends the values onto x
at the index given by after
.
c(x, ...,)
:
Concatenates the filters objects in ...
onto the end of x
.
Note that combining rules of different types
(e.g. VcfFixedRules
and VcfVepRules
)
produces a VcfFilterRules
object.
As described in the S4Vectors
documentation:
eval(expr, envir, enclos)
:
Evaluates a rule instance (passed as the expr
argument)
in their respective context of a VCF
object
(passed as the envir
argument).
i.e.:
VcfFixedRules
: fixed(envir)
VcfInfoRules
: info(envir)
VcfVepRules
:
mcols(parseCSQToGRanges(envir, ...))
FilterRules
: envir
evalSeparately(expr, envir, enclos)
:
subsetByFilter(x, filter)
summary(object)
See eval,FilterRules,ANY-method
for details.
Kevin Rue-Albrecht
FilterRules
,
VcfFilterRules
,
and VCF
.
# Constructors ---- fixedRules <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedRules infoRules <- VcfInfoRules(list( common = expression(MAF > 0.01), # minor allele frequency alt = expression(ALT > 0) # count of alternative homozygotes )) infoRules vepRules <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepRules filterRules <- S4Vectors::FilterRules(list( PASS = function(x) fixed(x)$FILTER == "PASS", COMMON = function(x) info(x)$MAF > 0.05 )) filterRules # Accessors ---- ## get/set the active state directly S4Vectors::active(infoRules) S4Vectors::active(infoRules)["common"] <- FALSE ## See S4Vectors::FilterRules for more examples # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf) # Applying filters to VCF objects ---- ## Evaluate filters S4Vectors::eval(fixedRules, vcf) S4Vectors::eval(infoRules, vcf) S4Vectors::eval(vepRules, vcf) S4Vectors::eval(filterRules, vcf) summary(S4Vectors::eval(vepRules, vcf)) ## Evaluate filters separately S4Vectors::evalSeparately(vepRules, vcf) summary(S4Vectors::evalSeparately(vepRules, vcf)) ## Subset VCF by filters S4Vectors::subsetByFilter(vcf, vepRules) # Subsetting and Replacement ---- vep1 <- vepRules[1] # VcfVepRules vepRules[[1]] # expression # Replace the expression (active reset to TRUE, original name retained) vepRules[[2]] <- expression(CADD_PHRED > 30) # Replace the rule (active state and name transferred from v5obj) vepRules[2] <- VcfVepRules( list(newRule = expression(CADD_PHRED > 30)), active = FALSE)
# Constructors ---- fixedRules <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedRules infoRules <- VcfInfoRules(list( common = expression(MAF > 0.01), # minor allele frequency alt = expression(ALT > 0) # count of alternative homozygotes )) infoRules vepRules <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepRules filterRules <- S4Vectors::FilterRules(list( PASS = function(x) fixed(x)$FILTER == "PASS", COMMON = function(x) info(x)$MAF > 0.05 )) filterRules # Accessors ---- ## get/set the active state directly S4Vectors::active(infoRules) S4Vectors::active(infoRules)["common"] <- FALSE ## See S4Vectors::FilterRules for more examples # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf) # Applying filters to VCF objects ---- ## Evaluate filters S4Vectors::eval(fixedRules, vcf) S4Vectors::eval(infoRules, vcf) S4Vectors::eval(vepRules, vcf) S4Vectors::eval(filterRules, vcf) summary(S4Vectors::eval(vepRules, vcf)) ## Evaluate filters separately S4Vectors::evalSeparately(vepRules, vcf) summary(S4Vectors::evalSeparately(vepRules, vcf)) ## Subset VCF by filters S4Vectors::subsetByFilter(vcf, vepRules) # Subsetting and Replacement ---- vep1 <- vepRules[1] # VcfVepRules vepRules[[1]] # expression # Replace the expression (active reset to TRUE, original name retained) vepRules[[2]] <- expression(CADD_PHRED > 30) # Replace the rule (active state and name transferred from v5obj) vepRules[2] <- VcfVepRules( list(newRule = expression(CADD_PHRED > 30)), active = FALSE)
The VcfFilterRules
class can stores multiple types of filters
applicable to various slots of VCF
objects.
All arguments must be VcfFixedRules
, VcfInfoRules
,
VcfVepRules
, VcfFilterRules
of FilterRules
objects.
In the following code snippets x
is a VcfFilterRules
object.
active(x)
, active(x)<-
Get or set the active state of each filter rule in x
.
Inherited from FilterRules
vep(x)
, vep(x)<-
Gets or sets the INFO key where the Ensembl VEP predictions to use for filtering are stored.
type(x)
Gets the type of each filter stored in a VcfFilterRules
object.
Read-only.
VcfFilterRules(...)
constructs an VcfFilterRules
object from
VcfFixedRules
, VcfInfoRules
, VcfVepRules
,
and VcfFilterRules
objects in ...
.
In the code snippets below, x
is a VcfFilterRules
object.
x[i, drop = TRUE]
: Subsets the filter rules using the
same interface as for Vector
.
If all filter rules are of the same type and drop=TRUE
(default),
the resulting object is re-typed to the most specialised class,
if possible. In other words, if all remaining filter rules are of
type "vep"
, the object will be type as VcfVepRules
.
x[[i]]
: Extracts an expression or function via the same interface
as for List
.
x[i] <- value
: Replaces a filter rule by one of any valid class
(VcfFixedRules
, VcfInfoRules
, VcfVepRules
, or
VcfFilterRules
).
The active state(s), name(s), and type(s) (if applicable)
are transferred from value
.
x[[i]] <- value
: The same interface as for
List
. The default active state for new
rules is TRUE
.
In the following code snippets x
is an object of class
VcfFilterRules
,
while values
and ...
are objects from any of the
classes VcfFixedRules
, VcfInfoRules
, VcfVepRules
,
or VcfFilterRules
:
append(x, values, after = length(x))
:
Appends the values onto x
at the index given by after
.
c(x, ...,)
:
Concatenates the filters objects in ...
onto the end of x
.
As described in the S4Vectors
documentation:
eval(expr, envir, enclos)
Evaluates each active rule in a VcfFilterRules
instance
(passed as the expr
argument)
in their respective context of a VCF
object
(passed as the envir
argument).
evalSeparately(expr, envir, enclos)
:
subsetByFilter(x, filter)
summary(object)
See eval,FilterRules,ANY-method
for details.
Kevin Rue-Albrecht
FilterRules
,
VcfFixedRules
,
VcfInfoRules
,
VcfVepRules
,
and VCF
.
# Constructors ---- fixedR <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedR infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT + HET > 0) # count of non-REF homozygotes )) # ...is synonym to... infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT > 0 | HET > 0) )) infoR vepR <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepR vcfRules <- VcfFilterRules(fixedR, infoR, vepR) vcfRules # Accessors ---- ## Type of each filter stored in the VcfFilterRules object type(vcfRules) # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf, tparam) # Applying filters to VCF objects ---- ## Evaluate filters eval(vcfRules, vcf) ## Evaluate filters separately as.data.frame(evalSeparately(vcfRules, vcf)) # Interestingly, the only common missense variant has a lower CADD score ## Deactivate the CADD score filter active(vcfRules)["CADD"] <- FALSE ## Subset VCF by filters (except CADD, deactivated above) subsetByFilter(vcf, vcfRules) # Subsetting and Replacement ---- v123 <- vcfRules[1:3] # Extract the expression v5expr <- vcfRules[[5]] # Subset the object v5obj <- vcfRules[5] # Replace the expression (active reset to TRUE, original name retained) v123[[2]] <- v5expr # Replace the rule (active state and name transferred from v5obj) v123[2] <- v5obj
# Constructors ---- fixedR <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedR infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT + HET > 0) # count of non-REF homozygotes )) # ...is synonym to... infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT > 0 | HET > 0) )) infoR vepR <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepR vcfRules <- VcfFilterRules(fixedR, infoR, vepR) vcfRules # Accessors ---- ## Type of each filter stored in the VcfFilterRules object type(vcfRules) # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf, tparam) # Applying filters to VCF objects ---- ## Evaluate filters eval(vcfRules, vcf) ## Evaluate filters separately as.data.frame(evalSeparately(vcfRules, vcf)) # Interestingly, the only common missense variant has a lower CADD score ## Deactivate the CADD score filter active(vcfRules)["CADD"] <- FALSE ## Subset VCF by filters (except CADD, deactivated above) subsetByFilter(vcf, vcfRules) # Subsetting and Replacement ---- v123 <- vcfRules[1:3] # Extract the expression v5expr <- vcfRules[[5]] # Subset the object v5obj <- vcfRules[5] # Replace the expression (active reset to TRUE, original name retained) v123[[2]] <- v5expr # Replace the rule (active state and name transferred from v5obj) v123[2] <- v5obj
Returns VEP predictions for variants observed (uniquey) in samples associated with a given phenotype level.
## S4 method for signature 'ExpandedVCF' vepInPhenoLevel( vcf, phenoCol, level, vepCol, unique = FALSE)
## S4 method for signature 'ExpandedVCF' vepInPhenoLevel( vcf, phenoCol, level, vepCol, unique = FALSE)
vcf |
|
phenoCol |
Name of a column in |
level |
Phenotype level; only variants observed in at least one sample will be considered. |
vepCol |
VEP prediction fields; |
unique |
If |
A GRanges
including all VEP predictions associated with a variant
seen in at least one sample (heterozygote or alternate homozygote)
associated with the phenotype level.
The GRanges
contains at least one column for the VEP prediction
value.
Additional columns containing another VEP prediction field may be added
using the facet
argument.
If available, "Feature"
is a recommended value for this argument,
as VEP typically produce one prediction per variant per feature.
A warning
message is issued if genotypes are not fully defined in the
TVTBparam
.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vepInPhenoLevel(vcf, "super_pop", "AFR", c("CADD_PHRED", "Feature", "IMPACT"))
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vepInPhenoLevel(vcf, "super_pop", "AFR", c("CADD_PHRED", "Feature", "IMPACT"))