| Title: | TVTB: The VCF Tool Box |
|---|---|
| Description: | The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE). |
| Authors: | Kevin Rue-Albrecht [aut, cre] |
| Maintainer: | Kevin Rue-Albrecht <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.39.0 |
| Built: | 2026-05-29 10:14:24 UTC |
| Source: | https://github.com/bioc/TVTB |
The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE).
| Package: | TVTB |
| Type: | Package |
| Title: | TVTB: The VCF Tool Box |
| Version: | 1.39.0 |
| Date: | 2025-08-05 |
| Authors@R: | person("Kevin", "Rue-Albrecht", role = c("aut", "cre"), email = "[email protected]") |
| Description: | The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define news classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE). |
| License: | Artistic-2.0 |
| Depends: | R (>= 3.4), methods, utils, stats |
| Imports: | AnnotationFilter, BiocGenerics (>= 0.25.1), BiocParallel, Biostrings, ensembldb, Seqinfo, GenomicRanges, GGally, ggplot2, Gviz, limma, IRanges (>= 2.21.6), reshape2, Rsamtools, S4Vectors (>= 0.25.14), SummarizedExperiment, VariantAnnotation (>= 1.19.9) |
| Suggests: | EnsDb.Hsapiens.v75 (>= 0.99.7), shiny (>= 0.13.2.9005), DT (>= 0.1.67), rtracklayer, BiocStyle (>= 2.5.19), knitr (>= 1.12), rmarkdown, testthat, covr, pander |
| biocViews: | Software, Genetics, GeneticVariability, GenomicVariation, DataRepresentation, GUI, Genetics, DNASeq, WholeGenome, Visualization, MultipleComparison, DataImport, VariantAnnotation, Sequencing, Coverage, Alignment, SequenceMatching |
| Collate: | utils.R tSVE.R AllClasses.R AllGenerics.R Genotypes-class.R TVTBparam-class.R VcfFilterRules-class.R parseCSQToGRanges.R countGenos-methods.R autodetectGenotypes.R addCountGenos-methods.R addFrequencies-methods.R addOverallFrequencies-methods.R addPhenoLevelFrequencies-methods.R dropInfo.R readVcf-methods.R variantsInSamples-methods.R vepInPhenoLevel-methods.R plotInfo.R pairsInfo.R show-methods.R |
| VignetteBuilder: | knitr |
| URL: | https://github.com/kevinrue/TVTB |
| BugReports: | https://github.com/kevinrue/TVTB/issues |
| Config/Bioconductor/UnsupportedPlatforms: | windows |
| Config/pak/sysreqs: | cmake make libbz2-dev libicu-dev libjpeg-dev liblzma-dev libpng-dev libuv1-dev libxml2-dev libssl-dev xz-utils zlib1g-dev |
| Repository: | https://bioc.r-universe.dev |
| Date/Publication: | 2026-04-28 12:44:36 UTC |
| RemoteUrl: | https://github.com/bioc/TVTB |
| RemoteRef: | HEAD |
| RemoteSha: | c77deb7c910e894acf8844311da6fcca42390faf |
| Author: | Kevin Rue-Albrecht [aut, cre] |
| Maintainer: | Kevin Rue-Albrecht <[email protected]> |
Index of help topics:
addCountGenos-methods Add count of genotypes to INFO field
addFrequencies-methods
Group-level genotypes counts and allele
frequencies
addOverallFrequencies-methods
Overall genotypes counts and allele frequencies
addPhenoLevelFrequencies-methods
Genotypes and allele frequencies for a given
phenotype level
autodetectGenotypes-methods
Define genotypes in the TVTBparam metadata slot
class:VcfFixedRules VCF filters class objects sub-types
countGenos-methods Count occurences of genotypes
dropInfo-methods Remove INFO keys from VCF objects
Genotypes-class Genotypes class objects
pairsInfo-methods Plot an INFO metric on a genomic axis.
parseCSQToGRanges Parse the CSQ column of a VCF object into a
GRanges object
plotInfo-methods Plot an INFO metric on a genomic axis.
readVcf,character,TVTBparam-method
Read VCF files
tSVE The Shiny Variant Explorer (tSVE)
web-application
TVTB-package TVTB: The VCF Tool Box
TVTBparam-class TVTBparam class objects
variantsInSamples-methods
Identify variants observed in samples
VcfFilterRules-class VcfFilterRules class objects
vepInPhenoLevel-methods
VEP predictions of variants observed in samples
Kevin Rue-Albrecht [aut, cre]
Maintainer: Kevin Rue-Albrecht <[email protected]>
Adds the total occurences of a set of genotypes as an INFO
field for each variant.
All given genotypes are counted toward a single
total (e.g. grand total of c("0/0", "0|0")),
while other genotypes are silently ignored.
## S4 method for signature 'ExpandedVCF' addCountGenos( vcf, genos, key, description, samples = 1:ncol(vcf), force = FALSE)## S4 method for signature 'ExpandedVCF' addCountGenos( vcf, genos, key, description, samples = 1:ncol(vcf), force = FALSE)
vcf |
|
genos |
|
key |
Name of the INFO field to create or update
( |
description |
|
samples |
|
force |
If |
In all cases, the new INFO field is inserted after the last existing field.
In other words, overwriting an existing INFO field is achieved by dropping
it from the data and header of the info slot,
and subsequently inserting the new data after the last remaining INFO field.
ExpandedVCF object including an additional INFO field stating
the count of genotypes.
Kevin Rue-Albrecht
countGenos,ExpandedVCF-method
and geno,VCF-method
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes(ref = "0|0", het = c("0|1", "1|0"), alt = "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addCountGenos( vcf, het(tparam), suffix(tparam)["het"], "Number of heterozygous genotypes")# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes(ref = "0|0", het = c("0|1", "1|0"), alt = "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addCountGenos( vcf, het(tparam), suffix(tparam)["het"], "Number of heterozygous genotypes")
Adds genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor)
as INFO fields in an ExpandedVCF object.
Counts and frequencies may be calculated overall
(i.e. across all samples), or within groups of samples
(i.e. within phenotype levels).
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0") for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF,list' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,character' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,missing' addFrequencies(vcf, force = FALSE)## S4 method for signature 'ExpandedVCF,list' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,character' addFrequencies(vcf, phenos, force = FALSE) ## S4 method for signature 'ExpandedVCF,missing' addFrequencies(vcf, force = FALSE)
vcf |
|
phenos |
If Otherwise,
either a |
force |
If If |
The phenos argument is central to control the behaviour of this method.
If phenos=NULL, genotypes and frequencies are calculated across all the
samples in the ExpandedVCF object, and stored in INFO fields
named according to settings stored in the TVTBparam object (see below).
If phenos is a character vector of phenotypes present in
colnames(colData(vcf)), counts and frequencies are calculated for each
level of those phenotypes, and stored in INFO fields prefixed with
"<phenotype>_<level>_" and suffixed with the settings stored in the
param object (see below).
Finally, if phenos is a named list, names must be
phenotypes present in colnames(colData(vcf)), and values must be levels
of those phenotypes. In this case, counts and frequencies are calculated for
the given levels of the given phenotypes, and stored in INFO fields as
described above.
The param object controls the key (suffix) of INFO fields as
follows:
names(ref(param))Count of reference homozygote genotypes.
names(het(param))Count of heterozygote genotypes.
names(alt(param))Count of alternate homozygote genotypes.
aaf(param)Alternate allele frequency.
maf(param)Minor allele frequency
ExpandedVCF object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
Kevin Rue-Albrecht
addOverallFrequencies,ExpandedVCF-method,
addPhenoLevelFrequencies,ExpandedVCF-method,
VCF,
and TVTBparam.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addFrequencies(vcf, list(super_pop = "AFR"))# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addFrequencies(vcf, list(super_pop = "AFR"))
Adds dataset-wide genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor)
as INFO fields in an ExpandedVCF object.
Counts and frequencies may be calculated across all samples.
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0") for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF' addOverallFrequencies(vcf, force = FALSE)## S4 method for signature 'ExpandedVCF' addOverallFrequencies(vcf, force = FALSE)
vcf |
|
force |
If If |
Genotypes and frequencies are calculated across all the
samples in the ExpandedVCF object, and stored in INFO fields
named according to settings stored in the TVTBparam object (see below).
The param object controls the key of INFO fields as
follows:
names(ref(param))Count of reference homozygote genotypes.
names(het(param))Count of heterozygote genotypes.
names(alt(param))Count of alternate homozygote genotypes.
aaf(param)Alternate allele frequency.
maf(param)Minor allele frequency
ExpandedVCF object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
A warning message is issued if genotypes are not fully defined in the
TVTBparam.
Kevin Rue-Albrecht
addFrequencies,ExpandedVCF,list-method,
addPhenoLevelFrequencies,ExpandedVCF-method,
and VCF.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addOverallFrequencies(vcf, tparam)# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addOverallFrequencies(vcf, tparam)
Adds genotypes counts
(reference homozygote, heterozygote, and alternate homozygote)
and allele frequencies (alternate and minor) calculated
in a group of samples associated with a given level of a given phenotype
as INFO fields in an ExpandedVCF object.
Multiple genotypes can be counted toward a single
frequency (e.g. combined c("0/0", "0|0") for
homozygote reference genotypes).
## S4 method for signature 'ExpandedVCF' addPhenoLevelFrequencies( vcf, pheno, level, force = FALSE)## S4 method for signature 'ExpandedVCF' addPhenoLevelFrequencies( vcf, pheno, level, force = FALSE)
vcf |
|
pheno |
Phenotype in |
level |
Phenotype level in |
force |
If If |
Genotypes and frequencies are calculated within the groups of samples
associated with the given level of the given phenotype,
and stored in INFO fields
named according to settings stored in
metadata(vcf)[["TVTBparam"]] (see below).
The TVTBparam object controls
the key suffix of INFO fields as follows:
names(ref(param))Count of reference homozygote genotypes.
names(het(param))Count of heterozygote genotypes.
names(alt(param))Count of alternate homozygote genotypes.
aaf(param)Alternate allele frequency.
maf(param)Minor allele frequency
ExpandedVCF object including additional
INFO fields for genotype counts and allele frequencies.
See Details.
A warning message is issued if genotypes are not fully defined in the
TVTBparam.
Kevin Rue-Albrecht
addFrequencies,ExpandedVCF,list-method,
addOverallFrequencies,ExpandedVCF-method,
VCF,
and TVTBparam.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addPhenoLevelFrequencies(vcf, "super_pop", "AFR")# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- addPhenoLevelFrequencies(vcf, "super_pop", "AFR")
This method attempts to auto-detect genotypes (i.e.
homozygote reference, heterozygote, and homozygote alternate)
in a VCF object, and sets or creates a TVTBparam
object accordingly, in the metadata slot.
## S4 method for signature 'VCF' autodetectGenotypes(vcf)## S4 method for signature 'VCF' autodetectGenotypes(vcf)
vcf |
|
VCF object including a new or updated TVTBparam object
in metadata(vcf)[["TVTBparam"]] .
A warning message is issued if genotypes cannot be fully defined.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam() # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) # warning expected vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- autodetectGenotypes(vcf)# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam() # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) # warning expected vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- autodetectGenotypes(vcf)
Counts the total occurences of a set of genotypes by row
in a matrix of genotype. All given genotypes are counted toward a single
total (e.g. grand total of c("0/0", "0|0")),
while other genotypes are silently ignored.
## S4 method for signature 'ExpandedVCF' countGenos( x, genos, pheno = NULL, level = NULL)## S4 method for signature 'ExpandedVCF' countGenos( x, genos, pheno = NULL, level = NULL)
x |
|
genos |
|
pheno |
If |
level |
If |
An integer vector representing the aggregated count of the given
genotypes in each row.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- countGenos(vcf, het(tparam), "super_pop", "AFR")# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vcf <- countGenos(vcf, het(tparam), "super_pop", "AFR")
Given a character vector of INFO keys, removes either the associated
header, data, or both from a VCF object.
If no INFO key is given (the default), all INFO keys are checked and removed
from the given slot if they do not have a matching entry in the other slot.
## S4 method for signature 'VCF' dropInfo( vcf, key = NULL, slot = "both")## S4 method for signature 'VCF' dropInfo( vcf, key = NULL, slot = "both")
vcf |
|
key |
If |
slot |
Should the INFO keys be removed from the "header", the "data",
or |
An integer vector representing the aggregated count of the given
genotypes in each row.
In the future, x should also support genotype quality (GQ) to consider
only genotypes above a given quality cut-off.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- dropInfo(vcf) dropInfo(vcf, "CSQ")# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- dropInfo(vcf) dropInfo(vcf, "CSQ")
The Genotypes class stores genotype definitions in a convenient format.
Genotypes( ref = NA_character_, het = NA_character_, alt = NA_character_, suffix = c(ref="REF", het="HET", alt="ALT"))Genotypes( ref = NA_character_, het = NA_character_, alt = NA_character_, suffix = c(ref="REF", het="HET", alt="ALT"))
ref |
A |
het |
A |
alt |
A |
suffix |
Set the individual INFO key suffixes used to store the statistics of homozygote reference, heterozygote, and homozygote alternate genotypes, in this order. See Details section. |
Genotypes may be initialised as NA_character_ and updated from an
imported VCF object using the
autodetectGenotypes method.
This may be useful if genotype encodings are not known beforehand.
For each suffix stored in the Genotypes object, TVTB
may store data in the VCF object under the INFO keys defined as follows:
Statistics across all samples in the ExpandedVCF
(e.g. "MAF").
Statistics across samples associated with a given level of a given phenotype (e.g. "gender_male_MAF").
Users are recommended to avoid using those INFO keys for other purposes.
A Genotypes object that contains genotype definitions.
In the following code snippets x is a Genotypes object.
ref(x), ref(x) <- value
Gets or sets the vector that declares homozygote reference genotypes.
het(x), het(x) <- value
Gets or sets the vector that declares heterozygote genotypes.
alt(x), alt(x) <- value
Gets or sets the vector that declares homozygote alternate genotypes.
genos(x)Gets a vector of concatenated
homozygote reference, heterozygote, and homozygote alternate genotypes.
See also ref, het, alt, and carrier
accessors.
carrier(x)Gets a vector of concatenated
heterozygote and homozygote alternate genotypes.
See also het and alt accessors.
suffix(x)Gets a named character vector that declares individual suffixes
used to store the data for each set of genotypes in the INFO field of the
VCF object.
Names of this vector are ref, het, and alt.
Kevin Rue-Albrecht
VCF,
TVTBparam,
and addCountGenos-methods.
# Constructors ---- genotypes <- Genotypes("0|0", c("0|1", "1|0"), "1|1") # Accessors ---- ## Concatenated homozygote reference, heterozygote, and alternate heterozygote ## genotypes stored in the Genotypes object returned by the genos() accessor. genos(genotypes) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(genotypes) het(genotypes) alt(genotypes) ## Their individual INFO key suffixes can be extracted with suffix() accessors ## and the relevant name suffix(genotypes) suffix(genotypes)["ref"] suffix(genotypes)["het"] suffix(genotypes)["alt"] ## Concatenated heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(genotypes) names(carrier(genotypes))# Constructors ---- genotypes <- Genotypes("0|0", c("0|1", "1|0"), "1|1") # Accessors ---- ## Concatenated homozygote reference, heterozygote, and alternate heterozygote ## genotypes stored in the Genotypes object returned by the genos() accessor. genos(genotypes) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(genotypes) het(genotypes) alt(genotypes) ## Their individual INFO key suffixes can be extracted with suffix() accessors ## and the relevant name suffix(genotypes) suffix(genotypes)["ref"] suffix(genotypes)["het"] suffix(genotypes)["alt"] ## Concatenated heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(genotypes) names(carrier(genotypes))
Make a matrix of plots that display a metric calculated in levels of a given
phenotype,
and stored in columns of the info slot of a VCF object.
## S4 method for signature 'VCF' pairsInfo(vcf, metric, phenotype, ..., title = metric)## S4 method for signature 'VCF' pairsInfo(vcf, metric, phenotype, ..., title = metric)
vcf |
|
metric |
Metric to plot on the Y axis.
All columns in the |
phenotype |
Column in the |
... |
Additional arguments, passed to the |
title |
Title for the graph, passed to the |
gg object returned by the ggpairs method.
Kevin Rue-Albrecht
ggpairs,
addPhenoLevelFrequencies,ExpandedVCF-method,
and VCF.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- pairsInfo(vcf, "MAF", "super_pop")# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- pairsInfo(vcf, "MAF", "super_pop")
Parse the CSQ column in a VCF object returned from the Ensembl Variant Effect Predictor (VEP).
**This method was rescued following the deprecation of the package
ensemblVEP in the Bioconductor release 3.20.**
## S4 method for signature 'VCF' parseCSQToGRanges(x, VCFRowID=character(), ..., info.key = "CSQ")## S4 method for signature 'VCF' parseCSQToGRanges(x, VCFRowID=character(), ..., info.key = "CSQ")
x |
A |
VCFRowID |
A When |
info.key |
The name of the INFO key that VEP writes the consequences to in the output
(default is |
... |
Arguments passed to other methods. Currently not used. |
When ensemblVEP returns a VCF object, the consequence data
are returned unparsed in the 'CSQ' INFO column. parseCSQToGRanges
parses these data into a GRanges object that is expanded to match
the dimension of the 'CSQ' data. Because each variant can have multiple
matches, the ranges in the GRanges are repeated.
If rownames from the original VCF are provided as VCFRowID a
metadata column is included in the result that maps back to the row
(variant) in the original VCF. This option is only applicable when the
info.key field has data (is not empty).
If no info.key column is found the function returns the data in
rowRanges().
Returns a GRanges object with consequence data as the
metadata columns. If no 'CSQ' column is found the GRanges
from rowRanges() is returned.
Valerie Obenchain, Kevin Rue-Albrecht
Ensembl VEP Home: http://uswest.ensembl.org/info/docs/tools/vep/index.html
library(VariantAnnotation) file <- system.file("extdata", "moderate.vcf", package = "TVTB") vep <- readVcf(file) ## The returned 'CSQ' data are unparsed. info(vep)$CSQ ## Parse into a GRanges and include the 'VCFRowID' column. vcf <- readVcf(file, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) csq[1:4]library(VariantAnnotation) file <- system.file("extdata", "moderate.vcf", package = "TVTB") vep <- readVcf(file) ## The returned 'CSQ' data are unparsed. info(vep)$CSQ ## Parse into a GRanges and include the 'VCFRowID' column. vcf <- readVcf(file, "hg19") csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf)) csq[1:4]
Plot, on a genomic axis, a metric calculated in levels of a given phenotype,
and stored in columns of the info slot of a VCF object.
## S4 method for signature 'VCF' plotInfo( vcf, metric, range, annotation, phenotype, type = c("p", "heatmap"), zero.rm = FALSE)## S4 method for signature 'VCF' plotInfo( vcf, metric, range, annotation, phenotype, type = c("p", "heatmap"), zero.rm = FALSE)
vcf |
|
metric |
Metric to plot on the Y axis.
All columns in the |
range |
A |
annotation |
An |
phenotype |
Column in the |
type |
Plotting type(s), as listed in |
zero.rm |
If |
list returned by the plotTracks method.
Kevin Rue-Albrecht
plotTracks,
addPhenoLevelFrequencies,ExpandedVCF-method,
and VCF.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- if (requireNamespace("EnsDb.Hsapiens.v75")){ plotInfo( vcf, "MAF", range(GenomicRanges::granges(vcf)), EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75, "super_pop" ) }# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addFrequencies(vcf, "super_pop") # Example usage ---- if (requireNamespace("EnsDb.Hsapiens.v75")){ plotInfo( vcf, "MAF", range(GenomicRanges::granges(vcf)), EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75, "super_pop" ) }
Read Variant Call Format (VCF) files, attaches the given TVTBparam
in the metadata slot of the resulting VCF object,
and attaches optional phenotype information in the phenoData slot.
## S4 method for signature 'character,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE) ## S4 method for signature 'TabixFile,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE)## S4 method for signature 'character,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE) ## S4 method for signature 'TabixFile,TVTBparam' readVcf( file, genome, param, ..., colData = DataFrame(), autodetectGT = FALSE)
file, genome
|
See |
param |
The |
... |
Additional arguments, passed to methods. |
colData |
Phenotype information in a If supplied, only samples identifiers present in |
autodetectGT |
If |
VCF object.
See ?VCF for complete details of the class structure.
A warning message is issued if genotypes cannot be fully defined,
when autodetectGT=TRUE.
Kevin Rue-Albrecht
readVcf,TabixFile,ScanVcfParam-method,
and VCF.
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf.gz", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Example usage ---- vcf <- readVcf(vcfFile, "b37", tparam, colData = phenotypes)# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf.gz", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame(read.table(phenoFile, TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Example usage ---- vcf <- readVcf(vcfFile, "b37", tparam, colData = phenotypes)
Currently unsupported — Package undergoing major updates.
This function starts the interactive tSVE shiny web-application that
allows to interactively load and visualise genetic variants and their
Ensembl Variant Effect Predictor (VEP) predictions using the package methods.
All arguments after the ... set default values for the application
(e.g. widgets).
tSVE( ..., refGT = "0|0", hetGT = c("0|1", "1|2", "0|2", "1|0", "2|1", "2|0"), altGT = c("1|1", "2|2"), vepKey = "CSQ", refSuffix = "REF", hetSuffix = "HET", altSuffix = "ALT", aafSuffix = "AAF", mafSuffix = "MAF", genoHeatmap.height = "500px", options.width = 120, autodetectGTimport = FALSE )tSVE( ..., refGT = "0|0", hetGT = c("0|1", "1|2", "0|2", "1|0", "2|1", "2|0"), altGT = c("1|1", "2|2"), vepKey = "CSQ", refSuffix = "REF", hetSuffix = "HET", altSuffix = "ALT", aafSuffix = "AAF", mafSuffix = "MAF", genoHeatmap.height = "500px", options.width = 120, autodetectGTimport = FALSE )
... |
Additional arguments passed to the |
refGT |
Default homozygote reference genotypes. |
hetGT |
Default heterozygote genotypes. |
altGT |
Default homozygote alternate genotypes. |
vepKey |
Default INFO key for the VEP prediction field. |
refSuffix |
Default INFO key suffix used to store the data for homozygote reference genotypes. |
hetSuffix |
Default INFO key suffix used to store the data for heterozygote genotypes. |
altSuffix |
Default INFO key suffix used to store the data for homozygote alternate genotypes. |
aafSuffix |
Default INFO key suffix used to store the data for alternate allele frequency. |
mafSuffix |
Default INFO key suffix used to store the data for minor allele frequency. |
genoHeatmap.height |
Default height (in pixels) of the heatmap that represents the genotype of each variant in each sample. |
options.width |
Sets |
autodetectGTimport |
Default checkbox value. If |
Not applicable (yet).
Kevin Rue-Albrecht
Interface to EnsDb adapted from the ensembldb package.
if (interactive()){ runEnsDbApp() }if (interactive()){ runEnsDbApp() }
The TVTBparam class stores recurrent parameters of the TVTB
package in a convenient format.
TVTBparam( genos, ranges = GRangesList(), aaf = "AAF", maf = "MAF", vep = "CSQ", bp = SerialParam(), svp = ScanVcfParam(which = reduce(unlist(ranges))))TVTBparam( genos, ranges = GRangesList(), aaf = "AAF", maf = "MAF", vep = "CSQ", bp = SerialParam(), svp = ScanVcfParam(which = reduce(unlist(ranges))))
genos |
A |
ranges |
A |
aaf |
INFO key suffix used to store the alternate allele frequency (AAF). |
maf |
INFO key suffix used to store the minor allele frequency (MAF). |
vep |
INFO key suffix used to extract the VEP predictions. See |
bp |
A |
svp |
A |
For each suffix stored in the TVTBparam object, TVTB
may store data in the VCF object under the INFO keys defined as follows:
Statistics across all samples in the ExpandedVCF
(e.g. "MAF").
Statistics across samples associated with a given level of a given phenotype (e.g. "gender_male_MAF").
Users are recommended to avoid using those INFO keys for other purposes.
A TVTBparam object that contains recurrent parameters.
In the following code snippets x is a TVTBparam object.
genos(x), genos(x) <- value
Gets or sets the Genotypes object stored in the genos slot.
ranges(x), ranges(x) <- value
List of genomic ranges to group variants during analyses and plots.
ref(x), ref(x) <- value
Gets or sets the character vector
that declares homozygote reference genotypes.
het(x), het(x) <- value
Gets or sets the character vector
that declares heterozygote genotypes.
alt(x), alt(x) <- value
Gets or sets the character vector
that declares homozygote alternate genotypes.
carrier(x)Gets a character vectors of concatenated
heterozygote and homozygote alternate genotypes.
See also het and alt accessors.
aaf(x), aaf(x) <- value
Gets or sets the INFO key suffix used to store the alternate allele frequency (AAF).
maf(x), maf(x) <- value
Gets or sets the INFO key suffix used to store the minor allele frequency (MAF).
vep(x), maf(x) <- value
Gets or sets the INFO key suffix used to extract the VEP predictions.
bp(x), bp(x) <- value
Gets or sets the BiocParallel parameters.
suffix(x)Gets a named character vector that declares individual suffixes
used to store the data for each set of genotypes in the INFO field of the
VCF object.
Names of this vector are ref, het, alt, aaf,
and maf.
svp(x), svp(x) <- value
Gets or sets the ScanVcfParam parameters.
Kevin Rue-Albrecht
Genotypes,
VCF,
ExpandedVCF,
addCountGenos-methods
vepInPhenoLevel-methods,
variantsInSamples-methods,
and BiocParallelParam.
# Constructors ---- grl <- GenomicRanges::GRangesList(GenomicRanges::GRanges( "15", IRanges::IRanges(48413170, 48434757, names = "SLC24A5") )) tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1"), ranges = grl) # Accessors ---- ## The Genotypes object stored in the genos slot of the TVTBparam object ## return by the genos() accessor. genos(tparam) ## Genomic ranges stored in the TVTBparam object returned by the ranges() ## accessor. ranges(tparam) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(tparam) het(tparam) alt(tparam) ## Their individual INFO key suffixes can be extracted with suffix() applied to ## the above accessors. suffix(tparam) suffix(tparam)["ref"] suffix(tparam)["het"] suffix(tparam)["alt"] suffix(tparam)["aaf"] suffix(tparam)["maf"] ## Heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(tparam) ## INFO key suffix of alternate/minor allele frequency returned by the aaf() ## and maf() accessors. aaf(tparam) maf(tparam) ## INFO key suffix of the VEP predictions returned by the vep() accessor. vep(tparam) ## BiocParallel parameters bp(tparam) ## ScanVcfParam parameters svp(tparam)# Constructors ---- grl <- GenomicRanges::GRangesList(GenomicRanges::GRanges( "15", IRanges::IRanges(48413170, 48434757, names = "SLC24A5") )) tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1"), ranges = grl) # Accessors ---- ## The Genotypes object stored in the genos slot of the TVTBparam object ## return by the genos() accessor. genos(tparam) ## Genomic ranges stored in the TVTBparam object returned by the ranges() ## accessor. ranges(tparam) ## Individual genotypes can be extracted with ref(), het(), alt() accessors. ref(tparam) het(tparam) alt(tparam) ## Their individual INFO key suffixes can be extracted with suffix() applied to ## the above accessors. suffix(tparam) suffix(tparam)["ref"] suffix(tparam)["het"] suffix(tparam)["alt"] suffix(tparam)["aaf"] suffix(tparam)["maf"] ## Heterozygote, and alternate heterozygote genotypes are ## returned by the carrier() accessor. carrier(tparam) ## INFO key suffix of alternate/minor allele frequency returned by the aaf() ## and maf() accessors. aaf(tparam) maf(tparam) ## INFO key suffix of the VEP predictions returned by the vep() accessor. vep(tparam) ## BiocParallel parameters bp(tparam) ## ScanVcfParam parameters svp(tparam)
Identifies variants observed (uniquely) in at least one sample of a given group.
## S4 method for signature 'ExpandedVCF' variantsInSamples( vcf, samples = 1:ncol(vcf), unique = FALSE)## S4 method for signature 'ExpandedVCF' variantsInSamples( vcf, samples = 1:ncol(vcf), unique = FALSE)
vcf |
|
samples |
|
unique |
If |
An named integer vector of indices indicating the name and index of
variants that are (uniquely) observed in at least one non-reference genotype
in the given group of samples.
A warning message is issued if genotypes are not fully defined in the
TVTBparam.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- variantsInSamples( vcf, which(SummarizedExperiment::colData(vcf)[,"super_pop"] == "EUR"))# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- variantsInSamples( vcf, which(SummarizedExperiment::colData(vcf)[,"super_pop"] == "EUR"))
The VcfFixedRules and VcfInfoRules classes
store filters applicable to the fixed and info slots
of VCF objects, respectively.
The VcfVepRules stores filters applicable to Ensembl VEP predictions
stores in a given INFO key.
All arguments are first passed to S4Vectors::FilterRules
before re-typing the resulting as a VcfFixedRules, VcfInfoRules,
or VcfVepRules class.
In the following code snippets x is an object from any of the classes
decribed in this help page, except when specified otherwise.
active(x), active(x)<-
Gets or sets the active state of each filter rule in x.
Inherited from FilterRules
vep(x), vep(x)<-
Gets or sets the INFO key where the Ensembl VEP predictions
to use for filtering are stored.
Returns NA_character_ for filters not applicable to VEP predictions.
type(x)Returns "filter" (linkS4class{FilterRules}),
"fixed" (linkS4class{VcfFixedRules}),
"info" (linkS4class{VcfInfoRules}),
or "vep" (linkS4class{VcfVepRules})
as a character vector of length(x).
VcfFixedRules(exprs = list(), ..., active = TRUE)
VcfInfoRules(exprs = list(), ..., active = TRUE)
VcfVepRules(exprs = list(), ..., active = TRUE, vep = "CSQ")
All methods construct an object of the corresponding class
with the rules given in the list exprs or in ....
The initial active state of the rules is given by active,
which is recycled as necessary.
See the constructor of FilterRules for more details.
In the following code snippets x and value
are objects from any of the classes described in this help page.
x[i]: Subsets the filter rules using the same interface as for
List.
x[[i]]: Extracts an expression or function via the same interface
as for List.
x[i] <- value: Replaces a filter rule by one of the
same class.
The active state(s) and name(s) are transferred from value to
x.
x[[i]] <- value:
The same interface as for List.
The default active state for new rules is TRUE.
In the following code snippets x, values, and ...
are objects from any of the classes described in this help page, or
VcfFilterRules.
append(x, values, after = length(x)):
Appends the values onto x at the index given by after.
c(x, ...,):
Concatenates the filters objects in ... onto the end of x.
Note that combining rules of different types
(e.g. VcfFixedRules and VcfVepRules)
produces a VcfFilterRules object.
As described in the S4Vectors documentation:
eval(expr, envir, enclos):
Evaluates a rule instance (passed as the expr argument)
in their respective context of a VCF object
(passed as the envir argument).
i.e.:
VcfFixedRules: fixed(envir)
VcfInfoRules: info(envir)
VcfVepRules:
mcols(parseCSQToGRanges(envir, ...))
FilterRules: envir
evalSeparately(expr, envir, enclos):
subsetByFilter(x, filter)
summary(object)
See eval,FilterRules,ANY-method for details.
Kevin Rue-Albrecht
FilterRules,
VcfFilterRules,
and VCF.
# Constructors ---- fixedRules <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedRules infoRules <- VcfInfoRules(list( common = expression(MAF > 0.01), # minor allele frequency alt = expression(ALT > 0) # count of alternative homozygotes )) infoRules vepRules <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepRules filterRules <- S4Vectors::FilterRules(list( PASS = function(x) fixed(x)$FILTER == "PASS", COMMON = function(x) info(x)$MAF > 0.05 )) filterRules # Accessors ---- ## get/set the active state directly S4Vectors::active(infoRules) S4Vectors::active(infoRules)["common"] <- FALSE ## See S4Vectors::FilterRules for more examples # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf) # Applying filters to VCF objects ---- ## Evaluate filters S4Vectors::eval(fixedRules, vcf) S4Vectors::eval(infoRules, vcf) S4Vectors::eval(vepRules, vcf) S4Vectors::eval(filterRules, vcf) summary(S4Vectors::eval(vepRules, vcf)) ## Evaluate filters separately S4Vectors::evalSeparately(vepRules, vcf) summary(S4Vectors::evalSeparately(vepRules, vcf)) ## Subset VCF by filters S4Vectors::subsetByFilter(vcf, vepRules) # Subsetting and Replacement ---- vep1 <- vepRules[1] # VcfVepRules vepRules[[1]] # expression # Replace the expression (active reset to TRUE, original name retained) vepRules[[2]] <- expression(CADD_PHRED > 30) # Replace the rule (active state and name transferred from v5obj) vepRules[2] <- VcfVepRules( list(newRule = expression(CADD_PHRED > 30)), active = FALSE)# Constructors ---- fixedRules <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedRules infoRules <- VcfInfoRules(list( common = expression(MAF > 0.01), # minor allele frequency alt = expression(ALT > 0) # count of alternative homozygotes )) infoRules vepRules <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepRules filterRules <- S4Vectors::FilterRules(list( PASS = function(x) fixed(x)$FILTER == "PASS", COMMON = function(x) info(x)$MAF > 0.05 )) filterRules # Accessors ---- ## get/set the active state directly S4Vectors::active(infoRules) S4Vectors::active(infoRules)["common"] <- FALSE ## See S4Vectors::FilterRules for more examples # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf) # Applying filters to VCF objects ---- ## Evaluate filters S4Vectors::eval(fixedRules, vcf) S4Vectors::eval(infoRules, vcf) S4Vectors::eval(vepRules, vcf) S4Vectors::eval(filterRules, vcf) summary(S4Vectors::eval(vepRules, vcf)) ## Evaluate filters separately S4Vectors::evalSeparately(vepRules, vcf) summary(S4Vectors::evalSeparately(vepRules, vcf)) ## Subset VCF by filters S4Vectors::subsetByFilter(vcf, vepRules) # Subsetting and Replacement ---- vep1 <- vepRules[1] # VcfVepRules vepRules[[1]] # expression # Replace the expression (active reset to TRUE, original name retained) vepRules[[2]] <- expression(CADD_PHRED > 30) # Replace the rule (active state and name transferred from v5obj) vepRules[2] <- VcfVepRules( list(newRule = expression(CADD_PHRED > 30)), active = FALSE)
The VcfFilterRules class can stores multiple types of filters
applicable to various slots of VCF objects.
All arguments must be VcfFixedRules, VcfInfoRules,
VcfVepRules, VcfFilterRules of FilterRules objects.
In the following code snippets x is a VcfFilterRules object.
active(x), active(x)<-
Get or set the active state of each filter rule in x.
Inherited from FilterRules
vep(x), vep(x)<-
Gets or sets the INFO key where the Ensembl VEP predictions to use for filtering are stored.
type(x)Gets the type of each filter stored in a VcfFilterRules object.
Read-only.
VcfFilterRules(...)
constructs an VcfFilterRules object from
VcfFixedRules, VcfInfoRules, VcfVepRules,
and VcfFilterRules objects in ....
In the code snippets below, x is a VcfFilterRules object.
x[i, drop = TRUE]: Subsets the filter rules using the
same interface as for Vector.
If all filter rules are of the same type and drop=TRUE (default),
the resulting object is re-typed to the most specialised class,
if possible. In other words, if all remaining filter rules are of
type "vep", the object will be type as VcfVepRules.
x[[i]]: Extracts an expression or function via the same interface
as for List.
x[i] <- value: Replaces a filter rule by one of any valid class
(VcfFixedRules, VcfInfoRules, VcfVepRules, or
VcfFilterRules).
The active state(s), name(s), and type(s) (if applicable)
are transferred from value.
x[[i]] <- value: The same interface as for
List. The default active state for new
rules is TRUE.
In the following code snippets x is an object of class
VcfFilterRules,
while values and ... are objects from any of the
classes VcfFixedRules, VcfInfoRules, VcfVepRules,
or VcfFilterRules:
append(x, values, after = length(x)):
Appends the values onto x at the index given by after.
c(x, ...,):
Concatenates the filters objects in ... onto the end of x.
As described in the S4Vectors documentation:
eval(expr, envir, enclos)
Evaluates each active rule in a VcfFilterRules instance
(passed as the expr argument)
in their respective context of a VCF object
(passed as the envir argument).
evalSeparately(expr, envir, enclos):
subsetByFilter(x, filter)
summary(object)
See eval,FilterRules,ANY-method for details.
Kevin Rue-Albrecht
FilterRules,
VcfFixedRules,
VcfInfoRules,
VcfVepRules,
and VCF.
# Constructors ---- fixedR <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedR infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT + HET > 0) # count of non-REF homozygotes )) # ...is synonym to... infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT > 0 | HET > 0) )) infoR vepR <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepR vcfRules <- VcfFilterRules(fixedR, infoR, vepR) vcfRules # Accessors ---- ## Type of each filter stored in the VcfFilterRules object type(vcfRules) # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf, tparam) # Applying filters to VCF objects ---- ## Evaluate filters eval(vcfRules, vcf) ## Evaluate filters separately as.data.frame(evalSeparately(vcfRules, vcf)) # Interestingly, the only common missense variant has a lower CADD score ## Deactivate the CADD score filter active(vcfRules)["CADD"] <- FALSE ## Subset VCF by filters (except CADD, deactivated above) subsetByFilter(vcf, vcfRules) # Subsetting and Replacement ---- v123 <- vcfRules[1:3] # Extract the expression v5expr <- vcfRules[[5]] # Subset the object v5obj <- vcfRules[5] # Replace the expression (active reset to TRUE, original name retained) v123[[2]] <- v5expr # Replace the rule (active state and name transferred from v5obj) v123[2] <- v5obj# Constructors ---- fixedR <- VcfFixedRules(list( pass = expression(FILTER == "PASS"), qual = expression(QUAL > 20) )) fixedR infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT + HET > 0) # count of non-REF homozygotes )) # ...is synonym to... infoR <- VcfInfoRules(list( common = expression(MAF > 0.1), # minor allele frequency present = expression(ALT > 0 | HET > 0) )) infoR vepR <- VcfVepRules(list( missense = expression(Consequence %in% c("missense_variant")), CADD = expression(CADD_PHRED > 15) )) vepR vcfRules <- VcfFilterRules(fixedR, infoR, vepR) vcfRules # Accessors ---- ## Type of each filter stored in the VcfFilterRules object type(vcfRules) # Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf(vcfFile, param = tparam) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) vcf <- addOverallFrequencies(vcf, tparam) # Applying filters to VCF objects ---- ## Evaluate filters eval(vcfRules, vcf) ## Evaluate filters separately as.data.frame(evalSeparately(vcfRules, vcf)) # Interestingly, the only common missense variant has a lower CADD score ## Deactivate the CADD score filter active(vcfRules)["CADD"] <- FALSE ## Subset VCF by filters (except CADD, deactivated above) subsetByFilter(vcf, vcfRules) # Subsetting and Replacement ---- v123 <- vcfRules[1:3] # Extract the expression v5expr <- vcfRules[[5]] # Subset the object v5obj <- vcfRules[5] # Replace the expression (active reset to TRUE, original name retained) v123[[2]] <- v5expr # Replace the rule (active state and name transferred from v5obj) v123[2] <- v5obj
Returns VEP predictions for variants observed (uniquey) in samples associated with a given phenotype level.
## S4 method for signature 'ExpandedVCF' vepInPhenoLevel( vcf, phenoCol, level, vepCol, unique = FALSE)## S4 method for signature 'ExpandedVCF' vepInPhenoLevel( vcf, phenoCol, level, vepCol, unique = FALSE)
vcf |
|
phenoCol |
Name of a column in |
level |
Phenotype level; only variants observed in at least one sample will be considered. |
vepCol |
VEP prediction fields; |
unique |
If |
A GRanges including all VEP predictions associated with a variant
seen in at least one sample (heterozygote or alternate homozygote)
associated with the phenotype level.
The GRanges contains at least one column for the VEP prediction
value.
Additional columns containing another VEP prediction field may be added
using the facet argument.
If available, "Feature" is a recommended value for this argument,
as VEP typically produce one prediction per variant per feature.
A warning message is issued if genotypes are not fully defined in the
TVTBparam.
Kevin Rue-Albrecht
# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vepInPhenoLevel(vcf, "super_pop", "AFR", c("CADD_PHRED", "Feature", "IMPACT"))# Example data ---- # VCF file vcfFile <- system.file("extdata", "moderate.vcf", package = "TVTB") # Phenotype file phenoFile <- system.file("extdata", "moderate_pheno.txt", package = "TVTB") phenotypes <- S4Vectors::DataFrame( read.table(file = phenoFile, header = TRUE, row.names = 1)) # TVTB parameters tparam <- TVTBparam(Genotypes("0|0", c("0|1", "1|0"), "1|1")) # Pre-process variants vcf <- VariantAnnotation::readVcf( vcfFile, param = tparam, colData = phenotypes) vcf <- VariantAnnotation::expand(vcf, row.names = TRUE) # Example usage ---- vepInPhenoLevel(vcf, "super_pop", "AFR", c("CADD_PHRED", "Feature", "IMPACT"))