NEWS
VariantAnnotation 1.36.0
NEW FEATURES
- ref<-, alt<-, qual<- and filt<- allow replacement value length
recycling
VariantAnnotation 1.28.0
NEW FEATURES
- Update package to support VCF format version 4.3
- SAMPLE field lines can now have key 'SAMPLE' or 'META'.
To avoid a name clash, the existing 'META' DataFrame has
been split by row into separate DataFrames. The
'meta(VCFHeader)' getter now returns one DataFrame
per unique key in the header.
- PEDIGREE header line now begins with 'ID'
- Add vcfFields methods for character, VCFHeader, VcfFile and VCF
that return all available VCF fields as a CharacterList.
- Add support for single breakend notation (thanks d-cameron)
BUG FIXES
- .formatInfo() now returns a column of all 'NA' for a missing value
instead of dropping the column.
VariantAnnotation 1.26.0
MODIFICATIONS
- Clarify fixed fields must be
- Follow renaming of RangesList class to IntegerRangesList
- Updates to accommodate change to '[<-' method for SummarizedExperiment
- DP4 assumed to come from INFO field
- Extract altDepth and totalDepth from DP4 GENO field when present
- predictCoding() now respects 'alt_init_codons' at the start of CDS
region only
- Remove error message in 'rowRanges<-' and 'mcols<-' methods that
checked for fixed column names.
VariantAnnotation 1.24.0
NEW FEATURES
- Add subset,VCF-method that knows about info()
- Add alt,ref accessors for VRangesList
- More efficient show,VCF-method
- 'rowRanges<-' and 'mcols<-' on VCF class behave as they do
on RangedSummarizedExperiment
- info,VCFHeader() and geno,VCFHeader() return a DataFrame with
the correct columns in the case of empty
BUG FIXES
- Fix "ref<-" recycling on VRanges
- Fix bug in locateVariants(); code to fetch IntronVariants() was
incorrectly fetching IntergenicVariants()
- Fix bug in rbind,VCF,VCF-method
VariantAnnotation 1.22.0
NEW FEATURES
- add import() wrapper for VCF files
- add support for Number='R' in vcf parsing
- add indexVcf() and methods for character,VcfFile,VcfFileList
MODIFICATIONS
- throw message() instead of warning() when non-nucleotide
variations are set to NA
- replace 'force=TRUE' with 'pruning.mode="coarse"' in seqlevels() setter
- add 'pruning.mode' argument to keepSeqlevels() in man page example
- idempotent VcfFile()
- add 'idType' arg to IntergenicVariants() constructor
- modify locateVariants man page example to work around
issue that distance,GRanges,TxDb does not support gene ranges on
multiple chromosomes
- modify VcfFile() constructor to detect index file if not specified
- order vignettes from intro to advanced; use BiocStyle::latex2()
- remove unused SNPlocs.Hsapiens.dbSNP.20110815 from the Suggests field
- follow rename change in S4Vectors from vector_OR_factor to
vector_or_factor
- pass classDef to .RsamtoolsFileList; VariantAnnotation may not be
on the search path
BUG FIXES
- fix expansion of 'A' fields when there are multiple columns
VariantAnnotation 1.20.0
NEW FEATURES
- add import() wrapper for VCF files
MODIFICATIONS
- use now-public R_GetConnection
- remove defunct readVcfLongForm() generic
- remove 'genome' argument from readVcf()
- improvements to VCF to VRanges coercion
- support Varscan2 AD/RD convention when coercing VCF to VRanges
- use [["FT"]] to avoid picking up FTZ field
- summarizeVariants() recognizes '.' as missing GT field
- document scanVcfHeader() behavior for duplicate row names
BUG FIXES
- ensure only 1 matching hub resource selected in filterVcf vignette
- fix check for FILT == "PASS"
- correct column alignment in makeVRangesFromGRanges()
- fix check for AD conformance
VariantAnnotation 1.19.0
NEW FEATURES
- add SnpMatrixToVCF()
- add patch from Stephanie Gogarten to support 'PL' in genotypeToSnpMatrix()
MODIFICATIONS
- move getSeq,XStringSet-method from VariantAnnotation to BSgenome
- update filterVcf vignette
- remove 'pivot' export
- work on readVcf():
- 5X speedup for readVcf (at least in one case) by not using "==" to
compare a list to a character (the list gets coerced to character,
which is expensive for huge VCFs)
- avoiding relist.list()
- update summarizeVariants() to comply with new SummarizedExperiment
rownames requirement
- defunct VRangesScanVcfParam() and restrictToSNV()
- use elementNROWS() instead of elementLengths()
- togroup(x) now only works on a ManyToOneGrouping object so replace
togroup(x, ...) calls with togroup(PartitioningByWidth(x), ...) when 'x'
is a list-like object that is not a ManyToOneGrouping object.
- drop validity assertion that altDepth must be NA when alt is NA;
there are VCFs in the wild that use, e.g., "*" for alt but include depth
- export PLtoGP()
- VariantAnnotation 100% RangedData-free
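
The readVcf() speedup noted above comes from avoiding "==" between a list and a character value. The base R sketch below shows both forms giving the same answer on a toy list. The variable names are illustrative, flattening with unlist() is only one possible alternative, and the actual code path inside readVcf() is not reproduced here.

```r
# Toy stand-in for a list of single strings, as might come from a parser.
x <- list("A", ".", "G", ".")

# Comparing the list directly: R coerces every element to character first.
slow <- x == "."

# Flattening once to a character vector, then comparing atomically.
fast <- unlist(x, use.names = FALSE) == "."
```

Both comparisons flag the same elements; the difference is only in how much coercion work R does, which matters on very large inputs.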
BUG FIXES
- use short path names in src/Makevars.win
VariantAnnotation 1.18.0
MODIFICATIONS
- defunct VRangesScanVcfParam()
- defunct restrictToSNV()
BUG FIXES
- scanVcf,character,missing-method ignores blank data lines.
- Build path for C code made robust on Windows.
VariantAnnotation 1.16.0
NEW FEATURES
- support REF and ALT values ".", "+" and "-" in predictCoding()
- return non-translated characters in VARCODON in predictCoding() output
- add 'verbose' option to readVcf() and friends
- writeVcf() writes 'fileformat' header line always
- readVcf() converts REF and ALT values "*" and "I" to '' and '.'
MODIFICATIONS
- VRanges uses '*' strand by default
- coerce 'alt' to DNAStringSet for predictCoding,VRanges-method
- add detail to documentation for 'ignore.strand' in predictCoding()
- be robust to single requested INFO column not present in vcf file
- replace old SummarizedExperiment class from GenomicRanges with the
new RangedSummarizedExperiment from SummarizedExperiment
package
- return strand of 'subject' for intronic variants in locateVariants()
BUG FIXES
- writeVcf() does not duplicate header lines when chunking
- remove extra tab after INFO when no FORMAT data are present
- filterVcf() supports 'param' with ranges
VariantAnnotation 1.14.0
NEW FEATURES
- gVCF support:
- missing END header written out with writeVcf()
- expand() handles NON_REF 'REF' value
- support 'Type=Character' in INFO header fields
- add 'row.names' argument to expand()
- add 'Efficient Usage' section to readVcf() man page
- efficiency improvements to info(..., row.names=)
- anyDuplicated() less expensive than any(duplicated())
- use row.names=FALSE when not needed, e.g., show()
- add genotypeCodesToNucleotides()
- add support for gvcf in isSNP family of functions
- add VcfFile and VcfFileList classes
- support 'Type=Character' of unspecified length (.)
- add isDelins() from Robert Castelo
- add makeVRangesFromGRanges() from Thomas Sandmann
MODIFICATIONS
- VCFHeader support:
- SAMPLE and PEDIGREE header fields are now parsed
- meta(VCFHeader) returns DataFrameList instead of DataFrame
- show(VCFHeader) displays the outer list names in meta
- fixed(VCFHeader) returns 'ALT' and 'REF' if present
- 'ALT' in expandedVCF output is DNAStringSet, not *List
- remove .listCumsum() and .listCumsumShifted() helpers
- add multiple INFO field unit test from Julian Gehring
- add additional expand() unit tests
- modify readVcfAsVRanges() to use ScanVcfParam() as the
'param'; deprecate VRangesScanVcfParam
- replace mapCoords() with mapToTranscripts()
- change 'CDSID' output from integer to IntegerList in
locateVariants() and predictCoding()
- add readVcf,character,ANY,ANY; remove readVcf,character,ANY,ScanVcfParam
- replace rowData() accessor with rowRanges()
- replace 'rowData' argument with 'rowRanges' (construct SE, VCF classes)
- replace getTranscriptSeqs() with extractTranscripts()
BUG FIXES
- readVcf() properly handles Seqinfo class as 'genome'
- allow 'ignore.strand' to pass through mapCoords()
- writeVcf() no longer ignores rows with no genotype field
- expand() properly handles the cases where:
- fewer than all INFO fields are selected
- VCF has only one row
- only one INFO column is present
- don't call path() on non-*File objects
- split (relist) of VRanges now yields a CompressedVRangesList
- predictCoding() now ignores zero-width ranges
VariantAnnotation 1.12.0
NEW FEATURES
- allow GRanges in 'rowData' to hold user-defined metadata cols
(i.e., cols other than paramRangeID, REF, ALT, etc.)
- add isSNV() family of functions
- add faster method for converting a list matrix to an array
- add 'c' method for typed Rle classes so class is preserved
- add CITATION file
- rework writeVcf():
- FORMAT and genotype fields are parsed in C
- output file is written from C
- chunking added for large VCFs
MODIFICATIONS
- add 'row.names' to readVcf()
- deprecate restrictToSNV(); replaced by isSNV() family
- remove use of seqapply()
- show info / geno headers without splitting across blocks
- use mapCoords() in predictCoding() and locateVariants()
- deprecate refLocsToLocalLocs()
- propagate strand in predictCoding()
- replace deprecated seqsplit() with splitAsList()
- ensure GT field, if present, comes first in VCF output
- modify DESCRIPTION Author and Maintainer fields with @R
- add 'row.names' to info,VCF-method
BUG FIXES
- modify expand() to work when no 'info' fields are imported
- remove duplicate rows from .splicesites()
- fix handling of real-valued NAs in geno matrix construction
in writeVcf()
VariantAnnotation 1.10.0
NEW FEATURES
- add support for ##contig in VCF header
- add 'meta<-', 'info<-', 'geno<-' replacement methods for
VCFHeader
- add 'header<-' replacement method for VCF
- add strand to output from locateVariants()
- add support for writeVcf() to process Rle data in geno matrix
- readVcf() now parses 'geno' fields with Number=G as
((#alleles + 1) * (#alleles + 2)) / 2
- writeVcf() now sorts the VCF when 'index=TRUE'
- add 'fixed<-,VCFHeader,DataFrameList' method
- add convenience functions for reading VCF into VRanges
- add Rplinkseq test script
- add 'isSNV', 'isInsertion', 'isDeletion', 'isIndel',
'isTransition', 'isPrecise', 'isSV' and 'isSubstitution'
generics
- add 'isSNV', 'isInsertion', 'isDeletion', 'isIndel'
methods for VRanges and VCF classes
- add match methods between ExpandedVCF and VRanges
- add support for VRanges %in% TabixFile
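
As a worked illustration of the Number=G rule listed above, the sketch below computes the expected number of values per sample from the number of ALT alleles. The helper name `n_genotype_values` is hypothetical and not part of the package; it only restates the formula in base R.

```r
# Hypothetical helper (not a VariantAnnotation function): number of values
# expected in a Number=G field, given the number of ALT alleles.
n_genotype_values <- function(n_alt) {
  stopifnot(is.numeric(n_alt), all(n_alt >= 0))
  ((n_alt + 1) * (n_alt + 2)) / 2
}

n_alt <- 0:3
counts <- n_genotype_values(n_alt)

# Same quantity expressed as unordered pairs (repeats allowed) drawn from
# the REF allele plus the ALT alleles.
pairs <- choose(n_alt + 2, 2)
```

For a site with one ALT allele this gives 3 values, matching the genotypes 0/0, 0/1 and 1/1.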
MODIFICATIONS
- expand,VCF-method ignores 'AD' header if 'AD' geno is NULL
- add support for SIFT.Hsapiens.dbSNP137
- remove locateVariants() dependence on chr7-sub.vcf.gz
- modify expand() to handle 'AD' field where 'Number' is integer
- rename readVRangesFromVCF() to readVcfAsVRanges()
- remove check for circular chromosomes in locateVariants()
and predictCoding() and refLocsToLocalLocs()
- modify filterVcf() to handle ranges in ScanVcfParam
- pass 'genetic.code' through predictCoding()
- change default to 'row.names=TRUE' for readGT(), readGeno(),
and readInfo()
- fixed() on empty VCF now returns DataFrame with column names
and data types vs an empty DataFrame
- update biocViews
- modify 'show,VCF' to represent empty values in XStringSet
with '.'
- replace rtracklayer:::pasteCollapse with unstrsplit()
DEPRECATED and DEFUNCT
- remove defunct dbSNPFilter(), regionFilter() and MatrixToSnpMatrix()
- defunct readVcfLongForm()
BUG FIXES
- modify expand.geno() to handle case where header and geno don't match
- modify writeVcf() to write out rownames with ":" character
instead of treating as missing
- fix how sample names were passed from 'ScanVcfParam' to scanVcf()
- fix bug in 'show,VCF' method
- fix bugs in VRanges -> VCF coercion methods
- fix bug in lightweight read* functions that were ignoring
samples in ScanVcfParam
- fix bug in writeVcf() when no 'ALT' is present
VariantAnnotation 1.8.0
NEW FEATURES
- Add 'upstream' and 'downstream' arguments to IntergenicVariants()
constructor.
- Add 'samples' argument to ScanVcfParam().
- Add readGT(), readGeno() and readInfo().
- Add VRanges, VRangesList, SimpleVRangesList, and CompressedVRangesList
classes.
- Add coercion VRanges -> VCF and VCF -> VRanges.
- Add methods for VRanges family:
altDepth(), refDepth(), totalDepth(), altFraction()
called(), hardFilters(), sampleNames(), softFilterMatrix()
isIndel(), resetFilter().
- Add stackedSamples,VRangesList method.
MODIFICATIONS
- VCF validity method now requires the number of rows in info()
to match the length of rowData().
- PRECEDEID and FOLLOWID from locateVariants() are now CharacterLists
with all genes in 'upstream' and 'downstream' range.
- Modify rownames on rowData() GRanges to CHROM:POS_REF/ALT for
variants with no ID.
- readVcf() returns info() and geno() in the order specified in
the ScanVcfParam.
- Work on scanVcf():
- free parse memory at first opportunity
- define it_next in .c rather than .h
- parse ALT "." in C
- hash incoming strings
- parse only param-requested 'fixed', 'info', 'geno' fields
- Add dimnames<-,VCF method to prevent 'fixed' fields from being
copied into 'rowData' when new rownames or colnames were assigned.
- Support read/write for an empty VCF.
- readVcf(file=character, ...) method attempts coercion to
TabixFile.
- Add performance section to vignette; convert to BiocStyle.
- expand,CompressedVcf method expands geno() field 'AD' to
length ALT + 1. The expanded field is a (n x y x 2) array.
- 'genome' argument to readVcf() can be a character(1) or
Seqinfo object.
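
The row-name convention for variants with no ID, mentioned in the list above, can be sketched in base R as follows. The helper `make_variant_names` and its arguments are hypothetical and used only for illustration; the package builds these names internally and may differ in details such as how multiple ALT alleles are joined.

```r
# Hypothetical helper: build CHROM:POS_REF/ALT names for variants whose
# ID is missing (NA or "."), keeping the supplied ID otherwise.
make_variant_names <- function(chrom, pos, ref, alt, id) {
  fallback <- paste0(chrom, ":", pos, "_", ref, "/", alt)
  missing_id <- is.na(id) | id == "."
  ifelse(missing_id, fallback, id)
}

nms <- make_variant_names(
  chrom = c("1", "1"),
  pos   = c(100L, 250L),
  ref   = c("A", "G"),
  alt   = c("T", "C"),
  id    = c("rs123", ".")
)
```

Here the first variant keeps its ID and the second, which has none, is named from its position and alleles.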
DEPRECATED and DEFUNCT
- Defunct dbSNPFilter(), regionFilter() and MatrixToSnpMatrix().
- Deprecate readVcfLongForm().
BUG FIXES
- Fix bug in compatibility of read/writeVcf() when no INFO columns are
present.
- Fix bug in locateVariants() when 'features' has no txid and cdsid.
- Fix bug in asVCF() when writing header lines.
- Fix bug in "expand" methods for VCF to handle multiple 'A'
columns in info().
VariantAnnotation 1.6.0
NEW FEATURES
- VCF is now VIRTUAL. Concrete subclasses are CollapsedVCF
and ExpandedVCF.
- Add filterVcf() generic and methods for character and TabixFile.
This method creates one VCF file from another, using FilterRules.
- Enhance show,VCF method with header information.
- Stephanie Gogarten added genotypeToSnpMatrix() generic and
CollapsedVCF and matrix methods.
- Chris Wallace added snpSummary() generic and CollapsedVCF
method.
- Add cbind and rbind for VCF objects.
MODIFICATIONS
- writeVcf,connection-method allows writing to console and appending.
- writeVcf,connection-method accepts connections with open="a",
only adding a header if the file does not already exist.
- predictCoding and genotypeToSnpMatrix can now handle
ALT as CharacterList. Structural variants are set to
empty character ("").
- When no INFO data are present in a vcf file, the info()
slot is now an empty DataFrame. Previously an empty column
named 'INFO' was returned.
- Empty VCF class now has an empty VCFHeader
- expand,CollapsedVCF-method expands 'geno' data with Number=A.
- VCF class accessors "fixed", "info" now return DataFrame instead
of GRanges. "rowData" returns fixed fields as the mcols.
- Updates to the vignette.
DEPRECATED and DEFUNCT
- Deprecate dbSNPFilter() and regionFilter().
- Deprecate MatrixToSnpMatrix().
BUG FIXES
- Multiple bugs fixed in "locateVariants".
- Multiple bugs fixed in "writeVcf".
- Bug fixed in subsetting of VCF objects.
- Bug fixed in "predictCoding" related to QUERYID column not
mapping back to original indices (rows).
VariantAnnotation 1.4.0
NEW FEATURES
- "summarizeVariants" for summarizing counts by sample
- new VariantType 'PromoterVariants()' added to "locateVariants"
MODIFICATIONS
- "ref", "alt", "filt" and "qual" accessors for VCF-class now return
a single variable instead of GRanges with variable as metadata
VariantAnnotation 1.2.0
NEW FEATURES
- "readVcf" has genome argument, can be subset on ranges or VCF elements
with "ScanVcfParam"
- "scanVcfHeader" returns VCFHeader class with accessors fixed, info, geno,
etc.
- "writeVcf" writes out a VCF file from a VCF class
- "locateVariants" methods
- returns GRanges instead of DataFrame
- 'region' argument allows specification of variants by region
- output includes txID, geneID and cdsID
- has cache argument for repeated calls over multiple vcf files
- "predictCoding" methods
- returns GRanges instead of DataFrame
- output includes txID, geneID, cdsID,
cds-based and protein-based coordinates
VariantAnnotation 1.0.0
NEW FEATURES
- "readVcf" methods for reading and parsing VCF files into a SummarizedExperiment
- "locateVariants" and "predictCoding" for identifying amino acid coding
changes in nonsynonymous variants
- "dbSNPFilter" and "regionFilter" for filtering variants on membership in
dbSNP or on a particular location in the genome
- access to PolyPhen and SIFT predictions through "keys", "cols" and
"select" methods. See ?SIFT or ?PolyPhen.
BUG FIXES
- No changes classified as 'bug fixes' (package under active
development)