NEWS
SeqArray 1.46.0
UTILITIES
- 'seqGetData()' returns NULL if 'var.name=character()'
SeqArray 1.44.3
UTILITIES
- update the C code to comply with '_R_USE_STRICT_R_HEADERS_=true' and
'_R_CXX_USE_NO_REMAP_=true'
SeqArray 1.44.2
BUG FIXES
- fix 'seqAddValue(, val=vector("list", NUM_VARIANT))'
- fix the ploidy returned from 'seqVCF_Header()', when there are genotypes
of males and females on Chromosome X
SeqArray 1.44.1
UTILITIES
- new option 'numvariant' in 'seqEmptyFile()'
BUG FIXES
- 'seqMerge()' should internally use "chr_position_ref_alt" to distinguish
the variants in different files
- 'seqAddValue(, varnm="annotation/filter")' should work with a factor
variable
- 'seqAddValue(, varnm="variant.id")' can reset the variant IDs with a
different number of variants
SeqArray 1.44.0
UTILITIES
- tweak the display of progress information in 'seqVCF2GDS()'
- 'seqVCF_Header(, getnum=TRUE, verbose=TRUE)' to show the progress
information for scanning the VCF file
- new 'seqGetData(, "$dosage_alt2")' and 'seqGetData(, "$dosage_sp2")' for
sex chromosomes, when the alleles are partially missing (e.g., genotypes
on chromosome X for males)
- new option 'verbose.clean' in 'seqExport()' to control how much
information is displayed
SeqArray 1.42.4
BUG FIXES
- 'seqGetData(, "$dosage_alt")' and 'seqGetData(, "$dosage_sp")' work
correctly when the ploidy is >2 and there are missing alleles
- fix a bug where 'seqParallel()' does not call a user-defined '.combine'
when 'parallel=1'
SeqArray 1.42.1
UTILITIES
- update the help files of 'seqBlockApply()' and 'seqUnitApply()'
- detect the output filename extension in 'seqGDS2VCF()' without
considering the case of the characters, supporting .gz, .bgz, .bz and .xz
as a filename extension
- fix the compiler warning: -Wformat-security
- new option 'include.pheno=TRUE' in 'seqBED2GDS()'
SeqArray 1.42.0
UTILITIES
- new option 'write.rsid' in 'seqGDS2BED()'
SeqArray 1.40.1
BUG FIXES
- 'seqAddValue(gdsfile, varnm="position")' works correctly
SeqArray 1.40.0
- fix the compiler warning: sprintf is deprecated
SeqArray 1.38.0
UTILITIES
- new option 'ext_nbyte' in 'seqGet2bGeno()'
- 'seqAlleleCount()' and 'seqGetAF_AC_Missing()' return NA instead of zero
when all genotypes are missing at a site
- 'seqGDS2VCF()' does not output the FORMAT column if there is no selected
sample (e.g., site-only VCF files)
- 'seqGetData(, "$chrom_pos2")' is similar to 'seqGetData(, "$chrom_pos")'
except that duplicates are given a suffix ("_1", "_2" or >2)
NEW FEATURES
- 'seqGDS2BED()' can convert to PLINK BED files with the best-guess
genotypes when there are only numeric dosages in the GDS file
- 'seqEmptyFile()' outputs an empty GDS file
SeqArray 1.36.2
BUG FIXES
- fix the bug at multi-allelic sites with more than 15 different alleles,
see https://github.com/zhengxwen/SeqArray/issues/78
SeqArray 1.36.1
BUG FIXES
- 'seqExport()' failed when there is no variant
- 'seqSetFilter(, ret.idx=TRUE)', see
https://github.com/zhengxwen/SeqArray/issues/80
SeqArray 1.36.0
NEW FEATURES
- new functions 'seqUnitCreate()', 'seqUnitSubset()' and 'seqUnitMerge()'
- new functions 'seqFilterPush()' and 'seqFilterPop()'
- new functions 'seqGet2bGeno()' and 'seqGetAF_AC_Missing()'
- new function 'seqGetData(, "$dosage_sp")' for a sparse matrix of dosages
- the first argument 'gdsfile' can be a file name in 'seqAlleleFreq()',
'seqAlleleCount()', 'seqMissing()'
- new function 'seqMulticoreSetup()' for setting a multicore cluster
according to a numeric value assigned to the argument 'parallel'
UTILITIES
- allow opening a duplicated GDS file ('allow.duplicate=TRUE') when the
input is a file name instead of a GDS object in 'seqGDS2VCF()',
'seqGDS2SNP()', 'seqGDS2BED()', 'seqVCF2GDS()', 'seqSummary()',
'seqCheck()' and 'seqMerge()'
- remove the deprecated '.progress' in 'seqMissing()', 'seqAlleleCount()'
and 'seqAlleleFreq()'
- add 'summary.SeqUnitListClass()'
- no genotype and phase data nodes from 'seqSNP2GDS()' if SNP dosage GDS
is the input
BUG FIXES
- 'seqUnitApply()' works correctly with selected samples if 'parallel' is
a non-fork cluster
- 'seqVCF2GDS()' and 'seqVCF_Header()' work correctly if the VCF header has
white space
- 'seqGDS2BED()' with selected samples for sex and phenotype information
- 'seqGDS2VCF()' failed if there is no 'genotype/data' in the GDS file
SeqArray 1.32.0
NEW FEATURES
- new option 'ret.idx' in 'seqSetFilter()' for unsorted sample and variant
indices
- new option 'ret.idx' in 'seqSetFilterAnnotID()' for unsorted variant
index
- rewrite the function 'seqSetFilterPos()': new options 'ref' and 'alt',
'multi.pos=TRUE' by default
- new option 'packed.idx' in 'seqAddValue()' for packing an indexing
variable
- new option 'warn' in 'seqSetFilter()' to enable or disable the warning
- new functions 'seqNewVarData()' and 'seqListVarData()' for
variable-length data
UTILITIES
- allow no variant in 'seqApply()' and 'seqBlockApply()'
- the list object returned from 'seqGetData()' always has names if there
is more than one input variable name
BUG FIXES
- 'seqGDS2VCF()' should output "." instead of NA in the FILTER column
- 'seqGetData()' should support factor when '.padNA=TRUE' or '.tolist=TRUE'
- fix 'seqGDS2VCF()' with factor variables
- 'seqSummary(gds, "$filter")' should return a data frame with zero rows if
'annotation/filter' is not a factor
SeqArray 1.30.0
UTILITIES
- show a warning when an unsorted index is used in 'seqSetFilter()'
- show a message if 'seqVCF_Header()' fails
- a new option 'chr_prefix' in 'seqGDS2VCF()'
BUG FIXES
- 'seqVCF_Header()' fixes 'contig' in the header of VCF if there are
different fields
SeqArray 1.28.1
BUG FIXES
- 'seqRecompress(, verbose=FALSE)' works correctly
- 'seqSetFilter(, action="push+set")' should not reset the filter
before setting a new filter
SeqArray 1.28.0
NEW FEATURES
- new functions 'seqUnitSlidingWindows()', 'seqUnitApply()',
'seqUnitFilterCond()'
- new variables "$variant_index" and "$sample_index" in 'seqGetData()',
'seqBlockApply()' and 'seqUnitApply()' to get the indices of selected
variants and samples
- new arguments '.padNA' and '.envir' in 'seqGetData()'
- new functions 'seqSetFilterAnnotID()' and 'seqGDS2BED()'
- multicore function in 'seqBED2GDS(, parallel=)'
- new package-wide option 'options(seqarray.nofork=TRUE)' to disable forking
- new option 'minor' in 'seqAlleleFreq()' and 'seqAlleleCount()'
- new option 'verbose' in 'seqMissing()', 'seqAlleleFreq()' and
'seqAlleleCount()'; '.progress' is deprecated, but can still be used
for compatibility
- 'seqAlleleFreq()', 'seqAlleleCount()', 'seqMissing()',
'seqSetFilterCond()' work on 'annotation/format/DS', if 'genotype/data'
is not available
UTILITIES
- 'seqAddValue()' adds vectors, matrices and data frame to "annotation/info"
- 'seqBED2GDS()' allows a single file name without the extended file names
(.bed, .fam, .bim)
- allele flip in 'seqBED2GDS()' to allow major allele to be reference
- rewrite 'seqGetData()' for faster loading
- significantly improve 'seqBlockApply()' on 'annotation/info/VARIABLE'
(https://github.com/zhengxwen/SeqArray/issues/59)
- add a S3 method 'print.SeqVCFHeaderClass()' for 'seqVCF_Header()'
- new option '.tolist' in 'seqGetData()', 'seqBlockApply()' and
'seqUnitApply()'
- the string "." in a VCF file is converted to a blank string (missing
value) in 'seqVCF2GDS()'
- add a class name 'SeqVarDataList' to the returned 'list(length, data)'
from 'seqGetData()'
- new option 'seqMissing(, per.variant=NA)'
- add 'comment.char=""' to 'seqBED2GDS()'
SeqArray 1.26.2
NEW FEATURES
- multiple variable names are allowed in 'seqGetData(, var.name=)'
BUG FIXES
- fix 'seqGetData(, "genotype", .useraw=NA)'
(https://github.com/zhengxwen/SeqArray/issues/58)
SeqArray 1.26.1
BUG FIXES
- fails to correctly select duplicate indices in
'seqSetFilter(f, variant.sel=)'
SeqArray 1.26.0
NEW FEATURES
- new function 'seqAddValue()'
UTILITIES
- RLE chromosome coding in 'seqBED2GDS()'
- change the file name "vignettes/R_Integration.Rmd" to
"vignettes/SeqArray.Rmd", so 'vignette("SeqArray")' can work directly
- correct Estimated remaining Time to Complete (ETC) for load balancing in
'seqParallel()'
BUG FIXES
- 'seqBED2GDS(, verbose=FALSE)' should have no display
CHANGES
- use a svg file instead of png in vignettes
SeqArray 1.24.2
NEW FEATURES
- add the compiler information in 'seqSystem()'
- new arguments '.balancing', '.bl_size' and '.bl_progress' in
'seqParallel()' for load balancing
UTILITIES
- improve unix forking processes for load balancing in 'seqParallel()'
BUG FIXES
- fix 'seqSummary()' when there is no phase data
SeqArray 1.24.0
NEW FEATURES
- a new function 'seqResetVariantID()'
- a new option in 'seqRecompress(, compress="none")' to uncompress all data
- 'seqGetData()' allows a GDS file name in the first argument
SeqArray 1.22.6
BUG FIXES
- 'seqSetFilter(, sample.id=)' fails to correctly select samples in a few
cases (since SeqArray>=v1.22.0 uses the distribution of selected samples
to optimize the data access of genotypes, see
https://github.com/zhengxwen/SeqArray/issues/48)
- the bgzf VCF file is truncated in 'seqGDS2VCF()' since the file is not
closed appropriately
- invalid chromosomes and position in the output of 'seqMerge()' when
merging different samples but same variants
SeqArray 1.22.3
NEW FEATURES
- a new option 'scenario' in 'seqVCF2GDS()' and 'seqBCF2GDS()'
UTILITIES
- more information in 'seqDelete()'
BUG FIXES
- export a haploid VCF file using 'seqGDS2VCF()'
- export VCF without any FORMAT data in 'seqGDS2VCF()'
- export GDS without genotypes in 'seqExport()'
- fix parallel file writing in 'seqVCF2GDS()' when there are no genotypes
SeqArray 1.22.0
NEW FEATURES
- 'seqSNP2GDS()' imports dosage GDS files
- 'seqVCF_Header()' allows a BCF file as an input
- a new function 'seqRecompress()'
- a new function 'seqCheck()' for checking the data integrity of a SeqArray
GDS file
- 'seqGDS2SNP()' exports dosage GDS files
UTILITIES
- avoid duplicated meta-information lines in 'seqVCF2GDS()' and
'seqVCF_Header()'
- require >= R_v3.5.0, since reading from connections in text mode is
buffered
- 'seqDigest()' requires the digest package
- optimization in reading genotypes from a subset of samples (according to
gdsfmt_1.17.5)
BUG FIXES
- 'seqVCF2GDS()' and 'seqVCF_Header()' are able to import site-only VCF
files (i.e., VCF with no sample)
- fix 'seqVCF2GDS()' and 'seqBCF2GDS()' since reading from connections in
text mode is buffered in R >= v3.5.0
SeqArray 1.20.1
BUG FIXES
- 'seqExport()' fails to export haploid data (e.g., Y chromosome)
- 'seqVCF2GDS()' fails to convert INFO variables when Number="R"
SeqArray 1.20.0
NEW FEATURES
- 'seqGDS2VCF()' outputs a bgzip vcf file for tabix indexing
- two more options "Ultra" and "UltraMax" in 'seqStorageOption()'
- '@chrom_rle_val' and '@chrom_rle_len' are added to a GDS file for
faster chromosome indexing
- new function 'seqBCF2GDS()' (requiring the software bcftools)
- new function 'seqSetFilterPos()'
- new variable "$dosage_alt" in 'seqGetData()' and 'seqApply()'
- import VCF files with no GT in 'seqVCF2GDS()'
UTILITIES
- 'seqDigest(f, "annotation/filter")' works on a factor variable
- improve the computational efficiency of 'seqMerge()' to avoid genotype
recompression by padding the 2-bit genotype array in bytes
- significantly improve 'seqBlockApply()' (its speed is close to
'seqApply()')
- reduce the overhead in 'seqSetFilter(, variant.sel=...)'
SeqArray 1.18.2
BUG FIXES
- fix an issue: 'seqSetFilterChrom()' extends a genomic range by 1bp
upstream and downstream
- use '.onLoad()' instead of '.onAttach()' to fix
https://support.bioconductor.org/p/104405/#104443
SeqArray 1.18.0
NEW FEATURES
- progress information: showing overall running time when completed
- new variable names "$ref" and "$alt" can be used in 'seqGetData()' and
'seqBlockApply()'
- new argument '.progress' in 'seqDigest()'
- new argument 'ref.allele' in 'seqAlleleCount()'
- new variable name "$chrom_pos_allele" can be used in 'seqGetData()' and
'seqBlockApply()'
UTILITIES
- move VariantAnnotation to the suggest field from the import field
- remove an unused argument '.list_dup' in 'seqBlockApply()'
- slightly improve the computational efficiency of 'seqAlleleFreq()' and
'seqAlleleCount()' when 'ref.allele=0'
- 'seqGetData(f, "$chrom_pos")' outputs characters with the format
'chromosome:position' instead of 'chromosome_position'
BUG FIXES
- fix the unexpected behaviors in 'seqSetFilter(, action="push")' and
'seqSetFilter(, action="push+intersect")'
- fix a bug in 'seqGetData(f, "$dosage")' when the number of unique alleles
at a site is greater than 3 (https://github.com/zhengxwen/SeqArray/issues/21)
- fix a bug in 'seqSNP2GDS()' for inverted genotypes during importing data
from SNP GDS files (https://github.com/zhengxwen/SeqArray/issues/22)
- fix an issue of no phase data in 'seqExport()'
SeqArray 1.16.0
- a new argument 'intersect' in 'seqSetFilter()' and 'seqSetFilterChrom()'
- a new function 'seqSetFilterCond()'
- 'seqVCF2GDS()' allows arbitrary numbers of different alleles if REF and
ALT in VCF are missing
- optimize internal indexing for FORMAT annotations to avoid reloading
the indexing from the GDS file
- a new CITATION file
- 'LZMA_RA' is the default compression method in 'seqBED2GDS()' and
'seqSNP2GDS()'
- 'seqVCF_Header()' correctly calculates ploidy with missing genotypes
SeqArray 1.14.1
- The default compression setting in 'seqVCF2GDS()' and 'seqMerge()' is
changed from "ZIP_RA" to "LZMA_RA"
- 'seqVCF2GDS()': variable-length encoding method is used to store
integers in the FORMAT field of VCF files to reduce the file size and
compression time
SeqArray 1.12.9
- the version number was bumped for the Bioconductor release version 3.3
- 'seqVCF_SampID()', 'seqVCF_Header()' and 'seqVCF2GDS()' allow a
connection object instead of a file name
- "$num_allele" is allowed in 'seqGetData()' and 'seqApply()' (the numbers
of distinct alleles)
- a new option '.progress' in 'seqAlleleFreq()', 'seqMissing()' and
'seqAlleleCount()'
- 'as.is' can be a 'gdsn.class' object in 'seqApply()'
- v1.12.7: a new argument 'parallel' in 'seqApply()', BiocParallel
integration in 'seqParallel()' and a new function 'seqBlockApply()'
- v1.12.8: a new function 'seqGetParallel()'
SeqArray 1.12.0
- utilizes the official C API 'R_GetConnection()' to accelerate text
import and export, requiring R (>=v3.3.0); alternative version (backward
compatible with R_v2.15.0) is also available on github
(https://github.com/zhengxwen/SeqArray/releases/tag/v1.11.18)
- ~4x speedup in the sequential version of 'seqVCF2GDS()', and
'seqVCF2GDS()' can run in parallel
- variables in "annotation/format/" should be two-dimensional, as
mentioned in the vignette
- rewrite 'seqSummary()'
- a new vignette file with Rmarkdown format (replacing SeqArray-JSM2013.pdf)
- bug fix in 'seqBED2GDS()' if the total number of genotypes > 2^31
(integer overflow)
- bug fixes in 'seqMerge()' if chromosome and positions are not unique
- 'seqStorage.Option()' is renamed to 'seqStorageOption()'
- new function 'seqDigest()'
- 'seqVCF.Header()' is renamed to 'seqVCF_Header()',
'seqVCF.SampID()' is renamed to 'seqVCF_SampID()'
- seqSetFilter(): 'samp.sel' is deprecated since v1.11.12, please use
'sample.sel' instead
- accelerate reading genotypes with SSE2(+13%) and AVX2(+23%)
- new function 'seqSystem()'
- allow "$dosage" in 'seqGetData()' and 'seqApply()' for the dosages of
reference allele
- accelerate 'seqSetFilterChrom()' and allow a selection with
multiple regions
- new methods '\S4method{seqSetFilter}{SeqVarGDSClass, GRanges}()' and
'\S4method{seqSetFilter}{SeqVarGDSClass, GRangesList}()'
- 'as.is' in 'seqApply()' allows a 'connection' object (created by file,
gzfile, etc)
- 'seqSummary(f, "genotype")$seldim' returns a vector with 3 integers
(ploidy, # of selected samples, # of selected variants) instead of
2 integers
SeqArray 1.10.6
- fix a memory issue in 'seqAlleleFreq()' when 'ref.allele' is a vector
- 'seqSetFilter()' allows numeric vectors in 'samp.sel' and 'variant.sel'
- 'seqSummary()' returns ploidy and reference
- 'seqStorage.Option()' controls the compression level of FORMAT/DATA
- 'seqVCF2GDS()' allows extracting part of a VCF file via 'start' and 'count'
- 'seqMerge()' combines multiple GDS files with the same samples
- export methods for compatibility with VariantAnnotation
- a new argument '.useraw' in 'seqGetFilter()'
- a new argument 'allow.duplicate' in 'seqOpen()'
- fix a bug in 'seqParallel()'
(https://github.com/zhengxwen/SeqArray/issues/11) and optimize
its performance
- 'gdsfile' could be NULL in 'seqParallel()'
SeqArray 1.10.0
- a new function 'seqGDS2SNP()'
- supported by the SNPRelate package
- support 'seqApply(..., margin="by.sample")'
- new functions 'seqOptimize()', 'seqMissing()', 'seqAlleleFreq()',
'seqNumAllele()' and 'seqSetFilterChrom()'
- "intersection" and "push+intersection" in 'seqSetFilter()'
- parallel implementation in 'seqNumAllele()', 'seqMissing()' and
'seqAlleleFreq()'
- a new function 'seqExport()'
- new argument ".useraw" in 'seqApply()'
- fix a bug for duplicated "variant.id",
https://github.com/zhengxwen/SeqArray/issues/7
- fix an issue of 'seqVCF2GDS()' when there are duplicated format or
info ID
- improve access speed (+50%, benchmark on calling
seqApply(..., FUN=function(x) {}))
- new functions 'seqSNP2GDS()', 'seqBED2GDS()', 'seqAlleleCount()' and
'seqResetFilter()'
- 'seqCompress.Option()' is renamed to 'seqStorage.Option()'
- "ZIP_RA" is the default value in 'seqStorageOption()' and other
functions instead of "ZIP_RA.max"
- 'seqSetFilter()' becomes a S4 method
SeqArray 1.8.0
- bug fix in getting genotypes if position > 2^31
- add an option 'ignore.chr.prefix' to the function 'seqVCF2GDS()'
- 'seqVCF2GDS()' ignores the INFO or FORMAT variables if they are not
defined ahead
- a new action 'push+set' in the function 'seqSetFilter()'
- bug fix if 'requireNamespace("SeqArray")' is called from other packages
SeqArray 1.6.0
- fix a bug in 'seqVCF2GDS()' when the values in the FILTER column are
all missing
- enhance 'seqVCF.Header()'
- support the LinkingTo mechanism
- fix the error in haploid genotypes (Y chromosome)
SeqArray 1.4.0
- update according to the new version of VariantAnnotation
- update the test code to avoid the conflict
- bumped version as all packages that depend on Rcpp must be rebuilt
- switch to the new biocViews in the DESCRIPTION file
SeqArray 1.2.0
- add a new argument "action" to the function 'seqSetFilter()'
- add a new function 'seqInfoNewVar' which allows adding new variables
to the INFO fields
- minor bug fix in asVCF
- update man page "SeqVarGDSClass-class.Rd" with new methods
- in DESCRIPTION, BiocGenerics listed in "Suggests" instead of "Imports"
as suggested by R CMD check
- bug fix in seqDelete
- revise the function 'seqTranspose' according to the update of
gdsfmt (v1.0.0)
- revise the argument 'var.index' in the function 'seqApply()'
- basic support for 'GRanges' and 'DNAStringSetList'
- added methods 'qual', 'filt', 'asVCF'
- 'granges' method uses length of reference allele to set width
- minor bug fix to avoid 'seqGetData()' crashing when no value returned
from a variable-length variable
- update documents
SeqArray 1.0.0
- the version number was bumped for the Bioconductor release version
SeqArray 0.99.0
- initial Bioconductor package submission