Replace read.table and write.table with the much faster data.table::fread and data.table::fwrite in functions that convert to and from text. The exceptions are createAffyIntensityFile and checkIntensityFile(affy.inten=TRUE), which still use read.table to preserve the behavior of skipping lines commented out with the "#" character.
Add GenotypeIterator and GenotypeBlockIterator classes. These classes allow returning blocks of SNPs with each call to iterateFilter.
Add a function to coerce a GenotypeData object to a VCF object for use with VariantAnnotation.
Move ncdf4 from Imports to Suggests. Users who wish to use NetCDF files instead of GDS will have to install the ncdf4 package separately. This change eliminates the requirement to install the NetCDF library on Linux machines for users who plan to use GDS only.
Replace ZIP_RA with LZMA_RA for GDS compression.
Default is no compression for genotypes.
Speed up corr.by.snp in duplicateDiscordance.
Replace ncdf with ncdf4
Deprecate plinkToNcdf and convertVcfGds (use SNPRelate functions instead)
Add function kingIBS0FSCI to define expected IBS0 spread of full siblings based on allele frequency.
do not compute qbeta for all points in qqPlot when thinning
add error handling to close GdsGenotypeReader and GdsIntensityReader GDS files if they fail the validity method check
Use ZIP_RA as default compression in GDS files for faster access to compressed data
bug fix in checkImputedDosageFile if not writing a log file of missing values and an entire sample is missing from the file
bug fix for coloring truncated points in manhattanPlot
added support for hard-calling genotypes from imputed genotype probabilities in imputedDosageFile
changed colors for ibdPlot
added permute argument to exactHWE
allow multiple color schemes for plots color-coded by genotype
pedigreePairwiseRelatedness identifies great grandparent/great grandchild (GGp) and grand avuncular (GAv)
allow character scanID in createDataFile
added col argument to manhattanPlot
createDataFile converts non-finite values to NA.
alleleFrequency includes scans with missing sex.
Added option to reorder samples in vcfWrite.
Added option to read genotypes coded with nucleotides in createDataFile.
Added beta and standard error for GxE term to assocRegression output.
Added number of cases and controls to assocRegression output.
Added "ci" argument to qqPlot function.
Added "snpID" and "scanID" arguments to getGenotypeSelection.
Added getScanAnnotation, getSnpAnnotation accessors for GenotypeData objects.
Added data tables for genome build 38: centromeres.hg38.RData, pseudoautosomal.hg38.RData, HLA.hg38.RData, pcaSnpFilters.hg38.RData.
convertGdsNcdf works for transposed (sample x snp) genotype files.
Removed "outfile" arguments from batchChisqTest, batchFisherTest, and mendelErr. Saving output to a file should happen outside the function calls.
batchChisqTest and batchFisherTest have snp.include arguments to run on individual SNPs. Using batchFisherTest with this argument is recommended to replace the deprecated assocTestFisherExact.
assocRegression replaces assocTestRegression. Only one model is allowed per function call.
assocCoxPH replaces assocTestCPH. Output format is now similar to assocRegression.
exactHWE replaces gwasExactHW.
assocRegression, assocCoxPH, and exactHWE include the option to select blocks of SNPs by index for easier parallelization.
scan.chromosome.filter is no longer an option; use setMissingGenotypes to filter data prior to running other functions.
Add use.names argument to getGenotype and getGenotypeSelection
Add order=c("file", "selection") argument to getGenotypeSelection
duplicateDiscordanceAcrossDatasets and dupDosageCorAcrossDatasets will not match on unmapped SNPs
Add drop=TRUE argument to getVariable, etc.
Add dupDosageCorAcrossDatasets.
Add getGenotypeSelection method to MatrixGenotypeReader.
Add getGenotypeSelection to select non-contiguous SNPs and scans from GDS files.
Bug fix in imputedDosageFile - for IMPUTE2, include columns from .samples file in output scan annotation.
Allow getting variables from sub-nodes in a GDS file (e.g., getVariable(GdsReader, "snp.annot/qual")).
Add getNodeDescription method to GdsReader.
Added examples of converting from PLINK and VCF in Formats vignette.
imputedDosageFile replaces ncdfImputedDosage and gdsImputedDosage
setMissingGenotypes replaces ncdfSetMissingGenotypes and gdsSetMissingGenotypes
convertNcdfGds and convertGdsNcdf will convert files with any variable names (not just genotype)
Fixed bug in vcfWrite to output missing data code for ID column
Data cleaning vignette uses createDataFile instead of ncdfCreate and ncdfAddData
Data cleaning vignette uses snpgdsOpen and snpgdsClose
Fixed bug in assocTestCPH when there is no Y chromosome in the data
convertNcdfGds will not write entire snp and sample annotations to file
createDataFile replaces ncdfCreate and ncdfAddData
patch from Karl Forner to allow use of open gds objects in constructors for GdsReader and GdsGenotypeReader
removed duplicated .probToDosage function from ncdfImputedDosage.R source file
expanded matching options in duplicateDiscordanceAcrossDatasets
allowed truncate to be a numeric value or TRUE in qqPlot
added pasteSorted function
in case of missing allele code, return character genotype as NA
bug fix in assocTestRegression when a block contains only 1 SNP
added vcfCheck function to compare VCF file to GenotypeData object
bug fix in gwasExactHW when a block contains only 1 genotype
changed colors of BAF plots so points can be more easily distinguished
added ref.allele option to vcfWrite to select either A or B as the reference allele for each SNP
added vcfWrite function to write VCF file from GenotypeData object
bug fix in qqPlot, manhattanPlot when requesting thinning when bins only have 1 point
added pointsPerBin argument to manhattanPlot
added optional thinThreshold argument to manhattanPlot and qqPlot functions
updated gdsSubset for new gdsfmt read.gdsn syntax (also changed in release version)
Added ylim argument to qqPlot.
added block size support for GDS files stored with scan,snp dimensions
gdsSubset and gdsSubsetCheck now operate on the fastest dimension of the GDS file
updates/bug fixes to gdsSubset/gdsSubsetCheck - different missing value attributes may be set if sub.storage type is different.
Added gdsSubset and gdsSubsetCheck functions to make a subset GDS file that includes only specified SNPs and scans from an existing GDS file
Updated gdsImputedDosage and gdsCheckImputedDosage to account for IMPUTE2 gprobs files that have missing values (specified by three equal probability strings)
Updated gdsCheckImputedDosage to produce optional logfile reporting any missing genotypes
Revised anomFilterBAF: it produces fewer spurious centromere-spanning anomalies and corrects some merging issues (previously it could merge sections that actually had different split widths). Users should be aware that this increases running time.
Remove defunct functions.
Improve efficiency of gwasExactHW, mendelErr, assocTestRegression (reduce number of calls to rbind).
Bug fix in getChromosome method for SnpAnnotationDataFrame (proper behavior of unnamed "index" argument).
Added gdsImputedDosage function.
GdsGenotypeReader can return transposed genotypes.
ScanAnnotationDataFrame and ScanAnnotationSQLite allow non-integer scanID.
Fix getAlleleA and getAlleleB in GdsGenotypeReader to work with indels.
Documentation now located in vignettes/ folder.
Added ibdAssignRelationshipsKing.
Added support for genotype GDS files with scan x snp dimensions in GdsGenotypeReader.
Added gdsSetMissingGenotypes, updated argument names in ncdfSetMissingGenotypes.
Changed color scheme in manhattanPlot.
Bug fix in ibdPlot - diagonal orange bars are back.
Bug fix in plinkWrite for writing just one sample.
Bug fix in printing pedigreeCheck error message.
Changed handling of GxE interaction variables in assocTestRegression.
Updated vignette for SNPRelate 0.9.16.
gwasExactHW will run on all chromosomes except (Y,M), rather than (autosome,X,XY) only.
More informative error messages in anomDetectBAF and anomDetectLOH.
Changed labeling of IBD plots from "HS" to "Deg2" and "FC" to "Deg3."
Bug fix in pedigreePairwiseRelatedness - no more warning about multiple values passed to stringsAsFactors.
pedigreeClean and pedigreeFindDuplicates are now defunct. Use pedigreeCheck instead.
assocTestRegression computes allele counts separately for each model.
convertNcdfGds uses information from a SnpAnnotationDataFrame to store allele and chromosome codes in the GDS file.
Added missing value support to GdsReader.
Fixed bug in getAttribute method for GdsReader.
Updated GdsReader for compatibility with gdsfmt 0.9.11 (no longer compatible with older versions).
Fixed bug in genotypeToCharacter that caused calls to getGenotype(char=TRUE) for a single SNP to return NA.
Renamed minorAlleleSensitivitySpecificity to minorAlleleDetectionAccuracy and added additional output.
Added function minorAlleleSensitivitySpecificity.
Deprecated pedigreeClean and pedigreeFindDuplicates. pedigreeCheck now encompasses all pedigree checks and should be used instead.
Added pedigreeMaxUnrelated to find the maximum set of unrelated members of a pedigree.
Added additional output column "MAF" to matrix returned by alleleFrequency.
Removed hard-coding of autosomes as 1:22; can now set a vector of integer codes corresponding to autosomes with "autosomeCode" argument at object creation and retrieve with "autosomeCode" methods. This change makes GWASTools compatible with non-human organisms.
Added option to duplicateDiscordanceAcrossDatasets to count missing data as discordance.
Added option to start axes of genoClusterPlot at 0.
Removed "alleleA.col" and "alleleB.col" options from plink functions, as "alleleA" and "alleleB" are now standard names.
Added "getAlleleA" and "getAlleleB" methods to GdsGenotypeReader.
Added "getDimension" method to NcdfReader.
Added "getAlleleA" and "getAlleleB" methods to SnpAnnotation* and GenotypeData objects.
Added genotypeToCharacter function to convert genotypes from number of A alleles to A/B format.
getGenotype for GenotypeData has option char=TRUE to return character genotypes in A/B format.
Added option to duplicateDiscordanceAcrossDatasets to calculate minor allele discordance.
Added convertVcfGds to extract bi-allelic SNPs from a VCF file.
Added ncdfImputedDosage to convert output from common imputation programs to NetCDF. assocTestRegression has an additional argument dosage=TRUE to be used with these files.
Added vignette describing GWASTools data structures.
Bug fix in pedigreePairwiseRelatedness related to use of character identifiers.
assocTestRegression returns NA for SNPs where cases or controls are monomorphic; added assocTestFisherExact for use in that case.
Added snp.exclude argument to pseudoautoIntensityPlot.
Bug fix in messages reporting file read times when creating or checking NetCDF files.
Added vignette on converting VCF to NetCDF with annotation.
Prevent duplicateDiscordance from checking correlation by SNP in cases of no variation.
Added GdsReader and GdsGenotypeReader classes with dependency on gdsfmt. GenotypeData objects can also be created with GdsGenotypeReader objects in the "data" slot.
Fixed bug in duplicateDiscordance when Y chromosome is not included.
Fixed bug in chromIntensityPlot so ideogram scales correctly if SNPs are excluded.
Fixed bug in assocTestCPH that could lead to false positives if additive model failed but GxE model did not.
Allow multiple variables for stratified analysis in assocTestCPH.
Pedigree functions accept non-numeric identifiers and provide additional output.
In batchChisqTest, Yates correction cannot be bigger than the terms it corrects. Changed to match bug fix to chisq.test in R 2.15.1.
Removed automatic subtitle from qqPlot.
Allow selection of theoretical boundaries to draw in ibdPlot.
Added function asSnpMatrix to convert a GenotypeData object to a SnpMatrix object for use with snpStats.
Added chromosome ideograms to chromIntensityPlot and anomStatsPlot. anomStatsPlot has an option to put multiple anomalies on the same plot.
Updated vignette.
Use lazy loading of data.
manhattanPlot and snpCorrelationPlot accept character vectors of chromosome; chrom.labels argument no longer used.
close method of NcdfReader returns invisibly.
anomSegStats checks for SNPs in centromere gaps.
anomStatsPlot has option to plot LRR/BAF individually (for greater flexibility in layout).
Updates to arguments for plot titles in chromIntensityPlot, anomStatsPlot, and pseudoautoIntensityPlot for consistency.
plinkCheck has map.alt argument to override default GenotypeData -> PLINK annotation conversion.
Updated positions of pseudoautosomal regions.
Added plinkToNcdf to convert PLINK files to NetCDF for use in GWASTools.
chromIntensityPlot and pseudoautoIntensityPlot have cex=0.5 by default.
chromIntensityPlot colors now match anomStatsPlot colors.
plinkCheck has options to skip checking parents and sex.
plinkCheck sorts alleles by character to avoid phase mismatches.
plinkWrite and plinkCheck print progress messages if verbose=TRUE.
duplicateDiscordance and duplicateDiscordanceAcrossDatasets use only one pair of scans per subject by default.
duplicateDiscordanceProbability sets small negative values to 0.
duplicateDiscordance has an option to compute correlation by SNP.
Added scan.exclude argument to plinkCheck.
Added ncdfSetMissingGenotypes function.
plinkCheck now writes a log file with all mismatches found.
duplicateDiscordance excludes Y chrom SNPs for females.
duplicateDiscordance has an option to consider only pairs involving the minor allele.
batchChisqTest and batchFisherTest now return n results for n batches even if n=2.
batchFisherTest has return.by.snp=FALSE as default.
Added LR tests to assocTestRegression.
Bug fix in calculation of mean odds ratio in batchFisherTest.
Bug fix in missingGenotypeByScanChrom for data sets with only one female.
Added functions plinkWrite and plinkCheck for writing and checking PLINK ped and map files.
Added pcaSnpFilters data set for identifying regions with high PC-SNP correlation.