NEWS
SynExtend 1.25.1
- Large rewrite of gff import, SummarizePairs, and other functions. Added the EvaluatePairs function and some other ancillary functions.
SynExtend 1.23.1
- Large rewrite of NucleotideOverlap to support accurate kmer hit tabulation for features with introns
SynExtend 1.21.2
- Deprecated consensus clustering removed from ExoLabel
- ExoLabel now supports argument header=c(FALSE, TRUE, integer(1L)) to skip the first 0, 1, or n lines of each file (respectively)
- ExoLabel now supports gzip-compressed files as input
- Some examples in man pages have been made faster
SynExtend 1.21.1
- Internal code reorganization for ExoLabel
- Further arguments to SuperTree are now correctly passed through to Treeline
- Some deprecated arguments to EvoWeaver have been removed
SynExtend 1.21.0
- First development version of Bioconductor 3.22
SynExtend 1.20.0
- Official Bioconductor 3.21 release
SynExtend 1.19.11
- Fixes bug where ExoLabel wouldn't handle multiple files correctly
- ExoLabel now reads in networks roughly twice as fast
- ExoLabel now correctly skips negatively weighted edges
- Various improvements to ExoLabel output formatting
- Enables experimental tuning of distance scaling in ExoLabel attenuation
SynExtend 1.19.10
- Fixes critical bug in ExoLabel that previously caused large networks to always report a single large cluster
- ExoLabel no longer allows negative weights to become positive via attenuation (positive weights can still become negative)
SynExtend 1.19.9
- Bugfixes to ExoLabel
- ExoLabel now defaults to use_fast_sort=TRUE
- Various QoL changes for ExoLabel(..., return_table=TRUE)
- ExoLabel now correctly reports timing
SynExtend 1.19.8
- ExoLabel has a new parameter to tune the performance of hop-length attenuation.
- Documentation and formatting updates.
SynExtend 1.19.7
- ExoLabel now uses hop-length attenuation to mitigate formation of massive communities.
- ExoLabel no longer supports inflation, since attenuation does a better job handling this without introducing additional parameters.
- ExoLabel now prints far less when running with verbose=TRUE in non-interactive mode. The output is just as informative, with less junk from unrendered carriage returns.
SynExtend 1.19.6
- fixes multiple bugs in EstimRearrScen
SynExtend 1.19.5
- fixes bug in GeneVector.EvoWeaver that could affect DNA-based analyses
SynExtend 1.19.4
- fixes bug that prevented building on Windows
- adds multiple clustering support for ExoLabel
SynExtend 1.19.3
- ExoLabel will no longer crash when given a network lacking a trailing newline
- Various internal improvements and code refinements
SynExtend 1.19.2
- Lots of bug fixes for ExoLabel
- ExoLabel now reports disk consumption during execution
SynExtend 1.19.1
- ExoLabel is even faster thanks to an in-place external sort that speeds up file I/O
- Other quality of life improvements to ExoLabel
SynExtend 1.19.0
- First development version of Bioconductor 3.21
SynExtend 1.18.0
- Official Bioconductor 3.20 release
SynExtend 1.17.7
- ExoLabel is much faster and does a better job cleaning up when aborted early
- ExoLabel now has fewer arguments
- Updates to man pages
SynExtend 1.17.6
- Lots of internal improvements to ExoLabel to increase computational speed and decrease disk usage.
- ExoLabel will no longer crash if given relative paths.
- Adds more internal error checking to prevent some rare bugs.
- Updates man pages to reflect new changes.
- Updates EstimateExoLabel to reflect new changes.
SynExtend 1.17.5
- ExoLabel will no longer brick R during sorts on large files.
- ExoLabel reports more progress during some lengthy processing sections when verbose=TRUE
- Known issue: the "Copying source file" step is still non-interruptible; this will be fixed in a later update
SynExtend 1.17.4
- ExoLabel now allows an inflation argument to control application of inflation
SynExtend 1.17.3
- predict.EvoWeaver now supports returning p-values separately from raw score for some algorithms.
- OpenMP implementation for EvoWeaver algorithms that support it has been fixed
SynExtend 1.17.2
- RandForest function added to train random forest models
- Associated man pages for RandForest and DecisionTree objects
- New methods for DecisionTree objects to plot and coerce to dendrogram
- Small bugfix to subset.dendrogram
SynExtend 1.17.1
- Major updates to EvoWeaver:
  - predict.EvoWeaver now returns a data.frame by default
  - Method arguments are updated to match their names in the associated EvoWeaver manuscript
- Above changes have propagated to documentation files
- New Phylogenetic Profiling methods with improved accuracy
- New meta-methods PhylogeneticProfiling, PhylogeneticStructure, GeneOrganization, SequenceLevel for predict.EvoWeaver
- New pre-trained Ensemble models have been included
- Updates to ExoLabel for better status printing
SynExtend 1.16.0
SynExtend 1.15.3
- Addition of FastLabelOOM function to find communities in graphs/networks on disk space.
SynExtend 1.15.2
- Addition of PrepareSeqs function, beginning the process of deprecating PairSummaries in favor of more cohesive and user friendly functions.
SynExtend 1.15.1
- Fixes bug in JRF distance causing scores to be higher than expected.
SynExtend 1.13.8
- Updates to all EvoWeaver documentation files
- Fixed small bug in PhyloDistance causing Method='JRF' to return similarity rather than the distance
- Fixed small bug in TreeDistance.EvoWeaver resulting in an inconsistent calculation of score when using TreeMethods='JRF'
SynExtend 1.13.7
SynExtend 1.13.6
- ProtWeaver and ProtWeb have been renamed to EvoWeaver and EvoWeb, respectively
- New sequence level method for EvoWeaver
- Various small internal updates to EvoWeaver
SynExtend 1.13.5
- Minor changes to SelectByK and vignette
SynExtend 1.13.4
- New predictor PAPV.ProtWeaver to calculate p-values for presence/absence profiles.
- ContextTree now uses MirrorTree with species tree correction and p/a overlap correction
- Updates to documentation
SynExtend 1.13.3
- predict.ProtWeaver now supports multiple algorithms at once (e.g. predict(ew, Method=c("Jaccard", "Hamming")))
- Documentation for ProtWeaver and associated methods has been updated to match recent updates.
SynExtend 1.13.2
- FastQFromSRR function added as a convenience wrapper for the SRAtoolkit function fastq-dump.
SynExtend 1.13.1
- SuperTree now works directly with dist objects, providing better performance and scaling
- Updates to simMat objects
  - No longer throw a warning when initialized in RStudio
  - Formatting is cleaner and supports larger object names
- Updates to NVDC.ProtWeaver
  - Now supports amino acid sequences using the DNAseqs=FALSE argument
  - Now calculates a p-value-weighted score
- Adds MakeBlastDb function to create a BLAST database from R, plus associated documentation updates
- Smaller fixes to some ProtWeaver methods
  - predict.ProtWeaver no longer returns using invisible (this was annoying and unnecessary)
  - APC correction for MutualInformation.ProtWeaver removed to allow for parallelization
  - MirrorTree.ProtWeaver now works correctly with MTCorrection="speciestree"
  - CorrGL.ProtWeaver now uses Fisher's Exact Test for p-values rather than the R value of Spearman correlation
- Many internal performance improvements
  - ProtWeaver almost entirely uses dist objects rather than matrix, saving significantly on memory
  - Faster Cophenetic function implemented internally
  - Copied internal .Call('cophenetic') from DECIPHER to SynExtend to avoid potential namespace issues
- Small fixes to remove some notes from BiocCheck::BiocCheck()
- Variety of small updates to pass BiocCheck
SynExtend 1.12.0
- Official Bioconductor 3.17 release (even with SynExtend 1.11.8)
SynExtend 1.11.8
- Fixes various small bugs in MoransI
- Adds some multiprocessing support (more will be added in the future)
- Slight rework to species trees and their interaction with ProtWeaver objects
- ProtWeaver has new attribute speciesTree, can be initialized with a dendrogram object
- New method SpeciesTree to get species tree from a ProtWeaver object (or compute one, if it doesn't exist)
- Various internal improvements for Bioc style consistency
- Various documentation updates
SynExtend 1.11.7
- Adds new optimized dendrapply implementation (overloads stats::dendrapply)
- Adds HungarianAlgorithm for optimal solving of the linear assignment problem (O(n^3) complexity)
- Adds new C code for fast computation of Pearson's R and p-value
- Adds new Ancestral.ProtWeaver algorithm for calculating coevolution from correlated residue changes
- Supporting code and documentation for Ancestral.ProtWeaver
- Other new internal methods
- Various updates and optimizations to internal methods and documentation
- Updates GRF method to be called CI (for Clustering Information Distance)
- Method="CI" in PhyloDistance now calculates an approximate p-value using simulated data from Smith (2020)
SynExtend 1.11.6
- Adds new Residue method NVDT using gene sequence Natural Vector with Dinucleotide and Trinucleotide frequency
- Adds some new C methods to speed up calculation of NVDT
- Fixes .Call() not using PACKAGE="SynExtend"
- Updates to documentation
SynExtend 1.11.5
- Adds new colocalization algorithm ColocMoran, uses Coloc with MoransI to correct for phylogenetic signal
- Adds new colocalization algorithm TranscripMI, uses mutual information of transcriptional direction
- Adds new corrections/checks to allow for transcriptional direction to be in labels
- Various bugfixes to support new four number labelling scheme
- Various documentation updates
- Adds new function MoransI to calculate Moran's I for a set of spatially distributed signals
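
For readers unfamiliar with the statistic, Moran's I can be sketched as below. This is a minimal illustration of the textbook formula in Python, not the package's implementation; the function name morans_i and the explicit weight matrix are assumptions made for the example, and how MoransI itself derives weights is not stated in this file.

```python
def morans_i(values, weights):
    """Textbook Moran's I for signals `values` and a square weight matrix.

    `weights[i][j]` is the (hypothetical) spatial weight between
    observation i and observation j; the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights, used to normalise the cross-product term.
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations from the mean.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    # Total squared deviation.
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


# Four points on a line; neighbours get weight 1.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], line_weights)        # positive: similar neighbours
alternating = morans_i([1, -1, 1, -1], line_weights)  # negative: dissimilar neighbours
```

Values near +1 indicate that neighbouring signals are similar, values near -1 that they alternate.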
SynExtend 1.11.4
- Internal code refactor
- ShuffleC now supports reproducibility using R's set.seed
- ShuffleC now supports sampling with replacement, performance is around 2.25x faster than sample
SynExtend 1.11.3
- Internal bugfixes for JRF Distance--previous commit was incorrectly calculating values
- Adds new TreeDistance predictor for ProtWeaver, incorporating all tree distance metrics; these metrics are bundled due to some backend optimizations that improve performance
- Bugfixes for PhyloDistance
- Adds Random Projection for MirrorTree predictor to solve memory problems and increase accuracy
- New internal random number generator using xorshift, significantly faster than sample()
- HammingGL changed to CorrGL, now uses Pearson's R weighted by p-value
- Refactors internal predictors to reduce size of codebase and remove redundancies
- Internal ShuffleC function to replicate sample functionality with 2-6x speedup
- Method GainLoss now uses bootstrapping to estimate a p-value
- Updates to documentation files
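
As background on the xorshift generator mentioned above, the following Python sketch shows the classic 32-bit xorshift step (shift constants 13, 17, 5) driving a Fisher-Yates shuffle. It is an illustration only: the variant, seeding, and range reduction used internally by the package are not described in this file, and the names xorshift32 and shuffle are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep the state within 32 bits


def xorshift32(state):
    """One step of the classic 32-bit xorshift generator (state must be nonzero)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def shuffle(items, seed=1):
    """Fisher-Yates shuffle driven by xorshift32 (illustrative, not ShuffleC)."""
    out = list(items)
    state = seed
    for i in range(len(out) - 1, 0, -1):
        state = xorshift32(state)
        # Modulo reduction is slightly biased; acceptable for a sketch.
        j = state % (i + 1)
        out[i], out[j] = out[j], out[i]
    return out


first = xorshift32(1)
shuffled = shuffle(range(10), seed=42)
```

The generator needs only three shifts and three XORs per number, which is why it is cheap compared with a general-purpose sampler.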
SynExtend 1.11.2
- Adds KF Distance for trees
- Adds Jaccard Robinson Foulds Distance for trees
- Reworks tree distances into PhyloDistance function
- Numerous new documentation pages
- Updates internal functions to use rapply instead of dendrapply to avoid stack overflow issues due to R recursion
SynExtend 1.11.1
- Minor bugfix to RF distance
- updates gitignore for workflows
SynExtend 1.10.1
SynExtend 1.9.21
- Adds new RFDist function to calculate Robinson-Foulds Distance
- Adds normalization for GeneralizedRF to make the distance between 0 and 1
- Minor bugfixes
- Documentation for new functions
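
For context, a Robinson-Foulds style distance counts the groupings found in one tree but not the other. The Python sketch below uses a simplified rooted variant (comparing clades of nested-tuple trees); it is an assumption-laden illustration, and RFDist itself operates on dendrograms and may use unrooted bipartitions instead. The names clades and rf_distance are hypothetical.

```python
def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf label
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


tree_a = ((("A", "B"), "C"), "D")
tree_b = ((("A", "C"), "B"), "D")
d_ab = rf_distance(tree_a, tree_b)  # differs in the (A,B) vs (A,C) grouping
d_aa = rf_distance(tree_a, tree_a)
```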
SynExtend 1.9.20
- Adds new GeneralizedRF function to calculate information-theoretic Generalized Robinson-Foulds distance between two dendrograms.
- Documentation for new function
- New ProtWeaver predictor based off of GeneralizedRF metric
- New internal C source code for GeneralizedRF
SynExtend 1.9.19
- Adds new DPhyloStatistic function to calculate the D-statistic for a binary state against a phylogeny following Fritz and Purvis (2009).
- Documentation for new function
- New internal C source code for DPhyloStatistic
- New internal C source code for random utility functions, currently only has functions to generate random numbers
SynExtend 1.9.18
- Various internal improvements to presence/absence profile methods
SynExtend 1.9.17
- Adds new prediction algorithm GainLoss
- Adds new internal C implementation of dendrograms, significantly faster than R dendrograms
- ProtWeaver methods Behdenna and GainLoss can now infer a species tree when possible
- Updates Jaccard and Hamming methods to use C implementations for distance calculation
- Adds HammingGL method to calculate Hamming distance of gain/loss events
- Minor bugfixes to ProtWeaver methods relating to subsetting
- Updates to various man pages
SynExtend 1.9.16
- Removes flatdendrapply, function was already included in SynExtend
- Minor bugfixes to ProtWeaver
SynExtend 1.9.15
- Edits to SelectByK. The function works as intended, but is still too conservative at false positive removal.
SynExtend 1.9.14
- Adds new function flatdendrapply for more options to apply functions to dendrograms. Function is used in SuperTree.
- Adds new function SuperTree to construct a species tree from a set of gene trees.
- Adds new dataset SuperTreeEx for SuperTree and flatdendrapply examples.
SynExtend 1.9.13
- SelectByK function argument ClusterSelect switched to ClusterScalar. Cluster number selection is now performed by fitting the total within-cluster sum of squares to a right hyperbola and taking the ceiling of the half-max. The scalar allows a user to pick different tolerances to select more or fewer clusters. Plotting behavior updated.
SynExtend 1.9.12
- simMat class now supports empty indexing (s[])
- simMat class now supports logical accession (s[c(T,F,T),])
SynExtend 1.9.11
- Added the function SelectByK that allows for quick removal of false positive predicted pairs based on a relatively simple k-means approach. Function is currently designed for use on the single genome-to-genome pairwise comparison, and not on an all-vs-all many genomes scale, though it may provide acceptable results on that scale.
SynExtend 1.9.10
- New simMat class for dist-like similarity matrices that can be manipulated like base matrices
- Major update to ProtWeaver internals
  - All internal calls use simMat objects whenever possible to decrease memory footprint
  - Note: ContextTree and ProfDCA require matrices internally
- ProtWeb objects now inherit from simMat
- ProtWeb.show and ProtWeb.print now display predictions in a more natural way
- GetProtWebData() deprecated; ProtWeb now inherits as.matrix.simMat and as.data.frame.simMat
- New documentation pages for simMat class
- GetProtWebData documentation page reworked into ProtWeb documentation file.
- Fixes new bug in Method='Hamming' introduced in SynExtend 1.9.9
SynExtend 1.9.9
- Fixes minor bug in Method='Hamming'
- Moves some code around
SynExtend 1.9.8
- Major refactor to file structure of ProtWeaver to make individual files more manageable
- Adds new documentation files for individual prediction streams of predict.ProtWeaver
SynExtend 1.9.7
- BlockReconciliation now returns an object of class PairSummaries.
SynExtend 1.9.6
- Fixes an error where warnings were mistakenly output to the user
SynExtend 1.9.5
- Moves platform-specific files in src/ (originally added by mistake)
SynExtend 1.9.4
- Lots of bugfixes to ResidueMI.ProtWeaver
- predict.ProtWeaver now correctly labels rows/columns with gene names, not numbers
- predict.ProtWeaver now correctly handles Subset arguments
- predict.ProtWeaver(..., Subset=3) will correctly predict for all pairs involving gene 3 (or for any gene x, as long as Subset is a length 1 character or integer vector).
SynExtend 1.9.2
- Adds residue MI method to ProtWeaver
- Various bugfixes for ProtWeaver
SynExtend 1.7.14
- Various improvements for GenRearrScen, improves consistency and output formatting
- Major bugfix for ProtWeaver methods using dendrogram objects
- ProtWeaver now correctly guards against non-bifurcating dendrograms in methods that expect it
SynExtend 1.7.13
- Introduces new ProtWeaver class to predict functional association of genes from COGs or gene trees. This implements many algorithms commonly used in the literature, such as MirrorTree and Inverse Potts Models.
- predict(ProtWeaverObject) returns a ProtWeb class with information on predicted associations.
- Adds BlastSeqs to run BLAST queries on sequences stored as an XStringSet or FASTA file.
SynExtend 1.7.12
- Updates to ExtractBy function. Methods and inputs have been simplified and adjusted, and speed is significantly improved.
SynExtend 1.7.11
- Updated NucleotideOverlaps to correctly register hits in genes with a large degree of overlap with the immediately preceding gene.
- Fixed aberrant behavior in BlockExpansion where contigs with zero features could cause an error in expansion attempts.
SynExtend 1.7.10
- BlockReconciliation now allows for setting either block size or mean PID for reconciliation precedence.
SynExtend 1.7.9
- Added retention thresholds to BlockReconciliation.
SynExtend 1.7.8
- BlockExpansion cases corrected for zero added rows.
SynExtend 1.7.7
- Improvements to BlockExpansion and BlockReconciliation functions.
SynExtend 1.7.5
- Began integration of DECIPHER's ScoreAlignment function.
SynExtend 1.7.4
- Fixed a bug in PairSummaries function.
SynExtend 1.7.3
- Added BlockExpansion function.
SynExtend 1.7.2
- Adjustment in how PairSummaries handles default translation tables and GFF derived gene calls.
SynExtend 1.7.1
- Large changes under the hood to PairSummaries.
- Failure to accurately assign neighbors in some cases should now be fixed.
- Extraction of genomic features is now faster.
- OffSetsAllowed argument now defaults to FALSE. This argument may be dropped in the future in favor of a more complex function post-summary.
- Small edits to SequenceSimilarity
SynExtend 1.5.4
- Added the function SubSetPairs that allows for easy trimming of predicted pairs based on conflicting predictions and / or prediction statistics.
- Added the function EstimageGenomeRearrangements that generates rearrangement scenarios of large scale genomic events using the double cut and join model.
SynExtend 1.5.3
- Added the function SequenceSimilarity and made improvements to runtime in DisjointSet.
SynExtend 1.4.1
- Fixed a small bug in consensus scores in PairSummaries where features on different strands had their score computed incorrectly.
SynExtend 1.3.15
- Changes to consensus score in PairSummaries.
SynExtend 1.3.14
- Major changes to the PairSummaries function and minor changes to NucleotideOverlaps, ExtractBy, and FindSets. Adjustments to the model that PairSummaries calls on to predict PIDs.
SynExtend 1.3.13
- ExtractBy function has been added. Allows extraction of feature sequences into XStringSets organized by a PairSummaries object or the single linkage clusters implied by pairings within the PairSummaries object.
- DisjointSet function added to extract single linkage clusters from a PairSummaries object.
SynExtend 1.3.12
- PairSummaries now computes 4-mer distance between predicted pairs.
SynExtend 1.3.11
- PairSummaries now returns a column titled Adjacent that provides the number of directly adjacent neighbor pairs to a predicted pair. Gap filling code adjusted.
- The function FindSets has been added and performs single linkage clustering on a pairs list as represented by vectors of integers using the Union-Find algorithm. Long term this function will have a larger wrapper function for user ease of access but will remain exposed.
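
The Union-Find approach to single linkage clustering described for FindSets can be sketched as follows. This is a generic Python illustration under the assumption that the pairs list is given as two parallel integer vectors; the function name find_sets and its return format are hypothetical and do not reflect the R function's actual signature or output.

```python
def find_sets(p1, p2):
    """Group ids into single linkage clusters from two parallel integer vectors."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Walk up to the representative of x's set.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Union step: each pair merges the sets containing its two members.
    for a, b in zip(p1, p2):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Collect members by representative.
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


# Pairs (1,2), (2,3) and (5,6) imply two clusters.
result = find_sets([1, 2, 5], [2, 3, 6])
```

Any two ids connected through a chain of pairs end up in the same cluster, which is exactly the single linkage criterion.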
SynExtend 1.3.10
- NucleotideOverlap now passes its GeneCalls object forward, allowing PairSummaries to forego inclusion of that object as an argument.
SynExtend 1.3.9
- Minor vignette and suggested package changes.
SynExtend 1.3.8
- PairSummaries now allows users to fill in specific matching gaps in blocks of predicted pairs with the arguments AllowGaps and OffSetsAllowed.
SynExtend 1.3.7
- Adjustments to progress bars in both PairSummaries and NucleotideOverlap.
- PID prediction models in PairSummaries adjusted.
SynExtend 1.3.6
- Contig name matching has been implemented. The scheme expects users to follow NCBI contig naming and gff formats, accepting contig names from gffs directly, and removing the first whitespace and everything thereafter from FASTA headers. Contig name matching can be disabled with the argument AcceptContigNames, but ensuring that the correct contigs in GeneCalls objects are matched to the appropriate contigs in Synteny objects is then the user's responsibility.
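
The FASTA header trimming rule described above (drop the first whitespace and everything after it) can be illustrated with a short Python sketch. The function name contig_name and the example header are hypothetical; the package's own matching code is not shown in this file.

```python
import re


def contig_name(fasta_header):
    """Return the header text before the first whitespace character."""
    header = fasta_header.lstrip(">")  # drop the FASTA record marker if present
    return re.split(r"\s", header, maxsplit=1)[0]


# Hypothetical NCBI-style header; only the accession-like token is kept.
name = contig_name(">NZ_EXAMPLE01.1 Example organism strain X chromosome")
```

The trimmed name is what would be compared against the contig names taken directly from the gff.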
SynExtend 1.3.5
- PairSummaries now translates sequences based on transl_table attributes provided by gene calls
- PairSummaries now uses a generic model for predicting PID
- gffToDataFrame now parses out the transl_table attribute
SynExtend 1.3.2
- Minor changes to NucleotideOverlap
- Major changes to PairSummaries: it can now take in objects of class Genes built by the DECIPHER function FindGenes()
SynExtend 0.99.30
- Vignette and help files edited for clarity
SynExtend 0.99.1
- SynExtend submitted to Bioconductor
- Added function gffToDataFrame
- Added function NucleotideOverlap
- Added function PairSummaries