Removed almost all C/C++ code, so compiling on different platforms is easier
New LSH algorithm based on Annoy library produced by Spotify (https://github.com/spotify/annoy).
Performance tests are now based on Rank Based Overlap measure (http://www.williamwebber.com/research/papers/wmz10_tois.pdf).
LSH parameters have changed. See man pages for eiQuery and eiCluster for details
loadLSHData and freeLSHData are deprecated and no longer do anything. See man pages for details.
eiCluster can now cluster subsets of database
use new features in ChemmineR to store duplicate descriptors only once.
embedded descriptors now stored in database and the matrix file is written out only as needed by LSH to create an index.
Database schema changes make this version incompatible with version 1.2 an earlier. Existing databases will need to be re-loaded.
speed improvements
ieInit now accepts a SNOW cluster for parallel inserts
allow compounds to be updated in-place by name
eiQuery can now return similarity values instead of distances
The eiR packages introduces efficient methods for accelerating structure similarity searches and clustering of very large compound datasets. The acceleration is achieved by applying embedding and indexing techniques to represent chemical compounds in a high-dimensional Euclidean space and to employ ultra-fast pre-screening of the compound dataset using the LSH-assisted nearest neighbor search in the embedding space. This method can drastically reduce the search time of large databases, by a factor of 40–200 fold when searching for the 100 closest compounds to a query.