| Title: | Fast K-Nearest Neighbor Search for Single-Cell Analysis |
|---|---|
| Description: | Drop-in replacement for BiocNeighbors::findKNN using the jvecfor Java library, which builds on the jvector library to leverage the Java Vector API for portable SIMD acceleration across AVX2, AVX-512, and ARM NEON hardware. jvecfor/jvector implements HNSW-DiskANN approximate search and VP-tree exact search. The package achieves approximately 2x speedup over Annoy-based search at n >= 50K cells while returning output structurally identical to BiocNeighbors, making it suitable for seamless integration into existing Bioconductor single-cell workflows. Convenience wrappers delegate shared nearest-neighbor (SNN) and k-nearest-neighbor (KNN) graph construction to the bluster package. |
| Authors: | Anestis Gkanogiannis [aut, cre] (ORCID: <https://orcid.org/0000-0002-6441-0688>) |
| Maintainer: | Anestis Gkanogiannis <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.0 |
| Built: | 2026-05-30 07:51:51 UTC |
| Source: | https://github.com/bioc/jvecfor |
Drop-in replacement for BiocNeighbors::findKNN using the jvecfor
Java library (HNSW-DiskANN approximate and VP-tree exact methods). Achieves
approximately 2x speedup over Annoy-based search at n >= 50K cells.
Convenience wrappers delegate SNN/KNN graph construction to the bluster
package.
fastFindKNN: KNN search; returns index and distance matrices.
fastMakeSNNGraph: KNN -> SNN graph via bluster.
fastMakeKNNGraph: KNN -> KNN graph via bluster.
JvecforParam: BiocNeighbors parameter class for drop-in integration with scran, scater, etc.
jvecfor_setup: Install a custom jvecfor JAR.
jvecfor.verbose: Logical. Enable Java/jvecfor progress logging globally. Default FALSE.
jvecfor.jar: Character. Path to a custom jvecfor JAR file. Overrides the bundled JAR in inst/java/.
Maintainer: Anestis Gkanogiannis [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/gkanogiannis/jvecfor/issues
Drop-in replacement for BiocNeighbors::findKNN using the jvecfor
Java library. Supports HNSW-DiskANN approximate search
(type="ann") and VP-tree exact search (type="knn").

    fastFindKNN(
      X,
      k = 15L,
      type = c("ann", "knn"),
      metric = c("euclidean", "cosine", "dot_product"),
      num.threads = NULL,
      BPPARAM = BiocParallel::bpparam(),
      ef.search = 0L,
      M = 16L,
      oversample.factor = 1,
      pq.subspaces = 0L,
      get.distance = TRUE,
      verbose = getOption("jvecfor.verbose", FALSE)
    )


| Argument | Description |
|---|---|
| X | A numeric matrix, … |
| k | Integer. Number of nearest neighbors to find (excluding self). Default 15. |
| type | Character. "ann" (HNSW-DiskANN approximate search, default) or "knn" (VP-tree exact search). |
| metric | Character. Distance metric: "euclidean" (default), "cosine", or "dot_product". |
| num.threads | Integer or NULL. Number of Java threads. If NULL, defaults to … |
| BPPARAM | A … |
| ef.search | Integer. HNSW-DiskANN beam width override (0 = auto: … |
| M | Integer. HNSW-DiskANN maximum connections per node. Higher values (e.g. 32) improve recall for high-dimensional data at the cost of more memory and a slower build. Only meaningful when … |
| oversample.factor | Numeric. Oversampling multiplier for the ANN beam width. When > 1.0, fetches … |
| pq.subspaces | Integer. Number of Product Quantization subspaces for approximate ANN scoring (0 = disabled). Typical value: … |
| get.distance | Logical. Return distance matrix alongside index? Default TRUE. |
| verbose | Logical. Pass … |

A named list:
index: n x k integer matrix of 1-indexed neighbor indices.
distance: n x k numeric matrix of distances/similarities, or NULL if get.distance=FALSE.

    set.seed(42)
    X <- matrix(rnorm(200), nrow = 20, ncol = 10)

    # Full examples require Java >= 20 on PATH
    nn <- fastFindKNN(X, k = 3)
    dim(nn$index)     # 20 x 3
    dim(nn$distance)  # 20 x 3

    # High-recall HNSW-DiskANN with wider beam and more connections
    nn2 <- fastFindKNN(X, k = 3, M = 32, ef.search = 100, oversample.factor = 2.0)

    # Dot-product similarity (ANN only)
    nn3 <- fastFindKNN(X, k = 3, metric = "dot_product")

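To judge how close an approximate result is to the exact answer, it can help to have an exact reference that needs no Java. The following is a minimal base-R sketch, not part of jvecfor: `brute_force_knn()` and `knn_recall()` are hypothetical helper names, the search is Euclidean only, and the returned list merely mimics the `index`/`distance` shape described above.

```r
# Hypothetical helper: exact Euclidean KNN by brute force (base R only).
# Returns a list shaped like the fastFindKNN value: index and distance, both n x k.
brute_force_knn <- function(X, k) {
  stopifnot(is.matrix(X), k >= 1, k < nrow(X))
  D <- as.matrix(dist(X))   # n x n Euclidean distance matrix
  diag(D) <- Inf            # exclude each point from its own neighbor list
  n <- nrow(X)
  index <- matrix(0L, n, k)
  distance <- matrix(0, n, k)
  for (i in seq_len(n)) {
    ord <- order(D[i, ])[seq_len(k)]  # k closest points, nearest first
    index[i, ] <- ord
    distance[i, ] <- D[i, ord]
  }
  list(index = index, distance = distance)
}

# Hypothetical helper: fraction of exact neighbors that also appear in the
# approximate result, pooled over all rows.
knn_recall <- function(approx_index, exact_index) {
  stopifnot(identical(dim(approx_index), dim(exact_index)))
  hits <- vapply(seq_len(nrow(exact_index)), function(i) {
    length(intersect(approx_index[i, ], exact_index[i, ]))
  }, integer(1))
  sum(hits) / length(exact_index)
}

set.seed(42)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
exact <- brute_force_knn(X, k = 3)
self_recall <- knn_recall(exact$index, exact$index)

# With Java available, an approximate result could be compared like this:
# nn <- fastFindKNN(X, k = 3)
# knn_recall(nn$index, exact$index)
```

Brute force costs O(n^2) memory and time, so this check is only practical on a subsample of a large dataset.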
Convenience wrapper that calls fastFindKNN then delegates graph
construction to bluster::neighborsToKNNGraph.

    fastMakeKNNGraph(
      X,
      k = 15L,
      type = c("ann", "knn"),
      metric = c("euclidean", "cosine", "dot_product"),
      num.threads = NULL,
      BPPARAM = BiocParallel::bpparam(),
      ef.search = 0L,
      M = 16L,
      oversample.factor = 1,
      pq.subspaces = 0L,
      verbose = getOption("jvecfor.verbose", FALSE),
      directed = FALSE,
      ...
    )


| Argument | Description |
|---|---|
| X | A numeric matrix, … |
| k | Integer. Number of nearest neighbors. Default 15. |
| type | Character. … |
| metric | Character. … |
| num.threads | Integer or NULL. Number of Java threads. If NULL, defaults to … |
| BPPARAM | A … |
| ef.search | Integer. HNSW-DiskANN beam width override (0 = auto). Default 0L. |
| M | Integer. HNSW-DiskANN max connections per node. Default 16L. |
| oversample.factor | Numeric. Oversampling multiplier. Default 1.0. |
| pq.subspaces | Integer. PQ subspaces (0 = disabled). Default 0L. |
| verbose | Logical. Enable Java verbose logging. Default … |
| directed | Logical. Build directed graph? Default FALSE. |
| ... | Additional arguments forwarded to … |

An igraph object (KNN graph).

    set.seed(42)
    X <- matrix(rnorm(5000), nrow = 100, ncol = 50)

    # Full examples require Java >= 20 on PATH
    g <- fastMakeKNNGraph(X, k = 10)
    igraph::vcount(g)  # 100

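The relationship between a neighbor index matrix and a KNN graph can be illustrated without igraph or bluster. The sketch below is an assumption-laden illustration, not the bluster implementation: `knn_adjacency()` is a hypothetical helper, and the undirected case is assumed to connect two points whenever either one lists the other as a neighbor.

```r
# Hypothetical helper: n x n 0/1 adjacency matrix from an n x k index matrix.
knn_adjacency <- function(index, directed = FALSE) {
  n <- nrow(index)
  A <- matrix(0L, n, n)
  # Row i of `index` lists the neighbors of point i, giving edges i -> index[i, j].
  # as.vector() walks the matrix column by column, so the row ids repeat per column.
  from <- rep(seq_len(n), times = ncol(index))
  A[cbind(from, as.vector(index))] <- 1L
  if (!directed) {
    A <- pmax(A, t(A))  # assumed rule: keep an edge if either direction exists
  }
  A
}

# Toy index matrix: 4 points, k = 2 (values are 1-indexed row numbers).
index <- rbind(c(2L, 3L),
               c(1L, 3L),
               c(4L, 2L),
               c(3L, 2L))
A_dir <- knn_adjacency(index, directed = TRUE)
A_und <- knn_adjacency(index, directed = FALSE)
```

In the directed matrix every row sums to k; the undirected matrix is symmetric and can have rows with more than k neighbors.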
Convenience wrapper that calls fastFindKNN then delegates graph
construction to bluster::neighborsToSNNGraph.

    fastMakeSNNGraph(
      X,
      k = 15L,
      type = c("ann", "knn"),
      metric = c("euclidean", "cosine", "dot_product"),
      num.threads = NULL,
      BPPARAM = BiocParallel::bpparam(),
      ef.search = 0L,
      M = 16L,
      oversample.factor = 1,
      pq.subspaces = 0L,
      verbose = getOption("jvecfor.verbose", FALSE),
      snn.type = "rank",
      ...
    )


| Argument | Description |
|---|---|
| X | A numeric matrix, … |
| k | Integer. Number of nearest neighbors. Default 15. |
| type | Character. … |
| metric | Character. … |
| num.threads | Integer or NULL. Number of Java threads. If NULL, defaults to … |
| BPPARAM | A … |
| ef.search | Integer. HNSW-DiskANN beam width override (0 = auto). Default 0L. |
| M | Integer. HNSW-DiskANN max connections per node. Default 16L. |
| oversample.factor | Numeric. Oversampling multiplier. Default 1.0. |
| pq.subspaces | Integer. PQ subspaces (0 = disabled). Default 0L. |
| verbose | Logical. Enable Java verbose logging. Default … |
| snn.type | Character. SNN weighting scheme passed to … |
| ... | Additional arguments forwarded to … |

An igraph object (weighted, undirected SNN graph).

    set.seed(42)
    X <- matrix(rnorm(5000), nrow = 100, ncol = 50)

    # Full examples require Java >= 20 on PATH
    g <- fastMakeSNNGraph(X, k = 10)
    igraph::vcount(g)  # 100

    # Higher recall with larger beam and more connections
    g2 <- fastMakeSNNGraph(X, k = 10, M = 32, oversample.factor = 2.0)

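For intuition about what a shared-nearest-neighbor weight measures, the sketch below counts shared neighbors directly from an index matrix. It is an illustration only: `snn_shared_counts()` is a hypothetical helper, and it uses a simple "number of shared neighbors" weight (each point counted in its own neighbor set), which is a different and simpler scheme than the "rank" default shown in the usage above. It is not the bluster implementation.

```r
# Hypothetical helper: pairwise count of shared neighbors from an n x k index matrix.
snn_shared_counts <- function(index) {
  n <- nrow(index)
  # member[i, j] = 1 if point j is in the neighbor set of point i.
  member <- matrix(0L, n, n)
  from <- rep(seq_len(n), times = ncol(index))
  member[cbind(from, as.vector(index))] <- 1L
  diag(member) <- 1L          # assumption: each point belongs to its own set
  W <- member %*% t(member)   # W[i, j] = size of the intersection of sets i and j
  diag(W) <- 0                # no self-edges
  W
}

# Toy index matrix: 4 points, k = 2 (values are 1-indexed row numbers).
index <- rbind(c(2L, 3L),
               c(1L, 3L),
               c(4L, 2L),
               c(3L, 2L))
W <- snn_shared_counts(index)
```

Under this scheme a weight ranges from 0 to k + 1; points 1 and 2 in the toy data share all three members of their sets, while points 1 and 4 share two.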
Copies a custom jvecfor JAR to the user data directory
(tools::R_user_dir("jvecfor", "data")). The bundled JAR in
inst/java/ is used by default; call jvecfor_setup() only
to override it with a custom build. Alternatively, set
options(jvecfor.jar = "/path/to/jvecfor.jar") for a session-level
override without copying.

    jvecfor_setup(jar_path = NULL)


| Argument | Description |
|---|---|
| jar_path | Path to the jvecfor JAR. If … |

Invisibly returns the path to the installed JAR.

    # Show where custom JARs are stored
    tools::R_user_dir("jvecfor", "data")

    # List bundled JARs (pattern is a regular expression, not a glob)
    dir(system.file("java", package = "jvecfor"), pattern = "\\.jar$")

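The override mechanisms described above (session option, user data directory, bundled JAR) suggest a lookup order. The sketch below shows one plausible order; it is an assumption, not the package's actual logic. `resolve_jvecfor_jar()` is a hypothetical helper, and checking the user data directory before the bundled directory is assumed from the statement that a custom JAR overrides the bundled one.

```r
# Hypothetical helper: choose a JAR path, checking (1) the jvecfor.jar option,
# (2) the user data directory, (3) the bundled inst/java directory.
resolve_jvecfor_jar <- function(
    bundled_dir = system.file("java", package = "jvecfor"),
    user_dir = tools::R_user_dir("jvecfor", "data")) {
  opt <- getOption("jvecfor.jar", default = NULL)
  if (!is.null(opt)) {
    if (!file.exists(opt)) {
      stop("Option jvecfor.jar points to a missing file: ", opt)
    }
    return(normalizePath(opt))
  }
  # system.file() returns "" when the package is not installed, hence nzchar().
  for (d in c(user_dir, bundled_dir)) {
    if (nzchar(d) && dir.exists(d)) {
      jars <- list.files(d, pattern = "\\.jar$", full.names = TRUE)
      if (length(jars) > 0) {
        return(jars[[1]])
      }
    }
  }
  stop("No jvecfor JAR found")
}

# Demonstrate the session-level override with a placeholder file.
tmp_jar <- tempfile(fileext = ".jar")
file.create(tmp_jar)
old <- options(jvecfor.jar = tmp_jar)
chosen <- resolve_jvecfor_jar()
options(old)  # restore the previous option value
```

The option is restored at the end so the override lasts only for the demonstration.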
A BiocNeighborIndex
subclass storing the data matrix and JvecforParam parameters.
The actual Java HNSW/VP-tree index is built on-the-fly when
findKNN is called.
data: Numeric matrix (rows = observations, cols = features).
param: A JvecforParam object.
names: Row names from the original matrix, or NULL.
A BiocNeighborParam
subclass for the jvecfor Java backend.
Passing a JvecforParam object as the BNPARAM argument to
findKNN or higher-level functions
(e.g. scran::buildSNNGraph, scater::runUMAP) routes
neighbor search through jvecfor's HNSW-DiskANN or VP-tree engine.

    JvecforParam(
      type = "ann",
      distance = "Euclidean",
      M = 16L,
      ef.search = 0L,
      oversample.factor = 1,
      pq.subspaces = 0L,
      verbose = FALSE
    )

    ## S4 method for signature 'JvecforParam'
    show(object)

    ## S4 method for signature 'JvecforParam'
    buildIndex(X, BNPARAM, transposed = FALSE, ...)

    ## S4 method for signature 'JvecforIndex'
    findKnnFromIndex(
      BNINDEX,
      k,
      get.index = TRUE,
      get.distance = TRUE,
      num.threads = 1,
      subset = NULL,
      ...
    )


| Argument | Description |
|---|---|
| type | Character. … |
| distance | Character. … |
| M | Integer. HNSW max connections per node. Default 16L. |
| ef.search | Integer. HNSW beam width (0 = auto). Default 0L. |
| oversample.factor | Numeric. Oversampling multiplier. Default 1.0. |
| pq.subspaces | Integer. PQ subspaces (0 = disabled). Default 0L. |
| verbose | Logical. Java progress logging. Default FALSE. |
| object | A … |
| X | A numeric matrix (rows = observations, cols = features). |
| BNPARAM | A … |
| transposed | Logical. If TRUE, … |
| ... | Ignored. |
| BNINDEX | A … |
| k | Integer. Number of nearest neighbors. |
| get.index | Logical. Return index matrix? Default TRUE. |
| get.distance | Logical. Return distance matrix? Default TRUE. |
| num.threads | Integer. Thread count. Default 1. |
| subset | Integer vector. Row indices to return results for. All rows are computed; this filters the output. Default NULL (all rows). |

JvecforParam() returns a JvecforParam object.
buildIndex() returns a JvecforIndex object.
findKnnFromIndex() returns a named list with index (n-by-k integer matrix or
NULL) and distance (n-by-k numeric matrix or NULL).
show(JvecforParam): Print a summary of the parameter object.
buildIndex(JvecforParam): Build a JvecforIndex from a data matrix.
findKnnFromIndex(JvecforIndex): Find k-nearest neighbors using a
JvecforIndex.
JvecforParam(): Constructor for JvecforParam objects.
type: Character. "ann" (HNSW-DiskANN, default) or "knn" (VP-tree exact).
M: Integer. HNSW max connections per node. Default 16L.
ef.search: Integer. HNSW beam width (0 = auto). Default 0L.
oversample.factor: Numeric. Oversampling multiplier (>= 1). Default 1.0.
pq.subspaces: Integer. Product-quantization subspaces (0 = disabled). Default 0L.
verbose: Logical. Enable Java progress logging. Default FALSE.
Supported distances: "Euclidean" and "Cosine" (title-case, following
BiocNeighbors convention). The jvecfor-specific "dot_product"
metric is only available via fastFindKNN directly.
queryKNN is not supported. The Java backend performs
self-KNN only (all points query against all points in a single
JVM invocation).
The index built by buildIndex stores the data matrix
in R memory; the actual Java HNSW/VP-tree index is rebuilt each
time findKNN is called.
See also fastFindKNN for the standalone function with
full parameter control, including the dot_product metric.

    library(BiocNeighbors)
    p <- JvecforParam()
    p

    # Custom parameters
    p2 <- JvecforParam(type = "knn", distance = "Cosine", M = 32L)

    # Use with BiocNeighbors (requires Java >= 20):
    # res <- findKNN(X, k = 10, BNPARAM = JvecforParam())
