buildNetwork() now scales to large, clonally expanded repertoires. The C++ engine was rewritten around candidate generation instead of all-pairs comparison, with no new dependencies and identical edge output for the default settings:
2k. Hamming uses pigeonhole segment indexing. Both fall back to length-blocked comparison outside their range.
Speedups range from roughly 2x on mixed repertoires with many V genes to about 70x when one block holds many unique sequences at a tight threshold.
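As a rough illustration of the pigeonhole idea behind the Hamming path (a sketch, not the package's actual engine code): two equal-length sequences within Hamming distance k must agree exactly on at least one of k + 1 aligned segments, so indexing sequences by segment yields a small candidate set that is then verified with a full comparison. The names `boundedHamming` and `hammingEdges` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hamming distance with an early exit: counting stops once the number
// of mismatches exceeds `threshold`. Unequal lengths are treated as
// "too far" by returning threshold + 1.
int boundedHamming(const std::string& a, const std::string& b, int threshold) {
    if (a.size() != b.size()) return threshold + 1;
    int mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i] && ++mismatches > threshold) return mismatches;
    }
    return mismatches;
}

// Returns every index pair (i < j) with Hamming distance <= threshold.
// Hypothetical helper for illustration; not part of any package API.
std::vector<std::pair<int, int>> hammingEdges(const std::vector<std::string>& seqs,
                                              int threshold) {
    std::vector<std::pair<int, int>> edges;
    if (threshold < 0) return edges;
    const std::size_t parts = static_cast<std::size_t>(threshold) + 1;

    // Bucket key = length, segment index, and segment content, so only
    // equal-length sequences that agree on an aligned segment collide.
    std::unordered_map<std::string, std::vector<int>> buckets;
    for (int i = 0; i < static_cast<int>(seqs.size()); ++i) {
        const std::string& s = seqs[i];
        const std::size_t len = s.size();
        for (std::size_t p = 0; p < parts; ++p) {
            const std::size_t begin = len * p / parts;
            const std::size_t end = len * (p + 1) / parts;
            buckets[std::to_string(len) + ":" + std::to_string(p) + ":" +
                    s.substr(begin, end - begin)].push_back(i);
        }
    }

    // Verify each candidate pair once; pairs that share several buckets
    // are deduplicated through `seen`.
    std::set<std::pair<int, int>> seen;
    for (const auto& bucket : buckets) {
        const std::vector<int>& ids = bucket.second;  // ascending by construction
        for (std::size_t a = 0; a < ids.size(); ++a) {
            for (std::size_t b = a + 1; b < ids.size(); ++b) {
                const std::pair<int, int> candidate(ids[a], ids[b]);
                if (!seen.insert(candidate).second) continue;
                if (boundedHamming(seqs[candidate.first], seqs[candidate.second],
                                   threshold) <= threshold) {
                    edges.push_back(candidate);
                }
            }
        }
    }
    std::sort(edges.begin(), edges.end());
    return edges;
}
```

In this sketch, sequences that share no segment are never compared at all, which is where the savings over all-pairs comparison come from. The candidate set is a superset of the true neighbors, so the verification step keeps the result exact.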
Added expand argument to buildNetwork(). "clique" (default) materializes every pairwise edge and reproduces the exact edge multiplicity that community-detection clustering expects. "star" links identical sequences through a single hub and connects related groups hub to hub. "star" produces far fewer edges and preserves connected components exactly, so it is a large memory win when the downstream step depends only on connectivity rather than edge multiplicity.
inferCDR() reference validation check
getIMGT() now uses immReferent as a backend for downloading and caching IMGT reference sequences
Removed frame, max.retries, and verbose parameters from getIMGT()
Added refresh parameter to getIMGT() for controlling cache behavior
Removed hash, httr, and rvest package dependencies (now handled by immReferent)
Replaced .parseSpecies() with .mapSpecies() for immReferent species name compatibility
propertyEncoder() function when the suggested Peptides package is not installed.
The buildNetwork() function has been significantly enhanced to include:
Multiple distance metrics: Added support for five distance metrics via the dist_type parameter:
  "levenshtein" (default): Standard edit distance
  "hamming": Substitutions only (requires equal-length sequences)
  "damerau": Levenshtein with transpositions
  "nw": Needleman-Wunsch global alignment
  "sw": Smith-Waterman local alignment
Flexible normalization: New normalize parameter supports three modes:
  "none" (default): Raw distance values (backward compatible)
  "maxlen": Normalize by max(length(seq1), length(seq2))
  "length": Normalize by mean sequence length
buildNetwork() returns a symmetric distance matrix and drops nonstructural zeros
propertyEncoder() and onehotEncoder() to use C++ backend with encodeSequences.cpp
buildNetwork() via C++ integration (fastEditEdges.cpp)
calculateMotif() use C++ (calculateMotif.cpp)
generateSequences(), mutateSequences(), adjacencyMatrix(), tokenizeSequences(), geometricEncoder
getIR()
variationalSequences()
getIMGT() for website functioning
calculateEntropy() added to calculate the positional entropy along a biological sequence
calculateEntropy()
calculateFrequency() added to calculate the positional frequency along a biological sequence
calculateMotif() added to get motif quantification of sequences
calculateGeneUsage() added for single/paired gene enumeration
scaleMatrix() for comprehensive scale/transformation functions
summaryMatrix() for fast summarization of matrix values
inferCDR()
generateSequence() range issue for single-length sequences
positionalEncoder()
buildNetwork() for relative thresholding returns relative value
getIMGT() checks for availability of IMGT website
variationalSequences() evaluate presence of Keras
variationalSequences() example
variationalSequences()
propertyEncoder()
variationalSequences()
variationalSequences()
getIMGT()
getIMGT()
getIMGT() to mention IMGT license on first use
positionalEncoder()
formatGenes() and inferCDR() for new example data