| Title: | Methods for BCR single-cell embedding |
|---|---|
| Description: | Implementation of the Ibex algorithm for single-cell embedding based on BCR sequences. The package includes a standalone function to encode BCR sequence information by amino acid properties or sequence order using tensorflow-based autoencoder. In addition, the package interacts with SingleCellExperiment or Seurat data objects. |
| Authors: | Nick Borcherding [aut, cre, cph], Qile Yang [ctb] (ORCID: <https://orcid.org/0009-0005-0148-2499>) |
| Maintainer: | Nick Borcherding <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.1 |
| Built: | 2026-05-29 22:07:41 UTC |
| Source: | https://github.com/bioc/Ibex |
Ibex implements methods for embedding B-cell receptor (BCR) sequences from single-cell assays into a continuous latent space. It supports amino-acid property–based and sequence-order encodings via a TensorFlow autoencoder, and interoperates with common single-cell containers such as SingleCellExperiment and SeuratObject.
Key features
Encode BCR sequence information using biochemical properties or raw sequence order (TensorFlow autoencoder).
Interoperate with SingleCellExperiment and SeuratObject for downstream analysis and visualization.
Utilities for loading pretrained models and managing dependencies in an isolated basilisk environment.
Getting started
browseVignettes("Ibex")
Models and caching
Pretrained encoders can be retrieved with aa.model.loader(), which
validates against internal metadata and caches downloaded artifacts; see the
function help for cache location and behavior.
Python/TensorFlow note
Ibex uses basilisk to provision an isolated Python environment at runtime; no manual setup is usually required.
Maintainer: Nick Borcherding [email protected] [copyright holder]
Other contributors:
Qile Yang [email protected] (ORCID) [contributor]
https://github.com/BorchLab/Ibex
https://github.com/BorchLab/Ibex/issues
This function enhances BCR processing by incorporating additional
sequence information from CDR1 and CDR2 regions before applying the BCR
combination logic. The function depends on
scRepertoire::combineBCR().
combineExpandedBCR(
  input.data,
  samples = NULL,
  ID = NULL,
  call.related.clones = TRUE,
  threshold = 0.85,
  removeNA = FALSE,
  removeMulti = FALSE,
  filterMulti = TRUE,
  filterNonproductive = TRUE
)
| Argument | Description |
|---|---|
| input.data | List of filtered contig annotations. |
| samples | Character vector. Labels of samples (required). |
| ID | Character vector. Additional sample labeling (optional). |
| call.related.clones | Logical. Whether to call related clones based on nucleotide sequence and V gene. Default is TRUE. |
| threshold | Numeric. Normalized edit distance for clone clustering. Default is 0.85. |
| removeNA | Logical. Whether to remove any chain without values. Default is FALSE. |
| removeMulti | Logical. Whether to remove barcodes with more than two chains. Default is FALSE. |
| filterMulti | Logical. Whether to select the highest-expressing light and heavy chains. Default is TRUE. |
| filterNonproductive | Logical. Whether to remove nonproductive chains. Default is TRUE. |
A list of consolidated BCR clones with expanded CDR sequences.
# Get Data
ibex_vdj <- get(data("ibex_vdj"))
combined.BCR <- combineExpandedBCR(list(ibex_vdj),
                                   samples = "Sample1",
                                   filterNonproductive = TRUE)
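To give a feel for what a normalized edit-distance threshold such as 0.85 means, here is a minimal base R sketch. It assumes the threshold acts as a minimum normalized similarity, computed as 1 minus the Levenshtein distance divided by the length of the longer sequence. That normalization, the toy sequences, and the variable names are illustrative assumptions and not the logic scRepertoire::combineBCR() actually uses.

```r
# Toy nucleotide sequences (made up for illustration only)
seqs <- c(a = "TGTGCGAGAGATTACTGG",
          b = "TGTGCGAGAGATTATTGG",
          c = "TGTGCAAAAGGGGGCTACTTTGACTACTGG")

# Pairwise Levenshtein distances between all sequences
d <- utils::adist(seqs)

# Normalize by the longer sequence in each pair (assumed normalization)
max.len <- outer(nchar(seqs), nchar(seqs), pmax)
similarity <- 1 - d / max.len
dimnames(similarity) <- list(names(seqs), names(seqs))

# Pairs at or above the threshold would be candidates for the same clone
threshold <- 0.85
related <- similarity >= threshold
print(round(similarity, 3))
print(related)
```

In this toy input, sequences a and b differ by a single substitution and pass the threshold, while c is much longer and does not.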
This function generates a single-cell object with a reduced representation of RNA expression by clone. The approach is inspired by the method introduced in CoNGA. Users can generate either a mean representation of features by clone or identify a representative cell using count-based minimal Euclidean distance. Please read and cite the original work by the authors of CoNGA.
CoNGAfy(
  input.data,
  method = "dist",
  features = NULL,
  assay = "RNA",
  meta.carry = c("CTaa", "CTgene")
)
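| Argument | Description |
|---|---|
| input.data | A single-cell dataset in Seurat or SingleCellExperiment format. |
| method | Character. Specifies the method to reduce the dataset: "dist" selects a representative cell per clone using count-based minimal Euclidean distance; "mean" uses the mean representation of features by clone. Default is "dist". |
| features | Character vector. Selected genes for the reduction. Default is NULL. |
| assay | Character. The name of the assay or assays to include in the output. Defaults to the active assay. |
| meta.carry | Character vector. Metadata variables to carry over from the input single-cell object to the output. |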
A reduced single-cell object where each clonotype is represented by a single cell.
# Get Data
ibex_example <- get(data("ibex_example"))
ibex.clones <- CoNGAfy(ibex_example, method = "dist")
ibex.clones <- CoNGAfy(ibex_example, method = "mean")
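The difference between the two reduction methods can be sketched on a toy matrix in base R. This is only an illustration of the idea, not the package's implementation: the matrix, the clone labels, and the choice to define the representative cell as the one with the smallest summed Euclidean distance to the other cells of its clone are assumptions made for this example.

```r
# Toy expression matrix: 2 genes (rows) x 5 cells (columns)
expr <- matrix(c(1, 0,  2, 0,  9, 0,  0, 5,  0, 7),
               nrow = 2,
               dimnames = list(c("g1", "g2"), paste0("cell", 1:5)))

# Hypothetical clone assignment for each cell
clone <- c("A", "A", "A", "B", "B")
cells.by.clone <- split(colnames(expr), clone)

# "mean"-style reduction: average each gene across the cells of a clone
clone.means <- sapply(cells.by.clone, function(cells) {
  rowMeans(expr[, cells, drop = FALSE])
})

# "dist"-style reduction: keep the cell closest to the rest of its clone
representative <- sapply(cells.by.clone, function(cells) {
  d <- as.matrix(dist(t(expr[, cells, drop = FALSE])))
  cells[which.min(rowSums(d))]
})

print(clone.means)
print(representative)
```

The mean variant produces a synthetic profile per clone, whereas the distance variant keeps an observed cell, so its counts remain real measurements.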
This function subsets a Seurat or SingleCellExperiment object,
removing cells where the CTaa column is missing or contains unwanted patterns.
filter.cells(sc.obj, chain)
| Argument | Description |
|---|---|
| sc.obj | A Seurat or SingleCellExperiment object. |
| chain | Character. Specifies the chain type ("Heavy" or "Light"). |
A filtered Seurat or SingleCellExperiment object.
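As a rough illustration of this kind of filtering, the sketch below works on a plain data frame of cell metadata rather than a single-cell object. It assumes that CTaa stores the heavy and light chain sequences joined by an underscore, heavy chain first, and that a missing chain appears as the string "NA". Both conventions, and the toy sequences, are assumptions for this example and may not match what filter.cells() checks internally.

```r
# Toy cell metadata with a CTaa column (values invented for illustration)
meta <- data.frame(
  CTaa = c("CARDYW_CQQYNSYPLTF", NA, "NA_CQQSYSTPWTF", "CARDSSGYW_NA"),
  row.names = paste0("cell", 1:4),
  stringsAsFactors = FALSE
)

chain <- "Heavy"
# Assumed layout: first field is the heavy chain, second the light chain
idx <- if (chain == "Heavy") 1L else 2L

# Extract the requested chain from each CTaa string
parts <- strsplit(meta$CTaa, "_", fixed = TRUE)
chain.seq <- vapply(parts, function(p) {
  if (length(p) >= idx) p[idx] else NA_character_
}, character(1))

# Keep cells that have a usable sequence for the requested chain
keep <- !is.na(chain.seq) & chain.seq != "NA" & nzchar(chain.seq)
filtered <- meta[keep, , drop = FALSE]
print(filtered)
```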
This object includes normalized gene expression values, metadata annotations, and B cell clonotype information derived from 10x V(D)J sequencing. It is intended as a small example dataset for testing and demonstration purposes.
A SingleCellExperiment object with 32,285 genes (rows) and 200 cells (columns).
| Slot | Description |
|---|---|
| assays | List of matrices containing expression values: counts (raw counts) and logcounts (log-transformed). |
| rowData | Empty in this example (no gene-level annotations). |
| colData | A DataFrame with 14 columns of cell metadata, including orig.ident (original sample identity), nCount_RNA (total number of counts per cell), nFeature_RNA (number of detected genes per cell), cloneSize (size of each clone), and ident (cluster assignment). |
| reducedDims | Contains dimensionality reductions: PCA, pca, and apca. |
| altExps | One alternative experiment named BEAM containing additional expression data. |
This function runs the Ibex algorithm to generate latent vectors from input data. The output can be returned as a matrix, with options to choose between deep learning autoencoders or geometric transformations based on the BLOSUM62 matrix.
Ibex_matrix(
  input.data,
  chain = c("Heavy", "Light"),
  method = c("encoder", "geometric"),
  encoder.model = c("CNN", "VAE", "CNN.EXP", "VAE.EXP"),
  encoder.input = c("atchleyFactors", "crucianiProperties", "kideraFactors",
                    "MSWHIM", "tScales", "OHE"),
  geometric.theta = pi/3,
  species = "Human",
  verbose = TRUE
)
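| Argument | Description |
|---|---|
| input.data | Input data, such as a Seurat or SingleCellExperiment object or a character vector of amino acid sequences. |
| chain | Character. Specifies which chain to analyze: "Heavy" or "Light". |
| method | Character. The algorithm to use for generating latent vectors: "encoder" (deep learning autoencoder) or "geometric" (geometric transformation based on the BLOSUM62 matrix). |
| encoder.model | Character. The type of autoencoder model to use: "CNN", "VAE", "CNN.EXP", or "VAE.EXP". |
| encoder.input | Character. Specifies the input features for the encoder model. Options include "atchleyFactors", "crucianiProperties", "kideraFactors", "MSWHIM", "tScales", and "OHE". |
| geometric.theta | Numeric. Angle (in radians) for the geometric transformation. Only used when method = "geometric". Default is pi/3. |
| species | Character. Either "Human" or "Mouse". Default is "Human". |
| verbose | Logical. Whether to print progress messages. Default is TRUE. |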
A matrix of latent vectors generated by the specified method.
immApex::propertyEncoder(),
immApex::geometricEncoder()
# Get Data
ibex_example <- get(data("ibex_example"))

# Using the encoder method with a variational autoencoder
ibex_values <- Ibex_matrix(ibex_example,
                           chain = "Heavy",
                           method = "encoder",
                           encoder.model = "VAE",
                           encoder.input = "atchleyFactors")

# Using the geometric method with a specified angle
ibex_values <- Ibex_matrix(ibex_example,
                           chain = "Heavy",
                           method = "geometric",
                           geometric.theta = pi)

# Using a character vector of amino acid sequences
sequences <- c("CARDYW", "CARDSSGYW", "CARDTGYW")
ibex_values <- Ibex_matrix(sequences,
                           chain = "Heavy",
                           method = "geometric")
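For readers unfamiliar with the "OHE" input option, the following base R sketch shows one common way to one-hot encode amino acid sequences into fixed-width numeric vectors. It is an illustration of the general technique only: the 20-letter alphabet order, the zero-padding to the longest sequence, and the helper name one.hot are assumptions, and the encoding Ibex and immApex actually feed to the autoencoder may differ.

```r
sequences <- c("CARDYW", "CARDSSGYW", "CARDTGYW")

# The 20 standard amino acids (alphabet order is an assumption)
amino.acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
max.len <- max(nchar(sequences))

# Hypothetical helper: one row per position, one column per residue,
# flattened position by position; short sequences are padded with zeros
one.hot <- function(s) {
  residues <- strsplit(s, "")[[1]]
  m <- matrix(0L, nrow = max.len, ncol = length(amino.acids),
              dimnames = list(NULL, amino.acids))
  m[cbind(seq_along(residues), match(residues, amino.acids))] <- 1L
  as.vector(t(m))
}

# One encoded row per input sequence
encoded <- t(vapply(sequences, one.hot,
                    integer(max.len * length(amino.acids))))
print(dim(encoded))
```

Each row contains exactly one 1 per residue, so the row sums equal the sequence lengths; property-based encodings such as Atchley factors replace these indicator columns with a handful of numeric descriptors per residue.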
This dataset contains single-cell V(D)J sequencing annotations from the 10x Genomics BEAM-Ab Mouse dataset. It includes V(D)J gene calls, CDR regions, productivity information, and clonotype assignments for each contig.
A data frame with 6 rows and 35 columns:
| Column | Description |
|---|---|
| barcode | Character. Unique cell barcode. |
| is_cell | Logical. Whether the barcode is identified as a cell. |
| contig_id | Character. Unique identifier for each contig. |
| high_confidence | Logical. Whether the contig is high confidence. |
| length | Integer. Length of the contig. |
| chain | Character. Chain type (e.g., IGH, IGK). |
| v_gene | Character. V gene annotation. |
| d_gene | Character. D gene annotation. |
| j_gene | Character. J gene annotation. |
| c_gene | Character. C gene annotation. |
| full_length | Logical. Whether the contig is full-length. |
| productive | Logical. Whether the contig is productive. |
| fwr1 | Character. Amino acid sequence for Framework Region 1 (FWR1). |
| fwr1_nt | Character. Nucleotide sequence for FWR1. |
| cdr1 | Character. Amino acid sequence for CDR1. |
| cdr1_nt | Character. Nucleotide sequence for CDR1. |
| fwr2 | Character. Amino acid sequence for FWR2. |
| fwr2_nt | Character. Nucleotide sequence for FWR2. |
| cdr2 | Character. Amino acid sequence for CDR2. |
| cdr2_nt | Character. Nucleotide sequence for CDR2. |
| fwr3 | Character. Amino acid sequence for FWR3. |
| fwr3_nt | Character. Nucleotide sequence for FWR3. |
| cdr3 | Character. Amino acid sequence for CDR3. |
| cdr3_nt | Character. Nucleotide sequence for CDR3. |
| fwr4 | Character. Amino acid sequence for FWR4. |
| fwr4_nt | Character. Nucleotide sequence for FWR4. |
| reads | Integer. Number of reads supporting the contig. |
| umis | Integer. Number of UMIs supporting the contig. |
| raw_clonotype_id | Character. Clonotype ID from 10x output. |
| raw_consensus_id | Character. Consensus ID from 10x output. |
| exact_subclonotype_id | Integer. Exact subclonotype grouping. |
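Contig-level columns such as the chain type, productivity flag, and UMI count are the kind of information that options like filterNonproductive and filterMulti in combineExpandedBCR() act on. The base R sketch below shows one plausible version of that filtering on a toy contig table. It assumes the standard 10x column names barcode, chain, umis, and productive, and it assumes that "highest-expressing" means the largest UMI count; neither is confirmed by this manual, and the real selection logic lives in scRepertoire.

```r
# Toy contig table (values invented; column names assumed to follow 10x output)
contigs <- data.frame(
  barcode    = c("AAAC-1", "AAAC-1", "AAAC-1", "TTTG-1", "TTTG-1"),
  chain      = c("IGH", "IGH", "IGK", "IGH", "IGL"),
  umis       = c(12L, 3L, 8L, 5L, 7L),
  productive = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Step 1: drop nonproductive contigs
contigs <- contigs[contigs$productive, ]

# Step 2: group IGH as heavy and IGK/IGL as light
contigs$locus <- ifelse(contigs$chain == "IGH", "Heavy", "Light")

# Step 3: keep the contig with the most UMIs per barcode and locus
contigs <- contigs[order(contigs$barcode, contigs$locus, -contigs$umis), ]
top <- contigs[!duplicated(contigs[, c("barcode", "locus")]), ]
print(top)
```

After filtering, each barcode keeps at most one heavy and one light chain, which is the shape a paired heavy/light clone definition expects.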
This function applies the Ibex algorithm to single-cell data, integrating
seamlessly with Seurat or SingleCellExperiment pipelines. The algorithm
generates latent dimensions using deep learning or geometric transformations,
storing the results in the dimensional reduction slot. runIbex will
automatically subset the single-cell object based on amino acid sequences
present for the given chain selection.
runIbex(
  sc.data,
  chain = "Heavy",
  method = "encoder",
  encoder.model = "VAE",
  encoder.input = "atchleyFactors",
  geometric.theta = pi,
  reduction.name = "Ibex",
  species = "Human",
  verbose = TRUE
)
| Argument | Description |
|---|---|
| sc.data | A single-cell dataset, which can be a Seurat or SingleCellExperiment object. |
| chain | Character. Specifies the chain to analyze: "Heavy" or "Light". Default is "Heavy". |
| method | Character. Algorithm to use for generating latent dimensions: "encoder" (deep learning autoencoder) or "geometric" (geometric transformation). Default is "encoder". |
| encoder.model | Character. The type of autoencoder model to use. Default is "VAE". |
| encoder.input | Character. Input features for the encoder model. Default is "atchleyFactors". |
| geometric.theta | Numeric. Angle (in radians) for geometric transformation. Used only when method = "geometric". Default is pi. |
| reduction.name | Character. The name to assign to the dimensional reduction. This is useful for running Ibex with multiple parameter settings and saving results under different names. Default is "Ibex". |
| species | Character. Either "Human" or "Mouse". Default is "Human". |
| verbose | Logical. Whether to print progress messages. Default is TRUE. |
An updated Seurat or SingleCellExperiment object with Ibex dimensions added to the dimensional reduction slot.
# Get Data
ibex_example <- get(data("ibex_example"))

# Using the encoder method with a variational autoencoder
ibex_example <- runIbex(ibex_example,
                        chain = "Heavy",
                        method = "encoder",
                        encoder.model = "VAE",
                        encoder.input = "atchleyFactors")

# Using the geometric method with a specified angle
ibex_example <- runIbex(ibex_example,
                        chain = "Heavy",
                        method = "geometric",
                        geometric.theta = pi)