Package 'ClustIRR' reference manual

Title:	Clustering of immune receptor repertoires
Description:	ClustIRR analyzes repertoires of B- and T-cell receptors. It starts by identifying communities of immune receptors with similar specificities, based on the sequences of their complementarity-determining regions (CDRs). Next, it employs a Bayesian probabilistic models to quantify differential community occupancy (DCO) between repertoires, allowing the identification of expanding or contracting communities in response to e.g. infection or cancer treatment.
Authors:	Simo Kitanovski [aut, cre] , Kai Wollek [aut]
Maintainer:	Simo Kitanovski <[email protected]>
License:	GPL-3 + file LICENSE
Version:	1.5.42
Built:	2025-03-01 10:24:55 UTC
Source:	https://github.com/bioc/ClustIRR

BLOSUM62 matrix

Description

Predefined scoring matrix for amino acid or nucleoitide alignments.

Usage

data("BLOSUM62")data("BLOSUM62")

Format

BLOSUM62 is a square symmetric matrix. Rows and columns are identical single letters, representing nucleotide or amino acid. Elements are integer coefficients (substitution scores).

Details

BLOSUM62 was obtained from NCBI (the same matrix used by the stand- alone BLAST software).

Source

See https://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62

References

See https://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62

Examples

data(BLOSUM62, package = "ClustIRR")
BLOSUM62
data(BLOSUM62, package = "ClustIRR")
BLOSUM62

Objects of the class clust_irr are generated by the function cluster_irr. These objects are used to store the clustering results in a structured way, such that they may be used as inputs of other ClustIRR functions (e.g. get_graph, plot_graph, etc.).

The output is an S4 object of class clust_irr. This object contains two sublists:

clust, list, contains clustering results for each IR chain. The results are stored as data.frame in separate sub-list named appropriately (e.g. CDR3a, CDR3b, CDR3g, etc.). Each row in the data.frames contains a pair of CDR3s.

The remaining columns contain similarity scores for the complete CDR3 sequences (column weight) or their cores (column cweight). The columns max_len and max_clen store the length of the longer CDR3 and CDR3 core sequence in the pair, and these used to normalize the scores weight and cweight: the normalized scores are shown in the columns nweight and ncweight
inputs, list, contains all user provided inputs (see Arguments)

Arguments

`clust`	list, contains clustering results for each TCR/BCR chain. The results are stored in separate sub-list named appropriately (e.g. CDR3a, CDR3b, CDR3g, etc.)
`inputs`	list, contains all user provided inputs

Value

The output is an S4 object of class clust_irr

Accessors

To access the slots of clust_irr object we have two accessor functions. In the description below, x is a clust_irr object.

get_clustirr_clust: get_clustirr_clust(x): Extract the clustering results (slot clust)
get_clustirr_inputs: get_clustirr_inputs(x): Extract the processed inputs (slot inputs)

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run analysis
c <- cluster_irr(s = s)

# output class
class(c)

# output structure
str(c)

# inspect which CDR3bs are globally similar
knitr::kable(head(slot(c, "clust")$CDR3b))


# clust_irr S4 object generated 'manually' from the individual results
new_clust_irr <- new("clust_irr", 
                     clust = slot(object = c, name = "clust"),
                     inputs = slot(object = c, name = "inputs"))

# we should get identical outputs
identical(x = new_clust_irr, y = c)
# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run analysis
c <- cluster_irr(s = s)

# output class
class(c)

# output structure
str(c)

# inspect which CDR3bs are globally similar
knitr::kable(head(slot(c, "clust")$CDR3b))


# clust_irr S4 object generated 'manually' from the individual results
new_clust_irr <- new("clust_irr", 
                     clust = slot(object = c, name = "clust"),
                     inputs = slot(object = c, name = "inputs"))

# we should get identical outputs
identical(x = new_clust_irr, y = c)

Clustering of immune receptor repertoires (IRRs)

Description

cluster_irr computes similarities between immune receptors (IRs = T-cell and B-cell receptors) based on their CDR3 sequences.

Usage

cluster_irr(s,
            meta = NULL,
            control = list(gmi = 0.7,
                           trim_flank_aa = 3,
                           db_dist = 0,
                           db_custom = NULL))
cluster_irr(s,
            meta = NULL,
            control = list(gmi = 0.7,
                           trim_flank_aa = 3,
                           db_dist = 0,
                           db_custom = NULL))

Arguments

s

a data.frame with complementarity determining region 3 (CDR3) amino acid sequences observed in IRR clones (data.frame rows). The data.frame has the following columns (IR clone features):

sample: name of the IRR (e.g. 'A')
clone_size: cell count in the clone (=clonal expansion)
CDR3?: amino acid CDR3 sequence. Replace '?' with the appropriate name of the immmune receptor chain (e.g. CDR3a for CDR3s from TCR $\alpha$ chain; or CDR3d for CDR3s from TCR $\delta$ chain. Meanwhile, if paired CDR3s from both chains are available, then you can provide both in separate columns e.g.:
- CDR3b and CDR3a [for $\alpha\beta$ TCRs]
- CDR3g and CDR3d [for $\gamma\delta$ TCRs]
- CDR3h and CDR3l [for heavy/light chain BCRs]

meta

data.frame with meta-data for each clone, which may contain clone-specific data, such as, V/J genes, cell-type (e.g. CD8+, CD4+), nut also repertoire-specific data, such as, biological condition, HLA type, age, etc. This data will be used to annotate the graph nodes and help downstream analyses.

control

auxiliary parameters to control the algorithm's behavior. See the details below:

gmi: the minimum sequence identity between a pair of CDR3 sequences for them to even be considered for alignment and scoring (default = 0.7; 70 percent identity).
trim_flank_aa: how many amino acids should be trimmed from the flanks of all CDR3 sequences to isolate the CDR3 cores. trim_flank_aa = 3 (default).
db_custom: additional database (data.frame) which allows us to annotate CDR3 sequences from the input (s) with their cognate antigens. The structure of db_custom must be identical to that in data(vdjdb, package = "ClustIRR"). ClustIRR will use the internal databases if db_custom=NULL (default). Three databases (data only from human CDR3) are integrated in ClustIRR: VDJdb, TCR3d and McPAS-TCR.
db_dist: we compute edit distances between CDR3 sequences from s and from a database (e.g. VDJdb). If a particular distance is smaller than or equal to db_dist (default = 0), then we annotate the CDR3 from s with the specificity of the database CDR3 sequence.

Details

ClustIRR performs the following steps.

Compute similarities between clones within each repertoire $\rightarrow$ the function cluster_irr performs this step
Construct a graph from each TCR repertoire
Construct a joint similarity graph ( $J$ )
Detect communities in $J$
Analyze Differential Community Occupancy (DCO)
- Between individual TCR repertoires with model $M$
- Between groups of TCR repertoires from biological conditions with model $M_h$
Inspect results

Value

The output is an S4 object of class clust_irr. This object contains two sublists:

clust, list, contains clustering results for each receptor chain. The results are stored as data.frame in separate sub-list named appropriately (e.g. CDR3a, CDR3b, CDR3g, etc.). Each row in the data.frames contains a pair of CDR3s.

The remaining columns contain similarity scores for the complete CDR3 sequences (column weight) or their cores (column cweight). The columns max_len and max_clen store the length of the longer CDR3 sequence and core in the pair, and these used to normalize the scores weight and cweight: the normalized scores are shown in the columns nweight and ncweight
inputs, list, contains all user provided inputs (see Arguments)

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run analysis
c <- cluster_irr(s = s)

# output class
class(c)

# output structure
str(c)

# inspect which CDR3bs are similar
knitr::kable(head(slot(c, "clust")$CDR3b))
# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run analysis
c <- cluster_irr(s = s)

# output class
class(c)

# output structure
str(c)

# inspect which CDR3bs are similar
knitr::kable(head(slot(c, "clust")$CDR3b))

Datasets `CDR3ab` and `D1` with TCR $\alpha\beta$ mock repertoires

Description

TCR $\alpha\beta$ repertoire with 10,000 T-cells (rows). Each T-cell has the following features: amino acid sequences of their complementarity determining region 3 (CDR3); and variable (V) and joining (J) gene names for TCR chains $\alpha$ and $\beta$ .

Important remark: this is a mock dataset, all CDR3 sequences and the genes were sampled from a larger set of CDR3 $\beta$ sequences and genes of naive CD8+ T cells in humans.

We used this data to create dataset D1: three TCR $\alpha\beta$ repertoires a, b, and c, each with 500 TCR clones. We simulated clonal expansion with increasing degree in TCR repertoires b and c. The TCR repertoires as stores as element of a list. For each TCR repertoires we have a metadata: ma, mb, and mc.

Usage

# For the raw data with 10,000 TCR clones
data(CDR3ab)

# For dataset D1
data(D1)
# For the raw data with 10,000 TCR clones
data(CDR3ab)

# For dataset D1
data(D1)

Format

data.frame with rows as TCR clones and 6 columns

CDR3a: CDR3 $\alpha$ amino acid sequence
TRAV: variable (V) gene of TCR $\alpha$
TRAV: joining (J) gene of TCR $\alpha$
CDR3b: CDR3 $\beta$ amino acid sequence
TRBV: variable (V) gene of TCR $\beta$
TRBV: joining (J) gene of TCR $\beta$

Value

data(CDR3ab) loads the object CDR3ab, which is a data.frame with six columns (3 for TCR $\alpha$ and 3 for TCR $\beta$ ) and rows for each TCR clone (see details).

Source

GLIPH version 2

Examples

data("CDR3ab")
data("D1")
data("CDR3ab")
data("D1")

Model-based differential community occupancy (DCO) analysis

Description

This algorithm takes as input a community matrix, and quantifies the relative enrichment/depletion of individual communities in each sample using a Bayesian hierarchical model.

Usage

dco(community_occupancy_matrix, mcmc_control, compute_delta=TRUE, groups = NA)
dco(community_occupancy_matrix, mcmc_control, compute_delta=TRUE, groups = NA)

Arguments

`community_occupancy_matrix`	matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire.
`mcmc_control`	list, configurations for the Markov Chain Monte Carlo (MCMC) simulation.

mcmc_warmup = 750; number of MCMC warmups
mcmc_iter = 1500; number of MCMC iterations
mcmc_chains = 4; number of MCMC chains
mcmc_cores = 1; number of computer cores
mcmc_algorithm = "NUTS"; which MCMC algorithm to use
adapt_delta = 0.95; MCMC step size
max_treedepth = 12; the max value, in exponents of 2, of what the binary tree size in NUTS should have.

compute_delta

should delta be computed by the Stan model? This may be take up extra memory.

groups

vector with integers $\geq1$ , one for each repertoire (column in community_occupancy_matrix). This specifies the biological group of each repertoire (e.g. for cancer repertoire we may specify the index 1, and for normal repertoires the index 2). If this vector is specified, ClustIRR will employ a hierarchical model, modeling the dependence between the repertoires within each group. Else (which is the default setting in ClustIRR), ClustIRR will treat the repertoires as independent samples by employing a simpler model.

Value

The output is a list with the folling elements:

`fit`	model fit (stan object)
`posterior_summary`	nested list with data.frames, summary of model parameters, including their means, medians, 95% credible intervals, etc. Predicted observations (y_hat), which are useful for posterior predictive checks are also provided.
`community_occupancy_matrix`	matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire.
`mcmc_control`	mcmc configuration inputs provided as list.
`compute_delta`	the input compute_delta.
`groups`	the input groups.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community matrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

names(dco)
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community matrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

names(dco)

Decode graph communities

Description

Given a graph based on which we have detected communities (with the function detect_communities), and a community ID, the function will try to partition the community nodes according to user-defined filters: edge and node filters.

For instance, the user may only be interested in retaining edges with core edge weight > 4; or making sure that nodes that have same 'cell_type' (node meta datafrom) are grouped together. Or the user might want to treat all nodes that have the same V, D and J gene names and HLA types as subgroups, in which case all edges between nodes that do not share the same sets of attributes are dicarded.

Based on these filters, ClustIRR will reformat the edges in the selected community and then find connected components in the resulting graph.

Usage

decode_communities(community_id, graph, edge_filter, node_filter)
decode_communities(community_id, graph, edge_filter, node_filter)

Arguments

`graph`	`igraph` object that has been analyzed by graph-based community detection methods as implemented in `detect_communities`
`community_id`	which community should be decoded?
`edge_filter`	data.frame with edge filters. The deta.frame has three columns:

name: edge attribute name
value: edge attribute value (threshold)
operation: logical operation that tells ClustIRR which edge attribute values should pass the filter. Possible operations: "<", ">", ">=", "<=", "==" and "!=".

node_filter

a vector with node attributes. Groups of nodes that have the same attribute values among ALL provided attributes will be treated as a subcomponent.

Value

The output is a "filtered" igraph object.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                CDR3b = CDR3ab[1:300, "CDR3b"],
                clone_size = 1,
                sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                CDR3b = CDR3ab[201:400, "CDR3b"],
                clone_size = 1,
                sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          weight = "nweight",
                          algorithm = "leiden",
                          resolution = 1,
                          iterations = 100,
                          chains = c("CDR3a", "CDR3b"))


# We "decompose" the communities in the gcd object using decode_community 
# based on the attributes of the edges (edge_filter) and nodes (node_filter).
# We can pick from these edge attributes and create filters:
library(igraph)
edge_attr_names(graph = gcd$graph)

# For instance, the following edge-filter will instruct ClustIRR to keep 
# edges with: edge attributes: nweight>=3 \bold{AND} ncweight>=3
edge_filter <- rbind(data.frame(name = "nweight", value = 3, operation = ">="),
                     data.frame(name = "ncweight", value = 3, operation = ">="))

# In addition, we can construct filters based on the following node attributes:
vertex_attr_names(graph = gcd$graph)

# The following node-filter will instruct ClustIRR to retain edges 
# between nodes that have shared node attributed with respect to ALL 
# of the following node attributes: 
node_filter <- data.frame(name = "Ag_gene")

# Lets inspect community with ID = 1.
c1 <- decode_communities(community_id = 1, 
                         graph = gcd$graph,
                         edge_filter = edge_filter,
                         node_filter = node_filter)

# Plot resulting igraph
par(mar = c(0, 0, 0, 0))
plot(c1, vertex.size = 10)

# Now look at node attributes
as_data_frame(x = c1, what = "vertices")[,c("name", "component_id", 
                                            "CDR3b", "CDR3a", "Ag_gene")]
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                CDR3b = CDR3ab[1:300, "CDR3b"],
                clone_size = 1,
                sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                CDR3b = CDR3ab[201:400, "CDR3b"],
                clone_size = 1,
                sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          weight = "nweight",
                          algorithm = "leiden",
                          resolution = 1,
                          iterations = 100,
                          chains = c("CDR3a", "CDR3b"))


# We "decompose" the communities in the gcd object using decode_community 
# based on the attributes of the edges (edge_filter) and nodes (node_filter).
# We can pick from these edge attributes and create filters:
library(igraph)
edge_attr_names(graph = gcd$graph)

# For instance, the following edge-filter will instruct ClustIRR to keep 
# edges with: edge attributes: nweight>=3 \bold{AND} ncweight>=3
edge_filter <- rbind(data.frame(name = "nweight", value = 3, operation = ">="),
                     data.frame(name = "ncweight", value = 3, operation = ">="))

# In addition, we can construct filters based on the following node attributes:
vertex_attr_names(graph = gcd$graph)

# The following node-filter will instruct ClustIRR to retain edges 
# between nodes that have shared node attributed with respect to ALL 
# of the following node attributes: 
node_filter <- data.frame(name = "Ag_gene")

# Lets inspect community with ID = 1.
c1 <- decode_communities(community_id = 1, 
                         graph = gcd$graph,
                         edge_filter = edge_filter,
                         node_filter = node_filter)

# Plot resulting igraph
par(mar = c(0, 0, 0, 0))
plot(c1, vertex.size = 10)

# Now look at node attributes
as_data_frame(x = c1, what = "vertices")[,c("name", "component_id", 
                                            "CDR3b", "CDR3a", "Ag_gene")]

Graph-based community detection (GCD)

Description

Graph-based community detection in graphs constructed by get_graph or get_joint_graph.

Usage

detect_communities(graph,
                   weight = "nweight",
                   algorithm = "leiden", 
                   resolution = 1,
                   iterations = 100,
                   chains)
detect_communities(graph,
                   weight = "nweight",
                   algorithm = "leiden", 
                   resolution = 1,
                   iterations = 100,
                   chains)

Arguments

`graph`	`igraph` object
`algorithm`	graph-based community detection (GCD) method: leiden (default), louvain or infomap.
`resolution`	clustering resolution (default = 1) for GCD.
`iterations`	clustering iterations (default = 100) for GCD.
`weight`	which edge weight attribute (default = nweight) should be used for GCD
`chains`	which chains should be used for clustering? For instance: chains = "CDR3a"; or chains = "CDR3b"; or chains = c("CDR3a", "CDR3b").

Details

ClustIRR employs graph-based community detection (GCD) algorithms, such as Louvain, Leiden or InfoMap, to identify communities of nodes that have high density of edges among each other, and low density of edges with nodes outside the community.

Value

The output is a list with the folling elements:

`community_occupancy_matrix`	matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire.
`community_summary`	data.frame, rows are communities and their properties are provided as columns.
`node_summary`	data.frame, rows are nodes (clones) and their properties are provided as columnscontains all user provided.
`graph`	igraph object, processed graph object.
`graph_structure_quality`	graph modularity and quality (only for Leiden) measure of the strength of division of the graph into communities.
`input_config`	list, inputs provided as list.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                  CDR3b = CDR3ab[1:300, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                  CDR3b = CDR3ab[201:400, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          weight = "nweight",
                          algorithm = "leiden",
                          resolution = 1,
                          iterations = 100,
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community occupancymatrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                  CDR3b = CDR3ab[1:300, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                  CDR3b = CDR3ab[201:400, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          weight = "nweight",
                          algorithm = "leiden",
                          resolution = 1,
                          iterations = 100,
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community occupancymatrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)

Estimate the number of antigen-specific T-cells in selected communities

Description

Use node_summary data.frame generated by the function detect_communities; and 2. antigen species/genes to estimate the number of antigen-specific T-cells in selected communities in each repertoire.

Usage

get_ag_summary(node_summary,
               ag_species, 
               ag_genes,
               db = "vdjdb",
               db_dist = 0,
               chain = "both")
get_ag_summary(node_summary,
               ag_species, 
               ag_genes,
               db = "vdjdb",
               db_dist = 0,
               chain = "both")

Arguments

`node_summary`	`node_summary` data.frame
`ag_species`	antigen species, character vector, e.g. c("EBV", "CMV")
`ag_genes`	antigen genes, character vector, e.g. "MLANA"
`db`	annotation database, character, e.g. "vdjdb"
`db_dist`	maximum edit distance threshold for matching, nummeric
`chain`	immune receptor chain for annotation, "both", "CDR3a" or "CDR3b"

Details

The user has to provide a vector of antigen species (e.g. ag_species = c("EBV", "CMV")) and/or a vector of antigen genes (e.g. ag_genes = "MLANA"). Furthermore, the user has to provide nodes (node_summary data.frame created by the function detect_communities) and a vector with community IDs.

The user can also select an annotation database db, such as "vdjdb", "mcpas" or "tcr3d"; and restrict the annotation to specific IR chains, such as "CDR3a", "CDR3b" or "both". By default, we will look for perfect matches (db_dist=0) between CDR3 sequences in the input and in the annotation database for annotation. Flexible annotation based on edit distances can be performed by increasing db_dist.

Value

The output is a data.frame with the number of T-cells specific for the antigenic species/genes (columns) provided as input per repertoire (row), including the total number of T-cells in each repertoire.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "nweight",
                          chains = c("CDR3a", "CDR3b"))

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

ag_summary <- get_ag_summary(node_summary = gcd$node_summary,
                             ag_species = c("EBV", "CMV"),
                             ag_genes = "MLANA",
                             db = "vdjdb",
                             db_dist = 0,
                             chain = "both")
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "nweight",
                          chains = c("CDR3a", "CDR3b"))

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

ag_summary <- get_ag_summary(node_summary = gcd$node_summary,
                             ag_species = c("EBV", "CMV"),
                             ag_genes = "MLANA",
                             db = "vdjdb",
                             db_dist = 0,
                             chain = "both")

Compare community $\beta$ s between pairs of repertoires

Description

Visualize the $\beta$ means as a 2D scatterplot, representing relative community occupancies for all pairs of repertoires. At the same time, annotate the communities (dots) based on their specificity.

Usage

get_beta_scatterplot(beta,
                     node_summary,
                     ag_species, 
                     ag_genes,
                     db = "vdjdb",
                     db_dist = 0,
                     chain = "both")
get_beta_scatterplot(beta,
                     node_summary,
                     ag_species, 
                     ag_genes,
                     db = "vdjdb",
                     db_dist = 0,
                     chain = "both")

Arguments

`beta`	`beta` data.frame
`node_summary`	`node_summary` data.frame
`ag_species`	antigen species, character vector, e.g. c("EBV", "CMV")
`ag_genes`	antigen genes, character vector, e.g. "MLANA"
`db`	annotation database, character, e.g. "vdjdb"
`db_dist`	maximum edit distance threshold for matching, nummeric
`chain`	immune receptor chain for annotation, "both", "CDR3a" or "CDR3b"

Details

Value

The output is a list with 4 elements:

node_annotations: annotated node_summary

beta_summary: annotated beta

vars: annotation variables

scatterplots: a list of scatterplots: Each element of the list contains a scatterplots for a specific antigen species/gene. Within each element of the list there are $n^2$ panels (comparisons between repertoire pairs), with $n$ as the number of repertoires.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community matrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

names(dco)

# generate beta violin plots
beta_scatterplot <- get_beta_scatterplot(beta = dco$posterior_summary$beta,
                                         node_summary = gcd$node_summary,
                                         ag_species = c("EBV", "CMV"),
                                         ag_genes = "MLANA",
                                         db = "vdjdb",
                                         db_dist = 0,
                                         chain = "both")
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# look at outputs
names(gcd)

# look at the community matrix
head(gcd$community_occupancy_matrix)

# look at the community summary
head(gcd$community_summary$wide)

# look at the node summary
head(gcd$node_summary)

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

names(dco)

# generate beta violin plots
beta_scatterplot <- get_beta_scatterplot(beta = dco$posterior_summary$beta,
                                         node_summary = gcd$node_summary,
                                         ag_species = c("EBV", "CMV"),
                                         ag_genes = "MLANA",
                                         db = "vdjdb",
                                         db_dist = 0,
                                         chain = "both")

Visualize distribution of $\beta$ means in each repertoire as violin plots

Description

Visualize the $\beta$ means as violin plots, representing relative community occupancies for individual repertoires. At the same time, annotate the communities (dots) based on their specificity.

Usage

get_beta_violins(beta,
                 node_summary,
                 ag_species,
                 ag_genes,
                 db = "vdjdb",
                 db_dist = 0,
                 chain = "both")
get_beta_violins(beta,
                 node_summary,
                 ag_species,
                 ag_genes,
                 db = "vdjdb",
                 db_dist = 0,
                 chain = "both")

Arguments

`beta`	`beta` data.frame
`node_summary`	`node_summary` data.frame
`ag_species`	antigen species, character vector, e.g. c("EBV", "CMV")
`ag_genes`	antigen genes, character vector, e.g. "MLANA"
`db`	annotation database, character, e.g. "vdjdb"
`db_dist`	maximum edit distance threshold for matching, nummeric
`chain`	immune receptor chain for annotation, "both", "CDR3a" or "CDR3b"

Details

Value

The output is a list with 4 elements:

node_annotations: annotated node_summary

beta_summary: annotated beta

vars: annotation variables

violins: violin plots (one for each antigen species and gene)

violins: a list of violin plots. Each element of the list contains a violin visual for a specific antigen species/gene.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

# generate beta violin plots
beta_violins <- get_beta_violins(beta = dco$posterior_summary$beta,
                                 node_summary = gcd$node_summary,
                                 ag_species = c("EBV", "CMV"),
                                 ag_genes = "MLANA",
                                 db = "vdjdb",
                                 db_dist = 0,
                                 chain = "both")
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"],
                  CDR3b = CDR3ab[1:500, "CDR3b"],
                  clone_size = 1,
                  sample = "a")

b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"],
                  CDR3b = CDR3ab[401:900, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# differential community occupancy analysis
dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix)

# generate beta violin plots
beta_violins <- get_beta_violins(beta = dco$posterior_summary$beta,
                                 node_summary = gcd$node_summary,
                                 ag_species = c("EBV", "CMV"),
                                 ag_genes = "MLANA",
                                 db = "vdjdb",
                                 db_dist = 0,
                                 chain = "both")

Get `igraph` object from `clust_irr` object

Description

Given a clust_irr object generated by the function cluster_irr, the function get_graph constructs an igraph object.

The graph nodes represent IR clones. Undirected edges are drawn between pairs of nodes, and the attributes of these edges are assigned based on the clust_irr outputs: $\bar{\omega}$ , $\bar{\omega}_c$ , etc.

Usage

get_graph(clust_irr)get_graph(clust_irr)

Arguments

clust_irr

S4 object generated by the function cluster_irr

Value

The output is a list with the following elements. First, the list contains an igraph object. The graph nodes and edges contain attributes encoded in the clust_irr objects. Second, it contains a data.frame in which rows are clones (nodes) in the graph. Third, the list contains the logical variable joint_graph, which is set to TRUE if the graph is a joint graph generated by the function get_joint_graph and FALSE if the graph is not a joint graph generated by get_graph.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run ClustIRR analysis
out <- cluster_irr(s = s)

# get graph
g <- get_graph(clust_irr = out)

names(g)
# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run ClustIRR analysis
out <- cluster_irr(s = s)

# get graph
g <- get_graph(clust_irr = out)

names(g)

Generate honycomb plot: visualize community occupancy of pairs of immune receptor repertoires

Description

Use the community_occupancy_matrix generated by the function detect_communities to generate honeycomb plots for each pair of repertoires. In each plot, we will show communities (rows in the matric community_occupancy_matrix) as dots and their intensities in a pair of repertoires (x-axis and y-axis). The density of dots is encoded by the color of the honeycomb-like hexagons.

Usage

get_honeycombs(com)
get_honeycombs(com)

Arguments

com

community_occupancy_matrix, matrix generated by detect_communities

Details

Value

The output is a list with ggplots. Given n repertoires (columns in input community_occupancy_matrix), it will generate n*(n-1)/2 plots. You can arrange the ggplots (or a portion of them) in any shape e.g. with the R-package patchwork.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                CDR3b = CDR3ab[1:300, "CDR3b"],
                clone_size = 1,
                sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                  CDR3b = CDR3ab[201:400, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# get honeycombs
g <- get_honeycombs(com = gcd$community_occupancy_matrix)

g
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"],
                CDR3b = CDR3ab[1:300, "CDR3b"],
                clone_size = 1,
                sample = "a")

b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"],
                  CDR3b = CDR3ab[201:400, "CDR3b"],
                  clone_size = 1,
                  sample = "b")
b$clone_size[1] <- 20

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get joint graph
jg <- get_joint_graph(clust_irrs = c)

# detect communities
gcd <- detect_communities(graph = jg$graph, 
                          algorithm = "leiden",
                          resolution = 1,
                          weight = "ncweight",
                          chains = c("CDR3a", "CDR3b"))

# get honeycombs
g <- get_honeycombs(com = gcd$community_occupancy_matrix)

g

Create joint `igraph` object from multiple `clust_irr` objects

Description

Given a vector of clust_irr objects, generated by the function cluster_irr, the function get_joint_graph performs the following steps:

1. runs the function get_graph on each clust_irr object

2. merges the nodes: if graph a and b have |a| and |b| nodes, then the joint graph has |a|+|b| nodes, regardless of whether exactly the same clone (vertex) is found in both graphs.

3. draws edges between nodes from the different graphs using the same algorithm for drawing edges between nodes within an IRR (see function clust_irr).

4. return a joint graph as igraph object

5. return the input clustirr object list

6. return a logical joint_graph=TRUE

Usage

get_joint_graph(clust_irrs, cores = 1)get_joint_graph(clust_irrs, cores = 1)

Arguments

`clust_irrs`	A list of at least two S4 objects generated with the function `cluster_irr`
`cores`	number of computer cores to use (default = 1)

Value

The main output is an igraph object.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "a", clone_size = 1)
b <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "b", clone_size = 1)

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get graph
g <- get_joint_graph(clust_irrs = c)

names(g)
# load package input data
data("CDR3ab", package = "ClustIRR")
a <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "a", clone_size = 1)
b <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "b", clone_size = 1)

# run ClustIRR analysis
c <- c(cluster_irr(s = a), cluster_irr(s = b))

# get graph
g <- get_joint_graph(clust_irrs = c)

names(g)

CDR3 sequences and their matching epitopes obtained from McPAS-TCR

Description

data.frame with CDR3a and/or CDR3b sequences and their matching antigenic epitopes obtained from McPAS-TCR. The remaining CDR3 columns are set to NA. For data processing details see the script inst/script/get_mcpastcr.R

Usage

data(mcpas)
data(mcpas)

Format

data.frame with columns:

CDR3a: CDR3a amino acid sequence
CDR3b: CDR3b amino acid sequence
CDR3g: CDR3g amino acid sequence -> NA
CDR3d: CDR3d amino acid sequence -> NA
CDR3h: CDR3h amino acid sequence -> NA
CDR3l: CDR3l amino acid sequence -> NA
CDR3_species: CDR3 species (e.g. human, mouse, ...)
Antigen_species: antigen species
Antigen_gene: antigen gene
Reference: Reference (Pubmed ID)

Value

data(mcpas) loads the object McPAS-TCR

Source

McPAS-TCR, June 2024

Examples

data(mcpas)
data(mcpas)

Plot ClustIRR graph

Description

This function visualizes a graph. The main input is g object created by the function get_graph.

Usage

plot_graph(g,
           select_by = "Ag_species",
           as_visnet = FALSE,
           show_singletons = TRUE, 
           node_opacity = 1)
plot_graph(g,
           select_by = "Ag_species",
           as_visnet = FALSE,
           show_singletons = TRUE, 
           node_opacity = 1)

Arguments

`g`	Object returned by the functions `get_graph` or `get_joint_graph`
`as_visnet`	logical, if `as_visnet`=TRUE we plot an interactive graph with visNetwork. If `as_visnet`=FALSE, we plot a static graph with igraph.
`select_by`	character string, two values are possible: "Ag_species" or "Ag_gene". This only has an effect if `as_visnet` = TRUE, i.e. if the graph is interactive. It will allow the user to highligh clones (nodes) in the graph that are associated with a specific antigenic specie or gene. The mapping between CDR3 and antigens is extracted from databases, such as, VDJdb, McPAS-TCR and TCR3d. This mapping is done by the function `get_graph`. If none of the clones in the graph are matched to a CDR3, then the user will have no options to select/highlight.
`show_singletons`	logical, if `show_singletons`=TRUE we plot all vertices. If `show_singletons`=FALSE, we plot only vertices connected by edges.
`node_opacity`	probability, controls the opacity of node colors. Lower values corresponding to more transparent colors.

Value

The output is an igraph or visNetwork plot.

The size of the vertices increases linearly as the logarithm of the degree of the clonal expansion (number of cells per clone) in the corresponding clones.

Examples

# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run ClustIRR analysis
out <- cluster_irr(s = s)

# get graph
g <- get_graph(clust_irr = out)

# plot graph with vertices as clones
plot_graph(g, as_visnet=FALSE, show_singletons=TRUE, node_opacity = 0.8)
# load package input data
data("CDR3ab", package = "ClustIRR")
s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1)

# run ClustIRR analysis
out <- cluster_irr(s = s)

# get graph
g <- get_graph(clust_irr = out)

# plot graph with vertices as clones
plot_graph(g, as_visnet=FALSE, show_singletons=TRUE, node_opacity = 0.8)

CDR3 sequences and their matching epitopes obtained from TCR3d

Description

data.frame with paired CDR3a and CDR3b CDR3 sequences and their matching epitopes obtained from TCR3d. The remaining CDR3 columns are set to NA. The antigenic epitopes come from cancer antigens and from viral antigens. For data processing details see the script inst/script/get_tcr3d.R

Usage

data(tcr3d)
data(tcr3d)

Format

data.frame with columns:

CDR3a: CDR3a amino acid sequence
CDR3b: CDR3b amino acid sequence
CDR3g: CDR3g amino acid sequence -> NA
CDR3d: CDR3d amino acid sequence -> NA
CDR3h: CDR3h amino acid sequence -> NA
CDR3l: CDR3l amino acid sequence -> NA
CDR3_species: CDR3 species (e.g. human, mouse, ...)
Antigen_species: antigen species
Antigen_gene: antigen gene
Reference: Reference ID

Value

data(tcr3d) loads the object tcr3d

Source

TCR3d, June 2024

Examples

data("tcr3d")
data("tcr3d")

CDR3 sequences and their matching epitopes obtained from VDJdb

Description

data.frame with unpaired CDR3a or CDR3b sequences and their matching epitopes obtained from VDJdb. The remaining CDR3 columns are set to NA. For data processing details see the script inst/script/get_vdjdb.R

Usage

data(vdjdb)
data(vdjdb)

Format

data.frame with columns:

CDR3a: CDR3a amino acid sequence
CDR3b: CDR3b amino acid sequence
CDR3g: CDR3g amino acid sequence -> NA
CDR3d: CDR3d amino acid sequence -> NA
CDR3h: CDR3h amino acid sequence -> NA
CDR3l: CDR3l amino acid sequence -> NA
CDR3_species: CDR3 species (e.g. human, mouse, ...)
Antigen_species: antigen species
Antigen_gene: antigen gene
Reference: Reference (Pubmed ID)

Value

data(vdjdb) loads the object vdjdb

Source

VDJdb, December 2024

Examples

data("vdjdb")
data("vdjdb")

Package 'ClustIRR'

Help Index

BLOSUM62 matrix

Description

Usage

Format

Details

Source

References

Examples

clust_irr class

Description

Arguments

Value

Accessors

Examples

Clustering of immune receptor repertoires (IRRs)

Description

Usage

Arguments

Details

Value

Examples

Datasets CDR3ab and D1 with TCRαβ\alpha\betaαβ mock repertoires

Description

Usage

Format

Value

Source

Examples

Model-based differential community occupancy (DCO) analysis

Description

Usage

Arguments

Value

Examples

Decode graph communities

Description

Usage

Arguments

Value

Examples

Graph-based community detection (GCD)

Description

Usage

Arguments

Details

Value

Examples

Estimate the number of antigen-specific T-cells in selected communities

Description

Usage

Arguments

Details

Value

Examples

Compare community β\betaβs between pairs of repertoires

Description

Usage

Arguments

Details

Value

Examples

Visualize distribution of β\betaβ means in each repertoire as violin plots

Description

Usage

Arguments

Details

Value

Examples

Get igraph object from clust_irr object

Description

Usage

Arguments

Value

Examples

Generate honycomb plot: visualize community occupancy of pairs of immune receptor repertoires

Description

Usage

Arguments

Datasets `CDR3ab` and `D1` with TCR $\alpha\beta$ mock repertoires

Compare community $\beta$ s between pairs of repertoires

Visualize distribution of $\beta$ means in each repertoire as violin plots

Get `igraph` object from `clust_irr` object

Create joint `igraph` object from multiple `clust_irr` objects