Title: | Clustering of immune receptor repertoires |
---|---|
Description: | ClustIRR analyzes repertoires of B- and T-cell receptors. It starts by identifying communities of immune receptors with similar specificities, based on the sequences of their complementarity-determining regions (CDRs). Next, it employs a Bayesian probabilistic models to quantify differential community occupancy (DCO) between repertoires, allowing the identification of expanding or contracting communities in response to e.g. infection or cancer treatment. |
Authors: | Simo Kitanovski [aut, cre] |
Maintainer: | Simo Kitanovski <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.5.42 |
Built: | 2025-03-01 10:24:55 UTC |
Source: | https://github.com/bioc/ClustIRR |
Predefined scoring matrix for amino acid or nucleoitide alignments.
data("BLOSUM62")
data("BLOSUM62")
BLOSUM62 is a square symmetric matrix. Rows and columns are identical single letters, representing nucleotide or amino acid. Elements are integer coefficients (substitution scores).
BLOSUM62 was obtained from NCBI (the same matrix used by the stand- alone BLAST software).
See https://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62
See https://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62
data(BLOSUM62, package = "ClustIRR") BLOSUM62
data(BLOSUM62, package = "ClustIRR") BLOSUM62
Objects of the class clust_irr
are generated by the function
cluster_irr
. These objects are used to store the clustering results
in a structured way, such that they may be used as inputs of other ClustIRR
functions (e.g. get_graph, plot_graph, etc.).
The output is an S4
object of class clust_irr
. This object
contains two sublists:
clust
, list, contains clustering results for each IR chain.
The results are stored as data.frame in separate sub-list named appropriately
(e.g. CDR3a, CDR3b, CDR3g, etc.). Each row in the data.frames contains a pair
of CDR3s.
The remaining columns contain similarity scores for the complete CDR3 sequences
(column weight
) or their cores (column cweight
). The columns
max_len
and max_clen
store the length of the longer CDR3 and
CDR3 core sequence in the pair, and these used to normalize the scores
weight
and cweight
: the normalized scores are shown in the
columns nweight
and ncweight
inputs
, list, contains all user provided inputs (see Arguments)
clust |
list, contains clustering results for each TCR/BCR chain. The results are stored in separate sub-list named appropriately (e.g. CDR3a, CDR3b, CDR3g, etc.) |
inputs |
list, contains all user provided inputs |
The output is an S4 object of class clust_irr
To access the slots of clust_irr
object we have two accessor
functions. In the description below, x
is a clust_irr
object.
get_clustirr_clust(x)
:
Extract the clustering results (slot clust)
get_clustirr_inputs(x)
:
Extract the processed inputs (slot inputs)
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run analysis c <- cluster_irr(s = s) # output class class(c) # output structure str(c) # inspect which CDR3bs are globally similar knitr::kable(head(slot(c, "clust")$CDR3b)) # clust_irr S4 object generated 'manually' from the individual results new_clust_irr <- new("clust_irr", clust = slot(object = c, name = "clust"), inputs = slot(object = c, name = "inputs")) # we should get identical outputs identical(x = new_clust_irr, y = c)
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run analysis c <- cluster_irr(s = s) # output class class(c) # output structure str(c) # inspect which CDR3bs are globally similar knitr::kable(head(slot(c, "clust")$CDR3b)) # clust_irr S4 object generated 'manually' from the individual results new_clust_irr <- new("clust_irr", clust = slot(object = c, name = "clust"), inputs = slot(object = c, name = "inputs")) # we should get identical outputs identical(x = new_clust_irr, y = c)
cluster_irr
computes similarities between immune receptors (IRs =
T-cell and B-cell receptors) based on their CDR3 sequences.
cluster_irr(s, meta = NULL, control = list(gmi = 0.7, trim_flank_aa = 3, db_dist = 0, db_custom = NULL))
cluster_irr(s, meta = NULL, control = list(gmi = 0.7, trim_flank_aa = 3, db_dist = 0, db_custom = NULL))
s |
a data.frame with complementarity determining region 3 (CDR3) amino acid sequences observed in IRR clones (data.frame rows). The data.frame has the following columns (IR clone features):
|
meta |
data.frame with meta-data for each clone, which may contain clone-specific data, such as, V/J genes, cell-type (e.g. CD8+, CD4+), nut also repertoire-specific data, such as, biological condition, HLA type, age, etc. This data will be used to annotate the graph nodes and help downstream analyses. |
control |
auxiliary parameters to control the algorithm's behavior. See the details below:
|
ClustIRR performs the following steps.
Compute similarities between clones within each repertoire
the function
cluster_irr
performs this step
Construct a graph from each TCR repertoire
Construct a joint similarity graph ()
Detect communities in
Analyze Differential Community Occupancy (DCO)
Between individual TCR repertoires with model
Between groups of TCR repertoires from biological conditions
with model
Inspect results
The output is an S4
object of class clust_irr
. This object
contains two sublists:
clust
, list, contains clustering results for each receptor chain.
The results are stored as data.frame in separate sub-list named appropriately
(e.g. CDR3a, CDR3b, CDR3g, etc.). Each row in the data.frames contains a pair
of CDR3s.
The remaining columns contain similarity scores for the complete CDR3 sequences
(column weight
) or their cores (column cweight
). The columns
max_len
and max_clen
store the length of the longer CDR3 sequence
and core in the pair, and these used to normalize the scores weight
and
cweight
: the normalized scores are shown in the columns nweight
and ncweight
inputs
, list, contains all user provided inputs (see Arguments)
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run analysis c <- cluster_irr(s = s) # output class class(c) # output structure str(c) # inspect which CDR3bs are similar knitr::kable(head(slot(c, "clust")$CDR3b))
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run analysis c <- cluster_irr(s = s) # output class class(c) # output structure str(c) # inspect which CDR3bs are similar knitr::kable(head(slot(c, "clust")$CDR3b))
CDR3ab
and D1
with TCR
mock repertoiresTCR repertoire with 10,000 T-cells (rows). Each T-cell
has the following features: amino acid sequences of their complementarity
determining region 3 (CDR3); and variable (V) and joining (J) gene names
for TCR chains
and
.
Important remark: this is a mock dataset, all CDR3 sequences and the genes were
sampled from a larger set of CDR3 sequences and genes of naive
CD8+ T cells in humans.
We used this data to create dataset D1: three TCR
repertoires
a
, b
, and c
, each with 500 TCR
clones. We simulated clonal expansion with increasing degree in TCR
repertoires b
and c
. The TCR repertoires as stores as
element of a list. For each TCR repertoires we have a metadata:
ma
, mb
, and mc
.
# For the raw data with 10,000 TCR clones data(CDR3ab) # For dataset D1 data(D1)
# For the raw data with 10,000 TCR clones data(CDR3ab) # For dataset D1 data(D1)
data.frame
with rows as TCR clones and 6 columns
CDR3a
: CDR3 amino acid sequence
TRAV
: variable (V) gene of TCR
TRAV
: joining (J) gene of TCR
CDR3b
: CDR3 amino acid sequence
TRBV
: variable (V) gene of TCR
TRBV
: joining (J) gene of TCR
data(CDR3ab) loads the object CDR3ab, which is a data.frame with six columns
(3 for TCR and 3 for TCR
) and rows for each TCR clone
(see details).
data("CDR3ab") data("D1")
data("CDR3ab") data("D1")
This algorithm takes as input a community matrix, and quantifies the relative enrichment/depletion of individual communities in each sample using a Bayesian hierarchical model.
dco(community_occupancy_matrix, mcmc_control, compute_delta=TRUE, groups = NA)
dco(community_occupancy_matrix, mcmc_control, compute_delta=TRUE, groups = NA)
community_occupancy_matrix |
matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire. |
mcmc_control |
list, configurations for the Markov Chain Monte Carlo (MCMC) simulation. |
mcmc_warmup = 750; number of MCMC warmups
mcmc_iter = 1500; number of MCMC iterations
mcmc_chains = 4; number of MCMC chains
mcmc_cores = 1; number of computer cores
mcmc_algorithm = "NUTS"; which MCMC algorithm to use
adapt_delta = 0.95; MCMC step size
max_treedepth = 12; the max value, in exponents of 2, of what the binary tree size in NUTS should have.
compute_delta |
should delta be computed by the Stan model? This may be take up extra memory. |
groups |
vector with integers |
The output is a list with the folling elements:
fit |
model fit (stan object) |
posterior_summary |
nested list with data.frames, summary of model parameters, including their means, medians, 95% credible intervals, etc. Predicted observations (y_hat), which are useful for posterior predictive checks are also provided. |
community_occupancy_matrix |
matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire. |
mcmc_control |
mcmc configuration inputs provided as list. |
compute_delta |
the input compute_delta. |
groups |
the input groups. |
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community matrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) names(dco)
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community matrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) names(dco)
Given a graph based on which we have detected communities (with the function
detect_communities
), and a community ID, the function will try to
partition the community nodes according to user-defined filters: edge and
node filters.
For instance, the user may only be interested in retaining edges with core edge weight > 4; or making sure that nodes that have same 'cell_type' (node meta datafrom) are grouped together. Or the user might want to treat all nodes that have the same V, D and J gene names and HLA types as subgroups, in which case all edges between nodes that do not share the same sets of attributes are dicarded.
Based on these filters
, ClustIRR will reformat the edges in the
selected community and then find connected components in the
resulting graph.
decode_communities(community_id, graph, edge_filter, node_filter)
decode_communities(community_id, graph, edge_filter, node_filter)
graph |
|
community_id |
which community should be decoded? |
edge_filter |
data.frame with edge filters. The deta.frame has three columns: |
name: edge attribute name
value: edge attribute value (threshold)
operation: logical operation that tells ClustIRR which edge attribute values should pass the filter. Possible operations: "<", ">", ">=", "<=", "==" and "!=".
node_filter |
a vector with node attributes. Groups of nodes that have the same attribute values among ALL provided attributes will be treated as a subcomponent. |
The output is a "filtered" igraph object.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains = c("CDR3a", "CDR3b")) # We "decompose" the communities in the gcd object using decode_community # based on the attributes of the edges (edge_filter) and nodes (node_filter). # We can pick from these edge attributes and create filters: library(igraph) edge_attr_names(graph = gcd$graph) # For instance, the following edge-filter will instruct ClustIRR to keep # edges with: edge attributes: nweight>=3 \bold{AND} ncweight>=3 edge_filter <- rbind(data.frame(name = "nweight", value = 3, operation = ">="), data.frame(name = "ncweight", value = 3, operation = ">=")) # In addition, we can construct filters based on the following node attributes: vertex_attr_names(graph = gcd$graph) # The following node-filter will instruct ClustIRR to retain edges # between nodes that have shared node attributed with respect to ALL # of the following node attributes: node_filter <- data.frame(name = "Ag_gene") # Lets inspect community with ID = 1. c1 <- decode_communities(community_id = 1, graph = gcd$graph, edge_filter = edge_filter, node_filter = node_filter) # Plot resulting igraph par(mar = c(0, 0, 0, 0)) plot(c1, vertex.size = 10) # Now look at node attributes as_data_frame(x = c1, what = "vertices")[,c("name", "component_id", "CDR3b", "CDR3a", "Ag_gene")]
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains = c("CDR3a", "CDR3b")) # We "decompose" the communities in the gcd object using decode_community # based on the attributes of the edges (edge_filter) and nodes (node_filter). # We can pick from these edge attributes and create filters: library(igraph) edge_attr_names(graph = gcd$graph) # For instance, the following edge-filter will instruct ClustIRR to keep # edges with: edge attributes: nweight>=3 \bold{AND} ncweight>=3 edge_filter <- rbind(data.frame(name = "nweight", value = 3, operation = ">="), data.frame(name = "ncweight", value = 3, operation = ">=")) # In addition, we can construct filters based on the following node attributes: vertex_attr_names(graph = gcd$graph) # The following node-filter will instruct ClustIRR to retain edges # between nodes that have shared node attributed with respect to ALL # of the following node attributes: node_filter <- data.frame(name = "Ag_gene") # Lets inspect community with ID = 1. c1 <- decode_communities(community_id = 1, graph = gcd$graph, edge_filter = edge_filter, node_filter = node_filter) # Plot resulting igraph par(mar = c(0, 0, 0, 0)) plot(c1, vertex.size = 10) # Now look at node attributes as_data_frame(x = c1, what = "vertices")[,c("name", "component_id", "CDR3b", "CDR3a", "Ag_gene")]
Graph-based community detection in graphs constructed by get_graph
or get_joint_graph
.
detect_communities(graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains)
detect_communities(graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains)
graph |
|
algorithm |
graph-based community detection (GCD) method: leiden (default), louvain or infomap. |
resolution |
clustering resolution (default = 1) for GCD. |
iterations |
clustering iterations (default = 100) for GCD. |
weight |
which edge weight attribute (default = nweight) should be used for GCD |
chains |
which chains should be used for clustering? For instance: chains = "CDR3a"; or chains = "CDR3b"; or chains = c("CDR3a", "CDR3b"). |
ClustIRR employs graph-based community detection (GCD) algorithms, such as
Louvain, Leiden or InfoMap, to identify communities
of nodes that have
high density of edges among each other, and low density of edges with nodes
outside the community.
The output is a list with the folling elements:
community_occupancy_matrix |
matrix, rows are communities, columns are repertoires, matrix entries are numbers of cells in each community and repertoire. |
community_summary |
data.frame, rows are communities and their properties are provided as columns. |
node_summary |
data.frame, rows are nodes (clones) and their properties are provided as columnscontains all user provided. |
graph |
igraph object, processed graph object. |
graph_structure_quality |
graph modularity and quality (only for Leiden) measure of the strength of division of the graph into communities. |
input_config |
list, inputs provided as list. |
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community occupancymatrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary)
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, weight = "nweight", algorithm = "leiden", resolution = 1, iterations = 100, chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community occupancymatrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary)
Use node_summary
data.frame generated by the function
detect_communities
; and 2. antigen species/genes to
estimate the number of antigen-specific T-cells in selected
communities in each repertoire.
get_ag_summary(node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
get_ag_summary(node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
node_summary |
|
ag_species |
antigen species, character vector, e.g. c("EBV", "CMV") |
ag_genes |
antigen genes, character vector, e.g. "MLANA" |
db |
annotation database, character, e.g. "vdjdb" |
db_dist |
maximum edit distance threshold for matching, nummeric |
chain |
immune receptor chain for annotation, "both", "CDR3a" or "CDR3b" |
The user has to provide a vector of antigen species (e.g.
ag_species
= c("EBV", "CMV")) and/or a vector of antigen
genes (e.g. ag_genes
= "MLANA"). Furthermore, the user has
to provide nodes (node_summary
data.frame created by the
function detect_communities
) and a vector with community IDs.
The user can also select an annotation database db
,
such as "vdjdb", "mcpas" or "tcr3d"; and restrict the annotation to
specific IR chains, such as "CDR3a", "CDR3b" or "both". By default,
we will look for perfect matches (db_dist=0
) between CDR3
sequences in the input and in the annotation database for annotation.
Flexible annotation based on edit distances can be performed by
increasing db_dist
.
The output is a data.frame with the number of T-cells specific for the antigenic species/genes (columns) provided as input per repertoire (row), including the total number of T-cells in each repertoire.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "nweight", chains = c("CDR3a", "CDR3b")) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) ag_summary <- get_ag_summary(node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "nweight", chains = c("CDR3a", "CDR3b")) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) ag_summary <- get_ag_summary(node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
s between pairs of repertoiresVisualize the means as a 2D scatterplot, representing relative
community occupancies for all pairs of repertoires. At the same time, annotate
the communities (dots) based on their specificity.
get_beta_scatterplot(beta, node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
get_beta_scatterplot(beta, node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
beta |
|
node_summary |
|
ag_species |
antigen species, character vector, e.g. c("EBV", "CMV") |
ag_genes |
antigen genes, character vector, e.g. "MLANA" |
db |
annotation database, character, e.g. "vdjdb" |
db_dist |
maximum edit distance threshold for matching, nummeric |
chain |
immune receptor chain for annotation, "both", "CDR3a" or "CDR3b" |
The user has to provide a vector of antigen species (e.g.
ag_species
= c("EBV", "CMV")) and/or a vector of antigen
genes (e.g. ag_genes
= "MLANA"). Furthermore, the user has
to provide nodes (node_summary
data.frame created by the
function detect_communities
) and beta
data.frame
which is part of posterior_summary
generated by the function
dco
.
The user can also select an annotation database db
,
such as "vdjdb", "mcpas" or "tcr3d"; and restrict the annotation to
specific IR chains, such as "CDR3a", "CDR3b" or "both". By default,
we will look for perfect matches (db_dist=0
) between CDR3
sequences in the input and in the annotation database for annotation.
Flexible annotation based on edit distances can be performed by
increasing db_dist
.
The output is a list with 4 elements:
node_annotations
: annotated node_summary
beta_summary
: annotated beta
vars
: annotation variables
scatterplots
: a list of scatterplots: Each element of the list contains
a scatterplots for a specific antigen species/gene. Within each element of the
list there are panels (comparisons between repertoire pairs), with
as the number of repertoires.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community matrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) names(dco) # generate beta violin plots beta_scatterplot <- get_beta_scatterplot(beta = dco$posterior_summary$beta, node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # look at outputs names(gcd) # look at the community matrix head(gcd$community_occupancy_matrix) # look at the community summary head(gcd$community_summary$wide) # look at the node summary head(gcd$node_summary) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) names(dco) # generate beta violin plots beta_scatterplot <- get_beta_scatterplot(beta = dco$posterior_summary$beta, node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
means in each repertoire
as violin plotsVisualize the means as violin plots, representing relative
community occupancies for individual repertoires. At the same time, annotate
the communities (dots) based on their specificity.
get_beta_violins(beta, node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
get_beta_violins(beta, node_summary, ag_species, ag_genes, db = "vdjdb", db_dist = 0, chain = "both")
beta |
|
node_summary |
|
ag_species |
antigen species, character vector, e.g. c("EBV", "CMV") |
ag_genes |
antigen genes, character vector, e.g. "MLANA" |
db |
annotation database, character, e.g. "vdjdb" |
db_dist |
maximum edit distance threshold for matching, nummeric |
chain |
immune receptor chain for annotation, "both", "CDR3a" or "CDR3b" |
The user has to provide a vector of antigen species (e.g.
ag_species
= c("EBV", "CMV")) and/or a vector of antigen
genes (e.g. ag_genes
= "MLANA"). Furthermore, the user has
to provide nodes (node_summary
data.frame created by the
function detect_communities
) and beta
data.frame
which is part of posterior_summary
generated by the function
dco
.
The user can also select an annotation database db
,
such as "vdjdb", "mcpas" or "tcr3d"; and restrict the annotation to
specific IR chains, such as "CDR3a", "CDR3b" or "both". By default,
we will look for perfect matches (db_dist=0
) between CDR3
sequences in the input and in the annotation database for annotation.
Flexible annotation based on edit distances can be performed by
increasing db_dist
.
The output is a list with 4 elements:
node_annotations
: annotated node_summary
beta_summary
: annotated beta
vars
: annotation variables
violins
: violin plots (one for each antigen species and gene)
violins
: a list of violin plots. Each element of the list contains
a violin visual for a specific antigen species/gene.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) # generate beta violin plots beta_violins <- get_beta_violins(beta = dco$posterior_summary$beta, node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:500, "CDR3a"], CDR3b = CDR3ab[1:500, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[401:900, "CDR3a"], CDR3b = CDR3ab[401:900, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # differential community occupancy analysis dco <- dco(community_occupancy_matrix = gcd$community_occupancy_matrix) # generate beta violin plots beta_violins <- get_beta_violins(beta = dco$posterior_summary$beta, node_summary = gcd$node_summary, ag_species = c("EBV", "CMV"), ag_genes = "MLANA", db = "vdjdb", db_dist = 0, chain = "both")
igraph
object from clust_irr
objectGiven a clust_irr
object generated by the function cluster_irr
,
the function get_graph
constructs an igraph
object.
The graph nodes represent IR clones. Undirected edges are drawn between
pairs of nodes, and the attributes of these edges are assigned based
on the clust_irr
outputs: ,
, etc.
get_graph(clust_irr)
get_graph(clust_irr)
clust_irr |
S4 object generated by the function |
The output is a list with the following elements. First, the list contains
an igraph
object. The graph nodes and edges contain attributes encoded
in the clust_irr
objects. Second, it contains a data.frame in which
rows are clones (nodes) in the graph. Third, the list contains the logical
variable joint_graph
, which is set to TRUE
if the graph is a
joint graph generated by the function get_joint_graph
and FALSE
if the graph is not a joint graph generated by get_graph
.
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run ClustIRR analysis out <- cluster_irr(s = s) # get graph g <- get_graph(clust_irr = out) names(g)
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run ClustIRR analysis out <- cluster_irr(s = s) # get graph g <- get_graph(clust_irr = out) names(g)
Use the community_occupancy_matrix
generated by the function
detect_communities
to generate honeycomb plots for each pair of
repertoires. In each plot, we will show communities (rows in the matric
community_occupancy_matrix
) as dots and their intensities in a
pair of repertoires (x-axis and y-axis). The density of dots is encoded
by the color of the honeycomb-like hexagons.
get_honeycombs(com)
get_honeycombs(com)
com |
|
Use the community_occupancy_matrix
generated by the function
detect_communities
to generate honeycomb plots for each pair of
repertoires. In each plot, we will show communities (rows in the matric
community_occupancy_matrix
) as dots and their intensities in a
pair of repertoires (x-axis and y-axis). The density of dots is encoded
by the color of the honeycomb-like hexagons.
The output is a list with ggplots. Given n repertoires (columns in input
community_occupancy_matrix
), it will generate n*(n-1)/2 plots.
You can arrange the ggplots (or a portion of them) in any shape e.g. with
the R-package patchwork.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # get honeycombs g <- get_honeycombs(com = gcd$community_occupancy_matrix) g
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3a = CDR3ab[1:300, "CDR3a"], CDR3b = CDR3ab[1:300, "CDR3b"], clone_size = 1, sample = "a") b <- data.frame(CDR3a = CDR3ab[201:400, "CDR3a"], CDR3b = CDR3ab[201:400, "CDR3b"], clone_size = 1, sample = "b") b$clone_size[1] <- 20 # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get joint graph jg <- get_joint_graph(clust_irrs = c) # detect communities gcd <- detect_communities(graph = jg$graph, algorithm = "leiden", resolution = 1, weight = "ncweight", chains = c("CDR3a", "CDR3b")) # get honeycombs g <- get_honeycombs(com = gcd$community_occupancy_matrix) g
igraph
object from multiple clust_irr
objectsGiven a vector of clust_irr
objects, generated by the function
cluster_irr
, the function get_joint_graph
performs the
following steps:
1. runs the function get_graph
on each clust_irr
object
2. merges the nodes: if graph a
and b
have |a|
and |b|
nodes, then the joint graph has |a|+|b|
nodes,
regardless of whether exactly the same clone (vertex) is found in both
graphs.
3. draws edges between nodes from the different graphs using the same
algorithm for drawing edges between nodes within an IRR (see function
clust_irr
).
4. return a joint graph as igraph
object
5. return the input clustirr object list
6. return a logical joint_graph
=TRUE
get_joint_graph(clust_irrs, cores = 1)
get_joint_graph(clust_irrs, cores = 1)
clust_irrs |
A list of at least two S4 objects generated with the
function |
cores |
number of computer cores to use (default = 1) |
The main output is an igraph
object.
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "a", clone_size = 1) b <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "b", clone_size = 1) # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get graph g <- get_joint_graph(clust_irrs = c) names(g)
# load package input data data("CDR3ab", package = "ClustIRR") a <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "a", clone_size = 1) b <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "b", clone_size = 1) # run ClustIRR analysis c <- c(cluster_irr(s = a), cluster_irr(s = b)) # get graph g <- get_joint_graph(clust_irrs = c) names(g)
data.frame with CDR3a
and/or CDR3b
sequences and their
matching antigenic epitopes obtained from McPAS-TCR. The remaining CDR3
columns are set to NA. For data processing details see the script
inst/script/get_mcpastcr.R
data(mcpas)
data(mcpas)
data.frame
with columns:
CDR3a
: CDR3a amino acid sequence
CDR3b
: CDR3b amino acid sequence
CDR3g
: CDR3g amino acid sequence -> NA
CDR3d
: CDR3d amino acid sequence -> NA
CDR3h
: CDR3h amino acid sequence -> NA
CDR3l
: CDR3l amino acid sequence -> NA
CDR3_species
: CDR3 species (e.g. human, mouse, ...)
Antigen_species
: antigen species
Antigen_gene
: antigen gene
Reference
: Reference (Pubmed ID)
data(mcpas) loads the object McPAS-TCR
data(mcpas)
data(mcpas)
This function visualizes a graph. The main input is g
object
created by the function get_graph
.
plot_graph(g, select_by = "Ag_species", as_visnet = FALSE, show_singletons = TRUE, node_opacity = 1)
plot_graph(g, select_by = "Ag_species", as_visnet = FALSE, show_singletons = TRUE, node_opacity = 1)
g |
Object returned by the functions |
as_visnet |
logical, if |
select_by |
character string, two values are possible: "Ag_species" or
"Ag_gene". This only has an effect if |
show_singletons |
logical, if |
node_opacity |
probability, controls the opacity of node colors. Lower values corresponding to more transparent colors. |
The output is an igraph
or visNetwork
plot.
The size of the vertices increases linearly as the logarithm of the degree of the clonal expansion (number of cells per clone) in the corresponding clones.
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run ClustIRR analysis out <- cluster_irr(s = s) # get graph g <- get_graph(clust_irr = out) # plot graph with vertices as clones plot_graph(g, as_visnet=FALSE, show_singletons=TRUE, node_opacity = 0.8)
# load package input data data("CDR3ab", package = "ClustIRR") s <- data.frame(CDR3b = CDR3ab[1:100, "CDR3b"], sample = "A", clone_size = 1) # run ClustIRR analysis out <- cluster_irr(s = s) # get graph g <- get_graph(clust_irr = out) # plot graph with vertices as clones plot_graph(g, as_visnet=FALSE, show_singletons=TRUE, node_opacity = 0.8)
data.frame with paired CDR3a
and CDR3b
CDR3 sequences and their
matching epitopes obtained from TCR3d. The remaining CDR3 columns are set to
NA. The antigenic epitopes come from cancer antigens and from viral antigens.
For data processing details see the script inst/script/get_tcr3d.R
data(tcr3d)
data(tcr3d)
data.frame
with columns:
CDR3a
: CDR3a amino acid sequence
CDR3b
: CDR3b amino acid sequence
CDR3g
: CDR3g amino acid sequence -> NA
CDR3d
: CDR3d amino acid sequence -> NA
CDR3h
: CDR3h amino acid sequence -> NA
CDR3l
: CDR3l amino acid sequence -> NA
CDR3_species
: CDR3 species (e.g. human, mouse, ...)
Antigen_species
: antigen species
Antigen_gene
: antigen gene
Reference
: Reference ID
data(tcr3d) loads the object tcr3d
data("tcr3d")
data("tcr3d")
data.frame with unpaired CDR3a
or CDR3b
sequences and their
matching epitopes obtained from VDJdb. The remaining CDR3 columns are set
to NA. For data processing details see the script inst/script/get_vdjdb.R
data(vdjdb)
data(vdjdb)
data.frame
with columns:
CDR3a
: CDR3a amino acid sequence
CDR3b
: CDR3b amino acid sequence
CDR3g
: CDR3g amino acid sequence -> NA
CDR3d
: CDR3d amino acid sequence -> NA
CDR3h
: CDR3h amino acid sequence -> NA
CDR3l
: CDR3l amino acid sequence -> NA
CDR3_species
: CDR3 species (e.g. human, mouse, ...)
Antigen_species
: antigen species
Antigen_gene
: antigen gene
Reference
: Reference (Pubmed ID)
data(vdjdb) loads the object vdjdb
data("vdjdb")
data("vdjdb")