| Title: | Grouping of Lymphocyte Interactions by Paratope Hotspots |
|---|---|
| Description: | An R implementation of the GLIPH and GLIPH2 algorithms for clustering T cell receptors (TCRs) predicted to bind the same HLA-restricted peptide antigen. Identifies specificity groups based on local (motif-based) and global (sequence-based) CDR3 similarities. Integrates with the scRepertoire ecosystem via immApex for single-cell immune repertoire analysis. Users should cite the original GLIPH algorithm papers: Glanville et al. (2017) <doi:10.1038/nature22976> and Huang et al. (2020) <doi:10.1038/s41587-020-0505-4>. |
| Authors: | Nick Borcherding [aut, cre] |
| Maintainer: | Nick Borcherding <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.5 |
| Built: | 2026-06-02 23:38:32 UTC |
| Source: | https://github.com/bioc/immGLIPH |
Calculates scores for CDR3 clusters following the GLIPH and GLIPH2 scoring procedures. Depending on the information provided, a final score is computed from up to five cluster properties: cluster size, enrichment of CDR3 lengths, enrichment of V genes, enrichment of clonal expansions, and enrichment of common HLA alleles.
clusterScoring( cluster_list, cdr3_sequences, refdb_beta = "human_v2.0_CD48", v_usage_freq = NULL, cdr3_length_freq = NULL, ref_cluster_size = "original", gliph_version = 1, sim_depth = 1000, hla_cutoff = 0.1, n_cores = 1 )clusterScoring( cluster_list, cdr3_sequences, refdb_beta = "human_v2.0_CD48", v_usage_freq = NULL, cdr3_length_freq = NULL, ref_cluster_size = "original", gliph_version = 1, sim_depth = 1000, hla_cutoff = 0.1, n_cores = 1 )
cluster_list |
A |
cdr3_sequences |
A
|
refdb_beta |
A |
v_usage_freq |
A |
cdr3_length_freq |
A |
ref_cluster_size |
A
Default: |
gliph_version |
A
Default: |
sim_depth |
A |
hla_cutoff |
A |
n_cores |
A |
A data.frame of cluster scoring results. The first
column contains the total score and additional columns contain up to
five individual scores (cluster size, CDR3 length enrichment, V-gene
enrichment, clonal expansion enrichment, and common HLA enrichment).
Glanville, Jacob, et al. "Identifying specificity groups in the T cell receptor repertoire." Nature 547.7661 (2017): 94.
https://github.com/immunoengineer/gliph
utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH(cdr3_sequences = gliph_input_data[seq_len(200), ], refdb_beta = ref_df, sim_depth = 100, n_cores = 1) scoring_results <- clusterScoring( cluster_list = res$cluster_list, cdr3_sequences = gliph_input_data[seq_len(200), ], refdb_beta = ref_df, gliph_version = 1, sim_depth = 100, n_cores = 1)utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH(cdr3_sequences = gliph_input_data[seq_len(200), ], refdb_beta = ref_df, sim_depth = 100, n_cores = 1) scoring_results <- clusterScoring( cluster_list = res$cluster_list, cdr3_sequences = gliph_input_data[seq_len(200), ], refdb_beta = ref_df, gliph_version = 1, sim_depth = 100, n_cores = 1)
De novo generation of CDR3 sequences based on GLIPH or GLIPH2 clustering results. Using the position-specific abundance of amino acids in the CDR3 region of sequences within a convergence group, artificial sequences are simulated following the approach established in Glanville et al. The generated sequences are scored by a positional weight matrix (PWM) derived from the convergence group, and optionally normalized against a reference database. The top-scoring sequences are returned.
deNovoTCRs( convergence_group_tag, result_folder = "", clustering_output = NULL, refdb_beta = "gliph_reference", normalization = FALSE, accept_sequences_with_C_F_start_end = TRUE, sims = 1e+05, num_tops = 1000, min_length = 10, make_figure = FALSE, n_cores = 1 )deNovoTCRs( convergence_group_tag, result_folder = "", clustering_output = NULL, refdb_beta = "gliph_reference", normalization = FALSE, accept_sequences_with_C_F_start_end = TRUE, sims = 1e+05, num_tops = 1000, min_length = 10, make_figure = FALSE, n_cores = 1 )
convergence_group_tag |
Character. Tag of the convergence group to use for prediction. |
result_folder |
Character. Path to the folder containing clustering
output files and where results will be saved. If the value is |
clustering_output |
List. The output list from |
refdb_beta |
Character or data.frame. Specifies the reference database to use. When a data.frame is provided, the first column should contain CDR3b sequences and the second column (optional) should contain V genes. The following keyword can be used to select a built-in database:
Default: |
normalization |
Logical. If |
accept_sequences_with_C_F_start_end |
Logical. If |
sims |
Numeric. Number of de novo CDR3 sequences to generate.
Default: |
num_tops |
Numeric. Number of top-scoring de novo sequences to
return. Default: |
min_length |
Numeric. Minimum CDR3 sequence length; also determines
the number of N-terminal positions used for PWM scoring.
Default: |
make_figure |
Logical. Whether to plot the |
n_cores |
Numeric. Number of cores for parallel computation. If
|
A list with the following elements:
A data.frame of the num_tops
best-scoring generated sequences and their corresponding scores.
A data.frame of the convergence
group sequences and their corresponding scores.
A data.frame with each observed
CDR3 length and its probability of occurrence in the convergence group.
The length distribution of generated sequences mirrors this
distribution.
A data.frame containing the positional weight
matrix used for scoring. Columns represent amino acids and rows
represent positions relative to the N-terminus.
A list of data.frames containing
the positional weight matrices used for sequence generation, one per
observed CDR3 length. Columns represent amino acids and rows represent
positions relative to the N-terminus.
If result_folder is specified, a tab-delimited file named
<convergence_group_tag>_de_novo.txt is also written to disk.
Glanville, Jacob, et al. "Identifying specificity groups in the T cell receptor repertoire." Nature 547.7661 (2017): 94.
https://github.com/immunoengineer/gliph
# Build a minimal clustering output to demonstrate deNovoTCRs fake_cluster <- data.frame( CDR3b = c("CASSLAPGATNEKLFF", "CASSLAPGGTNEKLFF", "CASSLAPGDTNEKLFF", "CASSLAPGETNEKLFF", "CASSLAPGANEKLFF", "CASSLAPGVTNEKLFF"), TRBV = rep("TRBV5-1", 6), stringsAsFactors = FALSE ) fake_output <- list(cluster_list = list("motif-LAP" = fake_cluster)) ref_seqs <- fake_cluster[, c("CDR3b", "TRBV")] new_seqs <- deNovoTCRs( convergence_group_tag = "motif-LAP", clustering_output = fake_output, refdb_beta = ref_seqs, sims = 100, num_tops = 10, min_length = 8, make_figure = FALSE, n_cores = 1 )# Build a minimal clustering output to demonstrate deNovoTCRs fake_cluster <- data.frame( CDR3b = c("CASSLAPGATNEKLFF", "CASSLAPGGTNEKLFF", "CASSLAPGDTNEKLFF", "CASSLAPGETNEKLFF", "CASSLAPGANEKLFF", "CASSLAPGVTNEKLFF"), TRBV = rep("TRBV5-1", 6), stringsAsFactors = FALSE ) fake_output <- list(cluster_list = list("motif-LAP" = fake_cluster)) ref_seqs <- fake_cluster[, c("CDR3b", "TRBV")] new_seqs <- deNovoTCRs( convergence_group_tag = "motif-LAP", clustering_output = fake_output, refdb_beta = ref_seqs, sims = 100, num_tops = 10, min_length = 8, make_figure = FALSE, n_cores = 1 )
Searches a character vector of amino acid sequences for k-mer motifs and
returns their frequencies. Both continuous and discontinuous (gapped) motifs
are supported. When immApex (>= 2.0.0) is installed, the
C++-accelerated immApex::calculateMotif() backend is used
automatically for improved performance; otherwise the function falls back
to a pure-R implementation based on qgrams.
findMotifs(seqs, q = 2:4, kmer_mindepth = NULL, discontinuous = FALSE)findMotifs(seqs, q = 2:4, kmer_mindepth = NULL, discontinuous = FALSE)
seqs |
A character vector of amino acid sequences in which motifs will be identified and counted. |
q |
A numeric vector of motif lengths to search for.
Default: |
kmer_mindepth |
The minimum number of times a k-mer must be observed
in |
discontinuous |
Whether to include discontinuous (gapped) motifs in
the search. Default: |
A data.frame with two columns: motif (the k-mer
string) and V1 (the observed frequency).
utils::data("gliph_input_data") sample_seqs <- as.character(gliph_input_data$CDR3b) res <- findMotifs(seqs = sample_seqs)utils::data("gliph_input_data") sample_seqs <- as.character(gliph_input_data$CDR3b) res <- findMotifs(seqs = sample_seqs)
Downloads the reference repertoire data from Zenodo on first use and caches locally via BiocFileCache. Subsequent calls load from the cache without re-downloading.
getGLIPHreference(force_download = FALSE, verbose = TRUE)getGLIPHreference(force_download = FALSE, verbose = TRUE)
force_download |
Logical. If |
verbose |
Logical. Print messages. Default: |
The cached file contains a named list with entries for each built-in
reference database (see .valid_reference_names).
A named list of reference databases. Each element is a list
with refseqs, vgene_frequencies, and
cdr3_length_frequencies.
# Available reference database names c("human_v1.0_CD4", "human_v1.0_CD8", "human_v1.0_CD48", "human_v2.0_CD4", "human_v2.0_CD8", "human_v2.0_CD48", "mouse_v1.0_CD4", "mouse_v1.0_CD8", "mouse_v1.0_CD48", "gliph_reference") ref <- getGLIPHreference() names(ref) head(ref[["human_v2.0_CD48"]]$refseqs)# Available reference database names c("human_v1.0_CD4", "human_v1.0_CD8", "human_v1.0_CD48", "human_v2.0_CD4", "human_v2.0_CD8", "human_v2.0_CD48", "mouse_v1.0_CD4", "mouse_v1.0_CD8", "mouse_v1.0_CD48", "gliph_reference") ref <- getGLIPHreference() names(ref) head(ref[["human_v2.0_CD48"]]$refseqs)
Draws a random subset of reference motif regions with the same size as the
sample set. When cdr3_len_stratify and/or vgene_stratify are
enabled, the function preserves the CDR3 length and/or V-gene distribution
of the sample in the subsample. This is used internally by the repeated
random sampling (RRS) local-similarity method in runGLIPH.
getRandomSubsample( cdr3_len_stratify = FALSE, vgene_stratify = FALSE, refseqs_motif_region, motif_region, motif_lengths_list, ref_motif_lengths_id_list, motif_region_vgenes_list, ref_motif_vgenes_id_list, ref_lengths_vgenes_list, lengths_vgenes_list )getRandomSubsample( cdr3_len_stratify = FALSE, vgene_stratify = FALSE, refseqs_motif_region, motif_region, motif_lengths_list, ref_motif_lengths_id_list, motif_region_vgenes_list, ref_motif_vgenes_id_list, ref_lengths_vgenes_list, lengths_vgenes_list )
cdr3_len_stratify |
Whether to preserve the CDR3 length distribution.
Default: |
vgene_stratify |
Whether to preserve the V-gene distribution.
Default: |
refseqs_motif_region |
Character vector of reference motif regions. |
motif_region |
Character vector of sample motif regions. |
motif_lengths_list |
Named list mapping CDR3 lengths to their
frequency in |
ref_motif_lengths_id_list |
Named list mapping CDR3 lengths to
indices in |
motif_region_vgenes_list |
Named list mapping V-genes to their
frequency in |
ref_motif_vgenes_id_list |
Named list mapping V-genes to indices in
|
ref_lengths_vgenes_list |
Nested list mapping CDR3 length x V-gene
combinations to indices in |
lengths_vgenes_list |
Nested list mapping CDR3 length x V-gene
combinations to their frequency in the sample. Required when both
stratification flags are |
A character vector of length length(motif_region) drawn from
refseqs_motif_region.
ref_seqs <- c("ASSG", "ASSD", "ASSE", "ASSF", "ASSK", "ASSL") sample_seqs <- c("ASSG", "ASSF", "ASSL") sub <- getRandomSubsample( refseqs_motif_region = ref_seqs, motif_region = sample_seqs )ref_seqs <- c("ASSG", "ASSD", "ASSE", "ASSF", "ASSK", "ASSL") sample_seqs <- c("ASSG", "ASSF", "ASSL") sub <- getRandomSubsample( refseqs_motif_region = ref_seqs, motif_region = sample_seqs )
A data.frame of 365 TRB CDR3 sequences with V-gene and patient
annotations, derived from the scRepertoire example dataset (Yost
et al. 2021). The data were extracted from the gliph_sce
SingleCellExperiment object using immApex::getIR().
data(gliph_input_data)data(gliph_input_data)
A data.frame with 365 rows and 3 columns (CDR3b, TRBV, patient).
CDR3bAmino acid sequence of the TRB CDR3 region.
TRBVTRBV gene name (e.g. "TRBV9").
patientPatient/sample identifier
(e.g. "P17B", "P19L").
Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nature Medicine 25, 1251–1259 (2019).
Built from scRepertoire example data; see
data-raw/build_example_data.R.
gliph_sce for the parent SingleCellExperiment object,
runGLIPH for the main analysis function.
A SingleCellExperiment object containing
2,000 genes across 500 cells, with T-cell receptor clonotype information
stored in the colData. Built from the scRepertoire example
dataset using combineTCR() and combineExpression().
data(gliph_sce)data(gliph_sce)
A SingleCellExperiment with 2000 genes and 500 cells.
The colData includes scRepertoire columns such as CTaa
(amino acid clonotype), CTgene (gene-level clonotype), CTnt
(nucleotide clonotype), CTstrict (strict clonotype), and clone
frequency/proportion columns. These can be parsed by
immApex::getIR() to extract chain-specific TCR data.
This object demonstrates how to pass a SingleCellExperiment directly to
runGLIPH.
Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nature Medicine 25, 1251–1259 (2019).
Built from scRepertoire example data; see
data-raw/build_example_data.R.
gliph_input_data for a plain data.frame extracted
from this object, runGLIPH for the main analysis function.
A list of three data.frames containing germline-encoded
fragments of V (gTRV), D (gTRD), and J (gTRJ) gene
segments that may appear in the CDR3 region. These fragments are used by the
GLIPH2 algorithm to identify germline-encoded sequence segments.
data(gTRB)data(gTRB)
A list of 3 data.frames: gTRV, gTRD, and gTRJ.
Lefranc, M.-P. IMGT, the international ImMunoGeneTics database. Nucl. Acids Res. 29, 207–209 (2001).
Reads the tab-delimited output files produced by runGLIPH (when
result_folder was specified) and reconstructs the same list structure
that runGLIPH() returns.
loadGLIPH(result_folder = "")loadGLIPH(result_folder = "")
result_folder |
Path to the folder containing the saved GLIPH output files. |
A list with the same structure as the return value of
runGLIPH, including elements such as cluster_list,
cluster_properties, motif_enrichment, connections, and
parameters.
utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] tmp_dir <- tempfile("gliph_out_") res <- runGLIPH( cdr3_sequences = gliph_input_data[seq_len(200), ], method = "gliph1", refdb_beta = ref_df, result_folder = tmp_dir, sim_depth = 50, n_cores = 1 ) reloaded <- loadGLIPH(result_folder = tmp_dir) unlink(tmp_dir, recursive = TRUE)utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] tmp_dir <- tempfile("gliph_out_") res <- runGLIPH( cdr3_sequences = gliph_input_data[seq_len(200), ], method = "gliph1", refdb_beta = ref_df, result_folder = tmp_dir, sim_depth = 50, n_cores = 1 ) reloaded <- loadGLIPH(result_folder = tmp_dir) unlink(tmp_dir, recursive = TRUE)
Uses the visNetwork package to build an interactive network graph from the
clustering results produced by runGLIPH. Nodes represent
individual CDR3b sequences and edges encode local or global sequence
similarities. The resulting visualization is fully interactive: scroll to
zoom, hover over a node for details, and click a node to highlight its
direct neighbors.
plotNetwork( clustering_output = NULL, result_folder = "", show_additional_columns = NULL, color_info = "total.score", color_palette = viridis::viridis, local_edge_color = "orange", global_edge_color = "#68bceb", size_info = NULL, absolute_size = FALSE, cluster_min_size = 3, n_cores = 1 )plotNetwork( clustering_output = NULL, result_folder = "", show_additional_columns = NULL, color_info = "total.score", color_palette = viridis::viridis, local_edge_color = "orange", global_edge_color = "#68bceb", size_info = NULL, absolute_size = FALSE, cluster_min_size = 3, n_cores = 1 )
clustering_output |
Output list returned by |
result_folder |
Path to the folder containing saved GLIPH output
files. When a non-empty path is supplied the results are loaded from disk
and |
show_additional_columns |
Character vector of extra column names whose
values should be displayed in the node tooltips. Column names from the
original |
color_info |
Column name used to colour the nodes. Accepts any column
from the input |
color_palette |
A function that accepts a single integer |
local_edge_color |
Colour applied to edges representing local
similarities. Default: |
global_edge_color |
Colour applied to edges representing global
similarities. Default: |
size_info |
Column name whose numeric values determine node sizes.
Accepts columns from |
absolute_size |
If |
cluster_min_size |
Minimum number of members a cluster must contain to
be included in the plot. Default: |
n_cores |
Number of cores for parallel processing. When |
A visNetwork object containing the interactive network graph.
utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH(cdr3_sequences = gliph_input_data[seq_len(200),], method = "gliph1", refdb_beta = ref_df, sim_depth = 100, n_cores = 1) plotNetwork(clustering_output = res, n_cores = 1)utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH(cdr3_sequences = gliph_input_data[seq_len(200),], method = "gliph1", refdb_beta = ref_df, sim_depth = 100, n_cores = 1) plotNetwork(clustering_output = res, n_cores = 1)
A list with two elements providing expected cluster-size
probabilities under the null model (no true convergence):
originalProbabilities from the original GLIPH algorithm, applied uniformly across all sample sizes.
simulatedProbabilities estimated from 500-step simulations at sample sizes of 125, 250, 500, 1000, 2000, 4000, 6000, 8000, and 10000 random reference sequences. During scoring the row closest to the actual sample size is used.
data(ref_cluster_sizes)data(ref_cluster_sizes)
A list with 2 elements: original and simulated.
Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).
A named list of naive TCR repertoire reference databases used for
motif enrichment testing and cluster scoring. The data is not
bundled with the package; it is downloaded on first use from Zenodo
and cached locally via BiocFileCache (see
getGLIPHreference).
NULL. Data is downloaded on first use via
getGLIPHreference.
Each element is itself a list with three components:
refseqsA data.frame with columns CDR3b
(amino acid sequence) and TRBV (V-gene name).
vgene_frequenciesA data.frame with columns
vgene and freq giving the relative frequency of each
V gene in the reference repertoire.
cdr3_length_frequenciesA data.frame with columns
len and freq giving the relative frequency of each
CDR3 length in the reference repertoire.
The following named entries are available:
"human_v1.0_CD4", "human_v1.0_CD8",
"human_v1.0_CD48" – Glanville et al. (2017)
"human_v2.0_CD4", "human_v2.0_CD8",
"human_v2.0_CD48" – Huang et al. (2020)
"mouse_v1.0_CD4", "mouse_v1.0_CD8",
"mouse_v1.0_CD48" – Glanville et al. (2017)
"gliph_reference" – Legacy alias for
"human_v1.0_CD48"
No return value. This documents the reference_list object
which is downloaded at runtime by getGLIPHreference.
Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).
Huang, H. et al. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology 38, 1194–1202 (2020).
Raw data downloaded from http://50.255.35.37:8080/tools.
getGLIPHreference to download or load the data,
runGLIPH and clusterScoring which use the
reference internally via the refdb_beta parameter.
Unified entry point for the GLIPH/GLIPH2 algorithm for grouping T cell receptors by antigen specificity. The function identifies locally and globally similar CDR3b sequences, clusters them into convergence groups, and scores each group for biological relevance.
runGLIPH( cdr3_sequences, method = c("gliph2", "gliph1", "custom"), chains = "TRB", result_folder = "", refdb_beta = "human_v2.0_CD48", v_usage_freq = NULL, cdr3_length_freq = NULL, ref_cluster_size = "original", sim_depth = 1000, lcminp = 0.01, lcminove = c(1000, 100, 10), kmer_mindepth = 3, accept_CF = TRUE, min_seq_length = 8, gccutoff = NULL, structboundaries = TRUE, boundary_size = 3, motif_length = c(2, 3, 4), local_similarities = TRUE, global_similarities = TRUE, local_method = NULL, global_method = NULL, clustering_method = NULL, scoring_method = NULL, cluster_min_size = 2, hla_cutoff = 0.1, n_cores = 1, motif_distance_cutoff = 3, discontinuous_motifs = FALSE, all_aa_interchangeable = FALSE, boost_local_significance = FALSE, global_vgene = FALSE, cdr3_len_stratify = FALSE, vgene_stratify = FALSE, public_tcrs = TRUE, vgene_match = "none", scoring_sim_depth = 1000, verbose = TRUE )runGLIPH( cdr3_sequences, method = c("gliph2", "gliph1", "custom"), chains = "TRB", result_folder = "", refdb_beta = "human_v2.0_CD48", v_usage_freq = NULL, cdr3_length_freq = NULL, ref_cluster_size = "original", sim_depth = 1000, lcminp = 0.01, lcminove = c(1000, 100, 10), kmer_mindepth = 3, accept_CF = TRUE, min_seq_length = 8, gccutoff = NULL, structboundaries = TRUE, boundary_size = 3, motif_length = c(2, 3, 4), local_similarities = TRUE, global_similarities = TRUE, local_method = NULL, global_method = NULL, clustering_method = NULL, scoring_method = NULL, cluster_min_size = 2, hla_cutoff = 0.1, n_cores = 1, motif_distance_cutoff = 3, discontinuous_motifs = FALSE, all_aa_interchangeable = FALSE, boost_local_significance = FALSE, global_vgene = FALSE, cdr3_len_stratify = FALSE, vgene_stratify = FALSE, public_tcrs = TRUE, vgene_match = "none", scoring_sim_depth = 1000, verbose = TRUE )
cdr3_sequences |
Input data containing CDR3b amino acid sequences.
Accepts a character vector, a When a
|
method |
Character. Algorithm preset to use.
Default: |
chains |
Character. Chain type for extraction from |
result_folder |
Character. Path to output folder. If |
refdb_beta |
Character or |
v_usage_freq |
|
cdr3_length_freq |
|
ref_cluster_size |
Character. Reference cluster size strategy.
Default: |
sim_depth |
Integer. Simulation depth for repeated random sampling
(local method |
lcminp |
Numeric. Local convergence maximum p-value threshold.
Default: |
lcminove |
Numeric vector. Local convergence minimum fold-change
per motif length (lengths 2, 3, and 4 respectively).
Default: |
kmer_mindepth |
Integer. Minimum number of kmer observations
required to consider a motif.
Default: |
accept_CF |
Logical. If |
min_seq_length |
Integer. Minimum CDR3b sequence length to retain.
Default: |
gccutoff |
Numeric or |
structboundaries |
Logical. If |
boundary_size |
Integer. Number of positions to trim from each end
when |
motif_length |
Numeric vector. Motif lengths to search.
Default: |
local_similarities |
Logical. If |
global_similarities |
Logical. If |
local_method |
Character or
Default: |
global_method |
Character or
Default: |
clustering_method |
Character or
Default: |
scoring_method |
Character or
Default: |
cluster_min_size |
Integer. Minimum number of unique CDR3b sequences
required to retain a convergence group.
Default: |
hla_cutoff |
Numeric. Significance cutoff for HLA enrichment testing.
Default: |
n_cores |
Integer or |
motif_distance_cutoff |
Integer. Maximum positional distance between
shared motifs for two CDR3b sequences to be linked (GLIPH2).
Default: |
discontinuous_motifs |
Logical. If |
all_aa_interchangeable |
Logical. If |
boost_local_significance |
Logical. If |
global_vgene |
Logical. If |
cdr3_len_stratify |
Logical. If |
vgene_stratify |
Logical. If |
public_tcrs |
Logical or character. Controls cross-donor edge
filtering. For
Default: |
vgene_match |
Character. V-gene matching requirement for custom clustering.
Default: |
scoring_sim_depth |
Integer. Simulation depth used specifically for
convergence group scoring.
Default: |
verbose |
Logical. If |
A list with the following elements:
sample_logdata.frame. Motif counts per simulation
iteration (only present when local_method = "rrs").
motif_enrichmentlist with two elements:
selected_motifsdata.frame of significantly
enriched motifs passing all thresholds.
all_motifsdata.frame of all evaluated motifs
with enrichment statistics.
global_enrichmentlist. Global struct enrichment
results (GLIPH2 only; NULL otherwise).
connectionsdata.frame. Edge list representing the
clone network.
cluster_propertiesdata.frame. Convergence group
properties and scores.
cluster_listNamed list of data.frame
objects with per-cluster member details.
parameterslist. All input parameters used for the
run.
Glanville, J. et al. (2017). Identifying specificity groups in the T cell receptor repertoire. Nature, 547, 94–98. doi:10.1038/nature22976
Huang, H. et al. (2020). Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology, 38, 1194–1202. doi:10.1038/s41587-020-0505-4
utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH( cdr3_sequences = gliph_input_data[seq_len(200), ], method = "gliph2", refdb_beta = ref_df, sim_depth = 50, n_cores = 1 )utils::data("gliph_input_data") ref_df <- gliph_input_data[, c("CDR3b", "TRBV")] res <- runGLIPH( cdr3_sequences = gliph_input_data[seq_len(200), ], method = "gliph2", refdb_beta = ref_df, sim_depth = 50, n_cores = 1 )