Title: | Differential gene usage in immune repertoires |
---|---|
Description: | Detection of biases in the usage of immunoglobulin (Ig) genes is an important task in immune repertoire profiling. IgGeneUsage detects aberrant Ig gene usage between biological conditions using a probabilistic model that is analyzed computationally by Bayesian inference. In doing so, IgGeneUsage also avoids some common problems of the current practice of null-hypothesis significance testing. |
Authors: | Simo Kitanovski [aut, cre] |
Maintainer: | Simo Kitanovski <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.21.0 |
Built: | 2024-10-30 07:42:53 UTC |
Source: | https://github.com/bioc/IgGeneUsage |
IgGeneUsage detects aberrant immunoglobulin (Ig) gene usage between adaptive immune repertoires that belong to different biological conditions, using a probabilistic model that is analyzed computationally by Bayesian inference.
This package contains functions for:
differential Ig gene usage analysis (function DGU)
posterior predictive checks (part of results generated by function DGU)
leave-one-out cross validation (function LOO)
Authors and maintainers:
Simo Kitanovski [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/snaketron/IgGeneUsage/issues
CDR3 sequence data from human T-cell receptors (TRB chain), downloaded from VDJdb. CDR3 sequences annotated to Influenza-A and CMV epitopes were selected from different publications, provided that a publication contains at least 100 CDR3 sequences. Each publication is treated as one repertoire (sample).
To compute the net charge of a CDR3 sequence, we treat the amino acids K, R and H as +1 charged and D and E as -1 charged. The net charge of a CDR3 sequence is then the sum of its individual residue charges.
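The charge rule above can be sketched in R as follows. The function name cdr3_net_charge and the example sequences are illustrative only and are not part of IgGeneUsage or taken from VDJdb; residues other than K, R, H, D and E are assumed to contribute no charge.

```r
# Hypothetical helper (not part of IgGeneUsage): net charge of CDR3 sequences.
# K, R, H count as +1; D, E count as -1; all other residues count as 0.
cdr3_net_charge <- function(seq) {
  residues <- strsplit(toupper(seq), split = "")
  vapply(residues, function(aa) {
    sum(aa %in% c("K", "R", "H")) - sum(aa %in% c("D", "E"))
  }, numeric(1))
}

# Illustrative CDR3 sequences
cdr3 <- c("CASSLAPGATNEKLFF", "CASSPDRGEQYF")
cdr3_net_charge(cdr3)  # 0 -1
```

Grouping CDR3 sequences by this value yields the net charge groups that take the place of gene names in this dataset.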
data("CDR3_Epitopes")
A data frame with 4 columns: "individual_id", "condition", "gene_name" and "gene_usage_count". The data are in a format suitable as input for IgGeneUsage.
The column gene_name contains the net charge group.
https://vdjdb.cdr3.net/
data(CDR3_Epitopes)
head(CDR3_Epitopes)
A small example dataset that has the following features:
1 condition
5 individuals (samples)
15 Ig genes
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_1.R
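For intuition, a ZIBB draw can be sketched in base R: with probability zi a count is a structural zero; otherwise a success probability is drawn from a beta distribution and the count from a binomial distribution. The function rzibb and its parameterization (alpha, beta, zi) are assumptions for illustration and need not match the simulation scripts in inst/scripts.

```r
# Hypothetical sampler (not part of IgGeneUsage) for a zero-inflated
# beta-binomial (ZIBB) distribution.
# k: number of draws, n: number of trials (e.g. repertoire size),
# alpha, beta: beta shape parameters, zi: zero-inflation probability.
rzibb <- function(k, n, alpha, beta, zi) {
  p <- rbeta(k, shape1 = alpha, shape2 = beta)  # per-draw success probability
  counts <- rbinom(k, size = n, prob = p)       # beta-binomial counts
  structural_zero <- runif(k) < zi              # zero-inflation step
  counts[structural_zero] <- 0L
  counts
}

set.seed(1)
# e.g. usage counts of one gene in 5 repertoires with 1,000 sequences each
rzibb(k = 5, n = 1000, alpha = 2, beta = 30, zi = 0.1)
```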
data("d_zibb_1", package = "IgGeneUsage")
A data frame with 4 columns:
"individual_id"
"condition"
"gene_name"
"gene_usage_count"
This format is accepted by IgGeneUsage.
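Before running an analysis, it can help to check that a data frame has this layout. The helper below is hypothetical (not exported by IgGeneUsage) and assumes the count column is named gene_usage_count, as in the CDR3_Epitopes and IGHV_HCV datasets.

```r
# Hypothetical input check (not part of IgGeneUsage): required columns are
# present and counts are non-negative whole numbers without missing values.
check_gene_usage <- function(ud) {
  required <- c("individual_id", "condition", "gene_name", "gene_usage_count")
  missing_cols <- setdiff(required, colnames(ud))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  counts <- ud$gene_usage_count
  if (!is.numeric(counts) || anyNA(counts) || any(counts < 0) ||
      any(counts != round(counts))) {
    stop("gene_usage_count must contain non-negative whole numbers")
  }
  invisible(TRUE)
}

# Toy input: 2 individuals x 2 genes in a single condition (made-up counts)
ud <- data.frame(
  individual_id = rep(c("I1", "I2"), each = 2),
  condition = "C1",
  gene_name = rep(c("G1", "G2"), times = 2),
  gene_usage_count = c(10, 3, 7, 0)
)
check_gene_usage(ud)
```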
Simulation code is provided in inst/scripts/d_zibb_1.R
data("d_zibb_1", package = "IgGeneUsage")
head(d_zibb_1)
A small example dataset that has the following features:
1 condition
5 individuals (samples)
3 biological replicates per individual
15 Ig genes
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_2.R
data("d_zibb_2", package = "IgGeneUsage")
A data frame with columns:
"individual_id"
"condition"
"gene_name"
"replicate"
"gene_usage_count"
This format is accepted by IgGeneUsage.
Simulation code is provided in inst/scripts/d_zibb_2.R
data("d_zibb_2", package = "IgGeneUsage")
head(d_zibb_2)
A small example dataset that has the following features:
3 conditions
5 samples per condition
8 Ig genes
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_3.R
data("d_zibb_3", package = "IgGeneUsage")
A data frame with columns:
"individual_id"
"condition"
"gene_name"
"gene_usage_count"
This format is accepted by IgGeneUsage.
Simulation code is provided in inst/scripts/d_zibb_3.R
data("d_zibb_3", package = "IgGeneUsage")
head(d_zibb_3)
A small example dataset that has the following features:
2 conditions
7 individuals per condition
4 replicates per individual
8 Ig genes
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_4.R
data("d_zibb_4", package = "IgGeneUsage")
A data frame with columns:
"individual_id"
"condition"
"gene_name"
"replicate"
"gene_usage_count"
This format is accepted by IgGeneUsage.
Simulation code is provided in inst/scripts/d_zibb_4.R
data("d_zibb_4", package = "IgGeneUsage")
head(d_zibb_4)
A small example of paired-sample immune receptor repertoires (IRRs) with these features:
3 conditions
6 individuals, each with one IRR per condition
10 Ig genes
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_5.R
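In a paired design, every individual contributes one repertoire to every condition. The hypothetical helper below (not part of IgGeneUsage) checks this for a data frame without replicates, assuming the column names individual_id and condition used throughout this manual.

```r
# Hypothetical check (not part of IgGeneUsage): every individual must appear
# in every condition. Without replicates, one (individual, condition) pair
# corresponds to one repertoire, so each table cell is either 0 or 1.
is_paired_design <- function(ud) {
  reps <- unique(ud[, c("individual_id", "condition")])
  tab <- table(reps$individual_id, reps$condition)
  all(tab == 1)
}

# Toy data: 3 individuals x 3 conditions x 2 genes
ud <- expand.grid(individual_id = paste0("I", 1:3),
                  condition = c("C1", "C2", "C3"),
                  gene_name = paste0("G", 1:2),
                  stringsAsFactors = FALSE)
ud$gene_usage_count <- 5
is_paired_design(ud)  # TRUE

# Dropping the repertoire of I1 in C1 breaks the pairing
is_paired_design(ud[!(ud$individual_id == "I1" & ud$condition == "C1"), ])  # FALSE
```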
data("d_zibb_5", package = "IgGeneUsage")
A data frame with columns:
"individual_id"
"condition"
"gene_name"
"gene_usage_count"
This format is accepted by IgGeneUsage.
Simulation code is provided in inst/scripts/d_zibb_5.R
data("d_zibb_5", package = "IgGeneUsage")
head(d_zibb_5)
A small example of paired-sample IRRs *with replicates*, with these features:
3 conditions
9 individuals, each with one IRR per condition
10 Ig genes
4 replicates per individual
This dataset was simulated from a zero-inflated beta-binomial (ZIBB) distribution. Simulation code is available in inst/scripts/d_zibb_6.R
data("d_zibb_6", package = "IgGeneUsage")
A data frame with columns:
"individual_id"
"condition"
"gene_name"
"replicate"
"gene_usage_count"
This format is accepted by IgGeneUsage.
Simulation code is provided in inst/scripts/d_zibb_6.R
data("d_zibb_6", package = "IgGeneUsage")
head(d_zibb_6)
IgGeneUsage detects differential gene usage (DGU) in immune repertoires that belong to two biological conditions.
DGU(ud, mcmc_warmup, mcmc_steps, mcmc_chains, mcmc_cores, hdi_lvl, adapt_delta, max_treedepth, paired = FALSE)
ud |
Data.frame with 4 or 5 columns: 'individual_id', 'condition', 'gene_name', 'gene_usage_count' and, optionally, 'replicate'.
ud can also be a SummarizedExperiment object. See the example data 'data(Ig_SE)' for more information. |
mcmc_chains, mcmc_warmup, mcmc_steps, mcmc_cores |
Number of MCMC chains (default = 4), length of the adaptive (warm-up) part of the MCMC chains (default = 500), length of the MCMC chains (default = 1,500), number of cores to use (default = 1). |
hdi_lvl |
Probability mass of the highest density interval (HDI) (default = 0.95). |
adapt_delta |
MCMC sampler setting: target acceptance rate during adaptation (default = 0.95). |
max_treedepth |
MCMC sampler setting: maximum tree depth of the NUTS sampler (default = 12). |
paired |
should a paired-sample differential Ig gene usage analysis be performed (default = FALSE)? |
The main input of IgGeneUsage is a table with Ig gene usage frequencies for a set of repertoires, each belonging to one of two biological conditions. For the DGU analysis between two biological conditions, IgGeneUsage employs a Bayesian hierarchical model for zero-inflated beta-binomial (ZIBB) regression (see vignette 'User Manual: IgGeneUsage').
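As a descriptive complement to the model, the relative usage of a gene within a repertoire is its count divided by the repertoire's total count. The sketch below assumes a data frame without replicates and the column names individual_id, condition and gene_usage_count; relative_usage is a hypothetical helper, and the proportions it returns are not the ZIBB estimates reported by DGU.

```r
# Hypothetical helper (not part of IgGeneUsage): per-repertoire relative
# gene usage. A repertoire is taken to be one individual in one condition.
relative_usage <- function(ud) {
  totals <- ave(ud$gene_usage_count, ud$individual_id, ud$condition, FUN = sum)
  ud$gene_usage_prop <- ud$gene_usage_count / totals
  ud
}

# Toy data: two repertoires with two genes each (made-up counts)
ud <- data.frame(
  individual_id = c("I1", "I1", "I2", "I2"),
  condition = c("C1", "C1", "C2", "C2"),
  gene_name = c("G1", "G2", "G1", "G2"),
  gene_usage_count = c(30, 10, 5, 15)
)
relative_usage(ud)$gene_usage_prop  # 0.75 0.25 0.25 0.75
```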
dgu |
DGU statistics for each gene: 1) es = effect size on DGU (mean, median, standard error (se), standard deviation (sd), L (lower boundary of HDI), H (upper boundary of HDI)); 2) contrast = direction of the effect; 3) pmax = probability of DGU. This summary is only available if the input data contain at least two conditions. |
gu |
gene usage (GU) summary of each gene in each condition |
fit |
stanfit object |
ppc |
two types of posterior predictive checks: 1) repertoire-specific, 2) condition-specific |
ud |
processed gene usage data used for the model |
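The HDI boundaries (L, H) and pmax summarize the posterior distribution of a gene's effect size. The base-R sketch below computes them from a vector of posterior draws. The HDI is taken as the narrowest interval that contains a fraction hdi_lvl of the sorted draws. For pmax it assumes the definition 2 * max(P(es > 0), P(es < 0)) - 1, so that pmax is 0 when the posterior mass is split evenly around zero and 1 when all of it lies on one side; please check the vignette for the exact definition used by IgGeneUsage. Both function names are hypothetical.

```r
# Hypothetical helper: highest density interval from posterior draws.
hdi <- function(draws, level = 0.95) {
  x <- sort(draws)
  n <- length(x)
  m <- ceiling(level * n)       # number of draws inside the interval
  lo <- x[1:(n - m + 1)]        # candidate lower boundaries
  hi <- x[m:n]                  # matching upper boundaries
  i <- which.min(hi - lo)       # narrowest window containing m draws
  c(L = lo[i], H = hi[i])
}

# Hypothetical helper: probability of DGU under the assumed definition.
prob_dgu <- function(draws) {
  p_pos <- mean(draws > 0)
  p_neg <- mean(draws < 0)
  2 * max(p_pos, p_neg) - 1
}

# Toy posterior draws of an effect size (made up)
d <- c(-1, 0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 2)
hdi(d, level = 0.75)  # L = 0.1, H = 0.7
prob_dgu(d)           # 0.75
```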
Simo Kitanovski <[email protected]>
LOO, Ig, IGHV_Epitopes, IGHV_HCV, Ig_SE, d_zibb_1, d_zibb_2, d_zibb_3
# input data
data(d_zibb_2)
head(d_zibb_2)
# run differential gene usage (DGU)
M <- DGU(ud = d_zibb_2, mcmc_warmup = 350, mcmc_steps = 1500,
         mcmc_chains = 2, mcmc_cores = 1, hdi_lvl = 0.95,
         adapt_delta = 0.8, max_treedepth = 10, paired = FALSE)
# look at M elements
names(M)
# look at DGU results
head(M$dgu)
# look at posterior predictive checks (PPC)
head(M$ppc)
A small subset of a database from a study evaluating vaccine-induced changes in B-cell populations, publicly provided by the R package alakazam (version 0.2.11). It contains IGHV gene family usage reported in four B-cell populations (samples IgM, IgD, IgG and IgA) across two time points (conditions = -1 hour and +7 days).
data("Ig")
A data frame with 4 columns: "sample_id", "condition", "gene_name" and "gene_usage_count". The data are in a format suitable as input for IgGeneUsage.
R package: alakazam version 0.2.11
Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. Proc Natl Acad Sci USA. 2014 111:4928-33.
data(Ig)
head(Ig)
A small subset of a database from a study evaluating vaccine-induced changes in B-cell populations, publicly provided by the R package alakazam (version 0.2.11). It contains IGHV gene family usage reported in four B-cell populations (samples IgM, IgD, IgG and IgA) across two time points (conditions = -1 hour and +7 days).
data("Ig_SE")
A SummarizedExperiment object with 1) assay data (rows = gene name, columns = repertoires) and 2) column data.frame in which the sample names and the corresponding biological condition labels are noted.
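The other datasets in this manual use a long format with one row per gene and repertoire, whereas Ig_SE stores the same kind of information as a gene-by-repertoire count matrix plus repertoire metadata. The hypothetical helper below (not an IgGeneUsage function) reshapes a plain matrix and data frame into that long format; with Ig_SE, the inputs would come from SummarizedExperiment::assay() and as.data.frame(SummarizedExperiment::colData()). The count column name gene_usage_count is assumed.

```r
# Hypothetical conversion (not part of IgGeneUsage): gene-by-repertoire
# count matrix + per-repertoire metadata -> long data frame.
to_long_format <- function(counts, col_data) {
  stopifnot(ncol(counts) == nrow(col_data))
  data.frame(
    # as.vector() is column-major: all genes of repertoire 1, then repertoire 2, ...
    individual_id = rep(col_data$individual_id, each = nrow(counts)),
    condition = rep(col_data$condition, each = nrow(counts)),
    gene_name = rep(rownames(counts), times = ncol(counts)),
    gene_usage_count = as.vector(counts)
  )
}

# Toy counts: 3 genes (rows) x 2 repertoires (columns), made-up values
counts <- matrix(c(12, 0, 5, 8, 3, 1), nrow = 3,
                 dimnames = list(c("IGHV1", "IGHV3", "IGHV4"), c("R1", "R2")))
col_data <- data.frame(individual_id = c("R1", "R2"), condition = c("C1", "C2"))
long <- to_long_format(counts, col_data)
long
```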
R package: alakazam version 0.2.11
Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. Proc Natl Acad Sci USA. 2014 111:4928-33.
# inspect the data
data(Ig_SE)
# repertoire information: must have the two columns: 'condition' and 'individual_id'
SummarizedExperiment::colData(Ig_SE)
# assay counts (gene frequency usage)
SummarizedExperiment::assay(x = Ig_SE)
Publicly available dataset of IGHV segment usage in memory B-cells of 22 HCV+ individuals and 7 healthy donors.
data("IGHV_HCV")
A data frame with 4 columns: "individual_id", "condition", "gene_name" and "gene_usage_count". The data are in a format suitable as input for IgGeneUsage.
Tucci, Felicia A., et al. "Biased IGH VDJ gene repertoire and clonal expansions in B cells of chronically hepatitis C virus–infected individuals." Blood 131.5 (2018): 546-557.
data(IGHV_HCV)
head(IGHV_HCV)
IgGeneUsage detects differential gene usage (DGU) in immune repertoires that belong to two biological conditions.
To quantify the robustness of the estimated probability of DGU (pmax), IgGeneUsage has a built-in procedure for a fully Bayesian leave-one-out (LOO) analysis. In each LOO step we discard the data of one repertoire, analyze the remaining data for DGU with IgGeneUsage, and record pmax for all genes. Finally, we evaluate the variability of pmax for each gene across the LOO steps: low variability in pmax indicates robust DGU, whereas high variability indicates non-robust DGU.
For datasets that include many repertoires (e.g. 100), LOO can be computationally costly, because the model is fitted once for every left-out repertoire.
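Once pmax has been extracted for every LOO step, its variability per gene can be summarized as sketched below. The input structure (a list with one data frame per LOO step, each with columns gene_name and pmax) and the helper summarize_loo_pmax are assumptions; see the vignette for how to extract pmax from the object returned by LOO.

```r
# Hypothetical summary (not part of IgGeneUsage): mean, standard deviation
# and range of pmax per gene across LOO steps.
summarize_loo_pmax <- function(loo_steps) {
  all_steps <- do.call(rbind, loo_steps)
  stats <- lapply(split(all_steps$pmax, all_steps$gene_name), function(p) {
    c(mean = mean(p), sd = sd(p), min = min(p), max = max(p))
  })
  out <- do.call(rbind, stats)
  data.frame(gene_name = rownames(out), out, row.names = NULL)
}

# Toy pmax values for two genes across three LOO steps (made up)
loo_steps <- list(
  data.frame(gene_name = c("G1", "G2"), pmax = c(0.99, 0.40)),
  data.frame(gene_name = c("G1", "G2"), pmax = c(0.98, 0.90)),
  data.frame(gene_name = c("G1", "G2"), pmax = c(0.99, 0.10))
)
summarize_loo_pmax(loo_steps)
```

In this toy example, G1 has nearly constant pmax across the LOO steps (robust DGU), whereas the pmax of G2 varies widely (non-robust DGU).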
LOO(ud, mcmc_warmup, mcmc_steps, mcmc_chains, mcmc_cores, hdi_lvl, adapt_delta, max_treedepth, paired = FALSE)
ud |
Data.frame with 4 columns: 'individual_id', 'condition', 'gene_name' and 'gene_usage_count'.
ud can also be a SummarizedExperiment object. See dataset 'data(Ig_SE)' for more information. |
mcmc_chains, mcmc_warmup, mcmc_steps, mcmc_cores |
Number of MCMC chains (default = 4), length of the adaptive (warm-up) part of the MCMC chains (default = 500), length of the MCMC chains (default = 1,500), number of cores to use (default = 1). |
hdi_lvl |
Probability mass of the highest density interval (HDI) (default = 0.95). |
adapt_delta |
MCMC sampler setting: target acceptance rate during adaptation (default = 0.95). |
max_treedepth |
MCMC sampler setting: maximum tree depth of the NUTS sampler (default = 12). |
paired |
should a paired-sample differential Ig gene usage analysis be performed (default = FALSE)? |
IgGeneUsage invokes the function DGU in each LOO step. For more details see help for DGU or vignette 'User Manual: IgGeneUsage'.
loo |
DGU statistics for each Ig gene in a specific LOO step. |
Simo Kitanovski <[email protected]>
DGU, Ig, IGHV_Epitopes, IGHV_HCV, Ig_SE, d_zibb_1, d_zibb_2, d_zibb_3
# input data:
data("Ig", package = "IgGeneUsage")
head(Ig)
# run leave-one-out (LOO)
L <- LOO(ud = Ig, mcmc_warmup = 500, mcmc_steps = 2000,
         mcmc_chains = 3, mcmc_cores = 1, hdi_lvl = 0.95,
         adapt_delta = 0.99, max_treedepth = 10, paired = FALSE)
# how many LOOs?
names(L)
# elements in first LOO, see vignette about how to extract results
names(L[[1]])