--- title: "Single-cell immune repertoire trajectory analysis with dandelionR" output: BiocStyle::html_document: toc: true toc_depth: 2 number_sections: true knitr: opts_chunk: dev: 'png' fig.align: 'left' date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Single-cell immune repertoire trajectory analysis with dandelionR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE, results="hide", message=FALSE} knitr::opts_chunk$set(error = FALSE, message = FALSE, warning = FALSE) library(BiocStyle) ``` # Foreword Welcome to `dandelionR`! `dandelionR` is an R package for performing single-cell immune repertoire trajectory analysis, based on the original python implementation in [dandelion](https://www.github.com/zktuong/dandelion). It provides all the necessary tools to interface with [scRepertoire](https://github.com/ncborcherding/scRepertoire) and a custom implementation of absorbing markov chain for pseudotime inference, inspired based on the [palantir](https://github.com/dpeerlab/Palantir) python package. This is a work in progress, so please feel free to open an issue if you encounter any problems or have any suggestions for improvement. # Installation You can install `dandelionR` with: ## Bioconductor ```{r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("dandelionR") ``` ## Development version (GitHub) ```{r, eval = FALSE} if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") } devtools::install_github("tuonglab/dandelionR", dependencies = TRUE) ``` In a standard analysis workflow in R, users probably choose to read in their VDJ data with [scRepertoire](https://github.com/ncborcherding/scRepertoire). In this vignette, we will demonstrate how to perform TCR trajectory analysis starting from 'raw' data i.e. just a standard single-cell gene expression data (stored in `SingleCellExperiment`) and VDJ data (in AIRR format). Install `scater` and `scRepertoire` if you haven't already. ```{r, eval = FALSE} # only for the tutorial if (!requireNamespace("scater", quietly = TRUE)) { BiocManager::install("scater") } if (!requireNamespace("scRepertoire", quietly = TRUE)) { BiocManager::install("scRepertoire") } # or devtools::install_github("ncborcherding/scRepertoire") ``` # Usage ## Load the required libraries ```{r, message=FALSE, warning=FALSE} library(dandelionR) library(scRepertoire) library(scater) ``` ## Load the demo data Due to size limitations of the package, we have provided a very trimmed down version of the demo data to ~2000 cells. The full dataset can be found here accordingly: GEX (Lymphoid Cells) - https://developmental.cellatlas.io/fetal-immune VDJ - https://github.com/zktuong/dandelion-demo-files/ The VDJ data is in the dandelion_manuscript/data/dandelion-remap folder. ```{r} data(demo_sce) data(demo_airr) ``` Check out the other vignette for an example dataset that starts from the original `dandelion` output associated with the original [manuscript](https://www.nature.com/articles/s41587-023-01734-7). ```{r} vignette("vignette_reproduce_original") ``` We will also set the seed so that the plots and results are consistent. ```{r} set.seed(123) ``` # Use `scRepertoire` to load the VDJ data For the trajectory analysis work here, we are focusing on the main productive TCR chains. Therefore we will flag `filterMulti = TRUE`, which will keep the selection of the 2 corresponding chains with the highest expression for a single barcode. For more details, refer to `scRepertoire`'s [docs](https://www.borch.dev/uploads/screpertoire/reference/combinetcr). ```{r} contig.list <- loadContigs(input = demo_airr, format = "AIRR") # Format to `scRepertoire`'s requirements and some light filtering combined.TCR <- combineTCR(contig.list, removeNA = TRUE, removeMulti = FALSE, filterMulti = TRUE ) ``` ## Merging VDJ data with gene expression data Next we will combine the gene expression data with the VDJ data to create a `SingleCellExperiment` object. ```{r} sce <- combineExpression(combined.TCR, demo_sce) ``` # Initiate `dandelionR` workflow Here, the data is ready to be used for the pseudobulk and trajectory analysis workflow in `dandelionR`. Because this is a alpha-beta TCR data, we will set the `mode_option` to "abT". This will append `abT` to the relevant columns holding the VDJ gene information. If you are going to try other types of VDJ data e.g. BCR, you should set `mode_option` to "B" instead. And this argument should be consistently set with the `vdjPseudobulk` function later. Since the TCR data is already filtered for productive chains in `combineTCR`, we will set `already.productive = TRUE` and can keep `allowed_chain_status` as `NULL`. We will also subset the data to only include the main T-cell types: CD8+T, CD4+T, ABT(ENTRY), DP(P)_T, DP(Q)_T. ```{r} sce <- setupVdjPseudobulk(sce, mode_option = "abT", already.productive = TRUE, subsetby = "anno_lvl_2_final_clean", groups = c("CD8+T", "CD4+T", "ABT(ENTRY)", "DP(P)_T", "DP(Q)_T") ) ``` The main output of this function is a `SingleCellExperiment` object with the relevant VDJ information appended to the `colData`, particularly the columns with the `_main` suffix e.g. `v_call_abT_VJ_main`, `j_call_abT_VJ_main` etc. ```{r} head(colData(sce)) ``` Visualise the UMAP of the filtered data. ```{r} plotUMAP(sce, color_by = "anno_lvl_2_final_clean") ``` ## Milo object and neighbourhood graph construction We will use miloR to create the pseudobulks based on the gene expression data. The goal is to construct a neighbourhood graph with many neighbors with which we can sample the representative neighbours to form the objects. ```{r, warning = FALSE} library(miloR) milo_object <- Milo(sce) milo_object <- buildGraph(milo_object, k = 30, d = 20, reduced.dim = "X_scvi") milo_object <- makeNhoods(milo_object, reduced_dims = "X_scvi", d = 20, prop = 0.3 ) ``` ## Construct UMAP on milo neighbor graph We can visualise this milo object using UMAP. ```{r, warning = FALSE} milo_object <- miloUmap(milo_object, n_neighbors = 30) ``` ```{r} plotUMAP(milo_object, color_by = "anno_lvl_2_final_clean", dimred = "UMAP_knngraph" ) ``` # Construct pseudobulked VDJ feature space Next, we will construct the pseudobulked VDJ feature space using the neighbourhood graph constructed above. We will also run PCA on the pseudobulked VDJ feature space. ```{r} pb.milo <- vdjPseudobulk(milo_object, mode_option = "abT", col_to_take = "anno_lvl_2_final_clean" ) ``` Inspect the newly created `pb.milo` object. ```{r} pb.milo ``` We can compute and visualise the PCA of the pseudobulked VDJ feature space. ```{r} pb.milo <- runPCA(pb.milo, assay.type = "Feature_space", ncomponents = 20) plotPCA(pb.milo, color_by = "anno_lvl_2_final_clean") ``` ## TCR trajectory inference using Absorbing Markov Chain In the original `dandelion` python package, the trajectory inference is done using the `palantir` package. Here, we implement the absorbing markov chain approach in dandelionR to infer the trajectory, leveraging on `destiny` for diffusion map computation. ### Define root and branch tips ```{r} # extract the PCA matrix pca <- t(as.matrix(reducedDim(pb.milo, type = "PCA"))) # define the CD8 terminal cell as the top-most cell and CD4 terminal cell as # the bottom-most cell branch.tips <- c(which.max(pca[2, ]), which.min(pca[2, ])) names(branch.tips) <- c("CD8+T", "CD4+T") # define the start of our trajectory as the right-most cell root <- which.max(pca[1, ]) ``` ### Construct diffusion map ```{r, warning = FALSE} library(destiny) # Run diffusion map on the PCA dm <- DiffusionMap(t(pca), n_pcs = 10, n_eigs = 10) ``` ### Compute diffusion pseudotime on diffusion map ```{r} dif.pse <- DPT(dm, tips = c(root, branch.tips), w_width = 0.1) ``` ```{r, message=FALSE} # the root is automatically called DPT + index of the root cell DPTroot <- paste0("DPT", root) # store pseudotime in milo object pb.milo$pseudotime <- dif.pse[[DPTroot]] # set the colours for pseudotime pal <- colorRampPalette(rev((RColorBrewer::brewer.pal(9, "RdYlBu"))))(255) plotPCA(pb.milo, color_by = "pseudotime") + scale_colour_gradientn(colours = pal) ``` # Markov chain construction on the pseudobulk VDJ feature space This step will compute the Markov chain probabilities on the pseudobulk VDJ feature space. It will return the branch probabilities in the `colData` and the column name corresponds to the branch tips defined earlier. ```{r} pb.milo <- markovProbability( milo = pb.milo, diffusionmap = dm, terminal_state = branch.tips, root_cell = root, pseudotime_key = "pseudotime", knn = 30 ) ``` Inspect the `pb.milo` object to see the newly added columns. ```{r} head(colData(pb.milo)) ``` # Visualising branch probabilities With the Markov chain probabilities computed, we can visualise the branch probabilities towards CD4+ or CD8+ T-cell fate on the PCA plot. ```{r, message=FALSE} plotPCA(pb.milo, color_by = "CD8+T") + scale_color_gradientn(colors = pal) plotPCA(pb.milo, color_by = "CD4+T") + scale_color_gradientn(colors = pal) ``` # Transfer The next step is to project the pseudotime and the branch probability information from the pseudobulks back to each cell in the dataset. If the cell do not belong to any of the pseudobulk, it will be removed. If a cell belongs to multiple pseudobulk samples, its value should be calculated as a weighted average of the corresponding values from each pseudobulk, where each weight is inverse of the size of the pseudobulk. ## Project pseudobulk data to each cell ```{r} cdata <- projectPseudotimeToCell(milo_object, pb.milo, branch.tips) ``` ## Visualise the trajectory data on a per cell basis ```{r, message=FALSE} plotUMAP(cdata, color_by = "anno_lvl_2_final_clean", dimred = "UMAP_knngraph") plotUMAP(cdata, color_by = "pseudotime", dimred = "UMAP_knngraph") + scale_color_gradientn(colors = pal) plotUMAP(cdata, color_by = "CD4+T", dimred = "UMAP_knngraph") + scale_color_gradientn(colors = pal) plotUMAP(cdata, color_by = "CD8+T", dimred = "UMAP_knngraph") + scale_color_gradientn(colors = pal) ``` And that's it! We have successfully inferred the trajectory of the T-cells in this dataset! # Session info ```{r, warning = FALSE} sessionInfo() ```