Application of high-throughput sequencing of T and B lymphocyte antigen receptors has great potential for improving the monitoring of lymphoid malignancies, assessing immune reconstitution after hematopoietic stem cell transplantation, and characterizing the composition of lymphocyte repertoires (Warren, E. H. et al. Blood 2013;122:19–22). LymhoSeq is an R package designed to import, analyze, and visualize antigen receptor sequencing from Adaptive Biotechnologies’ ImmunoSEQ assay. The package is also adaptable to the analysis of T and B cell receptor sequencing processed using other platforms such as MiXCR or IMGT/HighV-QUEST. This vignette has been written to highlight some of the features of LymphoSeq and guide the user through a typical workflow.
The LymphoSeq function readImmunoSeq
imports
tab-separated value (.tsv
) files exported by Adaptive
Biotechnologies ImmunoSEQ analyzer v2 where each row represents a unique
sequence and each column is a variable with information about that
sequence such as read count, frequency, or variable gene name.
Note that the file format for ImmunoSEQ analyzer v3 is not yet
supported and users must choose to export the v2 format from the
analyzer software. Only files with the extension
.tsv
are imported while all other are disregarded. It is
possible to import files processed using other platforms as long as the
files are tab-delimited, are given the extension .tsv
and
have identical column names as the ImmunoSEQs files (see
readImmunoSeq
manual for a list of column names used by
this file type). Refer to the LymphoSeq manual regarding the required
column names used by each function.
To explore the features of LymphoSeq, this package includes 2 example
data sets. The first is a data set of T cell receptor beta (TCRB)
sequencing from 10 blood samples acquired serially from a single patient
who underwent a bone marrow transplant (Kanakry,
C.G., et al. JCI Insight 2016;1(5):pii: e86252). The
second, is a data set of B cell receptor immunoglobulin heavy (IGH)
chain sequencing from Burkitt lymphoma tumor biopsies acquired from 10
different individuals (Lombardo, K.A.,
et al. Blood Advances 2017 1:535-544). To improve
performance, both data sets contain only the top 1,000 most frequent
sequences. The complete data sets are publicly available through Adapatives’
immuneACCESS portal. As shown in the example below, you can specify
the path to the example data sets using the command
system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")
for the TCRB files and
system.file("extdata", "IGH_sequencing", package = "LymphoSeq")
for the IGH files.
readImmunoSeq
imports each file in the specified
directory as a list object where each file becomes a data frame. You can
import all columns from each file by setting the columns
parameter to "all"
or list just those columns of interest.
Be aware that Adaptive Biotechnologies has changed the column names of
their files over time and if the headings of your files are not all the
same, you will need to specify "all"
or provide all
variations of the column header. By default, the columns
parameter is set to import only those columns used by LymphoSeq.
## Loading required package: LymphoSeqDB
TCRB.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")
TCRB.list <- readImmunoSeq(path = TCRB.path)
Notice that each data frame listed in the TCRB.list
object is named according the ImmunoSEQ file names. If different names
are desired, you may rename the original .tsv
files or
assign names(TCRB.list)
to a new character vector of
desired names in the same order as the list.
[1] "TRB_CD4_949" "TRB_CD8_949" "TRB_CD8_CMV_369"
[4] "TRB_Unsorted_0" "TRB_Unsorted_1320" "TRB_Unsorted_1496"
[7] "TRB_Unsorted_32" "TRB_Unsorted_369" "TRB_Unsorted_83"
[10] "TRB_Unsorted_949"
Having the data in the form of a list makes it easy to apply a
function over that list using the base function lapply
. For
example, you may use the function dim
to report the
dimensions of each data frame as shown below. Noticed that each data
frame in the example below has less than 1,000 rows and 11 columns.
$TRB_CD4_949
[1] 1000 11
$TRB_CD8_949
[1] 1000 11
$TRB_CD8_CMV_369
[1] 414 11
$TRB_Unsorted_0
[1] 1000 11
$TRB_Unsorted_1320
[1] 1000 11
$TRB_Unsorted_1496
[1] 1000 11
$TRB_Unsorted_32
[1] 920 11
$TRB_Unsorted_369
[1] 1000 11
$TRB_Unsorted_83
[1] 1000 11
$TRB_Unsorted_949
[1] 1000 11
In place of dim
, you may also use colnames
,
nrow
, ncol
, or other more complex functions
that perform operations on subsetted columns.
If you imported all of the files from your project but just want to
perform an analysis on a subset, use standard R methods to subset the
list. Remember that a single bracket [
returns a list and a
double bracket [[
returns a single data frame.
[1] "TRB_CD8_CMV_369"
aminoAcid
1
2 CASSPVSNEQFF
3 CASSQEVPPYQAFF
4
5 CASSQEASGRQTQYF
6 CASSLEHTGATNEKLFF
nucleotide
1 TCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCATCGGGACAGAGAACACTGAAGCTTTCTTTGGACAA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
3 ATCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAAGTTCCGCCTTACCAAGCTTTCTTTGGACAA
4 TGCCATCCCCAACCAGACAGCTCTTTACTTCTGTGCCACCAGTGTCCACAAACAGGGGGCAGGACCGGGGAGCTGTTTTTTGGAGAA
5 CACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCAAGAGGCTAGCGGGAGACAGACCCAGTACTTCGGGCCA
6 GCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTTTGGAGCACACGGGTGCAACTAATGAAAAACTGTTTTTTGGCAGT
count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1 1450 0.06606637 1450 TCRBV03 TCRBD01
2 822 0.03737558 822 TCRBV28 TCRBD02
3 797 0.03635297 797 TCRBV03
4 702 0.03203462 702 TCRBV24 TCRBD01
5 704 0.03201317 704 TCRBV04 TCRBD02
6 653 0.02968602 653 TCRBV28 TCRBD02
jFamilyName vGeneName dGeneName jGeneName
1 TCRBJ01 unresolved TCRBD01-01 TCRBJ01-01
2 TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
3 TCRBJ01 unresolved unresolved TCRBJ01-01
4 TCRBJ02 unresolved TCRBD01-01 TCRBJ02-02
5 TCRBJ02 TCRBV04-03 TCRBD02-01 TCRBJ02-05
6 TCRBJ01 TCRBV28-01 TCRBD02-01 TCRBJ01-04
For more complex subsetting, you can use a metadata file where one column contains the file names and the other columns have additional information about the sample files. You can then subset the metadata file using criteria from the other columns to give you just a character vector of file names that you can use to subset TCRB.list. In the example below, a metadata file is imported for the example TCRB data set which contains information on the number of days post bone marrow transplant the sample was collected and the cellular phenopyte the blood sample was sorted for prior to sequencing.
TCRB.metadata <- read.csv(system.file("extdata", "TCRB_metadata.csv", package = "LymphoSeq"))
TCRB.metadata
samples day phenotype
1 TRB_Unsorted_0 0 Unsorted
2 TRB_Unsorted_32 32 Unsorted
3 TRB_Unsorted_83 82 Unsorted
4 TRB_CD8_CMV_369 369 CD8+CMV+
5 TRB_Unsorted_369 369 Unsorted
6 TRB_CD4_949 949 CD4+
7 TRB_CD8_949 949 CD8+
8 TRB_Unsorted_949 949 Unsorted
9 TRB_Unsorted_1320 1320 Unsorted
10 TRB_Unsorted_1496 1496 Unsorted
selected <- as.character(TCRB.metadata[TCRB.metadata$phenotype == "Unsorted" &
TCRB.metadata$day > 300, "samples"])
TCRB.list.selected <- TCRB.list[selected]
names(TCRB.list.selected)
[1] "TRB_Unsorted_369" "TRB_Unsorted_949" "TRB_Unsorted_1320"
[4] "TRB_Unsorted_1496"
A productive sequence is defined as a sequences that is in frame and
does not have an early stop codon. If you sequenced genomic DNA as
opposed to complimentary DNA made from RNA, then you will have
unproductive and productive sequences in your data files. Use the
function productiveSeq
to remove unproductive sequences and
recompute the frequencyCount for each of your samples.
If you are interested in just the complementarity determining region
3 (CDR3) amino acid sequences, then set aggregate to
"aminoAcid"
and the count and estimated number of genomes
for duplicate amino acid sequences will be summed. Note that the
resulting list of data frames will have columns corresponding to
“aminoAcid”, “count”, “frequencyCount”, and “estimatedNumberGenomes” (if
this column is available) only. All other columns, such as those
corresponding to the V, D, and J gene names, will be removed if they
were included in your original file list. The reason for this is to
avoid confusion since a single amino acid CDR3 sequence may be encoded
by multiple different nucleotide sequences with differing V, D, and J
genes.
productive.TRB.aa <- productiveSeq(file.list = TCRB.list, aggregate = "aminoAcid",
prevalence = FALSE)
Alternatively, you may set aggregate to "nucleotide"
and
the resulting list of data frames will all have the same columns as your
original file list. Take note that some LymphoSeq functions require a
productive sequence list aggregated by amino acid or nucleotide.
productive.TRB.nt <- productiveSeq(file.list = TCRB.list, aggregate = "nucleotide",
prevalence = FALSE)
If the parameter prevalence
is set to TRUE
,
then a new column is added to each of the data frames giving the
prevalence (%) of each TCR beta CDR3 amino acid sequence in 55 healthy
donor peripheral blood samples. Values range from 0 to 100% where 100%
means the sequence appeared in the blood of all 55 individuals. The data
for this operation resides in a separate package that is automatically
loaded called LymphoSeqDB. Please refer to that package manual for more
details.
Notice in the example below that there are no amino acid sequences given in the first and fourth row of the TCRB.list data frame for sample “TRB_Unsorted_949”. This is because the nucleotide sequence is out of frame and does not produce a productively transcribed amino acid sequence. If an asterisk (*) appears in the amino acid sequences, this would indicate an early stop codon.
aminoAcid
1
2 CASSPVSNEQFF
3 CASSQEVPPYQAFF
4
5 CASSQEASGRQTQYF
6 CASSLEHTGATNEKLFF
nucleotide
1 TCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCATCGGGACAGAGAACACTGAAGCTTTCTTTGGACAA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
3 ATCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAAGTTCCGCCTTACCAAGCTTTCTTTGGACAA
4 TGCCATCCCCAACCAGACAGCTCTTTACTTCTGTGCCACCAGTGTCCACAAACAGGGGGCAGGACCGGGGAGCTGTTTTTTGGAGAA
5 CACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCAAGAGGCTAGCGGGAGACAGACCCAGTACTTCGGGCCA
6 GCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTTTGGAGCACACGGGTGCAACTAATGAAAAACTGTTTTTTGGCAGT
count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1 1450 0.06606637 1450 TCRBV03 TCRBD01
2 822 0.03737558 822 TCRBV28 TCRBD02
3 797 0.03635297 797 TCRBV03
4 702 0.03203462 702 TCRBV24 TCRBD01
5 704 0.03201317 704 TCRBV04 TCRBD02
6 653 0.02968602 653 TCRBV28 TCRBD02
jFamilyName vGeneName dGeneName jGeneName
1 TCRBJ01 unresolved TCRBD01-01 TCRBJ01-01
2 TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
3 TCRBJ01 unresolved unresolved TCRBJ01-01
4 TCRBJ02 unresolved TCRBD01-01 TCRBJ02-02
5 TCRBJ02 TCRBV04-03 TCRBD02-01 TCRBJ02-05
6 TCRBJ01 TCRBV28-01 TCRBD02-01 TCRBJ01-04
After productiveSeq
is run, the unproductive sequences
are removed and the frequencyCount is recalculated for each sequence. If
there were two identical amino acid sequences that differed in their
nucleotide sequence, they would be combined and their counts added
together.
aminoAcid count frequencyCount estimatedNumberGenomes
1 CASSPVSNEQFF 822 5.773283 822
2 CASSQEVPPYQAFF 797 5.597696 797
3 CASSQEASGRQTQYF 704 4.944515 704
4 CASSLEHTGATNEKLFF 653 4.586318 653
5 CASSPGDEQYF 619 4.347521 619
6 CSARSPSTGTLAEAFF 429 3.013064 429
Finally, notice that the productive.TRB.nt data frame for sample “TRB_Unsorted_949” below has additional columns not present in productive.TRB.aa but are in TCRB.list. This is because the data frame was aggregated by nucleotide sequence and all of the original columns from TCRB.list were carried over.
aminoAcid
1 CASSPVSNEQFF
2 CASSQEVPPYQAFF
3 CASSQEASGRQTQYF
4 CASSLEHTGATNEKLFF
5 CASSPGDEQYF
6 CSARSPSTGTLAEAFF
nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 ATCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAAGTTCCGCCTTACCAAGCTTTCTTTGGACAA
3 CACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCAAGAGGCTAGCGGGAGACAGACCCAGTACTTCGGGCCA
4 GCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTTTGGAGCACACGGGTGCAACTAATGAAAAACTGTTTTTTGGCAGT
5 CCCCTGACCCTGGAGTCTGCCAGGCCCTCACATACCTCTCAGTACCTCTGTGCCAGCAGTCCGGGGGACGAGCAGTACTTCGGGCCG
6 AGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTAGATCACCCAGTACAGGGACCCTCGCTGAAGCTTTCTTTGGACAA
count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1 822 5.773283 822 TCRBV28 TCRBD02
2 797 5.597696 797 TCRBV03
3 704 4.944515 704 TCRBV04 TCRBD02
4 653 4.586318 653 TCRBV28 TCRBD02
5 619 4.347521 619 TCRBV25 TCRBD02
6 429 3.013064 429 TCRBV20 TCRBD01
jFamilyName vGeneName dGeneName jGeneName
1 TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2 TCRBJ01 unresolved unresolved TCRBJ01-01
3 TCRBJ02 TCRBV04-03 TCRBD02-01 TCRBJ02-05
4 TCRBJ01 TCRBV28-01 TCRBD02-01 TCRBJ01-04
5 TCRBJ02 TCRBV25-01 TCRBD02-01 TCRBJ02-07
6 TCRBJ01 unresolved TCRBD01-01 TCRBJ01-01
To create a table summarizing the total number of sequences, number
of unique productive sequences, number of genomes, entropy, clonality,
Gini coefficient, and the frequency (%) of the top productive sequence
in each imported file, use the function clonality
.
samples totalSequences uniqueProductiveSequences totalCount
1 TRB_CD4_949 1000 845 25769
2 TRB_Unsorted_369 1000 830 339413
3 TRB_Unsorted_83 1000 823 236732
4 TRB_CD8_949 1000 794 26239
5 TRB_CD8_CMV_369 414 281 1794
6 TRB_Unsorted_1320 1000 838 178190
7 TRB_Unsorted_1496 1000 832 33669
8 TRB_Unsorted_949 1000 831 6549
9 TRB_Unsorted_0 1000 838 18161
10 TRB_Unsorted_32 920 767 31078
clonality giniCoefficient topProductiveSequence totalGenomes
1 0.442719 0.8665242 30.091732 25769
2 0.425965 0.8447387 29.720171 NA
3 0.338114 0.7766277 23.645843 NA
4 0.430615 0.9026124 19.346779 26239
5 0.331570 0.7606261 16.487936 1794
6 0.421630 0.9016617 14.579022 178190
7 0.389318 0.8812733 14.248338 33669
8 0.305784 0.7654438 13.837321 6549
9 0.280923 0.8184686 5.773283 18161
10 0.134242 0.6007820 4.865016 NA
The clonality score is derived from the Shannon entropy, which is calculated from the frequencies of all productive sequences divided by the logarithm of the total number of unique productive sequences. This normalized entropy value is then inverted (1 - normalized entropy) to produce the clonality metric.
The Gini coefficient is an alternative metric used to calculate repertoire diversity and is derived from the Lorenz curve. The Lorenz curve is drawn such that x-axis represents the cumulative percentage of unique sequences and the y-axis represents the cumulative percentage of reads. A line passing through the origin with a slope of 1 reflects equal frequencies of all clones. The Gini coefficient is the ratio of the area between the line of equality and the observed Lorenz curve over the total area under the line of equality.
Both Gini coefficient and clonality are reported on a scale from 0 to 1 where 0 indicates all sequences have the same frequency and 1 indicates the repertoire is dominated by a single sequence.
A phylogenetic tree is a useful way to visualize the similarity
between sequences. The phyloTree
function create a
phylogenetic tree of a single sample using neighbor joining tree
estimation for amino acid or nucleotide CDR3 sequences. Each leaf in the
tree represents a sequence color coded by the V, D, and J gene usage.
The number next to each leaf refers to the sequence count. A triangle
shaped leaf indicates the most frequent sequence. The distance between
leaves on the horizontal axis corresponds to the sequence similarity
(i.e. the further apart the leaves are horizontally, the less similar
the sequences are to one another).
phyloTree(list = productive.IGH.nt, sample = "IGH_MVQ92552A_BL", type = "nucleotide",
layout = "rectangular")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the LymphoSeq package.
Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
In LymphoSeq, you can perform a multiple sequence alignment using one of three methods provided by the Bioconductor msa package (ClustalW, ClustalOmega, or Muscle) and output results to the console or as a pdf file. One may perform the alignment of all amino acid or nucleotide sequences in a single sample. Alternatively, one may search for a given sequence within a list of samples using an edit distance threshold.
alignSeq(list = productive.IGH.nt, sample = "IGH_MVQ92552A_BL", type = "aminoAcid",
method = "ClustalW", output = "consule")
use default substitution matrix
To search for one or more amino acid or nucleotide CDR3 sequences in
a list of data frames, use the function searchSeq
. You may
specify to search in either a list of productive or unproductive data
frames.
searchSeq(list = productive.TRB.aa, sequence = "CASSPVSNEQFF", type = "aminoAcid",
match = "global", editDistance = 0)
sample aminoAcid count frequencyCount estimatedNumberGenomes
1 TRB_Unsorted_0 CASSPVSNEQFF 822 5.773282764 822
2 TRB_CD8_949 CASSPVSNEQFF 2 0.008923791 2
If you have only a partial sequence, set the parameter match to
"partial"
. If you are looking for related sequences that
differ by one or more nucleotides or amino acids, then increase the
editDistance
value. Edit distance is a way of quantifying
how dissimilar two sequences are to one another by counting the minimum
number of operations required to transform one sequence into the other.
For example, an edit distance of 0 means the sequences are identical and
an edit distance of 1 indicates that the sequences differ by a single
amino acid or nucleotide.
To search your entire list of data frames for a published amino acid
CDR3 TCRB sequence with known antigen specificity, use the function
searchPublished
.
sample aminoAcid count frequencyCount estimatedNumberGenomes
1 TRB_Unsorted_32 CASASSGTDTQYF 33 0.131594688 0
2 TRB_Unsorted_1496 CASSETGGTEAFF 2 0.007267706 2
3 TRB_Unsorted_949 CASSFSTDTQYF 2 0.038277512 2
4 TRB_CD8_949 CASSIRSAYEQYF 7 0.031233268 7
5 TRB_Unsorted_1320 CASSIRSAYEQYF 15 0.010313886 15
6 TRB_Unsorted_1496 CASSIRSSYEQYF 4 0.014535412 4
PMID HLA antigen epitope prevalence
1 20647322 HLA-A*24:02 Leukemia <NA> 7.3
2 19786555 HLA-A*02:01 Melanoma EAAGIGILTV 34.5
3 23267020 HLA-A*02 EBV BMFL1-GLCTLVAML 85.5
4 21048112 HLA-A*02 EBV BMLF1-GLCTLVAML 18.2
5 21048112 HLA-A*02 EBV BMLF1-GLCTLVAML 18.2
6 21048112 HLA-A*02 EBV BMLF1-GLCTLVAML 45.5
For each found sequence, a table is provides listing the antigen, epitope, HLA type, PubMed ID (PMID), and prevalence (%) of the sequence among 55 healthy donor blood samples. The data for this function resides in the separate LymphoSeqDB package that is automatically loaded when the function is called. Please refer to that package manual for more details.
Antigen receptor repertoire diversity can be characterized by a
number such as clonality or Gini coefficient calculated by the
clonality
function. Alternatively, you can visualize the
repertoire diversity by plotting the Lorenz curve for each sample as
defined above. In this plot, the more diverse samples will appear near
the dotted diagonal line (the line of equality) whereas the more clonal
samples will appear to have a more bowed shape.
Alternatively, you can get a feel for the repertoire diversity by
plotting the cumulative frequency of a selected number of the top most
frequent clones using the function topSeqsPlot
. In this
case, each of the top sequences are represented by a different color and
all less frequent clones will be assigned a single color (violet).
Both of these functions are built using the ggplot2 package. You can
reformat the plot using ggplot2 functions. Please refer to the
lorenzCurve
and topSeqsPlot
manual for
specific examples.
To compare the T or B cell repertoires of all samples in a pairwise
fashion, use the bhattacharyyaMatrix
or
similarityMatrix
functions. Both the Bhattacharyya
coefficient and similarity score are measures of the amount of overlap
between two samples. The value for each ranges from 0 to 1 where 1
indicates the sequence frequencies are identical in the two samples and
0 indicates no shared frequencies. The Bhattacharyya coefficient differs
from the similarity score in that it involves weighting each shared
sequence in the two distributions by the arithmetic mean of the
frequency of each sequence, while calculating the similarity scores
involves weighting each shared sequence in the two distributions by the
geometric mean of the frequency of each sequence in the two
distributions.
bhattacharyya.matrix <- bhattacharyyaMatrix(productive.seqs = productive.TRB.aa)
bhattacharyya.matrix
TRB_Unsorted_1496 TRB_Unsorted_1320 TRB_Unsorted_949
TRB_Unsorted_1496 1.00000000 0.95291865 0.85590946
TRB_Unsorted_1320 0.95291865 1.00000000 0.87674297
TRB_Unsorted_949 0.85590946 0.87674297 1.00000000
TRB_Unsorted_369 0.52401899 0.53276440 0.51749643
TRB_Unsorted_83 0.30375115 0.33141730 0.37518862
TRB_Unsorted_32 0.28782025 0.31292611 0.28018709
TRB_CD8_CMV_369 0.75528817 0.77886610 0.71503376
TRB_Unsorted_0 0.01542965 0.01627338 0.01388704
TRB_CD8_949 0.78871302 0.81794347 0.81414832
TRB_CD4_949 0.44361207 0.43743309 0.42587691
TRB_Unsorted_369 TRB_Unsorted_83 TRB_Unsorted_32
TRB_Unsorted_1496 0.524018989 0.30375115 0.287820253
TRB_Unsorted_1320 0.532764399 0.33141730 0.312926108
TRB_Unsorted_949 0.517496433 0.37518862 0.280187090
TRB_Unsorted_369 1.000000000 0.46978757 0.192342747
TRB_Unsorted_83 0.469787569 1.00000000 0.297865580
TRB_Unsorted_32 0.192342747 0.29786558 1.000000000
TRB_CD8_CMV_369 0.512020867 0.40059194 0.272207037
TRB_Unsorted_0 0.008900138 0.01347907 0.008658058
TRB_CD8_949 0.532143928 0.43208064 0.350967319
TRB_CD4_949 0.176016728 0.06449128 0.023973222
TRB_CD8_CMV_369 TRB_Unsorted_0 TRB_CD8_949 TRB_CD4_949
TRB_Unsorted_1496 0.755288167 0.015429649 0.78871302 0.443612066
TRB_Unsorted_1320 0.778866101 0.016273376 0.81794347 0.437433093
TRB_Unsorted_949 0.715033758 0.013887037 0.81414832 0.425876907
TRB_Unsorted_369 0.512020867 0.008900138 0.53214393 0.176016728
TRB_Unsorted_83 0.400591940 0.013479071 0.43208064 0.064491279
TRB_Unsorted_32 0.272207037 0.008658058 0.35096732 0.023973222
TRB_CD8_CMV_369 1.000000000 0.008967238 0.86559885 0.001116121
TRB_Unsorted_0 0.008967238 1.000000000 0.04164991 0.006956798
TRB_CD8_949 0.865598846 0.041649912 1.00000000 0.000000000
TRB_CD4_949 0.001116121 0.006956798 0.00000000 1.000000000
TRB_Unsorted_1496 TRB_Unsorted_1320 TRB_Unsorted_949
TRB_Unsorted_1496 1.00000000 0.9581854 0.89906548
TRB_Unsorted_1320 0.95818541 1.0000000 0.93863667
TRB_Unsorted_949 0.89906548 0.9386367 1.00000000
TRB_Unsorted_369 0.74206009 0.6514381 0.77154129
TRB_Unsorted_83 0.54589852 0.4325548 0.59723589
TRB_Unsorted_32 0.32105103 0.4035552 0.26159989
TRB_CD8_CMV_369 0.67188308 0.7063167 0.65222570
TRB_Unsorted_0 0.02730081 0.0371885 0.01387248
TRB_CD8_949 0.78972983 0.7648513 0.90794949
TRB_CD4_949 0.47223856 0.2900181 0.72941015
TRB_Unsorted_369 TRB_Unsorted_83 TRB_Unsorted_32
TRB_Unsorted_1496 0.74206009 0.54589852 0.32105103
TRB_Unsorted_1320 0.65143812 0.43255478 0.40355518
TRB_Unsorted_949 0.77154129 0.59723589 0.26159989
TRB_Unsorted_369 1.00000000 0.47127662 0.23387758
TRB_Unsorted_83 0.47127662 1.00000000 0.42829336
TRB_Unsorted_32 0.23387758 0.42829336 1.00000000
TRB_CD8_CMV_369 0.70312873 0.58191936 0.17384922
TRB_Unsorted_0 0.01588281 0.02653832 0.01017423
TRB_CD8_949 0.69642713 0.60793633 0.40165091
TRB_CD4_949 0.11852468 0.04184631 0.03070217
TRB_CD8_CMV_369 TRB_Unsorted_0 TRB_CD8_949 TRB_CD4_949
TRB_Unsorted_1496 0.671883079 0.027300812 0.78972983 0.472238563
TRB_Unsorted_1320 0.706316742 0.037188504 0.76485132 0.290018135
TRB_Unsorted_949 0.652225696 0.013872476 0.90794949 0.729410153
TRB_Unsorted_369 0.703128733 0.015882807 0.69642713 0.118524684
TRB_Unsorted_83 0.581919360 0.026538319 0.60793633 0.041846313
TRB_Unsorted_32 0.173849223 0.010174234 0.40165091 0.030702175
TRB_CD8_CMV_369 1.000000000 0.007247298 0.85081995 0.001761048
TRB_Unsorted_0 0.007247298 1.000000000 0.04387449 0.031309635
TRB_CD8_949 0.850819946 0.043874488 1.00000000 0.000000000
TRB_CD4_949 0.001761048 0.031309635 0.00000000 1.000000000
The results of either function can be visualized by the
pairwisePlot
function.
To view sequences shared between two or more samples, use the
function commonSeqs
. This function requires that a
productive amino acid list be specified.
common <- commonSeqs(samples = c("TRB_Unsorted_0", "TRB_Unsorted_32"),
productive.aa = productive.TRB.aa)
head(common)
aminoAcid TRB_Unsorted_0 TRB_Unsorted_32
1 CASSQDRTGQYGYTF 0.47057171 0.80551900
2 CAWTGGTTEAFF 0.10535188 0.15153328
3 CAISEGNYGYTF 0.03511729 0.18742274
4 CASSFGIQETQYF 0.01404692 0.09570523
To visualize the number of overlapping sequences between two or three
samples in the form of a Venn diagram, use the function
commonSeqVenn
.
commonSeqsVenn(samples = c("TRB_Unsorted_32", "TRB_Unsorted_83"),
productive.seqs = productive.TRB.aa)
commonSeqsVenn(samples = c("TRB_Unsorted_0", "TRB_Unsorted_32", "TRB_Unsorted_83"),
productive.seqs = productive.TRB.aa)
To compare the frequency of sequences between two samples as a
scatter plot, use the function commonSeqsPlot
.
commonSeqsPlot("TRB_Unsorted_32", "TRB_Unsorted_83",
productive.aa = productive.TRB.aa, show = "common")
If you have more than 3 samples to compare, use the
commonSeqBar
function. You can chose to color a single
sample with the color.sample
argument or a desired
intersection with the color.intersection
argument.
When comparing a sample from two different time points, it is useful
to identify sequences that are significantly more or less abundant in
one versus the other time point (DeWitt, W.S.,
et al. Journal of Virology 2015 89(8):4517-4526). The
differentialAbundance
function uses a Fisher exact test to
calculate differential abundance of each sequence in two time points and
reports the log2 transformed fold change, P value and adjusted P
value.
differentialAbundance(list = productive.TRB.aa,
sample1 = "TRB_Unsorted_949",
sample2 = "TRB_Unsorted_1320",
type = "aminoAcid", q = 0.01)
aminoAcid TRB_Unsorted_949 TRB_Unsorted_1320 p
1 CASSPPTGERDTQYF 5.62679426 12.93567573 2.406577e-66
2 CASSQDLMTVDSLFAGANVLTF 8.05741627 3.11891910 2.038483e-63
3 CASSLAGDSQETQYF 2.02870813 6.57888404 6.843544e-52
4 CASSSIKTGATEAFF 0.61244019 0.01856499 3.382517e-31
5 CASSQDTGNEQFF 0.32535885 0.00000000 1.481077e-25
6 CASSQGGSYNSPLHF 0.28708134 0.00000000 1.238489e-22
7 CASSFHRDAAYGYTF 0.24880383 0.00000000 1.034867e-19
8 CASSQEGGRDNEQFF 0.36363636 0.01031389 1.991455e-19
9 CASSPWSNEKLFF 0.22966507 0.00000000 2.990614e-18
10 CASSYFRDGGELGTF 0.57416268 0.07013442 2.034201e-16
11 CASSDMAIGREQYF 0.19138756 0.00000000 2.496149e-15
12 CASSRIGRERDEQYF 0.19138756 0.00000000 2.496149e-15
13 CATSDRRQADNVDIQYF 0.19138756 0.00000000 2.496149e-15
14 CASRDGQGSGNTIYF 0.91866029 0.25647196 9.081279e-13
15 CASRQGSYGYTF 0.15311005 0.00000000 2.081896e-12
16 CASSFPRLAGGTDTQYF 0.15311005 0.00000000 2.081896e-12
17 CASSQDERFSGNTIYF 0.15311005 0.00000000 2.081896e-12
18 CASSRQGTFSGNTEAFF 0.24880383 0.01237666 1.186840e-11
19 CASSYVGDGYTF 1.12918660 2.44507856 1.936386e-11
20 CASSAGDGYEQYF 0.13397129 0.00000000 6.010805e-11
21 CASSLRWGATGELFF 0.13397129 0.00000000 6.010805e-11
22 CASSQNLVRTANNSPLHF 0.13397129 0.00000000 6.010805e-11
23 CASTLRWGKTGELFF 0.13397129 0.00000000 6.010805e-11
24 CASLGAGGWTEAFF 0.38277512 0.05225702 1.047460e-10
25 CASSLYPSTDTQYF 0.40191388 1.23285316 8.660086e-10
26 CASSFNRYSSSYNEQFF 0.30622010 0.03506721 8.950179e-10
27 CASSNPGQTSYGYTF 0.11483254 0.00000000 1.735106e-09
28 CASSQDSGGEQYF 0.11483254 0.00000000 1.735106e-09
29 CASSRGPMTEAFF 0.11483254 0.00000000 1.735106e-09
30 CASSLNGPGQGAYEQYF 0.09569378 0.00000000 5.007709e-08
31 CASSQDTGYEQYF 0.09569378 0.00000000 5.007709e-08
32 CASSQSGDYNEQFF 0.09569378 0.00000000 5.007709e-08
33 CASSQVLELGRGYTF 0.09569378 0.00000000 5.007709e-08
34 CSAREMAGGLNSPLHF 0.09569378 0.00000000 5.007709e-08
35 CAWSDFQGPRSGNTIYF 0.47846890 1.15102967 6.651968e-07
36 CASSPDKWGYTF 0.59330144 1.29886203 1.152347e-06
37 CASSLPTGGLMNTEAFF 0.07655502 0.00000000 1.445013e-06
38 CASSNGRKNTEAFF 0.07655502 0.00000000 1.445013e-06
39 CASSPLVTGNTEAFF 0.07655502 0.00000000 1.445013e-06
40 CASSQDSGNEQFF 0.07655502 0.00000000 1.445013e-06
41 CASSQDSQGVGKNIQYF 0.07655502 0.00000000 1.445013e-06
42 CASSSGQRPYF 0.07655502 0.00000000 1.445013e-06
43 CSASLNHALEQYF 0.47846890 1.07401932 6.437625e-06
q l2fc
1 3.205561e-63 -1.200970
2 2.713221e-60 1.369271
3 9.101914e-49 -1.697282
4 4.495366e-28 5.043912
5 1.966870e-22 8.345888
6 1.643475e-19 8.165316
7 1.372234e-16 7.958865
8 2.638678e-16 5.139837
9 3.959573e-15 7.843388
10 2.691248e-13 3.033265
11 3.299908e-12 7.580353
12 3.299908e-12 7.580353
13 3.299908e-12 7.580353
14 1.197821e-09 1.840730
15 2.743938e-09 7.258425
16 2.743938e-09 7.258425
17 2.743938e-09 7.258425
18 1.560694e-08 4.329314
19 2.544412e-08 -1.114597
20 7.892187e-08 7.065780
21 7.892187e-08 7.065780
22 7.892187e-08 7.065780
23 7.892187e-08 7.065780
24 1.371125e-07 2.872800
25 1.132739e-06 -1.617043
26 1.169788e-06 3.126374
27 2.266048e-06 6.843388
28 2.266048e-06 6.843388
29 2.266048e-06 6.843388
30 6.525044e-05 6.580353
31 6.525044e-05 6.580353
32 6.525044e-05 6.580353
33 6.525044e-05 6.580353
34 6.525044e-05 6.580353
35 8.634255e-04 -1.266428
36 1.494594e-03 -1.130411
37 1.872737e-03 6.258425
38 1.872737e-03 6.258425
39 1.872737e-03 6.258425
40 1.872737e-03 6.258425
41 1.872737e-03 6.258425
42 1.872737e-03 6.258425
43 8.304536e-03 -1.166523
To create a data frame of unique, productive amino acid sequences as
rows and sample names as headers use the seqMatrix
function. Each value in the data frame represents the frequency that
each sequence appears in the sample. You can specify your own list of
sequences or all unique sequences in the list using the output of the
function uniqueSeqs
. The uniqueSeqs
function
creates a data frame of all unique, productive sequences and reports the
total count in all samples.
aminoAcid count
3143 CASSQDWERLGEQFF 99480
2178 CASSLQGREKLFF 90567
3039 CASSQDLMTVDSLFAGANVLTF 68682
2506 CASSPAGAYYNEQFF 30454
2744 CASSPPTGERDTQYF 24703
1642 CASSLAGDSQETQYF 22147
sequence.matrix <- seqMatrix(productive.aa = productive.TRB.aa, sequences = unique.seqs$aminoAcid)
head(sequence.matrix)
aminoAcid numberSamples TRB_CD4_949 TRB_CD8_949 TRB_CD8_CMV_369
1 CASSQDRTGQYGYTF 9 0 0.99500268 0.6032172
2 CASSLQGREKLFF 8 0 8.73192932 5.1608579
3 CASSQDLMTVDSLFAGANVLTF 8 0 10.65054435 8.8471850
4 CASSYSGNTEAFF 8 0 0.16955203 0.1340483
5 CASSFMDWTGGNSPLHF 8 0 0.04461895 0.4691689
6 CASSREGDQPQHF 8 0 0.29894699 0.1340483
TRB_Unsorted_0 TRB_Unsorted_1320 TRB_Unsorted_1496 TRB_Unsorted_32
1 0.47057171 0.43249562 0.50147171 0.80551900
2 0.00000000 7.36548974 4.86209528 4.86501575
3 0.00000000 3.11891910 2.60910644 1.55122224
4 0.04214075 0.07082202 0.07631091 0.00000000
5 0.00000000 0.04606869 0.04360624 0.03588946
6 0.00000000 0.29085158 0.47240089 2.84723053
TRB_Unsorted_369 TRB_Unsorted_83 TRB_Unsorted_949
1 1.0268620 1.8625084 0.80382775
2 9.0363857 23.6458434 6.18181818
3 12.1288068 12.1922371 8.05741627
4 0.3891633 0.8385633 0.07655502
5 0.5047288 0.6695954 0.03827751
6 0.2220807 0.1573815 0.19138756
If just the top clones with a frequency greater than a specified
amount are of interest to you, then use the topFreq
function. This creates a data frame of the top productive amino acid
sequences having a minimum specified frequency and reports the minimum,
maximum, and mean frequency that the sequence appears in a list of
samples. For TCRB sequences, the prevalence (%) and the published
antigen specificity of that sequence are also provided.
aminoAcid minFrequency maxFrequency meanFrequency
387 CASSQDRTGQYGYTF 0.43249562 1.8625084 0.8334973
276 CASSLQGREKLFF 4.86209528 23.6458434 8.7311794
382 CASSQDLMTVDSLFAGANVLTF 1.55122224 12.1922371 7.3944297
436 CASSREGDQPQHF 0.13404826 2.8472305 0.5767910
158 CASSFMDWTGGNSPLHF 0.03588946 0.6695954 0.2314942
536 CASSYSGNTEAFF 0.04214075 0.8385633 0.2246444
numberSamples prevalence antigen
387 9 3.6
276 8 30.9
382 8 1.8
436 8 18.2
158 8 1.8
536 8 69.1 CMV
One very useful thing to do is merge the output of
seqMatrix
and topFreq
.
top.freq <- topFreq(productive.aa = productive.TRB.aa, percent = 0)
top.freq.matrix <- merge(top.freq, sequence.matrix)
head(top.freq.matrix)
aminoAcid numberSamples minFrequency maxFrequency meanFrequency
1 CAAGDTTLYEQYF 1 0.014046917 0.014046917 0.014046917
2 CAAGTSGDTQYF 1 0.019138756 0.019138756 0.019138756
3 CAARGGGESYEQYF 1 0.028093833 0.028093833 0.028093833
4 CAATRRQGDVMNTEAFF 1 0.083541316 0.083541316 0.083541316
5 CAAWGTGPLGSSGANVLTF 1 0.029931447 0.029931447 0.029931447
6 CACALGDGYTF 1 0.001375185 0.001375185 0.001375185
prevalence antigen TRB_CD4_949 TRB_CD8_949 TRB_CD8_CMV_369 TRB_Unsorted_0
1 0.0 0 0 0 0.01404692
2 1.8 0 0 0 0.00000000
3 0.0 0 0 0 0.02809383
4 0.0 0 0 0 0.00000000
5 0.0 0 0 0 0.00000000
6 0.0 0 0 0 0.00000000
TRB_Unsorted_1320 TRB_Unsorted_1496 TRB_Unsorted_32 TRB_Unsorted_369
1 0.000000000 0 0 0.00000000
2 0.000000000 0 0 0.00000000
3 0.000000000 0 0 0.00000000
4 0.000000000 0 0 0.08354132
5 0.000000000 0 0 0.00000000
6 0.001375185 0 0 0.00000000
TRB_Unsorted_83 TRB_Unsorted_949
1 0.00000000 0.00000000
2 0.00000000 0.01913876
3 0.00000000 0.00000000
4 0.00000000 0.00000000
5 0.02993145 0.00000000
6 0.00000000 0.00000000
To visually track the frequency of sequences across multiple samples,
use the function cloneTrack
. This function takes the output
from the seqMatrix
function. You can specify a character
vector of amino acid sequences using the parameter track
to
highlight those sequences with a different color. Alternatively, you can
highlight all of the sequences from a given sample using the parameter
map
. If the mapping feature is use, then you must specify a
productive amino acid list and a character vector of labels to title the
mapped samples. To hide sequences that are not being tracked or mapped,
set unassigned
to FALSE.
cloneTrack(sequence.matrix = sequence.matrix,
productive.aa = productive.TRB.aa,
map = c("TRB_CD4_949", "TRB_CD8_949"),
label = c("CD4", "CD8"),
track = "CASSPPTGERDTQYF",
unassigned = FALSE)
Refer to the cloneTrack
manual for examples on how to
reformat the chart using ggplot2 function.
To compare the V, D, and J gene usage across samples, start by
creating a data frame of V, D, and J gene counts and frequencies using
the function geneFreq
. You can specify if you are
interested in the “VDJ”, “DJ”, “VJ”, “DJ”, “V”, “D”, or “J” loci using
the locus
parameter. Set family
to TRUE if you
prefer the family names instead of the gene names as reported by
ImmunoSeq.
samples familyName count frequencyGene
1 TRB_Unsorted_949 TCRBV02 72 1.377990
2 TRB_Unsorted_949 TCRBV03 165 3.157895
3 TRB_Unsorted_949 TCRBV04 780 14.928230
4 TRB_Unsorted_949 TCRBV05 398 7.617225
5 TRB_Unsorted_949 TCRBV06 413 7.904306
6 TRB_Unsorted_949 TCRBV07 475 9.090909
To create a chord diagram showing VJ or DJ gene associations from one
or more more samples, combine the output of geneFreq
with
the function chordDiagramVDJ
. This function works well the
topSeqs
function that creates a data frame of a selected
number of top productive sequences. In the example below, a chord
diagram is made showing the association between V and J genes of just
the single dominant clones in each sample. The size of the ribbons
connecting VJ genes correspond to the number of samples that have that
recombination event. The thicker the ribbon, the higher the frequency of
the recombination.
top.seqs <- topSeqs(productive.seqs = productive.TRB.nt, top = 1)
chordDiagramVDJ(sample = top.seqs,
association = "VJ",
colors = c("darkred", "navyblue"))
You can also visualize the results of geneFreq
as a heat
map, word cloud, our cumulative frequency bar plot with the support of
additional R packages as shown below.
vGenes <- geneFreq(productive.nt = productive.TRB.nt, locus = "V", family = TRUE)
library(RColorBrewer)
library(grDevices)
RedBlue <- grDevices::colorRampPalette(rev(RColorBrewer::brewer.pal(11, "RdBu")))(256)
library(wordcloud)
wordcloud::wordcloud(words = vGenes[vGenes$samples == "TRB_Unsorted_83", "familyName"],
freq = vGenes[vGenes$samples == "TRB_Unsorted_83", "frequencyGene"],
colors = RedBlue)
library(reshape)
vGenes <- reshape::cast(vGenes, familyName ~ samples, value = "frequencyGene", sum)
rownames(vGenes) = as.character(vGenes$familyName)
vGenes$familyName = NULL
library(pheatmap)
pheatmap::pheatmap(vGenes, color = RedBlue, scale = "row")
vGenes <- geneFreq(productive.nt = productive.TRB.nt, locus = "V", family = TRUE)
library(ggplot2)
multicolors <- grDevices::colorRampPalette(rev(RColorBrewer::brewer.pal(9, "Set1")))(28)
ggplot2::ggplot(vGenes, aes(x = samples, y = frequencyGene, fill = familyName)) +
geom_bar(stat = "identity") +
theme_minimal() +
scale_y_continuous(expand = c(0, 0)) +
guides(fill = guide_legend(ncol = 2)) +
scale_fill_manual(values = multicolors) +
labs(y = "Frequency (%)", x = "", fill = "") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Occasionally you may identify one or more sequences in your data set
that appear to be contamination. You can remove an amino acid sequence
from all data frames using the function removeSeq
and
recompute frequencyCount for all remaining sequences.
sample aminoAcid count frequencyCount estimatedNumberGenomes
1 TRB_CD4_949 CASSESAGSTGELFF 5019 30.091732 5019
2 TRB_Unsorted_1320 CASSESAGSTGELFF 10326 7.100079 10326
3 TRB_Unsorted_949 CASSESAGSTGELFF 338 6.468900 338
4 TRB_Unsorted_1496 CASSESAGSTGELFF 1755 6.377412 1755
cleansed <- removeSeq(file.list = productive.TRB.aa, sequence = "CASSESAGSTGELFF")
searchSeq(list = cleansed, sequence = "CASSESAGSTGELFF")
No sequences found.
If you need to combine multiple samples into one, use the
mergeFiles
function. It merges two or more sample data
frames into a single data frame and aggregates count, frequencyCount,
and estimatedNumberGenomes.
Advances in high-throughput sequencing have enabled characterizing T and B lymphocyte repertoires with unprecedented depth. LymphoSeq was developed as a tool to assist in the analysis of targeted next generation sequencing of the hypervariable CDR3 region of T and B cell receptors. The three key features of this R package are to characterize lymphocyte repertoire diversity, compare two or more lymphocyte repertoires, and track the frequency of CDR3 sequences across multiple samples. LymphoSeq also provides the unique ability to search for sequences in a curated database of published TCRB sequences with known antigen specificity. Finally, LymphoSeq can assign the percent prevalence that any given TCRB sequence appears in a the peripheral blood in healthy population of donors.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.1 pheatmap_1.0.12 reshape_0.8.9 wordcloud_2.6
## [5] RColorBrewer_1.1-3 LymphoSeq_1.35.0 LymphoSeqDB_0.99.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
## [4] Biostrings_2.75.1 fastmap_1.2.0 lazyeval_0.2.2
## [7] digest_0.6.37 lifecycle_1.0.4 tidytree_0.4.6
## [10] VennDiagram_1.7.3 magrittr_2.0.3 compiler_4.4.2
## [13] rlang_1.1.4 sass_0.4.9 tools_4.4.2
## [16] igraph_2.1.1 utf8_1.2.4 yaml_2.3.10
## [19] data.table_1.16.2 knitr_1.49 lambda.r_1.2.4
## [22] phangorn_2.12.1 labeling_0.4.3 plyr_1.8.9
## [25] aplot_0.2.3 withr_3.0.2 purrr_1.0.2
## [28] BiocGenerics_0.53.3 sys_3.4.3 grid_4.4.2
## [31] stats4_4.4.2 fansi_1.0.6 msa_1.39.0
## [34] colorspace_2.1-1 scales_1.3.0 cli_3.6.3
## [37] UpSetR_1.4.0 rmarkdown_2.29 crayon_1.5.3
## [40] treeio_1.31.0 generics_0.1.3 stringdist_0.9.12
## [43] ggtree_3.15.0 httr_1.4.7 ape_5.8
## [46] cachem_1.1.0 zlibbioc_1.52.0 parallel_4.4.2
## [49] ggplotify_0.1.2 formatR_1.14 XVector_0.47.0
## [52] yulab.utils_0.1.8 vctrs_0.6.5 Matrix_1.7-1
## [55] jsonlite_1.8.9 gridGraphics_0.5-1 IRanges_2.41.1
## [58] patchwork_1.3.0 S4Vectors_0.45.2 maketools_1.3.1
## [61] ineq_0.2-13 tidyr_1.3.1 jquerylib_0.1.4
## [64] glue_1.8.0 codetools_0.2-20 gtable_0.3.6
## [67] shape_1.4.6.1 futile.logger_1.4.3 GenomeInfoDb_1.43.2
## [70] UCSC.utils_1.3.0 quadprog_1.5-8 munsell_0.5.1
## [73] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1
## [76] GenomeInfoDbData_1.2.13 circlize_0.4.16 R6_2.5.1
## [79] evaluate_1.0.1 lattice_0.22-6 futile.options_1.0.1
## [82] ggfun_0.1.7 bslib_0.8.0 Rcpp_1.0.13-1
## [85] fastmatch_1.1-4 gridExtra_2.3 nlme_3.1-166
## [88] xfun_0.49 fs_1.6.5 buildtools_1.0.0
## [91] pkgconfig_2.0.3 GlobalOptions_0.1.2