You should have identified true somatic variants in the mitochondrial (and nuclear) genome. The remaining vignettes of this package document how to get there. Here, we start with count matrices of the alternative and the reference alleles, across a number of sites of interest. Such data is available from two patients (P1, P2) as part of this package. Only P1 is analyzed as part of this vignette but the commented code for P2 should be fully functional.
load(system.file("data/M_P1.RData", package = "mitoClone2"))
load(system.file("data/N_P1.RData", package = "mitoClone2"))
P1 <- mutationCallsFromMatrix(as.matrix(M_P1), as.matrix(N_P1))
A first important step is to decide which mutations to include in the clustering. The default is to use all mutations that are covered in at least 20% of the cells, but this assignment can be changed manually.
The next step is to run SCITE or PhISCS to compute the
most likely phylogenetic tree. PhISCS is bundled in this package, but
the package needs to be run in an environment where gurobi and the gurobipy
python package are available. For example, you could set up a
conda
environment that contains this package. SCITE does
not require any additional licenses but may need to be manually
compiled.
## this next step takes approx 4.1 minutes to run
tmpd <- tempdir()
dir.create(paste0(tmpd, "/p1"))
P1 <- varCluster(P1, tempfolder = paste0(tmpd, "/p1"), method = "SCITE")
#> Warning in max(which(x == 1)): no non-missing arguments to max; returning -Inf
This step can take a while to run. It computes a likely phylogenetic tree of all the mutations. In the case of SCITE, multiple equally likely tree can be produced. In it’s current state, this package simply selects the first tree from the list.
In many cases, the order of the leaves on these trees is arbitrary,
because mutations systematically co-occur. We therefore cluster the
mutations into clones. In detail, we take every every branch on the tree
and then shuffle the order of mutations in that branch while
re-calculating the likelihood. If swapping nodes leads to small changes
in the likelihood, these nodes are then merged into a “clone”. The
parameter min.lik
that controls the merging is set
arbitrarily, see below for more information.
P1 <- clusterMetaclones(P1, min.lik = 1)
#> Warning in mutcalls@mut2clone[branches[[i]]] <-
#> as.integer(max(mutcalls@mut2clone) + : number of items to replace is not a
#> multiple of replacement length
This step also assigns each cell to the most likely clone, and
provides an estimate of the likelihood. Refer to
help(mutationCalls)
for more info on how these results are
stored.
Finally, the clustering can be plotted.
The parameter min.lik
that controls the merging is set
arbitrarily. In practice, the goal of these analyses is to group
mutations into clones for subsequent analyses (such as differential
expression analyses) and it may make sense to overwrite the result of
clusterMetaclones
manually; for example, if a subclone
defned on a mitochondrial mutation only should be treated as part of a
more clearly defined upstream clone for differential expression
analysis.
To overwrite the result of clusterMetaclones
, first
retrieve the assignment of mutations to clones:
m2c <- mitoClone2:::getMut2Clone(P1)
print(m2c)
#> X1282GA X2537GA X3335TC X3350TC X5492TC X7527GDEL X8167TC X11196GA
#> 2 3 5 5 1 4 5 2
#> X11559GA X14462GA X16233AG EAPP SRSF2 CEBPA KLF7 root
#> 4 3 5 4 4 3 2 5
## To e.g. treat the mt:2537G>A and mt:14462:G>A mutations as a
## subclone distinct from CEBPA, we can assign them a new clonal
## identity
m2c[c("X2537GA", "X14462GA")] <- as.integer(6)
P1.new <- mitoClone2:::overwriteMetaclones(P1, m2c)
plotClones(P1.new)
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] mitoClone2_1.13.0 rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.47.0 SummarizedExperiment_1.37.0
#> [3] gtable_0.3.6 ggplot2_3.5.1
#> [5] rjson_0.2.23 xfun_0.49
#> [7] bslib_0.8.0 Biobase_2.67.0
#> [9] lattice_0.22-6 vctrs_0.6.5
#> [11] tools_4.4.2 bitops_1.0-9
#> [13] generics_0.1.3 curl_6.0.1
#> [15] stats4_4.4.2 parallel_4.4.2
#> [17] tibble_3.2.1 AnnotationDbi_1.69.0
#> [19] RSQLite_2.3.9 blob_1.2.4
#> [21] pkgconfig_2.0.3 pheatmap_1.0.12
#> [23] Matrix_1.7-1 BSgenome_1.75.0
#> [25] RColorBrewer_1.1-3 S4Vectors_0.45.2
#> [27] lifecycle_1.0.4 GenomeInfoDbData_1.2.13
#> [29] farver_2.1.2 compiler_4.4.2
#> [31] Rsamtools_2.23.1 Biostrings_2.75.3
#> [33] munsell_0.5.1 codetools_0.2-20
#> [35] deepSNV_1.53.0 GenomeInfoDb_1.43.2
#> [37] htmltools_0.5.8.1 sys_3.4.3
#> [39] buildtools_1.0.0 sass_0.4.9
#> [41] RCurl_1.98-1.16 yaml_2.3.10
#> [43] pillar_1.10.0 crayon_1.5.3
#> [45] jquerylib_0.1.4 BiocParallel_1.41.0
#> [47] DelayedArray_0.33.3 cachem_1.1.0
#> [49] abind_1.4-8 digest_0.6.37
#> [51] VariantAnnotation_1.53.0 restfulr_0.0.15
#> [53] maketools_1.3.1 splines_4.4.2
#> [55] fastmap_1.2.0 grid_4.4.2
#> [57] colorspace_2.1-1 cli_3.6.3
#> [59] SparseArray_1.7.2 magrittr_2.0.3
#> [61] GenomicFeatures_1.59.1 S4Arrays_1.7.1
#> [63] XML_3.99-0.17 scales_1.3.0
#> [65] UCSC.utils_1.3.0 bit64_4.5.2
#> [67] XVector_0.47.0 httr_1.4.7
#> [69] matrixStats_1.4.1 bit_4.5.0.1
#> [71] png_0.1-8 VGAM_1.1-12
#> [73] memoise_2.0.1 evaluate_1.0.1
#> [75] knitr_1.49 Rhtslib_3.3.1
#> [77] BiocIO_1.17.1 GenomicRanges_1.59.1
#> [79] IRanges_2.41.2 rtracklayer_1.67.0
#> [81] rlang_1.1.4 glue_1.8.0
#> [83] DBI_1.2.3 formatR_1.14
#> [85] BiocGenerics_0.53.3 jsonlite_1.8.9
#> [87] R6_2.5.1 GenomicAlignments_1.43.0
#> [89] MatrixGenerics_1.19.0 zlibbioc_1.52.0