Title: | Longitudinal Analysis of Cancer Evolution (LACE) |
---|---|
Description: | LACE is an algorithmic framework that processes single-cell somatic mutation profiles from cancer samples collected at different time points and in distinct experimental settings to produce longitudinal models of cancer evolution. The approach solves a Boolean Matrix Factorization problem with phylogenetic constraints by maximizing a weighted likelihood function computed on multiple time points. |
Authors: | Daniele Ramazzotti [aut], Fabrizio Angaroni [aut], Davide Maspero [cre, aut], Alex Graudenzi [aut], Luca De Sano [aut], Gianluca Ascolani [aut] |
Maintainer: | Davide Maspero <[email protected]> |
License: | file LICENSE |
Version: | 2.11.0 |
Built: | 2024-10-30 07:36:28 UTC |
Source: | https://github.com/bioc/LACE |
Compute mutation distance among variants from LACE corrected genotype and use it to perform hierarchical clustering.
## S3 method for class 'mutation.distance'
compute(inference)
inference |
Results of the inference by LACE. |
A matrix mutation_distance with the mutation distance among variants computed from LACE corrected genotype and related hierarchical clustering.
data(inference)
mutation_distance <- compute.mutation.distance(inference)
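As a rough illustration of the idea (not LACE's own implementation), the sketch below computes a distance between variants from a toy corrected genotype matrix and clusters the variants with base R. The matrix values, the variant names, and the choice of Manhattan (Hamming) distance with complete linkage are all assumptions made for the example; the distance actually used by compute.mutation.distance is not specified here.

```r
# Toy corrected genotype: 5 cells (rows) x 4 variants (columns).
# All values and names are illustrative, not taken from the LACE dataset.
corrected <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    1, 0, 1, 0,
    1, 0, 1, 1,
    1, 0, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("cell", 1:5),
                  c("geneA", "geneB", "geneC", "geneD"))
)

# Assumed distance: number of cells in which two variants differ
# (Manhattan distance on binary columns, i.e. Hamming distance).
variant_dist <- as.matrix(dist(t(corrected), method = "manhattan"))

# Hierarchical clustering of the variants on that distance.
clustering <- hclust(as.dist(variant_dist), method = "complete")

print(variant_dist)
```

Variants that appear in the same cells (here geneC and geneD) end up close together, which is the kind of grouping the clustering step is meant to expose.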
Compute error rates for the considered variants comparing observed data to LACE corrected genotype.
## S3 method for class 'variants.error.rates'
compute(D, inference)
D |
Mutation data from multiple experiments for a list of driver genes provided as a data matrix per time point. |
inference |
Results of the inference by LACE. |
A matrix variants_error_rates with the estimated error rates for the considered variants.
data(longitudinal_sc_variants)
data(inference)
variants_error_rates <- compute.variants.error.rates(longitudinal_sc_variants, inference)
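The comparison between observed data and corrected genotypes can be sketched in base R as follows. This is only a plausible reading, not the package's code: it assumes the false positive rate of a variant is the fraction of observed 1s among cells whose corrected genotype is 0, the false negative rate is the fraction of observed 0s among cells whose corrected genotype is 1, and NA observations are ignored. The helper name estimate_error_rates and the toy matrices are hypothetical.

```r
# Toy observed data (with one NA for low coverage) and corrected genotype.
cells <- paste0("cell", 1:4)
variants <- c("v1", "v2", "v3")
observed <- matrix(c(1, 0, 1,
                     1, 1, NA,
                     0, 0, 1,
                     1, 0, 0),
                   nrow = 4, byrow = TRUE, dimnames = list(cells, variants))
corrected <- matrix(c(1, 0, 1,
                      1, 0, 1,
                      1, 0, 1,
                      1, 0, 0),
                    nrow = 4, byrow = TRUE, dimnames = list(cells, variants))

# Hypothetical helper: per-variant error rates under the assumptions above.
# A rate is NaN when there is no cell to estimate it from.
estimate_error_rates <- function(observed, corrected) {
  t(sapply(colnames(observed), function(v) {
    obs <- observed[, v]
    cor <- corrected[, v]
    keep <- !is.na(obs)
    c(false_positive = mean(obs[keep & cor == 0] == 1),
      false_negative = mean(obs[keep & cor == 1] == 0))
  }))
}

rates <- estimate_error_rates(observed, corrected)
print(rates)
```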
Results obtained with the function LACE on the provided input data from Rambow, Florian, et al. "Toward minimal residual disease-directed therapy in melanoma." Cell 174.4 (2018): 843-855.
data(inference)
Results obtained with the function LACE on the provided input data
Perform the inference of the maximum likelihood clonal tree from longitudinal data.
LACE(
  D,
  lik_w = NULL,
  alpha = NULL,
  beta = NULL,
  initialization = NULL,
  random_tree = FALSE,
  keep_equivalent = TRUE,
  check_indistinguishable = TRUE,
  num_rs = 50,
  num_iter = 10000,
  n_try_bs = 500,
  learning_rate = 1,
  marginalize = FALSE,
  error_move = FALSE,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = "",
  show = TRUE
)
D |
Mutation data from multiple experiments for a list of driver genes. It can be either a list with one data matrix per time point or a SummarizedExperiment object. In the latter case, the object must contain two fields: assays and colData. The assays field stores a single data matrix pooling all single cells observed at every time point, and colData stores a vector of labels reporting the time point at which each single cell was sequenced. The ordering of cells in the assays and colData fields must be the same. |
lik_w |
Weight for each data point. If not provided, weights to correct for sample sizes are used. |
alpha |
False positive error rate provided as list of elements; if a vector of alpha (and beta) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned. |
beta |
False negative error rate provided as list of elements; if a vector of beta (and alpha) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned. |
initialization |
Binary matrix representing a perfect phylogeny clonal tree; clones are rows and mutations are columns. This parameter overrides "random_tree". |
random_tree |
Boolean. Shall I start MCMC search from a random tree? If FALSE (default) and initialization is NULL, search is started from a TRaIT tree (BMC Bioinformatics. 2019 Apr 25;20(1):210. doi: 10.1186/s12859-019-2795-4). |
keep_equivalent |
Boolean. Shall I return results (B and C) at equivalent likelihood with the best returned solution? |
check_indistinguishable |
Boolean. Shall I remove any indistinguishable event from input data prior to inference? |
num_rs |
Number of restarts during mcmc inference. |
num_iter |
Maximum number of mcmc steps to be performed during the inference. |
n_try_bs |
Number of steps without change in likelihood of best solution after which to stop the mcmc. |
learning_rate |
Parameter to tune the probability of accepting solutions at lower likelihood during MCMC. A value of learning_rate = 1 (default) sets a probability proportional to the difference in likelihood; values of learning_rate greater than 1 increase the chance of accepting solutions at lower likelihood during MCMC, while values lower than 1 decrease that probability. |
marginalize |
Boolean. Shall I marginalize C when computing likelihood? |
error_move |
Boolean. Shall I include estimation of error rates in the MCMC moves? |
num_processes |
Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL. |
seed |
Seed for reproducibility. |
verbose |
Boolean. Shall I print to screen information messages during the execution? |
log_file |
Log file where outputs are printed during parallel execution. If parallel execution is disabled, this parameter is ignored. |
show |
Boolean. Show the interactive interface to explore the output. |
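To make the effect of learning_rate more concrete, here is a generic Metropolis-style acceptance rule written on log-likelihoods. It is an assumption chosen because it reproduces the behaviour described above (values above 1 accept worse solutions more often, values below 1 less often); it is not taken from the LACE source, and the function name acceptance_probability is hypothetical.

```r
# Hypothetical acceptance rule: improvements are always accepted, while a
# worse solution is accepted with probability exp(delta / learning_rate),
# where delta is the (negative) change in log-likelihood.
acceptance_probability <- function(loglik_new, loglik_old, learning_rate = 1) {
  stopifnot(learning_rate > 0)
  min(1, exp((loglik_new - loglik_old) / learning_rate))
}

# Same proposed move (log-likelihood drops by 5) under three settings.
p_default <- acceptance_probability(-105, -100, learning_rate = 1)
p_high    <- acceptance_probability(-105, -100, learning_rate = 2)
p_low     <- acceptance_probability(-105, -100, learning_rate = 0.5)

print(c(default = p_default, high = p_high, low = p_low))
```

Under this rule, a larger learning_rate flattens the penalty for moving downhill, which lets the search escape local optima at the cost of slower convergence.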
A list of 9 elements: B, C, corrected_genotypes, clones_prevalence, relative_likelihoods, joint_likelihood, clones_summary, equivalent_solutions and error_rates. Here, B is the maximum likelihood longitudinal clonal tree, C the attachment of cells to clones, corrected_genotypes the corrected genotypes, and clones_prevalence the prevalence of each clone; relative_likelihoods and joint_likelihood are, respectively, the likelihood of the solution at each individual time point and the joint likelihood; clones_summary summarizes the association of mutations to clones. equivalent_solutions holds the solutions (B and C) whose likelihood is equivalent to that of the best solution. Finally, error_rates provides the best values of alpha and beta among those considered.
data(longitudinal_sc_variants)
inference = LACE(D = longitudinal_sc_variants,
                 lik_w = c(0.2308772, 0.2554386, 0.2701754, 0.2435088),
                 alpha = list(c(0.10, 0.05, 0.05, 0.05)),
                 beta = list(c(0.10, 0.05, 0.05, 0.05)),
                 keep_equivalent = TRUE,
                 num_rs = 5,
                 num_iter = 10,
                 n_try_bs = 5,
                 num_processes = NA,
                 seed = 12345,
                 verbose = FALSE,
                 show = FALSE)
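The lik_w values in the example sum to 1. One weighting scheme that reproduces them is (1 - n_t / N) / (T - 1), where n_t is the number of cells at time point t, N the total number of cells and T the number of time points. The sketch below checks this arithmetic. The per-time-point cell counts are back-calculated from the example weights and the 475-cell total, so they are an assumption, as is the formula itself; neither is confirmed as the package's internal default.

```r
# Hypothetical per-time-point cell counts, back-calculated from the
# example weights (they sum to the 475 cells of the example dataset).
cells_per_time_point <- c(146, 111, 90, 128)
total_cells <- sum(cells_per_time_point)
n_time_points <- length(cells_per_time_point)

# Assumed scheme: time points with fewer cells receive a larger weight,
# and the weights sum to 1.
lik_w <- (1 - cells_per_time_point / total_cells) / (n_time_points - 1)

print(round(lik_w, 7))
print(sum(lik_w))
```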
This function generates a longitudinal clonal tree and a graphical interface to explore the data, taking as input a clonal tree formatted like the one produced by LACE during the imputation steps.
lace_interface(
  B_mat,
  clones_prevalence,
  C_mat,
  error_rates,
  width = NULL,
  height = NULL,
  elementId = NULL,
  info = ""
)
B_mat |
(Required). B is the clonal tree matrix, where columns are the clonal mutations and rows are the clones. The clonal tree matrix should contain a column and a row named "Root" representing the root of the tree and the wild type, respectively. B is a binary matrix where 1s mark the mutations associated with each clone. The wild type column has all ones. |
clones_prevalence |
(Required) The clonal prevalence matrix |
C_mat |
(Required) The corrected clonal attachment |
error_rates |
(Required) The false positive alpha and false negative beta error rates used to infer the clonal tree |
width |
(optional) Width of the interface window |
height |
(optional) Height of the interface window |
elementId |
(optional) Element id |
info |
(Optional). HTML formatted text with information regarding the experiments |
An implementation of the htmlwidgets
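The layout expected for B_mat can be illustrated with a small hand-built matrix and a sanity check. The clone and mutation names are made up, and the helper is_valid_B is hypothetical. It checks only what the argument description states (binary entries, a "Root" row and column, a "Root" column of all ones), plus one further assumption: that the wild type "Root" row carries no mutation other than "Root".

```r
# Toy clonal tree: Root -> clone_1 (mut_A) -> clone_2 (mut_B),
#                          clone_1        -> clone_3 (mut_C).
B <- matrix(
  c(1, 0, 0, 0,
    1, 1, 0, 0,
    1, 1, 1, 0,
    1, 1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Root", "clone_1", "clone_2", "clone_3"),
                  c("Root", "mut_A", "mut_B", "mut_C"))
)

# Hypothetical sanity check for a B matrix.
is_valid_B <- function(B) {
  is.matrix(B) &&
    all(B %in% c(0, 1)) &&                        # binary entries
    "Root" %in% rownames(B) &&
    "Root" %in% colnames(B) &&
    all(B[, "Root"] == 1) &&                      # wild type column is all ones
    sum(B["Root", ]) == 1                         # assumption: Root has no mutations
}

print(is_valid_B(B))
```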
Perform the inference of the maximum likelihood clonal tree from longitudinal data.
lacedata(
  D,
  lik_w = NULL,
  alpha = NULL,
  beta = NULL,
  initialization = NULL,
  random_tree = FALSE,
  keep_equivalent = TRUE,
  check_indistinguishable = TRUE,
  num_rs = 50,
  num_iter = 10000,
  n_try_bs = 500,
  learning_rate = 1,
  marginalize = FALSE,
  error_move = FALSE,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = ""
)
D |
Mutation data from multiple experiments for a list of driver genes. It can be either a list with one data matrix per time point or a SummarizedExperiment object. In the latter case, the object must contain two fields: assays and colData. The assays field stores a single data matrix pooling all single cells observed at every time point, and colData stores a vector of labels reporting the time point at which each single cell was sequenced. The ordering of cells in the assays and colData fields must be the same. |
lik_w |
Weight for each data point. If not provided, weights to correct for sample sizes are used. |
alpha |
False positive error rate provided as list of elements; if a vector of alpha (and beta) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned. |
beta |
False negative error rate provided as list of elements; if a vector of beta (and alpha) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned. |
initialization |
Binary matrix representing a perfect phylogeny clonal tree; clones are rows and mutations are columns. This parameter overrides "random_tree". |
random_tree |
Boolean. Shall I start MCMC search from a random tree? If FALSE (default) and initialization is NULL, search is started from a TRaIT tree (BMC Bioinformatics. 2019 Apr 25;20(1):210. doi: 10.1186/s12859-019-2795-4). |
keep_equivalent |
Boolean. Shall I return results (B and C) at equivalent likelihood with the best returned solution? |
check_indistinguishable |
Boolean. Shall I remove any indistinguishable event from input data prior to inference? |
num_rs |
Number of restarts during mcmc inference. |
num_iter |
Maximum number of mcmc steps to be performed during the inference. |
n_try_bs |
Number of steps without change in likelihood of best solution after which to stop the mcmc. |
learning_rate |
Parameter to tune the probability of accepting solutions at lower likelihood during MCMC. A value of learning_rate = 1 (default) sets a probability proportional to the difference in likelihood; values of learning_rate greater than 1 increase the chance of accepting solutions at lower likelihood during MCMC, while values lower than 1 decrease that probability. |
marginalize |
Boolean. Shall I marginalize C when computing likelihood? |
error_move |
Boolean. Shall I include estimation of error rates in the MCMC moves? |
num_processes |
Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL. |
seed |
Seed for reproducibility. |
verbose |
Boolean. Shall I print to screen information messages during the execution? |
log_file |
Log file where outputs are printed during parallel execution. If parallel execution is disabled, this parameter is ignored. |
shiny interface
data(longitudinal_sc_variants)
lacedata(D = longitudinal_sc_variants,
         lik_w = c(0.2308772, 0.2554386, 0.2701754, 0.2435088),
         alpha = list(c(0.10, 0.05, 0.05, 0.05)),
         beta = list(c(0.10, 0.05, 0.05, 0.05)),
         keep_equivalent = TRUE,
         num_rs = 5,
         num_iter = 10,
         n_try_bs = 5,
         num_processes = NA,
         seed = 12345,
         verbose = FALSE)
LACEview displays a Shiny user interface that handles the processing of VCF and BAM files needed to construct the input for the LACE inference algorithms. The function also generates the maximum likelihood longitudinal clonal tree and shows the output for further exploration of the results.
LACEview()
The GUI
The package is available on GitHub and Bioconductor. LACE 2.0 requires R > 4.1.0 and Bioconductor.
To install directly from GitHub, run:
remotes::install_github("https://github.com/BIMIB-DISCo/LACE", dependencies = TRUE)
LACE 2.0 uses Annovar and the Samtools suite as back-ends for the annotation of called variants and for depth computation, respectively.
Annovar is a variant annotation tool written in Perl, freely available upon registration on its website at https://annovar.openbioinformatics.org/en/latest/.
Perl can be found and installed at https://www.perl.org/.
The Samtools suite is a set of tools for handling SAM/BAM/BED file formats. It is freely available at http://www.htslib.org/. To install Samtools, follow the instructions on its website.
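Before launching the interface, it can help to confirm that the external tools are reachable from R. The snippet below is a minimal sketch using base R's Sys.which; it assumes the executables are named samtools and perl and are expected on the PATH. Annovar is distributed as a set of Perl scripts in a user-chosen folder, so it is not checked here.

```r
# Look up the back-end executables on the PATH.
# Sys.which returns "" for tools that cannot be found.
required_tools <- c("samtools", "perl")
tool_paths <- Sys.which(required_tools)

missing_tools <- names(tool_paths)[tool_paths == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All back-end tools found.")
}
```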
The function LACE is still available for backward compatibility.
The dataset includes somatic single nucleotide variants (SNVs) at single-cell resolution. SNVs are called from SMARTseq2 fastq files obtained from the Gene Expression Omnibus database with the accession number GSE116237. The dataset includes single-cell data from a PDX melanoma model before and during treatment with BRAF and MEK inhibitors. The fastq files are processed to obtain the mutational profile following GATK best practice (https://gatkforums.broadinstitute.org/gatk/discussion/3891/calling-variants-in-rnaseq) using the GRCh38 human genome as reference. Mutation data are stored in an N x M binary matrix with N single cells and M somatic single nucleotide variants. Row names report the ID of the fastq file related to a specific single cell; column names report the SNVs, formatted as GeneName_chromosome_position_referenceAllele_alternateAllele. Each matrix entry can be 1 (mutation detected), 0 (mutation absent) or NA (coverage too low to determine the presence or absence of that mutation). For further details, please refer to the Methods section and section 3.1 of the supplementary materials of Ramazzotti, Daniele, et al. "Longitudinal cancer evolution from single cells." bioRxiv (2020).
data(longitudinal_sc_variants)
List of mutation data for four time points
List of mutational data for a total of 475 single cells
Rambow, Florian, et al. "Toward minimal residual disease-directed therapy in melanoma." Cell 174.4 (2018): 843-855.
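For readers preparing their own input, the format described above can be mimicked with a small synthetic list. Everything in this sketch is made up for illustration: the variant names, the time point labels, the random values, and the helpers make_time_point and is_valid_time_point. It only checks the properties stated in the dataset description (one matrix per time point, entries in 0, 1 or NA, and column names in the GeneName_chromosome_position_referenceAllele_alternateAllele form).

```r
set.seed(1)  # toy data only

# Hypothetical variant names following the documented naming scheme.
variants <- c("GENEA_1_1000_A_T", "GENEB_2_2000_C_G", "GENEC_3_3000_G_A")

# Build one cells x variants matrix with random 0 / 1 / NA entries.
make_time_point <- function(label, n_cells) {
  values <- sample(c(0, 1, NA), n_cells * length(variants), replace = TRUE)
  matrix(values, nrow = n_cells,
         dimnames = list(paste0(label, "_cell", seq_len(n_cells)), variants))
}

# One matrix per time point, with the same variants in every matrix.
D <- list(
  before_treatment = make_time_point("T1", 5),
  on_treatment     = make_time_point("T2", 4)
)

# Check entries and column-name format for a single time point.
is_valid_time_point <- function(m) {
  is.matrix(m) &&
    all(is.na(m) | m %in% c(0, 1)) &&
    all(grepl("^[^_]+_[^_]+_[0-9]+_[ACGT]+_[ACGT]+$", colnames(m)))
}

print(vapply(D, is_valid_time_point, logical(1)))
```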
Plot a longitudinal tree inferred by LACE.
longitudinal.tree.plot(
  inference,
  rem_unseen_leafs = TRUE,
  show_plot = TRUE,
  filename = "lg_output.xml",
  labels_show = "mutations",
  clone_labels = NULL,
  show_prev = TRUE,
  label.cex = 1,
  size = 500,
  size2 = NULL,
  tk_plot = FALSE,
  tp_lines = TRUE,
  tp_mark = TRUE,
  tp_mark_alpha = 0.5,
  legend = TRUE,
  legend_position = "topright",
  label_offset = 4,
  legend_cex = 0.8
)
inference |
Results of the inference by LACE. |
rem_unseen_leafs |
If TRUE (default), remove all leaves that have never been observed (prevalence = 0 at every time point). |
show_plot |
If TRUE (default) output the longitudinal tree to the current graphical device. |
filename |
Specify the name of the file in which to save the longitudinal tree. Dot and graphml formats are supported; the format is chosen based on the extension of the filename (.dot or .xml). |
labels_show |
Specify which type of label should be placed on the tree. Options are: "mutations": parental edges are labeled with the mutation acquired between the two nodes (genotypes); "clones": nodes (genotypes) are labeled with their last acquired mutation; "both": both nodes and edges are labeled as specified above; "none": no labels are shown on the longitudinal tree. |
clone_labels |
Character vector that specifies the names of the nodes (genotypes). If it is NULL (default), nodes are labeled as specified by the "labels_show" parameter. |
show_prev |
If TRUE (default), add the corresponding prevalence to the clone labels. |
label.cex |
Specify the size of the labels. |
size |
Specify the size of the nodes. The final area is proportional to the node prevalence. |
size2 |
Specify the size of the second dimension of the nodes. If NULL (default), it is set equal to "size". |
tk_plot |
If TRUE, uses tkplot function from igraph library to plot an interactive tree. Default is FALSE. |
tp_lines |
If TRUE (default), the function draws lines between time points. |
tp_mark |
If TRUE (default), the function draws differently colored areas under the nodes at different time points. |
tp_mark_alpha |
Specify the alpha value of the area drawn when tp_mark = TRUE. |
legend |
If TRUE (default) a legend will be displayed on the plot. |
legend_position |
Specify the legend position. |
label_offset |
Move the mutation labels horizontally (default = 4) |
legend_cex |
Specify size of the legend text. |
An igraph object g with the longitudinal tree inferred by LACE.
data(inference)
clone_labels = c("ARPC2", "PRAME", "HNRNPC", "COL1A2", "RPL5", "CCT8")
longitudinal.tree.plot(inference = inference,
                       labels_show = "clones",
                       clone_labels = clone_labels,
                       legend_position = "topleft")