Title: | CIMICE-R: (Markov) Chain Method to Inferr Cancer Evolution |
---|---|
Description: | CIMICE is a tool in the field of tumor phylogenetics and its goal is to build a Markov Chain (called Cancer Progression Markov Chain, CPMC) in order to model tumor subtypes evolution. The input of CIMICE is a Mutational Matrix, so a boolean matrix representing altered genes in a collection of samples. These samples are assumed to be obtained with single-cell DNA analysis techniques and the tool is specifically written to use the peculiarities of this data for the CMPC construction. |
Authors: | Nicolò Rossi [aut, cre] (Lab. of Computational Biology and Bioinformatics, Department of Mathematics, Computer Science and Physics, University of Udine, <https://orcid.org/0000-0002-6353-7396>) |
Maintainer: | Nicolò Rossi <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.0 |
Built: | 2024-10-30 04:39:29 UTC |
Source: | https://github.com/bioc/CIMICE |
Given M mutational matrix, add samples as row names, and genes as column names. If there are repetitions in row names, these are solved by adding a sequential identifier to the names.
annotate_mutational_matrix(M, samples, genes)
annotate_mutational_matrix(M, samples, genes)
M |
mutational matrix |
samples |
list of sample names |
genes |
list of gene names |
N with the set row and column names
require(Matrix) genes <- c("A", "B", "C") samples <- c("S1", "S2", "S2") M <- Matrix(c(0,0,1,0,0,1,0,1,1), ncol=3, sparse=TRUE, byrow = TRUE) annotate_mutational_matrix(M, samples, genes)
require(Matrix) genes <- c("A", "B", "C") samples <- c("S1", "S2", "S2") M <- Matrix(c(0,0,1,0,0,1,0,1,1), ncol=3, sparse=TRUE, byrow = TRUE) annotate_mutational_matrix(M, samples, genes)
Sort the rows of a binary matrix in ascending order
binary_radix_sort(mat)
binary_radix_sort(mat)
mat |
a binary matrix (of 0 and 1) |
the sorted matrix
require(Matrix) m <- Matrix(c(1,1,0,1,0,0,0,1,1), sparse = TRUE, ncol = 3) binary_radix_sort(m)
require(Matrix) m <- Matrix(c(1,1,0,1,0,0,0,1,1), sparse = TRUE, ncol = 3) binary_radix_sort(m)
Create a graph from the "build_topology_subset" edge list, so that it respects the subset relation, omitting the transitive edges.
build_subset_graph(edges, labels)
build_subset_graph(edges, labels)
edges |
edge list, built from "build_topology_subset" |
labels |
list of node labels, to be paired with the graph |
a graph with the subset topology, omitting transitive edges
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] edges <- build_topology_subset(samples) g <- build_subset_graph(edges, labels)
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] edges <- build_topology_subset(samples) g <- build_subset_graph(edges, labels)
Create an edge list E representing the 'subset' relation for binary strings so that:
build_topology_subset(samples)
build_topology_subset(samples)
samples |
input dataset (mutational matrix) as matrix |
the computed edge list
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] build_topology_subset(samples)
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] build_topology_subset(samples)
This function creates a reader to read a text file in batches (or chunks). It can be used for very large files that cannot fit in RAM.
chunk_reader(file_path)
chunk_reader(file_path)
file_path |
path to large file |
a list-object containing the function 'read' to read lines from the given file, and 'close' to close the connection to the file stream.
# open connection to file reader <- chunk_reader( system.file("extdata", "paac_jhu_2014_500.maf", package = "CIMICE", mustWork = TRUE) ) while(TRUE){ # read a chunk chunk <- reader$read(10) if(length(chunk) == 0){ break } # --- process chunk --- } # close connection reader$close()
# open connection to file reader <- chunk_reader( system.file("extdata", "paac_jhu_2014_500.maf", package = "CIMICE", mustWork = TRUE) ) while(TRUE){ # read a chunk chunk <- reader$read(10) if(length(chunk) == 0){ break } # --- process chunk --- } # close connection reader$close()
R implementation of the CIMICE tool. CIMICE is a tool in the field of tumor phylogenetics and its goal is to build a Markov Chain (called Cancer Progression Markov Chain, CPMC) in order to model tumor subtypes evolution. The input of CIMICE is a Mutational Matrix, so a boolean matrix representing altered genes in a collection of samples. These samples are assumed to be obtained with single-cell DNA analysis techniques and the tool is specifically written to use the peculiarities of this data for the CMPC construction. See 'https://github.com/redsnic/tumorEvolutionWithMarkovChains/tree/master/GenotypeEvolutionPaths' for the original Java version of this tool.
CIMICE-R: (Markov) Chain Method to Infer Cancer Evolution
Nicolò Rossi [email protected]
Count duplicate rows and compact the dataset (mutational). The column 'freq' will contain the counts for each row.
compact_dataset(mutmatrix)
compact_dataset(mutmatrix)
mutmatrix |
input dataset (mutational matrix) |
a list with matrix (the compacted dataset (mutational matrix)), counts (frequencies of genotypes) and row_names (comma separated string of sample IDs) fields
compact_dataset(example_dataset())
compact_dataset(example_dataset())
This procedure computes the weights for edges of a graph accordingly to CIMICE specification. (See vignettes for further explainations)
compute_weights_default(g, freqs)
compute_weights_default(g, freqs)
g |
a graph (must be a DAG with no transitive edges) |
freqs |
observed frequencies of genotypes |
a graph with the computed weights
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) compute_weights_default(g, freqs)
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) compute_weights_default(g, freqs)
Computes the Down weights formula using a Dinamic Programming approach (starting call), see vignettes for further explaination.
computeDWNW(g, freqs, no.of.children, A, normUpWeights)
computeDWNW(g, freqs, no.of.children, A, normUpWeights)
g |
graph (a Directed Acyclic Graph) |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
normUpWeights |
normalized up weights as computed by normalizeUPW |
a vector containing the Up weights for each edge
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normUpWeights <- normalizeUPW(g, freqs, no.of.children, A, upWeights) computeDWNW(g, freqs, no.of.children, A, normUpWeights)
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normUpWeights <- normalizeUPW(g, freqs, no.of.children, A, upWeights) computeDWNW(g, freqs, no.of.children, A, normUpWeights)
Computes the Down weights formula using a Dinamic Programming approach (recursion), see vignettes for further explaination.
computeDWNW_aux(g, edge, freqs, no.of.children, A, normUpWeights)
computeDWNW_aux(g, edge, freqs, no.of.children, A, normUpWeights)
g |
graph (a Directed Acyclic Graph) |
edge |
the currently considered edge |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
normUpWeights |
normalized up weights as computed by normalizeUPW |
a vector containing the Up weights for each edge
Computes the up weights formula using a Dinamic Programming approach (starting call), see vignettes for further explaination.
computeUPW(g, freqs, no.of.children, A)
computeUPW(g, freqs, no.of.children, A)
g |
graph (a Directed Acyclic Graph) |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
a vector containing the Up weights for each edge
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) computeUPW(g, freqs, no.of.children, A)
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) computeUPW(g, freqs, no.of.children, A)
Computes the up weights formula using a Dinamic Programming approach (recursion), see vignettes for further explaination.
computeUPW_aux(g, edge, freqs, no.of.children, A)
computeUPW_aux(g, edge, freqs, no.of.children, A)
g |
graph (a Directed Acyclic Graph) |
edge |
the currently considered edge |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
a vector containing the Up weights for each edge
Prepare correlation plot based on a mutational matrix
corrplot_from_mutational_matrix(mutmatrix)
corrplot_from_mutational_matrix(mutmatrix)
mutmatrix |
input dataset |
the computed correlation plot
corrplot_from_mutational_matrix(example_dataset())
corrplot_from_mutational_matrix(example_dataset())
Prepare a correlation plot computed from genes' perspective using a mutational matrix
corrplot_genes(mutmatrix)
corrplot_genes(mutmatrix)
mutmatrix |
input dataset (mutational matrix) |
the computed correlation plot
corrplot_genes(example_dataset())
corrplot_genes(example_dataset())
Prepare a correlation plot computed from samples' perspective using a mutational matrix
corrplot_samples(mutmatrix)
corrplot_samples(mutmatrix)
mutmatrix |
input dataset (mutational matrix) |
the computed correlation plot
corrplot_samples(example_dataset())
corrplot_samples(example_dataset())
executes the preprocessing steps of CIMICE
dataset_preprocessing(dataset)
dataset_preprocessing(dataset)
dataset |
a mutational matrix as a (sparse) matrix |
Preprocessing steps:
1) dataset is compacted
2) genotype frequencies are computed
3) labels are prepared
a list containing the mutational matrix ("samples"), the mutational frequencies of the genotypes ("freqs"), the node labels ("labels") and finally the gene names ("genes")
require(dplyr) example_dataset() %>% dataset_preprocessing
require(dplyr) example_dataset() %>% dataset_preprocessing
executes the preprocessing steps of CIMICE
dataset_preprocessing_population(compactedDataset)
dataset_preprocessing_population(compactedDataset)
compactedDataset |
a list (matrix: a mutational matrix, counts: number of samples with given genotype). "counts" is normalized automatically. |
Preprocessing steps:
1) genotype frequencies are computed
2) labels are prepared
a list containing the mutational matrix ("samples"), the mutational frequencies of the genotypes ("freqs"), the node labels ("labels") and finally the gene names ("genes")
require(dplyr) example_dataset_withFreqs() %>% dataset_preprocessing_population
require(dplyr) example_dataset_withFreqs() %>% dataset_preprocessing_population
Draws the output graph using ggplot
draw_ggraph(out, digits = 4, ...)
draw_ggraph(out, digits = 4, ...)
out |
the output object of CIMICE (es, from quick run) |
digits |
precision for edges' weights |
... |
other arguments for format_labels |
ggraph object representing g as described
draw_ggraph(quick_run(example_dataset()))
draw_ggraph(quick_run(example_dataset()))
Draws the output graph using networkD3
draw_networkD3(out, ...)
draw_networkD3(out, ...)
out |
the output object of CIMICE (es, from quick run) |
... |
other arguments for format_labels |
networkD3 object representing g as described
draw_networkD3(quick_run(example_dataset()))
draw_networkD3(quick_run(example_dataset()))
Draws the output graph using VisNetwork
draw_visNetwork(out, ...)
draw_visNetwork(out, ...)
out |
the output object of CIMICE (es, from quick run) |
... |
other arguments for format_labels |
visNetwork object representing g as described
draw_visNetwork(quick_run(example_dataset()))
draw_visNetwork(quick_run(example_dataset()))
Creates a simple example dataset
example_dataset()
example_dataset()
a simple mutational matrix
example_dataset()
example_dataset()
Creates a simple example dataset with frequency column
example_dataset_withFreqs()
example_dataset_withFreqs()
a simple mutational matrix
example_dataset_withFreqs()
example_dataset_withFreqs()
Checks if a generator can be normalized so that it actually is a Markov Chain
finalize_generator(generator)
finalize_generator(generator)
generator |
a generator |
A generator with edge weights that respect DTMC definition
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator
Fix the absence of the clonal genotype in the data (if needed)
fix_clonal_genotype(samples, freqs, labels, matching_samples)
fix_clonal_genotype(samples, freqs, labels, matching_samples)
samples |
input dataset (mutational matrix) as matrix |
freqs |
genotype frequencies (in the rows' order) |
labels |
list of gene names (in the columns' order) |
matching_samples |
list of sample names matching each genotype |
a named list containing the fixed "samples", "freqs" and "labels"
require(dplyr) # compact compactedDataset <- compact_dataset(example_dataset()) samples <- compactedDataset$matrix # save genes' names genes <- colnames(compactedDataset$matrix) # keep the information on frequencies for further analysis freqs <- compactedDataset$counts/sum(compactedDataset$counts) # prepare node labels listing the mutated genes for each node labels <- prepare_labels(samples, genes) if( is.null(compactedDataset$row_names) ){ compactedDataset$row_names <- rownames(compactedDataset$matrix) } matching_samples <- compactedDataset$row_names # matching_samples matching_samples # fix Colonal genotype absence, if needed fix <- fix_clonal_genotype(samples, freqs, labels, matching_samples)
require(dplyr) # compact compactedDataset <- compact_dataset(example_dataset()) samples <- compactedDataset$matrix # save genes' names genes <- colnames(compactedDataset$matrix) # keep the information on frequencies for further analysis freqs <- compactedDataset$counts/sum(compactedDataset$counts) # prepare node labels listing the mutated genes for each node labels <- prepare_labels(samples, genes) if( is.null(compactedDataset$row_names) ){ compactedDataset$row_names <- rownames(compactedDataset$matrix) } matching_samples <- compactedDataset$row_names # matching_samples matching_samples # fix Colonal genotype absence, if needed fix <- fix_clonal_genotype(samples, freqs, labels, matching_samples)
Prepare labels based on multiple identifiers so that they do not excede a certain size (if they do, a simple number is used)
format_labels(labels, max_col = 3, max_row = 3)
format_labels(labels, max_col = 3, max_row = 3)
labels |
a charachter vector of the labels to manage |
max_col |
maximum number of identifiers in a single row for a label |
max_row |
maximum number of rows of identifiers in a label |
the updated labels
format_labels(c("A, B", "C, D, E"))
format_labels(c("A, B", "C, D, E"))
Create the histogram of the genes' mutational frequencies
gene_mutations_hist(mutmatrix, binwidth = 1)
gene_mutations_hist(mutmatrix, binwidth = 1)
mutmatrix |
input dataset (mutational matrix) |
binwidth |
binwidth parameter for the histogram (as in ggplot) |
the newly created histogram
gene_mutations_hist(example_dataset(), binwidth = 10)
gene_mutations_hist(example_dataset(), binwidth = 10)
Compute number of children for each node given an adj matrix
get_no_of_children(A, g)
get_no_of_children(A, g)
A |
Adjacency matrix of the graph g |
g |
a graph |
a vector containing the number of children for each node in g
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) A <- as_adj(g) get_no_of_children(A, g)
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) A <- as_adj(g) get_no_of_children(A, g)
By default, CIMICE computes the relation between genotypes using the subset relation. For the following steps it is also important that the transitive edges are removed.
graph_non_transitive_subset_topology(samples, labels)
graph_non_transitive_subset_topology(samples, labels)
samples |
mutational matrix |
labels |
genotype labels |
a graph with the wanted topology
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] graph_non_transitive_subset_topology(samples, labels)
require(dplyr) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] graph_non_transitive_subset_topology(samples, labels)
Initialize a dataset for "line by line" creation
make_dataset(...)
make_dataset(...)
... |
gene names (do not use '"', the input is automatically converted to strings) |
a mutational matrix without samples structured as (sparse) matrix
make_dataset(APC,P53,KRAS)
make_dataset(APC,P53,KRAS)
Create a generator topology directly from a dataset. The topology will follow the subset relation.
make_generator_stub(dataset)
make_generator_stub(dataset)
dataset |
A compacted CIMICE dataset |
a generator, with weight = 0 for all the edges
make_generator_stub(example_dataset())
make_generator_stub(example_dataset())
This function helps creating labels for nodes with different information
make_labels(out, mode = "samplesIDs", max_col = 3, max_row = 3)
make_labels(out, mode = "samplesIDs", max_col = 3, max_row = 3)
out |
the output object of CIMICE (es, from quick run) |
mode |
which labels to print: samplesIDs (matching samples), sequentialIDs (just a sequential numer), geneIDs (genotype) |
max_col |
identifiers are represented in a max_col times max_row grid (if the number of IDs exceeds, a the sequentialID number is used instead) |
max_row |
identifiers are represented in a max_col times max_row grid (if the number of IDs exceeds, a the sequentialID number is used instead) |
the requested labels
make_labels(quick_run(example_dataset()))
make_labels(quick_run(example_dataset()))
Normalizes Down weights so that the sum of weights of edges exiting a node is 1
normalizeDWNW(g, freqs, no.of.children, A, downWeights)
normalizeDWNW(g, freqs, no.of.children, A, downWeights)
g |
graph (a Directed Acyclic Graph) |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
downWeights |
Down weights as computed by computeDWNW |
a vector containing the normalized Down weights for each edge
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normUpWeights <- normalizeUPW(g, freqs, no.of.children, A, upWeights) downWeights <- computeDWNW(g, freqs, no.of.children, A, normUpWeights) normalizeUPW(g, freqs, no.of.children, A, downWeights)
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normUpWeights <- normalizeUPW(g, freqs, no.of.children, A, upWeights) downWeights <- computeDWNW(g, freqs, no.of.children, A, normUpWeights) normalizeUPW(g, freqs, no.of.children, A, downWeights)
Normalizes up weights so that the sum of weights of edges entering in a node is 1
normalizeUPW(g, freqs, no.of.children, A, upWeights)
normalizeUPW(g, freqs, no.of.children, A, upWeights)
g |
graph (a Directed Acyclic Graph) |
freqs |
observed genotype frequencies |
no.of.children |
number of children for each node |
A |
adjacency matrix of G |
upWeights |
Up weights as computed by computeUPW |
a vector containing the normalized Up weights for each edge
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normalizeUPW(g, freqs, no.of.children, A, upWeights)
require(dplyr) require(igraph) preproc <- example_dataset() %>% dataset_preprocessing samples <- preproc[["samples"]] freqs <- preproc[["freqs"]] labels <- preproc[["labels"]] genes <- preproc[["genes"]] g <- graph_non_transitive_subset_topology(samples, labels) # prepare adj matrix A <- as.matrix(as_adj(g)) # pre-compute exiting edges from each node no.of.children <- get_no_of_children(A,g) upWeights <- computeUPW(g, freqs, no.of.children, A) normalizeUPW(g, freqs, no.of.children, A, upWeights)
Given a boolean matrix, randomly add False Positives (FP), False Negatives (FN) and Missing data following user defined rates. In the final matrix, missing data is represented by the value 3.
perturb_dataset(dataset, FP_rate = 0, FN_rate = 0, MIS_rate = 0)
perturb_dataset(dataset, FP_rate = 0, FN_rate = 0, MIS_rate = 0)
dataset |
a matrix/sparse matrix |
FP_rate |
False Positive rate |
FN_rate |
False Negative rate |
MIS_rate |
Missing Data rate |
Note that CIMICE does not support dataset with missing data natively, so using MIS_rate != 0 will then require some pre-processing.
the new, perturbed, matrix
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% simulate_generator(3, 10) %>% perturb_dataset(FP_rate = 0.01, FN_rate = 0.1, MIS_rate = 0.12)
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% simulate_generator(3, 10) %>% perturb_dataset(FP_rate = 0.01, FN_rate = 0.1, MIS_rate = 0.12)
Simple ggraph interface to draw a generator
plot_generator(generator)
plot_generator(generator)
generator |
a generator |
a basic plot of this generator
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% plot_generator
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% plot_generator
Prints a string in the form of the command that sets weights for all the edges of this generator.
prepare_generator_edge_set_command(generator, by = "labels")
prepare_generator_edge_set_command(generator, by = "labels")
generator |
a generator |
by |
"labels" or "samples" to use gene labels or sample labels as references for edge identifiers. |
NULL (the string with the function calls is printed on the stdout)
require(dplyr) example_dataset() %>% make_generator_stub() %>% prepare_generator_edge_set_command()
require(dplyr) example_dataset() %>% make_generator_stub() %>% prepare_generator_edge_set_command()
Prepare node labels so that each node is labelled with a comma separated list of the alterated genes representing its associated genotype.
prepare_labels(samples, genes)
prepare_labels(samples, genes)
samples |
input dataset (mutational matrix) as matrix |
genes |
list of gene names (in the columns' order) |
Note that after this procedure the user is expected also to run fix_clonal_genotype to also add the clonal genortype to the mutational matrix if it is not present.
the computed edge list
require(dplyr) # compact compactedDataset <- compact_dataset(example_dataset()) samples <- compactedDataset$matrix # save genes' names genes <- colnames(compactedDataset$matrix) # keep the information on frequencies for further analysis freqs <- compactedDataset$counts/sum(compactedDataset$counts) # prepare node labels listing the mutated genes for each node labels <- prepare_labels(samples, genes)
require(dplyr) # compact compactedDataset <- compact_dataset(example_dataset()) samples <- compactedDataset$matrix # save genes' names genes <- colnames(compactedDataset$matrix) # keep the information on frequencies for further analysis freqs <- compactedDataset$counts/sum(compactedDataset$counts) # prepare node labels listing the mutated genes for each node labels <- prepare_labels(samples, genes)
This function executes CIMICE analysis on a dataset using default settings.
quick_run(dataset, mode = "CAPRI")
quick_run(dataset, mode = "CAPRI")
dataset |
a mutational matrix as a (sparse) matrix |
mode |
indicates the used input format. Must be either "CAPRI" or "CAPRIpop" |
a list object representing the graph computed by CIMICE with the structure 'list(topology = g, weights = W, labels = labels, freqs=freqs)'
quick_run(example_dataset())
quick_run(example_dataset())
Read a "CAPRI" formatted file, as read_CAPRI
read(filepath)
read(filepath)
filepath |
path to file |
the described mutational matrix as a (sparse) matrix
read(system.file("extdata", "example.CAPRI", package = "CIMICE", mustWork = TRUE))
read(system.file("extdata", "example.CAPRI", package = "CIMICE", mustWork = TRUE))
Read a "CAPRI" formatted file from the file system
read_CAPRI(filepath)
read_CAPRI(filepath)
filepath |
path to file |
the described mutational matrix as a (sparse) matrix
# "pathToDataset/myDataset.CAPRI" read_CAPRI(system.file("extdata", "example.CAPRI", package = "CIMICE", mustWork = TRUE))
# "pathToDataset/myDataset.CAPRI" read_CAPRI(system.file("extdata", "example.CAPRI", package = "CIMICE", mustWork = TRUE))
Read a "CAPRI" formatted file, from a text string
read_CAPRI_string(txt)
read_CAPRI_string(txt)
txt |
string in valid "CAPRI" format |
the described mutational matrix as a (sparse) matrix
read_CAPRI_string(" s\\g A B C D S1 0 0 0 1 S2 1 0 0 0 S3 1 0 0 0 S4 1 0 0 1 S5 1 1 0 1 S6 1 1 0 1 S7 1 0 1 1 S8 1 1 0 1 ")
read_CAPRI_string(" s\\g A B C D S1 0 0 0 1 S2 1 0 0 0 S3 1 0 0 0 S4 1 0 0 1 S5 1 1 0 1 S6 1 1 0 1 S7 1 0 1 1 S8 1 1 0 1 ")
Read a "CAPRIpop" formatted file from the file system
read_CAPRIpop(filepath)
read_CAPRIpop(filepath)
filepath |
path to file |
a list containing the described mutational matrix as a (sparse) matrix and a list of the frequency of the genotypes
# "pathToDataset/myDataset.CAPRI" read_CAPRI(system.file("extdata", "example.CAPRIpop", package = "CIMICE", mustWork = TRUE))
# "pathToDataset/myDataset.CAPRI" read_CAPRI(system.file("extdata", "example.CAPRIpop", package = "CIMICE", mustWork = TRUE))
Read a "CAPRIpop" formatted file, from a text string
read_CAPRIpop_string(txt)
read_CAPRIpop_string(txt)
txt |
string in valid "CAPRIpop" format |
the described mutational matrix as a (sparse) matrix
read_CAPRIpop_string(" s\\g A B C D freqs S1 0 0 0 1 0.1 S2 1 0 0 0 0.1 S3 1 0 0 0 0.2 S4 1 0 0 1 0.3 S5 1 1 0 1 0.05 S6 1 1 0 1 0.1 S7 1 0 1 1 0.05 S8 1 1 0 1 0.01 ")
read_CAPRIpop_string(" s\\g A B C D freqs S1 0 0 0 1 0.1 S2 1 0 0 0 0.1 S3 1 0 0 0 0.2 S4 1 0 0 1 0.3 S5 1 1 0 1 0.05 S6 1 1 0 1 0.1 S7 1 0 1 1 0.05 S8 1 1 0 1 0.01 ")
Read a MAF (Mutation Annotation Format) file and create a Mutational Matrix combining gene and sample IDs.
read_MAF(path, ...)
read_MAF(path, ...)
path |
path to MAF file |
... |
other maftools::mutCountMatrix arguments |
the mutational (sparse) matrix associated to the MAF file
read_MAF(system.file("extdata", "paac_jhu_2014_500.maf", package = "CIMICE", mustWork = TRUE))
read_MAF(system.file("extdata", "paac_jhu_2014_500.maf", package = "CIMICE", mustWork = TRUE))
also converts that matrix to a sparse matrix
read_matrix(mat)
read_matrix(mat)
mat |
a boolean mutational matrix |
the sparse mutational matrix to be used as CIMICE's input
m <- matrix(c(0,0,1,1,0,1,1,1,1), ncol=3) colnames(m) <- c("A","B","C") rownames(m) <- c("S1", "S2", "S3") read_matrix(m)
m <- matrix(c(0,0,1,1,0,1,1,1,1), ncol=3) colnames(m) <- c("A","B","C") rownames(m) <- c("S1", "S2", "S3") read_matrix(m)
Remove transitive edges from an edgelist. This procedure is temporary to cover a bug in 'relations' package.
remove_transitive_edges(E)
remove_transitive_edges(E)
E |
edge list, built from "build_topology_subset" |
a new edgelist without transitive edges (as a N*2 matrix)
l <- list(c(1,2),c(2,3), c(1,3)) remove_transitive_edges(l)
l <- list(c(1,2),c(2,3), c(1,3)) remove_transitive_edges(l)
Create the histogram of the samples' mutational frequencies
sample_mutations_hist(mutmatrix, binwidth = 1)
sample_mutations_hist(mutmatrix, binwidth = 1)
mutmatrix |
input dataset (mutational matrix) |
binwidth |
binwidth parameter for the histogram (as in ggplot) |
the newly created histogram
sample_mutations_hist(example_dataset(), binwidth = 10)
sample_mutations_hist(example_dataset(), binwidth = 10)
Dataset filtering on genes, based on their mutation count
select_genes_on_mutations(mutmatrix, n, desc = TRUE)
select_genes_on_mutations(mutmatrix, n, desc = TRUE)
mutmatrix |
input dataset (mutational matrix) to be reduced |
n |
number of genes to be kept |
desc |
TRUE: select the n least mutated genes, FALSE: select the n most mutated genes |
the modified dataset (mutational matrix)
# keep information on the 100 most mutated genes select_genes_on_mutations(example_dataset(), 5) # keep information on the 100 least mutated genes select_genes_on_mutations(example_dataset(), 5, desc = FALSE)
# keep information on the 100 most mutated genes select_genes_on_mutations(example_dataset(), 5) # keep information on the 100 least mutated genes select_genes_on_mutations(example_dataset(), 5, desc = FALSE)
Dataset filtering on samples, based on their mutation count
select_samples_on_mutations(mutmatrix, n, desc = TRUE)
select_samples_on_mutations(mutmatrix, n, desc = TRUE)
mutmatrix |
input dataset (mutational matrix) to be reduced |
n |
number of samples to be kept |
desc |
T: select the n least mutated samples, F: select the n most mutated samples |
the modified dataset (mutational matrix)
require(dplyr) # keep information on the 5 most mutated samples select_samples_on_mutations(example_dataset(), 5) # keep information on the 5 least mutated samples select_samples_on_mutations(example_dataset(), 5, desc = FALSE) # combine selections select_samples_on_mutations(example_dataset() , 5, desc = FALSE) %>% select_genes_on_mutations(5)
require(dplyr) # keep information on the 5 most mutated samples select_samples_on_mutations(example_dataset(), 5) # keep information on the 5 least mutated samples select_samples_on_mutations(example_dataset(), 5, desc = FALSE) # combine selections select_samples_on_mutations(example_dataset() , 5, desc = FALSE) %>% select_genes_on_mutations(5)
Add edge weights to a generator
set_generator_edges(generator, f_t_w_list, by = "labels")
set_generator_edges(generator, f_t_w_list, by = "labels")
generator |
a generator |
f_t_w_list |
a list of triplets (from, to, list), the triplets must not be nested in the list. For example list("A","B",0.3, "B", "C", 0.2) is a valid input. |
by |
"labels" or "samples" to use gene labels or sample labels as references for edge identifiers. |
the generator with the modified edges (invalid edges are ignored)
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 ))
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 ))
Simulate the DTMC associated to the generator to create a dataset that reflects the genotypes of 'times' cells, sampled after 'time_ticks' passages.
simulate_generator( generator, time_ticks, times, starting_label = "Clonal", by = "labels", mode = "full" )
simulate_generator( generator, time_ticks, times, starting_label = "Clonal", by = "labels", mode = "full" )
generator |
a generator |
time_ticks |
number of steps (updates) of the DTMC associated to the generato |
times |
number of sumlated cells |
starting_label |
node from which to start the simulation |
by |
"labels" or "samples" to use gene labels or sample labels as references to identify the ‘starting_label'’s node |
mode |
"full" to generate a matrix with 'times' genotypes, "compacted" to *efficiently* create an already compacted dataset (a dataset showing the genotypes and their respective frequencies) |
the simulated dataset
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% simulate_generator(3, 10)
require(dplyr) example_dataset() %>% make_generator_stub() %>% set_generator_edges( list( "D", "A, D", 1 , "A", "A, D", 1 , "A, D", "A, C, D", 1 , "A, D", "A, B, D", 1 , "Clonal", "D", 1 , "Clonal", "A", 1 , "D", "D", 1 , "A", "A", 1 , "A, D", "A, D", 1 , "A, C, D", "A, C, D", 1 , "A, B, D", "A, B, D", 1 , "Clonal", "Clonal", 1 )) %>% finalize_generator %>% simulate_generator(3, 10)
Represents this graph in dot format (a textual output format)
to_dot(out, ...)
to_dot(out, ...)
out |
the output object of CIMICE (es, from quick run) |
... |
other arguments for format_labels |
a string representing the graph in dot format
to_dot(quick_run(example_dataset()))
to_dot(quick_run(example_dataset()))
Add a sample (a row) to an existing dataset. This procedure is meant to be used with the "
update_df(mutmatrix, sampleName, ...)
update_df(mutmatrix, sampleName, ...)
mutmatrix |
an existing (sparse) matrix (mutational matrix) |
sampleName |
the row (sample) name |
... |
sample's genotype (0/1 numbers) |
the modified (sparse) matrix (mutational matrix)
require(dplyr) make_dataset(APC,P53,KRAS) %>% update_df("S1", 1, 0, 1) %>% update_df("S2", 1, 1, 1)
require(dplyr) make_dataset(APC,P53,KRAS) %>% update_df("S1", 1, 0, 1) %>% update_df("S2", 1, 1, 1)