Title: | Forward Genetic Simulation of Cancer Progression with Epistasis |
---|---|
Description: | Functions for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Fitness (including just birth, just death, or both birth and death) can also be a function of the relative and absolute frequencies of other genotypes (i.e., frequency-dependent fitness). Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulating multi-species scenarios and therapeutic interventions, including adaptive therapy, is also possible. Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for: simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, additive, NK, Ising, and Eggbox models) and plotting them. |
Authors: | Ramon Diaz-Uriarte [aut, cre], Sergio Sanchez-Carrillo [aut], Juan Antonio Miguel Gonzalez [aut], Alberto Gonzalez Klein [aut], Javier Mu\~noz Haro [aut], Javier Lopez Cano [aut], Niklas Endres [ctb], Mark Taylor [ctb], Arash Partow [ctb], Sophie Brouillet [ctb], Sebastian Matuszewski [ctb], Harry Annoni [ctb], Luca Ferretti [ctb], Guillaume Achaz [ctb], Tymoteusz Wolodzko [ctb], Guillermo Gorines Cordero [ctb], Ivan Lorca Alonso [ctb], Francisco Mu\~noz Lopez [ctb], David Roncero Moro\~no [ctb], Alvaro Quevedo [ctb], Pablo Perez [ctb], Cristina Devesa [ctb], Alejandro Herrador [ctb], Holger Froehlich [ctb], Florian Markowetz [ctb], Achim Tresch [ctb], Theresa Niederberger [ctb], Christian Bender [ctb], Matthias Maneck [ctb], Claudio Lottaz [ctb], Tim Beissbarth [ctb], Sara Dorado Alfaro [ctb], Miguel Hernandez del Valle [ctb], Alvaro Huertas Garcia [ctb], Diego Ma\~nanes Cayero [ctb], Alejandro Martin Mu\~noz [ctb], Marta Couce Iglesias [ctb], Silvia Garcia Cobos [ctb], Carlos Madariaga Aramendi [ctb], Ana Rodriguez Ronchel [ctb], Lucia Sanchez Garcia [ctb], Yolanda Benitez Quesada [ctb], Asier Fernandez Pato [ctb], Esperanza Lopez Lopez [ctb], Alberto Manuel Parra Perez [ctb], Jorge Garcia Calleja [ctb], Ana del Ramo Galian [ctb], Alejandro de los Reyes Benitez [ctb], Guillermo Garcia Hoyos [ctb], Rosalia Palomino Cabrera [ctb], Rafael Barrero Rodriguez [ctb], Silvia Talavera Marcos [ctb] |
Maintainer: | Ramon Diaz-Uriarte <[email protected]> |
License: | GPL (>= 3) |
Version: | 4.9.0 |
Built: | 2024-12-24 06:02:47 UTC |
Source: | https://github.com/bioc/OncoSimulR |
Given one or more of a set of poset restrictions, epistatic interactions, order effects, and genes without interactions, as well as, optionally, a mapping of genes to modules, return the complete fitness specification.
For mutator effects, given one or more of a set of epistatic interactions and genes without interactions, as well as, optionally, a mapping of genes to modules, return the complete specification of how mutations affect the mutation rate.
This function can be used also to produce the fitness specification
needed to run simulations in a frequency dependent fitness way. In that
situation we presume that the effects must be considered as fitness effects
and never as mutator effects (see details
for more info).
The output of these functions is not intended for user consumption, but as a way of preparing data to be sent to the C++ code.
allFitnessEffects(rT = NULL, epistasis = NULL, orderEffects = NULL, noIntGenes = NULL, geneToModule = NULL, drvNames = NULL, genotFitness = NULL, keepInput = TRUE, frequencyDependentBirth = FALSE, frequencyDependentDeath = FALSE, frequencyDependentFitness, frequencyType = NA, deathSpec = FALSE) allMutatorEffects(epistasis = NULL, noIntGenes = NULL, geneToModule = NULL, keepInput = TRUE)
allFitnessEffects(rT = NULL, epistasis = NULL, orderEffects = NULL, noIntGenes = NULL, geneToModule = NULL, drvNames = NULL, genotFitness = NULL, keepInput = TRUE, frequencyDependentBirth = FALSE, frequencyDependentDeath = FALSE, frequencyDependentFitness, frequencyType = NA, deathSpec = FALSE) allMutatorEffects(epistasis = NULL, noIntGenes = NULL, geneToModule = NULL, keepInput = TRUE)
rT |
A restriction table that is an extended version of a poset
(see
This paramenter is not used if |
epistasis |
A named numeric vector. The names identify the relationship, and the numeric value is the fitness (or mutator) effect. For the names, each of the genes or modules involved is separated by a ":". A negative sign denotes the absence of that term. This paramenter is not used if |
orderEffects |
A named numeric vector, as for This paramenter is not used if |
noIntGenes |
A numeric vector (optionally named) with the fitness coefficients (or mutator multiplier factor) of genes (only genes, not modules) that show no interactions. These genes cannot be part of modules. But you can specify modules that have no epistatic interactions. See examples and vignette. Of course, avoid using potentially confusing characters in the names. In particular, "," and ">" are not allowed as gene names. This paramenter is not used if |
geneToModule |
A named character vector that allows to match genes and modules. The
names are the modules, and each of the values is a character vector
with the gene names, separated by a comma, that correspond to a
module. Note that modules cannot share genes. There is no need for
modules to contain more than one gene. If you specify a geneToModule
argument, and you used a restriction table, the This paramenter is not used if |
drvNames |
The names of genes that are considered drivers. This is
only used for: a) deciding when to stop the simulations, in case you
use number of drivers as a simulation stopping criterion (see
|
genotFitness |
A matrix or data frame that contains explicitly the mapping of genotypes to birth and optionally death. For now, we only allow epistasis-like relations between genes (so you cannot code order effects this way). Genotypes can be specified in two ways:
When In contrast, if If we want to use absolute numbers (absolute frequencies), just
subtitute "f" for "n". The choice between relative or absolute
frequencies may be specified also in Mathematical operations and symbols allowed are described in the documentation of C++'s library ExprTk that is used to parse and evaluate the fitness equations (see references for more information). |
keepInput |
If TRUE, whether to keep the original input. This is only useful for human consumption of the output. It is useful because it is easier to decode, say, the restriction table from the data frame than from the internal representation. But if you want, you can set it to FALSE and the object will be a little bit smaller. |
frequencyDependentBirth |
If FALSE, the default value, all downstream work will be realised in a
way not related to frequency depedent fitness situations. That implies
that fitness specifications are fixed, except death rate in case of
McFarland model (see |
frequencyDependentDeath |
If FALSE, the default value, all downstream work will be realised in a
way not related to frequency depedent fitness situations. That implies
that fitness specifications are fixed, except death rate in case of
McFarland model (see |
frequencyDependentFitness |
NA. |
frequencyType |
frequencyType is a character that specify whether
we are using absolute or relatives frequecies and can take tree values
depending on |
deathSpec |
If FALSE, the default value, all downstream work will be realised in a
way in which we assume that death is not specified by the user in |
allFitnessEffects
is used for extremely flexible specification of fitness
and mutator effects, including posets, XOR relationships, synthetic mortality and
synthetic viability, arbitrary forms of epistatis, arbitrary forms of
order effects, etc. allFitnessEffects
produce the output necessary
to pass to the C++ code the fitness/mutator specifications to run simulations.
Please, see the vignette for detailed and commented examples.
allMutatorEffects
provide the same flexibility, but without
order and posets (this might be included in the future, but I have
seen no empirical or theoretical argument for their existence or
relevance as of now, so I do not add them to minimize unneeded complexity).
If you use both for simulations in the same call to, say,
oncoSimulIndiv
, all the genes specified in
allMutatorEffects
MUST be included in the
allFitnessEffects
object. If you want to have genes that have
no direct effect on fitness, but that affect mutation rate, you MUST
specify them in the call to allFitnessEffects
, for instance as
noIntGenes
with an effect of 0. When you run the simulations in
frequencyDependentBirth
= TRUE or frequencyDependentDeath
= TRUE
only fitness effects are allowed, and must be codified in genotFitness
.
If you use genotFitness
then you cannot pass modules,
noIntgenes, epistasis, or rT. This makes sense, because using
genotFitness
is saying
"this is the mapping of genotypes to birth and maybe death. Period", so we should
not allow further modifications from other terms. This is always the
case when frequencyDependentBirth
= TRUE or
frequencyDependentDeath
= TRUE.
If you use genotFitness
you need to be careful when you use
Bozic's model (as you get a death rate of 0).
If you use genotFitness
note that we force the WT (wildtype) to
always be 1 so birth rates (death rates) are rescaled in case of
frequencyDependentBirth = FALSE
(frequencyDependentDeath = FALSE
).
In contrast, when frequencyDependentBirth = TRUE
(frequencyDependentDeath = TRUE
)
you are free to determine the birth rate (death rate) as a function of the frequencies of the
genotypes (see genotFitness
and the vignette).
When using genotFitness
, any genotype with a fitness <= 1e-9 is removed from the table of genotypes, thus
making it a non-viable genotype during simulations.
An object of class "fitnessEffects" or "mutatorEffects". This is just a list, but it is not intended for human consumption. The components are:
long.rt |
The restriction table in "long format", so as to be easy to parse by the C++ code. |
long.epistasis |
Ditto, but for the epistasis specification. |
long.orderEffects |
Ditto for the order effects. |
long.geneNoInt |
Ditto for the non-interaction genes. |
geneModule |
Similar, for the gene-module correspondence. |
graph |
An |
drv |
The numeric identifiers of the drivers. The numbers correspond to the internal numeric coding of the genes. |
rT |
If |
epistasis |
If |
orderEffects |
If |
noIntGenes |
If |
fitnessLandscape |
A data.frame that contains number of genes + 1 columns, where the first columns are the genes (1 if mutated and 0 if not) and the last one contains the fitnesses. |
fitnessLandscape_df |
A data.frame with the same information of |
fitnessLandscape_gene_id |
A data.frame with two columns (Gene and GeneNumID), that map by rows genes as letters (Gene) with genes as numbers (GeneNumID). |
fitnessLandscapeVariables |
A character vector that contains the frequency variables necessary for the C++ code. The "fvars". |
frequencyDependentBirth |
TRUE or FALSE as we have explained before. |
frequencyDependentDeath |
TRUE or FALSE as we have explained before. |
frequencyDependentFitness |
DEPRECATED. Use instead of |
frequencyType |
A character string "abs" or "rel" (or NULL). |
deathSpec |
TRUE or FALSE as we have explained before. |
full_FDF_spec |
For frequency-dependent birth (death), a complete data frame showing the genotypes (as matrix, letters, and "fvars") and the birth (death) specification, in terms of the original specification (Birth_as_letters (Death_as_letters) and with genotypes mapped to numbers according to the "fvars" (Birth_as_fvars (Death_as_fvars)). If birth (death) was originally specified in terms of numbers, these two columns will be identical. All the information in this data frame is implicitly above, but this simplifies checking that you are doing what you think you are doing. |
Please, note that the meaning of the fitness effects in the McFarland model is not the same as in the original paper; the fitness coefficients are transformed to allow for a simpler fitness function as a product of terms. This differs with respect to v.1. See the vignette for details.
The names of the genes and modules can be fairly arbitrary. But if you try hard you can confuse the parser. For instance, using gene or module names that contain "," or ":", or ">" is likely to get you into trouble. Of course, you know you should not try to use those characters because you know those characters have special meanings to separate names or indicate epistasis or order relationships. Right now, using those characters as names is caught (and result in stopping) if passed as names for noIntGenes.
At the moment, the variables you need to specify in the fitness
equations when you are in a frequency dependent fitness situation are
fixed as we have explained in genotFitness
. Perhaps using
different and strange combinations of "f_" or "n_" followed by letters
and numbers you could confuse the R parser, but never the C++ one. For
a correct performance please be aware of this.
Ramon Diaz-Uriarte
Diaz-Uriarte, R. (2015). Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling http://www.biomedcentral.com/1471-2105/16/41/abstract.
McFarland, C.~D. et al. (2013). Impact of deleterious passenger mutations on cancer progression. Proceedings of the National Academy of Sciences of the United States of America\/, 110(8), 2910–5.
Partow, A. ExprTk: C++ Mathematical Expression Library (MIT Open Souce License). http://www.partow.net/programming/exprtk/.
evalGenotype
,
evalAllGenotypes
,
oncoSimulIndiv
,
plot.fitnessEffects
,
evalGenotypeFitAndMut
,
rfitness
,
plotFitnessLandscape
## A simple poset or CBN-like example cs <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c"), child = c("a", "b", "d", "e", "c", "c", rep("g", 3)), s = 0.1, sh = -0.9, typeDep = "MN") cbn1 <- allFitnessEffects(cs) plot(cbn1) ## A more complex example, that includes a restriction table ## order effects, epistasis, genes without interactions, and moduels p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) oe <- c("C > F" = -0.1, "H > I" = 0.12) sm <- c("I:J" = -1) sv <- c("-K:M" = -.5, "K:-M" = -.5) epist <- c(sm, sv) modules <- c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1", "H" = "h1, h2", "I" = "i1", "J" = "j1, j2", "K" = "k1, k2", "M" = "m1") set.seed(1) ## for repeatability noint <- rexp(5, 10) names(noint) <- paste0("n", 1:5) fea <- allFitnessEffects(rT = p4, epistasis = epist, orderEffects = oe, noIntGenes = noint, geneToModule = modules) plot(fea) ## Modules that show, between them, ## no epistasis (so multiplicative effects). ## We specify the individual terms, but no value for the ":". fnme <- allFitnessEffects(epistasis = c("A" = 0.1, "B" = 0.2), geneToModule = c("A" = "a1, a2", "B" = "b1")) evalAllGenotypes(fnme, order = FALSE, addwt = TRUE) ## Epistasis for fitness and simple mutator effects fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) fm <- allMutatorEffects(noIntGenes = c("a" = 10, "c" = 5)) evalAllGenotypesFitAndMut(fe, fm, order = FALSE) ## Simple fitness effects (noIntGenes) and modules ## for mutators fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, a2 = 0.2, b1 = 0.01, b2 = 0.3, b3 = 0.2, c1 = 0.3, c2 = -0.2)) fm2 <- allMutatorEffects(epistasis = c("A" = 5, "B" = 10, "C" = 3), geneToModule = c("A" = "a1, a2", "B" = "b1, b2, b3", "C" = "c1, c2")) evalAllGenotypesFitAndMut(fe2, fm2, order = FALSE) ## Passing fitness directly, a complete fitness specification ## with a two column data frame with genotypes as character vectors (m4 <- data.frame(G = c("A, B", "A", "WT", "B"), F = c(3, 2, 1, 4))) fem4 <- allFitnessEffects(genotFitness = m4) ## Verify it interprets what it should: m4 is the same as the evaluation ## of the fitness effects (note row reordering) evalAllGenotypes(fem4, addwt = TRUE, order = FALSE) ## Passing fitness directly, a complete fitness specification ## that uses a three column matrix m5 <- cbind(c(0, 1, 0, 1), c(0, 0, 1, 1), c(1, 2, 3, 5.5)) fem5 <- allFitnessEffects(genotFitness = m5) ## Verify it interprets what it should: m5 is the same as the evaluation ## of the fitness effects evalAllGenotypes(fem5, addwt = TRUE, order = FALSE) ## Passing fitness directly, an incomplete fitness specification ## that uses a three column matrix m6 <- cbind(c(1, 1), c(1, 0), c(2, 3)) fem6 <- allFitnessEffects(genotFitness = m6) evalAllGenotypes(fem6, addwt = TRUE, order = FALSE) ## Plotting a fitness landscape fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, b1 = 0.01, c1 = 0.3)) plot(evalAllGenotypes(fe2, order = FALSE)) ## same as plotFitnessLandscape(evalAllGenotypes(fe2, order = FALSE)) ## same as plotFitnessLandscape(fe2) ###### Defaults for missing genotypes ## As a two-column data frame (m8 <- data.frame(G = c("A, B, C", "B"), F = c(3, 2))) evalAllGenotypes(allFitnessEffects(genotFitness = m8), addwt = TRUE) ## As a matrix (m9 <- rbind(c(0, 1, 0, 1, 4), c(1, 0, 1, 0, 1.5))) evalAllGenotypes(allFitnessEffects(genotFitness = m9), addwt = TRUE) ######### Frequency Dependent Birth genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("max(3, 2*f_)", "max(1.5, 3*(f_ + f_1))", "max(2, 3*(f_ + f_2))", "max(2, 5*f_ - 0.5*( f_1 + f_2) + 15*f_1_2)"), stringsAsFactors = FALSE) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE, frequencyType = "rel") ##Ploting fitness landscape in case of spPopSizes = c(5000, 2500, 3000, 7500) plotFitnessLandscape(evalAllGenotypes(afe, spPopSizes = c(5000, 2500, 3000, 7500))) ## Reinitialize the seed set.seed(NULL)
## A simple poset or CBN-like example cs <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c"), child = c("a", "b", "d", "e", "c", "c", rep("g", 3)), s = 0.1, sh = -0.9, typeDep = "MN") cbn1 <- allFitnessEffects(cs) plot(cbn1) ## A more complex example, that includes a restriction table ## order effects, epistasis, genes without interactions, and moduels p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) oe <- c("C > F" = -0.1, "H > I" = 0.12) sm <- c("I:J" = -1) sv <- c("-K:M" = -.5, "K:-M" = -.5) epist <- c(sm, sv) modules <- c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1", "H" = "h1, h2", "I" = "i1", "J" = "j1, j2", "K" = "k1, k2", "M" = "m1") set.seed(1) ## for repeatability noint <- rexp(5, 10) names(noint) <- paste0("n", 1:5) fea <- allFitnessEffects(rT = p4, epistasis = epist, orderEffects = oe, noIntGenes = noint, geneToModule = modules) plot(fea) ## Modules that show, between them, ## no epistasis (so multiplicative effects). ## We specify the individual terms, but no value for the ":". fnme <- allFitnessEffects(epistasis = c("A" = 0.1, "B" = 0.2), geneToModule = c("A" = "a1, a2", "B" = "b1")) evalAllGenotypes(fnme, order = FALSE, addwt = TRUE) ## Epistasis for fitness and simple mutator effects fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) fm <- allMutatorEffects(noIntGenes = c("a" = 10, "c" = 5)) evalAllGenotypesFitAndMut(fe, fm, order = FALSE) ## Simple fitness effects (noIntGenes) and modules ## for mutators fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, a2 = 0.2, b1 = 0.01, b2 = 0.3, b3 = 0.2, c1 = 0.3, c2 = -0.2)) fm2 <- allMutatorEffects(epistasis = c("A" = 5, "B" = 10, "C" = 3), geneToModule = c("A" = "a1, a2", "B" = "b1, b2, b3", "C" = "c1, c2")) evalAllGenotypesFitAndMut(fe2, fm2, order = FALSE) ## Passing fitness directly, a complete fitness specification ## with a two column data frame with genotypes as character vectors (m4 <- data.frame(G = c("A, B", "A", "WT", "B"), F = c(3, 2, 1, 4))) fem4 <- allFitnessEffects(genotFitness = m4) ## Verify it interprets what it should: m4 is the same as the evaluation ## of the fitness effects (note row reordering) evalAllGenotypes(fem4, addwt = TRUE, order = FALSE) ## Passing fitness directly, a complete fitness specification ## that uses a three column matrix m5 <- cbind(c(0, 1, 0, 1), c(0, 0, 1, 1), c(1, 2, 3, 5.5)) fem5 <- allFitnessEffects(genotFitness = m5) ## Verify it interprets what it should: m5 is the same as the evaluation ## of the fitness effects evalAllGenotypes(fem5, addwt = TRUE, order = FALSE) ## Passing fitness directly, an incomplete fitness specification ## that uses a three column matrix m6 <- cbind(c(1, 1), c(1, 0), c(2, 3)) fem6 <- allFitnessEffects(genotFitness = m6) evalAllGenotypes(fem6, addwt = TRUE, order = FALSE) ## Plotting a fitness landscape fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, b1 = 0.01, c1 = 0.3)) plot(evalAllGenotypes(fe2, order = FALSE)) ## same as plotFitnessLandscape(evalAllGenotypes(fe2, order = FALSE)) ## same as plotFitnessLandscape(fe2) ###### Defaults for missing genotypes ## As a two-column data frame (m8 <- data.frame(G = c("A, B, C", "B"), F = c(3, 2))) evalAllGenotypes(allFitnessEffects(genotFitness = m8), addwt = TRUE) ## As a matrix (m9 <- rbind(c(0, 1, 0, 1, 4), c(1, 0, 1, 0, 1.5))) evalAllGenotypes(allFitnessEffects(genotFitness = m9), addwt = TRUE) ######### Frequency Dependent Birth genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("max(3, 2*f_)", "max(1.5, 3*(f_ + f_1))", "max(2, 3*(f_ + f_2))", "max(2, 5*f_ - 0.5*( f_1 + f_2) + 15*f_1_2)"), stringsAsFactors = FALSE) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE, frequencyType = "rel") ##Ploting fitness landscape in case of spPopSizes = c(5000, 2500, 3000, 7500) plotFitnessLandscape(evalAllGenotypes(afe, spPopSizes = c(5000, 2500, 3000, 7500))) ## Reinitialize the seed set.seed(NULL)
Summary results from some benchmarks reported in the vignette. Included are timings, sizes of return objects and key oputput from each simulation.
They are here mainly to facilitate creation of table from the vignette itself. The scripts are available under "inst/miscell".
data(benchmark_1) data(benchmark_1_0.05) data(benchmark_2) data(benchmark_3)
data(benchmark_1) data(benchmark_1_0.05) data(benchmark_2) data(benchmark_3)
Data frames.
data(benchmark_1) benchmark_1
data(benchmark_1) benchmark_1
This functions checks that the user has specified correctly the interventions and also makes some modifications in the specification, so the "core" of the code that runs the simulation can "understand" them and execute them.
createInterventions(interventions, genotFitness, frequencyType = "auto")
createInterventions(interventions, genotFitness, frequencyType = "auto")
interventions |
Interventions must be a list of lists, where each "sub-list" must have the following fields: * ID: The identifier of the intervention, must be unique. * Trigger: The situation in the simulation that triggers/activates the intervention. * What Happens: "What happens" in the simulation. Basically, once the trigger is satisfied, this defines how the population is affected by the intervention. * Periodicity: Defines the periodicity of the intervention. * Repetitions: Defines the maximum repetitions of each intervention in the simulation. |
genotFitness |
Object that |
frequencyType |
If you want to specify the frequency type of the simulation, by default is set to "auto" |
See the vignette for details about differences between intervening on the total population or over specific genotypes and when do each occur.
Returns the same list of list that the user specifies, but with the following changes:
First, it transforms the arguments that refer to the genotipes, for example:
n_A is the actual population of A in the simulation for a T given.
But in the C++ part, "A" receives a Genotype ID, in this case 1, so n_A in the simulation
is n_1. (For more info run allFitnessEffects
with parameter frequencyDependentFitness = TRUE
,
then, check the data that returns, specificly, the field $full_FDF_spec
.
There you have more info about those transformations).
Then, it checks that all fields of the sub-lists are correctly specified.
Finally, it returns the list of interventions with the modifications needed for the code to interpret it correctly.
#first we create and the populations to simulate. fa1 <- data.frame(Genotype = c("A", "B"), Fitness = c("1.001 + (0*n_A)", "1.002")) afd3 <- allFitnessEffects(genotFitness = fa1, frequencyDependentFitness = TRUE, frequencyType = "abs") # now we especify intervention to drastically reduce A population # depending on the T of the simulation list_of_interventions <- list( list(ID = "intOverA", Trigger = "(T >= 5)", WhatHappens = "n_A = n_A * 0.1", Repetitions = 0, Periodicity = Inf ) ) # we transform the intervention to somthing the simulation can process final_interventions <- createInterventions(interventions = list_of_interventions, afd3) # we run the simulations passing the interventions as an argument ep2 <- oncoSimulIndiv( afd3, model = "McFL", mu = 1e-4, initSize = c(20000, 20000), initMutant = c("A", "B"), sampleEvery = 0.01, finalTime = 5.2, onlyCancer = FALSE, interventions = final_interventions ) # you can also make the intervention depend on the total population list_of_interventions1 <- list( list(ID = "intOverTotPop", Trigger = "(N >= 5000)", WhatHappens = "N = N * 0.1", Repetitions = 0, Periodicity = Inf ) ) # or depend over the population of a genotype list_of_interventions2 <- list( list(ID = "intOverTotPop", Trigger = "(n_A >= 5000)", WhatHappens = "n_ = n_B * 0.1", Repetitions = 0, Periodicity = Inf ) ) # or mix it all together using logic conectors list_of_interventions2 <- list( list(ID = "intOverTotPop", Trigger = "((n_A >= 200) and (N >= 2000)) or (T > 20)", WhatHappens = "n_ = n_B * 0.1", Repetitions = 0, Periodicity = Inf ) )
#first we create and the populations to simulate. fa1 <- data.frame(Genotype = c("A", "B"), Fitness = c("1.001 + (0*n_A)", "1.002")) afd3 <- allFitnessEffects(genotFitness = fa1, frequencyDependentFitness = TRUE, frequencyType = "abs") # now we especify intervention to drastically reduce A population # depending on the T of the simulation list_of_interventions <- list( list(ID = "intOverA", Trigger = "(T >= 5)", WhatHappens = "n_A = n_A * 0.1", Repetitions = 0, Periodicity = Inf ) ) # we transform the intervention to somthing the simulation can process final_interventions <- createInterventions(interventions = list_of_interventions, afd3) # we run the simulations passing the interventions as an argument ep2 <- oncoSimulIndiv( afd3, model = "McFL", mu = 1e-4, initSize = c(20000, 20000), initMutant = c("A", "B"), sampleEvery = 0.01, finalTime = 5.2, onlyCancer = FALSE, interventions = final_interventions ) # you can also make the intervention depend on the total population list_of_interventions1 <- list( list(ID = "intOverTotPop", Trigger = "(N >= 5000)", WhatHappens = "N = N * 0.1", Repetitions = 0, Periodicity = Inf ) ) # or depend over the population of a genotype list_of_interventions2 <- list( list(ID = "intOverTotPop", Trigger = "(n_A >= 5000)", WhatHappens = "n_ = n_B * 0.1", Repetitions = 0, Periodicity = Inf ) ) # or mix it all together using logic conectors list_of_interventions2 <- list( list(ID = "intOverTotPop", Trigger = "((n_A >= 200) and (N >= 2000)) or (T > 20)", WhatHappens = "n_ = n_B * 0.1", Repetitions = 0, Periodicity = Inf ) )
This functions check that the user has specified correctly the user variables and rules and also makes some modifications in the specification, so the "core" of the code that runs the simulation can "understand" them and execute them.
createUserVars(userVars) createRules(rules, genotFitness, frequencyType = "auto")
createUserVars(userVars) createRules(rules, genotFitness, frequencyType = "auto")
userVars |
userVars must be a list of lists, where each "sub-list" must have the following fields: * name: The name of the variable, must be unique. * Value: initial numeric value of the variable. |
rules |
rules must be a list of lists, where each "sub-list" must have the following fields: * ID: The identifier of the rule, must be unique. * Condition: boolean expression that, if true, determines the execution of the rule. * Condition: expression thatdetermines the variables that will be modified when the condition is true, it can be arbitrarily complex using other simulation parameters such as N, T and genotype populations and rates. |
genotFitness |
Object that |
frequencyType |
If you want to specify the frequency type of the simulation, by default is set to "auto" |
N/A
For createUserVars
, the same list that the user specifies, after checking that all the parameters are correctly specified.
For createRules
the same list of list that the user specifies,
but with the following changes:
First, it transforms the arguments that refer to the genotipes, for example:
n_A is the actual population of A in the simulation for a T given.
But in the C++ part, "A" receives a Genotype ID, in this case 1, so n_A in the simulation
is n_1. (For more info run allFitnessEffects
with parameter frequencyDependentFitness = TRUE
,
then, check the data that returns, specificly, the field $full_FDF_spec
.
There you have more info about those transformations).
Then, it checks that all fields of the sub-lists are correctly specified.
Finally, it returns the list of rules with the modifications needed for the code to interpret it correctly.
#first we create and the populations to simulate. fa1 <- data.frame(Genotype = c("A", "B"), Fitness = c("1.001 + (0*n_A)", "1.002")) afd3 <- allFitnessEffects(genotFitness = fa1, frequencyDependentFitness = TRUE, frequencyType = "abs") # now we specify some user variables userVars <- list( list(Name = "user_var1", Value = 0 ), list(Name = "user_var2", Value = 3 ), list(Name = "user_var3", Value = 2.5 ) ) # we call the function to check the specification of the variables userVars <- createUserVars(userVars = userVars) # we determine the rules that modify the variables rules <- list( list(ID = "rule_1", Condition = "T > 20", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "T > 30", Action = "user_var_2 = 2; user_var3 = 2*N" ),list(ID = "rule_3", Condition = "T > 40", Action = "user_var_3 = 3;user_var_2 = n_A*n_B" ) ) # we call the function to check the specification of the rules rules <- createRules(rules = rules, afd3) # we run the simulations passing theese lists as arguments ep3 <- oncoSimulIndiv( afd3, model = "McFL", mu = 1e-4, initSize = c(20000, 20000), initMutant = c("A", "B"), sampleEvery = 0.01, finalTime = 5.2, onlyCancer = FALSE, userVars = userVars, rules = rules ) # you can also make the rules depend on the total population rules <- list( list(ID = "rule_1", Condition = "N > 5000", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "N <= 5000", Action = "user_var_1 = 2" ),list(ID = "rule_3", Condition = "N > 4000", Action = "user_var_2 = 1;user_var_3 = 1" ),list(ID = "rule_4", Condition = "N <= 4000", Action = "user_var_2 = 2;user_var_3 = 3" ) ) # or depend on the population of each genotype rules <- list( list(ID = "rule_1", Condition = "n_B > 300", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "n_B > 400", Action = "user_var_1 = 2" ),list(ID = "rule_3", Condition = "n_B <= 300", Action = "user_var_1 = 3" ),list(ID = "rule_4", Condition = "n_B <= 200", Action = "user_var_1 = 4" ) ) # or depend on other previously defined user vars rules <- list( list(ID = "rule_3", Condition = "T > 10", Action = "user_var_1 = 1" ),list(ID = "rule_1", Condition = "user_var_1 = 0", Action = "user_var_2 = 1" ),list(ID = "rule_2", Condition = "user_var_1 = 1", Action = "user_var_2 = 2" ) ) # or mix it all together using logic conectors rules <- list( list(ID = "rule_3", Condition = "T > 10 and N < 5000", Action = "user_var_1 = 1" ),list(ID = "rule_1", Condition = "user_var_1 = 0 and n_B > 1000", Action = "user_var_2 = 1" ) )
#first we create and the populations to simulate. fa1 <- data.frame(Genotype = c("A", "B"), Fitness = c("1.001 + (0*n_A)", "1.002")) afd3 <- allFitnessEffects(genotFitness = fa1, frequencyDependentFitness = TRUE, frequencyType = "abs") # now we specify some user variables userVars <- list( list(Name = "user_var1", Value = 0 ), list(Name = "user_var2", Value = 3 ), list(Name = "user_var3", Value = 2.5 ) ) # we call the function to check the specification of the variables userVars <- createUserVars(userVars = userVars) # we determine the rules that modify the variables rules <- list( list(ID = "rule_1", Condition = "T > 20", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "T > 30", Action = "user_var_2 = 2; user_var3 = 2*N" ),list(ID = "rule_3", Condition = "T > 40", Action = "user_var_3 = 3;user_var_2 = n_A*n_B" ) ) # we call the function to check the specification of the rules rules <- createRules(rules = rules, afd3) # we run the simulations passing theese lists as arguments ep3 <- oncoSimulIndiv( afd3, model = "McFL", mu = 1e-4, initSize = c(20000, 20000), initMutant = c("A", "B"), sampleEvery = 0.01, finalTime = 5.2, onlyCancer = FALSE, userVars = userVars, rules = rules ) # you can also make the rules depend on the total population rules <- list( list(ID = "rule_1", Condition = "N > 5000", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "N <= 5000", Action = "user_var_1 = 2" ),list(ID = "rule_3", Condition = "N > 4000", Action = "user_var_2 = 1;user_var_3 = 1" ),list(ID = "rule_4", Condition = "N <= 4000", Action = "user_var_2 = 2;user_var_3 = 3" ) ) # or depend on the population of each genotype rules <- list( list(ID = "rule_1", Condition = "n_B > 300", Action = "user_var_1 = 1" ),list(ID = "rule_2", Condition = "n_B > 400", Action = "user_var_1 = 2" ),list(ID = "rule_3", Condition = "n_B <= 300", Action = "user_var_1 = 3" ),list(ID = "rule_4", Condition = "n_B <= 200", Action = "user_var_1 = 4" ) ) # or depend on other previously defined user vars rules <- list( list(ID = "rule_3", Condition = "T > 10", Action = "user_var_1 = 1" ),list(ID = "rule_1", Condition = "user_var_1 = 0", Action = "user_var_2 = 1" ),list(ID = "rule_2", Condition = "user_var_1 = 1", Action = "user_var_2 = 2" ) ) # or mix it all together using logic conectors rules <- list( list(ID = "rule_3", Condition = "T > 10 and N < 5000", Action = "user_var_1 = 1" ),list(ID = "rule_1", Condition = "user_var_1 = 0 and n_B > 1000", Action = "user_var_2 = 1" ) )
Given a fitnessEffects/mutatorEffects description, obtain the fitness/mutator effects of a single or all genotypes.
evalGenotype(genotype, fitnessEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, model = "", currentTime = 0) evalGenotypeMut(genotype, mutatorEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, currentTime = 0) evalAllGenotypes(fitnessEffects, order = FALSE, max = 256, addwt = FALSE, model = "", spPopSizes = NULL, currentTime = 0) evalAllGenotypesMut(mutatorEffects, max = 256, addwt = FALSE, spPopSizes = NULL, currentTime = 0) evalGenotypeFitAndMut(genotype, fitnessEffects, mutatorEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, model = "", currentTime = 0) evalAllGenotypesFitAndMut(fitnessEffects, mutatorEffects, order = FALSE, max = 256, addwt = FALSE, model = "", spPopSizes = NULL, currentTime = 0)
evalGenotype(genotype, fitnessEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, model = "", currentTime = 0) evalGenotypeMut(genotype, mutatorEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, currentTime = 0) evalAllGenotypes(fitnessEffects, order = FALSE, max = 256, addwt = FALSE, model = "", spPopSizes = NULL, currentTime = 0) evalAllGenotypesMut(mutatorEffects, max = 256, addwt = FALSE, spPopSizes = NULL, currentTime = 0) evalGenotypeFitAndMut(genotype, fitnessEffects, mutatorEffects, spPopSizes = NULL, verbose = FALSE, echo = FALSE, model = "", currentTime = 0) evalAllGenotypesFitAndMut(fitnessEffects, mutatorEffects, order = FALSE, max = 256, addwt = FALSE, model = "", spPopSizes = NULL, currentTime = 0)
genotype |
(For Using "," or ">" makes no difference: the sequence is always taken
as the order in which mutations occurred. Whether order matters or not
is encoded in the |
fitnessEffects |
A |
mutatorEffects |
A |
order |
(For |
max |
(For |
addwt |
(For |
model |
Either nothing (the default) or "Bozic". If "Bozic" then the fitness effects contribute to decreasing the Death rate. Otherwise Birth rate is shown (and labeled as Fitness). |
verbose |
(For |
echo |
(For |
spPopSizes |
It is strongly recommended that |
currentTime |
The time of the simulation. It is possible to access to the current time and run interventions for example using the frequency-dependent-fitness functionality or modifying the mutation rate through oncoSimul functions such as oncoSimulIndiv. With evalAllGenotypes we can check if the fitness has changed before or after a specific timepoint. |
For evalGenotype
either the value of fitness or (if verbose
= TRUE
) the value of fitness and its individual components.
For evalAllGenotypes
a data frame with two columns, the Genotype
,the Birth Rate (or Death Rate, if Bozic) and the Death Rate (if deathSpec = TRUE
in fitnessEffects
. The notation for the Genotype
column is a follows: when order does not matter, a comma "," separates
the identifiers of mutated genes. When order matters, a genotype shown
as “x > y _ z” means that a mutation in “x” happened before a
mutation in “y”; there is also a mutation in “z” (which could have
happened before or after either of “x” or “y”), but “z” is a gene
for which order does not matter. In all cases, a "WT" denotes the
wild-type (or, actually, the genotype without any mutations).
If you use both fitnessEffects
and mutatorEffects
in a
call, all the genes specified in
mutatorEffects
MUST be included in the
fitnessEffects
object. If you want to have genes that have
no direct effect on fitness, but that affect mutation rate, you MUST
specify them in the call to fitnessEffects
, for instance as
noIntGenes
with an effect of 0.
When you are in a frequency dependent fitness situation you must set
frequencydependentBirth = TRUE
and/or frequencydependentDeath = TRUE
and spPopSizes
must not be NULL and its length must be equal to the number
of possible genotypes. Here only evalGenotype
and evalAllGenotypes
make sense.
Modules are, of course, taken into account if present (i.e., fitness is specified in terms of modules, but the genotype is specified in terms of genes).
About the naming. This is the convention used: "All" means we will go over all possible genotypes. A function that ends as "Genotypes" returns only fitness effects (for backwards compatibility and because mutator effects are not always used). A function that ends as "Genotype(s)Mut" returns only the mutator effects. A function that ends as "FitAndMut" will return both fitness and mutator effects.
Functions that return ONLY fitness or ONLY mutator effects are kept as separate functions because they free you from specifyin mutator/fitness effects if you only want to play with one of them.
Ramon Diaz-Uriarte, Sergio Sanchez Carrillo, Juan Antonio Miguel Gonzalez
# A three-gene epistasis example sa <- 0.1 sb <- 0.15 sc <- 0.2 sab <- 0.3 sbc <- -0.25 sabc <- 0.4 sac <- (1 + sa) * (1 + sc) - 1 E3A <- allFitnessEffects(epistasis = c("A:-B:-C" = sa, "-A:B:-C" = sb, "-A:-B:C" = sc, "A:B:-C" = sab, "-A:B:C" = sbc, "A:-B:C" = sac, "A : B : C" = sabc) ) evalAllGenotypes(E3A, order = FALSE, addwt = FALSE) evalAllGenotypes(E3A, order = FALSE, addwt = TRUE, model = "Bozic") evalGenotype("B, C", E3A, verbose = TRUE) ## Order effects and modules ofe2 <- allFitnessEffects(orderEffects = c("F > D" = -0.3, "D > F" = 0.4), geneToModule = c("Root" = "Root", "F" = "f1, f2, f3", "D" = "d1, d2") ) evalAllGenotypes(ofe2, order = TRUE, max = 325)[1:15, ] ## Next two are identical evalGenotype("d1 > d2 > f3", ofe2, verbose = TRUE) evalGenotype("d1 , d2 , f3", ofe2, verbose = TRUE) ## This is different evalGenotype("f3 , d1 , d2", ofe2, verbose = TRUE) ## but identical to this one evalGenotype("f3 > d1 > d2", ofe2, verbose = TRUE) ## Restrictions in mutations as a graph. Modules present. p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) fp4m <- allFitnessEffects(p4, geneToModule = c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1")) evalAllGenotypes(fp4m, order = FALSE, max = 1024, addwt = TRUE)[1:15, ] evalGenotype("b1, b2, e1, f2, a1", fp4m, verbose = TRUE) ## Of course, this is identical; b1 and b2 are same module ## and order is not present here evalGenotype("a1, b2, e1, f2", fp4m, verbose = TRUE) evalGenotype("a1 > b2 > e1 > f2", fp4m, verbose = TRUE) ## We can use the exact same integer numeric id codes as in the ## fitnessEffects geneModule component: evalGenotype(c(1L, 3L, 7L, 9L), fp4m, verbose = TRUE) ## Epistasis for fitness and simple mutator effects fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) fm <- allMutatorEffects(noIntGenes = c("a" = 10, "c" = 5)) evalAllGenotypesFitAndMut(fe, fm, order = "FALSE") ## Simple fitness effects (noIntGenes) and modules ## for mutators fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, a2 = 0.2, b1 = 0.01, b2 = 0.3, b3 = 0.2, c1 = 0.3, c2 = -0.2)) fm2 <- allMutatorEffects(epistasis = c("A" = 5, "B" = 10, "C" = 3), geneToModule = c("A" = "a1, a2", "B" = "b1, b2, b3", "C" = "c1, c2")) ## Show only all the fitness effects evalAllGenotypes(fe2, order = FALSE) ## Show only all mutator effects evalAllGenotypesMut(fm2) ## Show all fitness and mutator evalAllGenotypesFitAndMut(fe2, fm2, order = FALSE) ## This is probably not what you want try(evalAllGenotypesMut(fe2)) ## ... nor this try(evalAllGenotypes(fm2)) ## Show the fitness effect of a specific genotype evalGenotype("a1, c2", fe2, verbose = TRUE) ## Show the mutator effect of a specific genotype evalGenotypeMut("a1, c2", fm2, verbose = TRUE) ## Fitness and mutator of a specific genotype evalGenotypeFitAndMut("a1, c2", fe2, fm2, verbose = TRUE) ## This is probably not what you want try(evalGenotype("a1, c2", fm2, verbose = TRUE)) ## Not what you want either try(evalGenotypeMut("a1, c2", fe2, verbose = TRUE)) ## Frequency dependent birth example r <- data.frame(Genotype = c("WT", "A", "B", "A, B"), Birth = c("1 + 1.5*f_", "5 + 3*(f_A + f_B + f_A_B)", "5 + 3*(f_A + f_B + f_A_B)", "7 + 5*(f_A + f_B + f_A_B)"), stringsAsFactors = FALSE) afe <- allFitnessEffects(genotFitness = r, frequencyDependentBirth = TRUE, frequencyType = "rel") evalAllGenotypes(afe, spPopSizes = c(5000, 2500, 2500, 500))
# A three-gene epistasis example sa <- 0.1 sb <- 0.15 sc <- 0.2 sab <- 0.3 sbc <- -0.25 sabc <- 0.4 sac <- (1 + sa) * (1 + sc) - 1 E3A <- allFitnessEffects(epistasis = c("A:-B:-C" = sa, "-A:B:-C" = sb, "-A:-B:C" = sc, "A:B:-C" = sab, "-A:B:C" = sbc, "A:-B:C" = sac, "A : B : C" = sabc) ) evalAllGenotypes(E3A, order = FALSE, addwt = FALSE) evalAllGenotypes(E3A, order = FALSE, addwt = TRUE, model = "Bozic") evalGenotype("B, C", E3A, verbose = TRUE) ## Order effects and modules ofe2 <- allFitnessEffects(orderEffects = c("F > D" = -0.3, "D > F" = 0.4), geneToModule = c("Root" = "Root", "F" = "f1, f2, f3", "D" = "d1, d2") ) evalAllGenotypes(ofe2, order = TRUE, max = 325)[1:15, ] ## Next two are identical evalGenotype("d1 > d2 > f3", ofe2, verbose = TRUE) evalGenotype("d1 , d2 , f3", ofe2, verbose = TRUE) ## This is different evalGenotype("f3 , d1 , d2", ofe2, verbose = TRUE) ## but identical to this one evalGenotype("f3 > d1 > d2", ofe2, verbose = TRUE) ## Restrictions in mutations as a graph. Modules present. p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) fp4m <- allFitnessEffects(p4, geneToModule = c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1")) evalAllGenotypes(fp4m, order = FALSE, max = 1024, addwt = TRUE)[1:15, ] evalGenotype("b1, b2, e1, f2, a1", fp4m, verbose = TRUE) ## Of course, this is identical; b1 and b2 are same module ## and order is not present here evalGenotype("a1, b2, e1, f2", fp4m, verbose = TRUE) evalGenotype("a1 > b2 > e1 > f2", fp4m, verbose = TRUE) ## We can use the exact same integer numeric id codes as in the ## fitnessEffects geneModule component: evalGenotype(c(1L, 3L, 7L, 9L), fp4m, verbose = TRUE) ## Epistasis for fitness and simple mutator effects fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) fm <- allMutatorEffects(noIntGenes = c("a" = 10, "c" = 5)) evalAllGenotypesFitAndMut(fe, fm, order = "FALSE") ## Simple fitness effects (noIntGenes) and modules ## for mutators fe2 <- allFitnessEffects(noIntGenes = c(a1 = 0.1, a2 = 0.2, b1 = 0.01, b2 = 0.3, b3 = 0.2, c1 = 0.3, c2 = -0.2)) fm2 <- allMutatorEffects(epistasis = c("A" = 5, "B" = 10, "C" = 3), geneToModule = c("A" = "a1, a2", "B" = "b1, b2, b3", "C" = "c1, c2")) ## Show only all the fitness effects evalAllGenotypes(fe2, order = FALSE) ## Show only all mutator effects evalAllGenotypesMut(fm2) ## Show all fitness and mutator evalAllGenotypesFitAndMut(fe2, fm2, order = FALSE) ## This is probably not what you want try(evalAllGenotypesMut(fe2)) ## ... nor this try(evalAllGenotypes(fm2)) ## Show the fitness effect of a specific genotype evalGenotype("a1, c2", fe2, verbose = TRUE) ## Show the mutator effect of a specific genotype evalGenotypeMut("a1, c2", fm2, verbose = TRUE) ## Fitness and mutator of a specific genotype evalGenotypeFitAndMut("a1, c2", fe2, fm2, verbose = TRUE) ## This is probably not what you want try(evalGenotype("a1, c2", fm2, verbose = TRUE)) ## Not what you want either try(evalGenotypeMut("a1, c2", fe2, verbose = TRUE)) ## Frequency dependent birth example r <- data.frame(Genotype = c("WT", "A", "B", "A, B"), Birth = c("1 + 1.5*f_", "5 + 3*(f_A + f_B + f_A_B)", "5 + 3*(f_A + f_B + f_A_B)", "7 + 5*(f_A + f_B + f_A_B)"), stringsAsFactors = FALSE) afe <- allFitnessEffects(genotFitness = r, frequencyDependentBirth = TRUE, frequencyType = "rel") evalAllGenotypes(afe, spPopSizes = c(5000, 2500, 2500, 500))
An example where there are intermediate missing drivers. This is fictitious and I've never seen it. But it is here to check plots work even if there are no cases of some intermediate value of drivers (2 in this case). b11 contains the full, original data, whereas b12 contains the same data where there are no cases with exactly 2 drivers.
data("ex_missing_drivers_b11"); data("ex_missing_drivers_b12")
data("ex_missing_drivers_b11"); data("ex_missing_drivers_b12")
Two objects of class "oncosimul".
data(ex_missing_drivers_b11) plot(ex_missing_drivers_b11, type = "line") dev.new() data(ex_missing_drivers_b12) plot(ex_missing_drivers_b12, type = "line")
data(ex_missing_drivers_b11) plot(ex_missing_drivers_b11, type = "line") dev.new() data(ex_missing_drivers_b12) plot(ex_missing_drivers_b12, type = "line")
Some example posets. For simplicity, all the posets are in a single list. You can access each poset by accessing each element of the list. The first digit or pair of digits denotes the number of nodes.
Poset 1101 is the same as the one in Gerstung et al., 2009 (figure
2A, poset 2). Poset 701 is the same as the one in Gerstung et al.,
2011 (figure 2B, left, the pancreatic cancer poset). Those posets
were entered manually at the command line: see poset
.
data("examplePosets")
data("examplePosets")
The format is: List of 13 $ p1101: num [1:10, 1:2] 1 1 3 3 3 7 7 8 9 10 ... $ p1102: num [1:9, 1:2] 1 1 3 3 3 7 7 9 10 2 ... $ p1103: num [1:9, 1:2] 1 1 3 3 3 7 7 8 10 2 ... $ p1104: num [1:9, 1:2] 1 1 3 3 7 7 9 2 10 2 ... $ p901 : num [1:8, 1:2] 1 2 4 5 7 8 5 1 2 3 ... $ p902 : num [1:6, 1:2] 1 2 4 5 7 5 2 3 5 6 ... $ p903 : num [1:6, 1:2] 1 2 5 7 8 1 2 3 6 8 ... $ p904 : num [1:6, 1:2] 1 4 5 5 1 7 2 5 8 6 ... $ p701 : num [1:9, 1:2] 1 1 1 1 2 3 4 4 5 2 ... $ p702 : num [1:6, 1:2] 1 1 1 1 2 4 2 3 4 5 ... $ p703 : num [1:6, 1:2] 1 1 1 1 3 5 2 3 4 5 ... $ p704 : num [1:6, 1:2] 1 1 1 1 4 5 2 3 4 5 ... $ p705 : num [1:6, 1:2] 1 2 1 1 1 2 2 5 4 6 ...
Gerstung et al., 2009. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics, 21: 2809–2815.
Gerstung et al., 2011. The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis. PLoS ONE, 6.
data(examplePosets) ## Plot all of them par(mfrow = c(3, 5)) invisible(sapply(names(examplePosets), function(x) {plotPoset(examplePosets[[x]], main = x, box = TRUE)}))
data(examplePosets) ## Plot all of them par(mfrow = c(3, 5)) invisible(sapply(names(examplePosets), function(x) {plotPoset(examplePosets[[x]], main = x, box = TRUE)}))
Some examples of fitnessEffects objects. This is a collection, in a list,
of most of the fitnessEffects created (using
allFitnessEffects
) for the vignette. See the vignette
for descriptions and references.
data("examplesFitnessEffects")
data("examplesFitnessEffects")
The format is a list of fitnessEffects objects.
data(examplesFitnessEffects) plot(examplesFitnessEffects[["fea"]]) evalAllGenotypes(examplesFitnessEffects[["cbn1"]], order = FALSE)
data(examplesFitnessEffects) plot(examplesFitnessEffects[["fea"]]) evalAllGenotypes(examplesFitnessEffects[["cbn1"]], order = FALSE)
Simulations shown in the vignette. Since running them can take a few seconds, we have pre-run them, and stored the results.
data(woAntibS)
data(woAntibS)
For output from runs of oncoSimulIndiv
a list of classes
oncosimul
and oncosimul2
.
data(woAntibS) plot(woAntibS, show = "genotypes", type = "line", col = c("black", "green", "red"))
data(woAntibS) plot(woAntibS, show = "genotypes", type = "line", col = c("black", "green", "red"))
Trimmed output from the simulation mcfLs in the vignette. This is a somewhat long run, and we have stored here the object (after trimming the Genotype matrix) to allow for plotting it.
data("mcfLs")
data("mcfLs")
An object of class "oncosimul2". A list.
## Not run: data(mcfLs) plot(mcfLs, addtot = TRUE, lwdClone = 0.9, log = "") summary(mcfLs) ## End(Not run)
## Not run: data(mcfLs) plot(mcfLs, addtot = TRUE, lwdClone = 0.9, log = "") summary(mcfLs) ## End(Not run)
Simulate tumor progression including possible restrictions in the order of driver mutations. Optionally add passenger mutations. When used in frequency dependent fitness situation, only fitness effects are allowed. Simulation is done using the BNB algorithm of Mather et al., 2012.
oncoSimulIndiv(fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = 1e8, detectionDrivers = 4, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), keepEvery = sampleEvery, minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time = 200, max.num.tries = 500, errorHitWallTime = TRUE, errorHitMaxTries = TRUE, verbosity = 0, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, seed = NULL, interventions = NULL, userVars = NULL, rules = NULL) oncoSimulPop(Nindiv, fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = 1e8, detectionDrivers = 4, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), keepEvery = sampleEvery, minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time = 200, max.num.tries = 500, errorHitWallTime = TRUE, errorHitMaxTries = TRUE, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, verbosity = 0, mc.cores = detectCores(), seed = "auto", interventions = NULL, userVars = NULL, rules = NULL) oncoSimulSample(Nindiv, fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = round(runif(Nindiv, 1e5, 1e8)), detectionDrivers = { if(inherits(fp, "fitnessEffects")) { if(length(fp$drv)) { nd <- (2: round(0.75 * length(fp$drv))) } else { nd <- 9e6 } } else { nd <- (2 : round(0.75 * max(fp))) } if (length(nd) == 1) nd <- c(nd, nd) sample(nd, Nindiv, replace = TRUE) }, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time.total = 600, max.num.tries.total = 500 * Nindiv, typeSample = "whole", thresholdWhole = 0.5, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, verbosity = 1, showProgress = FALSE, seed = "auto", interventions = NULL, userVars = NULL, rules = NULL)
oncoSimulIndiv(fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = 1e8, detectionDrivers = 4, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), keepEvery = sampleEvery, minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time = 200, max.num.tries = 500, errorHitWallTime = TRUE, errorHitMaxTries = TRUE, verbosity = 0, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, seed = NULL, interventions = NULL, userVars = NULL, rules = NULL) oncoSimulPop(Nindiv, fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = 1e8, detectionDrivers = 4, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), keepEvery = sampleEvery, minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time = 200, max.num.tries = 500, errorHitWallTime = TRUE, errorHitMaxTries = TRUE, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, verbosity = 0, mc.cores = detectCores(), seed = "auto", interventions = NULL, userVars = NULL, rules = NULL) oncoSimulSample(Nindiv, fp, model = "Exp", numPassengers = 0, mu = 1e-6, muEF = NULL, detectionSize = round(runif(Nindiv, 1e5, 1e8)), detectionDrivers = { if(inherits(fp, "fitnessEffects")) { if(length(fp$drv)) { nd <- (2: round(0.75 * length(fp$drv))) } else { nd <- 9e6 } } else { nd <- (2 : round(0.75 * max(fp))) } if (length(nd) == 1) nd <- c(nd, nd) sample(nd, Nindiv, replace = TRUE) }, detectionProb = NA, sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1, 0.025), initSize = 500, s = 0.1, sh = -1, K = sum(initSize)/(exp(1) - 1), minDetectDrvCloneSz = "auto", extraTime = 0, finalTime = 0.25 * 25 * 365, onlyCancer = FALSE, keepPhylog = FALSE, mutationPropGrowth = ifelse(model == "Bozic", FALSE, TRUE), max.memory = 2000, max.wall.time.total = 600, max.num.tries.total = 500 * Nindiv, typeSample = "whole", thresholdWhole = 0.5, initMutant = NULL, AND_DrvProbExit = FALSE, fixation = NULL, verbosity = 1, showProgress = FALSE, seed = "auto", interventions = NULL, userVars = NULL, rules = NULL)
Nindiv |
Number of individuals or number of different trajectories to simulate. |
fp |
Either a poset that specifies the order restrictions (see
Other arguments below (s, sh, numPassengers) make sense only if you use a poset, as they are included in the fitnessEffects object. |
model |
One of "Bozic", "Exp", "Arb", "McFarlandLog", "McFarlandLogD"
(the last two can be abbreviated to "McFL" and "McFLD",
respectively). The default is "Exp". (See vignette for the difference
between "McFL" and "McFLD": in the former, death rate = |
numPassengers |
This has no effect if you use the If you use the specification of v.1., the number of passenger genes. Note that using v.1 the total number of genes (drivers plus passengers) must be smaller than 64. All driver genes should be included in the poset (even if they depend on no one and no one depends on them), and will be numbered from 1 to the total number of driver genes. Thus, passenger genes will be numbered from (number of driver genes + 1):(number of drivers + number of passengers). |
mu |
Mutation rate. Can be a single value or a named vector. If a single
value, all genes will have the same mutation rate. If a named vector,
the entries in the vector specify the gene-specific mutation rate. If
you pass a vector, it must be named, and it must have entries for all
the genes in the fitness specification. Passing a vector is only
available when using fitnessEffects objects for fitness specification.
Mutation rates |
muEF |
Mutator effects. A mutatorEffects object as obtained from
All the genes specified in If you use mutator effects you must also use fitnessEffects in
|
detectionSize |
What is the minimal number of cells for cancer
to be detected. For If set to NA, |
detectionDrivers |
The minimal number of drivers (not modules,
drivers, whether or not they are from the same module) present in any
clone for cancer to be detected. For For If set to NA, |
detectionProb |
Vector of arguments for the mechanism where
probability of detection depends on size. If
If you only provide some of the elements (except for the pair This option can not be used with v.1 objects. |
sampleEvery |
How often the whole population is sampled. This is not the same as the
interval between successive samples that are kept or stored (for that,
see For very fast growing clones, you might need to have a small value here to minimize possible numerical problems (such as huge increase in population size between two successive samples that can then lead to problems for random number generators). Likewise, for models with density dependence (such as McF) this value should be very small. |
initSize |
Initial population size. If you are passing more than one
|
K |
Initial population equilibrium size in the McFarland models. |
keepEvery |
Time interval between successive whole population samples that are
actually stored. This must be larger or equal to If you want nice plots, set Setting |
minDetectDrvCloneSz |
A value of 0 or larger than 0 (by default equal to
The reason for this parameter is to ensure that, say, a clone with a certain number of drivers that would cause the simulation to end has not just appeared and is present in only one individual that might then immediately go extinct. This can be relevant in secenarios such as the McFarland model. If See also |
extraTime |
A value larger than zero waits those many additional time periods before exiting after having reached the exit condition (population size, number of drivers). The reason for this setting is to prevent the McFL models from always
exiting at a time when one clone is increasing its size quickly (see
|
finalTime |
What is the maximum number of time units that the simulation can run. Set to NA to disable this limit. |
onlyCancer |
Return only simulations that reach cancer? If set to TRUE, only simulations that satisfy the
If The default used to be |
keepPhylog |
If TRUE, keep track of when and from which clone each clone is
created. See also |
mutationPropGrowth |
If TRUE, make mutation rate proportional to growth
rate, so clones that grow faster also mutate faster (laso have a larger
mutation rate): $mutation_rate = mu
* birth_rate$. With BNB mutation is actually "mutate after division": p.\ 1232 of Mather
et al., 2012 explains:
"(...) mutation is simply defined as the creation and subsequent departure of a single individual from the class". Thus,
if we want to have individuals of clones/genotypes/populations that divide faster to also produce more
mutants per unit time (per individual) we have to set |
initMutant |
For v.2: a string with the mutations of the initial
mutant, if any. This is the same format as for
( |
max.num.tries |
Only applies when |
max.num.tries.total |
Only applies when |
max.wall.time |
Maximum wall time for the simulation of one
individual (over all |
max.wall.time.total |
Maximum wall time for all the simulations (when using
|
errorHitMaxTries |
If TRUE (the default) a simulation that reaches
the maximum number of repetitions allowed is considered not to have
succesfully finished and, thus, an error, and no output from it will
be reported. This is often what you want.
See |
errorHitWallTime |
If TRUE (the default) a simulation that reaches the maximum wall time
is considered not to have succesfully finished and, thus, an error,
and no output from it will be reported. This is often what you
want.
See |
max.memory |
The largest size (in MB) of the matrix of Populations by Time. If it creating it would use more than this amount of memory, it is not created. This prevents you from accidentally passing parameters that will return an enormous object. |
verbosity |
If 0, run silently. Iincreasing values of verbosity provide progressively more information about intermediate steps, possible numerical notes/warnings from the C++ code, etc. Values less than 0 supress some default notes: use with care. |
typeSample |
"singleCell" (or "single") for single cell sampling, where the
probability of sampling a cell (a clone) is directly proportional to
its population size. "wholeTumor" (or "whole") for whole tumor
sampling (i.e., this is similar to a biopsy being the entire
tumor). See |
thresholdWhole |
In whole tumor sampling, whether a gene is detected as mutated depends
on thresholdWhole: a gene is considered mutated if it is altered in at
least thresholdWhole proportion of the cells in that individual. See |
mc.cores |
Number of cores to use when simulating more than one individual (i.e., when calling oncoSimulPop). |
showProgress |
If TRUE, provide information, during exection, of the individual done, and the number of attempts and time used. |
AND_DrvProbExit |
If TRUE, cancer will be considered to be reached
if both the |
fixation |
If non-NULL, a list or a vector, where each element of
is a string with a gene or a gene combination or a genotype (see
below). Simulations will stop as soon as any of the genes or gene
combinations or genotypes are fixed (i.e., reach a minimal
frequency). If you pass gene combinations or genotypes, separate genes
with commas (not '>'); this means order is not (yet?) supported. This
way of specifying gene combinations is the same as the one used for
To differentiate between gene combinations and specific genotypes,
genotypes are specified by prepending them with a "_,". For instance,
In addition to the gene combinations or genotypes themeselves, you can
add to the list or vector the named elements
Using this option with |
s |
Selection coefficient for drivers. Only relevant if using a poset as this is included in the fitnessEffects object. This will eventually be deprecated. |
sh |
Selection coefficient for drivers with restrictions not satisfied. A value of 0 means there are no penalties for a driver appearing in a clone when its restrictions are not satisfied. To specify "sh=Inf" (in Diaz-Uriarte, 2015) use sh = -1. Only relevant if using a poset as this is included in the fitnessEffects object. This will eventually be deprecated. |
seed |
The seed for the C++ PRNG. You can pass a value. If you set
it to NULL, then a seed will be generated in R and passed to C++. If
you set it to "auto", then if you are using v.1, the behavior is the
same as if you set it to NULL (a seed will be generated in R and
passed to C++) but if you are using v.2, a random seed will be
produced in C++.
If you need reproducibility, either pass a value or set it to NULL (setting
it to NULL will make the C++ seed reproducible if you use the same
seed in R via When using oncoSimulPop, if you want reproducibility, you might want
to, in addition to setting |
interventions |
This has no effect if you do not specify Use this argument in case you want to intervene in the simulation. With interventions, you can affect the total population size, or just some genotype-specific population. You can complicate it as much as you want, or keep it simple, it is really up to you. Formally, interventions must be a list of lists, where each "sub-list" must have the following fields: * ID: The identifier of the intervention, must be unique. * Trigger: The situation in the simulation that triggers/activates the intervention. * What Happens: "What happens" in the simulation. Basically, once the trigger is satisfied, this defines how the population is affected by the intervention. Please see the vignette for details about the differences between when interventions that affect a single genotype and those that affect the complete population occur. * Periodicity: Defines the periodicity of the intervention. * Repetitions: Defines the maximum repetitions of each intervention in the simulation. |
userVars |
This has no effect if you do not specify Use this argument in case you want to define arbitrary variables that depend on other simulation values. With the yser Variables you can simulate Adaptive therapy by using this defined variables in the intervention's whatHappens definition, or simply get more detailed insight by defining some interesting values you desire as an output. Formally, userVars must be a list of lists, where each "sub-list" must have the following fields: * Name: The name that identifies the new variable, must be unique. * Value: The initial numeric value of the variable. You must define the |
rules |
This has no effect if you do not specify Use this argument in order to determine how the defined user variebles will be modified during the simulation. You can use any arbitrarily complex expression depending on other simulation parameters such as T, N, Genotype populations or genotype rates. Formally, rules must be a list of lists, where each "sub-list" must have the following fields: * ID: The identifier of the rule, must be unique. * Condition: The situation in the simulation that triggers/activates the user variable modification. * Action: The action that will take place once the condition is true. This defines wich user variables will be modified and the expression that defines the new values for them. |
The basic simulation algorithm implemented is the BNB one of Mather et al., 2012, where I have added modifications to fitness based on the restrictions in the order of mutations.
Full details about the algorithm are provided in Mather et al., 2012. The evolutionary models, including references, and the rest of the parameters are explained in Diaz-Uriarte, 2014, especially in the Supplementary Material. The model called "Bozic" is based on Bozic et al., 2010, and the model called "McFarland" in McFarland et al., 2013.
oncoSimulPop simply calls oncoSimulIndiv multiple times. When run on POSIX systems, it can use multiple cores (via mclapply).
The summary
methods for these classes return some of the return
values (see next) as a one-row (for class oncosimul) or multiple row
(for class oncosimulpop) data frame. The print
methods for
these classes simply print the summary.
Changing options errorHitMaxTries
and errorHitWallTime
can be useful when conducting many simulations, as in the call to
oncoSimulPop
: setting them to TRUE means nothing is recorded
for those simulations where ending conditions are not reached but
setting them to FALSE would allow you to record the output; this would
potentially result in a mixture where some simulations would not have
reached the ending condition, but this might sometimes be what you
want. Note, however, that oncoSimulSample
always has both them
to TRUE, as it could not be otherwise.
GenotypesWDistinctOrderEff
provides the information about
order effects that is missing from Genotypes
. When there are
order effects, the Genotypes
matrix can contain genotypes
that are not distinguishable. Suppose there are two genes, the first
and the second. In the Genotype
output you can get two
columns where there is a 1 in both genes: those two columns
correspond to the two possible orders (first gene mutated first, or
first gene mutated after the
second). GenotypesWDistinctOrderEff
disambiguates this. The
same is done by GenotypesLabels
; this is easier to decode for
a human (a string of gene labels) but a little bit harder to parse
automatically. Note that when you use the default print method for
this object, you get, among others, a two-column display with the
GenotypeLabels
information. When order matters, a genotype
shown as “x > y _ z” means that a mutation in “x” happened
before a mutation in “y”; there is also a mutation in “z” (which
could have happened before or after either of “x” or “y”), but
“z” is a gene for which order does not matter. When order does not
matter, a comma "," separates the identifiers of mutated genes.
Detection of cancer can be a deterministic process, where cancer is
always detected (and, thus, simulation ended) when certain conditions
are met (detectionSize
, detectionDrivers
,
fixation
). Alternatively, it can be stochastic process where
probability of detection depends on size. Every so often (see below)
we assess population size, and detect cancer or not probabilistically
(comparing the probability of detection for that size with a random
uniform number). Probability of detection changes with population size
according to the function
.
You can pass cPDetect
manually (you will need to set n2
and p2
to NA). However, it might be more intuitive to specify
the pair n2
, p2
, such that the probability of detection
is for population size
(and from that pair we solve
for the value of
cPDetect
). How often do we check? That is
controlled by checkSizePEvery
, the (minimal) time between
successive checks (from among the sampling times given by
sampleEvery
: the interval between successive assessments will
be the smallest multiple integer of sampleEvery
that is
larger than checkSizePEvery
—see vignette for details).
checkSizePEvery
has, by default, a different (and much larger)
value than sampleEvery
both to allow to examine the effects of
sampling, and to avoid many costly random number generations.
Please note that detectionProb
is NOT available with version 1
objects.
For oncoSimulIndiv
a list, of class "oncosimul", with the
following components:
pops.by.time |
A matrix of the population sizes of the clones, with clones in columns and time in row. Not all clones are shown here, only those that were present in at least on of the keepEvery samples. |
NumClones |
Total number of clones in the above matrix. This is not the total number of distinct clones that have appeared over all simulations (which is likely to be larger or much larger). |
TotalPopSize |
Total population size at the end. |
Genotypes |
A matrix of genotypes. For each of the clones in the pops.by.time matrix, its genotype, with a 0 if the gene is not mutated and a 1 if it is mutated. |
MaxNumDrivers |
The largest number of mutated driver genes ever seen in the simulation in any clone. |
MaxDriversLast |
The largest number of mutated drivers in any clone at the end of the simulation. |
NumDriversLargestPop |
The number of mutated driver genes in the clone with largest population size. |
LargestClone |
Population size of the clone with largest number of population size. |
PropLargestPopLast |
Ratio of LargestClone/TotalPopSize |
FinalTime |
The time (in time units) at the end of the simulation. |
NumIter |
The number of iterations of the BNB algorithm. |
HittedWallTime |
TRUE if we reached the limit of max.wall.time. FALSE otherwise. |
TotalPresentDrivers |
The total number of mutated driver genes,
whether or not in the same clone. The number of elements in
|
CountByDriver |
A vector of length number of drivers, with the count of the number of clones that have that driver mutated. |
OccurringDrivers |
The actual number of drivers mutated. |
PerSampleStats |
A 5 column matrix with a row for each sampling period. The columns are: total population size, population size of the largest clone, the ratio of the two, the largest number of drivers in any clone, and the number of drivers in the clone with the largest population size. |
other |
A list that contains statistics for an estimate of the simulation error when using the McFarland model as well as other statistics. For the McFarland model, the relevant value is errorMF, which is -99 unless in the McFarland model. For the McFarland model it is the largest difference of successive death rates. The entries named minDMratio and minBMratio are the smallest ratio, over all simulations, of death rate to mutation rate and birth rate to mutation rate, respectively. The BNB algorithm thrives when those are large. |
For oncoSimulPop
a list of length Nindiv
, and of class
"oncosimulpop"
, where each element of the list is itself a
list, of class oncosimul
, with components as described above.
In v.2, the output is of both class "oncosimul" and "oncosimul2". The oncoSimulIndiv return object differs in
GenotypesWDistinctOrderEff |
A list
of vectors, where each vector corresponds to a genotype in the
|
GenotypesLabels |
The genotypes, as character vectors with the original labels provided (i.e., not the integer codes). As before, mutated genes, for those where order matters, come first, and are separated by the rest by a "_". See details. |
OccurringDrivers |
This is the same as in v.1, but we use the labels, not the numeric id codes. Of course, if you entered integers as labels for the genes, you will see numbers (however, as a character string). |
Please, note that the meaning of the fitness effects in the McFarland model is not the same as in the original paper; the fitness coefficients are transformed to allow for a simpler fitness function as a product of terms. This differs with respect to v.1. See the vignette for details.
Ramon Diaz-Uriarte
Bozic, I., et al., (2010). Accumulation of driver and passenger mutations during tumor progression. Proceedings of the National Academy of Sciences of the United States of America\/, 107, 18545–18550.
Diaz-Uriarte, R. (2015). Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling http://www.biomedcentral.com/1471-2105/16/41/abstract
Gerstung et al., 2011. The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis. PLoS ONE, 6.
McFarland, C.~D. et al. (2013). Impact of deleterious passenger mutations on cancer progression. Proceedings of the National Academy of Sciences of the United States of America\/, 110(8), 2910–5.
Mather, W.~H., Hasty, J., and Tsimring, L.~S. (2012). Fast stochastic algorithm for simulating evolutionary population dynamics. Bioinformatics (Oxford, England)\/, 28(9), 1230–1238.
plot.oncosimul
,
samplePop
, allFitnessEffects
#### A model similar to the one in McFarland. We use 270 genes. set.seed(456) nd <- 70 np <- 200 s <- 0.1 sp <- 1e-3 spp <- -sp/(1 + sp) mcf1 <- allFitnessEffects(noIntGenes = c(rep(s, nd), rep(spp, np)), drv = seq.int(nd)) mcf1s <- oncoSimulIndiv(mcf1, model = "McFL", mu = 1e-7, detectionSize = 1e8, detectionDrivers = 100, sampleEvery = 0.02, keepEvery = 2, initSize = 2000, finalTime = 1000, onlyCancer = FALSE) plot(mcf1s, addtot = TRUE, lwdClone = 0.6, log = "") summary(mcf1s) plot(mcf1s) #### Order effects with modules, and 5 genes without interactions #### with fitness effects from an exponential distribution oi <- allFitnessEffects(orderEffects = c("F > D" = -0.3, "D > F" = 0.4), noIntGenes = rexp(5, 10), geneToModule = c("Root" = "Root", "F" = "f1, f2, f3", "D" = "d1, d2") ) oiI1 <- oncoSimulIndiv(oi, model = "Exp") oiI1$GenotypesLabels oiI1 ## note the order and separation by "_" oiP1 <- oncoSimulPop(2, oi, keepEvery = 10, mc.cores = 2) summary(oiP1) ## Even if order exists, this cannot reflect it; ## G1 to G10 are d1, d2, f1..,f3, and the 5 genes without ## interaction samplePop(oiP1) oiS1 <- oncoSimulSample(2, oi) ## The output contains only the summary of the runs AND ## the sample: oiS1 ## And their sizes do differ object.size(oiS1) object.size(oiP1) ######## Using an extended poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.05, sh = -0.3, typeDep = "MN")) plot(pancr) ### Use an exponential growth model (pancr1 <- oncoSimulIndiv(pancr, model = "Exp")) summary(pancr1) plot(pancr1) ## Pop and Sample pancrPop <- oncoSimulPop(2, pancr, keepEvery = 10, mc.cores = 2) summary(pancrPop) (pancrSPop <- samplePop(pancrPop)) (pancrSamp <- oncoSimulSample(2, pancr)) ## Not run: ## Using gene-specific mutation rates muv <- c("U" = 1e-3, "z" = 1e-7, "e" = 1e-6, "m" = 1e-5, "D" = 1e-4) ni <- rep(0.01, 5) names(ni) <- names(muv) femuv <- allFitnessEffects(noIntGenes = ni) oncoSimulIndiv(femuv, mu = muv) ## End(Not run) ######### Frequency dependent birth examples ## An example with cooperation. Presence of WT favours all clones ## and all clones have a positive effect on themselves genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("3 + 5*f_", "3 + 5*(f_ + f_A)", "3 + 5*(f_ + f_B)", "5 + 6*(f_ + f_A_B)")) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE) ## Use gene-specific mutation rates and start the simulation from ## 5000 WT and 1000 A mutants. osi <- oncoSimulIndiv(afe, model = "McFL", onlyCancer = FALSE, finalTime = 50, mu = c("A" = 1e-6, B = 1e-8), initMutant = c("WT", "A"), initSize = c(5000, 1000), keepPhylog = FALSE, seed = NULL, errorHitMaxTries = FALSE, errorHitWallTime = FALSE) osi plot(osi, show = "genotypes", type = "line") ## Not run: ## This can be slow osp <- oncoSimulPop(5, afe, model = "McFL", initSize = 5000, mu = 1e-6, keepEvery = 5, mc.cores = 2, finalTime = 5000) sp <- samplePop(osp) sp ## End(Not run) ## A little bit more complex example situation. WT favours clones A and B. A and ## B compete with each other. Presence of A and B favours clone A, B. ## Not run: ## This can be slow genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("3 + 5*f_", "3 + 5*(f_ + f_1 - f_2)", "3 + 5*(f_ + f_2 - f_1)", "5 + 6*(f_1 + f_2 + f_1_2)")) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE, frequencyType = "rel") osi <- oncoSimulIndiv(afe, model = "McFL", onlyCancer = FALSE, finalTime = 200, mu = 1e-6, initSize = 5000, keepPhylog = FALSE, seed = NULL, errorHitMaxTries = FALSE, errorHitWallTime = FALSE) osi plot(osi, show = "genotypes", type = "line") ## End(Not run) ## Not run: ## This can be slow osp <- oncoSimulPop(5, afe, model = "McFL", initSize = 5000, onlyCancer = FALSE, mu = 1e-6, keepEvery = 5, mc.cores = 2) summary(osp) sp <- samplePop(osp) sp oss <- oncoSimulSample(5, afe, model = "McFL", initSize = 5000, mu = 1e-6, finalTime = 5000, verbosity = 0) oss ## End(Not run) ## Reinitialize the RNG set.seed(NULL)
#### A model similar to the one in McFarland. We use 270 genes. set.seed(456) nd <- 70 np <- 200 s <- 0.1 sp <- 1e-3 spp <- -sp/(1 + sp) mcf1 <- allFitnessEffects(noIntGenes = c(rep(s, nd), rep(spp, np)), drv = seq.int(nd)) mcf1s <- oncoSimulIndiv(mcf1, model = "McFL", mu = 1e-7, detectionSize = 1e8, detectionDrivers = 100, sampleEvery = 0.02, keepEvery = 2, initSize = 2000, finalTime = 1000, onlyCancer = FALSE) plot(mcf1s, addtot = TRUE, lwdClone = 0.6, log = "") summary(mcf1s) plot(mcf1s) #### Order effects with modules, and 5 genes without interactions #### with fitness effects from an exponential distribution oi <- allFitnessEffects(orderEffects = c("F > D" = -0.3, "D > F" = 0.4), noIntGenes = rexp(5, 10), geneToModule = c("Root" = "Root", "F" = "f1, f2, f3", "D" = "d1, d2") ) oiI1 <- oncoSimulIndiv(oi, model = "Exp") oiI1$GenotypesLabels oiI1 ## note the order and separation by "_" oiP1 <- oncoSimulPop(2, oi, keepEvery = 10, mc.cores = 2) summary(oiP1) ## Even if order exists, this cannot reflect it; ## G1 to G10 are d1, d2, f1..,f3, and the 5 genes without ## interaction samplePop(oiP1) oiS1 <- oncoSimulSample(2, oi) ## The output contains only the summary of the runs AND ## the sample: oiS1 ## And their sizes do differ object.size(oiS1) object.size(oiP1) ######## Using an extended poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.05, sh = -0.3, typeDep = "MN")) plot(pancr) ### Use an exponential growth model (pancr1 <- oncoSimulIndiv(pancr, model = "Exp")) summary(pancr1) plot(pancr1) ## Pop and Sample pancrPop <- oncoSimulPop(2, pancr, keepEvery = 10, mc.cores = 2) summary(pancrPop) (pancrSPop <- samplePop(pancrPop)) (pancrSamp <- oncoSimulSample(2, pancr)) ## Not run: ## Using gene-specific mutation rates muv <- c("U" = 1e-3, "z" = 1e-7, "e" = 1e-6, "m" = 1e-5, "D" = 1e-4) ni <- rep(0.01, 5) names(ni) <- names(muv) femuv <- allFitnessEffects(noIntGenes = ni) oncoSimulIndiv(femuv, mu = muv) ## End(Not run) ######### Frequency dependent birth examples ## An example with cooperation. Presence of WT favours all clones ## and all clones have a positive effect on themselves genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("3 + 5*f_", "3 + 5*(f_ + f_A)", "3 + 5*(f_ + f_B)", "5 + 6*(f_ + f_A_B)")) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE) ## Use gene-specific mutation rates and start the simulation from ## 5000 WT and 1000 A mutants. osi <- oncoSimulIndiv(afe, model = "McFL", onlyCancer = FALSE, finalTime = 50, mu = c("A" = 1e-6, B = 1e-8), initMutant = c("WT", "A"), initSize = c(5000, 1000), keepPhylog = FALSE, seed = NULL, errorHitMaxTries = FALSE, errorHitWallTime = FALSE) osi plot(osi, show = "genotypes", type = "line") ## Not run: ## This can be slow osp <- oncoSimulPop(5, afe, model = "McFL", initSize = 5000, mu = 1e-6, keepEvery = 5, mc.cores = 2, finalTime = 5000) sp <- samplePop(osp) sp ## End(Not run) ## A little bit more complex example situation. WT favours clones A and B. A and ## B compete with each other. Presence of A and B favours clone A, B. ## Not run: ## This can be slow genofit <- data.frame(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), Birth = c("3 + 5*f_", "3 + 5*(f_ + f_1 - f_2)", "3 + 5*(f_ + f_2 - f_1)", "5 + 6*(f_1 + f_2 + f_1_2)")) afe <- allFitnessEffects(genotFitness = genofit, frequencyDependentBirth = TRUE, frequencyType = "rel") osi <- oncoSimulIndiv(afe, model = "McFL", onlyCancer = FALSE, finalTime = 200, mu = 1e-6, initSize = 5000, keepPhylog = FALSE, seed = NULL, errorHitMaxTries = FALSE, errorHitWallTime = FALSE) osi plot(osi, show = "genotypes", type = "line") ## End(Not run) ## Not run: ## This can be slow osp <- oncoSimulPop(5, afe, model = "McFL", initSize = 5000, onlyCancer = FALSE, mu = 1e-6, keepEvery = 5, mc.cores = 2) summary(osp) sp <- samplePop(osp) sp oss <- oncoSimulSample(5, afe, model = "McFL", initSize = 5000, mu = 1e-6, finalTime = 5000, verbosity = 0) oss ## End(Not run) ## Reinitialize the RNG set.seed(NULL)
pops.by.time
component of an oncosimul
object into "long" format.
Convert the pops.by.time
component from its "wide" format (with
one column for time, and as many columns as clones/genotypes) into
"long" format, so that it can be used with other functions, for
instance for plots.
OncoSimulWide2Long(x)
OncoSimulWide2Long(x)
x |
An object of class |
A data frame with four columns: Time; Y, the number of cells (the population size); Drivers, a factor with the number of drivers of the given genotype; Genotype, the genotyp.
Ramon Diaz-Uriarte
data(examplesFitnessEffects) sm <- oncoSimulIndiv(examplesFitnessEffects$cbn1, model = "McFL", mu = 5e-7, detectionSize = 1e8, detectionDrivers = 2, sampleEvery = 0.025, keepEvery = 5, initSize = 2000, onlyCancer = FALSE) class(sm) lsm <- OncoSimulWide2Long(sm) head(lsm) summary(lsm)
data(examplesFitnessEffects) sm <- oncoSimulIndiv(examplesFitnessEffects$cbn1, model = "McFL", mu = 5e-7, detectionSize = 1e8, detectionDrivers = 2, sampleEvery = 0.025, keepEvery = 5, initSize = 2000, onlyCancer = FALSE) class(sm) lsm <- OncoSimulWide2Long(sm) head(lsm) summary(lsm)
Plot the restriction table/graph of restrictions, the epistasis, and
the order effects in a fitnessEffects object. This is not a plot of
the fitness landscape; for that, see plotFitnessLandscape
.
## S3 method for class 'fitnessEffects' plot(x, type = "graphNEL", layout = NULL, expandModules = FALSE, autofit = FALSE, scale_char = ifelse(type == "graphNEL", 1/10, 5), return_g = FALSE, lwdf = 1, ...)
## S3 method for class 'fitnessEffects' plot(x, type = "graphNEL", layout = NULL, expandModules = FALSE, autofit = FALSE, scale_char = ifelse(type == "graphNEL", 1/10, 5), return_g = FALSE, lwdf = 1, ...)
x |
A |
type |
Whether you want a "graphNEL" or an "igraph" graph. |
layout |
For "igraph", the layout. For example, if you know you really have
only a tree you might want to use |
expandModules |
If there are modules with multiple genes, if you set this to TRUE modules will be replaced by their genes. |
autofit |
If TRUE, we try to fit the edges to the labels. This is a very experimental feature, likely to be not very robust. |
scale_char |
If using |
return_g |
It TRUE, the graph object (graphNEL or igrap) is returned. |
lwdf |
The multiplier factor for |
... |
Other arguments passed to |
A plot.
Order and epistatic relationships have orange edges. OR (semimonotone) relationships blue, and XOR red. All others have black edges (so AND and unique edges from root). Epistatic relationships, being symmetrical, have no arrows between nodes and have a dotted line type. Order relationships have an arrow from the earlier to the later event and have a different dotted line (lty 3).
If return_g
is TRUE, you are returned also the graph object
(igraph or graphNEL) so that you can manipulate it further.
The purpose of the plot is to get a quick idea of the relationships. Note that three-way (or higher order) epistatic relationships cannot be shown as such (we would show all possible pairs, but that is not quite the same thing). Likewise, there is no reasonable way to convey the pressence of a "-" in the epistatic relationship.
Genes without interactions are not shown.
Ramon Diaz-Uriarte
allFitnessEffects
,
plotFitnessLandscape
cs <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c"), child = c("a", "b", "d", "e", "c", "c", rep("g", 3)), s = 0.1, sh = -0.9, typeDep = "MN") cbn1 <- allFitnessEffects(cs) plot(cbn1, "igraph") library(igraph) ## to make layouts available plot(cbn1, "igraph", layout = layout.reingold.tilford) ### A DAG with the three types of relationships p3 <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c", "f"), child = c("a", "b", "d", "e", "c", "c", "f", "f", "g", "g"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) fp3 <- allFitnessEffects(p3) plot(fp3) plot(fp3, "igraph", layout = layout.reingold.tilford) ## A more complex example, that includes a restriction table ## order effects, epistasis, genes without interactions, and moduels p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) oe <- c("C > F" = -0.1, "H > I" = 0.12) sm <- c("I:J" = -1) sv <- c("-K:M" = -.5, "K:-M" = -.5) epist <- c(sm, sv) modules <- c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1", "H" = "h1, h2", "I" = "i1", "J" = "j1, j2", "K" = "k1, k2", "M" = "m1") noint <- rexp(5, 10) names(noint) <- paste0("n", 1:5) fea <- allFitnessEffects(rT = p4, epistasis = epist, orderEffects = oe, noIntGenes = noint, geneToModule = modules) plot(fea) plot(fea, expandModules = TRUE) plot(fea, type = "igraph")
cs <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c"), child = c("a", "b", "d", "e", "c", "c", rep("g", 3)), s = 0.1, sh = -0.9, typeDep = "MN") cbn1 <- allFitnessEffects(cs) plot(cbn1, "igraph") library(igraph) ## to make layouts available plot(cbn1, "igraph", layout = layout.reingold.tilford) ### A DAG with the three types of relationships p3 <- data.frame(parent = c(rep("Root", 4), "a", "b", "d", "e", "c", "f"), child = c("a", "b", "d", "e", "c", "c", "f", "f", "g", "g"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) fp3 <- allFitnessEffects(p3) plot(fp3) plot(fp3, "igraph", layout = layout.reingold.tilford) ## A more complex example, that includes a restriction table ## order effects, epistasis, genes without interactions, and moduels p4 <- data.frame(parent = c(rep("Root", 4), "A", "B", "D", "E", "C", "F"), child = c("A", "B", "D", "E", "C", "C", "F", "F", "G", "G"), s = c(0.01, 0.02, 0.03, 0.04, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3), sh = c(rep(0, 4), c(-.9, -.9), c(-.95, -.95), c(-.99, -.99)), typeDep = c(rep("--", 4), "XMPN", "XMPN", "MN", "MN", "SM", "SM")) oe <- c("C > F" = -0.1, "H > I" = 0.12) sm <- c("I:J" = -1) sv <- c("-K:M" = -.5, "K:-M" = -.5) epist <- c(sm, sv) modules <- c("Root" = "Root", "A" = "a1", "B" = "b1, b2", "C" = "c1", "D" = "d1, d2", "E" = "e1", "F" = "f1, f2", "G" = "g1", "H" = "h1, h2", "I" = "i1", "J" = "j1, j2", "K" = "k1, k2", "M" = "m1") noint <- rexp(5, 10) names(noint) <- paste0("n", 1:5) fea <- allFitnessEffects(rT = p4, epistasis = epist, orderEffects = oe, noIntGenes = noint, geneToModule = modules) plot(fea) plot(fea, expandModules = TRUE) plot(fea, type = "igraph")
Plots data generated from the simulations, either for a single individual or for a population of individuals, with time units in the x axis and nubmer of cells in the y axis.
In "drivers" plots, by default, all clones with the same number of drivers are plotted using the same colour (but different line types), and clones with different number of drivers are plotted in different colours. Plots can alternatively display genotypes instead of drivers.
Plots available are line plots, stacked area, and stream plots.
## S3 method for class 'oncosimul' plot(x, show = "drivers", type = ifelse(show == "genotypes", "stacked", "line"), col = "auto", log = ifelse(type == "line", "y", ""), ltyClone = 2:6, lwdClone = 0.9, ltyDrivers = 1, lwdDrivers = 3, xlab = "Time units", ylab = "Number of cells", plotClones = TRUE, plotDrivers = TRUE, addtot = FALSE, addtotlwd = 0.5, ylim = NULL, xlim = NULL, thinData = FALSE, thinData.keep = 0.1, thinData.min = 2, plotDiversity = FALSE, order.method = "as.is", stream.center = TRUE, stream.frac.rand = 0.01, stream.spar = 0.2, border = NULL, lwdStackedStream = 1, srange = c(0.4, 1), vrange = c(0.8, 1), breakSortColors = "oe", legend.ncols = "auto", ...) ## S3 method for class 'oncosimulpop' plot(x, ask = TRUE, show = "drivers", type = ifelse(show == "genotypes", "stacked", "line"), col = "auto", log = ifelse(type == "line", "y", ""), ltyClone = 2:6, lwdClone = 0.9, ltyDrivers = 1, lwdDrivers = 3, xlab = "Time units", ylab = "Number of cells", plotClones = TRUE, plotDrivers = TRUE, addtot = FALSE, addtotlwd = 0.5, ylim = NULL, xlim = NULL, thinData = FALSE, thinData.keep = 0.1, thinData.min = 2, plotDiversity = FALSE, order.method = "as.is", stream.center = TRUE, stream.frac.rand = 0.01, stream.spar = 0.2, border = NULL, lwdStackedStream = 1, srange = c(0.4, 1), vrange = c(0.8, 1), breakSortColors = "oe", legend.ncols = "auto", ...)
## S3 method for class 'oncosimul' plot(x, show = "drivers", type = ifelse(show == "genotypes", "stacked", "line"), col = "auto", log = ifelse(type == "line", "y", ""), ltyClone = 2:6, lwdClone = 0.9, ltyDrivers = 1, lwdDrivers = 3, xlab = "Time units", ylab = "Number of cells", plotClones = TRUE, plotDrivers = TRUE, addtot = FALSE, addtotlwd = 0.5, ylim = NULL, xlim = NULL, thinData = FALSE, thinData.keep = 0.1, thinData.min = 2, plotDiversity = FALSE, order.method = "as.is", stream.center = TRUE, stream.frac.rand = 0.01, stream.spar = 0.2, border = NULL, lwdStackedStream = 1, srange = c(0.4, 1), vrange = c(0.8, 1), breakSortColors = "oe", legend.ncols = "auto", ...) ## S3 method for class 'oncosimulpop' plot(x, ask = TRUE, show = "drivers", type = ifelse(show == "genotypes", "stacked", "line"), col = "auto", log = ifelse(type == "line", "y", ""), ltyClone = 2:6, lwdClone = 0.9, ltyDrivers = 1, lwdDrivers = 3, xlab = "Time units", ylab = "Number of cells", plotClones = TRUE, plotDrivers = TRUE, addtot = FALSE, addtotlwd = 0.5, ylim = NULL, xlim = NULL, thinData = FALSE, thinData.keep = 0.1, thinData.min = 2, plotDiversity = FALSE, order.method = "as.is", stream.center = TRUE, stream.frac.rand = 0.01, stream.spar = 0.2, border = NULL, lwdStackedStream = 1, srange = c(0.4, 1), vrange = c(0.8, 1), breakSortColors = "oe", legend.ncols = "auto", ...)
x |
An object of class |
ask |
Same meaning as in |
show |
One of "drivers" or "genotypes". If "drivers" the legend will reflect the number of drivers. If "genotypes" you will be shown genotypes. You probably want to limit "genotypes" to those cases where only a relatively small number of genotypes exist (or the plot will be an unmanageable mess). The default is "drivers". |
type |
One of "line", "stacked", "stream". If "line", you are shown lines for each genotype or clone. This
means that to get an idea of the total population size you need to
use If "stacked" a stacked area plot. If "stream" a stream plot. Since
these stack areas, you immediately get the total population. But that
also means you cannot use The default is to use "line" for |
col |
Colour of the lines/areas. For The default is "auto". If you have You can specify your own vector of colors, but it will be ignored
with |
log |
See
|
ltyClone |
Line type for each clone. Recycled as needed. You probably do not
want to use lty=1 for any clone, to differentiate from the clone
type, unless you change the setting for |
lwdClone |
Line width for clones. |
ltyDrivers |
Line type for the driver type. |
lwdDrivers |
Line width for the driver type. |
xlab |
Same as |
ylab |
Same as |
plotClones |
Should clones be plotted? |
plotDrivers |
Should clone types (which are defined by number of drivers), be
plotted? (Only applies when using |
addtot |
If TRUE, add a line with the total populatino size. |
addtotlwd |
Line width for total population size. |
ylim |
If non NULL, limits of the y axis. Same as in
|
xlim |
If non NULL, limits of the x axis. Same as in
|
thinData |
If TRUE, the data plotted is a subset of the original data. The original data are "thinned" in such a way that the origin of each clone is not among the non-shown data (i.e., so that we can see when each clone/driver originates). Thinning is done to reduce the plot size and to speed up plotting. Note that thinning is carried out before dealing with the plot axis, so the actual number of points to be plotted could be a lot less (if you reduce the x-axis considerably) than those returned from the thinning. (In extreme cases this could lead to crashes when trying to use stream plots if, say, you end up plotting only three values). |
thinData.keep |
The fraction of the data to keep (actually, a lower bound on the fraction of data to keep). |
thinData.min |
Any time point for which a clone has a population size > thinData.min will be kept (i.e., will not be removed from) in the data. |
plotDiversity |
If TRUE, we also show, on top of the main figure, Shannon's diversity index (and we consider as distinct those genotypes with different order of mutations when order matters). If you set this to true, using |
order.method |
For stacked and stream plots. |
stream.center |
For stream plots. If TRUE, the stacked polygons will be centered so that the middle, i.e. baseline ("g0"), of the stream is approximately equal to zero. Centering is done before the addition of random wiggle to the baseline. |
stream.frac.rand |
For stream plots. Fraction of the overall data "stream" range used to define the range of random wiggle (uniform distribution) to be added to the baseline 'g0'. |
stream.spar |
Setting for smooth.spline function to make a smoothed version of baseline "g0". |
border |
For stacked and stream plots. Border colors for polygons
corresponding to y columns (will recycle) (see |
lwdStackedStream |
border line width for polygons corresponding to y columns (will recycle). |
srange |
Range of values of s in the HSV specification of colors
(see |
vrange |
Range of values of v in the HSV specification of colors
(see |
breakSortColors |
How to try to minimize that similar colors be
used for contiguous or nearby driver categories. The default is "oe"
which resorts them in alternating way. The other two options are
"distave", where we alternate after folding from the mean and "random"
where the colors are randomly sorted. Only applies when using
"stacked" or "stream" plots and |
legend.ncols |
The number of columns of the legend. If
|
... |
Other arguments passed to |
Ramon Diaz-Uriarte. Marc Taylor for stacked and stream plots.
## Show individual genotypes and drivers for an ## epistasis case with at most eight genotypes set.seed(1) sa <- 0.1 sb <- -0.2 sab <- 0.25 sac <- -0.1 sbc <- 0.25 sv2 <- allFitnessEffects(epistasis = c("-A : B" = sb, "A : -B" = sa, "A : C" = sac, "A:B" = sab, "-A:B:C" = sbc), geneToModule = c( "Root" = "Root", "A" = "a1, a2", "B" = "b", "C" = "c")) evalAllGenotypes(sv2, order = FALSE, addwt = TRUE) e1 <- oncoSimulIndiv(sv2, model = "McFL", mu = 5e-6, sampleEvery = 0.02, keepEvery = 1, initSize = 2000, finalTime = 2000, seed = NULL, onlyCancer = FALSE) ## Drivers and clones plot(e1, show = "drivers") ## Stack plot(e1, type = "stacked") ## Make genotypes explicit plot(e1, show = "genotypes") ## Oh, but I want other colors plot(e1, show = "genotypes", col = rainbow(8)) ## and actually I want a line plot plot(e1, show = "genotypes", type = "line")
## Show individual genotypes and drivers for an ## epistasis case with at most eight genotypes set.seed(1) sa <- 0.1 sb <- -0.2 sab <- 0.25 sac <- -0.1 sbc <- 0.25 sv2 <- allFitnessEffects(epistasis = c("-A : B" = sb, "A : -B" = sa, "A : C" = sac, "A:B" = sab, "-A:B:C" = sbc), geneToModule = c( "Root" = "Root", "A" = "a1, a2", "B" = "b", "C" = "c")) evalAllGenotypes(sv2, order = FALSE, addwt = TRUE) e1 <- oncoSimulIndiv(sv2, model = "McFL", mu = 5e-6, sampleEvery = 0.02, keepEvery = 1, initSize = 2000, finalTime = 2000, seed = NULL, onlyCancer = FALSE) ## Drivers and clones plot(e1, show = "drivers") ## Stack plot(e1, type = "stacked") ## Make genotypes explicit plot(e1, show = "genotypes") ## Oh, but I want other colors plot(e1, show = "genotypes", col = rainbow(8)) ## and actually I want a line plot plot(e1, show = "genotypes", type = "line")
Plot a parent-child relationship of the clones, controlling which clones are displayed, and whether to shown number of times of appearance, and time of first appearance of a clone.
plotClonePhylog(x, N = 1, t ="last", timeEvents = FALSE, keepEvents = FALSE, fixOverlap = TRUE, returnGraph = FALSE, ...)
plotClonePhylog(x, N = 1, t ="last", timeEvents = FALSE, keepEvents = FALSE, fixOverlap = TRUE, returnGraph = FALSE, ...)
x |
The output from a simulation, as obtained from
|
N |
Show in the plot all clones that have a population size of at
least N at time |
t |
The time at which |
timeEvents |
If TRUE, the vertical position of the nodes in the plot will be proportional to their time of first appearance. |
keepEvents |
If TRUE, the graph will show all the birth events. Thus, the number of arrows shows the number of times a clone give rise to another. For large graphs with many events, this slows the graph considerably. |
fixOverlap |
When using |
returnGraph |
If TRUE, the igraph object is returned. You can use this to plot the object however you want or obtain the adjacency matrix. |
... |
Additional arguments. Currently not used. |
.
A plot is produced. If returnGraph
the igraph object is returned.
These are not, technically, proper phylogenetic trees and we use "phylogeny" here in an abuse of terminology. The plots we use, where we show parent child relationships are arguably more helpful in this context. But you could draw proper phylogenies with the information provided.
If you want to obtain the adjacency matrix, this is trivial: just
set returnGraph = TRUE
and use get.adjacency
. See
an example below.
Ramon Diaz-Uriarte
data(examplesFitnessEffects) tmp <- oncoSimulIndiv(examplesFitnessEffects[["o3"]], model = "McFL", mu = 5e-5, detectionSize = 1e8, detectionDrivers = 3, sampleEvery = 0.025, max.num.tries = 10, keepEvery = 5, initSize = 2000, finalTime = 3000, onlyCancer = FALSE, keepPhylog = TRUE) ## Show only those with N > 10 at end plotClonePhylog(tmp, N = 10) ## Show only those with N > 1 between times 5 and 1000 plotClonePhylog(tmp, N = 1, t = c(5, 1000)) ## Show everything, even if teminal nodes are extinct plotClonePhylog(tmp, N = 0) ## Show time when first appeared plotClonePhylog(tmp, N = 10, timeEvents = TRUE) ## Not run: ## Show each event ## This can take a few seconds plotClonePhylog(tmp, N = 10, keepEvents = TRUE) ## End(Not run) ## Adjacency matrix require(igraph) get.adjacency(plotClonePhylog(tmp, N = 10, returnGraph = TRUE))
data(examplesFitnessEffects) tmp <- oncoSimulIndiv(examplesFitnessEffects[["o3"]], model = "McFL", mu = 5e-5, detectionSize = 1e8, detectionDrivers = 3, sampleEvery = 0.025, max.num.tries = 10, keepEvery = 5, initSize = 2000, finalTime = 3000, onlyCancer = FALSE, keepPhylog = TRUE) ## Show only those with N > 10 at end plotClonePhylog(tmp, N = 10) ## Show only those with N > 1 between times 5 and 1000 plotClonePhylog(tmp, N = 1, t = c(5, 1000)) ## Show everything, even if teminal nodes are extinct plotClonePhylog(tmp, N = 0) ## Show time when first appeared plotClonePhylog(tmp, N = 10, timeEvents = TRUE) ## Not run: ## Show each event ## This can take a few seconds plotClonePhylog(tmp, N = 10, keepEvents = TRUE) ## End(Not run) ## Adjacency matrix require(igraph) get.adjacency(plotClonePhylog(tmp, N = 10, returnGraph = TRUE))
Show a plot of a fitness landscape. The plot is modeled after (actually, mostly a blatant copy of) that of MAGELLAN, http://wwwabi.snv.jussieu.fr/public/Magellan/.
Note: this is not a plot of the fitnessEffects object; for that, see
plot.fitnessEffects
.
plotFitnessLandscape(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'genotype_fitness_matrix' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'evalAllGenotypes' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'evalAllGenotypesMut' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...)
plotFitnessLandscape(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'genotype_fitness_matrix' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'evalAllGenotypes' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...) ## S3 method for class 'evalAllGenotypesMut' plot(x, show_labels = TRUE, col = c("green4", "red", "yellow"), lty = c(1, 2, 3), use_ggrepel = FALSE, log = FALSE, max_num_genotypes = 2000, only_accessible = FALSE, accessible_th = 0, ...)
x |
One of the following:
The first two are the same as the format for the |
show_labels |
If TRUE, show the genotype labels. |
col |
A three-element vector that gives the colors to use for increase, decreases and no changes in fitness, respectively. The first two colours are also used for peaks and sinks. |
lty |
A three-element vector that gives the line types to use for increase, decreases and no changes in fitness, respectively. |
use_ggrepel |
If TRUE, use the ggrepel package to avoid overlap of labels. |
log |
Log-scale the y axis. |
max_num_genotypes |
Maximum allowed number of genotypes. For some
types of input, we make a call to |
only_accessible |
If TRUE, show only accessible paths. A path is
considered accesible if, at each mutational step (i.e., with the
addition of each mutation) fitness increases by at least
|
accessible_th |
The threshold for the minimal change in fitness at
each mutation step (i.e., between successive genotypes) to be used if
|
... |
Other arguments passed to |
A fitness landscape plot: a plot showing paths between genotypes and peaks and sinks (local maxima and minima).
I have copied most of the ideas (and colors, and labels) of this plot from MAGELLAN (http://wwwabi.snv.jussieu.fr/public/Magellan/) but MAGELLAN has other functionality that is not provided here such as epistasis stats for the landscape, and several visual manipulation options.
One feature of this function that is not available in MAGELLAN is showing genotype labels (i.e., annotated by gene names), which can be helpful if the different genotypes mean something to you.
In addition to the above differences, another difference between this plot and those of MAGELLAN is how sinks/peaks of more than one genotype are dealt with. This plot will show as sinks or peaks sets of one or more genotypes that are of identical fitness (and separated by a Haming distance of one). So a sink or a peak might actually be made of more than one genotype. In MAGELLAN, as far as I can tell, peaks and sinks are always made of a single isolated genotype.
Does this matter? In most realistic cases where not two genotypes can have exactly the same fittnes it does not. In some cases, though, it might matter. Are multi-genotype sinks/peaks really sinks/peaks? Arguably yes: suppose genotypes "AB" and "ABC" both have fitness 0, which is minimal among the fitness in the set of genotypes, and genotypes "A" and "ABCD" have fitness 0.1. To go from "A" to "ABCD", if you want to travel through "AB", you have to go through the valley of "AB" and "ABC"; once in "ABC" you can climb up to "ABCD"; and once in "AB" you can move to "ABC" since it has identical fitness to "AB". Mutatis mutandis for multi-genotype peaks. Ignoring the possibility of peaks/sinks made of more than one genotype actually makes code much simpler.
Sometimes not showing the any links that involve a decrease in fitness
can help see non-accessible pathways (in strong selection, no multiple
mutations, etc); do this by passing, for instance, an NA for the
second element of col
.
Finally, use common sense: for instance, if you pass a
allFitnessEffects
that specifies for, say, the fitness
of a total of 5000 genotypes you'll have to wait a while for the plot
to finish.
Ramon Diaz-Uriarte
MAGELLAN web site: http://wwwabi.snv.jussieu.fr/public/Magellan/
Brouillet, S. et al. (2015). MAGELLAN: a tool to explore small fitness landscapes. bioRxiv, 31583. http://doi.org/10.1101/031583
allFitnessEffects
,
evalAllGenotypes
,
allFitnessEffects
,
rfitness
,
plot.fitnessEffects
## Generate random fitness for four genes-genotypes ## and plot landscape. r1 <- rfitness(4) plot(r1) ## Specify fitness in a matrix, and plot it m5 <- cbind(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), F = c(1, 2, 3, 5.5)) plotFitnessLandscape(m5) ## Specify fitness with allFitnessEffects, and plot it fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) plot(evalAllGenotypes(fe, order = FALSE)) ## same as plotFitnessLandscape(evalAllGenotypes(fe, order = FALSE))
## Generate random fitness for four genes-genotypes ## and plot landscape. r1 <- rfitness(4) plot(r1) ## Specify fitness in a matrix, and plot it m5 <- cbind(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1), F = c(1, 2, 3, 5.5)) plotFitnessLandscape(m5) ## Specify fitness with allFitnessEffects, and plot it fe <- allFitnessEffects(epistasis = c("a : b" = 0.3, "b : c" = 0.5), noIntGenes = c("e" = 0.1)) plot(evalAllGenotypes(fe, order = FALSE)) ## same as plotFitnessLandscape(evalAllGenotypes(fe, order = FALSE))
Plot a poset. Optionally add a root and change names of nodes.
plotPoset(x, names = NULL, addroot = FALSE, box = FALSE, ...)
plotPoset(x, names = NULL, addroot = FALSE, box = FALSE, ...)
x |
A poset. A matrix with two columns where, in each row, the first
column is the ancestor and the second the descendant. Note that
there might be multiple rows with the same ancestor, and multiple
rows with the same descendant. See |
names |
If not NULL, a vector of names for the nodes, with the same length
as the total number of nodes in a poset (which need not be the same
as the number of rows; see |
addroot |
Add a "Root" node to the graph? |
box |
Should the graph be placed inside a box? |
... |
Additional arguments to |
).
The poset is converted to a graphNEL
object.
A plot is produced.
Ramon Diaz-Uriarte
data(examplePosets) plotPoset(examplePosets[["p1101"]]) ## If you will be using that poset a lot, maybe simpler if poset701 <- examplePosets[["p701"]] plotPoset(poset701, addroot = TRUE) ## Compare to Pancreatic cancer figure in Gerstung et al., 2011 plotPoset(poset701, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## If you want to show Root explicitly do plotPoset(poset701, addroot = TRUE, names = c("Root", "KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## Of course, names are in the order of nodes, so KRAS is for node 1, ## etc, but the order of entries in the poset does not matter: poset701b <- poset701[nrow(poset701):1, ] plotPoset(poset701b, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2"))
data(examplePosets) plotPoset(examplePosets[["p1101"]]) ## If you will be using that poset a lot, maybe simpler if poset701 <- examplePosets[["p701"]] plotPoset(poset701, addroot = TRUE) ## Compare to Pancreatic cancer figure in Gerstung et al., 2011 plotPoset(poset701, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## If you want to show Root explicitly do plotPoset(poset701, addroot = TRUE, names = c("Root", "KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## Of course, names are in the order of nodes, so KRAS is for node 1, ## etc, but the order of entries in the poset does not matter: poset701b <- poset701[nrow(poset701):1, ] plotPoset(poset701b, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2"))
Compute Lines of Descent (LOD) and Path of the Maximum (POM) for a
single simulation or a set of simulations (from oncoSimulPop
).
diversityPOM
and diversityLOD
return the Shannon's
diversity (entropy) of the POM and LOD, respectively, of a set of
simulations (it makes no sense to compute those from a single simulation).
POM(x) LOD(x) diversityPOM(lpom) diversityLOD(llod)
POM(x) LOD(x) diversityPOM(lpom) diversityLOD(llod)
x |
An object of class |
lpom |
A list of POMs, as returned from |
llod |
A list of LODs, as returned from |
Lines of Descent (LOD) and Path of the Maximum (POM) were defined in Szendro et al. (2013) and I follow those definitions here, as applied to a process in continuous time with sampling at user-specified periods.
For POM, the results can depend strongly on how often we sample (i.e.,
the sampleEvery
argument to oncoSimulIndiv
and
oncoSimulPop
), since the POM is computed by finding the clone
with largest population size whenever we sample.
This also explains why
it is generally meaningless to use POM on oncoSimulSample
runs:
these only keep the very last sample.
For LOD, a single LOD per simulation is returned, with the same meaning as that in p. 572 of Szendro et al. (2013). "A given genotype may undergo several episodes of colonization and extinction that are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step.", and I check that this genotype (which is the one that will become the most populated at final time) does not become extinct before the final colonization.
Note breaking changes: for LOD we used to return all lines of descent in a given simulation. In v. 2.9.1 we also returned the LOD as explained above. Now we only return the LOD as defined above.
Beware, however, that if you use multiple initial mutants the LOD function will probably not do what you want. It is not even clear that the LOD is well defined in this case. We are working on this.
For POM
either a character vector (if x
is a single
simulation) or a list of character vectors. Each character vector is
the ordered set of genotypes that contain the largest subpopulation at
the times of sampling.
For LOD
, if x
is a single simulation, the line of
descent as defined above (either an object of class "igraph.vs" (an
igraph vertex sequence: see vertex_attr
) or a
character vector if there were no descendants). If x
is a list
(population) of simulations, then a list where each element is a list
as just explained.
For diversityLOD
and diversityPOM
a single element
vector with the Shannon's diversity (entropy) of the LODs (for
diversityLOD
) or of the POMs (for diversityPOM
).
Ramon Diaz-Uriarte
Szendro, I. G., Franke, J., Visser, J. A. G. M. de, & Krug, J. (2013). Predictability of evolution depends nonmonotonically on population size. Proceedings of the National Academy of Sciences, 110(2), 571-576. https://doi.org/10.1073/pnas.1213613110
######## Using a poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.05, sh = -0.3, typeDep = "MN")) pancr1 <- oncoSimulIndiv(pancr, model = "Exp") RNGkind("L'Ecuyer-CMRG") set.seed(3) pancr8 <- oncoSimulPop(3, pancr, model = "Exp", finalTime = 600, onlyCancer = TRUE, seed = NULL, mc.cores = 2) POM(pancr1) LOD(pancr1) POM(pancr8) LOD(pancr8) diversityPOM(POM(pancr8)) diversityLOD(LOD(pancr8))
######## Using a poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.05, sh = -0.3, typeDep = "MN")) pancr1 <- oncoSimulIndiv(pancr, model = "Exp") RNGkind("L'Ecuyer-CMRG") set.seed(3) pancr8 <- oncoSimulPop(3, pancr, model = "Exp", finalTime = 600, onlyCancer = TRUE, seed = NULL, mc.cores = 2) POM(pancr1) LOD(pancr1) POM(pancr8) LOD(pancr8) diversityPOM(POM(pancr8)) diversityLOD(LOD(pancr8))
Poset: explanation.
x |
The poset. See details. |
A poset is a two column matrix. In each row, the first column is the ancestor (or the restriction) and the second column the descendant (or the node that depends on the restriction). Each node is identified by a positive integer. The graph includes all nodes with integers between 1 and the largest integer in the poset.
Each node can be necessary for several nodes: in this case, the same node would appear in the first column in several rows.
A node can depend on two or more nodes (conjunctions): in this case, the same node would appear in the second column in several rows.
There can be nodes that do not depend on anything (except the Root node) and on which no other nodes depend. The simplest and safest way to deal with all possible cases, including these cases, is to have all nodes with at least one entry in the poset, and nodes that depend on no one, and on which no one depends should be placed on the second column (with a 0 on the first column).
Alternatively, any node not named explicitly in the poset, but with a number smaller than the largest number in the poset, is taken to be a node that depends on no one and on which no one depends. See examples below.
This specification of restrictions is for version 1. See
allFitnessEffects
for a much more flexible one for
version 2. Both can be used with oncoSimulIndiv
.
Note that simulating using posets directly is no longer supported. This function is left here only for historical purposes.
Ramon Diaz-Uriarte
Posets and similar structures appear in several places. The following two papers use them extensively.
Gerstung et al., 2009. Quantifying cancer progression with conjunctive Bayesian networks. Bioinformatics, 21: 2809–2815.
Gerstung et al., 2011. The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis. PLoS ONE, 6.
examplePosets
, plotPoset
,
oncoSimulIndiv
## Node 2 and 3 depend on 1, and 4 depends on no one p1 <- cbind(c(1L, 1L, 0L), c(2L, 3L, 4L)) plotPoset(p1, addroot = TRUE) ## Node 2 and 3 depend on 1, and 4 to 7 depend on no one. ## We do not have nodes 4 to 6 explicitly in the poset. p2 <- cbind(c(1L, 1L, 0L), c(2L, 3L, 7L)) plotPoset(p2, addroot = TRUE) ## But this is arguably cleaner p3 <- cbind(c(1L, 1L, rep(0L, 4)), c(2L, 3L, 4:7 )) plotPoset(p3, addroot = TRUE) ## A simple way to create a poset where no gene (in a set of 15) depends ## on any other. p4 <- cbind(0L, 15L) plotPoset(p4, addroot = TRUE) ## Specifying the pancreatic cancer poset in Gerstung et al., 2011 ## (their figure 2B, left). We use numbers, but for nicer plotting we ## will use names: KRAS is 1, SMAD4 is 2, etc. pancreaticCancerPoset <- cbind(c(1, 1, 1, 1, 2, 3, 4, 4, 5), c(2, 3, 4, 5, 6, 6, 6, 7, 7)) storage.mode(pancreaticCancerPoset) <- "integer" plotPoset(pancreaticCancerPoset, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## Specifying poset 2 in Figure 2A of Gerstung et al., 2009: poset2 <- cbind(c(1, 1, 3, 3, 3, 7, 7, 8, 9, 10), c(2, 3, 4, 5, 6, 8, 9, 10, 10, 11)) storage.mode(poset2) <- "integer" plotPoset(poset2)
## Node 2 and 3 depend on 1, and 4 depends on no one p1 <- cbind(c(1L, 1L, 0L), c(2L, 3L, 4L)) plotPoset(p1, addroot = TRUE) ## Node 2 and 3 depend on 1, and 4 to 7 depend on no one. ## We do not have nodes 4 to 6 explicitly in the poset. p2 <- cbind(c(1L, 1L, 0L), c(2L, 3L, 7L)) plotPoset(p2, addroot = TRUE) ## But this is arguably cleaner p3 <- cbind(c(1L, 1L, rep(0L, 4)), c(2L, 3L, 4:7 )) plotPoset(p3, addroot = TRUE) ## A simple way to create a poset where no gene (in a set of 15) depends ## on any other. p4 <- cbind(0L, 15L) plotPoset(p4, addroot = TRUE) ## Specifying the pancreatic cancer poset in Gerstung et al., 2011 ## (their figure 2B, left). We use numbers, but for nicer plotting we ## will use names: KRAS is 1, SMAD4 is 2, etc. pancreaticCancerPoset <- cbind(c(1, 1, 1, 1, 2, 3, 4, 4, 5), c(2, 3, 4, 5, 6, 6, 6, 7, 7)) storage.mode(pancreaticCancerPoset) <- "integer" plotPoset(pancreaticCancerPoset, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) ## Specifying poset 2 in Figure 2A of Gerstung et al., 2009: poset2 <- cbind(c(1, 1, 3, 3, 3, 7, 7, 8, 9, 10), c(2, 3, 4, 5, 6, 8, 9, 10, 10, 11)) storage.mode(poset2) <- "integer" plotPoset(poset2)
Generate random fitness landscapes under a House of Cards, Rough Mount Fuji (RMF), additive (multiplicative) model, Kauffman's NK model, Ising model, Eggbox model and Full model
rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, wt_is_1 = c("subtract", "divide", "force", "no"), log = FALSE, min_accessible_genotypes = NULL, accessible_th = 0, truncate_at_0 = TRUE, K = 1, r = TRUE, i = 0, I = -1, circular = FALSE, e = 0, E = -1, H = -1, s = 0.1, S = -1, d = 0, o = 0, O = -1, p = 0, P = -1, model = c("RMF", "Additive", "NK", "Ising", "Eggbox", "Full"), seed_magellan = -1)
rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, wt_is_1 = c("subtract", "divide", "force", "no"), log = FALSE, min_accessible_genotypes = NULL, accessible_th = 0, truncate_at_0 = TRUE, K = 1, r = TRUE, i = 0, I = -1, circular = FALSE, e = 0, E = -1, H = -1, s = 0.1, S = -1, d = 0, o = 0, O = -1, p = 0, P = -1, model = c("RMF", "Additive", "NK", "Ising", "Eggbox", "Full"), seed_magellan = -1)
g |
Number of genes. |
c |
The decrease in fitness of a genotype per each unit increase
in Hamming distance from the reference genotype for the RMF model
(see |
sd |
The standard deviation of the random component (a normal
distribution of mean |
mu |
The mean of the random component (a normal distribution of
mean |
reference |
The reference genotype: in the RMF model, for the
deterministic, additive part, this is the genotype with maximal
fitness, and all other genotypes decrease their fitness by |
scale |
Either NULL (nothing is done) or a two- or three-element vector. If a two-element vector, fitness is re-scaled between
If you pass a three element vector, fitness is re-scaled so that the
new maximum fitness is As a consequence of using a three element vector, the amount of
stretching/compressing (i.e., scaling) of fitness values larger than
that of the wildtype will likely be different from the scaling of
fitness values smaller than that of the wildtype. In other words,
this argument allows you to change the spread of the positive and
negative fitness values (and you can make this difference extreme and
make most fitness values less than wildtype be 0 by using a huge
negative number –huge in absolute value– for Using a three element vector is probably the most natural way of changing the scale and range of fitness. See also |
wt_is_1 |
If "divide" the fitness of all genotypes is
divided by the fitness of the wildtype (after possibly adding a value
to ensure no negative fitness) so that the wildtype (the genotype with
no mutations) has fitness 1. This is a case of scaling, and it is
applied after If "subtract" (the default) we shift all the fitness values (subtracting fitness of
the wildtype and adding 1) so that the wildtype ends up with a fitness
of 1. This is also applied after If "force" we simply set the fitness of the wildtype to 1, without any
divisions. This means that the If "no", the fitness of the wildtype is not modified. This option has no effect if you pass a three-element vector for
|
log |
If TRUE, log-transform fitness. Actually, there are two
cases: if If you pass a three-element vector for scale, you will want to pass
|
min_accessible_genotypes |
If not NULL, the minimum number of
accessible genotypes in the fitness landscape. A genotype is
considered accessible if you can reach if from the wildtype by going
through at least one path where all changes in fitness are larger or
equal to If the condition is not satisfied, we continue generating random fitness landscapes with the specified parameters until the condition is satisfied. (Why check against NULL and not against zero? Because this allows you to count accessible genotypes even if you do not want to ensure a minimum number of accessible genotypes.) |
accessible_th |
The threshold for the minimal change in fitness at
each mutation step (i.e., between successive genotypes) that allows a
genotype to be regarded as accessible. This only applies if
|
truncate_at_0 |
If TRUE (the default) any fitness <= 0 is
substituted by a small positive constant (a random uniform number
between 1e-10 and 1e-9). Why? Because MAGELLAN and some plotting
routines can have trouble (specially if you log) with values <=0. Or
we might have trouble if we want to log the fitness. This is done
after possibly taking logs. Noise is added to prevent creating several
identical minimal fitness values. Note that |
K |
K for NK model; K is the number of loci with which each locus interacts, and the larger the K the larger the ruggedness of the landscape. |
r |
For the NK model, whether interacting loci are chosen at random
( |
i |
For de Ising model, i is the mean cost for incompatibility with which the genotype's fitness is penalized when in two adjacent genes, only one of them is mutated. |
I |
For the Ising model, I is the standard deviation for the cost incompatibility (i). |
circular |
For the Ising model, whether there is a circular arrangement, where the last and the first genes are adjacent to each other. |
e |
For the Eggbox model, mean effect in fitness for the neighbor locus +/- e. |
E |
For the Eggbox model, noise added to the mean effect in fitness (e). |
H |
For Full models, standard deviation for the House of Cards model. |
s |
For Full models, mean of the fitness for the Multiplicative model. |
S |
For Full models, standard deviation for the Multiplicative model. |
d |
For Full models, a disminishing (negative) or increasing (positive) return as the peak is approached for multiplicative model. |
o |
For Full models, mean value for the optimum model. |
O |
For Full models, standard deviation for the optimum model. |
p |
For Full models, the mean production value for each non 0 allele in the Optimum model component. |
P |
For Full models, the associated stdev (of non 0 alleles) in the Optimum model component. |
model |
One of "RMF" (default) for Rough Mount Fuji, "Additive" for Additive model, "NK", for Kauffman's NK model, "Ising" for Ising model, "Eggbox" for Eggbox model or "Full" for Full models. |
seed_magellan |
The seed for the random number generator in models generated from MAGELLAN. If -1, the clock is used by MAGELLAN to generate a seed, but you probably want to pass a seed: see Details. |
When using model = "RMF"
, the model used here follows
the Rough Mount Fuji model in Szendro et al., 2013 or Franke et al.,
2011. Fitness is given as
where is the Hamming distance between genotypes
and
(the number of positions that differ) and
is a
random variable (in this case, a normal deviate of mean
mu
and standard deviation sd
).
When using model = "RMF"
, setting we obtain a House
of Cards model. Setting
fitness is given by the
distance from the reference and if the reference is the genotype
with all positions mutated, then we have a fully additive model
(fitness increases linearly with the number of positions mutated),
where all mutations have the same effect.
More flexible additive models can be used using model =
"Additive"
. This model is like the Rough Mount Fuji model in Szendro
et al., 2013 or Franke et al., 2011, but in this case, each locus can
have different contributions to the fitness evaluation. This model is
also referred to as the "multiplicative" model in the literature as it
is additive in the log-scale (e.g., see Brouillet et al., 2015 or
Ferretti et al., 2016). The contribution of each mutated allele to the
log-fitness is a random deviate from a Normal distribution with
specified mean mu
and standard deviation sd
, and the
log-fitness of a genotype is the sum of the contributions of each
mutated allele. There is no "reference" genotype in the Additive
model. There is no epistasis in the additve model because the effect
of a mutation in a locus does not depend on the genetic background, or
whether the rest of the loci are mutated or not.
When using model = "NK"
fitness is drawn from a uniform (0, 1)
distribution.
When using model = "Ising"
for each pair of interacting loci,
there is an associated cost if both alleles are not identical
(and therefore 'compatible').
When using model = "Eggbox"
each locus is either high or low fitness,
with a systematic change between each neighbor.
When using model = "Full"
, the fitness is computed with different
parts of the previous models depending on the choosen parameters described
above.
For model = "NK" | "Ising" | "Eggbox" | "Full"
the fitness
landscape is generated by directly calling the fl_generate
function of MAGELLAN
(http://wwwabi.snv.jussieu.fr/public/Magellan/). See details in
Ferretti et al. 2016, or Brouillet et al., 2015.
For OncoSimulR, we often want the wildtype to have a mean of
1. Reasonable settings when using RMF are mu = 1
and wt_is_1 =
'subtract'
so that we simulate from a distribution centered in 1, and
we make sure afterwards (via a simple shift) that the wildtype is
actuall 1. The sd
controls the standard deviation, with the
usual working and meaning as in a normal distribution, unless c
is different from zero. In this case, with c
large, the range
of the data can be large, specially if g
(the number of genes)
is large.
Note that allFitnessEffects
will remove from the table
of genotypes any genotype with a fitness <= 1e-9, thus
making it a non-viable genotype during simulations.
seed_magellan
: if you run code in parallel or you use sequential
code where you generate random fitness landscapes generated by MAGELLAN
(model = "NK" | "Ising" | "Eggbox" | "Full"
) in a short time,
MAGELLAN would likely end up using the same seed as the different calls
would be done within the same time (within second resolution). Thus,
especially if you are generating the same kind of fitness landscape,
you probably want to pass different seeds. The seed is read as a C
long, so you should be able to use integers going from at least
-2,147,483,647 to +2,147,483,647, in 32-bits, but probably a much
larger range (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
in 64-bits. Note, though, that some values can crash MAGELLAN (for
example -9223372036854775806 or -2147483647). You do not want to pass
numbers in scientific notation; for example, you could instead do
seed_magellan = format(2^40, scientific = FALSE)
.
An matrix with g + 1
columns. Each column corresponds to a
gene, except the last one that corresponds to fitness. 1/0 in a gene
column denotes gene mutated/not-mutated. (For ease of use in other
functions, this matrix has class "genotype_fitness_matrix".)
If you have specified min_accessible_genotypes > 0
, the return
object has added attributes accessible_genotypes
and
accessible_th
that show the number of accessible
genotypes under the specified threshold.
MAGELLAN uses its own random number generating functions; using
set.seed
does not allow to obtain the same fitness landscape
repeatedly.
Ramon Diaz-Uriarte for the RMF and general wrapping code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferreti for the MAGELLAN code. Further contributions to the additive model and to wrapping MAGELLAN code and documentation from Guillermo Gorines Cordero, Ivan Lorca Alonso, Francisco Muñoz Lopez, David Roncero Moroño, Alvaro Quevedo, Pablo Perez, Cristina Devesa, Alejandro Herrador.
Szendro I.~G. et al. (2013). Quantitative analyses of empirical fitness landscapes. Journal of Statistical Mehcanics: Theory and Experiment\/, 01, P01005.
Franke, J. et al. (2011). Evolutionary accessibility of mutational pathways. PLoS Computational Biology\/, 7(8), 1–9.
Brouillet, S. et al. (2015). MAGELLAN: a tool to explore small fitness landscapes. bioRxiv, 31583. http://doi.org/10.1101/031583
Ferretti, L., Schmiegelt, B., Weinreich, D., Yamauchi, A., Kobayashi, Y., Tajima, F., & Achaz, G. (2016). Measuring epistasis in fitness landscapes: The correlation of fitness effects of mutations. Journal of Theoretical Biology\/, 396, 132–143. https://doi.org/10.1016/j.jtbi.2016.01.037
MAGELLAN web site: http://wwwabi.snv.jussieu.fr/public/Magellan/
oncoSimulIndiv
,
plot.genotype_fitness_matrix
,
evalAllGenotypes
allFitnessEffects
plotFitnessLandscape
Magellan_stats
## Random fitness for four genes-genotypes, ## plotting and simulating an oncogenetic trajectory ## NK model rnk <- rfitness(5, K = 3, model = "NK") plot(rnk) oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) ## Additive model radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5) plot(radd) ## Not run: ## Eggbox model regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) plot(regg) ## Ising model ris = rfitness(g=4,model="Ising", i = 0.002, I=2) plot(ris) ## Full model rfull = rfitness(g=4, model="Full", i = 0.002, I=2, K = 2, r = TRUE, p = 0.2, P = 0.3, o = 0.3, O = 1) plot(rfull) ## End(Not run)
## Random fitness for four genes-genotypes, ## plotting and simulating an oncogenetic trajectory ## NK model rnk <- rfitness(5, K = 3, model = "NK") plot(rnk) oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) ## Additive model radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5) plot(radd) ## Not run: ## Eggbox model regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) plot(regg) ## Ising model ris = rfitness(g=4,model="Ising", i = 0.002, I=2) plot(ris) ## Full model rfull = rfitness(g=4, model="Full", i = 0.002, I=2, K = 2, r = TRUE, p = 0.2, P = 0.3, o = 0.3, O = 1) plot(rfull) ## End(Not run)
Obtain a sample (a matrix of individuals/samples by genes or, equivalently, a vector of "genotypes") from an oncosimulpop object (i.e., a simulation of multiple individuals) or a single oncosimul object. Sampling schemes include whole tumor and single cell sampling, and sampling at the end of the tumor progression or during the progression of the disease.
sampledGenotypes
shows the genotype frequencies from that
sample; Shannon's diversity —entropy— of the genotypes is also
returned. Order effects are ignored.
samplePop(x, timeSample = "last", typeSample = "whole", thresholdWhole = 0.5, geneNames = NULL, popSizeSample = NULL, propError = 0) sampledGenotypes(y, genes = NULL)
samplePop(x, timeSample = "last", typeSample = "whole", thresholdWhole = 0.5, geneNames = NULL, popSizeSample = NULL, propError = 0) sampledGenotypes(y, genes = NULL)
x |
An object of class |
y |
The output from a call to |
timeSample |
"last" means to sample each individual in the very last time period of the simulation. "unif" (or "uniform") means sampling each individual at a time choosen uniformly from all the times recorded in the simulation with at least one driver between the time when the first driver appeared and the final time period. "unif" means that it is almost sure that different individuals will be sampled at different times. "last" does not guarantee that different individuals will be sampled at the same time unit, only that all will be sampled in the last time unit of their simulation. You can, alternatively, specify the population size at which you
want the sample to be taken. See argument Further clarification about "unif": suppose in a given simulation we have recorded times 1, 2, 3, 4, 5. And at times 2, 4, 5, there were clones with at least a mutant but at time 3 there were none (maybe they went extinct); the set of times to consider for sampling are 2, 4, 5, and time 3 is not considered. This might not always be what you want. |
typeSample |
"singleCell" (or "single") for single cell sampling, where the probability of sampling a cell (a clone) is directly proportional to its population size. "wholeTumor" (or "whole") for whole tumor sampling (i.e., this is similar to a biopsy being the entire tumor). "singleCell-noWT" or "single-nowt" is single cell sampling, but excluding the wild type. |
thresholdWhole |
In whole tumor sampling, whether a gene is detected as mutated depends on thresholdWhole: a gene is considered mutated if it is altered in at least thresholdWhole proportion of the cells in that individual. |
geneNames |
An optional vector of gene names so as to label the column names of the output. |
popSizeSample |
An optional vector of total population sizes at
which you want the samples to be taken. If you pass this vector,
This allows you to specify arbitrary sampling schemes with respect to total population size. |
propError |
The proportion of observations with error (for instance, genotyping error). If larger than 0, this proportion of entries in the sampled matrix will be flipped (i.e., 0s turned to 1s and 1s turned to 0s). |
genes |
If non-NULL, use only the genes in |
samplePop simply repeats the sampling process in each individual of the oncosimulpop object.
Please see oncoSimulSample
for a much more efficient way
of sampling when you are sure what you want to sample.
Note that if you have set onlyCancer = FALSE
in the call to
oncoSimulSample
, you can end up trying to sample from
simulations where the population size is 0. In this case, you will get
a vector/matrix of NAs and a warning.
Similarly, when using timeSample = "last"
you might end up with
a vector of 0 (not NAs) because you are sampling from a population
that contains no clones with mutated genes. This event (sampling from
a population that contains no clones with mutated genes), by
construction, cannot happen when timeSample = "unif"
as
"uniform" sampling is taken here to mean sampling at a time choosen
uniformly from all the times recorded in the simulation between the
time when the first driver appeared and the final time
period. However, you might still get a vector of 0, with uniform
sampling, if you sample from a population that contains only a few
cells with any mutated genes, and most cells with no mutated genes.
A matrix. Each row is a "sample genotype", where 0 denotes no alteration and 1 alteration. When using v.2, columns are named with the gene names.
We quote "sample genotype" because when not using single cell, a row
(a sample genotype) need not be, of course, any really existing
genotype in a population as we are genotyping a whole tumor. Suppose
there are really two genotypes present in the population, genotype A,
which has gene A mutated and genotype B, which has gene B
mutated. Genotype A has a frequency of 60% (so B's frequency is
40%). If you use whole tumor sampling with thresholdWhole =
0.4
you will obtain a genotype with A and B mutated.
For sampledGenotypes
a data frame with two columns: genotypes
and frequencies. This data frame has an additional attribute,
"ShannonI", where Shannon's index of diversity (entropy) is
stored. This is an object of class "sampledGenotypes"
with an S3 print method.
Ramon Diaz-Uriarte
Diaz-Uriarte, R. (2015). Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling http://www.biomedcentral.com/1471-2105/16/41/abstract
######## Using an extended poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.15, sh = -0.3, typeDep = "MN")) ## (I set mc.cores = 2 to comply with --as-cran checks, but you ## should either use a reasonable number for your hardware or ## leave it at its default value). p1 <- oncoSimulPop(4, pancr, mc.cores = 2) (sp1 <- samplePop(p1)) sampledGenotypes(sp1) ## Sample at fixed sizes. Notice the requested size ## for the last population is larger than the any population size ## so we get NAs (sp2 <- samplePop(p1, popSizeSample = c(1e7, 1e6, 4e5, 1e13))) sampledGenotypes(sp2) ## Now single cell sampling r1 <- oncoSimulIndiv(pancr) samplePop(r1, typeSample = "single") sampledGenotypes(samplePop(r1, typeSample = "single"))
######## Using an extended poset for pancreatic cancer from Gerstung et al. ### (s and sh are made up for the example; only the structure ### and names come from Gerstung et al.) pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", "TP53", "TP53", "MLL3"), child = c("KRAS","SMAD4", "CDNK2A", "TP53", "MLL3", rep("PXDN", 3), rep("TGFBR2", 2)), s = 0.15, sh = -0.3, typeDep = "MN")) ## (I set mc.cores = 2 to comply with --as-cran checks, but you ## should either use a reasonable number for your hardware or ## leave it at its default value). p1 <- oncoSimulPop(4, pancr, mc.cores = 2) (sp1 <- samplePop(p1)) sampledGenotypes(sp1) ## Sample at fixed sizes. Notice the requested size ## for the last population is larger than the any population size ## so we get NAs (sp2 <- samplePop(p1, popSizeSample = c(1e7, 1e6, 4e5, 1e13))) sampledGenotypes(sp2) ## Now single cell sampling r1 <- oncoSimulIndiv(pancr) samplePop(r1, typeSample = "single") sampledGenotypes(samplePop(r1, typeSample = "single"))
Simulate DAGs that represent restrictions in the accumulation of mutations.
simOGraph(n, h = ifelse(n >= 4, 4, n), conjunction = TRUE, nparents = 3, multilevelParent = TRUE, removeDirectIndirect = TRUE, rootName = "Root", geneNames = seq.int(n), out = c("adjmat", "rT"), s = 0.1, sh = -0.1, typeDep = "AND")
simOGraph(n, h = ifelse(n >= 4, 4, n), conjunction = TRUE, nparents = 3, multilevelParent = TRUE, removeDirectIndirect = TRUE, rootName = "Root", geneNames = seq.int(n), out = c("adjmat", "rT"), s = 0.1, sh = -0.1, typeDep = "AND")
n |
Number of nodes, or edges, in the graph. Like the number of genes. |
h |
Approximate height of the graph. See details. |
conjunction |
If TRUE, conjunctions (i.e., multiple parents for a node) are allowed. |
nparents |
Maximum number of parents of a node, when |
multilevelParent |
Can a node have parents at different heights (i.e., parents that are at different distance from the root node)? |
removeDirectIndirect |
Ensure that no two nodes are connected both directly (i.e., with an edge between them) and indirectly, through intermediate nodes. If TRUE, we return the transitive reduction of the DAG. |
rootName |
The name you want to give the "Root" node. |
geneNames |
The names you want to give the the non-root nodes. |
out |
Whether the ouptut should be an adjacency matrix or a
"restriction table", as used in |
s |
If using as output a restriction, the default value for s. See
|
sh |
If using as output a restriction, the default value for
sh. See |
typeDep |
If using as output a restriction, the default value for
"typeDep". See |
This is a simple, heuristic procedure for generating graphs of restrictions that seem compatible with published trees in the oncogenetic literature.
The basic procedure is as follows: nodes (argument n
) are split
into approximately equally sized h
groups, and then each node
from a level is connected to nodes chosen randomly from nodes of the
remaing superior (i.e., closer to the Root) levels. The number of
edges comes from a uniform distribution between 1 and nparents
.
The actual depth of the graph can be smaller than h
because
nodes from a level might be connected to superior levels skipping
intermediate ones.
See the vignette for further discussion about arguments.
An adjacency matrix for a directed graph or a data frame to be used as
input, as "restriction table" in allFitnessEffects
.
Ramon Diaz-Uriarte. Code for transitive closure taken from the nem package, whose authors are Holger Froehlich, Florian Markowetz, Achim Tresch, Theresa Niederberger, Christian Bender, Matthias Maneck, Claudio Lottaz, Tim Beissbarth
(a1 <- simOGraph(10)) library(graph) ## for simple plotting plot(as(a1, "graphNEL")) simOGraph(3, geneNames = LETTERS[1:3])
(a1 <- simOGraph(10)) library(graph) ## for simple plotting plot(as(a1, "graphNEL")) simOGraph(3, geneNames = LETTERS[1:3])
Export a fitness landscape in a format that is understood by MAGELLAN http://wwwabi.snv.jussieu.fr/public/Magellan/ and obtain fitness landscape statistics from MAGELLAN.
to_Magellan(x, file, max_num_genotypes = 2000) Magellan_stats(x, max_num_genotypes = 2000, verbose = FALSE, use_log = FALSE, short = TRUE, replace_missing = FALSE)
to_Magellan(x, file, max_num_genotypes = 2000) Magellan_stats(x, max_num_genotypes = 2000, verbose = FALSE, use_log = FALSE, short = TRUE, replace_missing = FALSE)
x |
One of the following:
The first two are the same as the format for the |
file |
The name of the output file. If NULL, a name will be
created using |
max_num_genotypes |
Maximum allowed number of genotypes. For some
types of input, we make a call to |
verbose |
If TRUE provide additional information about names of intermediate files. |
use_log |
Use log fitness when computing statistics. Note that
the |
short |
Give short output when computing statistics. |
replace_missing |
From MAGELLAN's |
to_Magellan
: A file is written to disk. You can then plot and/or show summary
statistics using MAGELLAN.
Magellan_stats
: MAGELLAN's statistics for fitness
landscapes. If you use short = TRUE
a vector of statistics is
returned. If short = FALSE
, MAGELLAN returns a file with
detailed statistics that cannot be turned into a simple vector of
statistics. The returned object uses readLines
and, as a
message, you are also shown the path of the file, in case you want to
process it yourself.
If you try to pass a fitness specification with order effects you will receive an error, since that cannot be plotted with MAGELLAN.
Ramon Diaz-Uriarte
MAGELLAN web site: http://wwwabi.snv.jussieu.fr/public/Magellan/
Brouillet, S. et al. (2015). MAGELLAN: a tool to explore small fitness landscapes. bioRxiv, 31583. http://doi.org/10.1101/031583
allFitnessEffects
,
evalAllGenotypes
,
allFitnessEffects
,
rfitness
## Generate random fitness for four-genes genotype ## and export landscape. r1 <- rfitness(4) to_Magellan(r1, NULL) ## Specify fitness using a DAG and export it cs <- data.frame(parent = c(rep("Root", 3), "a", "d", "c"), child = c("a", "b", "d", "e", "c", "f"), s = 0.1, sh = -0.9, typeDep = "MN") to_Magellan(allFitnessEffects(cs), NULL) ## Default, short output Magellan_stats(allFitnessEffects(cs)) ## Long output; since it is a > 200 lines file, ## place in an object. Name of output file is given as message statslong <- Magellan_stats(allFitnessEffects(cs), short = FALSE) ## Default, short output of two NK fitness landscapes rnk1 <- rfitness(6, K = 1, model = "NK") Magellan_stats(rnk1) rnk2 <- rfitness(6, K = 4, model = "NK") Magellan_stats(rnk2)
## Generate random fitness for four-genes genotype ## and export landscape. r1 <- rfitness(4) to_Magellan(r1, NULL) ## Specify fitness using a DAG and export it cs <- data.frame(parent = c(rep("Root", 3), "a", "d", "c"), child = c("a", "b", "d", "e", "c", "f"), s = 0.1, sh = -0.9, typeDep = "MN") to_Magellan(allFitnessEffects(cs), NULL) ## Default, short output Magellan_stats(allFitnessEffects(cs)) ## Long output; since it is a > 200 lines file, ## place in an object. Name of output file is given as message statslong <- Magellan_stats(allFitnessEffects(cs), short = FALSE) ## Default, short output of two NK fitness landscapes rnk1 <- rfitness(6, K = 1, model = "NK") Magellan_stats(rnk1) rnk2 <- rfitness(6, K = 4, model = "NK") Magellan_stats(rnk2)
Simulations shown in the vignette. Since running them can take a few seconds, we have pre-run them, and stored the results.
They are here mainly to facilitate creation of table from the vignette itself. The script is available under "inst/miscell".
data(osi) data(osi_with_ints) data(atex4) data(atex5) data(atex2b) data(uvex3) data(smyelo3v57) data(s_3_b) data(uvex2) data(simT2) data(simul_period_1) data(simT3) data(s_3_a)
data(osi) data(osi_with_ints) data(atex4) data(atex5) data(atex2b) data(uvex3) data(smyelo3v57) data(s_3_b) data(uvex2) data(simT2) data(simul_period_1) data(simT3) data(s_3_a)
Output from runs of oncoSimulIndiv, with some components removed to minimize size.
data(atex2b) plot(atex2b)
data(atex2b) plot(atex2b)