This example will illustrate how to simulate phenotypes that are based on the premises of SConES: they are caused by SNPs interconnected in the underlying network. The simulation has two steps: 1) simulating the causal mechanism and 2) simulating the phenotypes.
We will start by loading martini
.
Before starting, we need to create a network over which we will act.
This is a really simple network, formed by only 4 genes. Of these, we will randomly select 2 of them, interconnected in the network, and 30% of their SNPs will be causal.
causal <- simulate_causal_snps(gi, ngenes = 2, pcausal = 0.3)
par(mar=c(0,0,0,0)+.1)
plot(gi, mark.groups = names(causal))
The selected SNPs are highlighted. As we can see, they match the conditions described before i.e. they are mapped to a gene and are interconnected in the underlying network. Here are some examples of causal mechanisms that are simulated in the human interactome-derived GI network.
This SNPs can be directly passed to
simulate_phenotypes()
in order to simulate a phenotype
based on the genotypes on these SNPs on an existing experiment. It
starts from an existing GWAS experiment, in order to mantain the linkage
disequilibrium patterns, the complexity of the network, etc. of a real
case. If the existing GWAS experiment is a case-control, only the
controls are used for the simulation.
A quantitative phenotype yj is simulated for a patient j using the following additive model, proposed in (Yang et al. 2011):
yj = ∑iwijui + ej
where the weight wij is the inclination of the genotype i of patient j over the phenotype; the allelic effect of the i-th causal variant ui in arbitrary units; and the residual effect ej is the the proportion of the trait not attributable to the genotype.
The weight wij is calculated as
$$ w_{ij} = \frac{x_{ij} - 2p_i}{\sqrt{2p_i(1 - p_i)}} $$
where xij is the number of reference alleles for the i-th causal variant of the j-th individual; and pi is the frequency of the i-th causal variant. wij follows a sigmoid like behavior for different p: the rarer an allele is, the stronger its impact on the phenotype.
The most interesting bit of this simulation, however, has to do with the residual effect ej, given that it depends directly on the heritability of the trait. ej is generated from a normal distribution with mean of 0 and variance
$$ \frac{1}{h^2 - 1} var(\sum\nolimits_i w_{ij} u_i)$$
When all variance is due to genetics (h2 = 1), ej = 0 for every patient j.
Our limited minigwas
is too homogeneous to perform any
simulation, so we will first randomly simulate a genotype matrix. Then,
we will simualte a phenotype assuming a heritability h2 = 0.9:
# create a random 100x25 matrix of genotypes
X <- lapply(seq_len(100), function(i) { sample(c(0,1,2), 25, replace = TRUE) })
X <- do.call(rbind, X)
colnames(X) <- minigwas$map$snp.name
rownames(X) <- 1:nrow(X)
X <- X + 1
mode(X) <- "raw"
minigwas$genotypes <- new("SnpMatrix", X)
simulated <- simulate_phenotype(minigwas, snps = causal, h2 = 0.9)
simulate_phenotypes()
returns an object identical to the
input GWAS, where the phenotype has been replace by the simulation.
Case-control phenotypes are also possible. It is created by dividing in
two the aforementioned distribution y into cases and controls. In this
case, y reflects the general
population, and the number of cases cannot represent a higher proportion
than a specified prevalence of the disease.
simulated <- simulate_phenotype(minigwas, snps = causal, h2 = 0.9, qualitative = TRUE,
ncases = 10, ncontrols = 40, prevalence = 0.2)