Title: | An R package to identify multi-snp effects in nuclear family studies using the GADGETS method |
---|---|
Description: | This package runs the GADGETS method to identify epistatic effects in nuclear family studies. It also provides functions for permutation-based inference and graphical visualization of the results. |
Authors: | Michael Nodzenski [aut, cre], Juno Krahn [ctb] |
Maintainer: | Michael Nodzenski <[email protected]> |
License: | GPL-3 |
Version: | 1.9.0 |
Built: | 2024-12-19 03:24:53 UTC |
Source: | https://github.com/bioc/epistasisGA |
A simulated dataset containing the counts of the alternate allele for 100 SNPs for the affected child in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 51, 52, 76, and 77 represent a true risk pathway.
data(case)
data(case)
A data frame with 1000 rows and 100 variables
A simulated dataset containing the counts of the alternate allele for 24 SNPs for the cases in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 6, 12, and 18 represent a simulated risk pathway, where, in the child, at least one copy of the alternate allele for each path SNP in addition to exposure 1 confers increased disease risk. .
A simulated dataset containing the counts of the alternate allele for 24 SNPs for the cases in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 6, 12, and 18 represent a simulated risk pathway, where, in the child, at least one copy of the alternate allele for each path SNP in addition to exposure 1 confers increased disease risk. .
data(case.gxe) data(case.gxe)
data(case.gxe) data(case.gxe)
A data frame with 1000 rows and 24 variables
A data frame with 1000 rows and 24 variables
A simulated dataset containing the counts of the alternate allele
for 24 SNPs for the cases in 1000 simulated case-parent
triads. Columns represent SNPs, rows are individuals. The SNP
in column 6 of the corresponding maternal dataset mom.mci
interacts with the SNPs in columns 12 and 18
of case.mci
to increase risk of
disease in the child, where at least one copy of the alternate allele
(genotype 1 or 2) is required at each implicated locus.
.
data(case.mci)
data(case.mci)
A matrix with 1000 rows and 24 variables
This function assigns a fitness score to a chromosome. It is a wrapper for the Rcpp function chrom_fitness_score.
chrom.fitness.score( case.genetic.data, complement.genetic.data, target.snps, ld.block.vec, weight.lookup, n.different.snps.weight = 2, n.both.one.weight = 1, recessive.ref.prop = 0.75, recode.test.stat = 1.64, epi.test = FALSE )
chrom.fitness.score( case.genetic.data, complement.genetic.data, target.snps, ld.block.vec, weight.lookup, n.different.snps.weight = 2, n.both.one.weight = 1, recessive.ref.prop = 0.75, recode.test.stat = 1.64, epi.test = FALSE )
case.genetic.data |
The genetic data of the disease affected children
from case-parent trios or disease-discordant sibling pairs. If searching for
maternal SNPs that are related to risk of disease in the child, some of the
columns in |
complement.genetic.data |
A genetic dataset for the controls
corresponding to the genotypes in |
target.snps |
An integer vector of the columns corresponding to the collection of SNPs, or chromosome, for which the fitness score will be computed. |
ld.block.vec |
An integer vector specifying the linkage blocks of the
input SNPs. As an example, for 100 candidate SNPs, suppose we specify
|
weight.lookup |
A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one. |
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
recessive.ref.prop |
The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75. |
recode.test.stat |
For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. |
epi.test |
A logical indicating whether the function should return the
information required to run function |
A list:
The chromosome fitness score.
The weighted mean difference vector corresponding to the chromosome, with each element divided by it's pseudo-standard error. The magnitudes of these values are not particularly important, but the sign is useful. A positive value for a given SNP indicates the minor allele is positively associated with disease status, while a negative value implies the reference (‘wild type’) allele is positively associated with the disease.
The number of cases with a risk-related genotype at each locus over the total number of cases or controls that have a full set of risk genotypes at each locus, among families where only one of the case or control has the full risk set.
A vector indicating the number risk alleles a case
or complement must have for each SNP in target.snps
for the case or
complement to be classified as having the proposed risk set. '1+' indicates
at least one copy of the risk allele is required, while '2' indicates 2
copies are needed. The risk allele can be determined based on the signs of
the elements of sum_dif_vecs
, where a negative value indicates the
major allele for a given SNP is the risk allele, while a positive value
implicates the minor allele.
An integer vector of the informative family rows.
Only returned if epi.test
= TRUE.
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) comp <- mom + dad - case weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1) storage.mode(weight.lookup) <- "integer" block.ld.vec <- cumsum(rep(25, 4)) chrom.fitness.score(case, comp, c(1, 4, 7), block.ld.vec, weight.lookup)
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) comp <- mom + dad - case weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1) storage.mode(weight.lookup) <- "integer" block.ld.vec <- cumsum(rep(25, 4)) chrom.fitness.score(case, comp, c(1, 4, 7), block.ld.vec, weight.lookup)
This function combines GADGETS results for individual islands into a single dataset.
combine.islands( results.dir, annotation.data, preprocessed.list, n.top.chroms.per.island = 1 )
combine.islands( results.dir, annotation.data, preprocessed.list, n.top.chroms.per.island = 1 )
results.dir |
The directory in which individual island results from
|
annotation.data |
A data frame containing columns 'RSID', 'REF' and
'ALT'. Column 'RSID' gives the RSIDs for the input SNPs, with the rows
ordered such that the first RSID entry corresponds to the first SNP
column in the data passed to function |
preprocessed.list |
The initial list produced by function
|
n.top.chroms.per.island |
The number of top chromosomes per island to save in the final combined list. Defaults to the single top chromosome. |
A data.table containing the results aggregated across islands. Note
these results be written to results.dir
as
combined.island.unique.chromosome.results.rds'. See the package vignette for
more detailed descriptions of the content of each output column. Secondarily,
this will concatenate all individual island results files and store them
in a single file, called "all.island.results.concatenated.rds".
data(case) data(dad) data(mom) data(snp.annotations) pp.list <- preprocess.genetic.data(as.matrix(case[, 1:10]), father.genetic.data = as.matrix(dad[ , 1:10]), mother.genetic.data = as.matrix(mom[ , 1:10]), ld.block.vec = c(10)) run.gadgets(pp.list, n.chromosomes = 4, chromosome.size = 3, results.dir = 'tmp', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res <- combine.islands('tmp', snp.annotations[ 1:10, ], pp.list) unlink("tmp", recursive = TRUE) unlink("tmp_reg", recursive = TRUE)
data(case) data(dad) data(mom) data(snp.annotations) pp.list <- preprocess.genetic.data(as.matrix(case[, 1:10]), father.genetic.data = as.matrix(dad[ , 1:10]), mother.genetic.data = as.matrix(mom[ , 1:10]), ld.block.vec = c(10)) run.gadgets(pp.list, n.chromosomes = 4, chromosome.size = 3, results.dir = 'tmp', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res <- combine.islands('tmp', snp.annotations[ 1:10, ], pp.list) unlink("tmp", recursive = TRUE) unlink("tmp_reg", recursive = TRUE)
This function returns a data.table of graphical SNP-pair scores for use in network plots of GADGETS results.
compute.graphical.scores( results.list, preprocessed.list, score.type = "logsum", pval.thresh = 0.05, n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, recessive.ref.prop = 0.75, recode.test.stat = 1.64, bp.param = bpparam(), null.mean.vec.list = NULL, null.sd.vec.list = NULL )
compute.graphical.scores( results.list, preprocessed.list, score.type = "logsum", pval.thresh = 0.05, n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, recessive.ref.prop = 0.75, recode.test.stat = 1.64, bp.param = bpparam(), null.mean.vec.list = NULL, null.sd.vec.list = NULL )
results.list |
A list of length d, where d is the number of
chromosome sizes to be included in the network plot. Each element of the list
should be a data.table from |
preprocessed.list |
The list output by |
score.type |
A character string specifying the method for aggregating
SNP-pair scores across chromosome sizes. Options are 'max', 'sum', or
'logsum', defaulting to 'logsum'. For a given SNP-pair, it's graphical score
will be the |
pval.thresh |
A numeric value between 0 and 1 specifying the epistasis
test p-value threshold for a chromosome to contribute to the network. Any
chromosomes with epistasis p-value greater than |
n.permutes |
The number of permutations on which to base the epistasis tests. Defaults to 10000. |
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
weight.function.int |
An integer used to assign family weights.
Specifically, we use |
recessive.ref.prop |
The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75. |
recode.test.stat |
For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. |
bp.param |
The BPPARAM argument to be passed to bplapply.
See |
null.mean.vec.list |
(experimental) A list, equal in length to
|
null.sd.vec.list |
(experimental) A list, equal in length to
|
A list of two elements:
A data.table containing SNP-pair graphical scores, where the first four columns represent SNPs and the fifth column (pair.score) is the graphical SNP-pair score.
A data.table containing individual SNP graphical scores, where the first two columns represent SNPs and the third column (snp.score) is the graphical SNP score.
data(case) data(dad) data(mom) data(snp.annotations) set.seed(1400) # preprocess data target.snps <- c(1:3, 30:32, 60:62, 85) preprocessed.list <- preprocess.genetic.data(as.matrix(case[, target.snps]), father.genetic.data = as.matrix(dad[ , target.snps]), mother.genetic.data = as.matrix(mom[ , target.snps]), ld.block.vec = c(3, 3, 3, 1)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(preprocessed.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ target.snps, ], preprocessed.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(preprocessed.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ target.snps, ], preprocessed.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results final.results <- list(combined.res2[1:3, ], combined.res3[1:3, ]) ## compute edge scores edge.dt <- compute.graphical.scores(final.results, preprocessed.list, pval.thresh = 0.5) lapply(c("tmp_2", "tmp_3"), unlink, recursive = TRUE)
data(case) data(dad) data(mom) data(snp.annotations) set.seed(1400) # preprocess data target.snps <- c(1:3, 30:32, 60:62, 85) preprocessed.list <- preprocess.genetic.data(as.matrix(case[, target.snps]), father.genetic.data = as.matrix(dad[ , target.snps]), mother.genetic.data = as.matrix(mom[ , target.snps]), ld.block.vec = c(3, 3, 3, 1)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(preprocessed.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ target.snps, ], preprocessed.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(preprocessed.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ target.snps, ], preprocessed.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results final.results <- list(combined.res2[1:3, ], combined.res3[1:3, ]) ## compute edge scores edge.dt <- compute.graphical.scores(final.results, preprocessed.list, pval.thresh = 0.5) lapply(c("tmp_2", "tmp_3"), unlink, recursive = TRUE)
A simulated dataset containing the counts of the alternate allele for 100 SNPs for the fathers in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 51, 52, 76, and 77 represent a true risk pathway.
data(dad)
data(dad)
A data frame with 1000 rows and 100 variables
A simulated dataset containing the counts of the alternate allele for 24 SNPs for the fathers in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 6, 12, and 18 represent a simulated risk pathway, where, in the child, at least one copy of the alternate allele for each path SNP in addition to exposure 1 confers increased disease risk. .
A simulated dataset containing the counts of the alternate allele for 24 SNPs for the fathers in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 6, 12, and 18 represent a simulated risk pathway, where, in the child, at least one copy of the alternate allele for each path SNP in addition to exposure 1 confers increased disease risk. .
data(dad.gxe) data(dad.gxe)
data(dad.gxe) data(dad.gxe)
A data frame with 1000 rows and 24 variables
A data frame with 1000 rows and 24 variables
A simulated dataset containing the counts of the alternate allele
for 24 SNPs for the fathers in 1000 simulated case-parent
triads. Columns represent SNPs, rows are individuals. The SNP
in column 6 of the corresponding maternal dataset mom.mci
interacts with the SNPs in columns 12 and 18
of the corresponding child dataset case.mci
to increase risk of
disease in the child, where at least one copy of the alternate allele
(genotype 1 or 2) is required at each implicated locus.
.
data(dad.mci)
data(dad.mci)
A matrix with 1000 rows and 24 variables
This function runs a permutation based test of the null hypothesis that a collection of SNPs do not exhibit epistasis, conditional upon observed marginal SNP-disease associations.
epistasis.test( snp.cols, preprocessed.list, n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, recessive.ref.prop = 0.75, recode.test.stat = 1.64, maternal.fetal.test = FALSE )
epistasis.test( snp.cols, preprocessed.list, n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, recessive.ref.prop = 0.75, recode.test.stat = 1.64, maternal.fetal.test = FALSE )
snp.cols |
An integer vector specifying the columns in the input data containing the SNPs to be tested. |
preprocessed.list |
The initial list produced by function
|
n.permutes |
The number of permutations on which to base the test. Defaults to 10000. |
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
weight.function.int |
An integer used to assign family weights.
Specifically, we use |
recessive.ref.prop |
The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75. |
recode.test.stat |
For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. |
maternal.fetal.test |
A boolean indicating whether the test specifically for a maternal-fetal interaction should be run. Defaults to FALSE. |
A list of thee elements:
The p-value of the test. (In GADGETS papers, these are instead referred to as h-values)
The fitness score from the observed data
A vector of fitness scores for the permuted datasets.
data(case) data(dad) data(mom) data(snp.annotations) pp.list <- preprocess.genetic.data(as.matrix(case), father.genetic.data = as.matrix(dad), mother.genetic.data = as.matrix(mom), ld.block.vec = rep(25, 4)) run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = "tmp", cluster.type = "interactive", registryargs = list(file.dir = "tmp_reg", seed = 1300), n.islands = 8, island.cluster.size = 4, n.migrations = 2 ) combined.res <- combine.islands("tmp", snp.annotations, pp.list, 2) top.snps <- as.vector(t(combined.res[1, 1:3])) set.seed(10) epi.test.res <- epistasis.test(top.snps, pp.list) unlink('tmp', recursive = TRUE) unlink('tmp_reg', recursive = TRUE)
data(case) data(dad) data(mom) data(snp.annotations) pp.list <- preprocess.genetic.data(as.matrix(case), father.genetic.data = as.matrix(dad), mother.genetic.data = as.matrix(mom), ld.block.vec = rep(25, 4)) run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = "tmp", cluster.type = "interactive", registryargs = list(file.dir = "tmp_reg", seed = 1300), n.islands = 8, island.cluster.size = 4, n.migrations = 2 ) combined.res <- combine.islands("tmp", snp.annotations, pp.list, 2) top.snps <- as.vector(t(combined.res[1, 1:3])) set.seed(10) epi.test.res <- epistasis.test(top.snps, pp.list) unlink('tmp', recursive = TRUE) unlink('tmp_reg', recursive = TRUE)
epistasisGA
packageA package implementing the GADGETS method to detect multi-SNP effects in case-parent triad or affected/unaffected sibling studies.
A data.frame containing simulated exposure status for each case of the case-parent triads data. Rows correspond to different families. The single column represents a binary exposure, where in combination with the relevant risk-associated alleles (columns 6, 12, and 18 in data set case.gxe), is associated with increased risk. .
data(exposure)
data(exposure)
A data frame with 1000 rows and 1 variables
This function runs the GADGETS method on a given cluster of islands. It is a wrapper for the underlying Rcpp function run_GADGETS.
GADGETS( cluster.number, results.dir, case.genetic.data, complement.genetic.data, case.genetic.data.n, mother.genetic.data.n, father.genetic.data.n, exposure.mat, weight.lookup.n, ld.block.vec, n.chromosomes, chromosome.size, snp.chisq, weight.lookup, null.mean.vec = c(0, 0), null.se.vec = c(1, 1), island.cluster.size = 4, n.migrations = 20, n.different.snps.weight = 2, n.both.one.weight = 1, migration.interval = 50, gen.same.fitness = 50, max.generations = 500, initial.sample.duplicates = FALSE, crossover.prop = 0.8, recessive.ref.prop = 0.75, recode.test.stat = 1.64, E_GADGETS = FALSE )
GADGETS( cluster.number, results.dir, case.genetic.data, complement.genetic.data, case.genetic.data.n, mother.genetic.data.n, father.genetic.data.n, exposure.mat, weight.lookup.n, ld.block.vec, n.chromosomes, chromosome.size, snp.chisq, weight.lookup, null.mean.vec = c(0, 0), null.se.vec = c(1, 1), island.cluster.size = 4, n.migrations = 20, n.different.snps.weight = 2, n.both.one.weight = 1, migration.interval = 50, gen.same.fitness = 50, max.generations = 500, initial.sample.duplicates = FALSE, crossover.prop = 0.8, recessive.ref.prop = 0.75, recode.test.stat = 1.64, E_GADGETS = FALSE )
cluster.number |
An integer indicating the cluster number (used for labeling the output file). |
results.dir |
The directory to which island results will be saved. |
case.genetic.data |
The genetic data of the disease affected children
from case-parent trios or disease-discordant sibling pairs. If searching for
maternal SNPs that are related to risk of disease in the child, some of the
columns in |
complement.genetic.data |
A genetic dataset for the controls
corresponding to the genotypes in |
case.genetic.data.n |
(experimental) A matrix, to be used in the
experimental E-GADGETS method, containing the same data as described above
for |
mother.genetic.data.n |
(experimental) A matrix, to be used in the
experimental E-GADGETS method, containing the genotypes for the mothers of
|
father.genetic.data.n |
(experimental) A matrix, to be used in the
experimental E-GADGETS method, containing the genotypes for the fathers of
|
exposure.mat |
(experimental) A matrix of the input categorical and continuous exposures, if specified, to be used in the experimental E-GADGETS method. If not running E-GADGETS, this should be a 1x1 matrix whose only entry is 0.0. |
weight.lookup.n |
(experimental) A vector that maps a family weight to
the weighted sum of the number of different SNPs and SNPs both equal to one,
to be used by the experimetnal E-GADGETS method. The vector should store
values as floating point values, not integers. If not running E-GADGETS, this
argument should be specified as 0.0, and will not be used in the GA. Instead,
for GADGETS, computation of the family weights will be based on argument
|
ld.block.vec |
An integer vector specifying the linkage blocks of the
input SNPs. As an example, for 100 candidate SNPs, suppose we specify
|
n.chromosomes |
An integer specifying the number of chromosomes to use in the GA. |
chromosome.size |
An integer specifying the number of SNPs on each chromosome. |
snp.chisq |
A vector of statistics to be used in sampling SNPs for
mutation. By default, these are the square roots of
the chi-square marginal SNP-disease association statistics for each column in
|
weight.lookup |
A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one. This should store values as integers. |
null.mean.vec |
(experimental) A vector of estimated null means for each component of the E-GADGETS (GxGxE) fitness score. For all other uses, this should be specified as rep(0, 2) and will not be used. |
null.se.vec |
(experimental) A vector of estimated null standard deviations for each component of the E-GADGETS (GxGxE) fitness score. For all other uses, this should be specified as rep(0, 2) and will not be used. |
island.cluster.size |
An integer specifying the number of islands in the cluster. See coderun.gadgets for additional details. |
n.migrations |
The number of chromosomes that migrate among islands.
This value must be less than |
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement is multiplied in computing the family weights. Defaults to 1. |
migration.interval |
The interval of generations for which GADGETS will
run prior to migration of top chromosomes among islands in a cluster.
Defaults to 50. In other words, top chromosomes will migrate among cluster
islands every |
gen.same.fitness |
The number of consecutive generations with the same fitness score required for algorithm termination. Defaults to 50. |
max.generations |
The maximum number of generations for which GADGETS will run. Defaults to 500. |
initial.sample.duplicates |
A logical indicating whether the same SNP can appear in more than one chromosome in the initial sample of chromosomes (the same SNP may appear in more than one chromosome thereafter, regardless). Defaults to FALSE. |
crossover.prop |
A numeric between 0 and 1 indicating the proportion of chromosomes to be subjected to cross over. The remaining proportion will be mutated. Defaults to 0.8. |
recessive.ref.prop |
The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75. |
recode.test.stat |
For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. See the GADGETS paper for specific details. |
E_GADGETS |
(experimental) A boolean indicating whether to run the experimental 'E_GADGETS' method. |
For each island in the cluster, an rds object containing a list with
the following elements will be written to results.dir
.
A data.table of the final generation chromosomes, their fitness scores, and, for GADGETS, additional information pertaining to nominated risk-related genotypes. See the package vignette for an example and for additional details.
The total number of generations run.
set.seed(10) data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) data.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) chisq.stats <- sqrt(data.list$chisq.stats) ld.block.vec <- data.list$ld.block.vec case.genetic.data <- data.list$case.genetic.data complement.genetic.data <- data.list$complement.genetic.data #required inputs but not actually used in function below case.genetic.data.n <- matrix(0.0, 1, 1) mother.genetic.data.n <- matrix(0.0, 1, 1) father.genetic.data.n <- matrix(0.0, 1, 1) exposure.mat <- data.list$exposure.mat + 0.0 weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1) dir.create('tmp') GADGETS(cluster.number = 1, results.dir = 'tmp', case.genetic.data = case.genetic.data, complement.genetic.data = complement.genetic.data, case.genetic.data.n = case.genetic.data.n, mother.genetic.data.n = mother.genetic.data.n, father.genetic.data.n = father.genetic.data.n, exposure.mat = exposure.mat, weight.lookup.n = weight.lookup + 0.0, ld.block.vec = ld.block.vec, n.chromosomes = 10, chromosome.size = 3, snp.chisq = chisq.stats, weight.lookup = weight.lookup, n.migrations = 2, migration.interval = 5, gen.same.fitness = 10, max.generations = 10)
set.seed(10) data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) data.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) chisq.stats <- sqrt(data.list$chisq.stats) ld.block.vec <- data.list$ld.block.vec case.genetic.data <- data.list$case.genetic.data complement.genetic.data <- data.list$complement.genetic.data #required inputs but not actually used in function below case.genetic.data.n <- matrix(0.0, 1, 1) mother.genetic.data.n <- matrix(0.0, 1, 1) father.genetic.data.n <- matrix(0.0, 1, 1) exposure.mat <- data.list$exposure.mat + 0.0 weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1) dir.create('tmp') GADGETS(cluster.number = 1, results.dir = 'tmp', case.genetic.data = case.genetic.data, complement.genetic.data = complement.genetic.data, case.genetic.data.n = case.genetic.data.n, mother.genetic.data.n = mother.genetic.data.n, father.genetic.data.n = father.genetic.data.n, exposure.mat = exposure.mat, weight.lookup.n = weight.lookup + 0.0, ld.block.vec = ld.block.vec, n.chromosomes = 10, chromosome.size = 3, snp.chisq = chisq.stats, weight.lookup = weight.lookup, n.migrations = 2, migration.interval = 5, gen.same.fitness = 10, max.generations = 10)
This function runs a global test of the null hypothesis that there are no SNP-disease associations across a range of chromosome sizes
global.test(results.list, n.top.scores = 10)
global.test(results.list, n.top.scores = 10)
results.list |
A list of length d, where d is the number of chromosome
sizes to be included in a global test. Each element of the list must itself
be a list whose first element |
n.top.scores |
The number of top scoring chromosomes, for each chromosome size, to be used in calculating the global test. Defaults to 10. |
A list containing the following:
The observed test statistic.
A vector of test statistics from permuted data.
The p-value for the global test.
A vector of observed test statistics for each chromosome size.
A matrix of test statistics for the permutation datasets, where rows correspond to permutations and columns correspond to chromosome sizes.
A vector containing marignal p-values for each chromosome size.
A vector of the maximum fitness score for each chromosome size in the observed data.
A list of vectors for each chromosome size of maximum observed fitness scores for each permutation.
A vector of p-values for the maximum observed order statistics for each chromosome size. P-values are the proportion of permutation based maximum order statistics that exceed the observed maximum fitness score.
A grob of a ggplot plot of the observed vs permuted fitness score densities for each chromosome size.
A vector indicating the number of top scores (k)
from each chromosome size that the test used.
This will be equal to n.top.scores
unless GADGETS returns fewer than
n.top.scores
unique chromosomes for
the observed data or any permute, in which case the chromosome size-specific
value will be equal to the smallest number of unique chromosomes returned.
The 95th percentile of the permutation maximum order statistics for each chromosome size.
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) data(snp.annotations) set.seed(1400) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ 1:10, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ 1:10, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) # create three permuted datasets set.seed(1400) perm.data.list <- permute.dataset(pp.list, "perm_data", n.permutations = 3) #pre-process permuted data case.p1 <- readRDS("perm_data/case.permute1.rds") comp.p1 <- readRDS("perm_data/complement.permute1.rds") p1.list <- preprocess.genetic.data(case.p1, complement.genetic.data = comp.p1, ld.block.vec = c(10)) case.p2 <- readRDS("perm_data/case.permute2.rds") comp.p2 <- readRDS("perm_data/complement.permute2.rds") p2.list <- preprocess.genetic.data(case.p2, complement.genetic.data = comp.p2, ld.block.vec = c(10)) case.p3 <- readRDS("perm_data/case.permute3.rds") comp.p3 <- readRDS("perm_data/complement.permute3.rds") p3.list <- preprocess.genetic.data(case.p3, complement.genetic.data = comp.p3, ld.block.vec = c(10)) #permutation 1, chromosome size 2 run.gadgets(p1.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p1_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p1.combined.res2 <- combine.islands('p1_tmp_2', snp.annotations[ 1:10, ], p1.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 1, chromosome size 3 run.gadgets(p1.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p1_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p1.combined.res3 <- combine.islands('p1_tmp_3', snp.annotations[ 1:10, ], p1.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 2, chromosome size 2 run.gadgets(p2.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p2_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p2.combined.res2 <- combine.islands('p2_tmp_2', snp.annotations[ 1:10, ], p2.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 2, chromosome size 3 run.gadgets(p2.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p2_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p2.combined.res3 <- combine.islands('p2_tmp_3', snp.annotations[ 1:10, ], p2.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 3, chromosome size 2 run.gadgets(p3.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p3_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p3.combined.res2 <- combine.islands('p3_tmp_2', snp.annotations[ 1:10, ], p3.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 3, chromosome size 3 run.gadgets(p3.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p3_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p3.combined.res3 <- combine.islands('p3_tmp_3', snp.annotations[ 1:10, ], p3.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results # chromosome size 2 results chrom2.list <- list( observed.data = combined.res2$fitness.score, permutation.list = list( p1.combined.res2$fitness.score, p2.combined.res2$fitness.score, p3.combined.res2$fitness.score ) ) # chromosome size 3 results chrom3.list <- list( observed.data = combined.res3$fitness.score, permutation.list = list( p1.combined.res3$fitness.score, p2.combined.res3$fitness.score, p3.combined.res3$fitness.score ) ) final.results <- list(chrom2.list, chrom3.list) lapply(c('tmp_2', 'tmp_3', 'p1_tmp_2', 'p2_tmp_2', 'p3_tmp_2', 'p1_tmp_3', 'p2_tmp_3', 'p3_tmp_3', 'perm_data'), unlink, recursive = TRUE)
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) data(snp.annotations) set.seed(1400) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ 1:10, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ 1:10, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) # create three permuted datasets set.seed(1400) perm.data.list <- permute.dataset(pp.list, "perm_data", n.permutations = 3) #pre-process permuted data case.p1 <- readRDS("perm_data/case.permute1.rds") comp.p1 <- readRDS("perm_data/complement.permute1.rds") p1.list <- preprocess.genetic.data(case.p1, complement.genetic.data = comp.p1, ld.block.vec = c(10)) case.p2 <- readRDS("perm_data/case.permute2.rds") comp.p2 <- readRDS("perm_data/complement.permute2.rds") p2.list <- preprocess.genetic.data(case.p2, complement.genetic.data = comp.p2, ld.block.vec = c(10)) case.p3 <- readRDS("perm_data/case.permute3.rds") comp.p3 <- readRDS("perm_data/complement.permute3.rds") p3.list <- preprocess.genetic.data(case.p3, complement.genetic.data = comp.p3, ld.block.vec = c(10)) #permutation 1, chromosome size 2 run.gadgets(p1.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p1_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p1.combined.res2 <- combine.islands('p1_tmp_2', snp.annotations[ 1:10, ], p1.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 1, chromosome size 3 run.gadgets(p1.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p1_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p1.combined.res3 <- combine.islands('p1_tmp_3', snp.annotations[ 1:10, ], p1.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 2, chromosome size 2 run.gadgets(p2.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p2_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p2.combined.res2 <- combine.islands('p2_tmp_2', snp.annotations[ 1:10, ], p2.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 2, chromosome size 3 run.gadgets(p2.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p2_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p2.combined.res3 <- combine.islands('p2_tmp_3', snp.annotations[ 1:10, ], p2.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 3, chromosome size 2 run.gadgets(p3.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'p3_tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p3.combined.res2 <- combine.islands('p3_tmp_2', snp.annotations[ 1:10, ], p3.list, 2) unlink('tmp_reg', recursive = TRUE) #permutation 3, chromosome size 3 run.gadgets(p3.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'p3_tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) p3.combined.res3 <- combine.islands('p3_tmp_3', snp.annotations[ 1:10, ], p3.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results # chromosome size 2 results chrom2.list <- list( observed.data = combined.res2$fitness.score, permutation.list = list( p1.combined.res2$fitness.score, p2.combined.res2$fitness.score, p3.combined.res2$fitness.score ) ) # chromosome size 3 results chrom3.list <- list( observed.data = combined.res3$fitness.score, permutation.list = list( p1.combined.res3$fitness.score, p2.combined.res3$fitness.score, p3.combined.res3$fitness.score ) ) final.results <- list(chrom2.list, chrom3.list) lapply(c('tmp_2', 'tmp_3', 'p1_tmp_2', 'p2_tmp_2', 'p3_tmp_2', 'p1_tmp_3', 'p2_tmp_3', 'p3_tmp_3', 'perm_data'), unlink, recursive = TRUE)
This function assigns the (currently experimental) E-GADGETS fitness score to a chromosome.
GxE.fitness.score( case.genetic.data, mother.genetic.data, father.genetic.data, exposure.mat, target.snps, weight.lookup, null.mean.vec = c(0, 0), null.sd.vec = c(1, 1), n.different.snps.weight = 2, n.both.one.weight = 1 )
GxE.fitness.score( case.genetic.data, mother.genetic.data, father.genetic.data, exposure.mat, target.snps, weight.lookup, null.mean.vec = c(0, 0), null.sd.vec = c(1, 1), n.different.snps.weight = 2, n.both.one.weight = 1 )
case.genetic.data |
The genetic data of the disease affected children
from case-parent trios or disease-discordant sibling pairs. If searching for
maternal SNPs that are related to risk of disease in the child, some of the
columns in |
mother.genetic.data |
The genetic data for the mothers of the cases in
|
father.genetic.data |
The genetic data for the fathers of the cases in
|
exposure.mat |
A matrix of the input categorical and continuous exposures to be used in the experimental E-GADGETS fitness score. If there are categorical exposure variables with more than 2 levels, those should be dummy coded. |
target.snps |
An integer vector of the columns corresponding to the collection of SNPs, or chromosome, for which the fitness score will be computed. |
weight.lookup |
A vector that maps a family weight to the weighted sum of the number of different SNPs and SNPs both equal to one. |
null.mean.vec |
A vector of estimated null means for each
of the components of the E-GADGETS fitness score.
It should be set to the values of the "null.mean" element of the file
"null.mean.sd.info.rds" for the observed data, that is saved by the
|
null.sd.vec |
A vector of estimated null means for each
of the components of the E-GADGETS fitness score.
It should be set to the values of the "null.se" element of the file
"null.mean.sd.info.rds" for the observed data, that is saved by the
|
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
A list:
The chromosome fitness score.
The element of the Hotelling-Lawley trace matrix corresponding to each SNP. Larger magnitudes indicate larger contributions to the score, but are otherwise difficult to interpret.
The Hotelling-Lawley trace statistic from the transmission-based fitness score component.
The Wald statistic from the family-based component of the fitness score.
data(case.gxe) data(dad.gxe) data(mom.gxe) data(exposure) case.gxe <- case.gxe + 0.0 mom.gxe <- mom.gxe + 0.0 dad.gxe <- dad.gxe + 0.0 exposure <- as.matrix(exposure + 0.0) weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1.0) res <- GxE.fitness.score(case.gxe, mom.gxe, dad.gxe, exposure, c(1, 4, 7), weight.lookup)
data(case.gxe) data(dad.gxe) data(mom.gxe) data(exposure) case.gxe <- case.gxe + 0.0 mom.gxe <- mom.gxe + 0.0 dad.gxe <- dad.gxe + 0.0 exposure <- as.matrix(exposure + 0.0) weight.lookup <- vapply(seq_len(6), function(x) 2^x, 1.0) res <- GxE.fitness.score(case.gxe, mom.gxe, dad.gxe, exposure, c(1, 4, 7), weight.lookup)
This function runs a permutation based test run a test of interaction between a collection of SNPs and exposure variables.
GxE.test( snp.cols, preprocessed.list, null.mean.vec = c(0, 0), null.sd.vec = c(1, 1), n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2 )
GxE.test( snp.cols, preprocessed.list, null.mean.vec = c(0, 0), null.sd.vec = c(1, 1), n.permutes = 10000, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2 )
snp.cols |
An integer vector specifying the columns in the input data containing the SNPs to be tested. |
preprocessed.list |
The initial list produced by function
|
null.mean.vec |
A vector of estimated null means for each
of the components of the E-GADGETS fitness score.
It should be set to the values of the "null.mean" element of the file
"null.mean.sd.info.rds" for the observed data, that is saved by the
|
null.sd.vec |
A vector of estimated null means for each
of the components of the E-GADGETS fitness score.
It should be set to the values of the "null.se" element of the file
"null.mean.sd.info.rds" for the observed data, that is saved by the
|
n.permutes |
The number of permutations on which to base the test. Defaults to 10000. |
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement/unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
weight.function.int |
An integer used to assign family weights.
Specifically, we use |
A list of three elements:
The p-value of the test.
The fitness score from the observed data
A vector of fitness scores for the permuted datasets.
data(case.gxe) data(dad.gxe) data(mom.gxe) data(exposure) data(snp.annotations.mci) pp.list <- preprocess.genetic.data(case.gxe, father.genetic.data = dad.gxe, mother.genetic.data = mom.gxe, ld.block.vec = rep(6, 4), categorical.exposures = exposure) run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = "tmp_gxe", cluster.type = "interactive", registryargs = list(file.dir = "tmp_reg_gxe", seed = 1300), n.islands = 8, island.cluster.size = 4, n.migrations = 1) combined.res <- combine.islands('tmp_gxe', snp.annotations.mci, pp.list, 1) top.snps <- as.vector(t(combined.res[1, 1:3])) set.seed(10) GxE.test.res <- GxE.test(top.snps, pp.list) unlink('tmp_gxe', recursive = TRUE) unlink('tmp_reg_gxe', recursive = TRUE)
data(case.gxe) data(dad.gxe) data(mom.gxe) data(exposure) data(snp.annotations.mci) pp.list <- preprocess.genetic.data(case.gxe, father.genetic.data = dad.gxe, mother.genetic.data = mom.gxe, ld.block.vec = rep(6, 4), categorical.exposures = exposure) run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = "tmp_gxe", cluster.type = "interactive", registryargs = list(file.dir = "tmp_reg_gxe", seed = 1300), n.islands = 8, island.cluster.size = 4, n.migrations = 1) combined.res <- combine.islands('tmp_gxe', snp.annotations.mci, pp.list, 1) top.snps <- as.vector(t(combined.res[1, 1:3])) set.seed(10) GxE.test.res <- GxE.test(top.snps, pp.list) unlink('tmp_gxe', recursive = TRUE) unlink('tmp_reg_gxe', recursive = TRUE)
A simulated dataset containing the counts of the alternate allele for 100 SNPs for the mothers in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 51, 52, 76, and 77 represent a true risk pathway.
data(mom)
data(mom)
A data frame with 1000 rows and 100 variables
A simulated dataset containing the counts of the alternate allele for 24 SNPs for the mothers in 1000 simulated case-parent triads. Columns represent SNPs, rows are individuals. SNPs in columns 6, 12, and 18 represent a simulated risk pathway, where, in the child, at least one copy of the alternate allele for each path SNP in addition to exposure 1 confers increased disease risk. .
data(mom.gxe)
data(mom.gxe)
A data frame with 1000 rows and 24 variables
A simulated dataset containing the counts of the alternate allele
for 24 SNPs for the mothers in 1000 simulated case-parent
triads. Columns represent SNPs, rows are individuals. The SNP
in column 6 interacts with the SNPs in columns 12 and 18
of dataset case.mci
to increase risk of disease in the child,
where at least one copy of the alternate allele (genotype 1 or 2)
is required at each implicated locus.
.
data(mom.mci)
data(mom.mci)
A matrix with 1000 rows and 24 variables
This function plots a network of SNPs with potential multi-SNP effects.
network.plot( graphical.score.list, preprocessed.list, n.top.scoring.pairs = NULL, node.shape = "circle", repulse.rad = 1000, node.size = 25, graph.area = 100, vertex.label.cex = 0.5, edge.width.cex = 12, plot = TRUE, edge.color.ramp = c("lightblue", "blue"), node.color.ramp = c("white", "red"), plot.legend = TRUE, high.ld.threshold = 0.1, plot.margins = c(2, 1, 2, 1), legend.title.cex = 1.75, legend.axis.cex = 1.75, ... )
network.plot( graphical.score.list, preprocessed.list, n.top.scoring.pairs = NULL, node.shape = "circle", repulse.rad = 1000, node.size = 25, graph.area = 100, vertex.label.cex = 0.5, edge.width.cex = 12, plot = TRUE, edge.color.ramp = c("lightblue", "blue"), node.color.ramp = c("white", "red"), plot.legend = TRUE, high.ld.threshold = 0.1, plot.margins = c(2, 1, 2, 1), legend.title.cex = 1.75, legend.axis.cex = 1.75, ... )
graphical.score.list |
The list returned by function
|
preprocessed.list |
The initial list produced by function
|
n.top.scoring.pairs |
An integer indicating the number of top scoring SNP-pairs to plot. Defaults to, NULL, which plots all pairs. For large networks, plotting a subset of the top scoring pairs can improve the appearance of the graph. |
node.shape |
The desired node shape. See
|
repulse.rad |
A scalar affecting the graph shape. Decrease to reduce overlapping nodes, increase to move nodes closer together. |
node.size |
A scalar affecting the size of the graph nodes. Increase to increase size. |
graph.area |
A scalar affecting the size of the graph area. Increase to increase graph area. |
vertex.label.cex |
A scalar controlling the size of the vertex label. Increase to increase size. |
edge.width.cex |
A scalar controlling the width of the graph edges. Increase to make edges wider. |
plot |
A logical indicating whether the network should be plotted. If set to false, this function will return an igraph object to be used for manual plotting. |
edge.color.ramp |
A character vector of colors. The coloring of the
network edges will be shown on a gradient, with the lower scoring edge
weights closer to the first color specified in |
node.color.ramp |
A character vector of colors. The coloring of the
network nodes will be shown on a gradient, with the lower scoring nodes
closer to the first color specified in |
plot.legend |
A boolean indicating whether a legend should be plotted. Defaults to TRUE. |
high.ld.threshold |
A numeric value between 0 and 1, indicating the r^2
threshold in complements (or unaffected siblings)
above which a pair of SNPs in the same LD block
(as specified in |
plot.margins |
A vector of length 4 passed to |
legend.title.cex |
A numeric value controlling the size of the legend titles. Defaults to 1.75. Increase to increase font size, decrease to decrease font size. |
legend.axis.cex |
A numeric value controlling the size of the legend axis labels. Defaults to 1.75. Increase to increase font size, decrease to decrease font size. |
... |
Additional arguments to be passed to |
An igraph object, if plot
is set to FALSE.
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) data(snp.annotations) set.seed(1400) # preprocess data target.snps <- c(1:3, 30:32, 60:62, 85) pp.list <- preprocess.genetic.data(case[, target.snps], father.genetic.data = dad[ , target.snps], mother.genetic.data = mom[ , target.snps], ld.block.vec = c(3, 3, 3, 1)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ target.snps, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ target.snps, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results final.results <- list(combined.res2[1:3, ], combined.res3[1:3, ]) ## compute edge scores set.seed(20) graphical.list <- compute.graphical.scores(final.results, pp.list, pval.thresh = 0.5) ## plot set.seed(10) network.plot(graphical.list, pp.list) lapply(c("tmp_2", "tmp_3"), unlink, recursive = TRUE)
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) data(snp.annotations) set.seed(1400) # preprocess data target.snps <- c(1:3, 30:32, 60:62, 85) pp.list <- preprocess.genetic.data(case[, target.snps], father.genetic.data = dad[ , target.snps], mother.genetic.data = mom[ , target.snps], ld.block.vec = c(3, 3, 3, 1)) ## run GA for observed data #observed data chromosome size 2 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 2, results.dir = 'tmp_2', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res2 <- combine.islands('tmp_2', snp.annotations[ target.snps, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) #observed data chromosome size 3 run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3, results.dir = 'tmp_3', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) combined.res3 <- combine.islands('tmp_3', snp.annotations[ target.snps, ], pp.list, 2) unlink('tmp_reg', recursive = TRUE) ## create list of results final.results <- list(combined.res2[1:3, ], combined.res3[1:3, ]) ## compute edge scores set.seed(20) graphical.list <- compute.graphical.scores(final.results, pp.list, pval.thresh = 0.5) ## plot set.seed(10) network.plot(graphical.list, pp.list) lapply(c("tmp_2", "tmp_3"), unlink, recursive = TRUE)
This function creates permuted datasets for permutation based hypothesis testing of GADGETS fitness scores.
permute.dataset( preprocessed.list, permutation.data.file.path, n.permutations = 100, bp.param = bpparam() )
permute.dataset( preprocessed.list, permutation.data.file.path, n.permutations = 100, bp.param = bpparam() )
preprocessed.list |
The output list from |
permutation.data.file.path |
If running GADGETS for GxG interactions, this argument specifies a directory where each permuted dataset will be saved on disk. If searching for GxE interactions, permuted versions of the exposure matrix will be saved to this directory. |
n.permutations |
The number of permuted datasets to create. |
bp.param |
The BPPARAM argument to be passed to bplapply.
See |
If genetic data are specified, a total of n.permutations
datasets containing pairs of case and complement data, where the observed
case/complement status has been randomly flipped or not flipped, will be
saved to permutation.data.file.path
. If exposure data are specified, a
total of n.permutations
exposure matrices, where the observed
exposures have been randomly re-assigned across the permuted 'families'.
data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) set.seed(15) perm.data.list <- permute.dataset(pp.list, "tmp_perm", n.permutations = 1) unlink("tmp_perm", recursive = TRUE)
data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) set.seed(15) perm.data.list <- permute.dataset(pp.list, "tmp_perm", n.permutations = 1) unlink("tmp_perm", recursive = TRUE)
This function performs several pre-processing steps, intended for use before function run.gadgets.
preprocess.genetic.data( case.genetic.data, complement.genetic.data = NULL, father.genetic.data = NULL, mother.genetic.data = NULL, ld.block.vec = NULL, bp.param = bpparam(), snp.sampling.probs = NULL, categorical.exposures = NULL, continuous.exposures = NULL, mother.snps = NULL, child.snps = NULL, lower.order.gxe = FALSE )
preprocess.genetic.data( case.genetic.data, complement.genetic.data = NULL, father.genetic.data = NULL, mother.genetic.data = NULL, ld.block.vec = NULL, bp.param = bpparam(), snp.sampling.probs = NULL, categorical.exposures = NULL, continuous.exposures = NULL, mother.snps = NULL, child.snps = NULL, lower.order.gxe = FALSE )
case.genetic.data |
The genetic data of the disease affected children
from case-parent trios or disease-discordant sibling pairs. If searching for
maternal SNPs that are related to risk of disease in the child, some of the
columns in |
complement.genetic.data |
A genetic dataset for the controls
corresponding to the genotypes in |
father.genetic.data |
The genetic data for the fathers of the cases in
|
mother.genetic.data |
The genetic data for the mothers of the cases in
|
ld.block.vec |
An integer vector specifying the linkage blocks of the
input SNPs. As an example, for 100 candidate SNPs, suppose we specify
|
bp.param |
The BPPARAM argument to be passed to bplapply when
estimating marginal disease associations for each SNP. If using a cluster
computer, this parameter needs to be set with care. See
|
snp.sampling.probs |
A vector indicating the sampling probabilities of
the SNPs in |
categorical.exposures |
(experimental) A matrix or data.frame of
integers corresponding to categorical exposures corresponding to the cases in
|
continuous.exposures |
(experimental) A matrix or data.frame of numeric
values representing continuous exposures corresponding to the families in
|
mother.snps |
If searching for maternal SNPs that are associated
with disease in the child, the indices of the maternal SNP columns in object
|
child.snps |
If searching for maternal SNPs that are associated
with disease in the child, the indices of the child SNP columns in object
|
lower.order.gxe |
(experimental) A boolean indicating whether, if multiple exposures of interest are input, E-GADGETS should search for only for genetic interactions with the joint combination of exposures (i.e., GxGxExE interactions), or if it should additionally search for lower-order interactions that involve subsets of the exposures that were input (i.e., GxGxE in addition to GxGxExE). The default, FALSE, restricts the search to GxGxExE interactions. Users should be cautious about including large numbers of input exposures, and, if they do, very cautious about setting this argument to TRUE. |
A list containing the following:
A matrix of case/maternal genotypes.
A matrix of complement/sibling/paternal genotypes. If running E-GADGETS, this is set to a 1x1 matrix whose single entry is 0, and not used
If running E-GADGETS, A matrix of maternal genotypes, otherwise a 1x1 matrix whose single entry is 0.0, and not used
If running E-GADGETS, A matrix of mpaternal genotypes, otherwise a 1x1 matrix whose single entry is 0.0, and not used
A vector of chi-square statistics corresponding to
marginal SNP-disease associations, if snp.sampling.probs
is not specified, and snp.sampling.probs
otherwise.
A vector eaul to cumsum(ld.block.vec)
.
A design matrix of the input categorical and continuous exposures, if specified. Otherwise NULL.
A boolean indicating whether a GxGxE search is desired.
A vector of the column indices of maternal SNPs in
case.genetic.data
, set to NULL if not applicable.
A vector of the column indices of child SNPs in
case.genetic.data
, set to NULL if not applicable.
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) res <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10))
data(case) data(dad) data(mom) case <- as.matrix(case) dad <- as.matrix(dad) mom <- as.matrix(mom) res <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10))
This function runs the GADGETS algorithm to detect multi-SNP effects in case-parent triad studies.
run.gadgets( data.list, n.chromosomes, chromosome.size, results.dir, cluster.type, registryargs = list(file.dir = NA, seed = 1500), resources = list(), cluster.template = NULL, n.workers = min(detectCores() - 2, n.islands/island.cluster.size), n.chunks = NULL, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, generations = 500, gen.same.fitness = 50, initial.sample.duplicates = FALSE, snp.sampling.type = "chisq", crossover.prop = 0.8, n.islands = 1000, island.cluster.size = 4, migration.generations = 50, n.migrations = 20, recessive.ref.prop = 0.75, recode.test.stat = 1.64, n.random.chroms = 10000, null.mean.vec = NULL, null.sd.vec = NULL )
run.gadgets( data.list, n.chromosomes, chromosome.size, results.dir, cluster.type, registryargs = list(file.dir = NA, seed = 1500), resources = list(), cluster.template = NULL, n.workers = min(detectCores() - 2, n.islands/island.cluster.size), n.chunks = NULL, n.different.snps.weight = 2, n.both.one.weight = 1, weight.function.int = 2, generations = 500, gen.same.fitness = 50, initial.sample.duplicates = FALSE, snp.sampling.type = "chisq", crossover.prop = 0.8, n.islands = 1000, island.cluster.size = 4, migration.generations = 50, n.migrations = 20, recessive.ref.prop = 0.75, recode.test.stat = 1.64, n.random.chroms = 10000, null.mean.vec = NULL, null.sd.vec = NULL )
data.list |
The output list from |
n.chromosomes |
An integer specifying the number of chromosomes to use for each island in GADGETS. |
chromosome.size |
An integer specifying the number of SNPs in each chromosome. |
results.dir |
The directory to which island results will be saved. |
cluster.type |
A character string indicating the type of cluster on which to evolve solutions in parallel. Supported options are interactive, socket, multicore, sge, slurm, lsf, openlava, or torque. See the \ documentation for package batchtools for more information. |
registryargs |
A list of the arguments to be provided to
|
resources |
A named list of key-value pairs to be substituted into the
template file. Options available are specified in
|
cluster.template |
A character string of the path to the template file
required for the cluster specified in |
n.workers |
An integer indicating the number of workers for the cluster
specified in |
n.chunks |
An integer specifying the number of chunks jobs running
island clusters should be split into when dispatching jobs using
|
n.different.snps.weight |
The number by which the number of different SNPs between a case and complement or unaffected sibling is multiplied in computing the family weights. Defaults to 2. |
n.both.one.weight |
The number by which the number of SNPs equal to 1 in both the case and complement or unaffected sibling is multiplied in computing the family weights. Defaults to 1. |
weight.function.int |
An integer used to assign family weights.
Specifically, we use |
generations |
The maximum number of generations for which GADGETS will run. Defaults to 500. |
gen.same.fitness |
The number of consecutive generations with the same fitness score required for algorithm termination. Defaults to 50. |
initial.sample.duplicates |
A logical indicating whether the same SNP can appear in more than one chromosome in the initial sample of chromosomes (the same SNP may appear in more than one chromosome thereafter, regardless). Default to FALSE. |
snp.sampling.type |
A string indicating how SNPs are to be sampled for
mutations. Options are 'chisq', 'random', or 'manual'. The 'chisq' option
takes into account the marginal association between a SNP and disease status,
with larger marginal associations corresponding to higher sampling
probabilities. The 'random' option gives each SNP the same sampling
probability regardless of marginal association. The 'manual' option should be
used when |
crossover.prop |
A numeric between 0 and 1 indicating the proportion of chromosomes to be subjected to cross over.The remaining proportion will be mutated. Defaults to 0.8. |
n.islands |
An integer indicating the number of islands to be used. Defaults to 1000. |
island.cluster.size |
An integer specifying the number of islands in a
given cluster. Must evenly divide |
migration.generations |
An integer equal to the number of generations
between migrations among islands of a distinct cluster.
Argument |
n.migrations |
The number of chromosomes that migrate among islands.
This value must be less than |
recessive.ref.prop |
The proportion to which the observed proportion of informative cases with the provisional risk genotype(s) will be compared to determine whether to recode the SNP as recessive. Defaults to 0.75. |
recode.test.stat |
For a given SNP, the minimum test statistic required to recode and recompute the fitness score using recessive coding. Defaults to 1.64. See the GADGETS paper for specific details. |
n.random.chroms |
(experimental) The number of random chromosomes used to construct a reference null mean and standard deviations vectors to compute the E-GADGETS (GxGxE) fitness score. |
null.mean.vec |
(experimental) A vector of estimated null means for each
of the components of the E-GADGETS fitness score. This needs to be specified
if running permutes under the no-GxE null, and should be set to the values in
the "null.mean" element of the "null.mean.sd.info.rds" file stored in the
|
null.sd.vec |
A vector of estimated null standard deviations for the
components of the E-GADGETS fitness score. See argument |
For each island, a list of two elements will be written to
results.dir
:
A data.table of the final generation
chromosomes, their fitness scores, and, for GADGETS, additional information
pertaining to nominated risk-related genotypes. See the package vignette for
an example and the documentation for chrom.fitness.score
for
additional details.
The total number of generations run.
data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) run.gadgets(pp.list, n.chromosomes = 4, chromosome.size = 3, results.dir = 'tmp', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) unlink('tmp_bm', recursive = TRUE) unlink('tmp', recursive = TRUE) unlink('tmp_reg', recursive = TRUE)
data(case) case <- as.matrix(case) data(dad) dad <- as.matrix(dad) data(mom) mom <- as.matrix(mom) pp.list <- preprocess.genetic.data(case[, 1:10], father.genetic.data = dad[ , 1:10], mother.genetic.data = mom[ , 1:10], ld.block.vec = c(10)) run.gadgets(pp.list, n.chromosomes = 4, chromosome.size = 3, results.dir = 'tmp', cluster.type = 'interactive', registryargs = list(file.dir = 'tmp_reg', seed = 1500), generations = 2, n.islands = 2, island.cluster.size = 1, n.migrations = 0) unlink('tmp_bm', recursive = TRUE) unlink('tmp', recursive = TRUE) unlink('tmp_reg', recursive = TRUE)
A data.frame containing the RSID, REF allele and ALT allele
for each SNP in the example datasets case
, mom
, and dad
.
The SNPs are in the same order as they appear in the example dataset columns.
data(snp.annotations)
data(snp.annotations)
A data frame with 100 rows and 3 variables
A data.frame containing the RSID, REF allele and ALT allele
for each SNP in the example datasets case.mci
, mom.mci
,
dad.mci
, case.gxe
, mom.gxe
, and dad.gxe
.
The SNPs are in the same order as they appear in the example dataset columns.
data(snp.annotations.mci)
data(snp.annotations.mci)
A matrix with 24 rows and 3 variables