Title: | Testing of SNPs and SNP Interactions in Case-Parent Trio Studies |
---|---|
Description: | Testing SNPs and SNP interactions with a genotypic TDT. This package furthermore contains functions for computing pairwise values of LD measures and for identifying LD blocks, as well as functions for setting up matched case pseudo-control genotype data for case-parent trios in order to run trio logic regression, for imputing missing genotypes in trios, for simulating case-parent trios with disease risk dependent on SNP interaction, and for power and sample size calculation in trio data. |
Authors: | Holger Schwender, Qing Li, Philipp Berger, Christoph Neumann, Margaret Taub, Ingo Ruczinski |
Maintainer: | Holger Schwender <[email protected]> |
License: | LGPL-2 |
Version: | 3.45.0 |
Built: | 2024-10-31 06:28:32 UTC |
Source: | https://github.com/bioc/trio |
Performs the allelic Transmission/Disequilibrium Test for each SNP contained in a genotype matrix.
allelicTDT(mat.snp, size = 50, correct = FALSE) ## S3 method for class 'aTDT' print(x, top = 5, digits = 4, ...)
allelicTDT(mat.snp, size = 50, correct = FALSE) ## S3 method for class 'aTDT' print(x, top = 5, digits = 4, ...)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
size |
the number of SNPs considered simultaneously when computing the parameter estimates. |
correct |
should the test statistic be continuity corrected? If |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of interactions that should be printed. If |
... |
ignored. |
An object of class aTDT
containing the following numeric vectors:
stat |
values of the test statistic of the allelic TDT, |
pval |
the corresponding p-values. |
Holger Schwender, [email protected]
Spielman, R.S., McGinnis, R.E., and Ewens, W.J. (1993). Trsnmmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-Dependent Diabetes Mellitus (IDDM). American Journal of Human Genetics, 52, 506-516.
# Load the simulated data for the analysis. data(trio.data) # Perform an allelic TDT a.out <- allelicTDT(mat.test) # By default, the top 5 SNPs are shown. # Another number of SNPs, e.g., 10, are displayed by print(a.out, top=10) # If the results for all SNPs should be shown (or returned), use print(a.out, top=0)
# Load the simulated data for the analysis. data(trio.data) # Perform an allelic TDT a.out <- allelicTDT(mat.test) # By default, the top 5 SNPs are shown. # Another number of SNPs, e.g., 10, are displayed by print(a.out, top=10) # If the results for all SNPs should be shown (or returned), use print(a.out, top=0)
Performs the Expectation Maximimization Likelihood Ratio Test (EMLRT) proposed by Weinberg (1999) for each SNP in a matrix in genotype format.
colEMlrt(mat.snp, model = c("general", "dominant", "recessive"), maternal = FALSE, parentMissing = c("father", "mother", "either"), iter = 40, eps = 10^-16) ## S3 method for class 'colEMlrt' print(x, top = 5, digits = 4, ...)
colEMlrt(mat.snp, model = c("general", "dominant", "recessive"), maternal = FALSE, parentMissing = c("father", "mother", "either"), iter = 40, eps = 10^-16) ## S3 method for class 'colEMlrt' print(x, top = 5, digits = 4, ...)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
model |
a character string specifying the genetic effect that should be considered in the Poisson regression.
By default, the general model proposed by Weinberg (1999) is fitted.
Alternatively, a dominant or recessive mode of inheritance might be considered by setting |
maternal |
logical specifying whether parameters for a maternal effects should be added to the Poisson regression model.
If |
parentMissing |
a character string specifying whether the genotype of the |
iter |
a non-negative numeric value specifying the maximum number of iterations that should be used in the application of the expectation maximization algorithm. |
eps |
a non-negative small numeric value specifying the accuracy for the stopping criterion of the EM algorithm.
If the sum of the squared differences of the estimated expected numbers of trios over the 15 possible genotype combinations
in trios between two consecutive iterations of the EM algorithm is smaller than |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of SNPs that should be printed. If |
... |
ignored. |
While in functions such as colTDT
all trios in which the genotype of one or more of the members of this trio
is missing for a particular SNP are removed from the analysis of this SNP, the procedure proposed by Weinberg (1999) can handle missing
genotypes by employing an expectation maximization (EM) algorithm to estimate the expected numbers of
trios for the 15 different genotype combinations
possible in a trio (when considering the genotypes of mothers and fathers individually) and a likelihood ratio test based
on two nested Poisson regression models using the estimated expected numbers of trios as outcome.
An object of class colEMlrt
consisting of the following numeric vectors:
stat |
the values of the test statistic of the likelihood ratio test for all SNPs in |
pval |
the corresponding p-values, |
ll.full |
the values of the maximized likelihoods of the full models, |
ll , red
|
the values of the maximied likelihoods of the reduced models not containing the
parameter(s) corresponding to the genetic |
Philipp Berger, [email protected]
Weinberg, C.R. (1999). Allowing for Missing Parents in Genetic Studies of Case-Parent Triads. American Journal of Human Genetics, 64, 1186-1193.
# Load the simulated data. data(trio.data) # The EM Likelihood Ratio Test can be applied # to the SNPs in mat.test by em.out <- colEMlrt(mat.test) # If a dominant mode of inheritance is of interest, # the corresponding EM Likelihood Ratio Test can be # performed by emd.out <- colEMlrt(mat.test, model="dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(em.out, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(em.out, top = 0)
# Load the simulated data. data(trio.data) # The EM Likelihood Ratio Test can be applied # to the SNPs in mat.test by em.out <- colEMlrt(mat.test) # If a dominant mode of inheritance is of interest, # the corresponding EM Likelihood Ratio Test can be # performed by emd.out <- colEMlrt(mat.test, model="dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(em.out, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(em.out, top = 0)
Performs a genotypic TDT for gene-environment interactions for each SNP represented by a column of a matrix in genotype format
and a binary environmental factor. If alpha1
is set to a value smaller than 1, then the two-step procedure of
Gauderman et al. (2010) will be used to first select all SNPs showing a p-value smaller than alpha1
in a logistic regression of the environmental factor against the sums of the codings for the parents' genotypes at the respective
SNP. In the second step, the genotypic TDT is then applied to the selected SNPs.
If unstructured = TRUE
, all fully parameterized model is considered and a likelihood ratio test is performed.
While colGxE
computes the p-values based on asymptotic ChiSquare-distributions,
colGxEPerms
can be used to determine permutation-based p-values for the basic genotypic TDT (i.e. for colGxE
using alpha = 1
and unstructured = FALSE
.
colGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), alpha1 = 1, size = 50, addGandE = TRUE, whichLRT = c("both", "2df", "1df", "none"), add2df = TRUE, addCov = FALSE, famid = NULL, unstructured = FALSE) colGxEPerms(mat.snp, env, model = c("additive", "dominant", "recessive"), B = 10000, size = 20, addPerms = TRUE, famid = NULL, rand = NA)
colGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), alpha1 = 1, size = 50, addGandE = TRUE, whichLRT = c("both", "2df", "1df", "none"), add2df = TRUE, addCov = FALSE, famid = NULL, unstructured = FALSE) colGxEPerms(mat.snp, env, model = c("additive", "dominant", "recessive"), B = 10000, size = 20, addPerms = TRUE, famid = NULL, rand = NA)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
env |
a vector of length |
model |
type of model that should be fitted. Abbreviations are allowed. Thus, e.g., |
alpha1 |
a numeric value between 0 and 1 (excluding 0). If |
size |
the number of SNPs considered simultaneously when computing the parameter estimates. |
addGandE |
should the relative risks and their confidence intervals for the exposed cases be added to the output? |
whichLRT |
character string specifying which likelihood ratio test should be added to the output. If |
add2df |
should the results of a 2 df Wald test for testing both the SNP and the interaction effect simultaneously be added to the model? |
addCov |
should the covariance between the parameter estimations for the SNP and the gene-environment interaction be added
to the output? Default is |
famid |
a vector of the same length as |
unstructured |
should a fully parameterized model be fitted? If |
B |
number of permutations. |
addPerms |
should the matrices containing the permuted values of the test statistics for the SNP and the gene-environment interaction be added to the output? |
rand |
integer for setting the random number generator into a reproducible state. |
A conditional logistic regression model including two parameters, one for , and the other for
, is fitted, where
is specified according to
model
.
For colGxE
with unstructured=FALSE
, an object of class colGxE
consisting of the following numeric matrices with two columns (one for each parameter):
coef |
the estimated parameter, |
se |
the estimated standard deviation of the parameter estimate, |
stat |
Wald statistic, |
RR |
the relative risk, i.e.\ in the case of trio data, |
lowerRR |
the lower bound of the 95% confidence interval for |
upperRR |
the upper bound of the 95% confidence interval for |
usedTrios |
the number of trios affecting the parameter estimation, |
env |
vector containing the values of the environmental factor, |
type |
|
addGandE |
the value of |
addOther |
a logical vector specifying which of the likelihood ratio tests and if the 2 df Wald test was performed, |
and depending on the specifications in colGxE
cov |
numeric vector containing the covariances, |
lrt2df |
a numeric matrix with two columns, in which the first column contains the values of the 1 df likelihood ratio test statistic and the second the corresponding p-values, |
wald2df |
a numeric matrix with two columns, in which the first column contains the values of the 2 df Wald test statistics and the second the corresponding p-values, |
lrt1df |
a numeric matrix with two columns, in which the first column contains the values of the 2 df likelihood ratio test statistic and the seocnd the corresponding p-values. |
For colGxE
with unstructured=TRUE
, an object of class colGxEunstruct
consisting of the following vectors:
ll.main |
the loglikelihoods of the models containing only the two main effects, |
ll.full |
the loglikelihoods of the models additionally containing the two main effects and the two interaction effects, |
stat |
the values of the test statistic of the likelihood ratio test, |
pval |
the corresponding p-values. |
For colGxEPerms
,
stat |
a matrix with two columns containing the values of gTDT statistics for the main effects of the SNPs and the gene-environment interactions when considering the original, unpermuted case-pseudo-control status, |
pval |
a matrix with two columns comprising the permutation-based p-values corresponding to the test statistics in |
and if addPerms = TRUE
matPermG |
a matrix with |
matPermGxE |
a matrix with |
Holger Schwender, [email protected]
Gauderman, W.J., Thomas, D.C., Murcray, C.E., Conti, D., Li, D., and Lewinger, J.P. (2010). Efficient Genome-Wide Association Testing of Gene-Environment Interaction in Case-Parent Trios. American Journal of Epidemiology, 172, 116-122.
Schaid, D.J. (1996). General Score Tests for Associations of Genetic Markers with Disease Using Cases and Their Parents. Genetic Epidemiology, 13, 423-449.
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
# Load the simulated data for the analysis. data(trio.data) # Set up a vector with the binary environmental variable. # Here, we consider the gene-gender interactions and # assume that the children in the first 50 trios are # girls, and the remaining 50 are boys. sex <- rep(0:1, each = 50) # Test the interaction of sex with each of the SNPs in mat.test gxe.out <- colGxE(mat.test, sex) # By default, an additive mode of inheritance is considered. # If, e.g., a dominant mode should be considered, then this can # be done by calling gxeDom.out <- colGxE(mat.test, sex, model="dominant")
# Load the simulated data for the analysis. data(trio.data) # Set up a vector with the binary environmental variable. # Here, we consider the gene-gender interactions and # assume that the children in the first 50 trios are # girls, and the remaining 50 are boys. sex <- rep(0:1, each = 50) # Test the interaction of sex with each of the SNPs in mat.test gxe.out <- colGxE(mat.test, sex) # By default, an additive mode of inheritance is considered. # If, e.g., a dominant mode should be considered, then this can # be done by calling gxeDom.out <- colGxE(mat.test, sex, model="dominant")
Computes the original and permuted values of the test statistic of the gTDT test as proposed by Cordell (2002) for each interaction between the pairs of SNPs in mat.snp.
colGxGPerms(mat.snp, n.perm = 1000, genes = NULL, col.out = NULL, warnError = TRUE, verbose = TRUE, rand = NA)
colGxGPerms(mat.snp, n.perm = 1000, genes = NULL, col.out = NULL, warnError = TRUE, verbose = TRUE, rand = NA)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
n.perm |
number of permutations of the response for which the permuted values of the test statistic should be computed. |
genes |
a character vector containing the names of the genes to which the SNPs belong. If specified, only the two-way interactions
between SNPs from different genes are tested. If |
col.out |
the output of |
warnError |
logical indicating whether the statistics for the gTDT should be returned as |
verbose |
logical indicating whether some information on what is currently computed should be printed. |
rand |
numeric value. If specified, the random number generator is set into a reproducible state. |
A list consisting of
stat |
a numeric vector containing the original values of the test statistic, |
permStat |
a numeric matrix containing the permuted values of the test statistic, |
y.perm |
a matrix containing the permutations of the response. |
Holger Schwender, [email protected]
Cordell, H. J. (2002). Epistasis: What it Means, what it Doesn't mean, and Statistical Methods to Detect it in Humans. Human Molecular Genetics, 11, 2463-2468.
# Load the simulated data. data(trio.data) # Cordell's LRT for all pairs of SNPs in mat.test can be performed # and the values of the LRT statistic for 10 permutations of the # case-pseudo-controls status can be computed by gxg <- colGxGPerms(mat.test, n.perm = 10) # where we here consider only 10 permutations to keep the computing # time of this example small. Usually, at least a few thousand # permutations should be considered.
# Load the simulated data. data(trio.data) # Cordell's LRT for all pairs of SNPs in mat.test can be performed # and the values of the LRT statistic for 10 permutations of the # case-pseudo-controls status can be computed by gxg <- colGxGPerms(mat.test, n.perm = 10) # where we here consider only 10 permutations to keep the computing # time of this example small. Usually, at least a few thousand # permutations should be considered.
Computes the test statistics and the corresponding p-values either for the Parent-of-Origin Likelihood Ratio Test proposed by Weinberg (1999) or the Transmission Asymmetry Test proposed by Weinberg et al. (1998).
colPOlrt(mat.snp, size = 20) colTAT(mat.snp, stratified = FALSE, size = 50, bothHet = 0) ## S3 method for class 'polrt' print(x, top = 5, digits = 4, ...) ## S3 method for class 'tat' print(x, top = 5, digits = 4, ...)
colPOlrt(mat.snp, size = 20) colTAT(mat.snp, stratified = FALSE, size = 50, bothHet = 0) ## S3 method for class 'polrt' print(x, top = 5, digits = 4, ...) ## S3 method for class 'tat' print(x, top = 5, digits = 4, ...)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
size |
the number of SNPs considered simultaneously when computing the test statistics. |
stratified |
should also test statistics and p-values stratified by paternal and maternal transmission be computed? |
bothHet |
a numeric value between 0 and 1 specifying how trios in which both parents are heterozygous are weighted in
determination of the TAT statistic. By default, such trios are ignored (as proposed by Weinberg, 1999). If |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of interactions that should be printed. If |
... |
ignored. |
For colPOlrt
, an object of class polrt
consisting of the following numeric vectors:
stat |
the values of the test statistic of the likelihood ratio test for all SNPs in |
pval |
the corresponding p-values, |
full |
the values of the maximized likelihoods of the full models containing also a parameter for the parent-of-origin effect, |
red |
the values of the maximied likelihoods of the reduced models not containing this parameter. |
For colTAT
, an object of class tat
consisting of the following numeric vectors:
stat |
the values of the test statistic of transmission asymmetry test for all SNPs in |
pval |
the corresponding p-values, |
usedTrios |
the number of trios affecting the determination of the TAT statistic, |
and if stratified = TRUE
matStrat |
a matrix with four columns containing the number of minor alleles transmitted and not-transmitted by heterozygous fathers and mothers, |
statPaternal |
a numeric vector containing for each SNP the value of the test statistic for testing whether the numbers of paternal transmissions and non-transmissions differ, |
pvalPaternal |
the p-values corresponding to |
statMaternal |
a numeric vector containing for each SNP the value of the test statistic for testing whether the numbers of maternal transmissions and non-transmissions differ, |
pvalMaternal |
the p-values corresponding to |
Holger Schwender, [email protected]
Weinberg, C.R.,Wilcox, A.J., and Lie, R.T. (1998). A Log-Linear Approach to Case-Parent-??Triad Data: Assessing Effects of Disease Genes that act Either Directly or Through Maternal Effects and that may be Subject to Parental Imprinting. American Journal of Human Genetics, 62, 969-978.
Weinberg, C.R. (1999). Methods for Detection of Parent-of-Origin Effects in Genetic Studies of Case-Parents Triads. American Journal of Human Genetics, 65, 229-235.
# Load the simulated data. data(trio.data) # The Parent-of-Origin Likelihood Ratio Test can be applied # to the SNPs in mat.test by po.out <- colPOlrt(mat.test) # The Transmission Asymmetry Test can be applid to the SNPs # in mat.test by tat.out <- colTAT(mat.test) # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(po.out, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(po.out, top = 0)
# Load the simulated data. data(trio.data) # The Parent-of-Origin Likelihood Ratio Test can be applied # to the SNPs in mat.test by po.out <- colPOlrt(mat.test) # The Transmission Asymmetry Test can be applid to the SNPs # in mat.test by tat.out <- colTAT(mat.test) # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(po.out, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(po.out, top = 0)
Computes the maximum over the gTDT statistics for an additive, dominant, and recessive model.
colTDTmaxTest
additionally computes permutation-based p-values.
colTDTmaxTest(geno, perm = 10000, size = 50, chunk = 10000, minimum = 0.001, verbose = FALSE) colTDTmaxStat(geno, size = 50) ## S3 method for class 'maxTestTrio' print(x, top = 5, digits = 4, ...) ## S3 method for class 'maxStatTrio' print(x, top = 5, digits = 4, ...)
colTDTmaxTest(geno, perm = 10000, size = 50, chunk = 10000, minimum = 0.001, verbose = FALSE) colTDTmaxStat(geno, size = 50) ## S3 method for class 'maxTestTrio' print(x, top = 5, digits = 4, ...) ## S3 method for class 'maxStatTrio' print(x, top = 5, digits = 4, ...)
geno |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
perm |
number of permutations of the response for which the permuted values of the test statistic should be computed. |
size |
number of SNPs that should be considered simultaneously when estimating the parameter. |
chunk |
number of permutations that should be considered simultaneously in the computation of the p-values |
minimum |
minimum value that a test statistic must show that for the corresponding SNP the p-value is computed. |
verbose |
logical indicating whether some information on what is currently computed should be printed. |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of interactions that should be printed. If the number of interactions is smaller than or equal to
|
... |
ignored. |
For colTDTmaxStat
, an object of class maxStatTrio
consisting of a vector stat
containing the values of the Max
statistic for the SNPs in geno
, a matrix max.stat
containing the values of the gTDT statistic for testing an additive, a dominant,
and a recessive effect, and additional information required by colTDTmaxTest
.
For colTDTmaxTest
, an object of class maxTestTrio
consisting of stat
, max.stat
, and the unadjusted
p-values pval
corresponding to stat
.
Holger Schwender, [email protected]
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
# Load the simulated data. data(trio.data) # Perform a MAX test by only computing the MAX statistics. max.out <- colTDTmaxStat(mat.test) # Permutation-based p-values are additionally computed when using max.out2 <- colTDTmaxTest(mat.test)
# Load the simulated data. data(trio.data) # Perform a MAX test by only computing the MAX statistics. max.out <- colTDTmaxStat(mat.test) # Permutation-based p-values are additionally computed when using max.out2 <- colTDTmaxTest(mat.test)
Performs a Significance Analysis of Microarrays (SAM; Tusher et al., 2001) or an Empirical Bayes Analysis of Microarrays (EBAM; Efron et al., 2001), respectively, based on the genotypic transmission/disequilibrium test statistic.
colTDTsam(mat.snp, model = c("additive", "dominant", "recessive", "max"), approx = NULL, B = 1000, size = 10, chunk = 100, rand = NA) colTDTebam(mat.snp, model = c("additive", "dominant", "recessive", "max"), approx = NULL, B = 1000, size = 10, chunk = 100, n.interval = NULL, df.ratio = 3, df.dens = 3, knots.mode = TRUE, type.nclass = c("wand", "FD", "scott"), fast = FALSE, rand = NA)
colTDTsam(mat.snp, model = c("additive", "dominant", "recessive", "max"), approx = NULL, B = 1000, size = 10, chunk = 100, rand = NA) colTDTebam(mat.snp, model = c("additive", "dominant", "recessive", "max"), approx = NULL, B = 1000, size = 10, chunk = 100, n.interval = NULL, df.ratio = 3, df.dens = 3, knots.mode = TRUE, type.nclass = c("wand", "FD", "scott"), fast = FALSE, rand = NA)
mat.snp |
a matrix in genotype format, i.e. a numeric matrix in which each column is
a vector of length |
model |
type of genetic mode of inheritance that should be considered. Either |
approx |
logical specifying whether the null distribution should be approximated by a |
B |
number of permutations used in the estimation of the null distribution, and thus, the computation of the null statistics.
Ignored if |
size |
number of SNPs considered simultaneously when computing the gTDT statistics. |
chunk |
number of permutations considered simultaneously in the permutation procedure. |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio of the null density to the density of the observed
gTDT values in an EBAM analysis (if |
df.ratio |
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations for estimating the ratio of the null
density to the density of the observed gTDT values in an EBAM analysis. Only used when |
df.dens |
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed gTDT values in an EBAM analysis.
Only used when |
knots.mode |
logical specifying whether the |
type.nclass |
character string specifying the procedure used to estimate the
number of intervals of the histogram used in the logistic regression with repeated observations or the Poisson regression,
respectively (see |
fast |
logical specifying whether a crude estimate for the number of permuted test scores larger than the respective
observed gTDT value should be used. If |
rand |
numeric value. If specified, i.e. not |
The output of colTDTsam
or colTDTebam
is an object of class SAM
or EBAM
, respectively. All the
features implemented in the R
package siggenes
for an SAM or EBAM analysis, respectively, can therefore be
used in the SAM or EBAM analysis of case-parent trio data implemented in colTDTsam
or colTDTebam
, respectively.
For details, see sam
or ebam
, respectively.
Holger Schwender, [email protected]
Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, Journal of the American Statistical Association, 96, 1151-1160.
Schwender, H. and Ickstadt, K. (2008). Empirical Bayes Analysis of Single Nucleotide Polymorphisms. BMC Bioinformatics, 9, 144.
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. Proceedings of the National Academy of Science of the United States of America, 98, 5116-5121.
colTDT
, colTDTmaxStat
, sam
, ebam
,
SAM-class
, EBAM-class
# Load the simulated data. data(trio.data) # Perform a Significance Analysis of Microarrays (SAM). sam.out <- colTDTsam(mat.test) # By default an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by samDom.out <- colTDTsam(mat.test, model="dominant") # Analogously, an Empirical Bayes Analysis of Microarrays based # on the genotypic TDT can be performed by ebam.out <- colTDTebam(mat.test)
# Load the simulated data. data(trio.data) # Perform a Significance Analysis of Microarrays (SAM). sam.out <- colTDTsam(mat.test) # By default an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by samDom.out <- colTDTsam(mat.test, model="dominant") # Analogously, an Empirical Bayes Analysis of Microarrays based # on the genotypic TDT can be performed by ebam.out <- colTDTebam(mat.test)
Finds LD blocks using the procedure proposed by Gabriel et al. (2002).
findLDblocks(x, alpha = 0.1, ciLD = c(0.7, 0.98), cuRecomb = 0.9, ratio = 9, alsoOthers = FALSE, parentsOnly = FALSE, iter = 50, snp.in.col = TRUE) splitBlocks(blocks)
findLDblocks(x, alpha = 0.1, ciLD = c(0.7, 0.98), cuRecomb = 0.9, ratio = 9, alsoOthers = FALSE, parentsOnly = FALSE, iter = 50, snp.in.col = TRUE) splitBlocks(blocks)
x |
either the output of |
alpha |
numeric value between 0 and 1. For each pair of SNPs, a two-sided
100 * (1 - |
ciLD |
numeric vector consisting of two values between 0 and 1. If the
lower bound of the confidence interval of D' for a SNP pair is larger than
or equal to the first value in |
cuRecomb |
numeric value between 0 and 1. If the upper bound of the confidence
interval of D' for a SNP pair is smaller than |
ratio |
numeric value larger than 1. If in a block of SNPs, the ratio of the
number of SNP pairs being in strong LD to the number of SNPs showing evidence
of recombination is larger than or equal to |
alsoOthers |
logical value. Following the description of Wall and Pritchard (2003)
the endmarkers of a LD block must be in strong LD. By default (i.e.\ if |
parentsOnly |
logical indicating whether only the genotypes of the parents, i.e.\ rows 1, 2,
4, 5, ... of |
iter |
integer specifying the number of iterations used in the computation of D (for details,
see |
snp.in.col |
logical specifying whether each column of |
blocks |
output of |
The LD-blocks are estimated using the method of Gabriel et al. (2002) as described in Wall and Pritchard (2003), where we use the approximate variance estimates of D' proposed by Zabaleta et al. (1997).
Since in trio.prepare
the LD blocks are restricted to a maximum of 7 SNPs, splitBlocks
can be used to split LD blocks composed of more than 7 SNPs into smaller blocks, if the output of findLDblocks
should be used in trio.prepare
to prepare a matrix for a trioLR
or trioFS
analysis.
An object of class LDblocks
consisting of
ld |
the output of |
blocks |
a vector specifying which SNP belongs to which LD-block, |
vec.blocks |
a list in which each entry contains the names of the SNPs belonging to a specific LD-block, |
param |
a list of the input parameters. |
Holger Schwender, [email protected]
Gabriel, S.B. et al. (2002). The Structure of Haplotype Blocks in the Human Genome. Science, 296, 2225-2229.
Wall, J.D. and Pritchard J.K. (2003). Assessing the Performance of the Haplotype Block Model of Linkage Disequilibrium. American Journal of Human Genetics, 73, 502-515.
Zapata, C., Alvarez, G., and Carollo, C. (1997). Approximate Variance of the Standardized Measure of Gametic Disequilibrium D'. American Journal of Human Genetics, 61, 771-774.
# Load the simulated data. data(trio.data) # Estimate LD blocks. blocks <- findLDblocks(LDdata) # Alternatively, the LD blocks can be estimated by ld.out <- getLD(LDdata, addVarN=TRUE) blocks2 <- findLDblocks(ld.out)
# Load the simulated data. data(trio.data) # Estimate LD blocks. blocks <- findLDblocks(LDdata) # Alternatively, the LD blocks can be estimated by ld.out <- getLD(LDdata, addVarN=TRUE) blocks2 <- findLDblocks(ld.out)
While getLD
computes the value of D' and r^2 for each pair of SNPs in a matrix,
getLDlarge
determines D' and r^2 between each SNP and a user-specified number of SNPs
closest to the SNP on the corresponding chromosome. Thus, getLDlarge
can be applied
to much more SNPs than getLD
.
getLD(x, which = c("both", "rSquare", "Dprime"), parentsOnly = FALSE, iter = 50, snp.in.col = TRUE, asMatrix = FALSE, addVarN = FALSE) getLDlarge(x, neighbors=25, which=c("both", "rSquare", "Dprime"), parentsOnly=FALSE, iter=50, snp.in.col=TRUE, addVarN=FALSE)
getLD(x, which = c("both", "rSquare", "Dprime"), parentsOnly = FALSE, iter = 50, snp.in.col = TRUE, asMatrix = FALSE, addVarN = FALSE) getLDlarge(x, neighbors=25, which=c("both", "rSquare", "Dprime"), parentsOnly=FALSE, iter=50, snp.in.col=TRUE, addVarN=FALSE)
x |
a numeric matrix consisting of 0, 1, and 2, where it is assumed that
the values represent the numbers of minor alleles that the SNPs show. Missing values
are allowed. By default, each column represents a SNP and each row a subject.
This can be changed by setting |
neighbors |
positive integer specifying the number of neighbors of a SNP (in both directions) on a chromosome
for which D' or r^2 should be computed. Thus, for each SNP (except for the SNPs in the first and last |
which |
which LD measures should be computed? Either |
parentsOnly |
logical indicating whether only the genotypes of the parents, i.e.\ rows 1, 2,
4, 5, ... of |
iter |
integer specifying how many iterations are used in the procedure of Hill (1974) which is used to estimate D. |
snp.in.col |
logical indicating whether each column of |
asMatrix |
logical indicating whether the LD values are returned as a |
addVarN |
logical indicating whether for each pair of SNPs the number of non-missing values
and the variance estimates of D' proposed by Zabaleta et al. (1997) should be added to the output. The
variance estimates are required for the identification of LD-blocks with |
An object of class getLD
or getLDlarge
consisting (depending of the specification of which
) the
D' (Dprime
) or r^2 (rSquare
) values for each SNP pair, and (depending of the specification
of addVarN
) the variance estimates for D' (varDprime
) and the numbers of non-missing values
(n
). Furthermore, the names of the SNPs (rn
) will be added (in getLD
, if asMatrix = FALSE
).
Holger Schwender, [email protected]
Hill, W.O. (1974). Estimation of Linkage Disequilibrium in Randomly Mating Populations. Heredity, 33, 229-239.
Zapata, C., Alvarez, G., and Carollo, C. (1997). Approximate Variance of the Standardized Measure of Gametic Disequilibrium D'. American Journal of Human Genetics, 61, 771-774.
# Load the simulated data. data(trio.data) # The values of Dprime and Rsquare for each pair of SNPs # in LDdata can be computed by ld.out <- getLD(LDdata) # By default, the LD measures are returned as a vector. # If they should be returned as a matrix, then use ld.out2 <- getLD(LDdata, asMatrix = TRUE)
# Load the simulated data. data(trio.data) # The values of Dprime and Rsquare for each pair of SNPs # in LDdata can be computed by ld.out <- getLD(LDdata) # By default, the LD measures are returned as a vector. # If they should be returned as a matrix, then use ld.out2 <- getLD(LDdata, asMatrix = TRUE)
Generates a matrix containing the genotypes of the cases and the corresponding three pseudo-controls (i.e. the genotypes of the children and the respective corresponding three genotypes not transmitted from the parents).
getMatPseudo(mat.snp)
getMatPseudo(mat.snp)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
A matrix with rows, in which each block of four consecutive rows consists of the genotypes of the SNPs in
mat.snp
for the case and the three matched pseudo-controls corresponding to the respective block in mat.snp
.
Holger Schwender, [email protected]
# Load the simulated data. data(trio.data) # The matrix with the genotypes of the offspring and the three # pseudo-controls for each of the trios in mat.test can be # generated by matPseudo <- getMatPseudo(mat.test)
# Load the simulated data. data(trio.data) # The matrix with the genotypes of the offspring and the three # pseudo-controls for each of the trios in mat.test can be # generated by matPseudo <- getMatPseudo(mat.test)
Specifies the control parameters for the search algorithms (i.e. either simulated annealing or MCMC) and the logic tree considered when fitting a trio logic regression model.
lrControl(start = 0, end = 0, iter = 0, earlyout = 0, update = 0, treesize = 8, opers = 1, minmass = 0, nburn = 1000, hyperpars = 0, output = 4)
lrControl(start = 0, end = 0, iter = 0, earlyout = 0, update = 0, treesize = 8, opers = 1, minmass = 0, nburn = 1000, hyperpars = 0, output = 4)
start |
a numeric value specifying the upper temperature (on log10 scale) used as start temperature in simulated annealing.
Must be larger than |
end |
a numeric value specifying the lowest temperature (on log10 scale) used in simulated annealing. Must be smaller than
|
iter |
the number of iterations used in the (stochastic) search for the best trio logic regression model, i.e. either in simulated
annealing (if the argument |
earlyout |
a non-negative integer providing an option to end the search before all |
update |
the number of iterations in simulated annealing or MCMC after which statistics for the current trio logic regression model are displayed. This argument allows to evaluate the progress in the search for the best trio logic regression model. By default, no updates are shown. |
treesize |
a positive integer specifying the maximum number of leaves allowed in the logic tree of a trio logic regression model. |
opers |
either 1, 2, or 3 specifying if both the AND and the OR operator ( |
minmass |
a non-negative integer specifying the number of cases and pseudo-controls for which the logic expression (i.e. the logic tree)
needs to be 1 or for which the logic expression needs to be 0 to be considered as a logic tree in the trio logic regression model.
By default, |
nburn |
number of initial iterations in MCMC considered as burn-in MC trio logic regression, and therefore, ignored when computing the summaries. |
hyperpars |
a numeric value specifying the hyperparameter for the prior on the model size when performing a MC trio logic regression.
More exactly, |
output |
a value specifying which statistics are returned in an MCMC trio logic regression analysis. If |
More details on the different control parameters and their specification can be found on the help pages of the functions
logreg.anneal.control
, logreg.tree.control
, and logreg.mc.control
for the different
types of control parameters available in the R
package LogicReg
for a standard logic regressions.
A list containing all required control parameters.
Holger Schwender, [email protected]
# The default values for the parameters in trio logic regression # can be specified by myControl <- lrControl() # If the starting temperature of Simulated Annealing should be set # to 100 and the lowest temperature to 0.001, then this can be done by myControl2 <- lrControl(start = 2, end = -3)
# The default values for the parameters in trio logic regression # can be specified by myControl <- lrControl() # If the starting temperature of Simulated Annealing should be set # to 100 and the lowest temperature to 0.001, then this can be done by myControl2 <- lrControl(start = 2, end = -3)
Transforms a ped-file into a genotype file as required by, e.g., the functions for computing the genotypic TDT.
ped2geno(ped, snpnames = NULL, coded = c("12", "AB", "ATCG", "1234"), naVal = 0, cols4ID = FALSE)
ped2geno(ped, snpnames = NULL, coded = c("12", "AB", "ATCG", "1234"), naVal = 0, cols4ID = FALSE)
ped |
a data frame in ped format, i.e. the first six columns must contain information on the families as typically
presenteed in ped files, where
the column names of these six columns must be "famid", "pid", "fatid", "motid", "sex","affected". The last two
of these six columns are ignored. The IDs of individuals in the second column must be unique (not only within the family,
but among all individuals). The columns following the six columns are assumed to contain the alleles of the SNPs, where
the alleles are coded using the letters/numbers in |
snpnames |
a character vector containing the names of the SNPs. If not specified, generic names are assigned (i.e.
|
coded |
the coding used for the alleles of the SNPs. |
naVal |
the value used for specifying missing values. |
cols4ID |
logical indicating whether columns should be added to output matrix containing the family ID and the individual ID.
If |
A vector (if ped
consists of alleles for one SNP) or matrix (otherwise)
containing one column for each SNP representing the genotypes of the respective SNP, where the genotypes are coded
by 0, 1, 2 (i.e. the number of minor alleles), and missing values are represented by NA
. The vector or matrix contains
values for each SNP genotyped at the
trios, where each block of 3 values is composed of the genotypes of the father,
the mother, and the offspring (in this order) of a specific trio. If data for a family with more than one children are available,
each of the children is treated as a separate trio.
Holger Schwender, [email protected]
## Not run: # Assuming there is a ped-file called pedfile.ped in the # R working directory, this file can be read into R by ped <- read.pedfile("pedfile.ped") # The resulting data frame is in the typical ped format # which needs to be transformed into the genotype format # for applications of most of the functions in the trio # package. This transformation can be done by geno <- ped2geno(ped) # This transformation can also be done directly when # reading the ped-file into R by geno2 <- read.pedfile("pedfile.ped", p2g = TRUE) ## End(Not run)
## Not run: # Assuming there is a ped-file called pedfile.ped in the # R working directory, this file can be read into R by ped <- read.pedfile("pedfile.ped") # The resulting data frame is in the typical ped format # which needs to be transformed into the genotype format # for applications of most of the functions in the trio # package. This transformation can be done by geno <- ped2geno(ped) # This transformation can also be done directly when # reading the ped-file into R by geno2 <- read.pedfile("pedfile.ped", p2g = TRUE) ## End(Not run)
Plots either the pairwise r^2 or D' values computed by either getLD
or getLDlarge
. Can also be used to plot
the categorizations used in the procedure of Gabriel et al. (2002).
## S3 method for class 'getLD' plot(x, y = "rSquare", start = 1, end = NA, squared = TRUE, col = NULL, xlab = "", ylab = "", cexAxis = 0.8, alpha = 0.1, ciLD = c(0.7, 0.98), cuRecomb = 0.9, ...) ## S3 method for class 'getLDlarge' plot(x, y = "rSquare", start = NA, end = NA, squared = TRUE, col = NULL, xlab = "", ylab = "", cexAxis = 0.8, alpha = 0.1, ciLD = c(0.7,0.98), cuRecomb = 0.9, ...)
## S3 method for class 'getLD' plot(x, y = "rSquare", start = 1, end = NA, squared = TRUE, col = NULL, xlab = "", ylab = "", cexAxis = 0.8, alpha = 0.1, ciLD = c(0.7, 0.98), cuRecomb = 0.9, ...) ## S3 method for class 'getLDlarge' plot(x, y = "rSquare", start = NA, end = NA, squared = TRUE, col = NULL, xlab = "", ylab = "", cexAxis = 0.8, alpha = 0.1, ciLD = c(0.7,0.98), cuRecomb = 0.9, ...)
x |
the output of |
y |
either |
start |
integer or character string specifying the index or the name of the first SNP, respectively,
that should be plotted, where the index corresponds to the column (or row if |
end |
integer or character string specifying the index or the name of the last SNP, respectively, that should be plotted. |
squared |
should the |
col |
a vector specifying the colors used in plotting of the LD values. If |
xlab |
character string naming the label of the x-axis. |
ylab |
character string naming the label of the y-axis. |
cexAxis |
a numeric value specifying the relative size of the SNP names displayed at the axes of the plot. |
alpha |
numeric value between 0 and 1. Only considered if |
ciLD |
numeric vector consisting of two values between 0 and 1.
Only considered if |
cuRecomb |
numeric value between 0 and 1. Only considered if |
... |
further arguments of |
Holger Schwender, [email protected]
Gabriel, S.B. et al. (2002). The Structure of Haplotype Blocks in the Human Genome. Science, 296, 2225-2229.
# Load the simulated data. data(trio.data) # The values of Dprime and Rsquare for each pair of SNPs # in LDdata can be computed by ld.out <- getLD(LDdata) # By default, the LD measures are returned as a vector. # If they should be returned as a matrix, then use ld.out2 <- getLD(LDdata, asMatrix = TRUE) # The matrix of the Rsquare values can be plotted by plot(ld.out) # The matrix of the Dprime values can be plotted by plot(ld.out, "Dprime")
# Load the simulated data. data(trio.data) # The values of Dprime and Rsquare for each pair of SNPs # in LDdata can be computed by ld.out <- getLD(LDdata) # By default, the LD measures are returned as a vector. # If they should be returned as a matrix, then use ld.out2 <- getLD(LDdata, asMatrix = TRUE) # The matrix of the Rsquare values can be plotted by plot(ld.out) # The matrix of the Dprime values can be plotted by plot(ld.out, "Dprime")
Plots either the pairwise D' values or the pairwise LD categorization used in the procedure of Gabriel et al. (2002). Additionally, the LD blocks are marked in this plot.
## S3 method for class 'LDblocks' plot(x, y = "gabriel", col = NULL, start = 1, end = NA, xlab = "", ylab = "", cexAxis = 0.8, block.col = 2, block.lwd = 3, ...)
## S3 method for class 'LDblocks' plot(x, y = "gabriel", col = NULL, start = 1, end = NA, xlab = "", ylab = "", cexAxis = 0.8, block.col = 2, block.lwd = 3, ...)
x |
the output of |
y |
either |
col |
a vector specifying the colors used in plotting of the LD values. If |
start |
integer or character string specifying the index or name of the first SNP, respectively,
that should be plotted, where the index corresponds to the column (or row if |
end |
integer or character string specifying the index or name of the last SNP, respectively, that should be plotted. |
xlab |
character string naming the label of the x-axis. |
ylab |
character string naming the label of the y-axis. |
cexAxis |
a numeric value specifying the relative size of the SNP names displayed at the axes of the plot. |
block.col |
the color of the lines used to show the borders of the LD blocks. |
block.lwd |
numeric value specifying the size of the lines used to show the borders of the LD blocks |
... |
further arguments of |
Holger Schwender, [email protected]
Gabriel, S.B. et al.~(2002). The Structure of Haplotype Blocks in the Human Genome. Science, 296, 2225-2229.
# Load the simulated data. data(trio.data) # Estimate LD blocks. blocks <- findLDblocks(LDdata) # Alternatively, the LD blocks can be estimated by ld.out <- getLD(LDdata, addVarN=TRUE) blocks2 <- findLDblocks(ld.out) # Plot the LD blocks showing the Gabriel categorization. plot(blocks) # Plot the LD blocks showing the Dprime values. plot(blocks, "Dprime")
# Load the simulated data. data(trio.data) # Estimate LD blocks. blocks <- findLDblocks(LDdata) # Alternatively, the LD blocks can be estimated by ld.out <- getLD(LDdata, addVarN=TRUE) blocks2 <- findLDblocks(ld.out) # Plot the LD blocks showing the Gabriel categorization. plot(blocks) # Plot the LD blocks showing the Dprime values. plot(blocks, "Dprime")
Plots the logic trees or information on the visited models generated in a the trio logic regression analyis with trioLR
.
## S3 method for class 'trioLR' plot(x, whichTree = NA, freqType = 1, useNames = FALSE, addStats = TRUE, digits = 3, main = NULL, cexOper=1.5, cexLeaf=1.5, sizeLeaf=7, cexPar=1.3, ...)
## S3 method for class 'trioLR' plot(x, whichTree = NA, freqType = 1, useNames = FALSE, addStats = TRUE, digits = 3, main = NULL, cexOper=1.5, cexLeaf=1.5, sizeLeaf=7, cexPar=1.3, ...)
x |
an object of class |
whichTree |
positive integer specifying the model for which the logic tree should be plotted when several trio logic regression models with different maximum numbers of leaves have been fitted. Ignored if just one model has been fitted using simulated annealing or MCMC has been employed to perform a Trio Logic Regression. |
freqType |
positive integer between 1 and 3 specifying which statistics from the MC Trio Logic Regression
analysis should be plotted. If |
useNames |
should the names of the variables be used in the plots? If |
addStats |
should the coefficient in the trio logic regression model and the score for the fitted model
be shown in the plot? Ignored if MCMC has been used in |
digits |
number of digits used in the presentation of the coefficient and score (see |
main |
character string specifying the title that should be added to the plot. If |
cexOper |
the relative size of the AND- and OR-operators in the plotting of the logic tree. Ignored if MCMC
has been used in |
cexLeaf |
the relative size of the variable names shown in the logic tree. Ignored if MCMC has been used in |
sizeLeaf |
the relative size of the boxes representing the leaves in the logic trees. Ignored if MCMC has been
used in |
cexPar |
the relative size of the coefficient and the score (see |
... |
ignored. |
Holger Schwender, [email protected], based on the plot
functions
implemented by Ingo Ruczinski and Charles Kooperberg in the R
package LogicReg
.
Kooperberg, C. and Ruczinski, I. (2005). Identifying Interacting SNPs Using Monte Carlo Logic Regression. Genetic Epidemiology, 28, 157-170.
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
Ruczinski, I., Kooperberg, C., and LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The logic tree representing the logic expression found in # the trio logic regression analysis can then be plotted by plot(lr.out)
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The logic tree representing the logic expression found in # the trio logic regression analysis can then be plotted by plot(lr.out)
While poly4root
computes the (real-valued) roots of a polynomial of fourth degree, poly4rootMat
can
be applied to several polynomials of fourh degree at once by assuming that each row the input matrix contains the
coefficients for one of the polynomials.
poly4root(a) poly4rootMat(amat)
poly4root(a) poly4rootMat(amat)
a |
a numeric vector of length five specifying the coefficients of the polynomial |
amat |
a numeric matrix with five columns in which each row contains the five coefficients of a polynomial of fourth degree. |
For poly4root
, a vector containing the real-valued roots of the polynomial. For poly4rootMat
,
a matrix with four columns in which each row contains the real-valued roots of the corresponding polynomial. If
a polynomial has less than four real-valued roots, the remaining entries in the corresponding row are set to NA
.
Holger Schwender, [email protected]
# The roots of # 2 * x^4 + 3 * x^3 - x^2 + 5 * x^1 - 4 # can be determined by poly4root(c(2, 3, -1, 5, -4))
# The roots of # 2 * x^4 + 3 * x^3 - x^2 + 5 * x^1 - 4 # can be determined by poly4root(c(2, 3, -1, 5, -4))
Print
s the statistics computed with colGxE
. getGxEstats
generates a data frame
containing these statistics.
## S3 method for class 'colGxE' print(x, top = 5, digits = 4, onlyGxE = FALSE, ...) ## S3 method for class 'colGxEunstruct' print(x, top = 5, digits = 4, ...) getGxEstats(x, top = NA, sortBy = c("none", "gxe", "lrt2df", "wald2df", "lrt1df", "g"))
## S3 method for class 'colGxE' print(x, top = 5, digits = 4, onlyGxE = FALSE, ...) ## S3 method for class 'colGxEunstruct' print(x, top = 5, digits = 4, ...) getGxEstats(x, top = NA, sortBy = c("none", "gxe", "lrt2df", "wald2df", "lrt1df", "g"))
x |
an object of class |
top |
number of top interactions that should be printed or stored in a data frame. If |
onlyGxE |
logical indicating whether only the statistics for the parameter of the GxE interaction should be printed.
If |
digits |
number of digits that should be printed. |
... |
ignored. |
sortBy |
character string specifying by the p-value of which test the SNPs should be sorted. If |
Holger Schwender, [email protected]
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
# Load the simulated data for the analysis. data(trio.data) # Set up a vector with the binary environmental variable. # Here, we consider the gene-gender interactions and # assume that the children in the first 50 trios are # girls, and the remaining 50 are boys. sex <- rep(0:1, each = 50) # Test the interaction of sex with each of the SNPs in mat.test gxe.out <- colGxE(mat.test, sex) # By default, the statistics are shown for the parameters of # the top 5 GxE interactions and the parameters of the # corresponding SNPs. gxe.out # If the top 10 GxE interactions should be displayed, then this # can be done by print(gxe.out, top = 10) # The statististics for all GxE interactions (and SNPs) are # shown, when calling print(gxe.out, top = 0) # If only the statistics for the GxE parameters, but not for # the SNPs should be displayed, then use print(gxe.out, onlyGxE = TRUE) # A convenient way to generate a data frame with all the statistics # computed by colGxE either for the top SNPs or for all SNPs (here, # the top 10 SNPs) ordered by the p-values of one of the considered # tests, e.g., the 2 df likelihood ratio test, is dat.top3 <- getGxEstats(gxe.out, top = 10, sortBy = "lrt2df")
# Load the simulated data for the analysis. data(trio.data) # Set up a vector with the binary environmental variable. # Here, we consider the gene-gender interactions and # assume that the children in the first 50 trios are # girls, and the remaining 50 are boys. sex <- rep(0:1, each = 50) # Test the interaction of sex with each of the SNPs in mat.test gxe.out <- colGxE(mat.test, sex) # By default, the statistics are shown for the parameters of # the top 5 GxE interactions and the parameters of the # corresponding SNPs. gxe.out # If the top 10 GxE interactions should be displayed, then this # can be done by print(gxe.out, top = 10) # The statististics for all GxE interactions (and SNPs) are # shown, when calling print(gxe.out, top = 0) # If only the statistics for the GxE parameters, but not for # the SNPs should be displayed, then use print(gxe.out, onlyGxE = TRUE) # A convenient way to generate a data frame with all the statistics # computed by colGxE either for the top SNPs or for all SNPs (here, # the top 10 SNPs) ordered by the p-values of one of the considered # tests, e.g., the 2 df likelihood ratio test, is dat.top3 <- getGxEstats(gxe.out, top = 10, sortBy = "lrt2df")
Prints or plots the most important interactions found in a trioFS analysis.
## S3 method for class 'trioFS' print(x, topX = 5, show.prop = TRUE, coded = FALSE, digits = 2, ...) ## S3 method for class 'trioFS' plot(x, topX = 15, show.prop = FALSE, coded = TRUE, cex = 0.9, pch = 16, col = 1, force.topX = FALSE, include0 = TRUE, add.v0 = TRUE, v0.col = "grey50", main = NULL, ...)
## S3 method for class 'trioFS' print(x, topX = 5, show.prop = TRUE, coded = FALSE, digits = 2, ...) ## S3 method for class 'trioFS' plot(x, topX = 15, show.prop = FALSE, coded = TRUE, cex = 0.9, pch = 16, col = 1, force.topX = FALSE, include0 = TRUE, add.v0 = TRUE, v0.col = "grey50", main = NULL, ...)
x |
an object of class |
topX |
integer specifying how many interactions should be shown.
If |
show.prop |
should the proportions of models containing the respective interactions be
added to the output (if |
coded |
should the coded variable names be displayed? Might be useful
if the actual variable names are pretty long. The coded variable name of
the j-th variable is |
digits |
number of digits shown in the |
cex |
a numeric value specifying the relative size of the text and symbols. |
pch |
specifies the used symbol. See the help of |
col |
the color of the text and the symbols. See the help of |
force.topX |
if |
include0 |
should the |
add.v0 |
should a vertical line be drawn at |
v0.col |
the color of the vertical line at |
main |
character string naming the title of the plot. If |
... |
Ignored. |
Holger Schwender, [email protected]
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trioFS analysis # by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trioFS, some parameters of trio # logic regression are changed to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # TrioFS can then be applied to the trio data in trio.ped1 by fs.out <- trioFS(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The output of trioFS can be printed by fs.out # By default, the five most important interactions are displayed. # If another number of interactions, e.g., 10, should be shown, # then this can be done by print(fs.out, topX = 10) # The importances can also be plotted by plot(fs.out)
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trioFS analysis # by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trioFS, some parameters of trio # logic regression are changed to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # TrioFS can then be applied to the trio data in trio.ped1 by fs.out <- trioFS(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The output of trioFS can be printed by fs.out # By default, the five most important interactions are displayed. # If another number of interactions, e.g., 10, should be shown, # then this can be done by print(fs.out, topX = 10) # The importances can also be plotted by plot(fs.out)
Prints information on the trio logic regression model(s) fitted with trioLR
.
## S3 method for class 'trioLR' print(x, asDNF=FALSE, posBeta=FALSE, digits = 3, ...)
## S3 method for class 'trioLR' print(x, asDNF=FALSE, posBeta=FALSE, digits = 3, ...)
x |
an object of class |
asDNF |
should the disjunctive normal form of the logic expression represented by the logic tree be printed?
If |
posBeta |
should the disjunctive normal form be determined as if the sign of the coefficient in trio logic
regression model is positive? If |
digits |
number of digits used in the printing of the score and the parameter estimate of the fitted trio logic regression model(s). |
... |
ignored. |
Holger Schwender, [email protected], based on the plot
functions
implemented by Ingo Ruczinski and Charles Kooperberg in the R
package LogicReg
.
Kooperberg, C. and Ruczinski, I. (2005). Identifying Interacting SNPs Using Monte Carlo Logic Regression. Genetic Epidemiology, 28, 157-170.
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
Ruczinski, I., Kooperberg, C., and LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The output of trioLR can then be displayed by lr.out # This output shows the detected logic expression. If this # expression should be displayed in disjunctive normal form, # then this can be done by print(lr.out, asDNF = TRUE)
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # The output of trioLR can then be displayed by lr.out # This output shows the detected logic expression. If this # expression should be displayed in disjunctive normal form, # then this can be done by print(lr.out, asDNF = TRUE)
Computes the genotypic TDT for a a matrix representing SNP genotype probabilities.
probTDT(mat.geno, model = c("additive", "dominant", "recessive"), size = 50)
probTDT(mat.geno, model = c("additive", "dominant", "recessive"), size = 50)
mat.geno |
a numeric matrix with one row for each SNP and |
model |
type of model that should be fitted. Abbreviations are allowed. Thus, e.g., |
size |
the number of SNPs considered simultaneously when computing the parameter estimates. Ignored if |
An object of class colTDT
consisting of the following numeric values or vectors, respectively:
coef |
the estimated parameter, |
se |
the estimated standard deviation of the parameter estimate, |
stat |
Wald statistic, |
RR |
the relative risk, i.e.\ for trio data, |
lowerRR |
the lower bound of the 95% confidence interval for |
upperRR |
the upper bound of the 95% confidence interval for |
usedTrios |
the number of trios affecting the parameter estimation, |
pMendelErr |
the sum across families of probabilities of Mendelian errors. |
Margaret Taub, [email protected]
Schaid, D.J. (1996). General Score Tests for Associations of Genetic Markers with Disease Using Cases and Their Parents. Genetic Epidemiology, 13, 423-449.
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
Taub M.A., Schwender H., Beatty T.H., Louis T.A., Ruczinski I. (2012). Incorporating genotype uncertainties into the genotypic TDT for main effects and gene-environment interactions. Genetic Epidemiology, 36, 225-234.
# Load the simulated data. data(trio.data) # All SNPs in prob.mat.test can be tested by prob.tdt.out <- probTDT(prob.mat.test) # By default, an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by prob.tdt.out2 <- probTDT(prob.mat.test, model = "dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(prob.tdt.out2, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(prob.tdt.out2, top = 0)
# Load the simulated data. data(trio.data) # All SNPs in prob.mat.test can be tested by prob.tdt.out <- probTDT(prob.mat.test) # By default, an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by prob.tdt.out2 <- probTDT(prob.mat.test, model = "dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(prob.tdt.out2, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(prob.tdt.out2, top = 0)
Reads a ped file into R and creates a data frame in ped format, or transform the ped file into a matrix in genotype format.
read.pedfile(file, first.row = NA, coded = NULL, naVal = 0, sep = " ", p2g = FALSE, non.rs.IDs = FALSE, cols4ID=FALSE)
read.pedfile(file, first.row = NA, coded = NULL, naVal = 0, sep = " ", p2g = FALSE, non.rs.IDs = FALSE, cols4ID=FALSE)
file |
the filename (if necessary with path) of a ped file that should be read into R. |
first.row |
logical indicating whether the first row of |
coded |
a character string stating how the alleles of the SNPs are coded. Possible values are |
naVal |
value or character string specifying how missing values in the SNP data are coded. |
sep |
character string specifying how the SNP names in the first row of |
p2g |
logical indicating whether the ped file should be transformed into a matrix in genotype format. If |
non.rs.IDs |
logical indicating whether (some of) the SNP names are specified by other names than rs-IDs. |
cols4ID |
logical indicating whether columns should be added to output matrix containing the family ID and the individual ID.
If |
A data frame in ped format (if p2g = FALSE
), or a matrix in genotype format (if p2g = TRUE
).
Holger Schwender, [email protected]
## Not run: # Assuming there is a ped-file called pedfile.ped in the # R working directory, this file can be read into R by ped <- read.pedfile("pedfile.ped") # The resulting data frame is in the typical ped format # which needs to be transformed into the genotype format # for applications of most of the functions in the trio # package. This transformation can be done by geno <- ped2geno(ped) # This transformation can also be done directly when # reading the ped-file into R by geno2 <- read.pedfile("pedfile.ped", p2g = TRUE) ## End(Not run)
## Not run: # Assuming there is a ped-file called pedfile.ped in the # R working directory, this file can be read into R by ped <- read.pedfile("pedfile.ped") # The resulting data frame is in the typical ped format # which needs to be transformed into the genotype format # for applications of most of the functions in the trio # package. This transformation can be done by geno <- ped2geno(ped) # This transformation can also be done directly when # reading the ped-file into R by geno2 <- read.pedfile("pedfile.ped", p2g = TRUE) ## End(Not run)
Functions for removing SNPs with a low minor allele frequency or a high percentage of missing values, for removing trios in which at least one member shows a high percentage of missing values, for ordering the SNPs by their position in the genome, and for computing the minor allele frequencies of the SNPs based on only the genotypes of the parents, where each parent is only used once in this computation, even if this person is part of more than one of the trios.
removeSNPs(geno, maf = NA, perc.na = NA) removeTrios(geno, perc.na = 1) orderSNPs(geno, map, snp = "SNP", orderBy = c("Chr", "Position")) colMAFtrio(geno, changeMinor = FALSE)
removeSNPs(geno, maf = NA, perc.na = NA) removeTrios(geno, perc.na = 1) orderSNPs(geno, map, snp = "SNP", orderBy = c("Chr", "Position")) colMAFtrio(geno, changeMinor = FALSE)
geno |
a matrix in genotype format, i.e.\ the output of |
maf |
a numeric value. If specified, i.e.\ not |
perc.na |
a numeric value between 0 and 1 specifying a cutoff for the percentage of missing values that a SNP or a subject
is allowed to have. If more than 100 * |
map |
a data frame containing the chromosome and the position for all the SNPs in |
snp |
a character string giving the (case-sensitive) name of the column of |
orderBy |
character string of length 2 specifying the (case-sensitive) names of the columns of |
changeMinor |
logical specifying whether 1 - minor allele frequency should be returned when the MAF is larger than 0.5.
The MAF might be larger than 0.5, if the minor allele was specified on another data set than the one considered in |
For removeSNPs
, removeTrios
, and orderSNPs
, a reduced or ordered version of geno
.
For colMAFtrio
, a vector containing the minor allele frequencies of the SNPs in geno
.
Holger Schwender, [email protected]
# Load the simulated data. data(trio.data) # All SNPs with a minor allele frequency smaller than 0.1 # can be removed from mat.test by mat2 <- removeSNPs(mat.test, maf = 0.1) # The minor allele frequencies for all SNPs can be # determined (based on the genotypes of the parents) by maf <- colMAFtrio(mat.test)
# Load the simulated data. data(trio.data) # All SNPs with a minor allele frequency smaller than 0.1 # can be removed from mat.test by mat2 <- removeSNPs(mat.test, maf = 0.1) # The minor allele frequencies for all SNPs can be # determined (based on the genotypes of the parents) by maf <- colMAFtrio(mat.test)
Performs score tests for all individual SNPs (scoreTDT
), all interactions of each SNP with an environmental variable (scoreGxE
),
or all interactions of two SNPs (scoreGxG
) comprised by an input matrix based on the same log-likelihood considered in the corresponding
genotypic TDT, where in scoreGxG
the conditional logistic regression model including only one parameter (for the interaction effect)
is used.
Additionally, the maximum over the score statistics for testing an additive, dominant, and recessive effect can be
determined using scoreMaxStat
.
scoreTDT(mat.snp, model = c("additive", "dominant", "recessive"), size = 20) scoreGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), size = 20, famid = NULL) scoreGxG(mat.snp, model = c("additive", "dominant", "recessive"), genes = NULL, size = 20) scoreMaxStat(mat.snp, size = 20) ## S3 method for class 'scoreTDT' print(x, top = 5, digits = 4, ...) ## S3 method for class 'scoreGxE' print(x, top = 5, digits = 4, onlyGxE = FALSE, ...) ## S3 method for class 'maxScoreTrio' print(x, top = 5, digits = 4, ...)
scoreTDT(mat.snp, model = c("additive", "dominant", "recessive"), size = 20) scoreGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), size = 20, famid = NULL) scoreGxG(mat.snp, model = c("additive", "dominant", "recessive"), genes = NULL, size = 20) scoreMaxStat(mat.snp, size = 20) ## S3 method for class 'scoreTDT' print(x, top = 5, digits = 4, ...) ## S3 method for class 'scoreGxE' print(x, top = 5, digits = 4, onlyGxE = FALSE, ...) ## S3 method for class 'maxScoreTrio' print(x, top = 5, digits = 4, ...)
mat.snp |
a numeric matrix in which each column represents a SNP. Each column must be
a numeric vector of length |
model |
type of model that should be fitted. Abbreviations are allowed. Thus, e.g., |
size |
the number of models considered simultaneously when computing the parameter estimates. |
env |
a vector of length |
famid |
a vector of the same length as |
genes |
a character vector containing the names of the genes (or LD-blocks or other genetic sets of SNPs) to which the SNPs belong.
If specified, only the two-way interactions between SNPs from different genes (or LD-blocks or other genetic sets of SNPs) are tested.
If |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of interactions that should be printed. If the number of interactions is smaller than or equal to
|
onlyGxE |
logical indicating whether only the statistics for the parameter of the GxE interaction should be printed.
If |
... |
ignored. |
For scoreTDT
and scoreGxG
, an object of class scoreTDT
containing numeric vectors
score |
the scores for all SNPs or SNP interactions, |
info |
the denominators of the corresponding score statistics |
,
stat |
the values of the score statistics for all SNPs or SNP interactions |
,
pval |
the corresponding p-values computed based on a ChiSquare-distribution with 1 degree of freedom. |
Holger Schwender, [email protected]
# Load the simulated data. data(trio.data) # A score test can be applied to the SNPs in # mat.test by s.out <- scoreTDT(mat.test) # By default, an additive mode of inheritance is considered. # Another mode, e.g., the dominant mode can be considered by sDom.out <- scoreTDT(mat.test, model = "dominant") # The test statistic of the MAX score test can be computed by sMax.out <- scoreMaxStat(mat.test) # The interaction between a binary environmental factor, # e.g., the gender, and each SNP in mat.test can be tested # by setting up the vector containing the value of the # environmental factor for each trio. If we, e.g., assume # that the children in the first 50 trios are girls # and in the remaininge 50 trios boys, then this vector # can be generated by sex <- rep(0:1, each = 50) # and the interaction between sex and each SNP in mat.test # can be tested with a score test by sgxe.out <- scoreGxE(mat.test, sex) # The interactions between all pairs of SNPs in mat.test # can be tested with a score test by sgxg.out <- scoreGxG(mat.test)
# Load the simulated data. data(trio.data) # A score test can be applied to the SNPs in # mat.test by s.out <- scoreTDT(mat.test) # By default, an additive mode of inheritance is considered. # Another mode, e.g., the dominant mode can be considered by sDom.out <- scoreTDT(mat.test, model = "dominant") # The test statistic of the MAX score test can be computed by sMax.out <- scoreMaxStat(mat.test) # The interaction between a binary environmental factor, # e.g., the gender, and each SNP in mat.test can be tested # by setting up the vector containing the value of the # environmental factor for each trio. If we, e.g., assume # that the children in the first 50 trios are girls # and in the remaininge 50 trios boys, then this vector # can be generated by sex <- rep(0:1, each = 50) # and the interaction between sex and each SNP in mat.test # can be tested with a score test by sgxe.out <- scoreGxE(mat.test, sex) # The interactions between all pairs of SNPs in mat.test # can be tested with a score test by sgxg.out <- scoreGxG(mat.test)
Computes the genotypic TDT for a SNP or for each column of a matrix representing a SNP.
tdt(snp, model = c("additive", "dominant", "recessive")) colTDT(mat.snp, model = c("additive", "dominant", "recessive"), size = 50) ## S3 method for class 'tdt' print(x, digits = 4, ...) ## S3 method for class 'colTDT' print(x, top = 5, digits = 4, ...)
tdt(snp, model = c("additive", "dominant", "recessive")) colTDT(mat.snp, model = c("additive", "dominant", "recessive"), size = 50) ## S3 method for class 'tdt' print(x, digits = 4, ...) ## S3 method for class 'colTDT' print(x, top = 5, digits = 4, ...)
snp |
a numeric vector of length |
mat.snp |
a numeric matrix in which each column represents a SNP. Each of the SNPs must have the same structure
as |
model |
type of model that should be fitted. Abbreviations are allowed. Thus, e.g., |
size |
the number of SNPs considered simultaneously when computing the parameter estimates. Ignored if |
x |
an object of class |
digits |
number of digits that should be printed. |
top |
number of interactions that should be printed. If |
... |
ignored. |
An object of class tdt
or colTDT
consisting of the following numeric values or vectors, respectively:
coef |
the estimated parameter, |
se |
the estimated standard deviation of the parameter estimate, |
stat |
Wald statistic, |
RR |
the relative risk, i.e.\ for trio data, |
lowerRR |
the lower bound of the 95% confidence interval for |
upperRR |
the upper bound of the 95% confidence interval for |
usedTrios |
the number of trios affecting the parameter estimation (only for |
... |
further internal parameters |
Holger Schwender, [email protected]
Schaid, D.J. (1996). General Score Tests for Associations of Genetic Markers with Disease Using Cases and Their Parents. Genetic Epidemiology, 13, 423-449.
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
# Load the simulated data. data(trio.data) # One particular SNP (e.g., the one in the first # column of mat.test) can be tested by tdt.out <- tdt(mat.test[,1]) # All SNPs in mat.test can be tested by tdt.out2 <- colTDT(mat.test) # By default, an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by tdt.out3 <- colTDT(mat.test, model = "dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(tdt.out2, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(tdt.out2, top = 0)
# Load the simulated data. data(trio.data) # One particular SNP (e.g., the one in the first # column of mat.test) can be tested by tdt.out <- tdt(mat.test[,1]) # All SNPs in mat.test can be tested by tdt.out2 <- colTDT(mat.test) # By default, an additive mode of inheritance is considered. # If another mode, e.g., the dominant mode, should be # considered, then this can be done by tdt.out3 <- colTDT(mat.test, model = "dominant") # By default, statistics for the top 5 SNPs are displayed. # If another number of SNPs, say 10, should be displayed, # then this can be done by print(tdt.out2, top = 10) # The statistics for all SNPs (not ordered by their # significance) can be obtained by print(tdt.out2, top = 0)
tdtGxG
and colGxG
perform the genotypic TDT for the interaction of two SNPs or of each pair of columns
of a genotype matrix, respectively.
fastGxG
provides a fast implementation for the genotypic TDT for two-way interactions when considering the
simplest conditional logistic regression model only containing one parameter for the interaction effect. It thus leads
to the same results as colGxG
with test = "screen"
. In fastGxGrec
,
an analytic solution to the genotypic TDT based on the simplest model for testing a recessive x recessive model is
implemented, which is even faster than fastGxG
with model = "recessive"
. In future versions of this
package, fastGxG
and fastGxGrec
will be joint with colGxG
.
The genotypic TDT for testing two-way interactions makes use of the 16 possible genotypes that can be obtained from combining the parents' genotypes of the two considered SNPs. Thus, for each family, genotypes for one case (i.e. the affected offspring) and 15 pseudo-controls are used.
tdtGxG(snp1, snp2, test = c("epistatic", "lrt", "full", "screen"), model = c("additive", "dominant", "recessive")) colGxG(mat.snp, test = c("epistatic", "lrt", "full", "screen"), genes = NULL, maf = FALSE, model = c("additive", "dominant", "recessive")) fastGxG(mat.snp, model = c("additive", "dominant", "recessive"), genes = NULL, interval = c(-10, 10), tol = 10^-8, maxiter = 1000, size = 20) fastGxGrec(mat.snp, genes = NULL, size = 20)
tdtGxG(snp1, snp2, test = c("epistatic", "lrt", "full", "screen"), model = c("additive", "dominant", "recessive")) colGxG(mat.snp, test = c("epistatic", "lrt", "full", "screen"), genes = NULL, maf = FALSE, model = c("additive", "dominant", "recessive")) fastGxG(mat.snp, model = c("additive", "dominant", "recessive"), genes = NULL, interval = c(-10, 10), tol = 10^-8, maxiter = 1000, size = 20) fastGxGrec(mat.snp, genes = NULL, size = 20)
snp1 , snp2
|
numeric vectors of length |
mat.snp |
a numeric matrix in which each column represents a SNP. Each of the SNPs must have
the same structure as |
test |
character string naming the GxG test that should be performed. If |
genes |
a character vector containing the names of the genes to which the SNPs belong. If specified, only the two-way interactions
between SNPs from different genes are tested. If |
maf |
logical indicating whether the minor allele frequency (computed by considering the genotypes of only the parents) should be added to the output. |
model |
type of model that should be considered. Abbreviations are allowed. Thus, e.g., |
interval |
the end-points of the interval to be searched for the root. For details, see |
tol |
the desired accuracy/convergence tolerance. For details, see |
maxiter |
the maximum number of iterations. For details, see |
size |
the number of interactions considered simultaneously when computing the parameter estimates. |
Depending on test
, the output contains statistics and p-values either of a likelihood ratio test (test = "epistatic"
or
test = "lrt"
) or the Wald statistics and the corresponding p-values for the interaction term in the conditional logistic
regression model (test = "full"
or test = "screen"
). If maf = TRUE
, a vector maf
containing the minor allele frequencies of each SNP and a matrix mat.maf
with two columns containing the SNP-wise minor allele
frequencies for each tested pair of SNPs are added to the output of colGxG
.
Holger Schwender, [email protected]
Cordell, H. J. (2002). Epistasis: What it Means, what it Doesn't mean, and Statistical Methods to Detect it in Humans. Human Molecular Genetics, 11, 2463-2468.
Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.
# Load the simulated data. data(trio.data) # The interaction between a particular pair of SNPs # (e.g., the ones in the first and second column of # mat.test) can be tested by gxg.out <- tdtGxG(mat.test[,1], mat.test[,2]) # All pairs of SNPs in mat.test can be tested by gxg.out2 <- colGxG(mat.test) # By default, Cordell's likelihood ratio test for # epistatistic interactions is used. This is the # most sophisticated, but also most time-consuming # test. If another test, e.g., the one considering # a conditional logistic regression model only # containing a term for the interaction, should # be used, then this can be done by gxg.out3 <- colGxG(mat.test, test = "screen") # In this case, different modes of inheritance can # be considered (by default, the additive mode is # considered). If a dominant model (for both SNPs) # should be tested, this can be done by gxg.out4 <- colGxG(mat.test, test = "screen", model ="dom") # If just a subset of all pairs of SNPs should be # tested, e.g., only pairs of SNPs belonging to different # genes, then this can be done by first specifying a # vector specifying which SNP belongs to which genes. # If we, e.g., assume that the first two SNPs in mat.test # belong to gene G1 and the other four SNPs to G2, then # this vector can be specified by genes <- paste("G", rep(1:2, c(2,4)), sep="") # and only the pairs of SNPs in which the two SNPs belong # to different genes can be tested with Cordell's # likelihood ratio test by gxg.out5 <- colGxG(mat.test, genes = genes)
# Load the simulated data. data(trio.data) # The interaction between a particular pair of SNPs # (e.g., the ones in the first and second column of # mat.test) can be tested by gxg.out <- tdtGxG(mat.test[,1], mat.test[,2]) # All pairs of SNPs in mat.test can be tested by gxg.out2 <- colGxG(mat.test) # By default, Cordell's likelihood ratio test for # epistatistic interactions is used. This is the # most sophisticated, but also most time-consuming # test. If another test, e.g., the one considering # a conditional logistic regression model only # containing a term for the interaction, should # be used, then this can be done by gxg.out3 <- colGxG(mat.test, test = "screen") # In this case, different modes of inheritance can # be considered (by default, the additive mode is # considered). If a dominant model (for both SNPs) # should be tested, this can be done by gxg.out4 <- colGxG(mat.test, test = "screen", model ="dom") # If just a subset of all pairs of SNPs should be # tested, e.g., only pairs of SNPs belonging to different # genes, then this can be done by first specifying a # vector specifying which SNP belongs to which genes. # If we, e.g., assume that the first two SNPs in mat.test # belong to gene G1 and the other four SNPs to G2, then # this vector can be specified by genes <- paste("G", rep(1:2, c(2,4)), sep="") # and only the pairs of SNPs in which the two SNPs belong # to different genes can be tested with Cordell's # likelihood ratio test by gxg.out5 <- colGxG(mat.test, genes = genes)
This function checks case-parent trio data in linkage or
genotype format for Mendelian errors. If no errors are found, the
function returns an object suitable for input to the trio.prepare
function. Otherwise, an object identifying the Mendelian errors is
returned.
trio.check(dat, is.linkage=TRUE, replace=FALSE)
trio.check(dat, is.linkage=TRUE, replace=FALSE)
dat |
A matrix or data frame of pedigree data in linkage format, or in genotype format. If the data are in linkage format, the file has to have the
standard linkage/pedigree format. Each row describes an individual,
and the columns are <famid> <pid> <fatid> <motid> <sex>
<affected> <genotype:1_1> <genotype:1_2> ... <genotype:n_1>
<genotype:n_2>. Here, <famid> is a unique identifier for
each family, <pid> is a unique identifier for an individual
within each family, <fatid> and <motid> identify the
father and mother of the individual, <sex> denotes the
gender, using the convention 1=male, 2=female, <affected>
denotes the disease status (0=unknown, 1=unaffected, 2=affected).
Only one phenotype column is allowed. Each genotype is encoded using
two columns (<genotype:k_1> and <genotype:k_2>),
identifying the alleles (1 for the major allele, 2 for the minor
allele, 0 if missing). Other values for the alleles will result in
an error. Please see the data frames If the data are in genotype format, each row in the
object describes an individual, and each block of three consecutive
rows describes the two parents and the affected child in a trio. The
columns in the object are <famid> <pid> <genotype_1>
... <genotype_n>. Here, <famid> is a unique identifier for
each family, <pid> is a unique identifier for an individual
within each family (with each block of three consecutive rows
describing the two parents and the affected child in a trio). Each
<genotype> is encoded as an integer indicating the number of
variant alleles (e.g. 0=common homozygote, 1=heterozygote, and
2=rare homozygote, and |
is.linkage |
A logical value indicating if the case parent data
are in linkage file format ( |
replace |
A logical value indicating whether existing Mendelian
errors should be replaced by missing values. For each Mendelian
error found (for a particular trio at a particular locus), all three
genotypes are replaced by |
The first function used from this package should always be
trio.check
. Unless otherwise specified, this function assumes
that the data are in linkage format, however, genotype data can also
be accommodated. If no Mendelian inconsistencies in the data provided
are identified, trio.check
creates an object that can be
processed in the subsequent analysis with the trio.prepare
function. If the data were in linkage format, the genotype information
for each SNP will be converted into a single variable, denoting the
number of variant alleles.
To delineate the genotype information for the pseudo-controls in the
subsequent analysis, the trio data must not contain any Mendelian
errors. The function trio.check
returns a warning, and an R
object with relevant information when Mendelian errors are encountered
in the supplied trio data. It is the users responsibility to find the
cause for the Mendelian errors and correct those, if
possible. However, Mendelian inconsistencies are often due to
genotyping errors and thus, it might not be possible to correct those
in a very straightforward manner. In this instance, the user might
want to encode the genotypes that cause theses Mendelian errors in
some of the trios as missing data. The function trio.check
allows for this possibility, using the argument replace=T
.
The function trio.check
returns a list with the following
elements:
trio |
A data frame with the genotypes of the trios, suitable for
input to the function |
errors |
This element will be |
trio.err |
This element will be |
Qing Li, [email protected]
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
data(trio.data) trio.tmp <- trio.check(dat=trio.ped1) str(trio.tmp, max=1) trio.tmp$trio[1:6,] trio.tmp <- trio.check(dat=trio.ped.err) str(trio.tmp, max=1) trio.tmp$errors trio.tmp$trio.err[1:3, c(1,2, 11:12)] trio.ped.err[1:3,c(1:2, 23:26)] trio.tmp <- trio.check(dat=trio.gen.err, is.linkage=FALSE) trio.tmp$errors trio.tmp$trio.err[1:6, c(1,2,7), drop=FALSE] trio.rep <- trio.check(dat=trio.gen.err, is.linkage=FALSE, replace=TRUE) trio.rep$trio[1:6,c(1,2,7)]
data(trio.data) trio.tmp <- trio.check(dat=trio.ped1) str(trio.tmp, max=1) trio.tmp$trio[1:6,] trio.tmp <- trio.check(dat=trio.ped.err) str(trio.tmp, max=1) trio.tmp$errors trio.tmp$trio.err[1:3, c(1,2, 11:12)] trio.ped.err[1:3,c(1:2, 23:26)] trio.tmp <- trio.check(dat=trio.gen.err, is.linkage=FALSE) trio.tmp$errors trio.tmp$trio.err[1:6, c(1,2,7), drop=FALSE] trio.rep <- trio.check(dat=trio.gen.err, is.linkage=FALSE, replace=TRUE) trio.rep$trio[1:6,c(1,2,7)]
trio.data
contains several simulated data sets used in the different examples for the analyses with the functions in the R
package trio
.
For the applications of genotypic TDTs for individual SNPs and two-way interactions with, for example,
tdt
and tdt2way
, respectively, trio.data
contains a 300 x 6 matrix
called mat.test
consisting of genotype data for 100 trios genotyped at 6 SNPs.
For the application of probTDT
to genotype probabilities, trio.data
contains a 334 x 180 matrix called
prob.mat.test
containing genotype probabilities for 334 SNPs and 20 trios.
For the preparation of the trio data for an application of trio logic regression with trio.check
and
trio.prepare
, trio.data
contains different data set containing genotype data for 10 SNPs in 100
trios in different formats.
trio.gen1
, trio.gen2
, and trio.gen.err
consist of 12 columns and 300 rows, where
the first two columns contain family identifier and individual identifier. In the columns afterwards,
each SNPs is encoded in one variable denoting the number of minor alleles.
trio.ped1
, trio.ped2
, and trio.ped.err
consist of 26 columns and 300 rows, where
the first six columns identify the family structure of the data, and the phenotype. Besides the variables
providing information on the family structure and the phenotypes
(columns 1 to 6), each SNPs is encoded in two variables denoting the alleles.
Contrary to the other data sets, trio.gen.err
and trio.ped.err
contain Mendelian errors.
For the application of the functions getLD
and findLDblocks
for computing the pairwise LD values and for detecting the LD blocks, respectively,
trio.data
contains a 500 x 50 matrix called LDblock
that is composed of genotype data for
10 LD blocks each consisting of 5 SNPs in strong LD.
Finally, for the simulation of trio data with trio.sim
,
trio.data
contains examples for haplotype frequencies used in these
simulations. Both freq.hap
and simuBkMap
are data.frame
s containing haplotype information, including the
haplotype block identifier, haplotype, and haplotype frequency. While freq.hap
is a data frame consisting of
20 rows and 3 columns, simuBkMap
consists of 66 rows and 3 columns. step3way
is a list internally used
for simulation, containing some indexes and sampling frequencies.
LDdata
and mat.test
: Holger Schwender, [email protected];
prob.mat.test
: Margaret Taub, [email protected];
all other data sets: Qing Li, [email protected]
# Data can be loaded by data(trio.data)
# Data can be loaded by data(trio.data)
Performs either a null-model or a conditional permutation test for a trio logic regression analysis.
trio.permTest(object, conditional = FALSE, n.perm = 10, nleaves = NULL, control = NULL, rand = NA)
trio.permTest(object, conditional = FALSE, n.perm = 10, nleaves = NULL, control = NULL, rand = NA)
object |
an object of class |
conditional |
should the conditional permutation test be performed? If |
n.perm |
integer specifying the number of permutations. |
nleaves |
integer specifying the maximum number of leaves that the logic tree in the trio logic regression model is allowed to have.
If |
control |
a list containing the control parameters for the search algorithms and the logic tree considered in |
rand |
an integer. If specified, the random number generator will be set into a reproducible state. |
A list consisting of
origScore |
|
,
permScore |
a vector of length |
Qing Li, [email protected]. Modified by Holger Schwender.
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # A null model permutation test can be performed by trio.permTest(lr.out) # The conditional permutation test can be performed by trio.permTest(lr.out, conditional = TRUE)
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible. # A null model permutation test can be performed by trio.permTest(lr.out) # The conditional permutation test can be performed by trio.permTest(lr.out, conditional = TRUE)
Computes power for genotypic TDT, allelic TDT or Score test given n trios or required sample size to gain given power.
trio.power(maf = 0.5, RR = 1.5, alpha = 5*10^(-8), n = NULL, beta = NULL, model = c("additive", "dominant", "recessive"), test = c("gTDT", "Score", "aTDT")) ## S3 method for class 'trio.power' print(x,digits=4,...)
trio.power(maf = 0.5, RR = 1.5, alpha = 5*10^(-8), n = NULL, beta = NULL, model = c("additive", "dominant", "recessive"), test = c("gTDT", "Score", "aTDT")) ## S3 method for class 'trio.power' print(x,digits=4,...)
maf |
a numeric vector of population frequencies of a mutant allele. |
RR |
a numeric vector of the assumed relative risks for an individual getting a disease with 1 (in case of recessive model 2) mutant alleles compared to the risk of individuals carnying 0 mutant alleles. |
alpha |
a numeric vector of significance levels (Type I Error probability). |
n |
a numeric vector containing number of trios in a study. Must be filled for power calculation. Must not be NULL for sample size calculation. |
beta |
the desired power of the test. Must be filled for power calculation. Must not be NULL for sample size calculation. |
model |
a character containing the genotypic model assumed. Possible values are |
test |
the chosen test. Must be |
x |
an object of class |
digits |
number of digits that should be printed. |
... |
ignored |
Power and sample size calculation is derived on Knapp (1999). The power or the sample size will be calculated for all combinations of p, RR, alpha, test, model and n or beta.
An object of class trio.power containing the following numeric values or vectors, respectively:
model |
the chosen model |
size |
In case of sample size calculation: calculated sample sizes |
beta |
In case of sample size calculation: desired power |
n |
In case of power calculation: given number of trios |
power |
In case of power calculation: calculated power |
alpha |
Type I error |
test |
the chosen test |
RR |
the relative risks assumed |
p |
the assumed allele frequency |
calc |
the type of calculation |
Christoph Neumann
Knapp, M. (1999). A Note on Power Approximations for the Transmission/Disequilibrium Test. American Journal of Human Genetics, 64, 1177-1185.
Neumann, C., Taub, M.A., Younkin, S.G., Beaty, T.H., Ruczinski, I., Schwender, H. (2014). Analytic Power and Sample Size Calculation for the Genotypic Transmission/Disequilibrium Test in Case-Parent Trio Studies. Submitted.
# The required samples size to reach of power # of 0.8 when testing SNPs with minor allele # frequencies of 0.1 and 0.2 with an additive # or dominant genotypic TDT and score test # can be determined by trio.power(maf = c(0.1, 0.2), beta = 0.8, model = c("add", "dom"))
# The required samples size to reach of power # of 0.8 when testing SNPs with minor allele # frequencies of 0.1 and 0.2 with an additive # or dominant genotypic TDT and score test # can be determined by trio.power(maf = c(0.1, 0.2), beta = 0.8, model = c("add", "dom"))
This function transforms case-parent data into a format suitable as input for trio logic regression. The function can also be used for the imputation of missing genotypes in case-parent data, while taking the existing SNP block structure into account.
trio.prepare(trio.dat, freq=NULL, blocks=NULL, logic=TRUE, ...)
trio.prepare(trio.dat, freq=NULL, blocks=NULL, logic=TRUE, ...)
trio.dat |
An object returned from the function
|
freq |
An optional data frame specifying haplotype blocks and
frequencies. For an example, see the data frame The object must have three columns in the following order: block
identifiers ( |
blocks |
An optional vector of integers, specifying (in sequence)
the lengths of the linkage disequilibrium blocks. The sum of these
integers must be equal to the total numbers of SNPs in the data set
used as input. Using the integer 1 for SNPs not contained in LD
blocks is required if this argument is used. If both arguments
|
logic |
A logical value indicating whether the trio data are
returned with genotypes in dominant and recessive coding, suitable
as input for trio logic regression ( |
... |
Optional arguments that can be passed to function
|
To create the genotypes for the pseudo-controls it is
necessary to take the LD structure of the SNPs into account. This
requires information on the LD blocks. It is assumed that the user
has already delineated the block structure according to his or her
method of choice. The function trio.prepare
, which operates on an
output object of trio.check
, accepts the block length
information as an argument. If this argument is not specified, a
uniform block length of 1 (i.e., no LD structure) is assumed. If the
haplotype frequencies are not specified, they are estimated from the
parents' genotypes using the function haplo.em
. The
function then returns a list that contains the genotype information in
binary format, suitable as input for trio logic regression. Since
trio logic regression requires complete data, the function trio.prepare
also performs an imputation of the missing genotypes. The imputation
is based on the estimated or supplied haplotype information.
bin |
A matrix suitable as input for trio logic regression. The
first column specifies the cases and pseudo-controls as required by
logic regression using conditional logistic regression (the integer
3 for the probands followed by three zeros indicating the
pseudo-controls). The following columns specify the (possibly
imputed) genotypes in dominant and recessive coding, with two binary
variables for each SNP. This is returned only if |
trio |
A data frame with imputed SNPs in genotype format derived
from the input. This is returned only if |
miss |
A data frame with five columns indicating the missing genotypes in the input object.
The five columns of the data frame refer to the family id ( |
freq |
The estimated or supplied haplotype information, in the same format as described in the Arguments above. |
Support was provided by NIH grants R01 DK061662 and HL090577.
Qing Li, [email protected]
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
data(trio.data) trio.tmp <- trio.check(dat=trio.ped1) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) trio.bin$bin[1:8,]
data(trio.data) trio.tmp <- trio.check(dat=trio.ped1) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) trio.bin$bin[1:8,]
trio.sim
generates case-parents trios when the disease
risk of children is specified by (possibly higher-order) SNP-SNP
interactions. The SNP minor allele frequencies and/or haplotypes are
specified by the user, as are the parameters in the logistic model
that describes the disease risk. If pi.usr
is specified, a
specific type of model, namely the well-known Risch model, will be employed.
trio.sim(freq, interaction = "1R and 2D", prev = 1e-3, OR = 1, pi.usr = 0, n = 100, rep = 1, step.save = NULL, step.load = NULL, verbose = FALSE)
trio.sim(freq, interaction = "1R and 2D", prev = 1e-3, OR = 1, pi.usr = 0, n = 100, rep = 1, step.save = NULL, step.load = NULL, verbose = FALSE)
freq |
A data frame specifying haplotype blocks and
frequencies. For an example, see the data frame The object must have three columns in the following order: block
identifiers ( |
interaction |
A string that specifies the risk altering genotype interaction as a Boolean term, such as "7D or 19R", or "(not 10D) or 45D". Each locus can appear at most once in the string, and the the Boolean term not can appear at most once before each locus, and must be enclosed in parenthesis, e.g., "(not 3D)". Therefore, strings such as "not (not 3D)" and "not 3D or 5R" are prohibited. Parenthesis are also used to unambiguously define the Boolean expression as a binary tree, i.e., every parent node has exact two children. For example Thus, a long string such as "1R or 3D or 5R" must be written as "(1R or 3D) or 5R" or as "1R or (3D or 5R)", even though the parenthesis are technically redundant. There is also a limit on the size of the interactions, please see Details below. |
prev |
The prevalence of the disease in the simulated population among non-carriers (the "un-exposed" group). |
OR |
The odds ratio of disease in the simulated population, comparing carriers to non-carriers. |
pi.usr |
probability for an individual without the |
n |
The number of case-parent trios simulated. The default is 100. |
rep |
The number of data set replicates generated. The default is 1. |
step.save |
The name of the binary file (without ".RData"
extension) in which the object specifying the simulation mating
tables and probabilities will be saved. The default value is
|
step.load |
The name of an existing binary file (without ".RData"
extension) in which the object specifying the simulation mating
tables and probabilities have been saved (see above). The default
value is |
verbose |
A logical value indicating whether or not to print information about memory and time usage. |
The function trio.sim
simulates case-parent trio data
when the disease risk of children is specified by (possibly
higher-order) SNP-SNP interactions. The mating tables and the
respective sampling probabilities depend on the haplotype frequencies
(or SNP minor allele frequencies when the SNP does not belong to a
block). This information is specified in the freq
argument of
the function. The probability of disease is assumed to be described
by the logistic term logit(p) = a + b I[Interaction],
where a = logit (prev
) and b = log(OR
),
with prev
and OR
specified by the user. Note that at
this point only data for two risk groups (carriers versus
non-carriers) can be simulated. Since the computational demands for
generating the mating is dependent on the number of loci involved in
the interactions and the lengths of the LD blocks that contain these
disease loci, the interaction term can only consist of up to six loci,
not more than one of those loci per block, and haplotype (block)
lengths of at most 5 loci.
Generating the mating tables and the respective sampling probabilities
necessary to simulate case-parent trios can be very time consuming for
interaction models involving three or more SNPs. In simulation
studies, many replicates of similar data are usually required, and
generating these sampling probabilities in each instance would be a
large and avoidable computational burden (CPU and memory). The
sampling probabilities depend foremost on the interaction term and the
underlying haplotype frequencies, and as long as these remain constant
in the simulation study, the mating table information and the sampling
probabilities can be "recycled". This is done by storing the relevant
information (denoted as "step-stone") as a binary R file in the
working directory (using the argument step.save
), and loading
the binary file again in future simulations (using the argument
step.load
), speeding up the simulation process dramatically.
It is even possible to change the parameters prev
and OR
(corresponding to a and b in the logistic model) in
these additional simulations, as the sampling probabilities can be
adjusted accordingly.
A list of matrices, containing the simulated data sets, in genotype format (indicating the number of variant alleles), including family and subject identifiers.
Qing Li, [email protected]
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
data(trio.data) sim <- trio.sim(freq=simuBkMap, interaction="1R and 5R", prev=.001, OR=2, n=20, rep=1) sim[[1]][1:6, 1:12]
data(trio.data) sim <- trio.sim(freq=simuBkMap, interaction="1R and 5R", prev=.001, OR=2, n=20, rep=1) sim[[1]][1:6, 1:12]
Performs a trioFS (trio Feature Selection) analysis as proposed by Schwender et al. (2011) based on bagging/subsampling with base learner trio logic regression (Li et al., 2011).
## Default S3 method: trioFS(x, y, B = 20, nleaves = 5, replace = TRUE, sub.frac = 0.632, control = lrControl(), fast = FALSE, addMatImp = TRUE, addModels = TRUE, verbose = FALSE, rand = NA, ...) ## S3 method for class 'trioPrepare' trioFS(x, ...) ## S3 method for class 'formula' trioFS(formula, data, recdom = TRUE, ...)
## Default S3 method: trioFS(x, y, B = 20, nleaves = 5, replace = TRUE, sub.frac = 0.632, control = lrControl(), fast = FALSE, addMatImp = TRUE, addModels = TRUE, verbose = FALSE, rand = NA, ...) ## S3 method for class 'trioPrepare' trioFS(x, ...) ## S3 method for class 'formula' trioFS(formula, data, recdom = TRUE, ...)
x |
either an object of class |
y |
a numeric vector specifying the case-pseudo-control status for the observations in |
B |
number of bootstrap samples or subsamples used in |
nleaves |
maximum number of leaves, i.e.\ variables, in the logic tree considered in each of the |
replace |
should sampling of the trios be done with replacement? If
|
sub.frac |
a proportion specifying the fraction of trios that
are used in each iteration to fit a trio logic regression model if |
control |
a list of control parameters for the search algorithms and the logic trees considered when fitting the
trio logic regression model, where the parameters for an MC logic regression are ignored. For details and the parameters,
see |
fast |
should a greedy search be used instead of simulated annealing, i.e. the standard search algorithm in (trio) logic regression? |
addMatImp |
should the matrix containing the improvements due to the interactions
in each of the iterations be added to the output, where the importance of each interaction
is computed by the average over the |
addModels |
should the |
verbose |
should some comments on the progress the |
rand |
positive integer. If specified, the random number generator is set into a reproducible state. |
formula |
an object of class |
data |
a data frame containing the variables in the model. Each row of |
recdom |
a logical value or vector of length |
... |
for the |
An object of class trioFS
consisting of
vim |
a numeric vector containing the values of the importance measure for the found interactions, |
prop |
a numeric vector consisting of the percentage of models that contain the respective found interactions, |
primes |
a character vector naming the found interactions, |
param |
a list of parameters used in the trioFS analysis, i.e. |
mat.imp |
if |
logreg.model |
if |
inbagg |
if |
Holger Schwender, [email protected]
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
Schwender, H., Bowers, K., Fallin, M.D., and Ruczinski, I. (2011). Importance Measures for Epistatic Interactions# in Case-Parent Trios. Annals of Human Genetics, 75, 122-132.
trioLR
, print.trioFS
, trio.prepare
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trioFS analysis # by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trioFS, some parameters of trio # logic regression are changed to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # TrioFS can then be applied to the trio data in trio.ped1 by fs.out <- trioFS(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible.
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trioFS analysis # by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trioFS, some parameters of trio # logic regression are changed to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # TrioFS can then be applied to the trio data in trio.ped1 by fs.out <- trioFS(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible.
Performs a trio logic regression analysis as proposed by Li et al. (2011), where trio logic regression is an adaptation of logic regression (Ruczinski et al., 2003) for case-parent trio data.
## Default S3 method: trioLR(x, y, search = c("sa", "greedy", "mcmc"), nleaves = 5, penalty = 0, weights = NULL, control=lrControl(), rand = NA, ...) ## S3 method for class 'trioPrepare' trioLR(x, ...) ## S3 method for class 'formula' trioLR(formula, data, recdom = TRUE, ...)
## Default S3 method: trioLR(x, y, search = c("sa", "greedy", "mcmc"), nleaves = 5, penalty = 0, weights = NULL, control=lrControl(), rand = NA, ...) ## S3 method for class 'trioPrepare' trioLR(x, ...) ## S3 method for class 'formula' trioLR(formula, data, recdom = TRUE, ...)
x |
either an object of class |
y |
a numeric vector specifying the case-pseudo-control status for the observations in |
search |
character string naming the search algorithm that should be used in the search for the best
trio logic regression model. By default, i.e. |
nleaves |
integer or vector of two integers specifying the maximum number of leaves, i.e.\ variables,
in the logic tree of the trio logic regression model (please note in trio logic regression the model
consists only of one logic tree).
Must be a single integer, if |
penalty |
a non-negative value for the |
weights |
a numeric vector containing one weight for each trio considered in |
control |
a list of control parameters for the search algorithms and the logic tree considered when fitting a
(trio) logic regression model. For these parameters, see |
rand |
integer. If specified, the random number generator will be set into a reproducible state. |
formula |
an object of class |
data |
a data frame containing the variables in the model. Each row of |
recdom |
a logical value or vector of length |
... |
for the |
Trio logic regression is an adaptation of logic regression to case-parent trio data. Virtually all
features for a standard logic regression analysis with the function logreg
available in the R
package LogicReg
are also available for a trio logic regression analysis,
either directly via trioLR
or via the function trio.permTest
for performing permutation tests.
For a detailed, comprehensive description on how to perform a logic regression analysis, and thus, a trio
logic regression analysis, see the Details
section of the help page for the function logreg
in the R
package LogicReg
. For a detailed explanation on how to specify the parameters for
simulated annealing, see the man page of the function logreg.anneal.control
in the R
package
LogicReg
.
Finally, an example for a trio logic regression analysis is given in the vignette trio
available
in the R
package trio
.
An object of class trioLR
composed of the same objects as an object of class logreg
. For details,
see the Value
section of the function logreg
from the R
package LogicReg
.
Holger Schwender, [email protected]
Kooperberg, C. and Ruczinski, I. (2005). Identifying Interacting SNPs Using Monte Carlo Logic Regression. Genetic Epidemiology, 28, 157-170.
Li, Q., Fallin, M.D., Louis, T.A., Lasseter, V.K., McGrath, J.A., Avramopoulos, D., Wolyniec, P.S., Valle, D., Liang, K.Y., Pulver, A.E., and Ruczinski, I. (2010). Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children. Genetic Epidemiology, 34, 396-406.
Ruczinski, I., Kooperberg, C., and LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
logreg
, trio.prepare
, trio.check
, trio.permTest
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible.
# Load the simulated data. data(trio.data) # Prepare the data in trio.ped1 for a trio logic # regression analysis by first calling trio.tmp <- trio.check(dat = trio.ped1) # and then applying set.seed(123456) trio.bin <- trio.prepare(trio.dat=trio.tmp, blocks=c(1,4,2,3)) # where we here assume the block structure to be # c(1, 4, 2, 3), which means that the first LD "block" # only consists of the first SNP, the second LD block # consists of the following four SNPs in trio.bin, # the third block of the following two SNPs, # and the last block of the last three SNPs. # set.seed() is specified to make the results reproducible. # For the application of trio logic regression, some # parameters of trio logic regression are changed # to make the following example faster. my.control <- lrControl(start=1, end=-3, iter=1000, output=-4) # Please note typically you should consider much more # than 1000 iterations (usually, at least a few hundred # thousand). # Trio regression can then be applied to the trio data in # trio.ped1 by lr.out <- trioLR(trio.bin, control=my.control, rand=9876543) # where we specify rand just to make the results reproducible.
Transforms a vcf file into a matrix in genotype format required by, e.g., the functions for computing the genotypic TDT.
vcf2geno(vcf, ped, none = "0/0", one = c("0/1"), both = "1/1", na.string = ".", use.rownames = FALSE, allowDifference = FALSE, removeMonomorphic = TRUE, removeNonBiallelic = TRUE, changeMinor = FALSE)
vcf2geno(vcf, ped, none = "0/0", one = c("0/1"), both = "1/1", na.string = ".", use.rownames = FALSE, allowDifference = FALSE, removeMonomorphic = TRUE, removeNonBiallelic = TRUE, changeMinor = FALSE)
vcf |
a matrix resulting from reading a vcf file into |
ped |
a data frame containing the family information for the subjects in |
none |
a character string or vector specifying the coding for the homozygous reference genotype. |
one |
a character string or vector specifying the coding for the heterozygous genotype. |
both |
a character string or vector specifying the coding for the homozygous variant genotype. |
na.string |
a character string or vector specifying how missing values are coded in the vcf file. |
use.rownames |
a logical value specifying whether the row names of |
allowDifference |
a logical value specifying whether |
removeMonomorphic |
a logical value specifying whether monomorphic SNVs should be removed from the output. |
removeNonBiallelic |
a logical value specifying whether SNVs showing other genotypes than
the ones specified by |
changeMinor |
a logical value specifying whether the coding of the genotypes should be changed for SNVs for which the
default coding leads to a minor allele frequency larger than 0.5. The genotypes are coded by the number of minor alleles,
i.e. the genotype(s) specified by |
A matrix in genotype format required, e.g., by functions for performing different types of the genotypic TDT, such as
colTDT
.
Holger Schwender, [email protected]
colTDT
, colGxG
, colGxE
, ped2geno