--- title: 'Using RQT, an R package for gene-level meta-analysis' author: "Ilya Y. Zhbannikov" date: "`r BiocStyle::doc_date()`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Tutorial for rqt package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Overview Despite the recent advances of modern GWAS methods, it is still remains an important problem of addressing calculation an effect size and corresponding p-value for the whole gene rather than for single variant. We developed an R-package rqt, which offers gene-level GWAS meta-analysis. The package can be easily included into bioinformatics pipeline or used as stand-alone. Contact: ilya.zhbannikov@duke.edu for questions of usage the \texttt{rqt} or any other issues. Below we provide several examples that show GWAS meta-analysis on gene-level layer. ### Methods in brief The workflow of gene-level meta analysis consists of the following steps: (i) reducing the number of predictors, thereby alleviating correlation problem in variants (accounting for LD); (ii) then the regression mod-el is fitted on the reduced dataset to obtain corresponding regression coefficient ("effect sizes"); (iii) these coefficients are then to be pooled into a total index representing a total gene-level effect size and corresponding statistics is calculated. P- and q- values are then calculated using this statistics from asymptotic approximation or permutation procedure; (iv) the final step is combining gene-level p-values calculated from each study with Fisher's combined probability method. ## Installation of \emph{rqt} package In order to install the \emph{rqt} package, the user must first install R (\url{http://www.r-project.org}). After that, \emph{rqt} can be installed with: ```{r, echo=TRUE, eval=FALSE} devtools::install_github("izhbannikov/rqt", buildVignette=TRUE) ``` ## Data description ### Single dataset In \texttt{rqt} requires the following datasets: (i) \texttt{phenotype} (a \texttt{N} by 1) matrix (i.e. a vector); and (ii) \texttt{genotype} - an object of class \texttt{SummarizedExperiment} containing one assay: (a \texttt{N} by \texttt{M}) matrix, where \texttt{N} - is the total number of individuals in the study and \texttt{M} is the total number of genetic variants. Optionally, \texttt{rqt} can accept covariates, in form of \texttt{N} by \texttt{K} matrix, where \texttt{K} is the total number of covariates used in the study. Phenotype can be dichotomous (0/1, where 1 indicates control and 0 case). ### Meta-analysis In meta-analysis, \texttt{rqt} requires a list of \texttt{M} (\texttt{M} - number of datasets used in meta-analysis) and optionally it accepts covariates in form described above. ## Examples ### Gene-level analysis on a single dataset #### Dichotomous phenotype ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.bin1.dat", package="rqt"), header=TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj <- rqt(phenotype=pheno, genotype=geno.obj) res <- geneTest(obj, method="pca", out.type = "D") print(res) ``` #### Continuous phenotype ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.cont1.dat", package="rqt"), header = TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj <- rqt(phenotype=pheno, genotype=geno.obj) res <- geneTest(obj, method="pca", out.type = "C") print(res) ``` #### Preprocessing with Partial Least Square regression (PLS) This method is used for continous outcome, i.e. out.type = "C". ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.cont1.dat", package="rqt"), header = TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj <- rqt(phenotype=pheno, genotype=geno.obj) res <- geneTest(obj, method="pls", out.type = "C") print(res) ``` #### Preprocessing with Partial Least Square Discriminant Analysis (PLS-DA) This method of data preprocessing is used for dichotomous outcome. ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.bin1.dat", package="rqt"), header=TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj <- rqt(phenotype=pheno, genotype=geno.obj) # Not yet supported, sorry! #res <- geneTest(obj, method="pls", out.type = "D", scale = TRUE) print(res) ``` #### Using additional covariates Quite often, researchers want to supply not only genetic data but also specific covariates, representic some physiological parameters or environment (for example, to evaluate hyphoteses of gene-environment interactions). In such cases, the package \texttt{rqt} can accept additional covariates, in form of \texttt{N} by \texttt{K} matrix, as provided below: ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.bin1.dat", package="rqt"), header = TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) covars <- read.table(system.file("extdata/test.cova1.dat",package="rqt"), header=TRUE) obj <- rqt(phenotype=pheno, genotype=geno.obj, covariates = covars) res <- geneTest(obj, method="pca", out.type = "D") print(res) ``` For continous phenotype: ```{r, echo=TRUE, message=FALSE} library(rqt) data <- data.matrix(read.table(system.file("extdata/test.cont1.dat", package="rqt"), header = TRUE)) pheno <- data[,1] geno <- data[, 2:dim(data)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) covars <- read.table(system.file("extdata/test.cova1.dat",package="rqt"), header=TRUE) obj <- rqt(phenotype=pheno, genotype=geno.obj, covariates = covars) res <- geneTest(obj, method="pca", out.type = "C") print(res) ``` ### Meta-analysis ```{r, echo=TRUE, message=FALSE} library(rqt) data1 <- data.matrix(read.table(system.file("extdata/phengen2.dat", package="rqt"), skip=1)) pheno <- data1[,1] geno <- data1[, 2:dim(data1)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj1 <- rqt(phenotype=pheno, genotype=geno.obj) data2 <- data.matrix(read.table(system.file("extdata/phengen3.dat", package="rqt"), skip=1)) pheno <- data2[,1] geno <- data2[, 2:dim(data2)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj2 <- rqt(phenotype=pheno, genotype=geno.obj) data3 <- data.matrix(read.table(system.file("extdata/phengen.dat", package="rqt"), skip=1)) pheno <- data3[,1] geno <- data3[, 2:dim(data3)[2]] colnames(geno) <- paste(seq(1, dim(geno)[2])) geno.obj <- SummarizedExperiment(geno) obj3 <- rqt(phenotype=pheno, genotype=geno.obj) res.meta <- geneTestMeta(list(obj1, obj2, obj3)) print(res.meta) ``` ## Session information ```{r, echo=TRUE} sessionInfo() ```