Title: | A Functional Approach To Impute Genetically Regulated Expression |
---|---|
Description: | Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known. |
Authors: | Alessandro Lussana [aut, cre] |
Maintainer: | Alessandro Lussana <[email protected]> |
License: | GPL-3 |
Version: | 1.25.0 |
Built: | 2024-11-29 03:08:44 UTC |
Source: | https://github.com/bioc/AffiXcan |
Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA) score. Statistical models to impute GReX can be trained on a training dataset where the real total expression values are known.
For every gene a linear regression model can be fitted on a training set of individuals where the real total expression values, the Total Binding Affinity (TBA) values for a set of genomic regions, and the covariates of the population genetic structure, are known. AffiXcan performs a principal component analysis (PCA) on the TBA values to fit a linear model using the formula: GReX ~ PC1 + PC2 + PC3 + ... + COV1 + COV2 + ... . Associations between the expressed genes and the regulatory regions, on which the TBA values have to be computed, are needed. TBA can be computed using "vcf_rider" software (see references)
Alessandro Lussana <[email protected]> Maintainer: Alessandro Lussana <[email protected]>
https://github.com/vodkatad/vcf_rider
trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") exprmatrix <- affiXcanImpute(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE)
trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") exprmatrix <- affiXcanImpute(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE)
Fit linear models and compute ANOVA p-values
affiXcanBs( exprMatrix, assay, regionAssoc, pca, cov = NULL, BPPARAM = bpparam(), trainingSamples )
affiXcanBs( exprMatrix, assay, regionAssoc, pca, cov = NULL, BPPARAM = bpparam(), trainingSamples )
exprMatrix |
A SummarizedExperiment object containing expression data |
assay |
A string with the name of the object in SummarizedExperiment::assays(exprMatrix) that contains expression values |
regionAssoc |
A data.frame with the associations between regulatory regions and expressed genes, and with colnames = c("REGULATORY_REGION", "EXPRESSED_REGION") |
pca |
A list, which is the returningObject$pca from affiXcanPca() |
cov |
Optional; a data.frame with covariates values for the population structure where the columns are the PCs and the rows are the individual IDs; default is NULL |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
trainingSamples |
A vector of strings. The identifiers (e.g. row names of MultiAssayExperiment objects from tbaPaths) of the samples that have to be considered in the training phase, and not used for the cross-validation |
A list containing lists named as the EXPRESSED_REGIONS found in the param regionAssoc. Each of these lists contain three objects:
coefficients: An object containing the coefficients of the principal components used in the model, completely similar to the "coefficients" from the results of lm()
p.val: The uncorrected anova p-value of the model
r.sq: The coefficient of determination between the real total expression values and the imputed GReX, retrived from summary(model)$r.squared
corrected.p.val: The p-value of the model, corrected for multiple testing with benjamini-hochberg procedure
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) assay <- "values" sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=5, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) bs <- affiXcanBs(exprMatrix=exprMatrix, assay=assay, regionAssoc=regionAssoc, pca=pca, cov=trainingCovariates, trainingSamples=trainingSamples) }
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) assay <- "values" sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=5, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) bs <- affiXcanBs(exprMatrix=exprMatrix, assay=assay, regionAssoc=regionAssoc, pca=pca, cov=trainingCovariates, trainingSamples=trainingSamples) }
Compute a GReX from variables and their coefficients for a set of genes
affiXcanGReX(affiXcanTraining, pcs, BPPARAM = bpparam())
affiXcanGReX(affiXcanTraining, pcs, BPPARAM = bpparam())
affiXcanTraining |
The returning object from affiXcanTrain() |
pcs |
A list, which is the returning object from affiXcanPcs() |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
A SummarizedExperiment object containing the imputed GReX values
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) exprmatrix <- affiXcanGReX(affiXcanTraining=training, pcs=pcs) }
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) exprmatrix <- affiXcanGReX(affiXcanTraining=training, pcs=pcs) }
Impute a GReX for each gene for which a model was generated
affiXcanImpute(tbaPaths, affiXcanTraining, scale = TRUE, BPPARAM = bpparam())
affiXcanImpute(tbaPaths, affiXcanTraining, scale = TRUE, BPPARAM = bpparam())
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
affiXcanTraining |
The returning object from affiXcanTrain() |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA; default is TRUE |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
A SummarizedExperiment object containing imputed GReX values
trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") exprmatrix <- affiXcanImpute(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE)
trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") exprmatrix <- affiXcanImpute(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE)
Perform a PCA on each experiment found in MultiAssayExperiment objects
affiXcanPca( tbaPaths, varExplained = 80, scale = TRUE, regionsCount, BPPARAM = bpparam(), trainingSamples )
affiXcanPca( tbaPaths, varExplained = 80, scale = TRUE, regionsCount, BPPARAM = bpparam(), trainingSamples )
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
varExplained |
An integer between 0 and 100; varExplained=80 means that the principal components selected to fit the models must explain at least 80 percent of variation of TBA values; default is 80 |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA; default is TRUE |
regionsCount |
An integer, that is the summation of length(assays()) of every MultiAssayExperiment RDS object indicated in the param tbaPaths; it is the returning value from overlookRegions() |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
trainingSamples |
A vector of strings. The identifiers (e.g. row names of MultiAssayExperiment objects from tbaPaths) of the samples that have to be considered in the training phase, and not used for the cross-validation |
pca: A list containing lists named as the MultiAssayExperiment::experiments() found in the MultiAssayExperiment objects listed in the param tbaPaths. Each of these lists contain two objects:
eigenvectors: A matrix containing eigenvectors for those principal components selected according to the param varExplained
pcs: A matrix containing the principal components values selected according to the param varExplained
eigenvalues: A vector containing eigenvalues for those principal components selected according to the param varExplained
if (interactive()) { data(exprMatrix) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=3, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) }
if (interactive()) { data(exprMatrix) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=3, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) }
Compute PCs in MultiAssayExperiment objects using eigenvectors given by user
affiXcanPcs( tbaPaths, affiXcanTraining, scale, BPPARAM = bpparam(), testingSamples = NULL )
affiXcanPcs( tbaPaths, affiXcanTraining, scale, BPPARAM = bpparam(), testingSamples = NULL )
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
affiXcanTraining |
The returning object from affiXcanTrain() |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
testingSamples |
A vector of strings. The identifiers (e.g. row names of MultiAssayExperiment objects from tbaPaths) of the samples that have not been considered in the training phase, to be used in the cross-validation; default is NULL; if is.null(testingSamples)==TRUE then no filtering is performed |
A list of matrices containing the principal components values of TBA for each region; each object of the list is named after the MultiAssayExperiment object from which it derives
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) }
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) }
Train the model needed to impute a GReX for each gene
affiXcanTrain( exprMatrix, assay, tbaPaths, regionAssoc, cov = NULL, varExplained = 80, scale = TRUE, BPPARAM = bpparam(), kfold = 1 )
affiXcanTrain( exprMatrix, assay, tbaPaths, regionAssoc, cov = NULL, varExplained = 80, scale = TRUE, BPPARAM = bpparam(), kfold = 1 )
exprMatrix |
A SummarizedExperiment object containing expression data |
assay |
A string with the name of the object in SummarizedExperiment::assays(exprMatrix) that contains expression values |
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
regionAssoc |
A data.frame with the association between regulatory regions and expressed genes and with colnames = c("REGULATORY_REGION", "EXPRESSED_REGION") |
cov |
Optional. A data.frame with covariates values for the population structure where the columns are the PCs and the rows are the individual IIDs Default is NULL |
varExplained |
An integer between 0 and 100; varExplained=80 means that the principal components selected to fit the models must explain at least 80 percent of variation of TBA values; default is 80 |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA; default is TRUE |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
kfold |
An integer. The k definition of k-fold cross-validation on the training dataset. Default is 1. This argument controls the behavior of the function in the following way:
|
The output depends on the parameter kfold
If kfold<2: a list containing three objects: pca, bs, regionsCount
pca: A list containing lists named as the MultiAssayExperiment::experiments() found in the MultiAssayExperiment objects listed in the param tbaPaths. Each of these lists contain two objects:
eigenvectors: A matrix containing eigenvectors for those principal components of the TBA selected according to the param varExplained
pcs: A matrix containing the principal components values of the TBA selected according to the param varExplained
eigenvalues: A vector containing eigenvalues for those principal components of the TBA selected according to the param varExplained
bs: A list containing lists named as the EXPRESSED_REGIONS found in the param regionAssoc that have a correspondent rowname in the expression values stored SummarizedExperiment::assays(exprMatrix)$assay. Each of the lists in bs contains three objects:
coefficients: The coefficients of the principal components used in the model, completely similar to the "coefficients" from the results of lm()
p.val: The uncorrected anova pvalue of the model
r.sq: The coefficient of determination between the real total expression values and the imputed GReX, retrived from summary(model)$r.squared
corrected.p.val: The p-value of the model, corrected for multiple testing with benjamini-hochberg procedure
regionsCount: An integer, that is the number of genomic regions taken into account during the training phase
If kfold>=2: a list containing k-fold objects, named from 1 to kfold and corresponding to the different cross-validations [i]; each one of these objects is a list containing lists named as the expressed gene IDs [y] (i.e. the rownames() of the object in SummarizedExperiment::assays(exprMatrix) containing the expression values), for which a GReX could be imputed. Each of these inner lists contain two objects:
rho: the pearson's correlation coefficient (R) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed with cor()
rho.sq: the coefficient of determination (R^2) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed as pearson^2
cor.test.p.val: the p-value of the cor.test() between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y
model.p.val: The uncorrected anova pvalue of the model
model.corrected.p.val: The p-value of the model, corrected for multiple testing with benjamini-hochberg procedure
model.r.sq: the model's coefficient of determination (R^2) on the training data
if(interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE, kfold=3) }
if(interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE, kfold=3) }
Reorganize associations table in a list
assoc2list(gene, regionAssoc)
assoc2list(gene, regionAssoc)
gene |
A string; the name of an expressed gene |
regionAssoc |
A data.frame with the associations between regulatory regions and expressed genes, and with colnames = c("REGULATORY_REGION", "EXPRESSED_REGION") |
A list of data frames, each of which has the same structure of the param regionAssoc, except that contains the information relative to one expressed gene
if (interactive()) { data(regionAssoc) expressedRegions <- as.list(as.vector(unique(regionAssoc$EXPRESSED_REGION))) gene <- expressedRegions[[1]] assocList <- assoc2list(gene, regionAssoc) }
if (interactive()) { data(regionAssoc) expressedRegions <- as.list(as.vector(unique(regionAssoc$EXPRESSED_REGION))) gene <- expressedRegions[[1]] assocList <- assoc2list(gene, regionAssoc) }
Fit a linear model to impute a GReX for a certain gene
computeBs(assocRegions, pca, expr, covariates = NULL)
computeBs(assocRegions, pca, expr, covariates = NULL)
assocRegions |
A data.frame with the associations between regulatory regions and one expressed gene, and with colnames = c("REGULATORY_REGION", "EXPRESSED_REGION") |
pca |
The returningObject$pca from affiXcanTrain() |
expr |
A matrix containing the real total expression values, where the columns are genes and the rows are individual IIDs |
covariates |
Optrional; a data.frame with covariates values for the population structure where the columns are the PCs and the rows are the individual IIDs; default is NULL |
A list containing three objects:
coefficients: An object containing the coefficients of the principal components used in the model, completely similar to the "coefficients" from the results of lm()
p.val: The uncorrected anova pvalue of the model
r.sq: The coefficient of determination between the real total expression values and the imputed GReX, retrived from summary(model)$r.squared
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) assay <- "values" sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=5, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) cov <- trainingCovariates cov <- cov[rownames(cov) %in% trainingSamples,] expr <- SummarizedExperiment::assays(exprMatrix)[[assay]] expr <- expr[,colnames(expr) %in% trainingSamples] expr <- t(as.data.frame(expr)) expressedRegions <- as.vector(unique(regionAssoc$EXPRESSED_REGION)) assocList <- BiocParallel::bplapply(X=expressedRegions, FUN=assoc2list, regionAssoc) names(assocList) <- expressedRegions assocRegions <- assocList[[1]] bs <- computeBs(assocRegions=assocRegions, pca=pca, expr=expr, covariates=cov) }
if (interactive()) { data(exprMatrix) data(trainingCovariates) data(regionAssoc) tbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths) assay <- "values" sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=5, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[1]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] pca <- affiXcanPca(tbaPaths=tbaPaths, varExplained=80, scale=TRUE, regionsCount=regionsCount, trainingSamples=trainingSamples) cov <- trainingCovariates cov <- cov[rownames(cov) %in% trainingSamples,] expr <- SummarizedExperiment::assays(exprMatrix)[[assay]] expr <- expr[,colnames(expr) %in% trainingSamples] expr <- t(as.data.frame(expr)) expressedRegions <- as.vector(unique(regionAssoc$EXPRESSED_REGION)) assocList <- BiocParallel::bplapply(X=expressedRegions, FUN=assoc2list, regionAssoc) names(assocList) <- expressedRegions assocRegions <- assocList[[1]] bs <- computeBs(assocRegions=assocRegions, pca=pca, expr=expr, covariates=cov) }
Compute R and R^2 on a particular row of two SummarizedExperiment assays
computeCorrelation(geneName, realExpr, imputedExpr)
computeCorrelation(geneName, realExpr, imputedExpr)
geneName |
A string. The row name in realExpr and imputedExpr objects that identifies the vectors between which R and R^2 have to be computed |
realExpr |
A SummarizedExperiment object containing expression data |
imputedExpr |
The returning object of affiXcanImpute() |
A list of two objects:
rho: the pearson's correlation coefficient (R) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed with cor()
rho.sq: the coefficient of determination (R^2) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed as pearson^2
cor.test.p.val: the p-value of the cor.test() between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) imputedExpr <- affiXcanImpute(tbaPaths=trainingTbaPaths, affiXcanTraining=training, scale=TRUE) realExpr <- exprMatrix geneName <- "ENSG00000256377.1" imputedExpr <- SummarizedExperiment::assays(imputedExpr)$GReX realExpr <- SummarizedExperiment::assays(realExpr)[[assay]] correlation <- computeCorrelation(geneName, realExpr, imputedExpr) }
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) imputedExpr <- affiXcanImpute(tbaPaths=trainingTbaPaths, affiXcanTraining=training, scale=TRUE) realExpr <- exprMatrix geneName <- "ENSG00000256377.1" imputedExpr <- SummarizedExperiment::assays(imputedExpr)$GReX realExpr <- SummarizedExperiment::assays(realExpr)[[assay]] correlation <- computeCorrelation(geneName, realExpr, imputedExpr) }
Compute the imputed GReX for a certain gene on a set of individuals
computeExpr(bs, pcs)
computeExpr(bs, pcs)
bs |
A list containing three objects:
|
pcs |
A list, which is the returning object of affiXcanPcs() |
A vector of imputed GReX values
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) region <- "ENSG00000256377.1" bs <- training$bs[[region]] exprmatrix <- computeExpr(bs=bs, pcs=pcs) }
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") pcs <- affiXcanPcs(tbaPaths=testingTbaPaths, affiXcanTraining=training, scale=TRUE) region <- "ENSG00000256377.1" bs <- training$bs[[region]] exprmatrix <- computeExpr(bs=bs, pcs=pcs) }
Perform a PCA on a matrix where columns are variables
computePca(data, varExplained = 80, scale = TRUE)
computePca(data, varExplained = 80, scale = TRUE)
data |
A matrix containing the TBA values for a certain genomic region; columns are PWMs, rows are individuals IIDs |
varExplained |
An integer between 0 and 100; varExplained=80 means that the principal components selected to fit the models must explain at least 80 percent of variation of TBA values; default is 80 |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA; default is TRUE |
A list containing two objects:
eigenvectors: A matrix containing eigenvectors for those principal components selected according to the param varExplained
pcs: A matrix containing the principal components values selected according to the param varExplained
eigenvalues: A vector containing eigenvalues for those principal components selected according to the param varExplained
if (interactive()) { data(exprMatrix) sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=3, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[i]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] tbaMatrixMAE <- readRDS(system.file("extdata","training.tba.toydata.rds", package="AffiXcan")) tbaMatrixMAE <- MultiAssayExperiment::subsetByRow(tbaMatrixMAE, trainingSamples) tbaMatrix <- MultiAssayExperiment::experiments(tbaMatrixMAE) tba <- tbaMatrix$ENSG00000256377.1 pca <- computePca(data=tba, varExplained=80, scale=TRUE) }
if (interactive()) { data(exprMatrix) sampleNames <- colnames(exprMatrix) nSamples <- length(sampleNames) sampGroups <- subsetKFold(k=3, n=nSamples) for (i in seq(length(sampGroups))) { sampGroups[[i]] <- colnames(exprMatrix)[sampGroups[[i]]] } testingSamples <- sampGroups[[i]] trainingSamples <- sampleNames[!sampleNames %in% testingSamples] tbaMatrixMAE <- readRDS(system.file("extdata","training.tba.toydata.rds", package="AffiXcan")) tbaMatrixMAE <- MultiAssayExperiment::subsetByRow(tbaMatrixMAE, trainingSamples) tbaMatrix <- MultiAssayExperiment::experiments(tbaMatrixMAE) tba <- tbaMatrix$ENSG00000256377.1 pca <- computePca(data=tba, varExplained=80, scale=TRUE) }
Compute a matrix product between variables and eigenvectors
computePcs(region, tbaMatrix, scale, pca)
computePcs(region, tbaMatrix, scale, pca)
region |
A string, which is the name of the object in the list MultiAssayExperiment::experiments(tbaMatrix) that contains the TBA values of the genomic region of interest (see the param tbaMatrix) |
tbaMatrix |
A MultiAssayExperiment object containing the TBA values |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA |
pca |
The returningObject$pca from affiXcanTrain() |
A data.frame containing the principal components values of the TBA in a certain genomic region, as the result of the matrix product between the TBA values and the eigenvectors
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) region <- "ENSG00000256377.1" tbaMatrixMAE <- readRDS(system.file("extdata","training.tba.toydata.rds", package="AffiXcan")) pca <- training$pca pcs <- computePcs(region=region, tbaMatrix=tbaMatrixMAE, scale=TRUE, pca=pca) }
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) region <- "ENSG00000256377.1" tbaMatrixMAE <- readRDS(system.file("extdata","training.tba.toydata.rds", package="AffiXcan")) pca <- training$pca pcs <- computePcs(region=region, tbaMatrix=tbaMatrixMAE, scale=TRUE, pca=pca) }
Compute R and R^2 between rows of two SummarizedExperiment assays
computeRSquared( realExpr, imputedExpr, assay, testingSamples = NULL, BPPARAM = bpparam() )
computeRSquared( realExpr, imputedExpr, assay, testingSamples = NULL, BPPARAM = bpparam() )
realExpr |
A SummarizedExperiment object containing expression data |
imputedExpr |
The returning object of affiXcanImpute() |
assay |
A string with the name of the object in SummarizedExperiment::assays(realExpr) that contains expression values |
testingSamples |
A vector of strings. The identifiers of the samples that have to be considered by the function; default is NULL; if is.null(testingSamples)==TRUE then no filtering is performed |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
A list of lists; inner lists are named after the rows for which the correlation between realExpr and imputedExpr have been computed; inner lists contain two objects:
rho: the pearson's correlation coefficient (R) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed with cor()
rho.sq: the coefficient of determination (R^2) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed as pearson^2
cor.test.p.val: the p-value of the cor.test() between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) imputedExpr <- affiXcanImpute(tbaPaths=trainingTbaPaths, affiXcanTraining=training, scale=TRUE) realExpr <- exprMatrix correlation <- computeRSquared(realExpr, imputedExpr, assay) }
if (interactive()) { trainingTbaPaths <- system.file("extdata","training.tba.toydata.rds", package="AffiXcan") data(exprMatrix) data(regionAssoc) data(trainingCovariates) assay <- "values" training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay, tbaPaths=trainingTbaPaths, regionAssoc=regionAssoc, cov=trainingCovariates, varExplained=80, scale=TRUE) imputedExpr <- affiXcanImpute(tbaPaths=trainingTbaPaths, affiXcanTraining=training, scale=TRUE) realExpr <- exprMatrix correlation <- computeRSquared(realExpr, imputedExpr, assay) }
Toy dataset used in examples to describe affiXcanTrain() function.
data(exprMatrix)
data(exprMatrix)
An object of class SummarizedExperiment
.
The data consist in a SummarizedExperiment object that contains a matrix of expression values (RPKM) of two randomly chosen genes ("ENSG00000139269.2" and "ENSG00000256377.1") for 229 individuals of european descent. The data was retrieved subsetting the RNA-sequencing data of EBV-transformed lymphocytes from the GEUVADIS public dataset (see 'source')
https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-3/
library(SummarizedExperiment) data(exprMatrix) toyExpressionMatrix <- assays(exprMatrix)$values
library(SummarizedExperiment) data(exprMatrix) toyExpressionMatrix <- assays(exprMatrix)$values
Count the number of genomic regions on which the TBA was computed
overlookRegions(tbaPaths)
overlookRegions(tbaPaths)
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
An integer, that is the summation of length(assays()) of every MultiAssayExperiment RDS object indicated in the param tbaPaths
if (interactive()) { testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths=testingTbaPaths) }
if (interactive()) { testingTbaPaths <- system.file("extdata","testing.tba.toydata.rds", package="AffiXcan") regionsCount <- overlookRegions(tbaPaths=testingTbaPaths) }
Toy data used in examples to describe affiXcanTrain() and affiXcanImpute() functions.
data(regionAssoc)
data(regionAssoc)
An object of class data.frame
The object consists in a data.frame with two columns: for every "EXPRESSED_REGION" are listed the associated "REGULATORY_REGION"(s). For the illustrative purpose there are only two expressed genes: "ENSG00000139269.2" and "ENSG00000256377.1" (the same names are used in the SummarizedExperiment object containing the expression matrix, see help(exprMatrix) ).
For every gene, the associated regulatory regions were retrieved among the enhancers predicted by preSTIGE, a tool developed by O. Corradin et al. (https://genome.cshlp.org/content/24/1/1.full), in EBV-transformed lymphocytes cell lines. The name of the regulatory region being identical to the name of the expressed gene means that the regulatory region refers to a genomic window that spans at least for 2 Kbp and includes the most upstream TSS of the gene.
data(regionAssoc) head(regionAssoc)
data(regionAssoc) head(regionAssoc)
Split the numbers from 1 to n in k equally-sized lists
subsetKFold(k, n)
subsetKFold(k, n)
k |
An integer. The number of groups in which the first n natural numbers are to be splitted |
n |
An integer. Defines the interval 1..n |
A list of lists; the first natural n numbers equally distributed across k lists
sampGroups <- subsetKFold(k=5, n=23)
sampGroups <- subsetKFold(k=5, n=23)
Toy data used in examples to describe affiXcanTrain() function.
data(trainingCovariates)
data(trainingCovariates)
An object of class data.frame
This object consists in a data.frame where columns are the first three principal components of the population genetic structure and rows are individuals' IDs. These individuals are the same whom expression values are stored in the expression matrix (see help(exprMatrix) )
Genotypes of the individuals were downloaded from the GEUVADIS public dataset (https://www.ebi.ac.uk/arrayexpress/files/E-GEUV-1/) in vcf format. Following L. Price et al. (https://www.sciencedirect.com/science/article/pii/S0002929708003534), long range linkage disequilibrium (LRLD) regions were first filtered out with vcf-tools. Then, following J. Novembre et al. (www.nature.com/articles/nature07331), non-common alleles (MAF < 0.05) were filtered out with vcftools and LD pruning was performed with plink. Finally, principal components were computed with eigenstrat.
data(trainingCovariates) head(trainingCovariates)
data(trainingCovariates) head(trainingCovariates)