Title: | integrated Bayesian Modeling of eQTL data |
---|---|
Description: | integrated Bayesian Modeling of eQTL data |
Authors: | Marie-Pier Scott-Boyer and Greg Imholte |
Maintainer: | Greg Imholte <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.47.0 |
Built: | 2024-10-30 08:28:44 UTC |
Source: | https://github.com/bioc/iBMQ |
This method is designed to detect expression QTLs (eQTLs) by incorporating genotypic and gene expression data into a single model while 1) specifically coping with the high dimensionality of eQTL data (large number of genes), 2) borrowing strength from all gene expression data for the mapping procedures, and 3) controlling the number of false positives to a desirable level.
In the context of multiple testing and discoveries, a popular approach is to use a common threshold leading to a desired false discovery rate (FDR). In the Bayesian paradigm, derivation of the PPA threshold is trivial and can be calculated using a direct posterior probability calculation as described in Newton et al. (2004).
calculateThreshold(prob, threshold)
calculateThreshold(prob, threshold)
prob |
matrix or data frame that contains Posterior Probability of Association (output of eqtlMcmc function). |
threshold |
The desired false discovery rate. |
cutoff |
The significance threshold value |
Newton, MA., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004): "Detecting differential gene expression with a semiparametric hierarchical mixture method."Biometrics, 5(2), 155-176
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2)
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2)
It is customary to distinguish two kinds of eQTLs: 1) cis-eQTLs (where the eQTL is on the same locus as the expressed gene); and 2) trans-eQTLs (where the eQTL is on a locus other than that of the expressed gene). The eqtlClassifier allows us to classify the eQTLs as either cis-eQTL or trans-eQTL according to their position in the genome.
eqtlClassifier(peak, posSNP, posGENE, max)
eqtlClassifier(peak, posSNP, posGENE, max)
peak |
A data.frame of significant eQTLs (the output of the findEqtl function) |
posSNP |
A data frame specifying the genomic locations of genomic markers (i.e. SNPs). |
posGENE |
A data frame specifying the genomic locations of the genes (or probes). |
max |
A cutoff value (in base pair) corresponding to the threshold where a eQTL is considered to be cis-eQTL. A numerical value. |
The output of the eqtlClassifier is a data frame where the first column contains the names of each gene, the second column contains the names of markers and the third column contains the PPA value for each significant eQTL. The fourth column contains the number of the chromosome to which the gene belongs, the fifth column contains the start position of the gene and the sixth column contains the end position of the gene. The seventh column contains the number of the chromosome to which the marker belongs, the eighth column contains position of the marker and the ninth column contains a descriptor of the type of eQTL (either cis or trans). Please note that in order to ascertain that an eQTL is either cis or trans, the positions of the markers and the gene need to the given to the function. If one of the values is missing the type of eQTL will be "NA".
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver) data(map.liver) data(probe.liver) eqtl.type.liver <- eqtlClassifier(eqtl.liver, map.liver, probe.liver,5000000)
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver) data(map.liver) data(probe.liver) eqtl.type.liver <- eqtlClassifier(eqtl.liver, map.liver, probe.liver,5000000)
We can calculate how many eQTLs have PPA above the cutoff with the eqtlFinder function.
eqtlFinder(prob, threshold)
eqtlFinder(prob, threshold)
prob |
matrix or data frame that contains the Posterior Probability of Association values (output of eqtlMcmc function) |
threshold |
Threshold to be used to determine which QTLs are significant. This value can be the output of the calculateThreshold function.It must be a numerical value between 0 and 1. |
The output of the eqtlFinder is a data frame where the first column contains the names of each gene, the second column contains the names of corresponding markers and the third column contains the PPA value for each significant eQTL.
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver)
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver)
Compute the MCMC algorithm to produce Posterior Probability of Association values for eQTL mapping.
eqtlMcmc(snp, expr, n.iter, burn.in, n.sweep, mc.cores, write.output = TRUE, RIS = TRUE)
eqtlMcmc(snp, expr, n.iter, burn.in, n.sweep, mc.cores, write.output = TRUE, RIS = TRUE)
snp |
SnpSet class object |
expr |
ExpressionSet class object |
n.iter |
Number of samples to be saved from the Markov Chain |
burn.in |
Number of burn-in iterations for the Markov Chain |
n.sweep |
Number of iterations between samples of the Markov Chain (AKA thinning interval) |
mc.cores |
The number of cores you would like to use for parallel processing. Can be set be set via ‘options(cores=4)’, if not set, the code will automatically detect the number of cores. |
write.output |
Write chain iterations to file. If TRUE, output for variables will be written to files created in the working directory. |
RIS |
If TRUE, the genotype needs to be either 0 and 1. If FALSE the genotype need to be either 1,2 and 3. |
The value of mc.cores
may be ignored and set to one when the iBMQ installation does not support openMP.
A matrix with Posterior Probability of Association values. Rows correspond to snps from original snp data objects, columns correspond to genes from expr data objects.
Scott-Boyer, MP., Tayeb, G., Imholte, Labbe, A., Deschepper C., and Gottardo R. An integrated Bayesian hierarchical model for multivariate eQTL mapping (iBMQ). Statistical Applications in Genetics and Molecular Biology Vol. 11, 2012.
data(phenotype.liver) data(genotype.liver) #PPA.liver <- eqtlMcmc(genotype.liver, phenotype.liver, n.iter=100,burn.in=100,n.sweep=20,mc.cores=6, RIS=FALSE)
data(phenotype.liver) data(genotype.liver) #PPA.liver <- eqtlMcmc(genotype.liver, phenotype.liver, n.iter=100,burn.in=100,n.sweep=20,mc.cores=6, RIS=FALSE)
This dataset comprises the profiles of mRNA abundance in whole eye tissue from n = 68 BXD RIS mice, as measured using Affymetrix M430 2.0 microarrays. To ease calculation and facilitate comparisons, we will use a set of G = 1000 probes
data(gene)
data(gene)
The format is: Formal class 'ExpressionSet' [package "Biobase"]
This example uses data generated by Williams and Lu, as available from the Gene Networkwebsite (genenetwork.com). This dataset comprises the profiles of mRNA abundance in whole eye tissue from n = 68 BXD RIS mice, as measured using Affymetrix M430 2.0 microarrays. To ease calculation and facilitate comparisons, we will use a set of G = 1000 probes
data(gene)
data(gene)
A data frame specifying the genomic locations of each gene/probe needs to be prepared with the following columns: gene name, chromosome number, start location (in base pairs) and the location (in base pairs).
A data frame with 1000 observations with the following columns: gene name, chromosome number, start location (in base pairs) and the location (in base pairs).
This example uses data generated by Williams and Lu, as available from the Gene Networkwebsite (genenetwork.com). This dataset comprises the profiles of mRNA abundance in whole eye tissue from n = 68 BXD RIS mice, as measured using Affymetrix M430 2.0 microarrays. To ease calculation and facilitate comparisons, we will use a set of G = 1000 probes
data(genepos)
data(genepos)
A set of 290 single nucleotide polymorphic markers (SNPs) from 60 F2 mice.
The format is: Formal class 'SnpSet' [package "Biobase"]
This F2 cross data set containing genotypic and phenotypic information for 60 mice was obtained from the lab of Alan Attie at the University of Wisconsin-Madison. These data are also available at GEO (accession number GSE3330). Only the 5000 most variable expression traits out of 45,265 transcripts from the liver were used for the current example.
data(genotype.liver)
data(genotype.liver)
One main advantage of our method is its increased sensitivity for finding trans-eQTL hotspots (corresponding to situations where a single SNP is linked to the expression of several genes across the genome).
hotspotFinder(peak, numgene)
hotspotFinder(peak, numgene)
peak |
A data frame (3 columns) corresponding to the output of the eqtlFinder function or the data frame (9 columns) corresponding to the output of the eqtlClassifier function. |
numgene |
The minimum of gene to detect. |
The output of this function is a list, where each element is a marker. For each marker there is a data frame with all the eQTLs linked to this marker.
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver) hotspot.liver <- hotspotFinder(eqtl.liver,20)
data(PPA.liver) cutoff.liver <- calculateThreshold(PPA.liver, 0.2) eqtl.liver <- eqtlFinder(PPA.liver, cutoff.liver) hotspot.liver <- hotspotFinder(eqtl.liver,20)
A data frame specifying the genomic locations of each SNP with following columns: SNP name, chromosome number, SNP location (in base pair).
A data frame with 290 observations with the following columns: SNP name, chromosome number, SNP location (in base pair).
This F2 cross data set containing genotypic and phenotypic information for 60 mice was obtained from the lab of Alan Attie at the University of Wisconsin-Madison. These data are also available at GEO (accession number GSE3330). Only the 5000 most variable expression traits out of 45,265 transcripts from the liver were used for the current example.
data(map.liver)
data(map.liver)
Gene expression of 5000 probe from liver tissue from n = 60 F2 mice.
The format is: Formal class 'ExpressionSet' [package "Biobase"]
This F2 cross data set containing genotypic and phenotypic information for 60 mice was obtained from the lab of Alan Attie Lab at the University of Wisconsin-Madison. These data are also available at GEO (accession number GSE3330). Only the 5000 most variable expression traits out of 45,265 transcripts from the liver were used for the current example.
data(phenotype.liver)
data(phenotype.liver)
The result is a matrix with Posterior Probabilities of Association for each gene (row) and SNP (column). The PPA matrix was previously calculated with 100,000 iterations for liver tissue from n = 60 F2 mice dataset
data(PPA.liver)
data(PPA.liver)
A data frame specifying the genomic locations of each gene/probe needs to be prepared with the following columns: gene name, chromosome number, start location (in base pairs) and the location (in base pairs).
data(probe.liver)
data(probe.liver)
A data frame with 4427 observations with the following columns: gene name, chromosome number, start location (in base pairs) and the location (in base pairs).
This F2 cross data set containing genotypic and phenotypic information for 60 mice was obtained from the lab of Alan Attie Lab at the University of Wisconsin-Madison. These data are also available at GEO (accession number GSE3330). Only the 5000 most variable expression traits out of 45,265 transcripts from the liver were used for the current example.
data(probe.liver)
data(probe.liver)
A set of 1700 single nucleotide polymorphic markers (SNPs) from 68 BXD RIS mice.
data(snp)
data(snp)
The format is: Formal class 'SnpSet' [package "Biobase"]
This example uses data generated by Williams and Lu, as available from the Gene Networkwebsite (genenetwork.com). This dataset comprises the profiles of mRNA abundance in wholeeye tissue from n = 68 BXD RIS mice, as measured using Affymetrix M430 2.0 microarrays . To ease calculation and facilitate comparisons, we will use a set of G = 1000 probes and 1700 single nucleotide polymorphic markers (SNPs).
data(snp)
data(snp)
A data frame specifying the genomic locations of each SNP with following columns: SNP name, chromosome number, SNP location (in base pair).
data(snppos)
data(snppos)
A data frame with 1700 observations with the following columns: SNP name, chromosome number, SNP location (in base pair).
This example uses data generated by Williams and Lu, as available from the Gene Network website (genenetwork.com).
data(snppos)
data(snppos)