Title: | Identification of candidate genes associated with genetic variation |
---|---|
Description: | Identification of the most likely gene or genes through which variation at a given genomic locus in the human genome acts. The most basic functionality assumes that the closer gene is to the input locus, the more likely the gene is to be causative. Additionally, any empirical data that links genomic regions to genes (e.g. eQTL or genome conformation data) can be used if it is supplied in the UCSC .BED file format. |
Authors: | Arthur Wuster |
Maintainer: | Arthur Wuster <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-10-30 08:01:23 UTC |
Source: | https://github.com/bioc/geneAttribution |
Identification of the most likely gene or genes through which variation at a given genomic locus in the human genome acts. The most basic functionality assumes that the closer gene is to the input locus, the more likely the gene is to be causative. Additionally, any empirical data that links genomic regions to genes (e.g. eQTL or genome conformation data) can be used if it is supplied in UCSC .bed file format. A typical workflow requires loading gene models and empirical data, then running geneAttribution() on the locus of interest
Given genomic coordinate, return normalized probability for each gene
geneAttribution(chr, pos, geneCoordinates, empiricalData, lambda = 7.61e-06, maxDist = 1e+06, minPP = 0.01)
geneAttribution(chr, pos, geneCoordinates, empiricalData, lambda = 7.61e-06, maxDist = 1e+06, minPP = 0.01)
chr |
A character string representing a chromosome (e.g. "chr2") |
pos |
An integer representing a genomic position in the same genome build that gene models |
geneCoordinates |
A GenomicRanges object, as generated by geneModels() |
empiricalData |
A list of GenomicRanges objects, as generated by loadBed(). Optional |
lambda |
Float. Variable for exponential distribution. Default based on empirical eQTL data from multiple tissues. Optional |
maxDist |
Integer. Only genes within this distance of query locus are considered. Optional |
minPP |
Float. Genes with a posterior probability < minPP are lumped as "Other". Can be set to 0 when all genes should be reported. Optional |
A sorted, numeric vector of normalized gene probabilities
geneLocs <- geneModels() fileName <- system.file("extdata", "eqtlHaplotypeBlocks.b38.bed", package="geneAttribution") empirical <- loadBed(fileName) geneAttribution("chr2", 127156000, geneLocs, empirical)
geneLocs <- geneModels() fileName <- system.file("extdata", "eqtlHaplotypeBlocks.b38.bed", package="geneAttribution") empirical <- loadBed(fileName) geneAttribution("chr2", 127156000, geneLocs, empirical)
Get gene models as a GenomicRanges object, with gene names in the symbol column For hg19, you may want to use TxDb.Hsapiens.UCSC.hg19.knownGene and for GRCh38, TxDb.Hsapiens.UCSC.hg38.knownGene (set as default)
geneModels(txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, maxGeneLength = 1e+06, genesToInclude, genesToExclude)
geneModels(txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, maxGeneLength = 1e+06, genesToInclude, genesToExclude)
txdb |
GenomicFeatures TxDb object containing genomic coordinates of genes |
maxGeneLength |
An integer. Gene models that are longer than this are excluded. Optional |
genesToInclude |
A character vector of gene symbols of genes to include (e.g. only protein coding genes). Optional |
genesToExclude |
A character vector of gene symbols of genes to exclude. Optional |
A GenomicRanges object containing human gene models
geneModels() geneModels(genesToInclude = c("CYYR1", "ADAMTS1", "ADAMTS5", "N6AMT1", "LTN1"))
geneModels() geneModels(genesToInclude = c("CYYR1", "ADAMTS1", "ADAMTS5", "N6AMT1", "LTN1"))
Required *.BED file format (tab-separated): chr start end name (optional column: score). Sample files supplied with package are limited to chromosome 2.
loadBed(files, weights)
loadBed(files, weights)
files |
A character vector containing *.BED file names |
weights |
An integer vector containing weighting for each bed file. Optional |
A list of GenomicRanges objects containing the data from the *.BED files, with weightings in the score column
fileName1 <- system.file("extdata", "hiCRegions.b38.bed", package="geneAttribution") fileName2 <- system.file("extdata", "eqtlHaplotypeBlocks.b38.bed", package="geneAttribution") loadBed(c(fileName1, fileName2), c(2, 5))
fileName1 <- system.file("extdata", "hiCRegions.b38.bed", package="geneAttribution") fileName2 <- system.file("extdata", "eqtlHaplotypeBlocks.b38.bed", package="geneAttribution") loadBed(c(fileName1, fileName2), c(2, 5))
Normalize likelihoods and return probabilities
normP(pVector, minPP = 0)
normP(pVector, minPP = 0)
pVector |
A numeric vector of pre-normalization likelihoods |
minPP |
A float. Genes with a posterior probability < minPP are lumped as "Other". Optional |
A sorted, numeric vector of normalized probabilities
normP(c(5, 1, 1, 1, 1, 1, 0.1)) normP(c(5, 1, 1, 1, 1, 1, 0.1), minPP=0.1)
normP(c(5, 1, 1, 1, 1, 1, 0.1)) normP(c(5, 1, 1, 1, 1, 1, 0.1), minPP=0.1)