Title: | data structures for linkage disequilibrium measures in populations |
---|---|
Description: | Define data structures for linkage disequilibrium measures in populations. |
Authors: | VJ Carey <[email protected]> |
Maintainer: | VJ Carey <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.37.0 |
Built: | 2024-10-30 08:36:59 UTC |
Source: | https://github.com/bioc/ldblock |
Define data structures for linkage disequilibrium measures in populations.
Author: VJ Carey <[email protected]>
Maintainer: VJ Carey <[email protected]>
# see vignette
download hapmap resource with LD estimates
downloadPopByChr( chrname = "chr1", popname = "CEU", urlTemplate = "http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/2009-02_phaseIII_r2/ld_%%CHRN%%_%%POPN%%.txt.gz", targfolder = Sys.getenv("LDBLOCK_TXTGZ_DIR") )
chrname | UCSC format tag for chromosome |
popname | hapmap three letter code for population, e.g. 'CEU' |
urlTemplate | pattern for creating URL given chr and pop |
targfolder | destination |
delivers HapMap LD data to 'targfolder'
just run for side effect of download.file
## Not run:
downloadPopByChr()
## End(Not run)
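The urlTemplate argument carries two placeholders, %%CHRN%% and %%POPN%%. A minimal sketch of how such a template could be filled in before calling download.file is shown here; fill_url_template is a hypothetical helper written for illustration, not a function exported by ldblock, and the package's own substitution code may differ.

```r
# Hypothetical helper (not part of ldblock): substitute the chromosome and
# population tags into a urlTemplate-style string.
fill_url_template <- function(chrname, popname, urlTemplate) {
  url <- gsub("%%CHRN%%", chrname, urlTemplate, fixed = TRUE)
  gsub("%%POPN%%", popname, url, fixed = TRUE)
}

tmpl <- paste0(
  "http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/",
  "2009-02_phaseIII_r2/ld_%%CHRN%%_%%POPN%%.txt.gz"
)
url <- fill_url_template("chr1", "CEU", tmpl)

# Destination path inside a target folder (tempdir() stands in for targfolder).
dest <- file.path(tempdir(), basename(url))

# The download itself needs network access and a responding server, so it is
# left commented out here.
# download.file(url, destfile = dest)
```

Keeping the URL construction separate from the download makes it easy to inspect the resolved URL before any network call is attempted.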
singletons from EUR
EUR_singletons
character vector
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx, to which superpopulation codes were added
Given a set of SNP identifiers, use LD to expand the set to include linked loci
expandSnpSet( rsl, lb = 0.8, ldstruct, chrn = "chr17", popn = "CEU", txtgzfn = dir(system.file("hapmap", package = "ldblock"), full.names = TRUE) )
rsl | input list of SNP identifiers; SNPs not found in the LD structure are returned along with those found and the expansion list, all combined in one vector |
lb | lower bound on statistic used to retrieve loci in LD |
ldstruct | instance of ldstruct class |
chrn | chromosome identifier |
popn | population identifier (one of 'CEU', 'MEX', ...) |
txtgzfn | path to gzipped hapmap file with LD information |
direct use of elementwise arithmetic comparison
character vector
As of 2015, it appears that locus names are more informative than addresses for determining SNP identity across resources.
og = Sys.getenv("LDBLOCK_TXTGZ_DIR")
on.exit( Sys.setenv("LDBLOCK_TXTGZ_DIR" = og ) )
Sys.setenv("LDBLOCK_TXTGZ_DIR"=system.file("hapmap", package="ldblock"))
ld17 = hmld(chr="chr17", pop="CEU")
ee = expandSnpSet( ld17@allrs[1:10], ldstruct = ld17 )
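The expansion described above (retrieve loci whose LD statistic with an input SNP reaches the lower bound lb, and pass unknown SNPs through unchanged) can be sketched on a small dense matrix. This is an illustration only: expand_by_ld is a hypothetical function, the toy matrix is symmetric and dense rather than the sparse structure held in an ldstruct, and whether the package applies a strict or non-strict bound is an assumption here.

```r
# Hypothetical sketch (not the ldblock implementation): expand a SNP set using
# an elementwise comparison of LD values against a lower bound.
expand_by_ld <- function(rsl, ld, lb = 0.8) {
  known <- intersect(rsl, rownames(ld))        # input SNPs present in the LD matrix
  hits <- ld[known, , drop = FALSE] >= lb      # elementwise arithmetic comparison
  linked <- colnames(ld)[colSums(hits) > 0]    # loci linked to any known input SNP
  unique(c(rsl, linked))                       # inputs (found or not) plus expansion
}

# Toy symmetric LD matrix for four made-up SNP identifiers.
ids <- c("rs1", "rs2", "rs3", "rs4")
ld <- matrix(0, 4, 4, dimnames = list(ids, ids))
ld["rs1", "rs2"] <- ld["rs2", "rs1"] <- 0.95
ld["rs3", "rs4"] <- ld["rs4", "rs3"] <- 0.85
ld["rs1", "rs3"] <- ld["rs3", "rs1"] <- 0.30

expanded <- expand_by_ld(c("rs1", "rs99"), ld)
```

In this toy case rs99 is absent from the matrix and is returned as is, while rs2 is added because its value with rs1 exceeds the bound.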
import hapmap LD data and create a structure for its management; generates a sparse matrix representation of pairwise LD statistics and binds metadata on variant name and position
hmld(hmgztxt, poptag, chrom, genome = "hg19", stat = "Dprime")
hmgztxt | name of gzipped text file as distributed at hapmap.ncbi.nlm.nih.gov/downloads/ld_data/2009-02_phaseIII_r2/. It will be processed by |
poptag | heuristic tag identifying population |
chrom | heuristic tag for chromosome name |
genome | genome tag |
stat | statistic to use; options are "Dprime", "R2", and "LOD" |
instance of ldstruct class
getClass("ldstruct")
# see vignette
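To make the idea of a sparse pairwise LD matrix with bound variant metadata concrete, here is a minimal sketch using the Matrix package. The toy data frame and its column names (pos1, pos2, rs1, rs2, Dprime) are assumptions standing in for parsed HapMap LD records; the actual layout of an ldstruct object should be checked with getClass("ldstruct").

```r
library(Matrix)

# Toy pairwise LD records; values and column names are illustrative only.
ld_rows <- data.frame(
  pos1   = c(100, 100, 250),
  pos2   = c(250, 400, 400),
  rs1    = c("rs1", "rs1", "rs2"),
  rs2    = c("rs2", "rs3", "rs3"),
  Dprime = c(1.0, 0.4, 0.9),
  stringsAsFactors = FALSE
)

# Variant names in order of first appearance, with matching positions.
all_names <- c(ld_rows$rs1, ld_rows$rs2)
all_rs    <- unique(all_names)
all_pos   <- c(ld_rows$pos1, ld_rows$pos2)[match(all_rs, all_names)]

# Sparse matrix holding one statistic per recorded pair (upper triangle here).
ld_sparse <- sparseMatrix(
  i = match(ld_rows$rs1, all_rs),
  j = match(ld_rows$rs2, all_rs),
  x = ld_rows$Dprime,
  dims = rep(length(all_rs), 2),
  dimnames = list(all_rs, all_rs)
)
```

Only recorded pairs occupy storage, which matters because most SNP pairs on a chromosome are too distant to have an LD estimate at all.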
Obtain LD statistics in region specified by a gene model.
ldByGene( sym = "MMP24", vcf = system.file("vcf/c20exch.vcf.gz", package = "ldblock"), flank = 1000, vcfSLS = "NCBI", genomeSLS = "hg19", stats = "D.prime", depth = 10 )
sym | A standard gene symbol for use with |
vcf | Path to a tabix-indexed VCF file |
flank | number of basepairs to flank gene model for search |
vcfSLS | seqlevelsStyle (SLS) token for VCF; will be imposed on gene model |
genomeSLS | character tag for genome, to be used with |
stats | passed to ld |
depth | passed to ld |
sparse matrix representation of selected LD statistic, as returned by ld
Uses an internal function, genemod4ldblock, which relies on EnsDb.Hsapiens.v75 to obtain the gene model.
if (interactive()) {
  # there is a warning owing to non-SNV present
  ld1 = ldByGene(depth=150)
  image(ld1[1:200,1:200], col.reg=heat.colors(120), colorkey=TRUE, main="SNPs in MMP24 (chr20)")
}
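For readers unfamiliar with the statistics named above, the following sketch computes D, |D'| and r-squared for one pair of biallelic loci from known haplotype and allele frequencies, using the textbook definitions. It is not how ldByGene obtains its values (those come from genotype data in the VCF); ld_from_haplotypes is a hypothetical function and the frequencies are made up.

```r
# Textbook LD measures for two biallelic loci, given the frequency of the A-B
# haplotype (pAB) and the allele frequencies pA and pB.
ld_from_haplotypes <- function(pAB, pA, pB) {
  D <- pAB - pA * pB
  # Largest magnitude D can take given the allele frequencies and its sign.
  Dmax <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  Dprime <- if (Dmax > 0) abs(D) / Dmax else NA_real_
  R2 <- D^2 / (pA * (1 - pA) * pB * (1 - pB))
  c(D = D, Dprime = Dprime, R2 = R2)
}

res <- ld_from_haplotypes(pAB = 0.5, pA = 0.6, pB = 0.5)
```

With these inputs D is 0.2, which equals the maximum attainable value, so |D'| is 1 while r-squared is 2/3; the two statistics can differ substantially for the same pair of loci.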
use LDmat API from NCI LDlink service
ldmat(rsvec, pop = "CEU", type = "d", token = Sys.getenv("LDLINK_TOKEN"))
rsvec | character vector of SNP ids |
pop | three letter code for HapMap population, defaults to CEU |
type | 'r2' or 'd', defaults to 'd' implying d-prime |
token | the API token provided by NCI, defaults to value of environment variable LDLINK_TOKEN |
data.frame
if (interactive()) ldmat(c("rs77749396","rs9303279","rs9303280","rs9303281"))
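Because the token defaults to the LDLINK_TOKEN environment variable, and Sys.getenv returns an empty string when a variable is unset, it can be useful to check for a token before making a request. The guard below is a small sketch; have_ldlink_token is a hypothetical helper, not part of ldblock.

```r
# Hypothetical guard: TRUE only when a non-empty token string is available.
have_ldlink_token <- function(token = Sys.getenv("LDLINK_TOKEN")) {
  is.character(token) && length(token) == 1L && nzchar(token)
}

# Example use, left commented out because it needs ldblock, a valid token and
# network access:
# if (interactive() && have_ldlink_token()) {
#   ldmat(c("rs77749396", "rs9303279", "rs9303280", "rs9303281"))
# }
```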
accessor for matrix component
## S4 method for signature 'ldstruct'
ldmat(x)
x | instance of ldstruct |
Manage information about LD statistics as reported by HapMap.
Objects can be created by calls of the form new("ldstruct", ...).
showClass("ldstruct")
Create a URL referencing 1000 genomes content in AWS S3. stack1kg produces a VcfStack instance with references to VCF for 1000 genomes autosomal chrs. S3-resident VCF files with version "v5a.20130502" are used.
s3_1kg(chrnum, tmpl, dropchr = TRUE)
chrnum | a character string denoting a chromosome, such as '22' |
tmpl | alternate template for full URL, useful if versions prior to 2010 are of interest |
dropchr | if TRUE |
by default, a TabixFile instance
The "wrap" parameter has been removed; a TabixFile structure will be returned. The "tag" parameter has also been removed. Supply a tmpl argument if you are not using the 20130502 version.
requireNamespace("Rsamtools")
s3_1kg("22")
# try scanVcfHeader from VariantAnnotation
population and relationship information for 1000 genomes
sampinf_1kg
data.frame
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx, to which superpopulation codes were added
couple together a group of VCFs
stack1kg(chrs = as.character(1:22), index = FALSE, useEBI = FALSE)
chrs | a vector of chromosome names for extraction from 1000 genomes VCF collection |
index | logical telling whether VcfStack should attempt to create the local index; for 1000 genomes, the tbi files are in the cloud and will be used by readVcf, so FALSE is appropriate |
useEBI | logical(1), defaults to FALSE; if TRUE, use tabix-indexed VCF from EBI, but as of July 2022 the EBI FTP site did not respond. If FALSE, the AWS Open Data access path is used |
VcfStack instance
The seqinfo component of returned stack will have NA for genome. Please set it manually; for useEBI=TRUE this would be GRCh38; very likely so for useEBI=FALSE, but this should be checked.
if (interactive()) {
  st1 = stack1kg()
  st1
}