Title: | Detection of Differentially Methylated Regions |
---|---|
Description: | This package detects significant differentially methylated regions (for both qualitative and quantitative traits), using a scan statistic with underlying Poisson heuristics. The scan statistic will depend on a sequence of window sizes (# of CpGs within each window) and on a threshold for each window size. This threshold can be calculated by three different means: i) analytically using Siegmund et.al (2012) solution (preferred), ii) an important sampling as suggested by Zhang (2008), and a iii) full MCMC modeling of the data, choosing between a number of different options for modeling the dependency between each CpG. |
Authors: | Christian M Page [aut, cre], Linda Vos [aut], Trine B Rounge [ctb, dtc], Hanne F Harbo [ths], Bettina K Andreassen [aut] |
Maintainer: | Christian M Page <[email protected]> |
License: | GPL-3 |
Version: | 1.29.0 |
Built: | 2024-11-29 05:10:26 UTC |
Source: | https://github.com/bioc/DMRScan |
DMR Scan function
dmrscan(observations, windowSize, windowThreshold = NULL, chr = NULL, pos = NULL, maxGap = 500, ...)
dmrscan(observations, windowSize, windowThreshold = NULL, chr = NULL, pos = NULL, maxGap = 500, ...)
observations |
An object of either; |
windowSize |
A sequence of windowSizes for the slidingWindow. Must be an integer vector, with equal length as the number of windows. |
windowThreshold |
Optional argument with corresponding cut-off for each window. Will be estimated if not supplied. |
chr |
A vector of chromosomal position. Only used when the observations vector is a matrix of test statistic. |
pos |
A vector of genomic coordinates for the CpGs to match the chr argument |
maxGap |
The maximum allowed gap between two CpGs within the same region. |
... |
Optional arguments to be passed to |
An object of type GRanges
with significantly differentially
## methylation data from chromosome 22 data(DMRScan.methylationData) ## phenotype (end-point for methylation data) data(DMRScan.phenotypes) ## Test for an association between phenotype and methylation test.statistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic positions <- data.frame(matrix(as.integer(unlist(strsplit(names(test.statistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features min.cpg <- 4 ## Minimum number of CpGs in a tested cluster ## Maximum distance (in base-pairs) within a cluster ## before it is broken up into two separate cluster max.gap <- 750 ## Identify all clusters, and generate a list for each cluster regions <- makeCpGregions(observations = test.statistics, chr = positions[,1], pos = positions[,2], maxGap = max.gap, minCpG = min.cpg) ## Number of CpGs in the slidingWindows, can be either a single number ## or a sequence of windowSizes windowSizes <- 3:7 nCpG <- sum(sapply(regions,length)) ## Number of CpGs to be tested # Estimate the windowThreshold, based on the number of CpGs and windowSizes windowThresholds <- estimateWindowThreshold(nProbe = nCpG, windowSize = windowSizes, method = "sampling", mcmc = 10000) ## Run the slidingWindow DMRScanResults <- dmrscan(observations = regions, windowSize = windowSizes, windowThreshold = windowThresholds) ## Print the result print(DMRScanResults)
## methylation data from chromosome 22 data(DMRScan.methylationData) ## phenotype (end-point for methylation data) data(DMRScan.phenotypes) ## Test for an association between phenotype and methylation test.statistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic positions <- data.frame(matrix(as.integer(unlist(strsplit(names(test.statistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features min.cpg <- 4 ## Minimum number of CpGs in a tested cluster ## Maximum distance (in base-pairs) within a cluster ## before it is broken up into two separate cluster max.gap <- 750 ## Identify all clusters, and generate a list for each cluster regions <- makeCpGregions(observations = test.statistics, chr = positions[,1], pos = positions[,2], maxGap = max.gap, minCpG = min.cpg) ## Number of CpGs in the slidingWindows, can be either a single number ## or a sequence of windowSizes windowSizes <- 3:7 nCpG <- sum(sapply(regions,length)) ## Number of CpGs to be tested # Estimate the windowThreshold, based on the number of CpGs and windowSizes windowThresholds <- estimateWindowThreshold(nProbe = nCpG, windowSize = windowSizes, method = "sampling", mcmc = 10000) ## Run the slidingWindow DMRScanResults <- dmrscan(observations = regions, windowSize = windowSizes, windowThreshold = windowThresholds) ## Print the result print(DMRScanResults)
DMRScan: An R-package for identification of Differentially Metylated Regions
observations |
An object of type GRangesList from makeCpGregions |
windowSize |
A sequence of windowSizes for the slidingWindow, must be an integer |
windowThreshold |
Optional argument with corresponding cut-off for each window. Will be estimated if not supplied. |
... |
Optional arguments to be pased to |
An object of type GRanges
with signficantly differentially
Christian Page, [email protected]
Not Published yet (Under revision)
## nProbeoad methylation data from chromosome 22 data(DMRScan.methylationData) ## nProbeoad phenotype (end-point for methylation data) data(DMRScan.phenotypes) ## Test for an association between phenotype and Methylation test.statistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic positions <- data.frame(matrix(as.integer(unlist(strsplit(names(test.statistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features min.cpg <- 4 ## Minimum number of CpGs in a tested cluster ## Maxium distance (in base-pairs) within a cluster ## before it is broken up into two seperate cluster max.gap <- 750 ## Identify all clusters, and generate a list for each cluster regions <- makeCpGregions(observations = test.statistics, chr = positions[,1], pos = positions[,2], maxGap = max.gap, minCpG = min.cpg) ## Number of CpGs in the slidingWindows, can be either a single number ## or a sequence of windowSizes windowSizes <- 3:7 nCpG <- sum(sapply(regions, length)) ## Number of CpGs to be tested # Estimate the windowThreshold, based on the number of CpGs and windowSizes windowThresholds <- estimateWindowThreshold(nProbe = nCpG, windowSize = windowSizes, method = "sampling", mcmc = 10000) ## Run the slidingWindow DMRScanResults <- dmrscan(observations = regions, windowSize = windowSizes, windowThreshold = windowThresholds) ## Print the result print(DMRScanResults)
## nProbeoad methylation data from chromosome 22 data(DMRScan.methylationData) ## nProbeoad phenotype (end-point for methylation data) data(DMRScan.phenotypes) ## Test for an association between phenotype and Methylation test.statistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic positions <- data.frame(matrix(as.integer(unlist(strsplit(names(test.statistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features min.cpg <- 4 ## Minimum number of CpGs in a tested cluster ## Maxium distance (in base-pairs) within a cluster ## before it is broken up into two seperate cluster max.gap <- 750 ## Identify all clusters, and generate a list for each cluster regions <- makeCpGregions(observations = test.statistics, chr = positions[,1], pos = positions[,2], maxGap = max.gap, minCpG = min.cpg) ## Number of CpGs in the slidingWindows, can be either a single number ## or a sequence of windowSizes windowSizes <- 3:7 nCpG <- sum(sapply(regions, length)) ## Number of CpGs to be tested # Estimate the windowThreshold, based on the number of CpGs and windowSizes windowThresholds <- estimateWindowThreshold(nProbe = nCpG, windowSize = windowSizes, method = "sampling", mcmc = 10000) ## Run the slidingWindow DMRScanResults <- dmrscan(observations = regions, windowSize = windowSizes, windowThreshold = windowThresholds) ## Print the result print(DMRScanResults)
Bi-sulfite sequencing data from all known CpG islands at chromosome 22 from 100 the Finish teens study, sampled from extreme BMI quantiles. The data set is reduced to 25139 sites on chromosome 22. See "Genome-wide DNA methylation in saliva and body size of adolescent girls", TB Rounge, CM Page, M Lepisto, E Pekka, and BK Andreassen and E Weiderpass, _Epigenomics_ 8.11 (2016): 1495-1505 for a full overview of the data set.
data(DMRScan.methylationData) head(DMRScan.methylationData)
data(DMRScan.methylationData) head(DMRScan.methylationData)
Accompanying phenotypes for the methylation data, indicating case- control status for the BMI quantiles. See "Genome-wide DNA methylation in saliva and body size of adolescent girls", TB Rounge, CM Page, M Lepisto, E Pekka, and BK Andreassen and E Weiderpass, _Epigenomics_ 8.11 (2016): 1495-1505 for a full description of the phenotypes.
data(DMRScan.phenotypes) table(DMRScan.phenotypes)
data(DMRScan.phenotypes) table(DMRScan.phenotypes)
Estimate window thresholds for sliding window, one unique value for each window size
estimateWindowThreshold(nProbe, windowSize, method = "siegmund", mcmc = 1000, nCPU = 1, submethod = "ar", ...)
estimateWindowThreshold(nProbe, windowSize, method = "siegmund", mcmc = 1000, nCPU = 1, submethod = "ar", ...)
nProbe |
The number of probes (CpGs) in the study. |
windowSize |
The different window sizes to be tested. Must be either one, or an ordered sequence of integers. |
method |
Gives the method by which the threshold is calculated. Can be either an analytical solution "siegmund", provided by Siegnumd et.al (2012), or an iterative process; either importance sampling "sampling", as suggested by Zhang (2012) or a full MCMC model "mcmc" which can account for any dependency structure, wich is pass to arima.sim, with ... |
mcmc |
The number of MCMC iterations to be used, when using either Important Sampling ("zhang") or MCMC estimation of the threshold. |
nCPU |
When calculating the thresholds on a cluster, how many CPUs should be used. This option is only compatible with the 'mcmc' method. |
submethod |
A character string indicating if an AR(5) or ARIMA model should be used. In the AR(5), the index runs from -2 to 2. A regular AR(p) model can be obtaine using ARIMA(p,0,0) instead. |
... |
Optinal parameters pased on to |
Returns a vector of the threshold for each window size
thresholdGrid <- estimateWindowThreshold(nProbe = 1000, windowSize = 3:8, method = "siegmund")
thresholdGrid <- estimateWindowThreshold(nProbe = 1000, windowSize = 3:8, method = "siegmund")
Cluster CpGs together in genes based on annotation
makeCpGgenes(observations, chr, pos, gene, minCpG = 2)
makeCpGgenes(observations, chr, pos, gene, minCpG = 2)
observations |
Vector of corresponding observed T-value for each CpG, must be ordered in the same way as chr and pos |
chr |
Vector of chromosome location for each CpG |
pos |
Vector giving base pair position for each CpG If unsorted, use order(chr,pos) to sort the genomic positions within each chromosome. |
gene |
A vector asigning each probe to a gene. |
minCpG |
Minimum number of CpGs allowed in each region to be considered. Default is set to at least 2 CpGs within each region. |
The suplied observations ordered into into a list, with one entry for each CpG region.
data(DMRScan.methylationData) ## Load methylation data from chromosome 22 data(DMRScan.phenotypes) ## Load phenotype (end-point for methylation data) ## Test for an association between phenotype and Methylation testStatistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic pos <- data.frame(matrix(as.integer(unlist(strsplit(names(testStatistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features minCpG <- 3 ## Minimum number of CpGs in a tested cluster gene <- sample(paste("Gene",1:100,sep=""), length(testStatistics),replace=TRUE) regions <- makeCpGgenes(observations = testStatistics, chr = pos[,1], pos = pos[,2], gene = gene, minCpG = minCpG)
data(DMRScan.methylationData) ## Load methylation data from chromosome 22 data(DMRScan.phenotypes) ## Load phenotype (end-point for methylation data) ## Test for an association between phenotype and Methylation testStatistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic pos <- data.frame(matrix(as.integer(unlist(strsplit(names(testStatistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features minCpG <- 3 ## Minimum number of CpGs in a tested cluster gene <- sample(paste("Gene",1:100,sep=""), length(testStatistics),replace=TRUE) regions <- makeCpGgenes(observations = testStatistics, chr = pos[,1], pos = pos[,2], gene = gene, minCpG = minCpG)
Cluster CpGs together in regions based on proximity
makeCpGregions(observations, chr, pos, maxGap = 500, minCpG = 2)
makeCpGregions(observations, chr, pos, maxGap = 500, minCpG = 2)
observations |
Vector of corresponding observed T-value for each CpG, must be ordered in the same way as chr and pos |
chr |
Vector of chromosome location for each CpG |
pos |
Vector giving base pair position for each CpG If unsorted, use order(chr,pos) to sort the genomic positions within each chromosome. |
maxGap |
Maximum allowed base pair gap within a cluster. Default is set to 500. |
minCpG |
Minimum number of CpGs allowed in each region to be considered. Default is set to at least 2 CpGs within each region. |
The suplied observations ordered into into a GRangesList object.
To be parsed further into dmrscan
data(DMRScan.methylationData) ## Load methylation data from chromosome 22 data(DMRScan.phenotypes) ## Load phenotype (end-point for methylation data) ## Test for an association between phenotype and Methylation testStatistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic pos<- data.frame(matrix(as.integer(unlist(strsplit(names(testStatistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features minCpG <- 3 ## Minimum number of CpGs in a tested cluster ## Maxium distance (in base-pairs) within a cluster before it is ## broken up into two seperate cluster maxGap <- 750 regions <- makeCpGregions(observations = testStatistics, chr = pos[,1], pos = pos[,2], maxGap = maxGap, minCpG = minCpG)
data(DMRScan.methylationData) ## Load methylation data from chromosome 22 data(DMRScan.phenotypes) ## Load phenotype (end-point for methylation data) ## Test for an association between phenotype and Methylation testStatistics <- apply(DMRScan.methylationData,1,function(x,y) summary(glm(y ~ x, family = binomial(link = "logit")))$coefficients[2,3], y = DMRScan.phenotypes) ## Set chromosomal position to each test-statistic pos<- data.frame(matrix(as.integer(unlist(strsplit(names(testStatistics), split="chr|[.]"))), ncol = 3, byrow = TRUE))[,-1] ## Set clustering features minCpG <- 3 ## Minimum number of CpGs in a tested cluster ## Maxium distance (in base-pairs) within a cluster before it is ## broken up into two seperate cluster maxGap <- 750 regions <- makeCpGregions(observations = testStatistics, chr = pos[,1], pos = pos[,2], maxGap = maxGap, minCpG = minCpG)
Method Fixed window size scan for a sequence of window sizes
manyWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRangesList' manyWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRanges' manyWindowSizeScanner(region, windowThreshold, windowSize)
manyWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRangesList' manyWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRanges' manyWindowSizeScanner(region, windowThreshold, windowSize)
region |
Object of type GRanges |
windowThreshold |
Vector of window thresholds |
windowSize |
Vector of window sizes to be tested on regions |
A list of the windows that are significant
## Not run
## Not run
Method Fixed window size scan for one window size
oneWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRangesList' oneWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRanges' oneWindowSizeScanner(region, windowThreshold, windowSize)
oneWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRangesList' oneWindowSizeScanner(region, windowThreshold, windowSize) ## S4 method for signature 'GRanges' oneWindowSizeScanner(region, windowThreshold, windowSize)
region |
Object of type GRanges |
windowThreshold |
Vector of window thresholds |
windowSize |
Vector of window sizes to be tested on regions |
A list of the windows that are significant
## Not run
## Not run