Title: | Statistical Inferences with MeDIP-seq Data (SIMD) to infer the methylation level for each CpG site |
---|---|
Description: | This package provides a inferential analysis method for detecting differentially expressed CpG sites in MeDIP-seq data. It uses statistical framework and EM algorithm, to identify differentially expressed CpG sites. The methods on this package are described in the article 'Methylation-level Inferences and Detection of Differential Methylation with Medip-seq Data' by Yan Zhou, Jiadi Zhu, Mingtao Zhao, Baoxue Zhang, Chunfu Jiang and Xiyan Yang (2018, pending publication). |
Authors: | Yan Zhou |
Maintainer: | Jiadi Zhu <[email protected]> |
License: | GPL-3 |
Version: | 1.25.0 |
Built: | 2024-11-19 04:25:43 UTC |
Source: | https://github.com/bioc/SIMD |
SIMD is a package to infer the methylation expression level for each CpG sites.The main idea of SIMD is that by using statistical inference to with Medip-seq data method to infer the methylation level.
Zhou Yan Maintainer:Zhou Yan <[email protected]>
Zhou Y. (2018). Methylation-level inferences and detection of differential methylation with Medip-seq data.
Using EM algorithm to infer the real number of CpG sites.
EMalgorithm(cpgsitefile, allcpgfile, category = "1", writefile = NULL, reportfile = NULL)
EMalgorithm(cpgsitefile, allcpgfile, category = "1", writefile = NULL, reportfile = NULL)
cpgsitefile |
The path of file to store CpG site. |
allcpgfile |
The file to store CpG sites. |
category |
Default to "1". |
writefile |
The path of output results. (If writefile=NULL, there will return the results back to main program.) |
reportfile |
The path of output results. |
values or file If writefile is NULL, then return the values of results,otherwise output to write file.
datafile <- system.file("extdata", package="methylMnM") data(example_data) filepath <- datafile[1] allcpgfile <- EM_H1ESB1_MeDIP_sigleCpG dirwrite <- paste(setwd(getwd()), "/", sep="") readshort <- paste(filepath, "/H1ESB1_MeDIP_18.extended.txt", sep="") writefile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG.bed", sep="") reportfile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG_report.bed", sep="") f <- EMalgorithm(cpgsitefile=readshort, allcpgfile=allcpgfile, category="1", writefile=writefile, reportfile=reportfile)
datafile <- system.file("extdata", package="methylMnM") data(example_data) filepath <- datafile[1] allcpgfile <- EM_H1ESB1_MeDIP_sigleCpG dirwrite <- paste(setwd(getwd()), "/", sep="") readshort <- paste(filepath, "/H1ESB1_MeDIP_18.extended.txt", sep="") writefile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG.bed", sep="") reportfile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG_report.bed", sep="") f <- EMalgorithm(cpgsitefile=readshort, allcpgfile=allcpgfile, category="1", writefile=writefile, reportfile=reportfile)
Calculate the probability on condition that only a single CpG contributes to a short read.
emalgth(X)
emalgth(X)
X |
A matrix about X, the elements in X takes values on 0,1 and satisfy the sums of each row equal to 1. |
y1 The probability when sums equal to 1.
set.seed(123) d <- matrix(0, nrow=200, ncol=50) random_num <- sample(1:50, 200, replace=TRUE) for(i in 1:nrow(d)){ d[i,random_num[i]]<-1 } result <- emalgth(d) head(result)
set.seed(123) d <- matrix(0, nrow=200, ncol=50) random_num <- sample(1:50, 200, replace=TRUE) for(i in 1:nrow(d)){ d[i,random_num[i]]<-1 } result <- emalgth(d) head(result)
Calculate the probability on condition that at least a CpG contributes to a short read.
emalgth1(X)
emalgth1(X)
X |
A matrix about X, the elements in X takes values on 0,1 and satisfy the sums of each row more than 1. |
y1 The probability when sums more than 1.
set.seed(123) d <- matrix(0, nrow=200, ncol=50) random_num <- sample(1:10, 200, replace=TRUE) for(i in 1:nrow(d)){ temp <- sample(1:50, random_num[i], replace=FALSE) d[i,temp] <- 1 } result <- emalgth1(d) head(result)
set.seed(123) d <- matrix(0, nrow=200, ncol=50) random_num <- sample(1:10, 200, replace=TRUE) for(i in 1:nrow(d)){ temp <- sample(1:50, random_num[i], replace=FALSE) d[i,temp] <- 1 } result <- emalgth1(d) head(result)
Using statistical framework and EM algorithm to infer the methylation expression level of single sites.
EMtest(datafile = NULL, chrstring = NULL, cpgfile, mrecpgfile = NULL, writefile = NULL, reportfile = NULL, mreratio = 3/7, psd = 2, mkadded = 1, f = 1)
EMtest(datafile = NULL, chrstring = NULL, cpgfile, mrecpgfile = NULL, writefile = NULL, reportfile = NULL, mreratio = 3/7, psd = 2, mkadded = 1, f = 1)
datafile |
The files of sample. (datafile should be cbind(data1,data2, data3,data4), where data1 and data2 are Medip-seq data, data3 and data4 are MRE-seq data). |
chrstring |
The chromosome should be test. |
cpgfile |
The file of all CpG number. |
mrecpgfile |
The file of MRE-CpG number(If NULL, mrecpgfile will equal to cpgfile). |
writefile |
The path of file of output result. (If writefile=NULL, there will return the results back to main program) |
reportfile |
The path of output results of the number of bin, total reads before processing and total reads after processing. |
mreratio |
The ratio of total unmethylation level with total methylation level (Defaulted mreratio is 3/7). |
psd |
The parameters of pseudo count, which pseudo count added to Medip-seq and MRE-seq count. |
mkadded |
Added to all CpG and MRE CpG (We set psd=2 and mkadded=1 as defaulted for robust). |
f |
Adjustment weight, default to 1. |
values or file The output file "writefile" will own eleven columns, that is, "chr", "chrSt", "chrEnd", "Medip1", "Medip2", "MRE1", "MRE2", "cg","mrecg","pvalue" and "Ts". We also output a report file which will include parameters "s1/s2", "s3/s4", "N1", "N2", "N3", "N4", "c1", "c2", "Number of windows" and "Spend time".
data(example_data) data1 <- EM2_H1ESB1_MeDIP_sigleCpG data2 <- EM2_H1ESB2_MeDIP_sigleCpG data3 <- H1ESB1_MRE_sigleCpG data4 <- H1ESB2_MRE_sigleCpG datafile <- cbind(data1, data2, data3, data4) allcpg <- all_CpGsite_bin_chr18 mrecpg <- three_mre_cpg dirwrite <- paste(setwd(getwd()), "/", sep="") writefile <- paste(dirwrite, "pval_EM_H1ESB1_H1ESB21.bed", sep="") reportfile <- paste(dirwrite, "report_pvalH1ESB1_H1ESB21.bed", sep="") EMtest(datafile=datafile, chrstring=NULL, cpgfile=allcpg, mrecpgfile=mrecpg, writefile=writefile, reportfile=reportfile, mreratio=3/7, psd=2, mkadded=1, f=1)
data(example_data) data1 <- EM2_H1ESB1_MeDIP_sigleCpG data2 <- EM2_H1ESB2_MeDIP_sigleCpG data3 <- H1ESB1_MRE_sigleCpG data4 <- H1ESB2_MRE_sigleCpG datafile <- cbind(data1, data2, data3, data4) allcpg <- all_CpGsite_bin_chr18 mrecpg <- three_mre_cpg dirwrite <- paste(setwd(getwd()), "/", sep="") writefile <- paste(dirwrite, "pval_EM_H1ESB1_H1ESB21.bed", sep="") reportfile <- paste(dirwrite, "report_pvalH1ESB1_H1ESB21.bed", sep="") EMtest(datafile=datafile, chrstring=NULL, cpgfile=allcpg, mrecpgfile=mrecpg, writefile=writefile, reportfile=reportfile, mreratio=3/7, psd=2, mkadded=1, f=1)
Compute P-values.
probBinom(t, size1, size2, c1, c2)
probBinom(t, size1, size2, c1, c2)
t |
The real value for random variable according to dataset. |
size1 |
The sum of Medip-seq real reads of the each CpG site for control and treatment sample. |
size2 |
The sum of MRE-seq real reads of the each CpG site for control and treatment sample. |
c1 |
The scaling factor for MeDip-seq data. |
c2 |
The scaling factor for MRE-seq data. |
p The P-values for testing the methylation expression levels for each CpG sites.
set.seed(1234) t <- 0.1 size1 <- sample(1:1000, 1, replace=TRUE) size2 <- sample(1:1000, 1, replace=TRUE) c1 <- 1 c2 <- 2 result <- probBinom(t, size1, size2, c1, c2)
set.seed(1234) t <- 0.1 size1 <- sample(1:1000, 1, replace=TRUE) size2 <- sample(1:1000, 1, replace=TRUE) c1 <- 1 c2 <- 2 result <- probBinom(t, size1, size2, c1, c2)