Package 'SIMD'

Title: Statistical Inferences with MeDIP-seq Data (SIMD) to infer the methylation level for each CpG site
Description: This package provides a inferential analysis method for detecting differentially expressed CpG sites in MeDIP-seq data. It uses statistical framework and EM algorithm, to identify differentially expressed CpG sites. The methods on this package are described in the article 'Methylation-level Inferences and Detection of Differential Methylation with Medip-seq Data' by Yan Zhou, Jiadi Zhu, Mingtao Zhao, Baoxue Zhang, Chunfu Jiang and Xiyan Yang (2018, pending publication).
Authors: Yan Zhou
Maintainer: Jiadi Zhu <[email protected]>
License: GPL-3
Version: 1.25.0
Built: 2024-12-19 03:52:01 UTC
Source: https://github.com/bioc/SIMD

Help Index


A method to infer the methylation expression level for each CpG sites.

Description

SIMD is a package to infer the methylation expression level for each CpG sites.The main idea of SIMD is that by using statistical inference to with Medip-seq data method to infer the methylation level.

Author(s)

Zhou Yan Maintainer:Zhou Yan <[email protected]>

References

Zhou Y. (2018). Methylation-level inferences and detection of differential methylation with Medip-seq data.


EM algorithm to infer CpG sites.

Description

Using EM algorithm to infer the real number of CpG sites.

Usage

EMalgorithm(cpgsitefile, allcpgfile, category = "1", writefile = NULL,
  reportfile = NULL)

Arguments

cpgsitefile

The path of file to store CpG site.

allcpgfile

The file to store CpG sites.

category

Default to "1".

writefile

The path of output results. (If writefile=NULL, there will return the results back to main program.)

reportfile

The path of output results.

Value

values or file If writefile is NULL, then return the values of results,otherwise output to write file.

Examples

datafile <- system.file("extdata", package="methylMnM")
data(example_data)
filepath <- datafile[1]
allcpgfile <- EM_H1ESB1_MeDIP_sigleCpG
dirwrite <- paste(setwd(getwd()), "/", sep="")
readshort <- paste(filepath, "/H1ESB1_MeDIP_18.extended.txt", sep="")
writefile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG.bed", sep="")
reportfile <- paste(dirwrite, "EM2_H1ESB1_MeDIP_sigleCpG_report.bed", sep="")
f <- EMalgorithm(cpgsitefile=readshort, allcpgfile=allcpgfile, category="1",
                 writefile=writefile, reportfile=reportfile)

Calculate the probability on condition that the sums equal to 1.

Description

Calculate the probability on condition that only a single CpG contributes to a short read.

Usage

emalgth(X)

Arguments

X

A matrix about X, the elements in X takes values on 0,1 and satisfy the sums of each row equal to 1.

Value

y1 The probability when sums equal to 1.

Examples

set.seed(123)
d <- matrix(0, nrow=200, ncol=50)
random_num <- sample(1:50, 200, replace=TRUE)
for(i in 1:nrow(d)){
    d[i,random_num[i]]<-1
}
result <- emalgth(d)
head(result)

Calculate the probability on condition that the sums more than 1.

Description

Calculate the probability on condition that at least a CpG contributes to a short read.

Usage

emalgth1(X)

Arguments

X

A matrix about X, the elements in X takes values on 0,1 and satisfy the sums of each row more than 1.

Value

y1 The probability when sums more than 1.

Examples

set.seed(123)
d <- matrix(0, nrow=200, ncol=50)
random_num <- sample(1:10, 200, replace=TRUE)
for(i in 1:nrow(d)){
    temp <- sample(1:50, random_num[i], replace=FALSE)
    d[i,temp] <- 1
}
result <- emalgth1(d)
head(result)

Inferring the methylation expression level of single sites.

Description

Using statistical framework and EM algorithm to infer the methylation expression level of single sites.

Usage

EMtest(datafile = NULL, chrstring = NULL, cpgfile, mrecpgfile = NULL,
  writefile = NULL, reportfile = NULL, mreratio = 3/7, psd = 2,
  mkadded = 1, f = 1)

Arguments

datafile

The files of sample. (datafile should be cbind(data1,data2, data3,data4), where data1 and data2 are Medip-seq data, data3 and data4 are MRE-seq data).

chrstring

The chromosome should be test.

cpgfile

The file of all CpG number.

mrecpgfile

The file of MRE-CpG number(If NULL, mrecpgfile will equal to cpgfile).

writefile

The path of file of output result. (If writefile=NULL, there will return the results back to main program)

reportfile

The path of output results of the number of bin, total reads before processing and total reads after processing.

mreratio

The ratio of total unmethylation level with total methylation level (Defaulted mreratio is 3/7).

psd

The parameters of pseudo count, which pseudo count added to Medip-seq and MRE-seq count.

mkadded

Added to all CpG and MRE CpG (We set psd=2 and mkadded=1 as defaulted for robust).

f

Adjustment weight, default to 1.

Value

values or file The output file "writefile" will own eleven columns, that is, "chr", "chrSt", "chrEnd", "Medip1", "Medip2", "MRE1", "MRE2", "cg","mrecg","pvalue" and "Ts". We also output a report file which will include parameters "s1/s2", "s3/s4", "N1", "N2", "N3", "N4", "c1", "c2", "Number of windows" and "Spend time".

Examples

data(example_data)
data1 <- EM2_H1ESB1_MeDIP_sigleCpG
data2 <- EM2_H1ESB2_MeDIP_sigleCpG
data3 <- H1ESB1_MRE_sigleCpG
data4 <- H1ESB2_MRE_sigleCpG
datafile <- cbind(data1, data2, data3, data4)
allcpg <- all_CpGsite_bin_chr18
mrecpg <- three_mre_cpg
dirwrite <- paste(setwd(getwd()), "/", sep="")
writefile <- paste(dirwrite, "pval_EM_H1ESB1_H1ESB21.bed", sep="")
reportfile <- paste(dirwrite, "report_pvalH1ESB1_H1ESB21.bed", sep="")
EMtest(datafile=datafile, chrstring=NULL, cpgfile=allcpg,
       mrecpgfile=mrecpg, writefile=writefile, reportfile=reportfile,
       mreratio=3/7, psd=2, mkadded=1, f=1)

Compute P-values for Medip-seq and MRE-seq data.

Description

Compute P-values.

Usage

probBinom(t, size1, size2, c1, c2)

Arguments

t

The real value for random variable according to dataset.

size1

The sum of Medip-seq real reads of the each CpG site for control and treatment sample.

size2

The sum of MRE-seq real reads of the each CpG site for control and treatment sample.

c1

The scaling factor for MeDip-seq data.

c2

The scaling factor for MRE-seq data.

Value

p The P-values for testing the methylation expression levels for each CpG sites.

Examples

set.seed(1234)
t <- 0.1
size1 <- sample(1:1000, 1, replace=TRUE)
size2 <- sample(1:1000, 1, replace=TRUE)
c1 <- 1
c2 <- 2
result <- probBinom(t, size1, size2, c1, c2)