Title: | Quality control and analysis tools for Illumina DNA methylation BeadChip |
---|---|
Description: | Tools for quanlity control, analysis and visulization of Illumina DNA methylation array data. |
Authors: | Zongli Xu [cre, aut], Liang Niu [aut], Jack Taylor [ctb] |
Maintainer: | Zongli Xu <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.43.2 |
Built: | 2024-11-28 03:12:55 UTC |
Source: | https://github.com/bioc/ENmix |
Convert methylation beta value to M value.
B2M(x)
B2M(x)
x |
An numeric matrix with values between 0 and 1 |
Methylation beta value is calculated as beta=M/(M+U+a). M is methylated intensity, U is unmethylated intensity, and a is a constant offset (by default , a=100). M value is calculated as M=log2((M+a)/(U+a)). M or U is usually greater than 1000, so a is negligible for most probes. if a=0, then M=log2(beta)/(1-beta).
A matrix of M values
Zongli Xu
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") m=B2M(beta) }
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") m=B2M(beta) }
Calculation of detection P values based on negtive internal control probes or out of the band (oob) probes
calcdetP(rgSet,detPtype = "negative")
calcdetP(rgSet,detPtype = "negative")
rgSet |
An object of class |
detPtype |
Calculation of detection P values based on negtive internal control ("negative") probes or out of the band ("oob") probes |
An numerical matrix of detection P values, with row for CpGs and column for samples
Zongli Xu
Wanding Zhou et al. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions, Nucleic Acids Research, 2018
path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) detp=calcdetP(rgSet,detPtype = "negative") detp2=calcdetP(rgSet,detPtype = "oob")
path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) detp=calcdetP(rgSet,detPtype = "negative") detp2=calcdetP(rgSet,detPtype = "oob")
To identify differentially methylated regions using a modified comb-p method
combp(data,dist.cutoff=1000,bin.size=310,seed=0.01, region_plot=TRUE,mht_plot=TRUE,nCores=10,verbose=TRUE)
combp(data,dist.cutoff=1000,bin.size=310,seed=0.01, region_plot=TRUE,mht_plot=TRUE,nCores=10,verbose=TRUE)
data |
A data frame with colname name "chr","start", "end","p" and "probe", indicating chromosome (1,2,3,...,X,Y), chromosome start and end position, P value and probe names |
dist.cutoff |
Maximum distance in base pair to combine adjacent DMRs |
bin.size |
bin size for autocorrelation calculation |
seed |
FDR significance threshold for initial selection of DMR region |
region_plot |
If TRUE, regional plots will be generated |
mht_plot |
If TRUE, mahattan plot will be generated |
nCores |
Number of computer cores will be used in calculation |
verbose |
If TRUE, detailed running information will be printed |
The input should be a data frame with column names "chr","start", "end","p", and "probe", indicating chromosome number, start position, end position, P value and probe name. The function use a modified comb-p method to identify differentially methylated regions. DMR results will be stored in a file with name resu_combp.csv. If plot options were selected, two figure files will be generated: mht.jpg and region_plot.pdf.
Liang Niu, Zongli Xu
Pedersen BS1, Schwartz DA, Yang IV, Kechris KJ. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinfomatics 2012
Zongli Xu, Changchun Xie, Jack A. Taylor, Liang Niu, ipDMR: Identification of differentially methyl-ated regions with interval p-values, Bioinformatics 2020
dat=simubed() names(dat) #seed=0.1 is only for demonstration purpose, it should be smaller than 0.05 or 0.01 in actual study. combp(data=dat,seed=0.1)
dat=simubed() names(dat) #seed=0.1 is only for demonstration purpose, it should be smaller than 0.05 or 0.01 in actual study. combp(data=dat,seed=0.1)
Surrogate variables derived based on intensity data for non-negative internal control probes.
ctrlsva(rgSet,percvar=0.95,npc=1,flag=1)
ctrlsva(rgSet,percvar=0.95,npc=1,flag=1)
rgSet |
An object of class |
percvar |
Minimum percentage of data variations can be explained by surrogate variables, range from 0 to 1,default is 0.95 |
npc |
Number of surrogate variables, default is 1 |
flag |
1: select number of surrogate variables based on argument
|
The function will return an numerical matrix with columns indicating surrogate variables and rows corresponding to samples. These variables can be used in association analysis to adjust for experimental batch effects.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) sva<-ctrlsva(rgSet) }
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) sva<-ctrlsva(rgSet) }
The function can be used to evaluate duplicate samples by calculating: 1) centered/un-centered Pearson's correlation coefficient between duplicates; 2) abosolute difference between duplicates; 3) ICC for each CpG probes using oneway or twoway model.
dupicc(dat,dupid,mvalue=FALSE,center=TRUE,nCores=2,qcflag=FALSE,qc=NULL, detPthre=0.05,nbthre=3,skipicc=FALSE,corfig=FALSE,model="oneway")
dupicc(dat,dupid,mvalue=FALSE,center=TRUE,nCores=2,qcflag=FALSE,qc=NULL, detPthre=0.05,nbthre=3,skipicc=FALSE,corfig=FALSE,model="oneway")
dat |
Methylation beta value matrix |
dupid |
A data frame with two variables, id1 and id2. The two ids in each row indicate a duplicate pair. These ids should be the same with column names of the input methylation matrix |
mvalue |
If TRUE, the beta value will be converted to M value for calculation of ICC |
center |
If TRUE, the methylation beta values will be centered for each CpG before calculation of ICC or correlation |
nCores |
Number of cores will be used for calculation of ICC |
qcflag |
Whether to perform QC before calculation of ICC |
qc |
QC object from function QCinfo |
detPthre |
If qcflag=TRUE, the methylation values with detection P value higher than the threshold will be removed before calculation |
nbthre |
If qcflag=TRUE, the methylation values with number of bead smaller than the threshold will be removed |
skipicc |
If TRUE, ICC calculation will be skipped |
corfig |
If TRUE, a figure will be generated to demonstrate correlations within duplicates or within non-duplicates |
model |
Using "oneway" or "twoway" model to calculate ICC |
icc: a data frame containing ICC and P values for each probe
dupcor: a data frame containing Pearson's correlation and averaged absolute difference between duplicates.
Zongli Xu
Zongli Xu, Jack A Taylor. Reliability of DNA methylation measures using Illumina methylation BeadChip. Epigenetics 2020
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") dupidx=data.frame(id1=c("5723646052_R02C02","5723646052_R04C01","5723646052_R05C02"), id2=c("5723646053_R04C02","5723646053_R05C02","5723646053_R06C02")) iccresu<-dupicc(dat=beta,dupid=dupidx) }
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") dupidx=data.frame(id1=c("5723646052_R02C02","5723646052_R04C01","5723646052_R05C02"), id2=c("5723646053_R04C02","5723646053_R05C02","5723646053_R06C02")) iccresu<-dupicc(dat=beta,dupid=dupidx) }
To estimates relative proportion of underlying cell types in a sample based on reference methylation data of pure cell types.
estimateCellProp(userdata,refdata="FlowSorted.Blood.450k", cellTypes=NULL,nonnegative = TRUE,nProbes=50, normalize=TRUE,refplot=FALSE)
estimateCellProp(userdata,refdata="FlowSorted.Blood.450k", cellTypes=NULL,nonnegative = TRUE,nProbes=50, normalize=TRUE,refplot=FALSE)
userdata |
The input can be |
refdata |
Reference data set will used. Current option: "FlowSorted.Blood.450k", "FlowSorted.DLPFC.450k","FlowSorted.CordBlood.450k", "FlowSorted.CordBloodCombined.450k", "FlowSorted.CordBloodNorway.450k" or "FlowSorted.Blood.EPIC". |
cellTypes |
Specific set of cell type data in reference data will be used for deconvolution. if "NULL" all cell types data will be used. see details for possible cell types |
normalize |
TRUE or FALSE, if TRUE, quantile normalization on methylated and unmethylated intensities will be performed. |
nonnegative |
TRUE or FALSE. If TRUE, the estimated proportions will be constrained to nonnegative values |
nProbes |
Number of best probes for each cell types will be used for the estimation. |
refplot |
TRUE or FALSE. IF TRUE, refdata distribution and heatmap will be plotted for inspection of reference dataset. |
This function use the method of Houseman et al (2012) to estimate cell type proportions based on reference DNA methylation data.
The following reference datasets can be used to assist the estimation. User should select a reference most resemble to user's data in tissue, age, and array type.
FlowSorted.Blood.450k: consisting of 450K methylation data for 60 blood samples from 6 male adults. Six samples for each of the cell types: Bcell CD4T CD8T Eos Gran Mono Neu NK PBMC WBC; See Reinius et al. 2012 for details.
FlowSorted.CordBlood.450k: consisting of 450k methylation data for 104 cord blood samples from 17 male and female individuals. Cell type (# samples) are: Bcell(15) CD4T(15) CD8T(14) Gran(12) Mono(15) NK(14) nRBC(4) WholeBlood(15). See Bakulski et al. Epigenetics 2016 for details.
FlowSorted.CordBloodNorway.450k: consisting of 450K methylation data for 77 cord blood samples from 11 individuals (6 girls and 5 boys). 11 samples for each of the cell types: Bcell CD4T CD8T Gran Mono NK WBC. See P Yousefi et al Environ. Mol. Mutagen 2015 for details.
FlowSorted.Blood.EPIC: consisting of EPIC methylation data for 37 magnetically sorted blood cell references from 12 individuals. See LA Salas et al. 2018 for details.
FlowSorted.DLPFC.450k: consisting of 450K methylation data for 58 brain tissue samples from 29 individuals. 15 females and 14 males, 6 Africans and 23 Caucasians, age range from 13 to 79. 29 samples for each of the cell types: NeuN_neg and NeuN_pos. See Guintivano et al. 2013 for details.
FlowSorted.CordBloodCombined.450k: consisting of 289 combined umbilical cord blood cells samples assayed by Bakulski et al, Gervin et al., de Goede et al., and Lin et al. see https://github.com/immunomethylomics/ FlowSorted.CordBloodCombined.450k. for details.
The output is a data frame composed of the estimates of cell type proportions with columns indicate cell types and rows indicate samples.
Zongli Xu
EA Houseman, WP Accomando, DC Koestler, BC Christensen, CJ Marsit, HH Nelson, JK Wiencke and KT Kelsey. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics (2012) 13:86.
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) celltype=estimateCellProp(userdata=rgSet,refdata="FlowSorted.Blood.450k", nonnegative = TRUE,normalize=TRUE) #using methDataSet qc=QCinfo(rgSet) mdat<-preprocessENmix(rgSet, bgParaEst="oob", dyeCorr="RELIC", QCinfo=qc, nCores=6) celltype=estimateCellProp(userdata=mdat,refdata="FlowSorted.Blood.450k", nonnegative = TRUE,normalize=TRUE) mdat<-norm.quantile(mdat, method="quantile1") #using beta value beta<-rcp(mdat,qcscore=qc) celltype=estimateCellProp(userdata=beta,refdata="FlowSorted.Blood.450k", nonnegative = TRUE)
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) celltype=estimateCellProp(userdata=rgSet,refdata="FlowSorted.Blood.450k", nonnegative = TRUE,normalize=TRUE) #using methDataSet qc=QCinfo(rgSet) mdat<-preprocessENmix(rgSet, bgParaEst="oob", dyeCorr="RELIC", QCinfo=qc, nCores=6) celltype=estimateCellProp(userdata=mdat,refdata="FlowSorted.Blood.450k", nonnegative = TRUE,normalize=TRUE) mdat<-norm.quantile(mdat, method="quantile1") #using beta value beta<-rcp(mdat,qcscore=qc) celltype=estimateCellProp(userdata=beta,refdata="FlowSorted.Blood.450k", nonnegative = TRUE)
Similar to histogram, frequency polygon plot can be used to inspect data distribution.
freqpoly(mat, nbreaks=15, col="black", xlab="", ylab="Frequency", type="l",append=FALSE,...)
freqpoly(mat, nbreaks=15, col="black", xlab="", ylab="Frequency", type="l",append=FALSE,...)
mat |
A numeric vector |
nbreaks |
Number of bins for frequency counting |
col |
color code |
xlab |
x-axis lable |
ylab |
y-axis lable |
type |
character indicating the type of plotting; actually any of the 'type's as in 'plot.default'. |
append |
TRUE or FALSE, whether to create a new figure or append to the current figure. |
... |
Further arguments that get passed to the function "plot" |
Frequency polygon plot.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
freqpoly(rnorm(1000))
freqpoly(rnorm(1000))
Extract Methylation Beta value, Beta = Meth / (Meth + Unmeth + offset)
getB(mdat,type="Illumina",offset=100)
getB(mdat,type="Illumina",offset=100)
mdat |
An object of class |
type |
type="Illumina" sets offset=100 as per Genome Studio software. |
offset |
Regularization factor in calculating beta ratio, 100 in default |
Methylation Beta value = Meth / (Meth + Unmeth + offset). Meth is methylated intensity matrix, Unmeth is unmethylated intensity matrix.
Zongli Xu
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") }
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") }
Extract CpG probe annotation inforamtion from an rgDataSet
getCGinfo(rgSet, type="IandII")
getCGinfo(rgSet, type="IandII")
rgSet |
An object of class |
type |
One of the following options "I","II","IandII","ctrl", indicating type I, type II type I & II or control probes type |
An object of data frame class
Zongli Xu
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) cginfo=getCGinfo(rgSet,type="IandII")
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) cginfo=getCGinfo(rgSet,type="IandII")
To create a methDataSet based on a rgDataset
getmeth(rgSet)
getmeth(rgSet)
rgSet |
An object of class |
CpG annotation information is incorporated in the output methDataSet
object, and can be extracted using command rowData().
An object of class methDataSet
Zongli Xu
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) meth cginfo=rowData(meth)
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) meth cginfo=rowData(meth)
To identify differentially methylated regions using an interval P value method
ipdmr(data,include.all.sig.sites=TRUE,dist.cutoff=1000,bin.size=50, seed=0.05,region_plot=TRUE,mht_plot=TRUE,verbose=TRUE)
ipdmr(data,include.all.sig.sites=TRUE,dist.cutoff=1000,bin.size=50, seed=0.05,region_plot=TRUE,mht_plot=TRUE,verbose=TRUE)
data |
A data frame with colname name "chr","start", "end","p" and "probe", indicating chromosome (1,2,3,...,X,Y), chromosome start and end position, P value and probe names |
include.all.sig.sites |
Whether to use CpG singletons in calculation of FDR |
dist.cutoff |
Maximum distance in base pair to combine adjacent DMRs, and the maximum distance between CpGs where auto-correlation will be calculated |
bin.size |
bin size for autocorrelation calculation |
seed |
FDR threshold for initial selection of DMR regions |
region_plot |
If TRUE, regional plots will be produced for each DMR |
mht_plot |
If TRUE, a p-value mahattan plot with marked DMRs will be produced |
verbose |
Whether to output detailed information |
The input should be a data frame with column names "chr","start", "end","p" and "probe", indicating chromosome, start and end position, P value and probe name. The function will use a novel interval p value method to identify differentially methylated regions. DMR results will be stored in a file with name resu_ipdmr.csv. If plot options were selected, two figure files will be generated: mht.jpg and region_plot.pdf.
Liang Niu, Zongli Xu
Zongli Xu, Changchun Xie, Jack A. Taylor, Liang Niu, ipDMR: Identification of differentially methyl-ated regions with interval p-values, Bioinfomatics 2020
dat=simubed() names(dat) #seed=0.1 is only for demonstration purpose, it should be smaller than 0.05 or 0.01 in actual study. ipdmr(data=dat,seed=0.1) #seed=0.1
dat=simubed() names(dat) #seed=0.1 is only for demonstration purpose, it should be smaller than 0.05 or 0.01 in actual study. ipdmr(data=dat,seed=0.1) #seed=0.1
Converting methylation M value to methylation beta value.
M2B(x)
M2B(x)
x |
An numeric matrix |
Methylation beta value is calculated as beta=M/(M+U+a). M is methylated intensity, U is unmethylated intensity, and a is a constant offset (by default , a=100). M value is calculated as M=log2((M+a)/(U+a)). M or U is usually greater than 1000, so a is negligible for most probes. if a=0, then beta=2^M/(2^M+1).
A matrix of methylation Beta values.
Zongli Xu
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") m=B2M(beta) b=M2B(m) }
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") m=B2M(beta) b=M2B(m) }
"methDataSet"
A class for storing Illumina methylation array methylated and unmethylated intensity data, and CpG annotation infomation.
methDataSet(Meth = new("matrix"), Unmeth = new("matrix"), rowData=new("DataFrame"),...)
methDataSet(Meth = new("matrix"), Unmeth = new("matrix"), rowData=new("DataFrame"),...)
Meth |
A matrix of methylated intensity values with row for CpGs and column for samples |
Unmeth |
A matrix of unmethylated intensity values with row for CpGs and column for samples |
rowData |
A dataframe contains CpG annotation information |
... |
Other arguments for class SummarizedExperiment |
CpG annotation information is incorporated in the object, and can
be extracted using command rowData
An object of class methDataSet
showClass("methDataSet")
showClass("methDataSet")
To calculate various methylation predictors, including DNA methylation age, plasma protein levels, exposures etc.
methscore(datMeth,datPheno=NULL,fastImputation=FALSE,normalize=TRUE, GrimAgeComponent=NULL,UserRef=NULL,ForceUserRef=FALSE)
methscore(datMeth,datPheno=NULL,fastImputation=FALSE,normalize=TRUE, GrimAgeComponent=NULL,UserRef=NULL,ForceUserRef=FALSE)
datMeth |
Methylation beta value matrix with CpG names(row names) and sample ids(column names). |
datPheno |
Phenotype data, must include columns SampleID, Age (year) and Female (0:male,1:female). Age and Female information are required to calculate DNAmFitAge and PCClocks |
fastImputation |
If "TRUE" mean methylation values will used for imputation, if "FALSE", KNN nearest neighbor method will be used. |
normalize |
TRUE or FALSE, if TRUE, user data will be normalized to a reference data using a modified RCP method |
GrimAgeComponent |
A data frame of grimage component from methylation age online calculator (https://dnamage.clockfoundation.org/user/login). It must include the following variables, ("SampleID","DNAmADM", "DNAmB2M", "DNAmCystatinC", "DNAmGDF15", "DNAmLeptin", "DNAmPACKYRS", "DNAmPAI1", "DNAmTIMP1","DNAmGrimAge"). If the file is not provided, bAge and DNAmFitAge will be calculated using PC Grimage components |
UserRef |
User provided methylation reference data will be used for some predictors (see details), must include variables cg and meth_mean |
ForceUserRef |
If TRUE, UserRef will be forced to use in normalization and imputation for all estimates |
The original publications (see references) provided reference methylation mean values for most methylation predictors, and thus these values were used directly for normalization and imputations. For the following predictors without reference value in their original publications,
PEDBE,EpiToc,EpiToc2,Zhang10CpG,Horvath2,MiAge,DNAmTL,PEDBE,GACPC, GARPC,GARRPC,Bohlin and Knight, we used Sister Study data (~5000 samples) to derive a set of reference methylatin values. If UserRef was provided, it will be used to replace the Sister Study reference values for these predictors. If set ForceUserRef=TRUE, UserRef will be used for all predictors in normalization and CpG imputation.
A data frame with rows for samples and columns for methylation predictors. Output file "summary_CpG_count.csv" has information about CpG counts and reference for each predictor.
Zongli Xu
Belsky et al. DunedinPACE, a DNA methylation biomarker of the pace of aging. Elife. 2022 Jan 14;11:e73420. doi: 10.7554/eLife.73420
Bernabeu et al. Refining epigenetic prediction of chronological and biological age. Genome Med. 2023 Feb 28;15(1):12. doi: 10.1186/s13073-023-01161-y
Bohlin et al. Prediction of gestational age based on genome-wide differentially methylated regions.Genome Biol. 2016 Oct 7;17(1):207. doi: 10.1186/s13059-016-1063-4.
Hannum et al. Genome-wide methylation profiles reveal quantitative views of human aging rates.Mol Cell. 2013 Jan 24;49(2):359-367. doi: 10.1016/j.molcel.2012.10.016. Epub 2012 Nov 21.
Higgins-Chen et al. A computational solution for bolstering reliability of epigenetic clocks: Implications for clinical trials and longitudinal tracking. Nat Aging. 2022 Jul; 2(7): 644–661.
Horvath. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14(10):R115. doi: 10.1186/gb-2013-14-10-r115.
Horvath et al. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford Progeria Syndrome and ex vivo studies.Aging (Albany NY). 2018 Jul 26;10(7):1758-1775. doi: 10.18632/aging.101508.
Knight et al. An epigenetic clock for gestational age at birth based on blood methylation data. Genome Biol. 2016 Oct 7;17(1):206. doi: 10.1186/s13059-016-1068-z.
Lee et al. Placental epigenetic clocks: estimating gestational age using placental DNA methylation levels.Aging (Albany NY). 2019 Jun 30; 11(12): 4238–4253.
Levine et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY). 2018 Apr; 10(4): 573–591.
Li et al. Derivation and validation of an epigenetic frailty risk score in population-based cohorts of older adults. Nat Commun. 2022 Sep 7;13(1):5269. doi: 10.1038/s41467-022-32893-x.
Lu et al. DNA methylation-based estimator of telomere length.Aging (Albany NY). 2019 Aug 31; 11(16): 5895–5923.
McCartney et al. Epigenetic prediction of complex traits and death.Genome Biol. 2018 Sep 27;19(1):136. doi: 10.1186/s13059-018-1514-1.
McGreevy et al. DNAmFitAge: biological age indicator incorporating physical fitness. Aging(Albany NY). 2023 Feb 22;15(10):3904-3938. doi: 10.18632/aging.204538.
McEwen et al. The PedBE clock accurately estimates DNA methylation age in pediatric buccal cells.Proc Natl Acad Sci U S A. 2020 Sep 22; 117(38): 23329–23335.
Teschendorff et al. A comparison of epigenetic mitotic-like clocks for cancer risk prediction.Genome Med. 2020 Jun 24;12(1):56. doi: 10.1186/s13073-020-00752-3.
Yang et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 2016 Oct 3;17(1):205. doi: 10.1186/s13059-016-1064-3.
Youn et al. The MiAge Calculator: a DNA methylation-based mitotic age calculator of human tissue types.Epigenetics. 2018; 13(2): 192–206.
Zhang et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing.Genome Med. 2019 Aug 23;11(1):54. doi: 10.1186/s13073-019-0667-1.
Zhang et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality.Nat Commun. 2017 Mar 17;8:14617. doi: 10.1038/ncomms14617.
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) pheno=data.frame(SampleID=colnames(beta),Age=c(23,45,52,36,58,43),Female=c(1,0,1,1,0,1)) mage=methscore(datMeth=beta,datPheno=pheno)
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) pheno=data.frame(SampleID=colnames(beta),Age=c(23,45,52,36,58,43),Female=c(1,0,1,1,0,1)) mage=methscore(datMeth=beta,datPheno=pheno)
To calculate Methylation Age using Hovath, Hannum or PhenoAge methods and pace of aging DunedinPACE.
methyAge(beta,fastImputation=FALSE,normalize=TRUE,nCores=2)
methyAge(beta,fastImputation=FALSE,normalize=TRUE,nCores=2)
beta |
Methylation beta value matrix with CpG names(row names) and sample ids(column names). |
fastImputation |
If "TRUE" reference methylation values will used for imputation, if "FALSE", KNN nearest neighbor method will be used. |
normalize |
TRUE or FALSE, if TRUE, Hovath modified BMIQ method will be used to perform normalization. |
nCores |
Number of cores will be used for normalization |
A data frame with rows for sample and columns for types of methylation age.
Zongli Xu
Horvath S. DNA methylation age of human tissues and cell types. Genome biology 2013 14:R115.
Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular cell 2013 49:359-367.
Levine ME, Lu AT, Quach A, Chen BH, Assimes TL, Bandinelli S, et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY) 2018 10:573-591.
Daniel W Belsky, Avshalom Caspi, David L Corcoran,et al. DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife, 2022
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) mage=methyAge(beta)
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) mage=methyAge(beta)
P value manhattan plot
mhtplot(probe=NULL,chr=NULL, pos=NULL, p=NULL,color="bg",sigthre=NULL, sigthre2=NULL,threlty=c(2,1),markprobe=NULL,markcolor="red", outf="mht", outfmt="jpg",reducesize=0,...)
mhtplot(probe=NULL,chr=NULL, pos=NULL, p=NULL,color="bg",sigthre=NULL, sigthre2=NULL,threlty=c(2,1),markprobe=NULL,markcolor="red", outf="mht", outfmt="jpg",reducesize=0,...)
probe |
probe name |
chr |
Chromosome, 1,2,...,22,X,Y |
pos |
Chromosome positions |
p |
P values |
color |
Color scheme of manhattan plot, "bg" indicate "black and gray" |
sigthre |
P value of significant threshold line |
sigthre2 |
P value of second significant threshold line |
threlty |
Threshold line type, default is c(2,1) |
markprobe |
A list of CpGs to be marked out |
markcolor |
Color code for marked probe, "red" in default |
outf |
figure file name, default "mht" |
outfmt |
Output figure file format, "jpg" or "eps" |
reducesize |
A positive interger, larger the value, smaller the eps file size. Smaller file size is achieved by skipping some densely packed data points |
... |
Arguments pass to plot |
Zongli Xu
dat=simubed() thre1=1E-100 dat$fdr=p.adjust(mrgd$P, method="BH") if(sum(dat$fdr<0.05)>0){thre1=max(dat$p[dat$fdr<0.05])} thre2=1E-7 mprobe=dat$probe[dat$p<=thre1] mhtplot(probe=dat$probe,chr=dat$chr,pos=dat$start,p=dat$p,sigthre=thre1,sigthre2=thre2, markprobe=mprobe,outf="mht_try",outfmt="jpg")
dat=simubed() thre1=1E-100 dat$fdr=p.adjust(mrgd$P, method="BH") if(sum(dat$fdr<0.05)>0){thre1=max(dat$p[dat$fdr<0.05])} thre2=1E-7 mprobe=dat$probe[dat$p<=thre1] mhtplot(probe=dat$probe,chr=dat$chr,pos=dat$start,p=dat$p,sigthre=thre1,sigthre2=thre2, markprobe=mprobe,outf="mht_try",outfmt="jpg")
The pipeline performs background correction, dye bias correction, inter-array normalization and probe type bias correction for HumanMethylation 450 and MethylationEPIC BeadChip data. It removes or mitigates background noise and systematic experimental bias,It also perform quality controls, identifing and excluding low quality samples and probes, removing low quality and outlier values, and performing imputation.
mpreprocess(rgSet,nCores=2,bgParaEst="oob",dyeCorr="RELIC", qc=TRUE,qnorm=TRUE,qmethod="quantile1", fqcfilter=FALSE,rmcr=FALSE,impute=FALSE)
mpreprocess(rgSet,nCores=2,bgParaEst="oob",dyeCorr="RELIC", qc=TRUE,qnorm=TRUE,qmethod="quantile1", fqcfilter=FALSE,rmcr=FALSE,impute=FALSE)
rgSet |
An object of class |
nCores |
Number of cores will be used for computation |
bgParaEst |
Method to estimate background normal distribution parameters. Possible options: "oob","est", or "neg". |
dyeCorr |
Dye bias correction, "mean": correction based on averaged red/green ratio; or "RELIC": correction with RELIC method; or "none": no dye bias correction. The default is RELIC |
qc |
If TRUE, QC will be performed. Low quality samples and CpGs will be excluded before background correction. |
qnorm |
If TRUE, inter-array quantile normalization will be performed. |
qmethod |
Quantile normalization method. This should be one of the following strings: "quantile1", "quantile2", or "quantile3". See details in function norm.quantile. |
fqcfilter |
If TRUE, outlier and low quality values will be filtered out. |
rmcr |
TRUE: excluded rows and columns with more than 5% of missing values. FALSE is in default |
impute |
Whether to impute missing values. If TRUE, k-nearest neighbor's methods will be used for imputation. FALSE is in default. |
Fuction mpreprocess is a pipeline that perform methylaiton data preprocessing and quality controls using functions: preprocessENmix, norm.quantile, rcp, QCinfo and qcfilter. More customized preprocessing steps can be achieved using the individual functions, see user's guide.
A methylation beta value matrix with rows for CpGs and columns for samples.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
Zongli Xu, Sabine A. S. Langie, Patrick De Boever, Jack A. Taylor1 and Liang Niu, RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip, BMC Genomics, 2017
Liang Niu, Zongli Xu and Jack A. Taylor: RCP: a novel probe design bias correction method for Illumina Methylation BeadChip, Bioinformatics 2016
Package minfi
for classes RGChannelSet
and MethylSet
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) beta=mpreprocess(rgSet,nCores=6,qc=TRUE,fqcfilter=TRUE,rmcr=TRUE,impute=TRUE) #methDataSet as input mdat=getmeth(rgSet) beta=mpreprocess(mdat,nCores=6) #RGChannelSet as input beta=mpreprocess(RGsetEx,nCores=6) #RGChannelSetExtended as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) beta=mpreprocess(rgSet,nCores=6,qc=TRUE,fqcfilter=TRUE,rmcr=TRUE,impute=TRUE) #MethylSet as input mdat=preprocessRaw(rgSet) beta=mpreprocess(mdat,nCores=6) }
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) beta=mpreprocess(rgSet,nCores=6,qc=TRUE,fqcfilter=TRUE,rmcr=TRUE,impute=TRUE) #methDataSet as input mdat=getmeth(rgSet) beta=mpreprocess(mdat,nCores=6) #RGChannelSet as input beta=mpreprocess(RGsetEx,nCores=6) #RGChannelSetExtended as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) beta=mpreprocess(rgSet,nCores=6,qc=TRUE,fqcfilter=TRUE,rmcr=TRUE,impute=TRUE) #MethylSet as input mdat=preprocessRaw(rgSet) beta=mpreprocess(mdat,nCores=6) }
Produce Frequency polygon plot for each column of a numeric data matrix. Similar to multidensity function, the plot can be used to inspect data distribution but with much faster speed.
multifreqpoly(mat, nbreaks=100, col=1:ncol(mat), xlab="", ylab="Frequency",legend = list(x = "top", fill=col, legend = if(is.null(colnames(mat))) paste(1:ncol(mat)) else colnames(mat)),append=FALSE,...)
multifreqpoly(mat, nbreaks=100, col=1:ncol(mat), xlab="", ylab="Frequency",legend = list(x = "top", fill=col, legend = if(is.null(colnames(mat))) paste(1:ncol(mat)) else colnames(mat)),append=FALSE,...)
mat |
A numeric matrix |
nbreaks |
The number of bins for frequency counting |
col |
Line plot color code, the length should be equal to the number of columns in mat |
xlab |
x-axis lable |
ylab |
y-axis lable |
legend |
A list of arguments that get passed to the function "legend" |
append |
TRUE or FALSE, whether to create a new figure or append to the current figure. |
... |
Further arguments that get passed to the function "plot" |
Frequency polygon plots.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
x=matrix(rnorm(10000),nrow=2000,ncol=5) multifreqpoly(x,nbreaks=15,legend=list(x="topright",fill=1:ncol(x),legend=paste("V",1:5,sep=""))) if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mraw <- getmeth(rgSet) beta<-getB(mraw) jpeg("dist_raw.jpg") multifreqpoly(beta,col=1:ncol(beta)) dev.off() }
x=matrix(rnorm(10000),nrow=2000,ncol=5) multifreqpoly(x,nbreaks=15,legend=list(x="topright",fill=1:ncol(x),legend=paste("V",1:5,sep=""))) if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mraw <- getmeth(rgSet) beta<-getB(mraw) jpeg("dist_raw.jpg") multifreqpoly(beta,col=1:ncol(beta)) dev.off() }
Due to SNPs in CpG probe region or other unknow factors, methylation beta values for some CpGs have multimodal distribution. This function is to identify this type of probes (so called gap probes) with obovious multimoal distribution.
nmode(x, minN = 3, modedist=0.2, nCores = 1)
nmode(x, minN = 3, modedist=0.2, nCores = 1)
x |
A methylation beta value matrix with row for probes and column for samples. |
minN |
Minimum number of data points at each cluster |
modedist |
Minimum distance between adjacent modes |
nCores |
Number of cores used for computation |
This function uses an empirical approach to estimate number of modes in methylation beta value for each CpG probe. By default, the function requires the distance between modes have to be greater than 0.2 in methylation beta value, and each mode clusters should has at least 3 data points or 5% of data points whichever is greater.
A vector of integers indicating number of modes. Gap probes are probes with number of mode greater than 1.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015
if (require(minfiData)) { mdat <- preprocessRaw(RGsetEx) beta=getBeta(mdat, "Illumina") nmode=nmode(beta, minN = 3,modedist=0.2, nCores = 5) path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat <- getmeth(rgSet) beta=getB(mdat) nmode=nmode(beta, minN = 3,modedist=0.2, nCores = 5) }
if (require(minfiData)) { mdat <- preprocessRaw(RGsetEx) beta=getBeta(mdat, "Illumina") nmode=nmode(beta, minN = 3,modedist=0.2, nCores = 5) path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat <- getmeth(rgSet) beta=getB(mdat) nmode=nmode(beta, minN = 3,modedist=0.2, nCores = 5) }
Quantile normalization of methylation intensity data across samples for Illumina Infinium HumanMethylation 450 and MethylationEPIC BeadChip.
norm.quantile(mdat, method = "quantile1")
norm.quantile(mdat, method = "quantile1")
mdat |
An object of class |
method |
Quantile normalization method: "quantile1", "quantile2", or "quantile3". |
By default, method = "quantile1", which will separately quantile normalize Methylated or Unmethylated intensities for Infinium I or II probes. The "quantile2" will quantile normalize combined Methylated or Unmethylated intensities for Infinium I or II probes. The "quantile3" will quantile normalize combined Methylated or Unmethylated intensities for Infinium I and II probes together.
The output is an an object of class methDataSet
or MethylSet
.
Zongli Xu
Pidsley, R., CC, Y.W., Volta, M., Lunnon, K., Mill, J. and Schalkwyk, L.C. (2013) A data-driven approach to preprocessing Illumina 450K methylation array data. BMC genomics, 14, 293.
#for methDataSet path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat<-preprocessENmix(rgSet, bgParaEst="oob",nCores=6) mdatq<-norm.quantile(mdat, method="quantile1") #for MethySet if (require(minfiData)) { mdat=preprocessENmix(RGsetEx,bgParaEst="oob",nCores=6) mdatq=norm.quantile(mdat,method="quantile1") }
#for methDataSet path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat<-preprocessENmix(rgSet, bgParaEst="oob",nCores=6) mdatq<-norm.quantile(mdat, method="quantile1") #for MethySet if (require(minfiData)) { mdat=preprocessENmix(RGsetEx,bgParaEst="oob",nCores=6) mdatq=norm.quantile(mdat,method="quantile1") }
Maximum Likelihood Estimate (MLE) of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) using sequencing/array data from paired bisulfite and oxidative bisulfite treated DNA experiments.
oxBS.MLE(beta.BS,beta.oxBS,N.BS,N.oxBS)
oxBS.MLE(beta.BS,beta.oxBS,N.BS,N.oxBS)
beta.BS |
A matrix of methylation beta values (proportion of methylated sites estimated as methylated intensity over total intensity) obtained from bisulfite (BS) experiments |
beta.oxBS |
A matrix of methylation beta values obtained from oxidative bisulfite (oxBS) experiments |
N.BS |
A matrix of total signals (sum of methylated and unmethylated intensity values) from BS experiments |
N.oxBS |
A matrix of total signals from oxBS experiments |
For all the inputs (beta.BS
, beta.oxBS
, N.BS
and
N.oxBS
), the rows should be corresponding to CpG loci and the
columns should be corresponding to samples.
The row/column names in all four matrices should be the same.
For a specific CpG of a sample, if any one of the four
values (beta.BS
, beta.oxBS
, N.BS
and N.oxBS
)
is NA
, or N.BS
is zero, or N.oxBS
is zero, the MLE of both 5mC
and 5hmC levels will be set as NA
.
The output is a list with two elements:
5mC
: a matrix of estimated 5mC levels.
5hmC
: a matrix for estimated 5hmC levels.
Liang Niu and Zongli Xu
Zongli Xu, Jack A. Taylor, Yuet-Kin Leung, Shuk-Mei Ho and Liang Niu, oxBS-MLE: an efficient method to estimate 5-methylcytosine and 5-hydroxymethylcytosine in paired bisulfite and oxidative bisulfite treated DNA, Bioinformatics. 2016
# load example data load(system.file("oxBS.MLE.RData",package="ENmix")) # run oxBS.MLE resu<-oxBS.MLE(beta.BS,beta.oxBS,N.BS,N.oxBS) dim(resu[["5mC"]]) dim(resu[["5hmC"]])
# load example data load(system.file("oxBS.MLE.RData",package="ENmix")) # run oxBS.MLE resu<-oxBS.MLE(beta.BS,beta.oxBS,N.BS,N.oxBS) dim(resu[["5mC"]]) dim(resu[["5hmC"]])
P value Q-Q plot with optional confidence interval
p.qqplot(pvalues,outf="qq",outfmt="jpg",draw.conf=TRUE, conf.col="lightgray",conf.alpha=.95,pch=20,col="black",reducesize=0,...)
p.qqplot(pvalues,outf="qq",outfmt="jpg",draw.conf=TRUE, conf.col="lightgray",conf.alpha=.95,pch=20,col="black",reducesize=0,...)
pvalues |
An numeric vector of P values |
outf |
figure file name, default "qq" |
outfmt |
Output figure file format, "jpg" or "eps" |
draw.conf |
Whether to draw confidence interval of expected P values under NULL hypothesis |
conf.col |
Color code of confidence interval |
conf.alpha |
Confidence interval range, 0.95 in default |
pch |
Point type code |
col |
Point color code |
reducesize |
A positive interger, larger the value, smaller the eps file size. Smaller file size is achieved by skipping some densely packed data points |
... |
Arguments pass to plot |
P value Q-Q plot with optional confidence interval
Zongli Xu
dat=simubed() p.qqplot(pvalues=dat$p,draw.conf=TRUE,outf="qq_try",outfmt="jpg")
dat=simubed() p.qqplot(pvalues=dat$p,draw.conf=TRUE,outf="qq_try",outfmt="jpg")
First, principal component analysis will be performed in the standadized input data matrix (standadized for each row/CpG), and then the specified number of top principal components (which explain most data variation) will be used to perform linear regression with each specified variable. Regression P values will be plotted for exploration of methylation data variance structure or identification of possible confounding variables in association analysis.
pcrplot(beta, cov,npc=50)
pcrplot(beta, cov,npc=50)
beta |
A methylation beta value matrix with rows for probes and columns for samples. The input matrix should not contain any missing value. |
cov |
A data frame of covariates. Categorical variables should be converted to factors. The number of rows should equal to the number of columns in beta matrix |
npc |
The number of top ranked principal components to be plotted |
A jpeg figure "svdscreeplot.jpg" to show the variations explained by each principal component.
A jpeg figure "pcr_diag.jpg" to show association strength between principal components and covariates with cell colors indicating different levels of association P values.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015
if (require(minfiData)) { mdat <- preprocessRaw(RGsetEx) beta=getBeta(mdat, "Illumina") group=pData(mdat)$Sample_Group slide=factor(pData(mdat)$Slide) cov=data.frame(group,slide) pcrplot(beta,cov,npc=6) }
if (require(minfiData)) { mdat <- preprocessRaw(RGsetEx) beta=getBeta(mdat, "Illumina") group=pData(mdat)$Sample_Group slide=factor(pData(mdat)$Slide) cov=data.frame(group,slide) pcrplot(beta,cov,npc=6) }
The function will generate a series of internal control plots that are similar to the plots from Illumina GenomeStudio software. Users should refer to GenomeStudio online guide to interpret these figures. These figures can be used to check data quality and experimental procedures.
plotCtrl(rgSet,IDorder=NULL)
plotCtrl(rgSet,IDorder=NULL)
rgSet |
An object of class |
IDorder |
A list of sample ids in the order specified by user. The list can be a subset of sample ids in input dataset. If an id list is provided, all plots will be produced in the order of the list. |
A set of internal control figures in jpeg format.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) plotCtrl(rgSet) #RGChannelSet as input pinfo=pData(RGsetEx) IDorder=rownames(pinfo)[order(pinfo$Slide,pinfo$Array)] plotCtrl(RGsetEx,IDorder) }
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) plotCtrl(rgSet) #RGChannelSet as input pinfo=pData(RGsetEx) IDorder=rownames(pinfo)[order(pinfo$Slide,pinfo$Array)] plotCtrl(RGsetEx,IDorder) }
Estimating sample sex based on methylation data
predSex(mdat, cutoff = 2)
predSex(mdat, cutoff = 2)
mdat |
An object of class |
cutoff |
The difference in log2 total intensity between X and Y chromosomes |
Estimation of sex is based on the difference of log2 median total intensity measures on the X and Y chromosomes.
Zongli Xu
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) sex=predSex(rgSet) }
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) sex=predSex(rgSet) }
The ENmix background correction for HumanMethylation 450 and MethylationEPIC BeadChip. ENmix models methylation signal intensities with a flexible exponential-normal mixture distribution, and models background noise with a truncated normal distribution. ENmix will split BeadChip intensity data into 6 parts and separately model methylated and unmethylated intensities, 2 different color channels and 2 different probe types.
preprocessENmix(rgSet, bgParaEst = "oob", dyeCorr="RELIC", QCinfo=NULL, exQCsample=TRUE, exQCcpg=TRUE, exSample=NULL, exCpG=NULL, nCores = 2)
preprocessENmix(rgSet, bgParaEst = "oob", dyeCorr="RELIC", QCinfo=NULL, exQCsample=TRUE, exQCcpg=TRUE, exSample=NULL, exCpG=NULL, nCores = 2)
rgSet |
An object of class |
bgParaEst |
Method to estimate background normal distribution parameters. Options are: "oob","est", or "neg". |
dyeCorr |
Dye bias correction method, "mean": correction based on averaged red/green ratio, or "RELIC": correction with RELIC method (default method), or "none": no dye bias correction. |
QCinfo |
If QCinfo object from function QCinfo() is provided, low quality samples (if exQCsample=TRUE) and CpGs (if exQCcpg=TRUE) will be excluded before background correction. |
exQCsample |
If TRUE, low quality samples listed in QCinfo will be excluded. |
exQCcpg |
If TRUE, low quality CpGs listed in QCinfo will be excluded. |
exSample |
User specified samples to be excluded before background correction |
exCpG |
User specified probes to be excluded before background correction |
nCores |
Number of cores will be used for computation |
By default, ENmix will use out-of-band Infinium I intensities ("oob") to estimate
normal distribution parameters for modeling background noise. Option "est" will use
combined methylated and unmethylated intensities to estimate background distribution
parameters separately for each color channel and each probe type. Option "neg" will
use 600 chip internal controls probes to estimate background distribution parameters.
If rgSet if a MethylSet
, then only option "est" can be selected.
An object of class same with input data
Zongli Xu and Liang Niu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
Zongli Xu, Sabine A. S. Langie, Patrick De Boever, Jack A. Taylor1 and Liang Niu, RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip, BMC Genomics. 2017
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) #quality control information qc<-QCinfo(rgSet) #further excluding samples which are not in the qc$badsample list ex_id=c("5723646053_R04C02") #further excluding cpgs which are not in the qc$badCpG list ex_cg=c("cg00000622", "cg00001245", "cg00001261") mdat=preprocessENmix(rgSet,QCinfo=qc,exSample=ex_id,exCpG=ex_cg,nCores=6) #RGChannelSet as input mdat=preprocessENmix(RGsetEx,nCores=6) }
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) #quality control information qc<-QCinfo(rgSet) #further excluding samples which are not in the qc$badsample list ex_id=c("5723646053_R04C02") #further excluding cpgs which are not in the qc$badCpG list ex_cg=c("cg00000622", "cg00001245", "cg00001261") mdat=preprocessENmix(rgSet,QCinfo=qc,exSample=ex_id,exCpG=ex_cg,nCores=6) #RGChannelSet as input mdat=preprocessENmix(RGsetEx,nCores=6) }
Outlier was defined as values smaller than 3 times IQR from the lower quartile or greater than 3 times IQR from the upper quartile. If data quality information were provided, low quality data points will be set as missing data first before looking for outliers. All outliers and low quality data will be set as miss in output matrix. If set imput=TRUE, imputation will be performed using k-nearest neighbors method to impute all missing values.
qcfilter(mat,qcscore=NULL,rmoutlier=TRUE,byrow=TRUE,detPthre=0.000001,nbthre=3, rmcr=FALSE,rthre=0.05,cthre=0.05,impute=FALSE,imputebyrow=TRUE,fastimpute=FALSE,...)
qcfilter(mat,qcscore=NULL,rmoutlier=TRUE,byrow=TRUE,detPthre=0.000001,nbthre=3, rmcr=FALSE,rthre=0.05,cthre=0.05,impute=FALSE,imputebyrow=TRUE,fastimpute=FALSE,...)
mat |
An numeric matirx containing methylation beta values |
qcscore |
If the data quality infomation (the output from function QCinfo) were provied, low quality data points as defined by detection p value threshold (detPthre) and number of bead threshold (nbthre) will be set as missing value. |
rmoutlier |
if TRUE, outliers data points will be set as missing data NA. |
byrow |
TRUE: Looking for outliers row by row, or FALSE: column by column. |
detPthre |
Detection P value threshold to define low qualitye data points, detPthre=0.000001 in default. |
nbthre |
Number of beads threshold to define low qualitye data points, nbthre=3 in default. |
rmcr |
TRUE: exclude rows and columns with too many missing values as defined by rthre and cthre. FALSE is in default |
rthre |
Minimum of percentage of missing values for a row to be excluded |
cthre |
Minimum of percentage of missing values for a column to be excluded |
impute |
If TRUE, k-nearest neighbors methods will used for imputation. |
imputebyrow |
TRUE: impute missing values using similar values in row, or FALSE: in column |
fastimpute |
If TRUE, probe median will be used for fast imputation. |
... |
Arguments to be passed to the function impute.knn in R package "impute" |
The output is an numeric matrix.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) #filter out outliers data points only b1=qcfilter(beta) #filter out low quality and outlier data points b2=qcfilter(beta,qcscore=qc) #filter out low quality and outlier values, remove rows and columns with # too many missing values b3=qcfilter(beta,qcscore=qc,rmcr=TRUE) #filter out low quality and outlier values, remove rows and columns with # too many missing values, and then do imputation b3=qcfilter(beta,qcscore=qc,rmcr=TRUE,impute=TRUE) }
if (require(minfiData)) { path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) #filter out outliers data points only b1=qcfilter(beta) #filter out low quality and outlier data points b2=qcfilter(beta,qcscore=qc) #filter out low quality and outlier values, remove rows and columns with # too many missing values b3=qcfilter(beta,qcscore=qc,rmcr=TRUE) #filter out low quality and outlier values, remove rows and columns with # too many missing values, and then do imputation b3=qcfilter(beta,qcscore=qc,rmcr=TRUE,impute=TRUE) }
Extract information for data quanlity control: detection P values, number of beads and averaged bisulfite conversion intensity. The function can also identify low quality samples and probes, as well as outlier samples in total intensity or beta value distribution.
QCinfo(rgSet, detPthre=0.000001, detPtype="negative", nbthre=3, samplethre=0.05, CpGthre=0.05, bisulthre=NULL, outlier=TRUE, distplot=TRUE)
QCinfo(rgSet, detPthre=0.000001, detPtype="negative", nbthre=3, samplethre=0.05, CpGthre=0.05, bisulthre=NULL, outlier=TRUE, distplot=TRUE)
rgSet |
An object of class |
detPthre |
Detection P value threshold to identify low quality data point |
detPtype |
Calculate detection P values based on negtive internal control ("negative") probes or out of the band ("oob") probes |
nbthre |
Number of bead threshold to identify data point of low quality |
samplethre |
Threshold to identify samples with low data quality, the percentage of low quality methylation data points across probes for each sample |
CpGthre |
Threshold to identify probes with low data quality, percentage of low quality methylation data points across samples for each probe |
bisulthre |
Threshold of bisulfite intensity for identification of low quality samples. By default, Mean - 3 x SD of sample bisufite control intensities will be used as a threshold. |
outlier |
If TRUE, outlier samples in total intensity or beta value distribution will be idenfied and classified as bad samples. |
distplot |
TRUE or FALSE, whether to produce beta value distribution plots before and after QC. |
detP: a matrix of detection P values
nbead: a matrix for number of beads
bisul: a vector of averaged intensities for bisulfite conversion controls per sample
badsample: a list of low quality or outlier samples
badCpG: a list of low quality CpGs
outlier_sample: a list of outlier samples in methylation beta value or totol intensity distribution.
Figure "qc_sample.jpg": scatter plot of Percent of low quality data per sample vs. Average bisulfite conversion intensity
Figure "qc_CpG.jpg": histogram for Percent of low quality data per CpG.
Figure "freqpolygon_beta_beforeQC.jpg": distribution plot of input data, samples colored in red are "bad" samples, list in badsample, including samples with low data quality and outlier in methylaiton beta value or total intensity.
Figure "freqpolygon_beta_afterQC.jpg": distribution plot input data after filtering "bad" samples.
Zongli Xu
Zongli Xu, Liang Niu, Leping Li and Jack A. Taylor, ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Research 2015.
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) #RGChannelSetExtended as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) qc<-QCinfo(rgSet) }
if (require(minfiData)) { #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) #RGChannelSetExtended as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) qc<-QCinfo(rgSet) }
Probe design type bias correction using Regression on Correlated Probes (RCP) method
rcp(mdat, dist=25, quantile.grid=seq(0.001,0.999,by=0.001), qcscore = NULL, nbthre=3, detPthre=0.000001)
rcp(mdat, dist=25, quantile.grid=seq(0.001,0.999,by=0.001), qcscore = NULL, nbthre=3, detPthre=0.000001)
mdat |
An object of class |
dist |
Maximum distance in base pair between type I and type II probe pairs for regression calibration |
quantile.grid |
Quantile grid used in linear regression |
qcscore |
Data quality infomation object, the output from function QCinfo. If the object is provied, low quality data points as defined by detection p value threshold (detPthre) or number of bead threshold (nbthre) will be set as missing values. |
detPthre |
Detection P value threshold to define low qualitye data points |
nbthre |
Number of beads threshold to define low qualitye data points, nbthre=3 in default. |
The function will first identify type I and type II probe pairs within a specified distance, and then perform linear regression calibration between the probe types. With the estimates the function will then adjust type II data using type I data as references.
A beta value matrix
Liang Niu, Zongli Xu
Liang Niu, Zongli Xu and Jack A. Taylor RCP: a novel probe design bias correction method for Illumina Methylation BeadChip, Bioinformatics 2016
if (require(minfiData)) { #methDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) #methylset as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) }
if (require(minfiData)) { #methDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) #methylset as input sheet <- read.metharray.sheet(file.path(find.package("minfiData"),"extdata"), pattern = "csv$") rgSet <- read.metharray.exp(targets = sheet,extended = TRUE) qc=QCinfo(rgSet) mdat=preprocessENmix(rgSet,QCinfo=qc,nCores=6) mdat=norm.quantile(mdat,method="quantile1") beta=rcp(mdat) }
Modified RCP method to normalize user's data to a list of reference values
rcp2(datMeth, reference, quantile.grid=seq(0.001,0.999,by=0.001))
rcp2(datMeth, reference, quantile.grid=seq(0.001,0.999,by=0.001))
datMeth |
A matrix with row for probes and column for samples |
reference |
A data frame with two columns, "cg" for CpG names and "meth_mean" for reference values |
quantile.grid |
Quantile grid used in linear regression |
The function will normalize user data to a reference value distribution based on common set of probe between user data and reference data.
A matrix with same dimension of user data
Zongli Xu
Liang Niu, Zongli Xu and Jack A. Taylor RCP: a novel probe design bias correction method for Illumina Methylation BeadChip, Bioinformatics 2016
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) reference=data.frame(cg=rownames(beta),goldstandard=beta[,3]) reference=reference[sample(1:nrow(reference),2000),] beta2=rcp2(beta,reference)
require(minfiData) path <- file.path(find.package("minfiData"),"extdata") #based on rgDataset rgSet <- readidat(path = path,recursive = TRUE) meth=getmeth(rgSet) beta=getB(meth) reference=data.frame(cg=rownames(beta),goldstandard=beta[,3]) reference=reference[sample(1:nrow(reference),2000),] beta2=rcp2(beta,reference)
Read in IDAT files and create a rgDataSet with probe annotation
readidat(path = NULL,manifestfile=NULL,recursive = TRUE, verbose = FALSE, force=FALSE)
readidat(path = NULL,manifestfile=NULL,recursive = TRUE, verbose = FALSE, force=FALSE)
path |
Directory where idat files are located |
manifestfile |
Array manifestfile, which can be downloaded from Illumina website. Bioconductor manifest package will be used if not provided |
recursive |
if TRUE, idat files in the subdirectories will also be read in |
verbose |
if TRUE, detailed running info will be printed on screen |
force |
if TRUE, arrays with different sizes will be merged together |
Some array types and corresponding manifestfiles can be guessed by the program based on the number of probes per array. However, we recommand to provide correct manifest file directly, which can be easily downloaded from Illumina website, see below for some examples.
Probe annotation info can be extracted using command rowData
An object of class rgDataSet
,
Zongli Xu
#Illumina methylation array manifestfile #Infinium Mouse Methylation Manifest File (CSV) system("wget https://support.illumina.com/content/dam/illumina-support/documents/ downloads/productfiles/mouse-methylation/infinium-mouse-methylation- manifest-file-csv.zip") system("unzip infinium-mouse-methylation-manifest-file-csv.zip") mf="infinium-mouse-methylation-manifest-file.csv" #for MethylationEPIC v2.0 #download InfiniumMethylationEPICv2.0ProductFiles #mf= path to file "EPIC-8v2-0_A1.csv" or file "EPIC-8v2-0_A2.csv" #manifest=readmanifest(mf) #for MethylationEPIC v1.0 B5 system("wget https://webdata.illumina.com/downloads/productfiles/ methylationEPIC/infinium-methylationepic-v-1-0-b5- manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") mf="infinium-methylationepic-v-1-0-b5-manifest-file.csv" manifest=readmanifest(mf) #for MethylationEPIC v1.0 B4 system("wget https://webdata.illumina.com/downloads/productfiles/ methylationEPIC/infinium-methylationepic-v-1-0-b4- manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") mf="MethylationEPIC_v-1-0_B4.csv" manifest=readmanifest(mf) #for HumanMethylation450 system("wget https://webdata.illumina.com/downloads/productfiles/ humanmethylation450/humanmethylation450_15017482_v1-2.csv") mf="HumanMethylation450_15017482_v1-2.csv" mf="HumanMethylation450_15017482_v1-2.csv" if(require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,manifestfile=mf,recursive = TRUE) #without providing manifest file, corresponding R manifestfile package will be used rgSet <- readidat(path = path,recursive = TRUE) }
#Illumina methylation array manifestfile #Infinium Mouse Methylation Manifest File (CSV) system("wget https://support.illumina.com/content/dam/illumina-support/documents/ downloads/productfiles/mouse-methylation/infinium-mouse-methylation- manifest-file-csv.zip") system("unzip infinium-mouse-methylation-manifest-file-csv.zip") mf="infinium-mouse-methylation-manifest-file.csv" #for MethylationEPIC v2.0 #download InfiniumMethylationEPICv2.0ProductFiles #mf= path to file "EPIC-8v2-0_A1.csv" or file "EPIC-8v2-0_A2.csv" #manifest=readmanifest(mf) #for MethylationEPIC v1.0 B5 system("wget https://webdata.illumina.com/downloads/productfiles/ methylationEPIC/infinium-methylationepic-v-1-0-b5- manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") mf="infinium-methylationepic-v-1-0-b5-manifest-file.csv" manifest=readmanifest(mf) #for MethylationEPIC v1.0 B4 system("wget https://webdata.illumina.com/downloads/productfiles/ methylationEPIC/infinium-methylationepic-v-1-0-b4- manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") mf="MethylationEPIC_v-1-0_B4.csv" manifest=readmanifest(mf) #for HumanMethylation450 system("wget https://webdata.illumina.com/downloads/productfiles/ humanmethylation450/humanmethylation450_15017482_v1-2.csv") mf="HumanMethylation450_15017482_v1-2.csv" mf="HumanMethylation450_15017482_v1-2.csv" if(require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,manifestfile=mf,recursive = TRUE) #without providing manifest file, corresponding R manifestfile package will be used rgSet <- readidat(path = path,recursive = TRUE) }
Parsing Illumina methylation arrays manifest file.
readmanifest(file)
readmanifest(file)
file |
Illumina methylation array manifest file, downloaded from Illuminal website |
An object of dataframe caintaining probe annotation information
Zongli Xu
#for MethylationEPIC v1.0 B5 system("wget https://webdata.illumina.com/downloads/productfiles/methylationEPIC/ infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") mf="infinium-methylationepic-v-1-0-b5-manifest-file.csv" manifest=readmanifest(mf) #for MethylationEPIC v1.0 B4 system("wget https://webdata.illumina.com/downloads/productfiles/methylationEPIC/ infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") mf="MethylationEPIC_v-1-0_B4.csv" manifest=readmanifest(mf) #for HumanMethylation450 system("wget https://webdata.illumina.com/downloads/productfiles/ humanmethylation450/humanmethylation450_15017482_v1-2.csv") mf="HumanMethylation450_15017482_v1-2.csv" manifest=readmanifest(mf)
#for MethylationEPIC v1.0 B5 system("wget https://webdata.illumina.com/downloads/productfiles/methylationEPIC/ infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip") mf="infinium-methylationepic-v-1-0-b5-manifest-file.csv" manifest=readmanifest(mf) #for MethylationEPIC v1.0 B4 system("wget https://webdata.illumina.com/downloads/productfiles/methylationEPIC/ infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") system("unzip infinium-methylationepic-v-1-0-b4-manifest-file-csv.zip") mf="MethylationEPIC_v-1-0_B4.csv" manifest=readmanifest(mf) #for HumanMethylation450 system("wget https://webdata.illumina.com/downloads/productfiles/ humanmethylation450/humanmethylation450_15017482_v1-2.csv") mf="HumanMethylation450_15017482_v1-2.csv" manifest=readmanifest(mf)
REgression on Logarithm of Internal Control probes (RELIC) correct for dye bias on whole array by utilizing the intensity values of paired internal control probes that monitor the two color channels.
relic (mdat,at_red,cg_grn)
relic (mdat,at_red,cg_grn)
mdat |
An object of class |
at_red |
an intensity matrix for Illumina control probes "NORM_A" and "NORM_T" |
cg_grn |
an intensity matrix for Illumina control probes "NORM_C" and "NORM_G" |
The Illumina MethylationEPIC BeadChip contains 85 pairs of internal normalization control probes (name with prefix NORM_A, NORM_T, NORM_G or NORM_C), while its predecessor, Illumina HumanMethyl-ation450 BeadChip contains 93 pairs. RELIC first performs a regression on the logarithms of the intensity values of the normalization control probes to derive a quantitative relationship between red and green channels, and then uses the relationship to correct for dye-bias on intensity values for whole array.
An object of class methDataSet
or MethylSet
depends on
input class.
Zongli Xu and Liang Niu
Zongli Xu, Sabine A. S. Langie, Patrick De Boever, Jack A. Taylor and Liang Niu, RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip, BMC Genomics. 2017
Package preprocessENmix
if (require(minfiData)) { ##background correction and dye bias correction #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat <- preprocessENmix(rgSet,bgParaEst="oob",nCores=6,dyeCorr ="RELIC") #RGChannelSet as input mdat=preprocessENmix(RGsetEx,bgParaEst="oob",nCores=6,dyeCorr ="RELIC") ##dye bias correction only #methDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) ctrls <- getCGinfo(rgSet,type="ctrl") ctrls <- ctrls[ctrls$Address %in% rownames(rgSet),] ctrl_r <- assays(rgSet)$Red[ctrls$Address,] ctrl_g <- assays(rgSet)$Green[ctrls$Address,] CG.controls <- ctrls$Type %in% c("NORM_C", "NORM_G") AT.controls <- ctrls$Type %in% c("NORM_A", "NORM_T") cg_grn=ctrl_g[CG.controls,] at_red=ctrl_r[AT.controls,] rownames(cg_grn) = ctrls$ExtendedType[CG.controls] rownames(at_red) = ctrls$ExtendedType[AT.controls] mdat=getmeth(rgSet) mdat <- relic(mdat,at_red,cg_grn) #MethylSet as input ctrls <- getProbeInfo(RGsetEx,type="Control") ctrls <- ctrls[ctrls$Address %in% featureNames(RGsetEx),] ctrl_r <- getRed(RGsetEx)[ctrls$Address,] ctrl_g <- getGreen(RGsetEx)[ctrls$Address,] CG.controls <- ctrls$Type %in% c("NORM_C","NORM_G") AT.controls <- ctrls$Type %in% c("NORM_A","NORM_T") cg_grn <- ctrl_g[CG.controls,] at_red <- ctrl_r[AT.controls,] rownames(cg_grn) = ctrls$ExtendedType[CG.controls] rownames(at_red) = ctrls$ExtendedType[AT.controls] mdat <- preprocessRaw(RGsetEx) mdat <- relic(mdat,at_red,cg_grn) }
if (require(minfiData)) { ##background correction and dye bias correction #rgDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat <- preprocessENmix(rgSet,bgParaEst="oob",nCores=6,dyeCorr ="RELIC") #RGChannelSet as input mdat=preprocessENmix(RGsetEx,bgParaEst="oob",nCores=6,dyeCorr ="RELIC") ##dye bias correction only #methDataSet as input path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) ctrls <- getCGinfo(rgSet,type="ctrl") ctrls <- ctrls[ctrls$Address %in% rownames(rgSet),] ctrl_r <- assays(rgSet)$Red[ctrls$Address,] ctrl_g <- assays(rgSet)$Green[ctrls$Address,] CG.controls <- ctrls$Type %in% c("NORM_C", "NORM_G") AT.controls <- ctrls$Type %in% c("NORM_A", "NORM_T") cg_grn=ctrl_g[CG.controls,] at_red=ctrl_r[AT.controls,] rownames(cg_grn) = ctrls$ExtendedType[CG.controls] rownames(at_red) = ctrls$ExtendedType[AT.controls] mdat=getmeth(rgSet) mdat <- relic(mdat,at_red,cg_grn) #MethylSet as input ctrls <- getProbeInfo(RGsetEx,type="Control") ctrls <- ctrls[ctrls$Address %in% featureNames(RGsetEx),] ctrl_r <- getRed(RGsetEx)[ctrls$Address,] ctrl_g <- getGreen(RGsetEx)[ctrls$Address,] CG.controls <- ctrls$Type %in% c("NORM_C","NORM_G") AT.controls <- ctrls$Type %in% c("NORM_A","NORM_T") cg_grn <- ctrl_g[CG.controls,] at_red <- ctrl_r[AT.controls,] rownames(cg_grn) = ctrls$ExtendedType[CG.controls] rownames(at_red) = ctrls$ExtendedType[AT.controls] mdat <- preprocessRaw(RGsetEx) mdat <- relic(mdat,at_red,cg_grn) }
The function can be used to calculate ICC for each CpG probe using balanced or unbalanced replicate samples.
repicc(dat,repid,mvalue=FALSE,nCores=2,qcflag=FALSE,qc=NULL, detPthre=0.05,nbthre=3)
repicc(dat,repid,mvalue=FALSE,nCores=2,qcflag=FALSE,qc=NULL, detPthre=0.05,nbthre=3)
dat |
Methylation beta value matrix |
repid |
A data frame with two variables, id and idx. id should be the same with column name of "dat", idx is a variable to show the relationship between samples with same value for samples from same individual. |
mvalue |
If TRUE, the beta value will be converted to M value for calculation of ICC |
nCores |
Number of cores will be used for calculation of ICC |
qcflag |
Whether to perform QC before calculation of ICC |
qc |
QC object from function QCinfo |
detPthre |
If qcflag=TRUE, the methylation values with detection P value higher than the threshold will be removed before calculation |
nbthre |
If qcflag=TRUE, the methylation values with number of bead smaller |
A data frame containing ICC for each probe
Zongli Xu
Zongli Xu, Jack A Taylor. Reliability of DNA methylation measures using Illumina methylation BeadChip. Epigenetics 2020
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") repid=data.frame(id=c("5723646052_R02C02","5723646052_R04C01","5723646052_R05C02", "5723646053_R04C02","5723646053_R05C02","5723646053_R06C02"),idx=c(1,1,2,2,2,2)) iccresu<-repicc(dat=beta,repid=repid) }
if (require(minfiData)){ path <- file.path(find.package("minfiData"),"extdata") rgSet <- readidat(path = path,recursive = TRUE) mdat=getmeth(rgSet) beta=getB(mdat,"Illumina") repid=data.frame(id=c("5723646052_R02C02","5723646052_R04C01","5723646052_R05C02", "5723646053_R04C02","5723646053_R05C02","5723646053_R06C02"),idx=c(1,1,2,2,2,2)) iccresu<-repicc(dat=beta,repid=repid) }
"rgDataSet"
A class for storing Illumina methylation array raw intensity data of two color channels, and probe annotation infomation.
rgDataSet(Red = new("matrix"), Green = new("matrix"), NBeads = new("matrix"),rowData=new("DataFrame"),ictrl= new("DataFrame"),...)
rgDataSet(Red = new("matrix"), Green = new("matrix"), NBeads = new("matrix"),rowData=new("DataFrame"),ictrl= new("DataFrame"),...)
Red |
A matrix of Red channel intensity values with row for methylation probes and column for samples |
Green |
A matrix of Green channel intensity values with row for methylation probes and column for samples |
NBeads |
A matrix contains the number of beads used to generate intensity values on the Red and Green channels. |
rowData |
A dataframe contains probe annotation information |
ictrl |
A dataframe contais detailed information for Illumina internal control probes |
... |
other arguments for class SummarizedExperiment |
An object of class rgDataSet
showClass("rgDataSet")
showClass("rgDataSet")
Remove suffix from CpG names for EPIC v2 BeadChips and combined values for duplicated CpGs
rm.cgsuffix(datMeth)
rm.cgsuffix(datMeth)
datMeth |
A methlation data matrix with row names for CpG id |
Remove suffix from CpG names for EPIC v2 BeadChips and combined values for duplicated CpGs
A matrix with number of rows equal or less than input data.
Zongli Xu
#beta matrix with row for CpGs and column for samples beta2=rm.cgsuffix(beta)
#beta matrix with row for CpGs and column for samples beta2=rm.cgsuffix(beta)
Simulation of bed format example file.
simubed(nprobe=1000)
simubed(nprobe=1000)
nprobe |
Number of probes on each chromosome, default is 1000 |
Simulation of bed format example file.
A data frame
Zongli Xu
simubed(nprobe=1000)
simubed(nprobe=1000)