Title: | DNA methylation analysis from high-throughput bisulfite sequencing results |
---|---|
Description: | methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files. |
Authors: | Altuna Akalin [aut, cre], Matthias Kormaksson [aut], Sheng Li [aut], Arsene Wabo [ctb], Adrian Bierling [aut], Alexander Blume [aut], Katarzyna Wreczycka [ctb] |
Maintainer: | Altuna Akalin <[email protected]>, Alexander Blume <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-10-30 08:47:43 UTC |
Source: | https://github.com/bioc/methylKit |
Measured 5mC levels via bisulfite sequencing might be a combination of 5hmC and 5mC levels since bisulfite sequencing can not distinguish between the two. This function can adjust 5mC levels of a bisulfite sequencing experiment if the user supplies corresponding 5hmC levels from the same sample.
adjustMethylC(mc,hmc,save.db,...,chunk.size) ## S4 method for signature 'methylRaw,methylRaw' adjustMethylC(mc, hmc, save.db = FALSE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawList,methylRawList' adjustMethylC(mc, hmc, save.db = FALSE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawDB,methylRawDB' adjustMethylC(mc, hmc, save.db = TRUE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawListDB,methylRawListDB' adjustMethylC(mc, hmc, save.db = TRUE, ..., chunk.size = 1e+06)
adjustMethylC(mc,hmc,save.db,...,chunk.size) ## S4 method for signature 'methylRaw,methylRaw' adjustMethylC(mc, hmc, save.db = FALSE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawList,methylRawList' adjustMethylC(mc, hmc, save.db = FALSE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawDB,methylRawDB' adjustMethylC(mc, hmc, save.db = TRUE, ..., chunk.size = 1e+06) ## S4 method for signature 'methylRawListDB,methylRawListDB' adjustMethylC(mc, hmc, save.db = TRUE, ..., chunk.size = 1e+06)
mc |
a |
hmc |
a |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
chunk.size |
Number of rows to be taken as a chunk for processing the
|
returns adjusted 5-methyl cytosine levels in the form of
methylRawList
, methylRaw
, methylRawDB
or
methylRawListDB
object depending on the input object
The parameter chunk.size
is only used when working with
methylRawDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing
large-sized objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work
for most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects
as methylRawDB
and methylRawListDB
,
while being per default FALSE for methylRaw
and methylRawList
.
If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might change the value of this parameter.
1. Booth, Branco, et al. (2012). Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution. Science, 934
2. Yu, Hon, et al. (2012). Base-resolution analysis of 5-hydroxymethylcytosine in the Mammalian genome. Cell, 149(6), 1368-80.
# read 5hmC and 5mC files hmc.file=system.file("extdata", "test1.myCpG.txt", package = "methylKit") mc.file =system.file("extdata", "test2.myCpG.txt", package = "methylKit") my5hmC=methRead( hmc.file,sample.id="hmc",assembly="hg18") my5mC =methRead( mc.file,sample.id="mc",assembly="hg18") # adjusting the 5mC levels using 5hmC levels adjusted.5mC=adjustMethylC(my5mC,my5hmC)
# read 5hmC and 5mC files hmc.file=system.file("extdata", "test1.myCpG.txt", package = "methylKit") mc.file =system.file("extdata", "test2.myCpG.txt", package = "methylKit") my5hmC=methRead( hmc.file,sample.id="hmc",assembly="hg18") my5mC =methRead( mc.file,sample.id="mc",assembly="hg18") # adjusting the 5mC levels using 5hmC levels adjusted.5mC=adjustMethylC(my5mC,my5hmC)
This function associates principal components with sample annotations such as age, gender, batch_id. Can be used to detect which batch effects are associated with the variation in the methylation values.
assocComp(mBase, sampleAnnotation)
assocComp(mBase, sampleAnnotation)
mBase |
|
sampleAnnotation |
a data frame where columns are different annotations and rows are the samples, in the same order as in the methylBase object. |
a named list of principal component matrix (named 'pcs'), by principal compopents (named 'vars') and a p-value matrix showing association p-values between sample annotations and principal components (named 'association').
Altuna Akalin
data(methylKit) sampleAnnotation=data.frame(batch_id=c("a","a","b","b"),age=c(19,34,23,40)) as=assocComp(mBase=methylBase.obj,sampleAnnotation)
data(methylKit) sampleAnnotation=data.frame(batch_id=c("a","a","b","b"),age=c(19,34,23,40)) as=assocComp(mBase=methylBase.obj,sampleAnnotation)
The function converts methylRaw
, methylRawDB
,
methylRawList
,
methylRawListDB
, methylDiff
or
methylDiffDB
object
into a bedgraph format. It either writes as a file or returns a data.frame
bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylDiff' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRaw' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRawList' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRawDB' bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylRawListDB' bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylDiffDB' bedgraph( methylObj, file.name, col.name, log.transform, negative, add.on, chunk.size )
bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylDiff' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRaw' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRawList' bedgraph( methylObj, file.name, col.name, unmeth, log.transform, negative, add.on ) ## S4 method for signature 'methylRawDB' bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylRawListDB' bedgraph( methylObj, file.name = NULL, col.name, unmeth = FALSE, log.transform = FALSE, negative = FALSE, add.on = "", chunk.size = 1e+06 ) ## S4 method for signature 'methylDiffDB' bedgraph( methylObj, file.name, col.name, log.transform, negative, add.on, chunk.size )
methylObj |
a |
file.name |
Default: NULL. if a string is given a bedgraph file will be written, if NULL a data.frame or a list of data frames will be returned |
col.name |
name of the column in |
unmeth |
when working with |
log.transform |
Default FALSE, If TRUE the score column of the bedgraph wil be in log10 scale. Ignored when col.name='perc.meth' |
negative |
Default FALSE, If TRUE, the score column of the bedgraph will be multiplied by -1. Ignored when col.name='perc.meth' |
add.on |
additional string to be add on the track line of bedgraph. can be viewlimits,priority etc. Check bedgraph track line options at UCSC browser |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
Returns a data.frame
or list of data.frames if
file.name=NULL
, if a file.name given appropriate bed file
will be written to that file
The parameter chunk.size
is only used when working with
methylRawDB
, methylRawListDB
or methylDiffDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
data(methylKit) # getting a bedgraph file from a methylDiff object containing differential # methylation percentages bedgraph(methylDiff.obj, file.name="test.bed", col.name="meth.diff", unmeth=FALSE,log.transform=FALSE,negative=FALSE,add.on="") # remove the file unlink("test.bed") # getting a bedgraph file from a methylBase object containing percent #methylation values bedgraph(methylRawList.obj[[1]], file.name="test2.bed", col.name="perc.meth", unmeth=FALSE,log.transform=FALSE,negative=FALSE,add.on="") unlink("test2.bed") # remove the file
data(methylKit) # getting a bedgraph file from a methylDiff object containing differential # methylation percentages bedgraph(methylDiff.obj, file.name="test.bed", col.name="meth.diff", unmeth=FALSE,log.transform=FALSE,negative=FALSE,add.on="") # remove the file unlink("test.bed") # getting a bedgraph file from a methylBase object containing percent #methylation values bedgraph(methylRawList.obj[[1]], file.name="test2.bed", col.name="perc.meth", unmeth=FALSE,log.transform=FALSE,negative=FALSE,add.on="") unlink("test2.bed") # remove the file
The function calculates differential methylation statistics between two groups of samples. The function uses either logistic regression test or Fisher's Exact test to calculate differential methylation. See the rest of the help page and references for detailed explanation on statistics.
calculateDiffMeth( .Object, covariates = NULL, overdispersion = c("none", "MN", "shrinkMN"), adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), effect = c("wmean", "mean", "predicted"), parShrinkMN = list(), test = c("F", "Chisq", "fast.fisher", "midPval"), mc.cores = 1, slim = TRUE, weighted.mean = TRUE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBaseDB' calculateDiffMeth( .Object, covariates = NULL, overdispersion = c("none", "MN", "shrinkMN"), adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), effect = c("wmean", "mean", "predicted"), parShrinkMN = list(), test = c("F", "Chisq", "fast.fisher", "midPval"), mc.cores = 1, slim = TRUE, weighted.mean = TRUE, chunk.size = 1e+06, save.db = TRUE, ... )
calculateDiffMeth( .Object, covariates = NULL, overdispersion = c("none", "MN", "shrinkMN"), adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), effect = c("wmean", "mean", "predicted"), parShrinkMN = list(), test = c("F", "Chisq", "fast.fisher", "midPval"), mc.cores = 1, slim = TRUE, weighted.mean = TRUE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBaseDB' calculateDiffMeth( .Object, covariates = NULL, overdispersion = c("none", "MN", "shrinkMN"), adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), effect = c("wmean", "mean", "predicted"), parShrinkMN = list(), test = c("F", "Chisq", "fast.fisher", "midPval"), mc.cores = 1, slim = TRUE, weighted.mean = TRUE, chunk.size = 1e+06, save.db = TRUE, ... )
.Object |
a |
covariates |
a |
overdispersion |
If set to "none"(default), no overdispersion correction will be attempted. If set to "MN", basic overdispersion correction, proposed by McCullagh and Nelder (1989) will be applied.This correction applies a scaling parameter to variance estimated by the model. EXPERIMENTAL: If set to "shrinkMN", scaling parameter will be shrunk towards a common value (not thoroughly tested as of yet). |
adjust |
different methods to correct the p-values for multiple testing.
Default is "SLIM" from methylKit. For "qvalue" please see
|
effect |
method to calculate the mean methylation different between groups using read coverage as weights (default). When set to "mean", the generic mean is applied and when set to "predicted", predicted means from the logistic regression model is used for calculating the effect. |
parShrinkMN |
a list for squeezeVar(). (NOT IMPLEMENTED) |
test |
the statistical test used to determine the methylation differences. The Chisq-test is used by default for more than two groups, while the F-test can be chosen if overdispersion control is applied. If there is one sample per group the Fisher's exact test will be applied using "fast.fisher", while "midPval" can be choosen to boost calculation speed. See details section for more information. |
mc.cores |
integer denoting how many cores should be used for parallel differential methylation calculations (can only be used in machines with multiple cores). |
slim |
If set to FALSE, |
weighted.mean |
If set to FALSE, |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details section. |
... |
optional Arguments used when save.db is TRUE
|
a methylDiff
object containing the differential methylation
statistics and locations for regions or bases
Covariates can be included in the analysis. The function will then try to
separate the
influence of the covariates from the treatment effect via a linear model.
The Chisq-test is used per default only when no overdispersion correction is
applied.
If overdispersion correction is applied, the function automatically switches
to the
F-test. The Chisq-test can be manually chosen in this case as well, but the
F-test only
works with overdispersion correction switched on.
If there is one sample in each group, e.g. after applying the pooling samples,
the Fisher's exact test will be applied for differential methylation.
methyKit offers two implementations to perform this test, which yield
slightly different results but differ much in computation time.
"fast.fisher" is a cut down version 'fisher.test()' that should produce
the exact same results as the base implementation, while "midPval"
will produce marginaly different p-values, but offers a large boost in
calculation speed.
The parameter chunk.size
is only used when working with
methylBaseDB
objects, as they are read in chunk by chunk to enable
processing large-sized objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most systems.
If you encounter memory problems or
have a high amount of memory available feel free to adjust the chunk.size
.
The parameter save.db
is per default TRUE for
methylDB objects as methylBaseDB
,
while being per default FALSE for methylBase
.
If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might want to change the value of this parameter.
Altuna Akalin, Matthias Kormaksson, Sheng Li, Francine E. Garrett-Bakelman, Maria E. Figueroa, Ari Melnick, Christopher E. Mason. (2012). "methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles." Genome Biology.
McCullagh and Nelder. (1989). Generalized Linear Models. Chapman and Hall. London New York.
Barnard. (1989). On alleged gains in power from lower P-values. Statistics in Medicine. Armitage and Berry. (1994) Statistical Methods in Medical Research (3rd edition). Blackwell.
data(methylKit) # The Chisq-test will be applied when no overdispersion control is chosen. my.diffMeth=calculateDiffMeth(methylBase.obj,covariates=NULL, overdispersion=c("none"), adjust=c("SLIM")) # pool samples in each group pooled.methylBase=pool(methylBase.obj,sample.ids=c("test","control")) # After applying the pool() function, there is one sample in each group. # The Fisher's exact test will be applied for differential methylation. my.diffMeth2=calculateDiffMeth(pooled.methylBase,covariates=NULL, overdispersion=c("none"), adjust=c("SLIM")) # Covariates and overdispersion control: # generate a methylBase object with age as a covariate covariates=data.frame(age=c(30,80,30,80)) sim.methylBase<-dataSim(replicates=4,sites=1000,treatment=c(1,1,0,0), covariates=covariates, sample.ids=c("test1","test2","ctrl1","ctrl2")) # Apply overdispersion correction and include covariates # in differential methylation calculations. my.diffMeth3<-calculateDiffMeth(sim.methylBase, covariates=covariates, overdispersion="MN",test="Chisq",mc.cores=1)
data(methylKit) # The Chisq-test will be applied when no overdispersion control is chosen. my.diffMeth=calculateDiffMeth(methylBase.obj,covariates=NULL, overdispersion=c("none"), adjust=c("SLIM")) # pool samples in each group pooled.methylBase=pool(methylBase.obj,sample.ids=c("test","control")) # After applying the pool() function, there is one sample in each group. # The Fisher's exact test will be applied for differential methylation. my.diffMeth2=calculateDiffMeth(pooled.methylBase,covariates=NULL, overdispersion=c("none"), adjust=c("SLIM")) # Covariates and overdispersion control: # generate a methylBase object with age as a covariate covariates=data.frame(age=c(30,80,30,80)) sim.methylBase<-dataSim(replicates=4,sites=1000,treatment=c(1,1,0,0), covariates=covariates, sample.ids=c("test1","test2","ctrl1","ctrl2")) # Apply overdispersion correction and include covariates # in differential methylation calculations. my.diffMeth3<-calculateDiffMeth(sim.methylBase, covariates=covariates, overdispersion="MN",test="Chisq",mc.cores=1)
This function provides an interface to the beta-binomial model from DSS package by Hao Wu. It calculates the differential methylation statistics using a beta-binomial model with parameter shrinkage. See the reference for details.
calculateDiffMethDSS( meth, adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), mc.cores = 1 ) ## S4 method for signature 'methylBase' calculateDiffMethDSS( meth, adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), mc.cores = 1 )
calculateDiffMethDSS( meth, adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), mc.cores = 1 ) ## S4 method for signature 'methylBase' calculateDiffMethDSS( meth, adjust = c("SLIM", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none", "qvalue"), mc.cores = 1 )
meth |
a methylBase object |
adjust |
different methods to correct the p-values for multiple testing.
Default is "SLIM" from methylKit. For "qvalue" please see
|
mc.cores |
integer denoting how many cores should be used for parallel differential methylation calculations (can only be used in machines with multiple cores). |
a methylDiff object
Feng H, Conneely K and Wu H (2014). A bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic acids research
data(methylKit) dssDiffay <- calculateDiffMethDSS(methylBase.obj, adjust="SLIM", mc.cores=1)
data(methylKit) dssDiffay <- calculateDiffMethDSS(methylBase.obj, adjust="SLIM", mc.cores=1)
hclust
function and
various distance metrics derived from percent methylation per base or per
region for each sample.Hierarchical Clustering using methylation data
The function clusters samples using hclust
function and
various distance metrics derived from percent methylation per base or per
region for each sample.
clusterSamples(.Object, dist="correlation", method="ward", sd.filter=TRUE,sd.threshold=0.5, filterByQuantile=TRUE, plot=TRUE,chunk.size) ## S4 method for signature 'methylBase' clusterSamples( .Object, dist, method, sd.filter, sd.threshold, filterByQuantile, plot ) ## S4 method for signature 'methylBaseDB' clusterSamples( .Object, dist = "correlation", method = "ward", sd.filter = TRUE, sd.threshold = 0.5, filterByQuantile = TRUE, plot = TRUE, chunk.size = 1e+06 )
clusterSamples(.Object, dist="correlation", method="ward", sd.filter=TRUE,sd.threshold=0.5, filterByQuantile=TRUE, plot=TRUE,chunk.size) ## S4 method for signature 'methylBase' clusterSamples( .Object, dist, method, sd.filter, sd.threshold, filterByQuantile, plot ) ## S4 method for signature 'methylBaseDB' clusterSamples( .Object, dist = "correlation", method = "ward", sd.filter = TRUE, sd.threshold = 0.5, filterByQuantile = TRUE, plot = TRUE, chunk.size = 1e+06 )
.Object |
a |
dist |
the distance measure to be used. This must be one of
" |
method |
the agglomeration method to be used. This should be
(an unambiguous abbreviation of) one of " |
sd.filter |
If |
sd.threshold |
A numeric value. If |
filterByQuantile |
A logical determining if |
plot |
a logical value indicating whether to plot hierarchical clustering. (default:TRUE) |
chunk.size |
Number of rows to be taken as a chunk for processing the |
a tree
object of a hierarchical cluster analysis using a set
of dissimilarities for the n objects being clustered.
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
data(methylKit) clusterSamples(methylBase.obj, dist="correlation", method="ward", plot=TRUE)
data(methylKit) clusterSamples(methylBase.obj, dist="correlation", method="ward", plot=TRUE)
The function simulates DNA methylation data from multiple samples. See references for detailed explanation on statistics.
dataSim( replicates, sites, treatment, percentage = 10, effect = 25, alpha = 0.4, beta = 0.5, theta = 10, covariates = NULL, sample.ids = NULL, assembly = "hg18", context = "CpG", add.info = FALSE )
dataSim( replicates, sites, treatment, percentage = 10, effect = 25, alpha = 0.4, beta = 0.5, theta = 10, covariates = NULL, sample.ids = NULL, assembly = "hg18", context = "CpG", add.info = FALSE )
replicates |
the number of samples that should be simulated. |
sites |
the number of CpG sites per sample. |
treatment |
a vector containing treatment information. |
percentage |
the proportion of sites which should be affected by the treatment. |
effect |
a number between 0 and 100 specifying the effect size of the treatment. This is essentially describing the average percent methylation difference between differentially methylated bases.See 'Examples' and 'Details'. |
alpha |
shape1 parameter for beta distribution (used for initial sampling of methylation proportions) |
beta |
shape2 parameter for beta distribution (used for initial sampling of methylation proportions) |
theta |
dispersion parameter for beta distribution (initial sampling of methylation proportions) |
covariates |
a data.frame containing covariates (optional) |
sample.ids |
will be generated automatically from |
assembly |
the assembly description (e.g. "hg18").Only needed for book keeping. |
context |
the experimanteal context of the data (e.g. "CpG"). Only needed for book keeping. |
add.info |
if set to TRUE, the output will be a list with the first element being the methylbase object and a vector of indices that indicate which CpGs should be differentially methylated. This vector can be used to subset simulated methylBase or methylDiff object with differentially methylated bases. |
a methylBase object containing simulated methylation data, or if add.info=TRUE a list containing the methylbase object and the indices of all treated sites (differentially methylated bases or regions) as the second element.
The function uses
a Beta distribution to simulate the methylation proportion
background across all samples.
The parameters alpha
, beta
used in a beta distribution to draw
methylation proportions,, from a typical bimodal distribution.
For each initial methylation proportion drawn using the parameters above,
a range of
methylation proportions is distributed around the original
with
overdispersion parameter
, this is using an alternative
parameterization of Beta distribution:
.
The parameters
percentage
and effect
determine the proportion
of sites that are
affected by the treatment (meaning differential sites) and the strength of
this influence, respectively. effect
is added on top of for
the CpGs that are affected by the treament. The affected group of samples
for that
particular CpG will now be distributed by
.
The coverage is modeled with a negative binomial distribution, using
rnbinom
function with size=1
and prob=0.01
.
The additional information needed for a valid methylBase object, such as
CpG start, end and strand, is generated as "dummy values",
but can be overwritten as needed.
data(methylKit) # Simulate data for 4 samples with 20000 sites each. # The methylation in 10% of the sites are elevated by 25%. my.methylBase=dataSim(replicates=4,sites=2000,treatment=c(1,1,0,0), percentage=10,effect=25)
data(methylKit) # Simulate data for 4 samples with 20000 sites each. # The methylation in 10% of the sites are elevated by 25%. my.methylBase=dataSim(replicates=4,sites=2000,treatment=c(1,1,0,0), percentage=10,effect=25)
This function gets number of hyper/hypo methylated regions/bases from
methylDiff
object.
It can also plot percentages of differentially methylated bases per chromosome.
diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... ) ## S4 method for signature 'methylDiff' diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... ) ## S4 method for signature 'methylDiffDB' diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... )
diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... ) ## S4 method for signature 'methylDiff' diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... ) ## S4 method for signature 'methylDiffDB' diffMethPerChr( x, plot = TRUE, qvalue.cutoff = 0.01, meth.cutoff = 25, exclude = NULL, keep.empty.chrom = FALSE, ... )
x |
a |
plot |
TRUE|FALSE. If TRUE horizontal barplots for proportion of hypo/hyper methylated bases/regions |
qvalue.cutoff |
cutoff for q-value |
meth.cutoff |
cutoff for percent methylation difference |
exclude |
names of chromosomes to be excluded from plot |
keep.empty.chrom |
keep chromosome in list / plot, even if it contains no hyper/hypo sites |
... |
extra graphical parameters to be passed to |
plots a piechart or a barplot for percentage of the target features overlapping with annotation
data(methylKit) # get a list of differentially methylated bases/regions per chromosome and overall diffMethPerChr(methylDiff.obj, plot=FALSE,qvalue.cutoff=0.01, meth.cutoff=25,exclude=NULL)
data(methylKit) # get a list of differentially methylated bases/regions per chromosome and overall diffMethPerChr(methylDiff.obj, plot=FALSE,qvalue.cutoff=0.01, meth.cutoff=25,exclude=NULL)
The function extracts part of the data and returns a new object.
## S4 method for signature 'methylRaw,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylBase,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylDiff,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylRawDB,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylBaseDB,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylDiffDB,ANY,ANY,ANY' x[i, j]
## S4 method for signature 'methylRaw,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylBase,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylDiff,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylRawDB,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylBaseDB,ANY,ANY,ANY' x[i, j] ## S4 method for signature 'methylDiffDB,ANY,ANY,ANY' x[i, j]
x |
an |
i |
a numeric or logical vector. This vector corresponds to bases or
regions contained in |
j |
This argument can not be used for the extraction of columns.
As unintentional extraction of the columns will cause an error in
the downstream analysis. Using this argument will cause an error.
Use |
data(methylKit) # selects first hundred rows, returns a methylRaw object subset1=methylRawList.obj[[1]][1:100] # selects first hundred rows, returns a methylBase object subset2=methylBase.obj[1:100,] # selects first hundred rows, returns a methylDiff object subset3=methylDiff.obj[1:100,] # This will get chromomsomes, will return a factor # That means the resulting object will ceases to be a methylKit object chrs=methylDiff.obj[[2]]
data(methylKit) # selects first hundred rows, returns a methylRaw object subset1=methylRawList.obj[[1]][1:100] # selects first hundred rows, returns a methylBase object subset2=methylBase.obj[1:100,] # selects first hundred rows, returns a methylDiff object subset3=methylDiff.obj[1:100,] # This will get chromomsomes, will return a factor # That means the resulting object will ceases to be a methylKit object chrs=methylDiff.obj[[2]]
This function filters methylRaw
, methylRawDB
,
methylRawList
and methylRawListDB
objects.
You can filter based on lower read cutoff or high read cutoff.
Higher read cutoff is usefull to eliminate PCR effects
Lower read cutoff is usefull for doing better statistical tests.
filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRaw' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE, ... )
filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRaw' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB' filterByCoverage( methylObj, lo.count = NULL, lo.perc = NULL, hi.count = NULL, hi.perc = NULL, chunk.size = 1e+06, save.db = TRUE, ... )
methylObj |
a |
lo.count |
An integer for read counts.Bases/regions having lower coverage than this count is discarded |
lo.perc |
A double [0-100] for percentile of read counts. Bases/regions having lower coverage than this percentile is discarded |
hi.count |
An integer for read counts. Bases/regions having higher coverage than this is count discarded |
hi.perc |
A double [0-100] for percentile of read counts. Bases/regions having higher coverage than this percentile is discarded |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
methylRaw
, methylRawDB
, methylRawList
or
methylRawListDB
object depending on input object
The parameter chunk.size
is only used when working with
methylRawDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylRawDB
and methylRawListDB
,
while being per default FALSE for methylRaw
and methylRawList
.
If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might change the value of this parameter.
data(methylKit) # filter out bases with covereage above 500 reads filtered1=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL, hi.count=500,hi.perc=NULL) # filter out bases with cread coverage above 99.9th percentile of coverage # distribution filtered2=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL, hi.count=NULL,hi.perc=99.9) # filter out bases with covereage above 500 reads and save to database # "test1_max500.txt.bgz" # in directory "methylDB", filtered3 now becomes a \code{methylRawDB} object filtered3=filterByCoverage(methylRawList.obj[[1]], lo.count=NULL,lo.perc=NULL, hi.count=500, hi.perc=NULL, save.db=TRUE, suffix="max500", dbdir="methylDB") # tidy up rm(filtered3) unlink("methylDB",recursive=TRUE)
data(methylKit) # filter out bases with covereage above 500 reads filtered1=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL, hi.count=500,hi.perc=NULL) # filter out bases with cread coverage above 99.9th percentile of coverage # distribution filtered2=filterByCoverage(methylRawList.obj,lo.count=NULL,lo.perc=NULL, hi.count=NULL,hi.perc=99.9) # filter out bases with covereage above 500 reads and save to database # "test1_max500.txt.bgz" # in directory "methylDB", filtered3 now becomes a \code{methylRawDB} object filtered3=filterByCoverage(methylRawList.obj[[1]], lo.count=NULL,lo.perc=NULL, hi.count=500, hi.perc=NULL, save.db=TRUE, suffix="max500", dbdir="methylDB") # tidy up rm(filtered3) unlink("methylDB",recursive=TRUE)
The function returns the genome assembly stored in any of the
methylBase
,methylBaseDB
,methylRaw
,
methylRawDB
,methylDiff
objects
getAssembly(x) ## S4 method for signature 'methylBase' getAssembly(x) ## S4 method for signature 'methylRaw' getAssembly(x) ## S4 method for signature 'methylDiff' getAssembly(x) ## S4 method for signature 'methylRawDB' getAssembly(x) ## S4 method for signature 'methylBaseDB' getAssembly(x) ## S4 method for signature 'methylDiffDB' getAssembly(x)
getAssembly(x) ## S4 method for signature 'methylBase' getAssembly(x) ## S4 method for signature 'methylRaw' getAssembly(x) ## S4 method for signature 'methylDiff' getAssembly(x) ## S4 method for signature 'methylRawDB' getAssembly(x) ## S4 method for signature 'methylBaseDB' getAssembly(x) ## S4 method for signature 'methylDiffDB' getAssembly(x)
x |
an |
the assembly string for the object
data(methylKit) getAssembly(methylBase.obj) getAssembly(methylDiff.obj) getAssembly(methylRawList.obj[[1]])
data(methylKit) getAssembly(methylBase.obj) getAssembly(methylDiff.obj) getAssembly(methylRawList.obj[[1]])
The function returns the context of methylation. For example: "CpG","CHH" or "CHG"
getContext(x) ## S4 method for signature 'methylBase' getContext(x) ## S4 method for signature 'methylRaw' getContext(x) ## S4 method for signature 'methylDiff' getContext(x) ## S4 method for signature 'methylRawDB' getContext(x) ## S4 method for signature 'methylBaseDB' getContext(x) ## S4 method for signature 'methylDiffDB' getContext(x)
getContext(x) ## S4 method for signature 'methylBase' getContext(x) ## S4 method for signature 'methylRaw' getContext(x) ## S4 method for signature 'methylDiff' getContext(x) ## S4 method for signature 'methylRawDB' getContext(x) ## S4 method for signature 'methylBaseDB' getContext(x) ## S4 method for signature 'methylDiffDB' getContext(x)
x |
an |
a string for the context methylation
data(methylKit) getContext(methylBase.obj) getContext(methylDiff.obj) getContext(methylRawList.obj[[1]])
data(methylKit) getContext(methylBase.obj) getContext(methylDiff.obj) getContext(methylRawList.obj[[1]])
The functions returns a matrix of correlation coefficients and/or a set
of scatterplots showing the relationship between samples. The scatterplots
will contain also fitted lines using lm()
for linear regression
and lowess
for polynomial regression.
getCorrelation(object,method="pearson",plot=FALSE,nrow) ## S4 method for signature 'methylBase' getCorrelation(object, method = c("pearson", "kendall", "spearman"), plot) ## S4 method for signature 'methylBaseDB' getCorrelation(object, method = "pearson", plot = FALSE, nrow = 2e+06)
getCorrelation(object,method="pearson",plot=FALSE,nrow) ## S4 method for signature 'methylBase' getCorrelation(object, method = c("pearson", "kendall", "spearman"), plot) ## S4 method for signature 'methylBaseDB' getCorrelation(object, method = "pearson", plot = FALSE, nrow = 2e+06)
object |
a methylBase or methylBaseDB object |
method |
a character string indicating which correlation coefficient (or covariance) is to be computed (default:"pearson", other options are "kendall" and "spearman") |
plot |
scatterPlot if TRUE (default:FALSE) |
nrow |
a numeric giving the number of lines to read in of methylBaseDB object, defaults to 2e6 |
a correlation matrix object and plot scatterPlot
The argument 'nrow' is only evaluated if the
input is a methylBaseDB
object.
If 'nrow' is not specified getCorrelation
will read the
first 2M records of the given object,
but if you want to read all records 'nrow' has to be NULL.
You should change 'nrow' if using getCorrelation
with
all records of the methylBaseDB object would take too long.
If the scatter plot is plotted, the red line in the plot is from linear
regression fit and the green line is from polynomial regression fit with
stats::lowess
.
data(methylKit) getCorrelation(methylBase.obj,method="pearson",plot=FALSE) # create methylBaseDB methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") getCorrelation(methylBaseDB.obj,method="pearson",plot=FALSE,nrow=10000) # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) getCorrelation(methylBase.obj,method="pearson",plot=FALSE) # create methylBaseDB methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") getCorrelation(methylBaseDB.obj,method="pearson",plot=FALSE,nrow=10000) # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
The function returns basic statistics about read coverage per base. It can also plot a histogram of read coverage values.
getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRaw' getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRawDB' getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 )
getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRaw' getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRawDB' getCoverageStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 )
object |
a |
plot |
plot a histogram of coverage if TRUE (default:FALSE) |
both.strands |
do stats and plot for both strands if TRUE (default:FALSE) |
labels |
should the bars of the histrogram have labels showing the percentage of values in each bin (default:TRUE) |
... |
options to be passed to |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
a summary of coverage statistics or plot a histogram of coverage
The parameter chunk.size
is only used when working with
methylRawDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
data(methylKit) # gets coverage stats for the first sample in methylRawList.obj object getCoverageStats(methylRawList.obj[[1]],plot=TRUE, both.strands=FALSE,labels=TRUE)
data(methylKit) # gets coverage stats for the first sample in methylRawList.obj object getCoverageStats(methylRawList.obj[[1]],plot=TRUE, both.strands=FALSE,labels=TRUE)
The functions retrieves the table containing methylation information from
methylKit
Objects.
The data retrived from this function is of a data.frame
.
This is basically containing all relevant methylation information per
genomic region or base.
getData(x) ## S4 method for signature 'methylBase' getData(x) ## S4 method for signature 'methylRaw' getData(x) ## S4 method for signature 'methylDiff' getData(x) ## S4 method for signature 'methylRawDB' getData(x) ## S4 method for signature 'methylBaseDB' getData(x) ## S4 method for signature 'methylDiffDB' getData(x)
getData(x) ## S4 method for signature 'methylBase' getData(x) ## S4 method for signature 'methylRaw' getData(x) ## S4 method for signature 'methylDiff' getData(x) ## S4 method for signature 'methylRawDB' getData(x) ## S4 method for signature 'methylBaseDB' getData(x) ## S4 method for signature 'methylDiffDB' getData(x)
x |
an |
data frame for methylation events
data(methylKit) # following commands show first lines of returned # data.frames from getData() function head( getData(methylBase.obj) ) head( getData(methylDiff.obj)) head(getData(methylRawList.obj[[1]]))
data(methylKit) # following commands show first lines of returned # data.frames from getData() function head( getData(methylBase.obj) ) head( getData(methylDiff.obj)) head(getData(methylRawList.obj[[1]]))
The function returns the path to the flat file database that stores the data
of the methylRawDB
,
methylRawListDB
, methylBaseDB
or
methylDiffDB
objects.
getDBPath(x) ## S4 method for signature 'methylRawListDB' getDBPath(x) ## S4 method for signature 'methylBaseDB' getDBPath(x) ## S4 method for signature 'methylRawDB' getDBPath(x) ## S4 method for signature 'methylDiffDB' getDBPath(x)
getDBPath(x) ## S4 method for signature 'methylRawListDB' getDBPath(x) ## S4 method for signature 'methylBaseDB' getDBPath(x) ## S4 method for signature 'methylRawDB' getDBPath(x) ## S4 method for signature 'methylDiffDB' getDBPath(x)
x |
an |
data(methylKit) methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") #The path to the database is returned getDBPath(methylBaseDB.obj) # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") #The path to the database is returned getDBPath(methylBaseDB.obj) # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
The function returns basic statistics about It can also plot a histogram of
getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRaw' getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRawDB' getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 )
getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRaw' getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 ) ## S4 method for signature 'methylRawDB' getMethylationStats( object, plot = FALSE, both.strands = FALSE, labels = TRUE, ..., chunk.size = 1e+06 )
object |
a |
plot |
plot a histogram of Methylation if TRUE (deafult:FALSE) |
both.strands |
do plots and stats for both strands seperately if TRUE (deafult:FALSE) |
labels |
should the bars of the histrogram have labels showing the percentage of values in each bin (default:TRUE) |
... |
options to be passed to |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
a summary of Methylation statistics or plot a histogram of coverage
The parameter chunk.size
is only used when working with
methylRawDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
data(methylKit) # gets Methylation stats for the first sample in methylRawList.obj object getMethylationStats(methylRawList.obj[[1]],plot=TRUE, both.strands=FALSE,labels=TRUE)
data(methylKit) # gets Methylation stats for the first sample in methylRawList.obj object getMethylationStats(methylRawList.obj[[1]],plot=TRUE, both.strands=FALSE,labels=TRUE)
The function subsets a methylDiff
or methylDiffDB
object in order to get
differentially methylated bases/regions
satisfying thresholds.
getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylDiff' getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylDiffDB' getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = TRUE, ... )
getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylDiff' getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylDiffDB' getMethylDiff( .Object, difference = 25, qvalue = 0.01, type = "all", chunk.size = 1e+06, save.db = TRUE, ... )
.Object |
a |
difference |
cutoff for absolute value of methylation percentage change between test and control (default:25) |
qvalue |
cutoff for qvalue of differential methylation statistic (default:0.01) |
type |
one of the "hyper","hypo" or "all" strings. Specifies what type of differentially menthylated bases/regions should be returned. For retrieving Hyper-methylated regions/bases type="hyper", for hypo-methylated type="hypo" (default:"all") |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: see Details |
... |
optional Arguments used when save.db is TRUE
|
a methylDiff or methylDiffDB object containing the differential methylated locations satisfying the criteria
The parameter chunk.size
is only used when working with
methylDiffDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylDiffDB
,
while being per default FALSE for methylDiff
. If you wish to save
the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might want to change the value of this parameter.
data(methylKit) # get differentially methylated bases/regions with specific cutoffs all.diff=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="all") # get hyper-methylated hyper=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="hyper") # get hypo-methylated hypo=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="hypo")
data(methylKit) # get differentially methylated bases/regions with specific cutoffs all.diff=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="all") # get hyper-methylated hyper=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="hyper") # get hypo-methylated hypo=getMethylDiff(methylDiff.obj,difference=25,qvalue=0.01,type="hypo")
The function returns or replaces the sample-ids stored in any of the
following methylKit
objects:
methylRaw
, methylRawDB
, methylBase
,
methylBaseDB
,
methylRawList
, methylRawListDB
,
methylDiff
, methylDiffDB
.
getSampleID(x) getSampleID(x) <- value getSampleID(x) <- value ## S4 method for signature 'methylRawList' getSampleID(x) ## S4 replacement method for signature 'methylRawList' getSampleID(x) <- value ## S4 method for signature 'methylBase' getSampleID(x) ## S4 replacement method for signature 'methylBase' getSampleID(x) <- value ## S4 method for signature 'methylRaw' getSampleID(x) ## S4 replacement method for signature 'methylRaw' getSampleID(x) <- value ## S4 method for signature 'methylDiff' getSampleID(x) ## S4 replacement method for signature 'methylDiff' getSampleID(x) <- value ## S4 method for signature 'methylRawListDB' getSampleID(x) ## S4 replacement method for signature 'methylRawListDB' getSampleID(x) <- value ## S4 method for signature 'methylBaseDB' getSampleID(x) ## S4 replacement method for signature 'methylBaseDB' getSampleID(x) <- value ## S4 method for signature 'methylRawDB' getSampleID(x) ## S4 replacement method for signature 'methylRawDB' getSampleID(x) <- value ## S4 method for signature 'methylDiffDB' getSampleID(x) ## S4 replacement method for signature 'methylDiffDB' getSampleID(x) <- value
getSampleID(x) getSampleID(x) <- value getSampleID(x) <- value ## S4 method for signature 'methylRawList' getSampleID(x) ## S4 replacement method for signature 'methylRawList' getSampleID(x) <- value ## S4 method for signature 'methylBase' getSampleID(x) ## S4 replacement method for signature 'methylBase' getSampleID(x) <- value ## S4 method for signature 'methylRaw' getSampleID(x) ## S4 replacement method for signature 'methylRaw' getSampleID(x) <- value ## S4 method for signature 'methylDiff' getSampleID(x) ## S4 replacement method for signature 'methylDiff' getSampleID(x) <- value ## S4 method for signature 'methylRawListDB' getSampleID(x) ## S4 replacement method for signature 'methylRawListDB' getSampleID(x) <- value ## S4 method for signature 'methylBaseDB' getSampleID(x) ## S4 replacement method for signature 'methylBaseDB' getSampleID(x) <- value ## S4 method for signature 'methylRawDB' getSampleID(x) ## S4 replacement method for signature 'methylRawDB' getSampleID(x) <- value ## S4 method for signature 'methylDiffDB' getSampleID(x) ## S4 replacement method for signature 'methylDiffDB' getSampleID(x) <- value
x |
an |
value |
a valid replacement vector for the sample-ids of the object |
data(methylKit) #The Sample-Ids can be printed .. getSampleID(methylBase.obj) # .. or replaced. newObj <- methylBase.obj getSampleID(newObj) <- c("sample1","sample2","sample3","sample4") getSampleID(newObj)
data(methylKit) #The Sample-Ids can be printed .. getSampleID(methylBase.obj) # .. or replaced. newObj <- methylBase.obj getSampleID(newObj) <- c("sample1","sample2","sample3","sample4") getSampleID(newObj)
The function returns or replaces the treatment vector stored in any of the
following methylKit objects:
methylBase
,methylRawList
,methylBaseDB
,
methylRawListDB
,methylDiff
,methylDiffDB
.
getTreatment(x) getTreatment(x) <- value getTreatment(x) <- value ## S4 method for signature 'methylRawList' getTreatment(x) ## S4 replacement method for signature 'methylRawList' getTreatment(x) <- value ## S4 method for signature 'methylBase' getTreatment(x) ## S4 replacement method for signature 'methylBase' getTreatment(x) <- value ## S4 method for signature 'methylDiff' getTreatment(x) ## S4 replacement method for signature 'methylDiff' getTreatment(x) <- value ## S4 method for signature 'methylRawListDB' getTreatment(x) ## S4 replacement method for signature 'methylRawListDB' getTreatment(x) <- value ## S4 method for signature 'methylBaseDB' getTreatment(x) ## S4 replacement method for signature 'methylBaseDB' getTreatment(x) <- value ## S4 method for signature 'methylDiffDB' getTreatment(x) ## S4 replacement method for signature 'methylDiffDB' getTreatment(x) <- value
getTreatment(x) getTreatment(x) <- value getTreatment(x) <- value ## S4 method for signature 'methylRawList' getTreatment(x) ## S4 replacement method for signature 'methylRawList' getTreatment(x) <- value ## S4 method for signature 'methylBase' getTreatment(x) ## S4 replacement method for signature 'methylBase' getTreatment(x) <- value ## S4 method for signature 'methylDiff' getTreatment(x) ## S4 replacement method for signature 'methylDiff' getTreatment(x) <- value ## S4 method for signature 'methylRawListDB' getTreatment(x) ## S4 replacement method for signature 'methylRawListDB' getTreatment(x) <- value ## S4 method for signature 'methylBaseDB' getTreatment(x) ## S4 replacement method for signature 'methylBaseDB' getTreatment(x) <- value ## S4 method for signature 'methylDiffDB' getTreatment(x) ## S4 replacement method for signature 'methylDiffDB' getTreatment(x) <- value
x |
a |
value |
a valid replacement for the treatment vector of the object |
data(methylKit) # The treatment vector can be printed .. getTreatment(methylBase.obj) # .. or replaced with a new one newObj <- methylBase.obj getTreatment(newObj) <- c(1,2,3,4) getTreatment(newObj)
data(methylKit) # The treatment vector can be printed .. getTreatment(methylBase.obj) # .. or replaced with a new one newObj <- methylBase.obj getTreatment(newObj) <- c(1,2,3,4) getTreatment(newObj)
Segmentation and clustering are two distinct steps in methSeg(), leading to adjacent segments of the same class. This leads to a bias segment length distributions, which is removed by joining those neighbours.
joinSegmentNeighbours(res)
joinSegmentNeighbours(res)
res |
A |
A GRanges
object with segment
classification and information.
The function converts in-memory methylKit objects to methylDB objects
makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylBase' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylRaw' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylDiff' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylRawList' makeMethylDB(obj, dbdir = getwd())
makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylBase' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylRaw' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylDiff' makeMethylDB(obj, dbdir = getwd()) ## S4 method for signature 'methylRawList' makeMethylDB(obj, dbdir = getwd())
obj |
an |
dbdir |
directory where flat file database(s) should be stored, defaults to getwd(), working directory. |
an methylBaseDB
,methylRawDB
,
methylRawListDB
or an methylDiffDB
object
## Not run: data(methylKit) makeMethylDB(methylBase.obj,"my/path") ## End(Not run)
## Not run: data(methylKit) makeMethylDB(methylBase.obj,"my/path") ## End(Not run)
The function reads a list of files or single files with methylation information for bases/region in the genome and creates a methylrawList or methylraw object. The information can be stored as flat file database by creating a methylrawlistDB or methylrawDB object.
methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 ) ## S4 method for signature 'character,character,character' methRead( location, sample.id, assembly, dbtype, pipeline, header, skip, sep, context, resolution, dbdir, mincov ) ## S4 method for signature 'character,ANY,ANY' methRead( location, sample.id, assembly, dbtype, pipeline, header, skip, sep, context, resolution, dbdir, mincov ) ## S4 method for signature 'list,list,character' methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 ) ## S4 method for signature 'list,ANY,ANY' methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 )
methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 ) ## S4 method for signature 'character,character,character' methRead( location, sample.id, assembly, dbtype, pipeline, header, skip, sep, context, resolution, dbdir, mincov ) ## S4 method for signature 'character,ANY,ANY' methRead( location, sample.id, assembly, dbtype, pipeline, header, skip, sep, context, resolution, dbdir, mincov ) ## S4 method for signature 'list,list,character' methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 ) ## S4 method for signature 'list,ANY,ANY' methRead( location, sample.id, assembly, dbtype = NA, pipeline = "amp", header = TRUE, skip = 0, sep = "\t", context = "CpG", resolution = "base", treatment = NA, dbdir = getwd(), mincov = 10 )
location |
file location(s), either a list of locations (each a character string) or one location string |
sample.id |
sample.id(s) |
assembly |
a string that defines the genome assembly such as hg18, mm9. this is just a string for book keeping. It can be any string. Although, when using multiple files from the same assembly, this string should be consistent in each object. |
dbtype |
type of the flat file database, currently only option other than NA is "tabix". When "tabix" is given the objects are stored in tabix files, which are compressed and indexed. The default value is NA, in which case the objects are stored in memory. |
pipeline |
name of the alignment pipeline, it can be either "amp",
"bismark","bismarkCoverage", "bismarkCytosineReport" or a list (default:'amp').
The methylation text files generated from other pipelines can be
read as generic methylation text files by supplying a named
|
header |
if the input file has a header or not (default: TRUE) |
skip |
number of lines to skip when reading. Can be set to 1 for bed files with track line (default: 0) |
sep |
seperator between fields, same as |
context |
methylation context string, ex: CpG,CHG,CHH, etc. (default:CpG) |
resolution |
designates whether methylation information is base-pair resolution or regional resolution. allowed values 'base' or 'region'. Default 'base' |
treatment |
a vector contatining 0 and 1 denoting which samples are control which samples are test |
dbdir |
directory where flat file database(s) should be stored, defaults to getwd(), working directory. |
mincov |
minimum read coverage to be included in the methylKit objects. defaults to 10. Any methylated base/region in the text files below the mincov value will be ignored. |
returns methylRaw, methylRawList, methylRawDB, methylRawListDB object
The output of methRead
is determined by specific input arguments,as there
are location
, sample.id
, assembly
and dbtype
.
The first three are obligatory, while if the last argument is given database
features are enabled.
If then location
refers to an uncompressed file the function will
create a flat file database and
the associated methylRawDB object will link to this database.
If then location
refers to an earlier created database file then the
object will directly link to this database,
skipping the preprocessing steps.
When pipeline
argument is a list, it is exptected to provide a named
list with following names.
'fraction' is a logical value, denoting if the column frequency of Cs has a
range from [0-1] or [0-100]. If true it assumes range is [0-1].
'chr.col" is the number of the column that has chrosome string.
'start.col' is the number of the column that has start coordinate of the
base/region of the methylation event.
'end.col' is the number of the column that has end coordinate of the
base/region of the methylation event.
'coverage.col' is the number of the column that has read coverage values.
'strand.col' is the number of the column that has strand information, the
strand information in the file has to be in the form of '+' or '-',
'freqC.col' is the number of the column that has the frequency of Cs.
See examples to see how to read a generic methylation text file.
Other possible values for pipeline
argument are 'amp','bismark',
'bismarkCoverage' and 'bismarkCytosineReport'. For 'amp' and 'bismark'
the function expects a tabular format shown in the webpage
(http://github.com/al2na/methylKit).
"amp" and "bismark" expect identical input and are kept for historical
reasons. 'amp' was a pipeline used in Akalin et al. 2012 Plos Genetics
paper, publicly available in googlecode.
Bismark aligner can output methylation information per
base in
multiple formats. With pipeline='bismarkCoverage'
,
the function reads bismark coverage files,
which have chr,start,end, number of cytosines (methylated bases)
and number of thymines (unmethylated bases) format.
If bismark coverage files are used the function will not have
the strand information,so beware of that fact.
With pipeline='bismarkCytosineReport'
, the function expects
cytosine report files from Bismark,
which have chr,start, strand, number of cytosines (methylated bases)
, number of thymines (unmethylated bases),context and trinucletide context
format.
The function can also read gzipped files. On unix systems, this is achieved
by using zcat filename
and feeding that into data.table::fread
. On Windows, the file is first uncompressed then read into R using
data.table::fread
.
# this is a list of example files, ships with the package # for your own analysis you will just need to provide set of paths to files # you will not need the "system.file(..." part file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) # read the files to a methylRawList object: myobj myobj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0)) # read one file as methylRaw object myobj=methRead( file.list[[1]], sample.id="test1",assembly="hg18") # read a generic text file containing CpG methylation values # let's first look at the content of the file generic.file=system.file("extdata", "generic1.CpG.txt",package = "methylKit") read.table(generic.file,header=TRUE) # And this is how you can read that generic file as a methylKit object myobj=methRead( generic.file, pipeline=list(fraction=FALSE,chr.col=1,start.col=2,end.col=2, coverage.col=4,strand.col=3,freqC.col=5), sample.id="test1",assembly="hg18") # This creates tabix files that save methylation data # Without specified dbdir first creates a folder named the following # in working directory: # paste("methylDB",Sys.Date(),paste(sample(c(0:9, letters, LETTERS),3, # replace=TRUE),collapse="")) # # Then, saves tabix files from methylKit objects there myobj=methRead( file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0), dbtype="tabix") # This creates a single tabix files that saves methylation data # first creates a "methylDB_objects" directory # Then, saves tabix file from methylKit objects there myobj=methRead(file.list[[1]], sample.id="test1", assembly="hg18", dbtype="tabix",dbdir="methylDB_objects") # tidy up rm(myobj) unlink(list.files(pattern = "methylDB",full.names = TRUE),recursive = TRUE)
# this is a list of example files, ships with the package # for your own analysis you will just need to provide set of paths to files # you will not need the "system.file(..." part file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) # read the files to a methylRawList object: myobj myobj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0)) # read one file as methylRaw object myobj=methRead( file.list[[1]], sample.id="test1",assembly="hg18") # read a generic text file containing CpG methylation values # let's first look at the content of the file generic.file=system.file("extdata", "generic1.CpG.txt",package = "methylKit") read.table(generic.file,header=TRUE) # And this is how you can read that generic file as a methylKit object myobj=methRead( generic.file, pipeline=list(fraction=FALSE,chr.col=1,start.col=2,end.col=2, coverage.col=4,strand.col=3,freqC.col=5), sample.id="test1",assembly="hg18") # This creates tabix files that save methylation data # Without specified dbdir first creates a folder named the following # in working directory: # paste("methylDB",Sys.Date(),paste(sample(c(0:9, letters, LETTERS),3, # replace=TRUE),collapse="")) # # Then, saves tabix files from methylKit objects there myobj=methRead( file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0), dbtype="tabix") # This creates a single tabix files that saves methylation data # first creates a "methylDB_objects" directory # Then, saves tabix file from methylKit objects there myobj=methRead(file.list[[1]], sample.id="test1", assembly="hg18", dbtype="tabix",dbdir="methylDB_objects") # tidy up rm(myobj) unlink(list.files(pattern = "methylDB",full.names = TRUE),recursive = TRUE)
The function uses a segmentation algorithm (fastseg
)
to segment the methylation profiles. Following that, it uses gaussian mixture
modelling to cluster the segments into k components. This process uses mean
methylation value of each segment in the modeling phase. Each component
ideally indicates quantitative classification of segments, such as high or
low methylated regions.
methSeg( obj, diagnostic.plot = TRUE, join.neighbours = FALSE, initialize.on.subset = 1, ... )
methSeg( obj, diagnostic.plot = TRUE, join.neighbours = FALSE, initialize.on.subset = 1, ... )
obj |
|
diagnostic.plot |
if TRUE a diagnostic plot is plotted. The plot shows methylation and length statistics per segment group. In addition, it shows diagnostics from mixture modeling: the density function estimated and BIC criterion used to decide the optimum number of components in mixture modeling. |
join.neighbours |
if TRUE neighbouring segments that cluster to the same seg.group are joined by extending the ranges, summing up num.marks and averaging over seg.means. |
initialize.on.subset |
a numeric value indicating either percentage or absolute value of regions to subsample from segments before performing the mixture modeling. The value can be either between 0 and 1, e.g. 0.1 means that 10 integer higher than 1 to define the number of regions to sample. By default uses the whole dataset, which can be time consuming on large datasets. (Default: 1) |
... |
arguments to |
To be sure that the algorithm will work on your data, the object should have at least 5000 records
After initial segmentation with fastseg(), segments are clustered
into self-similar groups based on their mean methylation profile
using mixture modeling. Mixture modeling estimates the initial
parameters of the distributions by using the whole dataset.
If "initialize.on.subset" argument set as described, the function
will use a subset of the data to initialize the parameters of the
mixture modeling prior to the Expectation-maximization (EM) algorithm
used by Mclust
package.
A GRanges
object with segment
classification and information.
'seg.mean' column shows the mean methylation per segment.
'seg.group' column shows the segment groups obtained by mixture modeling
Altuna Akalin, contributions by Arsene Wabo and Katarzyna Wreczycka
methSeg2bed
, joinSegmentNeighbours
download.file( "https://raw.githubusercontent.com/BIMSBbioinfo/compgen2018/master/day3_diffMeth/data/H1.chr21.chr22.rds", destfile="H1.chr21.chr22.rds",method="curl") mbw=readRDS("H1.chr21.chr22.rds") # it finds the optimal number of componets as 6 res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10) # however the BIC stabilizes after 4, we can also try 4 componets res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10,G=1:4) # get segments to BED file methSeg2bed(res,filename="H1.chr21.chr22.trial.seg.bed") unlink(list.files(pattern="H1.chr21.chr22",full.names=TRUE))
download.file( "https://raw.githubusercontent.com/BIMSBbioinfo/compgen2018/master/day3_diffMeth/data/H1.chr21.chr22.rds", destfile="H1.chr21.chr22.rds",method="curl") mbw=readRDS("H1.chr21.chr22.rds") # it finds the optimal number of componets as 6 res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10) # however the BIC stabilizes after 4, we can also try 4 componets res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10,G=1:4) # get segments to BED file methSeg2bed(res,filename="H1.chr21.chr22.trial.seg.bed") unlink(list.files(pattern="H1.chr21.chr22",full.names=TRUE))
The segments are color coded based on their score (methylation or differential methylation value). They are named by segment group (components in mixture modeling) and the score in the BED file is obtained from 'seg.mean' column of segments object.
methSeg2bed( segments, filename, trackLine = "track name='meth segments' description='meth segments' itemRgb=On", colramp = colorRamp(c("gray", "green", "darkgreen")) )
methSeg2bed( segments, filename, trackLine = "track name='meth segments' description='meth segments' itemRgb=On", colramp = colorRamp(c("gray", "green", "darkgreen")) )
segments |
|
filename |
name of the output data |
trackLine |
UCSC browser trackline |
colramp |
color scale to be used in the BED display defaults to gray,green, darkgreen scale. |
A BED files with the segmented data which can be visualized in the UCSC browser
This class is designed to contain methylation information such as coverage,
number of methylated bases, etc..
The methylation events contained in the class must be sampled in multiple
experiments (ex: only CpG bases covered in multiple experiments are stored
in the object of this class).
The class extends data.frame
and creates an object that holds
methylation information and genomic location.
The object belonging to this class is produced by unite
function.
sample.ids
:character vector for ids of samples in the object
assembly
:name of the genome assembly
context
:context of methylation. Ex: CpG,CpH,CHH, etc
treatment
:treatment vector denoting which samples are test and control
coverage.index
:vector denoting which columns in the data correspons to coverage values
numCs.index
:vector denoting which columns in the data correspons to number of methylatedCs values
numTs.index
:vector denoting which columns in the data correspons to number of unmethylated Cs values
destranded
: logical value. If TRUE
object is destranded, if FALSE
it is not.
resolution
:resolution of methylation information, allowed values: 'base' or 'region'
methylBase
class extends data.frame
class
therefore providing novice and experienced R users with a data
structure that is well known and ubiquitous in many R packages.
In the following code snippets, x
is a methylBase
.
Subsetting by x[i,]
will produce a new object if subsetting is done on
rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
methylBase
object can be coerced to
GRanges
object via as
function.
data(methylKit) library(GenomicRanges) my.gr=as(methylBase.obj,"GRanges")
data(methylKit) library(GenomicRanges) my.gr=as(methylBase.obj,"GRanges")
methylBase
, methylDiff
and methylRawList
.
You can load the data using data(methylKit)
methylBase.obj object stores the location and methylation
information for bases that
are covered in all samples. methylBase
partially extends
data.frame
S3 class.
This class is designed to contain methylation information such as coverage,
number of methylated bases, etc...
The class creates an object that holds methylation information and genomic
location as flat file database.
The object belonging to this class is produced by unite
function.
dbpath
:path to flat file database(s)
num.records
:number of records (lines) in the object
sample.ids
:character vector for ids of samples in the object
assembly
:name of the genome assembly
context
:context of methylation. Ex: CpG,CpH,CHH, etc
treatment
:treatment vector denoting which samples are test and control
coverage.index
:vector denoting which columns in the data correspond to coverage values
numCs.index
:vector denoting which columns in the data correspond to number of methylatedCs values
numTs.index
:vector denoting which columns in the data correspond to number of unmethylated Cs values
destranded
: logical value.
If TRUE
object is destranded,
if FALSE
it is not.
resolution
:resolution of methylation information, allowed values: 'base' or 'region'
dbtype
:string for type of the flat file database, ex: tabix
methylBaseDB
class has the same functionality as
methylBase
class,
but the data is saved in a flat database file and therefore allocates
less space in memory.
In the following code snippets, x
is a methylBaseDB
.
Subsetting by x[i,]
will produce a new methylBase
object
if subsetting is done on
rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
methylBaseDB
object can be coerced to:
GRanges
object via as
function.
methylBase
object via as
function.
data(methylKit) methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") library(GenomicRanges) my.gr=as(methylBaseDB.obj,"GRanges") # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) methylBaseDB.obj <- unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") library(GenomicRanges) my.gr=as(methylBaseDB.obj,"GRanges") # remove Database again rm(methylBaseDB.obj) unlink("methylDB",recursive=TRUE)
This class is designed to hold statistics and locations for differentially
methylated regions/bases. It extends data.frame
class.
calculateDiffMeth
function returns an object
with methylDiff
class.
sample.ids
ids/names of samples in a vector
assembly
a name of genome assembly, such as :hg18,mm9, etc
context
numeric vector identifying which samples are which group
treatment
numeric vector identifying which samples are which group
destranded
logical denoting if methylation inormation is destranded or not
resolution
string either 'base' or 'region' defining the resolution of methylation information
.Data
data.frame holding the locations and statistics
methylDiff
class extends data.frame
class therefore
providing novice and experienced R users with a data structure that is
well known and ubiquitous in many R packages.
In the following code snippets, x
is a methylDiff
object.
Subsetting by x[i,]
will produce a new object if subsetting is done
on rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
methylDiff
object can be coerced to
GRanges
object via as
function.
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
data(methylKit) library(GenomicRanges) my.gr=as(methylDiff.obj,"GRanges")
data(methylKit) library(GenomicRanges) my.gr=as(methylDiff.obj,"GRanges")
methylBase
, methylDiff
and methylRawList
.
You can load the data using data(methylKit)
The Differential methylation information is stored in
methylDiff.obj object.
methylBase
partially extends data.frame
S3 class.
This class is designed to hold statistics and locations for differentially
methylated regions/bases as flat file database.
calculateDiffMeth
function returns an object
with methylDiffDB
class.
dbpath
:path to flat file database(s)
num.records
:number of records (lines) in the object
sample.ids
ids/names of samples in a vector
assembly
a name of genome assembly, such as :hg18,mm9, etc
context
numeric vector identifying which samples are which group
treatment
numeric vector identifying which samples are which group
destranded
logical denoting if methylation inormation is destranded or not
resolution
string either 'base' or 'region' defining the resolution of methylation information
dbtype
:string for type of the flat file database, ex: tabix
methylDiffDB
class has the same functionality as
methylDiff
class,
but the data is saved in a flat database file and therefore
allocates less space in memory.
In the following code snippets, x
is a methylDiffDB
.
Subsetting by x[i,]
will produce a new object if subsetting is done
on rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
methylDiffDB
object can be coerced to:
GRanges
object via as
function.
methylDiff
object via as
function.
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
data(methylKit) methylDiffDB.obj <- calculateDiffMeth(methylBase.obj,save.db=TRUE,dbdir="methylDB") library(GenomicRanges) my.gr=as(methylDiffDB.obj,"GRanges") # remove Database again rm(methylDiffDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) methylDiffDB.obj <- calculateDiffMeth(methylBase.obj,save.db=TRUE,dbdir="methylDB") library(GenomicRanges) my.gr=as(methylDiffDB.obj,"GRanges") # remove Database again rm(methylDiffDB.obj) unlink("methylDB",recursive=TRUE)
These are deprecated or defunct functions. Most of them are replaced by genomation functions. See the vignette for examples on how to use genomation functions for annotation purposes.
annotate.WithFeature() annotate.WithFeature.Flank() annotate.WithGenicParts() read.bed() read.feature.flank() read.transcript.features() getFeatsWithTargetsStats() getFlanks() getMembers() getTargetAnnotationStats() plotTargetAnnotation() read() read.bismark() adjust.methylC() get.methylDiff()
annotate.WithFeature() annotate.WithFeature.Flank() annotate.WithGenicParts() read.bed() read.feature.flank() read.transcript.features() getFeatsWithTargetsStats() getFlanks() getMembers() getTargetAnnotationStats() plotTargetAnnotation() read() read.bismark() adjust.methylC() get.methylDiff()
This object stores the raw mehylation data that is read in through read
function and extends data.frame
.The raw methylation data is basically
percent methylation values and read coverage values per genomic base/region.
sample.id
:string for an identifier of the sample
assembly
:string for genome assembly, ex: hg18,hg19,mm9
context
:methylation context string, ex: CpG,CpH,CHH, etc.
resolution
:resolution of methylation information, 'base' or 'region'
methylRaw
class extends data.frame
class therefore
providing novice and experienced R users with a data structure that is well
known and ubiquitous in many R packages.
In the following code snippets, x
is a methylRaw
.
Subsetting by x[i,]
will produce a new object if subsetting is done on
rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
methylRaw
object can be coerced to
GRanges
object via as
function.
# example of a raw methylation data as a text file read.table(system.file("extdata", "control1.myCpG.txt", package = "methylKit"), header=TRUE,nrows=5) data(methylKit) # example of a methylRaw object head(methylRawList.obj[[1]]) str(head(methylRawList.obj[[1]])) library(GenomicRanges) #coercing methylRaw object to GRanges object my.gr=as(methylRawList.obj[[1]],"GRanges")
# example of a raw methylation data as a text file read.table(system.file("extdata", "control1.myCpG.txt", package = "methylKit"), header=TRUE,nrows=5) data(methylKit) # example of a methylRaw object head(methylRawList.obj[[1]]) str(head(methylRawList.obj[[1]])) library(GenomicRanges) #coercing methylRaw object to GRanges object my.gr=as(methylRawList.obj[[1]],"GRanges")
This object stores the raw mehylation data that is read in through read function as flat file database.The raw methylation data is basically percent methylation values and read coverage values per genomic base/region.
dbpath
:path to flat file database
num.records
:number of records (lines) in the object
sample.id
:string for an identifier of the sample
assembly
:string for genome assembly, ex: hg18,hg19,mm9
context
:methylation context string, ex: CpG,CpH,CHH, etc.
resolution
:resolution of methylation information, 'base' or 'region'
dbtype
:string for type of the flat file database, ex: tabix
methylRawDB
is created via read
function and has the
same functionality as methylRaw
class,
but the data is saved in a flat database file and therefore allocates less
space in memory.
In the following code snippets, x
is a methylRawDB
.
Subsetting by x[i,]
will produce a new methylRaw
object if
subsetting is done on
rows. Column subsetting is not directly allowed to prevent errors in the
downstream analysis. see ?methylKit[ .
x[]
will return the methylRawDB
object as new methylRaw
object
The following functions provides access to data slots of methylDiffDB:
- getData
: get the data slot from the methylKit objects,
- getAssembly
: get assembly of the genome,
- getContext
: get the context of methylation
methylRawDB
object can be coerced to:
GRanges
object via as
function.
methylRaw
object via as
function.
# example of a raw methylation data contained as a text file read.table(system.file("extdata", "control1.myCpG.txt", package = "methylKit"), header=TRUE,nrows=5) methylRawDB.obj <- methRead( system.file("extdata", "control1.myCpG.txt", package = "methylKit"), sample.id = "ctrl1", assembly = "hg18", dbtype = "tabix", dbdir = "methylDB") # example of a methylRawDB object methylRawDB.obj str(methylRawDB.obj) library(GenomicRanges) #coercing methylRawDB object to GRanges object my.gr=as(methylRawDB.obj,"GRanges") #coercing methylRawDB object to methylRaw object myRaw=as(methylRawDB.obj,"methylRaw") # remove Database again rm(methylRawDB.obj) unlink("methylDB",recursive=TRUE)
# example of a raw methylation data contained as a text file read.table(system.file("extdata", "control1.myCpG.txt", package = "methylKit"), header=TRUE,nrows=5) methylRawDB.obj <- methRead( system.file("extdata", "control1.myCpG.txt", package = "methylKit"), sample.id = "ctrl1", assembly = "hg18", dbtype = "tabix", dbdir = "methylDB") # example of a methylRawDB object methylRawDB.obj str(methylRawDB.obj) library(GenomicRanges) #coercing methylRawDB object to GRanges object my.gr=as(methylRawDB.obj,"GRanges") #coercing methylRawDB object to methylRaw object myRaw=as(methylRawDB.obj,"methylRaw") # remove Database again rm(methylRawDB.obj) unlink("methylDB",recursive=TRUE)
This class stores the list of methylRaw
objects.
Functions such as lapply
can be used on this list. It extends
list
class. This object is primarily produced
by methRead
function.
methylRawList(..., treatment)
methylRawList(..., treatment)
... |
vector of methylRaw objects |
treatment |
vector of treatment values |
treatment
numeric vector denoting control and test samples
.Data
a list of
methylRaw
objects
methylRawList(...)
combine multiple methylRaw objects supplied in ... into a methylRawList object.
data(methylKit) #applying functions designed for methylRaw on methylRawList object lapply(methylRawList.obj,"getAssembly")
data(methylKit) #applying functions designed for methylRaw on methylRawList object lapply(methylRawList.obj,"getAssembly")
methylBase
, methylDiff
and methylRawList
.
You can load the data using data(methylKit)
Methylation data from multiple the samples regardless
of common coverage are stored in methylRawList.obj object.
methylRawList
extends list
S3 class
This class stores the list of methylRawDB
objects.
Functions such as lapply
can be used on this list. It extends
list
class. This object is primarily produced
by methRead
function.
methylRawListDB(..., treatment)
methylRawListDB(..., treatment)
... |
vector of methylRawDB files |
treatment |
vector of treatment values |
treatment
numeric vector denoting control and test samples
.Data
a list of methylRawDB
objects
methylRawListDB(...)
combine multiple methylRawDB objects supplied in ... into a methylRawListDB object.
file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) methylRawListDB.obj <- methRead(file.list, sample.id = list("test1","test2","ctrl1","ctrl2"), assembly = "hg18",treatment = c(1,1,0,0), dbtype = "tabix",dbdir = "methylDB") #applying functions designed for methylRawDB on methylRawListDB object lapply(methylRawListDB.obj,"getAssembly") # remove Database again rm(methylRawListDB.obj) unlink("methylDB",recursive=TRUE)
file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) methylRawListDB.obj <- methRead(file.list, sample.id = list("test1","test2","ctrl1","ctrl2"), assembly = "hg18",treatment = c(1,1,0,0), dbtype = "tabix",dbdir = "methylDB") #applying functions designed for methylRawDB on methylRawListDB object lapply(methylRawListDB.obj,"getAssembly") # remove Database again rm(methylRawListDB.obj) unlink("methylDB",recursive=TRUE)
The function normalizes coverage values between samples using a scaling factor derived from differences between mean or median of coverage distributions
normalizeCoverage(obj,method="median",chunk.size,save.db,...) ## S4 method for signature 'methylRawList' normalizeCoverage( obj, method = "median", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' normalizeCoverage( obj, method = "median", chunk.size = 1e+06, save.db = TRUE, ... )
normalizeCoverage(obj,method="median",chunk.size,save.db,...) ## S4 method for signature 'methylRawList' normalizeCoverage( obj, method = "median", chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' normalizeCoverage( obj, method = "median", chunk.size = 1e+06, save.db = TRUE, ... )
obj |
|
method |
a string "mean" or "median" which denotes median or mean should be used to calculate scaling factor. (Default:median) |
chunk.size |
Number of rows to be taken as a chunk for processing
the |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
a methylRawList
or methylRawList
object
depending on class of input object
The parameter chunk.size
is only used when working with
methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylRawListDB
,
while being per default FALSE for methylRawList
. If you wish to
save the result of an
in-memory-calculation as flat file database or if the size of the
database allows the calculation in-memory,
then you might want to change the value of this parameter.
Altuna Akalin
data(methylKit) # normalize by the median coverage newObj = normalizeCoverage(methylRawList.obj,method="median") # normalize by mean coverage and save to database in folder methylDB newDBObj = normalizeCoverage(methylRawList.obj,method="mean", save.db=TRUE,dbdir="methylDB")
data(methylKit) # normalize by the median coverage newObj = normalizeCoverage(methylRawList.obj,method="median") # normalize by mean coverage and save to database in folder methylDB newDBObj = normalizeCoverage(methylRawList.obj,method="mean", save.db=TRUE,dbdir="methylDB")
The function does a PCA analysis using prcomp
function
using percent methylation matrix as an input.
PCASamples(.Object, screeplot=FALSE, adj.lim=c(0.0004,0.1), scale=TRUE, center=TRUE,comp=c(1,2),transpose=TRUE,sd.filter=TRUE, sd.threshold=0.5,filterByQuantile=TRUE,obj.return=FALSE,chunk.size) ## S4 method for signature 'methylBase' PCASamples( .Object, screeplot, adj.lim, scale, center, comp, transpose, sd.filter, sd.threshold, filterByQuantile, obj.return ) ## S4 method for signature 'methylBaseDB' PCASamples( .Object, screeplot = FALSE, adj.lim = c(4e-04, 0.1), scale = TRUE, center = TRUE, comp = c(1, 2), transpose = TRUE, sd.filter = TRUE, sd.threshold = 0.5, filterByQuantile = TRUE, obj.return = FALSE, chunk.size = 1e+06 )
PCASamples(.Object, screeplot=FALSE, adj.lim=c(0.0004,0.1), scale=TRUE, center=TRUE,comp=c(1,2),transpose=TRUE,sd.filter=TRUE, sd.threshold=0.5,filterByQuantile=TRUE,obj.return=FALSE,chunk.size) ## S4 method for signature 'methylBase' PCASamples( .Object, screeplot, adj.lim, scale, center, comp, transpose, sd.filter, sd.threshold, filterByQuantile, obj.return ) ## S4 method for signature 'methylBaseDB' PCASamples( .Object, screeplot = FALSE, adj.lim = c(4e-04, 0.1), scale = TRUE, center = TRUE, comp = c(1, 2), transpose = TRUE, sd.filter = TRUE, sd.threshold = 0.5, filterByQuantile = TRUE, obj.return = FALSE, chunk.size = 1e+06 )
.Object |
a |
screeplot |
a logical value indicating whether to plot the variances against the number of the principal component. (default: FALSE) |
adj.lim |
a vector indicating the propotional adjustment of xlim (adj.lim[1]) and ylim (adj.lim[2]). This is primarily used for adjusting the visibility of sample labels on the on the PCA plot. (default: c(0.0004,0.1)) |
scale |
logical indicating if |
center |
logical indicating if |
comp |
vector of integers with 2 elements specifying which components to be plotted. |
transpose |
if TRUE (default) percent methylation matrix will be transposed, this is equivalent to doing PCA on variables that are regions/bases. The resulting plot will location of samples in the new coordinate system if FALSE the variables for the matrix will be samples and the resulting plot whill show how each sample (variable) contributes to the principle component.the samples that are highly correlated should have similar contributions to the principal components. |
sd.filter |
If |
sd.threshold |
A numeric value. If |
filterByQuantile |
A logical determining if |
obj.return |
if the result of |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
The form of the value returned by PCASamples
is the summary
of principal component analysis by prcomp
.
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
cor option is not in use anymore, since prcomp
is used for PCA
analysis instead of princomp
data(methylKit) # do PCA with filtering rows with low variation, filter rows with standard # deviation lower than the 50th percentile of Standard deviation distribution PCASamples(methylBase.obj,screeplot=FALSE, adj.lim=c(0.0004,0.1), scale=TRUE,center=TRUE,comp=c(1,2),transpose=TRUE,sd.filter=TRUE, sd.threshold=0.5,filterByQuantile=TRUE,obj.return=FALSE)
data(methylKit) # do PCA with filtering rows with low variation, filter rows with standard # deviation lower than the 50th percentile of Standard deviation distribution PCASamples(methylBase.obj,screeplot=FALSE, adj.lim=c(0.0004,0.1), scale=TRUE,center=TRUE,comp=c(1,2),transpose=TRUE,sd.filter=TRUE, sd.threshold=0.5,filterByQuantile=TRUE,obj.return=FALSE)
get percent methylation scores from methylBase or methylBaseDB object
percMethylation( methylBase.obj, rowids = FALSE, save.txt = FALSE, chunk.size = 1e+06 ) ## S4 method for signature 'methylBase' percMethylation(methylBase.obj, rowids = FALSE) ## S4 method for signature 'methylBaseDB' percMethylation( methylBase.obj, rowids = FALSE, save.txt = FALSE, chunk.size = 1e+06 )
percMethylation( methylBase.obj, rowids = FALSE, save.txt = FALSE, chunk.size = 1e+06 ) ## S4 method for signature 'methylBase' percMethylation(methylBase.obj, rowids = FALSE) ## S4 method for signature 'methylBaseDB' percMethylation( methylBase.obj, rowids = FALSE, save.txt = FALSE, chunk.size = 1e+06 )
methylBase.obj |
a methylBase or methylBaseDB object |
rowids |
if TRUE, matrix rownames have identifiers as base/region location (default:FALSE) |
save.txt |
if TRUE, the matrix will be written to a text file, but only for methylBaseDB objects (default: FALSE) |
chunk.size |
Number of rows to be taken as a chunk for processing
the |
matrix with percent methylation values per base/region across all samples, row names would be base/region identifiers
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
data(methylKit) mat=percMethylation(methylBase.obj) head(mat)
data(methylKit) mat=percMethylation(methylBase.obj) head(mat)
The function sums up coverage, numCs and numTs values within each group so one representative sample for each group will be created in a new methylBase object
pool(obj, sample.ids, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBase' pool(obj, sample.ids, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBaseDB' pool(obj, sample.ids, chunk.size = 1e+06, save.db = TRUE, ...)
pool(obj, sample.ids, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBase' pool(obj, sample.ids, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBaseDB' pool(obj, sample.ids, chunk.size = 1e+06, save.db = TRUE, ...)
obj |
methylBase or methylBaseDB object with two groups or more and each group should have multiple samples |
sample.ids |
a character vector of new sample.ids ex:c("test","control"), should follow the same order as unique treatment vector, and should be equal to the length of the unique treatment vector |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
a methylBase or methylBaseDB object depending on class of input object
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylBaseDB
,
while being per default FALSE for methylBase
. If you wish to save
the result of an
in-memory-calculation as flat file database or if the size of the
database allows the calculation in-memory,
then you might want to change the value of this parameter.
Altuna Akalin
data(methylKit) # methylBase.obj has two groups, each group has two samples, # the following function will pool the samples in each group # so that each group will be represented by one pooled sample pooled.methylBase=pool(methylBase.obj,sample.ids=c("test","control"))
data(methylKit) # methylBase.obj has two groups, each group has two samples, # the following function will pool the samples in each group # so that each group will be represented by one pooled sample pooled.methylBase=pool(methylBase.obj,sample.ids=c("test","control"))
The function calls methylation percentage per base from sorted Bismark
SAM or BAM
files and reads methylation information as methylKit objects.
Bismark is a popular aligner for
high-throughput bisulfite sequencing experiments and it outputs its results in
SAM format by default, and can be converted to BAM.
Bismark SAM/BAM format contains
aligner specific tags which are absolutely necessary for methylation
percentage calling using processBismarkAln
.
SAM/BAM files from other aligners will not work with this function.
processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 ) ## S4 method for signature 'character,character,character' processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 ) ## S4 method for signature 'list,list,character' processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 )
processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 ) ## S4 method for signature 'character,character,character' processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 ) ## S4 method for signature 'list,list,character' processBismarkAln( location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment = NULL, save.db = FALSE, verbose = 1 )
location |
location of sam or bam file(s). If multiple files are given this argument must be a list. |
sample.id |
the id(s) of samples in the same order as file. If multiple sam files are given this arugment must be a list. |
assembly |
string that determines the genome assembly. Ex: mm9,hg18 etc. This is just a string for book keeping. It can be any string. Although, when using multiple files from the same assembly, this string should be consistent in each object. |
save.folder |
The folder which will be used to save methylation call files,
if set to NULL no methylation call file will be saved
as a text file.
The files saved can be read into R in less time using
|
save.context |
A character vector consisting following strings: "CpG","CHG","CHH". The methylation percentages for these methylation contexts will be saved to save.folder |
read.context |
One of the 'CpG','CHG','CHH' or 'none' strings. Determines what type of methylation context will be read-in to the memory which can be immediately used for analysis. If given as 'none', processBismarkAln will not return any object, but if a save.folder argument given it will save the methylation percentage call files. |
nolap |
if set to TRUE and the SAM/BAM file has paired-end reads, the one read of the overlapping paired-end read pair will be ignored for methylation calling. |
mincov |
minimum read coverage to call a methylation status for a base. |
minqual |
minimum phred quality score to call a methylation status for a base. |
phred64 |
logical (default: FALSE) you will not need to set this TRUE, Currently bismark gives only phred33 scale |
treatment |
treatment vector only to be used when location and sample.id
parameters are |
save.db |
A Logical to decide whether the resulting object should be saved
as flat file database or not ( default: FALSE). With
the default value, a text file containing methylation values
will be saved.
If TRUE, database will either be saved to location
|
verbose |
logical set verbosity of methCall (default: TRUE) |
methylRaw
or methylRawList
object
SAM files should be sorted with samtools sort or unix sort. Other sorting
methods can alter the order of fields(columns) in the SAM file and that will
result in an error when using processBismarkAln()
.
# reading one bismark file: my.file=system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit") obj=processBismarkAln(my.file,"test",assembly="hg18",save.folder=NULL, save.context="CpG",read.context="CpG") # reading multiple files file.list2=list(system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit")) objs=processBismarkAln(location=file.list2 ,sample.id=list("test1","test2","ctrl1","ctrl1"),assembly="hg18", save.folder=NULL,save.context=NULL,read.context="CpG", nolap=FALSE,mincov=10,minqual=20,phred64=FALSE, treatment=c(1,1,0,0))
# reading one bismark file: my.file=system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit") obj=processBismarkAln(my.file,"test",assembly="hg18",save.folder=NULL, save.context="CpG",read.context="CpG") # reading multiple files file.list2=list(system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit"), system.file("extdata", "test.fastq_bismark.sorted.min.sam", package = "methylKit")) objs=processBismarkAln(location=file.list2 ,sample.id=list("test1","test2","ctrl1","ctrl1"),assembly="hg18", save.folder=NULL,save.context=NULL,read.context="CpG", nolap=FALSE,mincov=10,minqual=20,phred64=FALSE, treatment=c(1,1,0,0))
The function reads the header from a given tabix file and loads it into corresponding methylDB object.
readMethylDB(dbpath)
readMethylDB(dbpath)
dbpath |
path to a tabix file with header |
an methylBaseDB
,methylRawDB
,
methylRawListDB
or an methylDiffDB
object
## Not run: data(methylKit) baseDB.obj <- makeMethylDB(methylBase.obj,"my/path") mydbpath <- getDBPath(baseDB.obj) rm(baseDB.obj) readMethylDB(mydbpath) ## End(Not run)
## Not run: data(methylKit) baseDB.obj <- makeMethylDB(methylBase.obj,"my/path") mydbpath <- getDBPath(baseDB.obj) rm(baseDB.obj) readMethylDB(mydbpath) ## End(Not run)
The function reconstructs a new methylBase object from an input methylBase object and percent methylation matrix. Basically, it uses the read coverages in the input methylBase object and deduces new number of methylated Cs and unmethylated Cs based on the input percent methylation matrix. It is ideally to be used to reconstruct methylBase objects after batch correction on percent methylation values. The percent methylation matrix rows must match methylBase object rows in order ,and in addition column order (the order of samples) in input methylBase must match the order in percent methylation matrix.
reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'ANY,methylBase' reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'ANY,methylBaseDB' reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = TRUE, ...)
reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'ANY,methylBase' reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'ANY,methylBaseDB' reconstruct(methMat, mBase, chunk.size = 1e+06, save.db = TRUE, ...)
methMat |
percent methylation matrix, row order and order of the samples same as the methylBase object |
mBase |
|
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
new methylBase
or methylBase
object where methylation percentage matches
input methMat
and coverages matches input mBase
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing
large-sized objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work
for most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the chunk.size
.
The parameter save.db
is per default TRUE for methylDB
objects as methylBaseDB
,
while being per default FALSE for methylBase
. If you wish
to save the result of an
in-memory-calculation as flat file database or if the size of the
database allows the calculation in-memory,
then you might want to change the value of this parameter.
Batch effect correction (if any batch effect exists) is a tricky issue.
We provide some simple ways to deal with it
(see assocComp
and removeComp
),
But if you can find other ways to correct for batch effects and want
to create
a methylBase object with the corrected percent methylation values,
you can use this function.
Altuna Akalin
data(methylKit) # get percent methylation mat=percMethylation(methylBase.obj) # do some changes in the matrix # this is just a toy example # ideally you want to correct the matrix # for batch effects mat[mat==100]=80 # reconstruct the methylBase from the corrected matrix newobj=reconstruct(mat,methylBase.obj)
data(methylKit) # get percent methylation mat=percMethylation(methylBase.obj) # do some changes in the matrix # this is just a toy example # ideally you want to correct the matrix # for batch effects mat[mat==100]=80 # reconstruct the methylBase from the corrected matrix newobj=reconstruct(mat,methylBase.obj)
Convert methylRaw
, methylRawDB
,
methylRawList
, methylRawListDB
,
methylBase
or methylBaseDB
object into regional
counts for a given GRanges
or GRangesList
object.
regionCounts(object,regions,cov.bases=0,strand.aware=FALSE,chunk.size,save.db,...) ## S4 method for signature 'methylRaw,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRaw,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... )
regionCounts(object,regions,cov.bases=0,strand.aware=FALSE,chunk.size,save.db,...) ## S4 method for signature 'methylRaw,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRaw,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB,GRanges' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB,GRangesList' regionCounts( object, regions, cov.bases = 0, strand.aware = FALSE, chunk.size = 1e+06, save.db = TRUE, ... )
object |
a NOTE: The given regions (Granges/GrangesList object) will be orderd based on chromosome and position before searching for overlaps, so the resulting methylKit object might have a different ording than expected. See details section for the reasoning of this choice and ways to still get custom ordering of regions. |
regions |
a GRanges or GRangesList object. Make sure that the GRanges objects are unique in chr,start,end and strand columns.You can make them unique by using unique() function. |
cov.bases |
number minimum bases covered per region (Default:0). Only regions with base coverage above this threshold are returned. |
strand.aware |
if set to TRUE only CpGs that match the strand of the region will be summarized. (default:FALSE) |
chunk.size |
Number of rows to be taken as a chunk for processing
the |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
a new methylRaw,methylBase or methylRawList object. If strand.aware
is
set to FALSE (default). Even though the resulting object will have
the strand information of regions
it will still contain
methylation information from both strands.
The given regions (Granges/GrangesList object) will be orderd based on chromosome and position before searching for overlaps, so the resulting methylKit object might have a different ording than expected. We are doing this is to ensure that resulting output is consistent for in-memory and database based objects, as database based objects always have to be sorted to enable tabix indexing and providing fast random access.
If you to still want get a custom ordering of the output regions you can
access the single regions in any object providing your indices to the
select
or extract
functions.
The parameter chunk.size
is only used when working with
methylRawDB
, methylBaseDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylRawDB
, methylBaseDB
or methylRawListDB
,
while being per default FALSE for methylRaw
, methylBase
or
methylRawList
. If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might want to change the value of this parameter.
data(methylKit) # get the windows of interest as a GRanges object, this can be any set # of genomic locations library(GenomicRanges) my.win=GRanges(seqnames="chr21", ranges=IRanges(start=seq(from=9764513,by=10000,length.out=20),width=5000) ) # getting counts per region regional.methylRaw=regionCounts(object=methylRawList.obj, regions=my.win, cov.bases=0,strand.aware=FALSE)
data(methylKit) # get the windows of interest as a GRanges object, this can be any set # of genomic locations library(GenomicRanges) my.win=GRanges(seqnames="chr21", ranges=IRanges(start=seq(from=9764513,by=10000,length.out=20),width=5000) ) # getting counts per region regional.methylRaw=regionCounts(object=methylRawList.obj, regions=my.win, cov.bases=0,strand.aware=FALSE)
This function can remove a given principal componet from a given
methylBase object. First, it calculates principal components from
percent methylation matrix and removes the given component(s), reconstructs
the methylation matrix then reconstructs number of methylated and unmethylated Cs per
position based on the reconstructed percent methylation matrix, and finally returns
a new methylBase
object.
removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBase' removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBaseDB' removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = TRUE, ...)
removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBase' removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = FALSE, ...) ## S4 method for signature 'methylBaseDB' removeComp(mBase, comp = NULL, chunk.size = 1e+06, save.db = TRUE, ...)
mBase |
|
comp |
vector of component numbers to be removed |
chunk.size |
Number of rows to be taken as a chunk for processing
the |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
new methylBase
or methylBaseDB
object
The parameter chunk.size
is only used when working with
methylBaseDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for
most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylBaseDB
,
while being per default FALSE for methylBase
. If you wish to save
the result of an
in-memory-calculation as flat file database or if the size of the
database allows the calculation in-memory,
then you might want to change the value of this parameter.
data(methylKit) # remove 1st principal component newObj=removeComp(methylBase.obj,comp=1) # remove 3rd and 4th principal components newObj=removeComp(methylBase.obj,comp=c(3,4))
data(methylKit) # remove 1st principal component newObj=removeComp(methylBase.obj,comp=1) # remove 3rd and 4th principal components newObj=removeComp(methylBase.obj,comp=c(3,4))
The function creates a new methylRawList
, methylRawListDB
,
methylBase
or methylBaseDB
object by selecting a subset of samples from the input object, which is
a methylRawList
or methylBase
object. You can use the function
to partition a large methylRawList or methylBase object
to smaller object based on sample ids or when you want to reorder samples
and/or give a new treatmet vector.
reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = TRUE, ... )
reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylBase' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB' reorganize( methylObj, sample.ids, treatment, chunk.size = 1e+06, save.db = TRUE, ... )
methylObj |
a |
sample.ids |
a vector for sample.ids to be subset. Order is important and the order should be similar to treatment. sample.ids should be a subset or reordered version of sample ids in the input object. |
treatment |
treatment vector, should be same length as sample.ids vector |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
returns a methylRawList
, methylRawListDB
,
methylBase
or methylBaseDB
object depending on the input object
The parameter chunk.size
is only used when working with
methylBaseDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylBaseDB
and methylRawListDB
,
while being per default FALSE for methylBase
and
methylRawList
. If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the
database allows the calculation in-memory,
then you might want to change the value of this parameter.
# this is a list of example files, ships with the package file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) # read the files to a methylRawList object: myobj myobj=methRead( file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",pipeline="amp",treatment=c(1,1,0,0)) meth=unite(myobj,destrand=TRUE) # get samples named "test1" and "ctrl2" from myobj and create a new methylRawList object myobj2=reorganize(myobj,sample.ids=c("test1","ctrl2"),treatment=c(1,0) ) # # get samples named "test1" and "ctrl2" from meth and create a new methylBase object meth2 =reorganize(meth,sample.ids=c("test1","ctrl2"),treatment=c(1,0) )
# this is a list of example files, ships with the package file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) # read the files to a methylRawList object: myobj myobj=methRead( file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",pipeline="amp",treatment=c(1,1,0,0)) meth=unite(myobj,destrand=TRUE) # get samples named "test1" and "ctrl2" from myobj and create a new methylRawList object myobj2=reorganize(myobj,sample.ids=c("test1","ctrl2"),treatment=c(1,0) ) # # get samples named "test1" and "ctrl2" from meth and create a new methylBase object meth2 =reorganize(meth,sample.ids=c("test1","ctrl2"),treatment=c(1,0) )
The function returns a subset of data contained in the methylKit
objects.
select(x,i) ## S4 method for signature 'methylBase' select(x, i) ## S4 method for signature 'methylRaw' select(x, i) ## S4 method for signature 'methylDiff' select(x, i) ## S4 method for signature 'methylRawDB' select(x, i) ## S4 method for signature 'methylBaseDB' select(x, i) ## S4 method for signature 'methylDiffDB' select(x, i)
select(x,i) ## S4 method for signature 'methylBase' select(x, i) ## S4 method for signature 'methylRaw' select(x, i) ## S4 method for signature 'methylDiff' select(x, i) ## S4 method for signature 'methylRawDB' select(x, i) ## S4 method for signature 'methylBaseDB' select(x, i) ## S4 method for signature 'methylDiffDB' select(x, i)
x |
an |
i |
a numeric or logical vector. This vector corresponds to bases or
regions contained in |
a methylBase
,methylRaw
or
methylDiff
object depending on the input object.
data(methylKit) methylRawDB.obj=methRead( system.file("extdata","test1.txt.bgz",package="methylKit"), sample.id="test1", assembly="hg18", dbtype = "tabix",dbdir = "methylDB") methylBaseDB.obj=unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") # selects first hundred rows, returns a methylRaw object subset1=select(methylRawList.obj[[1]],1:100) subset1=select(methylRawDB.obj,1:100) # selects first hundred rows, returns a methylBase object subset2=select(methylBase.obj,1:100) subset2=select(methylBaseDB.obj,1:100) # selects first hundred rows, returns a methylDiff object subset3=select(methylDiff.obj,1:100) # remove Database again rm(methylBaseDB.obj) rm(methylRawDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) methylRawDB.obj=methRead( system.file("extdata","test1.txt.bgz",package="methylKit"), sample.id="test1", assembly="hg18", dbtype = "tabix",dbdir = "methylDB") methylBaseDB.obj=unite(methylRawList.obj,save.db=TRUE,dbdir="methylDB") # selects first hundred rows, returns a methylRaw object subset1=select(methylRawList.obj[[1]],1:100) subset1=select(methylRawDB.obj,1:100) # selects first hundred rows, returns a methylBase object subset2=select(methylBase.obj,1:100) subset2=select(methylBaseDB.obj,1:100) # selects first hundred rows, returns a methylDiff object subset3=select(methylDiff.obj,1:100) # remove Database again rm(methylBaseDB.obj) rm(methylRawDB.obj) unlink("methylDB",recursive=TRUE)
The function selects records from any methylKit
object that lie
inside the regions given by ranges
of class GRanges
and returns
an in-memory equivalent of this object
selectByOverlap(object,ranges) ## S4 method for signature 'methylRaw,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawList,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylBase,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylDiff,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawListDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylBaseDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylDiffDB,GRanges' selectByOverlap(object, ranges)
selectByOverlap(object,ranges) ## S4 method for signature 'methylRaw,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawList,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylBase,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylDiff,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylRawListDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylBaseDB,GRanges' selectByOverlap(object, ranges) ## S4 method for signature 'methylDiffDB,GRanges' selectByOverlap(object, ranges)
object |
an |
ranges |
a GRanges object specifying the regions of interest |
a methylBase
,methylRaw
,
methylRawList
or methylDiff
object
depending on the input object.
Alexander Gosdschan
data(methylKit) file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) methylRawListDB.obj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0), dbtype = "tabix",dbdir = "methylDB") methylBaseDB.obj=unite(methylRawListDB.obj) methylDiffDB.obj = calculateDiffMeth(methylBaseDB.obj) # define the windows of interest as a GRanges object, this can be any set # of genomic locations library(GenomicRanges) my.win=GRanges(seqnames="chr21", ranges=IRanges(start=seq(from=9764513,by=10000,length.out=20),width=5000) ) # selects the records that lie inside the regions myRaw <- selectByOverlap(methylRawListDB.obj[[1]],my.win) # selects the records that lie inside the regions myBase <- selectByOverlap(methylBaseDB.obj,my.win) # selects the records that lie inside the regions myDiff <- selectByOverlap(methylDiffDB.obj,my.win) # selects the records that lie inside the regions myRaw2 <- selectByOverlap(methylRawList.obj[[1]],my.win) # selects the records that lie inside the regions myRawList2 <- selectByOverlap(methylRawList.obj,my.win) # selects the records that lie inside the regions myBase2 <- selectByOverlap(methylBase.obj,my.win) # selects the records that lie inside the regions myDiff2 <- selectByOverlap(methylDiff.obj,my.win) rm(methylRawListDB.obj) rm(methylBaseDB.obj) rm(methylDiffDB.obj) unlink("methylDB",recursive=TRUE)
data(methylKit) file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"), system.file("extdata", "test2.myCpG.txt", package = "methylKit"), system.file("extdata", "control1.myCpG.txt", package = "methylKit"), system.file("extdata", "control2.myCpG.txt", package = "methylKit") ) methylRawListDB.obj=methRead(file.list, sample.id=list("test1","test2","ctrl1","ctrl2"), assembly="hg18",treatment=c(1,1,0,0), dbtype = "tabix",dbdir = "methylDB") methylBaseDB.obj=unite(methylRawListDB.obj) methylDiffDB.obj = calculateDiffMeth(methylBaseDB.obj) # define the windows of interest as a GRanges object, this can be any set # of genomic locations library(GenomicRanges) my.win=GRanges(seqnames="chr21", ranges=IRanges(start=seq(from=9764513,by=10000,length.out=20),width=5000) ) # selects the records that lie inside the regions myRaw <- selectByOverlap(methylRawListDB.obj[[1]],my.win) # selects the records that lie inside the regions myBase <- selectByOverlap(methylBaseDB.obj,my.win) # selects the records that lie inside the regions myDiff <- selectByOverlap(methylDiffDB.obj,my.win) # selects the records that lie inside the regions myRaw2 <- selectByOverlap(methylRawList.obj[[1]],my.win) # selects the records that lie inside the regions myRawList2 <- selectByOverlap(methylRawList.obj,my.win) # selects the records that lie inside the regions myBase2 <- selectByOverlap(methylBase.obj,my.win) # selects the records that lie inside the regions myDiff2 <- selectByOverlap(methylDiff.obj,my.win) rm(methylRawListDB.obj) rm(methylBaseDB.obj) rm(methylDiffDB.obj) unlink("methylDB",recursive=TRUE)
The show method works for methylRaw
,methylRawDB
,
methylRawList
,methylRawListDB
,
methylBase
,methylBaseDB
and methylDiff
objects
## S4 method for signature 'methylBase' show(object) ## S4 method for signature 'methylRaw' show(object) ## S4 method for signature 'methylRawList' show(object) ## S4 method for signature 'methylDiff' show(object) ## S4 method for signature 'methylRawDB' show(object) ## S4 method for signature 'methylRawListDB' show(object) ## S4 method for signature 'methylBaseDB' show(object) ## S4 method for signature 'methylDiffDB' show(object)
## S4 method for signature 'methylBase' show(object) ## S4 method for signature 'methylRaw' show(object) ## S4 method for signature 'methylRawList' show(object) ## S4 method for signature 'methylDiff' show(object) ## S4 method for signature 'methylRawDB' show(object) ## S4 method for signature 'methylRawListDB' show(object) ## S4 method for signature 'methylBaseDB' show(object) ## S4 method for signature 'methylDiffDB' show(object)
object |
any methylKit object |
data(methylKit) methylDiff.obj show(methylDiff.obj)
data(methylKit) methylDiff.obj show(methylDiff.obj)
The function summarizes methylated/unmethylated base counts over tilling windows accross genome. This function can be used when differential methylation analysis is preferable to tilling windows instead of base pairs.
tileMethylCounts(object,win.size=1000,step.size=1000,cov.bases=0,mc.cores=1,save.db,...) ## S4 method for signature 'methylRaw' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylBase' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... )
tileMethylCounts(object,win.size=1000,step.size=1000,cov.bases=0,mc.cores=1,save.db,...) ## S4 method for signature 'methylRaw' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylBase' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... ) ## S4 method for signature 'methylRawListDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... ) ## S4 method for signature 'methylBaseDB' tileMethylCounts( object, win.size = 1000, step.size = 1000, cov.bases = 0, mc.cores = 1, save.db = TRUE, ... )
object |
|
win.size |
an integer for the size of the tiling windows |
step.size |
an integer for the step size of tiling windows |
cov.bases |
minimum number of bases to be covered in a given window |
mc.cores |
number of cores to use when processing |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
methylRaw
,methylBase
or methylRawList
object
The parameter chunk.size
is only used when working with
methylRawDB
, methylBaseDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized objects
which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work for most
systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylRawDB
, methylBaseDB
or methylRawListDB
,
while being per default FALSE for methylRaw
, methylBase
or
methylRawList
. If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might want to change the value of this parameter.
data(methylKit) tiled.methylRaw=tileMethylCounts(object=methylRawList.obj,win.size=1000, step.size=1000,cov.bases=0)
data(methylKit) tiled.methylRaw=tileMethylCounts(object=methylRawList.obj,win.size=1000, step.size=1000,cov.bases=0)
This functions unites methylRawList
and methylRawListDB
objects that only bases with coverage from all samples are retained.
The resulting object is either of class methylBase
or
methylBaseDB
depending on input.
unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = TRUE, ... )
unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawList' unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = FALSE, ... ) ## S4 method for signature 'methylRawListDB' unite( object, destrand = FALSE, min.per.group = NULL, chunk.size = 1e+06, mc.cores = 1, save.db = TRUE, ... )
object |
a methylRawList or methylRawListDB object to be merged by common locations covered by reads |
destrand |
if TRUE, reads covering both strands of a CpG dinucleotide will be merged, do not set to TRUE if not only interested in CpGs (default: FALSE). If the methylRawList object contains regions rather than bases setting destrand to TRUE will have no effect. |
min.per.group |
an integer denoting minimum number of samples per replicate needed to cover a region/base. By default only regions/bases that are covered in all samples are united as methylBase object, however by supplying an integer for this argument users can control how many samples needed to cover region/base to be united as methylBase object. For example, if min.per.group set to 2 and there are 3 replicates per condition, the bases/regions that are covered in at least 2 replicates will be united and missing data for uncovered bases/regions will appear as NAs. |
chunk.size |
Number of rows to be taken as a chunk for processing the
|
mc.cores |
number of cores to use when processing |
save.db |
A Logical to decide whether the resulting object should be saved as flat file database or not, default: explained in Details sections |
... |
optional Arguments used when save.db is TRUE
|
a methylBase or methylBaseDB object depending on input
The parameter chunk.size
is only used when working with
methylRawDB
or methylRawListDB
objects,
as they are read in chunk by chunk to enable processing large-sized
objects which are stored as flat file database.
Per default the chunk.size is set to 1M rows, which should work
for most systems. If you encounter memory problems or
have a high amount of memory available feel free to adjust the
chunk.size
.
The parameter save.db
is per default TRUE for methylDB objects as
methylRawListDB
,
while being per default FALSE for methylRawList
.
If you wish to save the result of an
in-memory-calculation as flat file database or if the size of the database
allows the calculation in-memory,
then you might change the value of this parameter.
data(methylKit) ## Following my.methylBase=unite(methylRawList.obj) my.methylBase=unite(methylRawList.obj,destrand=TRUE)
data(methylKit) ## Following my.methylBase=unite(methylRawList.obj) my.methylBase=unite(methylRawList.obj,destrand=TRUE)
update methylKit objects
The method updates object from earlier versions (<v0.9.1) to latest object.
updateMethObject(object)
updateMethObject(object)
object |
a methylKit object: methylRaw, methylRawList, methylBase or methylDiff |
methylRaw
,methylDiff
,methylBase
or methylRawList
object
@export @docType methods @rdname updateMethObject