Title: | Base resolution DNA methylation data analysis |
---|---|
Description: | Memory efficient analysis of base resolution DNA methylation data in both the CpG and non-CpG sequence context. Integration of DNA methylation data derived from any methodology providing base- or low-resolution data. |
Authors: | Mattia Pelizzola [aut], Kamal Kishore [aut], Mattia Furlan [ctb, cre] |
Maintainer: | Mattia Furlan <[email protected]> |
License: | GPL(>=2) |
Version: | 1.41.0 |
Built: | 2024-11-29 07:45:52 UTC |
Source: | https://github.com/bioc/methylPipe |
Analysis of base-pair resolution DNA methylation data.
Package: | methylPipe |
Type: | Package |
Version: | 1.0.5 |
Date: | 2015-02-25 |
License: | GPL |
Depends: | methods |
The package offers the following functionalities:
BSdata-class : This class is used in to point to a TABIX compressed file containing base-resolution DNA-methylation data and reference genome sequence
mCsmoothing : Smoothing and plotting methylation data, even chromosome wide
findPMDs : Find partially methylated regions for a given sample
mapBSdata2GRanges : Retrieve mC calls for a GRanges from a BSdata object for a sample
BSdataSet-class : This class is a set of BSdata objects
findDMR : Identifying differentially methylated regions for pairwise or multiple samples comparision
methstats : Descriptive methylation statistics of samples within BSdataSet object
consolidateDMRs : Joins differentially methylated regions according to their proximity to each other, statistical significance, methylation difference and DMR type
GEcollection-class : This class is used to define and manipulate a set of genomic regions and the associated DNA methylation patterns
getCpos : Get genomic Cxx positons for a series of genomic regions
getCposDensity : Determines the density of genomic Cxx positions for a series of genomic regions
profileDNAmetBin : Profile DNA methylation data for a set of genomic regions
plotMeth : Plot the methylation/omics data of a GEcollection object
BSprepare : Preparing tabular data to be used to feed a BSdata object
meth.call : Reads the methylation information from the sorted SAM files generated from BISMARK aligner
pool.reads : Combine reads of replicates within a group
GElist-class : This class is a set of GEcollection objects
Computational Epigenomics Unit at the Center for Genomic Sciences of IIT@SEMM, Milan, Italy
http://genomics.iit.it/groups/computational-epigenomics.html
This class is used in the methylPipe library to point to a TABIX compressed file containing base-resolution DNA-methylation data and reference genome sequence
Objects can be created by calls of the form new("BSdata", ...)
or using the function BSdata(file, uncov, org) whose arguments are
described in the next section Slots.
file
:Object of class "character"
: the TABIX
compressed file containing base-resolution DNA-methylation data.
This file can be generated through the BSprepare
function, and it must contain the following columns:
chromosome assignment (in the form chr1, chr2...),
genomic position (positive integer),
strand (either - or +),
methylation sequence context (either CG, CHG or CHH),
number (>0) of sequencing reads with C calls at that genomic position,
number of sequencing reads with T calls at that genomic position,
binomial pvalue (-10*log10(pvalue)) for calling a mC at that position.
uncov
:Object of class GRanges : this
GRanges object consists of the list of genomic regions with
sequencing coverage information; this information is used to distinguish
which methylation sites are unmethylated, but covered, from those
that are missing data since they have no sequencing coverage. This
object is automatically generated by the meth.call
function while processing aligned files generated from the aligner.
org
:refrence genome of class BSgenome
Mattia Pelizzola
require(BSgenome.Hsapiens.UCSC.hg18) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens)
require(BSgenome.Hsapiens.UCSC.hg18) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens)
This class is used in the methylPipe library to store a set of BSdata objects
Objects can be created by calls of the form new("BSdataSet",
...)
or using the function BSdataSet(org,Objlist,names), see below.
Objlist
:Object of class "list"
: a list with
more than one item, where each item is a BSdata object
names
:Object of class "character"
: vector of
the names of the objects, i.e. the sample names
group
:Object of class "character"
: indicating
conditions and replicates; replicated samples within the same condition
have to be assigned the same group name; if object has only two groups,
"C"(control) and "E" (Experiment) should be specified as groups name
org
:refrence genome of class BSgenome
signature(x = "BSdataSet")
: subsets the
BSdataSet returning a specific BSdata object
signature(x = "BSdataSet")
: replaces the
specific BSdata object in the BSdataSet
signature(x = "BSdataSet")
: subsets the
BSdataSet returning another BSdataSet
Mattia Pelizzola, Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","E"), IMR90=IMR90.db, H1=H1.db)
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","E"), IMR90=IMR90.db, H1=H1.db)
Appending p-values and TABIX compressing tabular data containing DNA-methylation data so that they can be used to create a BSdata object.
BSprepare(files_location, output_folder, tabixPath, bc=1.5/100)
BSprepare(files_location, output_folder, tabixPath, bc=1.5/100)
files_location |
character; the path to the files |
output_folder |
character; the path to the output files |
tabixPath |
character; the path to the Tabix application folder |
bc |
numeric; combined bisulfite conversion and sequencing error rate |
This function can be used to convert tabular files containing DNA-methylation base-resolution data so that they can be used to create a BSdata object. Genomic coordinates in the 1-base system are required, i.e. the first base of each chromosome should be at position 1. Files have to be tab-delimited and they have to contain the following columns: chromosome assignment (in the form chr1, .., chr22, chrX, chrY, chrM, chrC), genomic position (positive integer), strand (either - or +), methylation sequence context (either CG, CHG or CHH), number of sequencing reads with C calls (>0) at that genomic position, number of sequencing reads with T calls at that genomic position. Each file has to be sorted by genomic coordinate.
Binomial p-values are added and a compressed file is created together with the Tabix index. P-values indicate for each Cytosine the probability of observing by chance the occurrence of unmethylated reads. The lower the p-value the higher the confidence in calling that Cytosine methylated.
Mattia Pelizzola, Kamal Kishore
#BSprepare("/path-to-input/","/path-to-output/", tabix="/path-to-tabix/tabix-0.2.6")
#BSprepare("/path-to-input/","/path-to-output/", tabix="/path-to-tabix/tabix-0.2.6")
Fisher's method implementation, used to combine the results from several independent tests bearing upon the same overall hypothesis.
chiCombP(Pvalues)
chiCombP(Pvalues)
Pvalues |
an array of pvalues |
Pvalues will not be corrected for multiple testing. The sum of the log of the pvalues is determined (S). -2*S has a chi-square distribution with 2k degrees of freedom, where k is the number of tests being combined. A chi-square test is then performed.
The chi-square final pvalue
Mattia Pelizzola
chiCombP(c(1e-3,1e-5,1e-2))
chiCombP(c(1e-3,1e-5,1e-2))
Joins differentially methylated regions according to their proximity to each other, statistical significance and methylation difference
consolidateDMRs(DmrGR, pvThr=0.05, MethDiff_Thr=NULL, log2Er_Thr =NULL, GAP=0, type=NULL, correct=FALSE)
consolidateDMRs(DmrGR, pvThr=0.05, MethDiff_Thr=NULL, log2Er_Thr =NULL, GAP=0, type=NULL, correct=FALSE)
DmrGR |
the GRanges object resulting from the
|
pvThr |
numeric; the minimum pvalue for a DMR to be selected |
MethDiff_Thr |
numeric; the absolute value of minimum methylation difference percentage (for both hyper- and hypo-methyaltion) cutoff for the selection of a DMR |
log2Er_Thr |
numeric; the absolute value of minimum log2Enrichment (for both hyper- and hypo-methyaltion) cutoff for the selection of a DMR |
GAP |
numeric; the minimum distance between two adjacent DMRs; DMRs closer than that will be joined, resulting DMRs will be updated mean methylation difference and Pvalues combined using the Fisher's Method |
type |
character; one of the "hyper" or "hypo", specifies the type of differentially menthylated regions selected |
correct |
logical; whether to correct the pvalues using the Benjamini-Hochberg muliple testing correction method |
After the DMRs are identified by findDMR method, a consolidation can be applied on them. This functions allows to select hyper/hypo differentially methylated regions based on P-value and absolute methylation change thresholds. Moreover, DMRs closer than a given distance can be joined. The final GRanges object with the set of final DMRs will be provided with updated mean methylation difference and Pvalues combined using the Fisher's Method.
Either NULL or a GRanges object containing the differential methylated regions satisfying the criteria.
Kamal Kishore
DMRs_file <- system.file('extdata', 'DMRs.Rdata', package='methylPipe') load(DMRs_file) hyper.DMRs.conso <- consolidateDMRs(DmrGR=DMRs, pvThr=0.05, GAP=100, type="hyper", correct=TRUE) hyper.DMRs.conso
DMRs_file <- system.file('extdata', 'DMRs.Rdata', package='methylPipe') load(DMRs_file) hyper.DMRs.conso <- consolidateDMRs(DmrGR=DMRs, pvThr=0.05, GAP=100, type="hyper", correct=TRUE) hyper.DMRs.conso
For genomic ranges with N bins, extract the Genomic ranges for a given bin.
extractBinGRanges(GenoRanges, bin, nbins)
extractBinGRanges(GenoRanges, bin, nbins)
GenoRanges |
An object of class GRanges |
bin |
numeric; the bin corresponding to which data has to be extracted |
nbins |
numeric; the number of bins in which genomic regions are divided |
A GRanges Object
Mattia Pelizzola
gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) extractBinGRanges(GR_chr20, 2, 5)
gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) extractBinGRanges(GR_chr20, 2, 5)
Identifying differentially methylated regions for pairwise or multiple samples comparision.
## S4 method for signature 'methylPipe,BSdataSet' findDMR(object, Nproc=NULL, ROI=NULL, pmdGRanges=NULL, MCClass='mCG', dmrSize=10, dmrBp=1000, binsize=0, eprop=0.3, coverage=1, Pvalue=NULL, SNPs=NULL)
## S4 method for signature 'methylPipe,BSdataSet' findDMR(object, Nproc=NULL, ROI=NULL, pmdGRanges=NULL, MCClass='mCG', dmrSize=10, dmrBp=1000, binsize=0, eprop=0.3, coverage=1, Pvalue=NULL, SNPs=NULL)
object |
An object of class BSdataSet |
Nproc |
numeric; the number of processors to use, one chromosome is ran for each processor |
ROI |
character; either NULL or an object of class GRanges consisting of genomic regions of interest for which DMRs are identified |
pmdGRanges |
a GRanges object containing the genomic coordinates of Partially Methylated Domains that will be masked |
MCClass |
character; the mC sequence context to be considered, one of all, mCG, mCHG or mCHH |
dmrSize |
numeric; the number of consecutive mC to be simulataneously considered; atleast 5 |
dmrBp |
numeric; the max number of bp containing the dmrSize mC |
binsize |
numeric; the size of the bin used for smoothing the methylation levels, useful for nonCG methylation in human |
eprop |
numeric; the max - min methylation level is determined for each mC, or for each bin, and only mC (or bins) with difference greater than eprop are considered |
coverage |
numeric; the minimum number of total reads at a given cytosine genomic position |
Pvalue |
numeric; to select only those mC with significant p-value |
SNPs |
GRanges; if SNPs information is provided those cytosine are removed from DMR computation |
Typically for nonCG methylation in human a dmrSize of 50, a
dmrBp of 50000 and a binsize of 1000 are used. For CpG methylation in
human and both CpG and nonCG methylation in plants the default
settings are usually fine. Partially Methylated Domains are usually
found in differentiated cells and can constitute up to one third of
the genome (Lister R et al, Nature 2009). Usually DMRs are not
selected within those regions especially when comparing differentiated
and pluripotent cells. Eprop is used to speed up the
analysis. According to the number of samples different test are used
to compare the methylation levels (percentage of methylated reads for
each mC). In case of two samples the non parametric wilcoxon test is
used. In case of more than two samples the kruskal wallis non
parametric testis used. Check consolidateDMRs
to further
process and finalize the list of DMRs.
A GRanges object of DMRs with the metadata slots for pValue, MethDiff_Perc and log2Enrichment. When two samples are compared, MethDiff_Perc is the diference between percentage methylation between the conditions compared. However, log2Enrichment is the log2ratio between the mean for the samples.
Mattia Pelizzola, Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","E"), IMR90=IMR90.db, H1=H1.db) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) DMRs <- findDMR(object= H1.IMR90.set, Nproc=1, ROI=GR_chr20, MCClass='mCG', dmrSize=10, dmrBp=1000, eprop=0.3) head(DMRs)
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","E"), IMR90=IMR90.db, H1=H1.db) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) DMRs <- findDMR(object= H1.IMR90.set, Nproc=1, ROI=GR_chr20, MCClass='mCG', dmrSize=10, dmrBp=1000, eprop=0.3) head(DMRs)
This function is a wrapper function to identify partially methylated domains (PMDs) in Bis-seq data.
## S4 method for signature 'methylPipe,BSdata' findPMDs(Object, Nproc=1, Chrs=NULL)
## S4 method for signature 'methylPipe,BSdata' findPMDs(Object, Nproc=1, Chrs=NULL)
Object |
An object of class BSdataSet |
Nproc |
numeric; the number of processors to use, one chromosome is ran for each processor |
Chrs |
character; Chromosome on which PMDs are identified |
This functions is a wrapper function of segmentPMDs method of package MethylSeekR. This function trains a Hidden Markov Model (HMM) to detect partially methylated domains (PMDs) in Bis-seq data.
A GRangesList object containing segments that partition the genome into PMDs and regions outside of PMDs. The object contains two metadata columns indicating the type of region (PMD/notPMD) and the number of covered (by at least 5 reads) CpGs (nCG) in the region.
Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) PMDs <- findPMDs(H1.db, Nproc=1, Chrs="chr20")
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) PMDs <- findPMDs(H1.db, Nproc=1, Chrs="chr20")
This class is used in the methylPipe library to define and manipulate a set of genomic regions and the associated DNA methylation patterns
This class is an extension of the RangedSummarizedExperiment class
from the SummarizedExperiment package. Objects can be created using
the function profileDNAmetBin
which determines the absolute and
relative methylation level by filling the binC, binmC and binrC slots. The
assays slot of the RangedSummarizedExperiment class here consists of
four matrices:
binC: each genomic region is divided in one or more bins and for each bin the density (per bp) of potential methylation sites is determined.
binmC: each genomic region is divided in one or more bins and for each bin the density (per bp) of methylation events is determined.
binrC: each genomic region is divided in one or more bins and for each bin the relative mC/C content is determined.
binscore: each genomic region is divided in one or more bins and scores can be assigned to them. In particular, it can be convenient for storing reads count for each bin of each genomic region.
The minimal set of data to create a GEcollection object is a set of genomic regions to be provided as a GRanges object and a dataset of class BSdata.
signature(object = "GEcollection")
: extracts the
chr assignment of the genomic regions
signature(object = "GEcollection")
: extracts
the strand assignment of the genomic regions
signature(object = "GEcollection")
: extracts the
binC matrix
signature(object = "GEcollection")
: extracts the
binmC matrix
signature(object = "GEcollection")
: extracts the
binrC matrix
signature(object = "GEcollection")
: extracts
the binscore matrix
signature(object = "GEcollection")
:
replaces the binscore matrix
signature(object = "GEcollection")
: returns the
number of bins
Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) gec.H1 <- profileDNAmetBin(GenoRanges=GR_chr20, Sample=H1.db, mcCLASS='mCG') gec.H1
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) gec.H1 <- profileDNAmetBin(GenoRanges=GR_chr20, Sample=H1.db, mcCLASS='mCG') gec.H1
This class is used in the methylPipe library to collect a set of GEcollection objects
Objects can be created by calls of the form new("GElist", ...)
or using the function GElist(Objlist,names), see below.
GElist are a collection of GEcollection objects (see
GElist-class
).
Objlist
:Object of class "list"
: a list where
each item is a GEcollection object
names
:Object of class "character"
: vector of
the names of the objects
signature(x = "GElist")
: subsets the
GElist returning a specific GEcollection object
signature(x = "GElist")
: replaces the specific
GEcollection object in the GElist
signature(x = "GElist")
: subsets the GElist
returning another GElist
Mattia Pelizzola
gecollect_file <- system.file('extdata', 'gec.H1.Rdata', package='methylPipe') load(gecollect_file) gec1 <- gec.H1[start(gec.H1) < 153924] gec2 <- gec.H1[start(gec.H1) > 153924] gel.set <- GElist(g1=gec1, g2=gec2)
gecollect_file <- system.file('extdata', 'gec.H1.Rdata', package='methylPipe') load(gecollect_file) gec1 <- gec.H1[start(gec.H1) < 153924] gec2 <- gec.H1[start(gec.H1) > 153924] gel.set <- GElist(g1=gec1, g2=gec2)
getCpos retrieves genomic Cxx positions, possible target of DNA methylation for a series of genomic regions (and bins thereof) and a given organism. getCposChr is a Helper function which performs the same task for any given DNAString sequence and is not intended for the user to call directly.
getCpos(GenoRanges, seqContext='all', nbins, org) getCposChr(GenoRanges, seqContext, chrseq, nbins)
getCpos(GenoRanges, seqContext='all', nbins, org) getCposChr(GenoRanges, seqContext, chrseq, nbins)
GenoRanges |
An object of class GRanges |
seqContext |
character; one of: all, CG, CHG or CHH |
org |
an object of class BSgenome; typically the genome sequences of a given organism |
chrseq |
an object of class DNAString; typically a chromosome sequence of a given organism |
nbins |
numeric; the number of bins each region of genomic regions is divided |
A list is returned with the position of the Cxx in the GRanges regions. The length of the list is equal to the length of the GRanges. For each list item a list with length equal to the number of bins of the GRanges is returned. For each bin the position of the Cxx relative to the genomic coordinates of that bin is returned.
Mattia Pelizzola
getCposDensity
, profileDNAmetBin
require(BSgenome.Hsapiens.UCSC.hg18) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) res <- getCpos(GR_chr20, seqContext='CG', nbins=1, org=Hsapiens) res[[1]]
require(BSgenome.Hsapiens.UCSC.hg18) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) res <- getCpos(GR_chr20, seqContext='CG', nbins=1, org=Hsapiens) res[[1]]
After having used getCpos (or getCposChr), getCposDensity determines the density of Cxx sites for each bin of each genomic region.
getCposDensity(GenoRanges, Cpos, nbins)
getCposDensity(GenoRanges, Cpos, nbins)
GenoRanges |
an object of class GRanges used to generate the Cpos list |
Cpos |
list returned by getCpos or getCposChr methods |
nbins |
numeric; the number of bins each region of genomic regions is divided |
Returns a list with the number of Cxx sites per bp of bin size for each region of the GRanges.
Mattia Pelizzola
require(BSgenome.Hsapiens.UCSC.hg18) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) resC <- getCposChr(GenoRanges=GR_chr20, seqContext='CG', chrseq=unmasked(Hsapiens[['chr20']]), nbins=3) resd <- getCposDensity(GenoRanges=GR_chr20, Cpos= resC, nbins=3)
require(BSgenome.Hsapiens.UCSC.hg18) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) resC <- getCposChr(GenoRanges=GR_chr20, seqContext='CG', chrseq=unmasked(Hsapiens[['chr20']]), nbins=3) resd <- getCposDensity(GenoRanges=GR_chr20, Cpos= resC, nbins=3)
mapBSdata2GRanges retrieves mC calls for a GRanges given a BSdata object for a sample. mapBSdata2GRangesBin does the same for each bin of each genomic region.
mapBSdata2GRanges(GenoRanges, Sample, context='all', mC=1, depth=0, pValue=1) mapBSdata2GRangesBin(GenoRanges, Sample, context='all', mC=1, depth=0, pValue=1, nbins)
mapBSdata2GRanges(GenoRanges, Sample, context='all', mC=1, depth=0, pValue=1) mapBSdata2GRangesBin(GenoRanges, Sample, context='all', mC=1, depth=0, pValue=1, nbins)
GenoRanges |
An object of class GRanges |
Sample |
An object of class BSdata |
context |
character; one of: all, CG, CHG or CHH |
mC |
numeric; the minumum number of reads with C (DNA-methylation events) at a given cytosine genomic position |
depth |
numeric; the minimum number of total reads at a given cytosine genomic position |
pValue |
numeric; the minimum binomial pValue for an mC call at a given cytosine genomic position |
nbins |
numeric; the number of bins in which Genomic regions are divided |
DNA-methylation data contained for a sample within a BSdata object is extracted for the set of genomic regions of a GRanges (and in particular for each bin using the mapBSdata2GRangesBin method). It is also possible to restrict the mC sequence context, to specify the minimal number of reads with C events at a given cytosine genomic position, to specify the minimum depth of sequencing and binomial pValue for the mC calls. A region with no mC will be defined unmethylated (0 is returned for that region). However, if it is overlapping with at least one uncovered region then it is defined non evaluable (NA is retuned).
A list is returned. The length of the list is equal to the length of the GRanges. For each list item either NA, 0 or a data frame are returned. 0 means that the region contains unmethylated DNA methylation sites, whereas NA indicates that the region or some part of region was not covered by the sequencing. If a data frame is returned, it has the following columns: chromosome assignment (in the form chr1, .., chr22, chrX, chrY, chrM, chrC), genomic position (positive integer), strand (either - or +), methylation sequence context (either CG, CHG or CHH), number (>0) of sequencing reads with C calls at that genomic position, number of sequencing reads with T calls at that genomic position, binomial pvalue (-10*log10(pvalue)) for calling a mC at that position.
Mattia Pelizzola, Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) res <- mapBSdata2GRanges(GenoRanges=GR_chr20, Sample=H1.db, context='CG', mC=1, depth=0, pValue=1) resbin <- mapBSdata2GRangesBin(GenoRanges=GR_chr20, Sample=H1.db, context='CG', mC=1, depth=0, pValue=1, nbins=2)
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) res <- mapBSdata2GRanges(GenoRanges=GR_chr20, Sample=H1.db, context='CG', mC=1, depth=0, pValue=1) resbin <- mapBSdata2GRangesBin(GenoRanges=GR_chr20, Sample=H1.db, context='CG', mC=1, depth=0, pValue=1, nbins=2)
Smoothing and plotting methylation data, even chromosome wide.
## S4 method for signature 'methylPipe,BSdata' mCsmoothing(Object, refgr, Scorefun='sum', Nbins=20, Context="CG", plot=TRUE)
## S4 method for signature 'methylPipe,BSdata' mCsmoothing(Object, refgr, Scorefun='sum', Nbins=20, Context="CG", plot=TRUE)
Object |
An object of class BSdata |
refgr |
GRanges; Genomic Ranges to plot the data |
Scorefun |
character; either sum or mean for smoothing |
Nbins |
numeric; the number of interval each range is divided |
Context |
character; either all or a combination of CG, CHG, and CHH |
plot |
logical; whether the smoothed profile has to be plotted |
The sum or the mean methylation level is determined on each window of size Binsize and smoothed with the smooth.spline function.
A list with three components: pos (the left most point of each window), score (either the sum or the mean methylation levels), smoothed (the smoothed methylation levels).
Mattia Pelizzola
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr <- GRanges("chr20",IRanges(1,5e5)) sres <- mCsmoothing(H1.db, gr, Scorefun='sum', Nbins=50, Context="CG", plot=TRUE)
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) gr <- GRanges("chr20",IRanges(1,5e5)) sres <- mCsmoothing(H1.db, gr, Scorefun='sum', Nbins=50, Context="CG", plot=TRUE)
Reads the methylation calls from sorted SAM files generated from Bismark aligner.
meth.call(files_location, output_folder, no_overlap, read.context, Nproc)
meth.call(files_location, output_folder, no_overlap, read.context, Nproc)
files_location |
character; the path(s) to the folder location consisting of sorted SAM files |
output_folder |
character; the path(s) to the folder location where the output files are written |
no_overlap |
character; if set to TRUE and the SAM file has paired-end reads, then one read of the overlapping paired-end read pair will be ignored for methylation calling |
read.context |
character; One of the 'CpG' or 'All'. Determines what type of methylation context will be read. If given as 'all', cytosine methylation information in all sequence context will be read. |
Nproc |
numeric; the number of processors to use, one sample is processed by each processor. |
The function reads methylation calls from the sorted SAM file
so that they can be used to create a BSdata object. SAM files
must be sorted based on chr and start of reads. The user can specify
the sequence context in which the methylation information is read from
these files either "CpG" or "All". If "All" is specified, cytosine
methylation in all context (CG, CHG or CHH) will be read. The
methylation calls is saved as a text file in the output folder.
These text files are tab-delimited and contain the following columns:
chromosome assignment (in the form chr1, chr2..),
genomic position (positive integer),
strand (either - or +),
methylation sequence context (either CG, CHG or CHH),
number (>0) of sequencing reads with C calls at that genomic position,
number of sequencing reads with T calls at that genomic position.
In addition a GRanges object consisting of uncovered genomic regions is
generated and saved in the output folder for each sample. This information is
used to distinguish unmethylated cytosines from those that are not covered by
sequencing. This GRanges object is used further to provide uncovered regions
information while creating BSdata object by BSdata
method.
A text file of methylation calls and a GRanges object consisting of uncovered genomic regions for each sample are generated in the "output_folder" folder. The files are prefixed with sample name.
Kamal Kishore
file_loc <- system.file('extdata', 'test_methcall', package='methylPipe') meth.call(files_location=file_loc, output_folder=tempdir(), no_overlap=TRUE, read.context="CpG", Nproc=1)
file_loc <- system.file('extdata', 'test_methcall', package='methylPipe') meth.call(files_location=file_loc, output_folder=tempdir(), no_overlap=TRUE, read.context="CpG", Nproc=1)
Exploratory methylation statistics of samples in BSdataSet object.
## S4 method for signature 'methylPipe,BSdataSet' methstats(object, chrom, mcClass='mCG', minC=1, coverage=1, pval=1, Nproc=1)
## S4 method for signature 'methylPipe,BSdataSet' methstats(object, chrom, mcClass='mCG', minC=1, coverage=1, pval=1, Nproc=1)
object |
An object of class BSdataSet |
chrom |
character; either NULL or an object of class character |
mcClass |
character; the mC sequence context to be considered, one of all, mCG, mCHG or mCHH |
minC |
numeric; the minumum number of reads with C (DNA-methylation events) at a given cytosine genomic position |
coverage |
numeric; the minimum number of total reads at a given cytosine genomic position |
pval |
numeric; the minimum binomial pValue for an mC call at a given cytosine genomic position |
Nproc |
numeric; the number of processors to use, one chromosome is ran for each processor |
The function provides basic statistical methods which computes descriptive statistics, correlation matrix and clustering of samples within the BSdataSet.
A list with slots named descriptive_stats and correlation_mat.
Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges('chr20', IRanges(14350, 18349)) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","C","E","E"), IMR_1=IMR90.db, IMR_2=IMR90.db, H1_1=H1.db,H1_2=H1.db) stats_res <- methstats(H1.IMR90.set,chrom="chr20",mcClass='mCG', Nproc=1) stats_res
require(BSgenome.Hsapiens.UCSC.hg18) uncov_GR <- GRanges('chr20', IRanges(14350, 18349)) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- BSdataSet(org=Hsapiens, group=c("C","C","E","E"), IMR_1=IMR90.db, IMR_2=IMR90.db, H1_1=H1.db,H1_2=H1.db) stats_res <- methstats(H1.IMR90.set,chrom="chr20",mcClass='mCG', Nproc=1) stats_res
Plot DNA methylation data (either high- or low-resolution) together with other omics data (ChIP-seq, RNA-seq), or annotation tracks for one genomic region (genome-browser like view based on gviz).
plotMeth(grl=NULL, colors=NULL, datatype=NULL, yLim, brmeth=NULL, mcContext="CG", annodata=NULL, transcriptDB, chr, start, end, org)
plotMeth(grl=NULL, colors=NULL, datatype=NULL, yLim, brmeth=NULL, mcContext="CG", annodata=NULL, transcriptDB, chr, start, end, org)
grl |
An object of class GElist or a potentially mixed list of GRanges or GEcollection objects |
colors |
character of length equal to grl; name of colors to display data tracks from the grl object |
datatype |
character of length equal to grl; one of C, mC , rC, density or cols |
yLim |
numeric vector with the same length of grl setting maximum values |
brmeth |
A list of object of class BSdata |
mcContext |
character; one of all, CG, CHG or CHH |
annodata |
An object of class GRangesList |
transcriptDB |
An object of class TranscriptDb |
chr |
character; chromosome name |
start |
numeric; chromosome start |
end |
numeric; chromosome end |
org |
BSgenome; an object of class BSgenome |
This function can be used to display for one genomic region (genome-browser like) DNA methylation data together with other omics data or static annotation info. The genomic region is indicated by chr, start and end. Specifically, grl is used to display binned high- or low-resolution data, while brmeth is used to point to (unbinned) base-resolution data. Each component of grl can either be a GEcollection or a GRanges.
In case of GEcollection, binC, binmC or binrC components will be extracted as indicated in datatype (setting C, mC or rC, respectively), and if more than 1 bin is present the average value will be considered for each range. Datatype can be set to density to extract the binscore component of the GEcollection, which can be used to store low-resolution or other omics data attacched to a base-resolution dataset.
In case of a GRanges (suitable for low-resolution or other omics data independently from base-resolution data), only the 1st column of the mcols will be considered. Eventually, for both GEcolleciton and GRanges tracks, a bar with the specific value will be displayed for the ranges occurring in the considered region (if any).
Regarding unbinned base-resolution data, mcContext defines the sequence context to be considered for the methyl-cytosines for each component of the brmeth object; a bar with height equal to the methylation level of each cytosine will be displayed for each sample (track).
Annodata is an optional GRangesList that can be used to display co-occurring annotation data, such as CpG islands (presence or absence of the regions only). transcriptDB and BSgenome are used to overlay the structure of annotated genes and chromosome ideogram, respectively.
Kamal Kishore
require(TxDb.Hsapiens.UCSC.hg18.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene require(BSgenome.Hsapiens.UCSC.hg18) gecH1_file <- system.file('extdata', 'gec.H1.Rdata', package='methylPipe') load(gecH1_file) gecIMR_file <- system.file('extdata', 'gec.IMR90.Rdata', package='methylPipe') load(gecIMR_file) gel <- GElist(gecH1=gec.H1, gecIMR90=gec.IMR90) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- list(H1=H1.db, IMR90=IMR90.db) plotMeth(gel, colors=c("red","blue"), datatype=c("mC","mC"), yLim=c(.025, .025), brmeth=H1.IMR90.set, mcContext="CG", transcriptDB=txdb, chr="chr20", start=14350 , end=474481, org=Hsapiens)
require(TxDb.Hsapiens.UCSC.hg18.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene require(BSgenome.Hsapiens.UCSC.hg18) gecH1_file <- system.file('extdata', 'gec.H1.Rdata', package='methylPipe') load(gecH1_file) gecIMR_file <- system.file('extdata', 'gec.IMR90.Rdata', package='methylPipe') load(gecIMR_file) gel <- GElist(gecH1=gec.H1, gecIMR90=gec.IMR90) uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') H1.db <- BSdata(file=H1data, uncov=uncov_GR, org=Hsapiens) IMR90data <- system.file('extdata', 'IMR90_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') IMR90.db <- BSdata(file=IMR90data, uncov=uncov_GR, org=Hsapiens) H1.IMR90.set <- list(H1=H1.db, IMR90=IMR90.db) plotMeth(gel, colors=c("red","blue"), datatype=c("mC","mC"), yLim=c(.025, .025), brmeth=H1.IMR90.set, mcContext="CG", transcriptDB=txdb, chr="chr20", start=14350 , end=474481, org=Hsapiens)
Combine reads of replicates within a group.
pool.reads(files_location)
pool.reads(files_location)
files_location |
character; the path to the folder location consisting of tab separated text files |
The function reads tab separated text files of methylation calls generated from meth.call or user supplied according to the format specified for BSprepare method. It pools all the reads of the replicates within a single group for each cytosine position and creates a file consisting of the cytosines with pooled reads information.
A text file of methylation calls are generated in the "files_location" folder.
Kamal Kishore
#pool.reads(files_location)
#pool.reads(files_location)
Processing MLML software output to generate files with hmC and CpG methylation information.
process.hmc(file, output_folder, Coverage)
process.hmc(file, output_folder, Coverage)
file |
character; the path to the file |
output_folder |
character; the path to the output files |
Coverage |
GRanges; the object containing coverage for each cytosine |
This function allows processing of the output files from MLML software (Qu et al, Bioinformatics 2013). MLML read counts from BS-seq, oxBS-seq and TAB-seq to provide simultaneous estimates of 5hmC and 5mC levels. The input for this function is output file from this software alongwith a GRanges object consisting of coverage of each cytosine. The GRanges object should contain "coverage" column. This object can be generated using the coverage method of R package GenomicRanges.
The function will return two files one each for "CpG" and "hmC" for the given sample which can directly be used for BSdata object creation.
Kamal Kishore
#process.hmc(file,"/path-to-output/", Coverage)
#process.hmc(file,"/path-to-output/", Coverage)
Profile the absolute and relative density of mC sites for each bin of each genomic region of a GEcollection object.
profileDNAmetBin(GenoRanges, Sample, mcCLASS="mCG", mC=1, depthThr=0, mCpv=1, minCoverage=0.75, nbins = 2) profileDNAmetBinParallel(GenoRanges, Sample, mcCLASS="mCG", mC=1, depthThr=0, mCpv=1, minCoverage=0.75, Nproc=1, nbins = 2)
profileDNAmetBin(GenoRanges, Sample, mcCLASS="mCG", mC=1, depthThr=0, mCpv=1, minCoverage=0.75, nbins = 2) profileDNAmetBinParallel(GenoRanges, Sample, mcCLASS="mCG", mC=1, depthThr=0, mCpv=1, minCoverage=0.75, Nproc=1, nbins = 2)
GenoRanges |
an object of class GRanges |
Sample |
an object of class BSdata |
mcCLASS |
character; one of: mCG, mCHG, mCHH |
mC |
numeric; the minumum number of reads with C (DNA-methylation events) at a given cytosine genomic position |
depthThr |
numeric; the minimum number of total reads at a given cytosine genomic position |
mCpv |
numeric; the minimum binomial pValue for an mC call at a given cytosine genomic position |
minCoverage |
numeric between 0 and 1; the minimum coverage of for the genomic region to be profiled; currently ignored |
Nproc |
numeric; the number of processor for parallelization |
nbins |
numeric; the number of bins each genomic region is divided |
For each bin of each genomic region of a GRanges object, the absolute and relative density of mC sites is determined and an object of class GEcollection is created.
An object of class GRanges from which an object of class GEcollection is created with the binC, binmC and binrC data slots been filled with density matrices. These matrices report the density of mC sites in the sequence context specified by mcCLASS. They are counted for each bin in each genomic region and their count is divided by the bin size in bp. The binC data slot is filled with the density of all possible methylation sites in the specified sequence context. The binmC data slot is filled with the density of mC sites in the specified sequence context for the considered sample. The binrC data slot is filled with the ratio of binC and binmC matrices, representing the relative methylation for each bin in each genomic region.
Mattia Pelizzola, Kamal Kishore
require(BSgenome.Hsapiens.UCSC.hg18) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1.db <- BSdata(file=H1data, uncov= uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) gec.H1 <- profileDNAmetBin(GenoRanges=GR_chr20, Sample=H1.db, mcCLASS='mCG', nbins=2) head(binmC(gec.H1))
require(BSgenome.Hsapiens.UCSC.hg18) H1data <- system.file('extdata', 'H1_chr20_CG_10k_tabix_out.txt.gz', package='methylPipe') uncov_GR <- GRanges(Rle('chr20'), IRanges(c(14350,69251,84185), c(18349,73250,88184))) H1.db <- BSdata(file=H1data, uncov= uncov_GR, org=Hsapiens) gr_file <- system.file('extdata', 'GR_chr20.Rdata', package='methylPipe') load(gr_file) gec.H1 <- profileDNAmetBin(GenoRanges=GR_chr20, Sample=H1.db, mcCLASS='mCG', nbins=2) head(binmC(gec.H1))
Helper function to partition genome chromosome-wise for parallel computation. This function is not intended for the user to call directly.
splitChrs(chrs, org)
splitChrs(chrs, org)
chrs |
character; an aray of chromome names in the form chr1, .., chrX |
org |
an object of class BSgenome |
A data frame with chromosome name, start and end position of each chunk.
Helper function to convert the list returned by the function scanTabix into a GRanges. This function is not intended for the user to call directly.
tabixdata2GR(x)
tabixdata2GR(x)
x |
list; the list returned by the function scanTabix |
An object of class data frame.
Mattia Pelizzola, Kamal Kishore