| Title: | Differentially Methylated Regions Caller |
|---|---|
| Description: | Uses Bisulfite sequencing data in two conditions and identifies differentially methylated regions between the conditions in CG and non-CG context. The input is the CX report files produced by Bismark and the output is a list of DMRs stored as GRanges objects. |
| Authors: | Nicolae Radu Zabet <[email protected]>, Jonathan Michael Foonlan Tsang <[email protected]>, Alessandro Pio Greco <[email protected]>, Ryan Merritt <[email protected]> and Young Jun Kim <[email protected]> |
| Maintainer: | Nicolae Radu Zabet <[email protected]> |
| License: | GPL-3 |
| Version: | 1.45.0 |
| Built: | 2026-05-29 10:29:15 UTC |
| Source: | https://github.com/bioc/DMRcaller |
This function extracts from the methylation data the total number of reads, the number of methylated reads and the number of cytosines in the specific context from a region (e.g. DMRs)
analyseReadsInsideRegionsForCondition( regions, methylationData, context, label = "", parallel = FALSE, BPPARAM = NULL, cores = NULL )analyseReadsInsideRegionsForCondition( regions, methylationData, context, label = "", parallel = FALSE, BPPARAM = NULL, cores = NULL )
regions |
a |
methylationData |
the methylation data in one condition
(see |
context |
the context in which to extract the reads ( |
label |
a string to be added to the columns to identify the condition |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
a GRanges object with additional four metadata columns
the number of methylated reads
the total number of reads
the proportion methylated reads
the number of cytosines in the regions
Nicolae Radu Zabet
readONTbam, filterDMRs, computeDMRs,
DMRsNoiseFilterCG, and mergeDMRsIteratively
# load the methylation data data(methylationDataList) #load the DMRs in CG context. These DMRs were computed with minGap = 200. data(DMRsNoiseFilterCG) #retrive the number of reads in CHH context in WT DMRsNoiseFilterCGreadsCHH <- analyseReadsInsideRegionsForCondition( DMRsNoiseFilterCG[1:10], methylationDataList[["WT"]], context = "CHH", label = "WT")# load the methylation data data(methylationDataList) #load the DMRs in CG context. These DMRs were computed with minGap = 200. data(DMRsNoiseFilterCG) #retrive the number of reads in CHH context in WT DMRsNoiseFilterCGreadsCHH <- analyseReadsInsideRegionsForCondition( DMRsNoiseFilterCG[1:10], methylationDataList[["WT"]], context = "CHH", label = "WT")
This function extracts from the methylation data the total number of reads, the number of methylated reads and the number of cytosines in the specific context from a region (e.g. PMDs)
analyseReadsInsideRegionsForConditionPMD( regions, methylationData, context, label = "", parallel = FALSE, BPPARAM = NULL, cores = NULL )analyseReadsInsideRegionsForConditionPMD( regions, methylationData, context, label = "", parallel = FALSE, BPPARAM = NULL, cores = NULL )
regions |
a |
methylationData |
the methylation data in one condition
(see |
context |
the context in which to extract the reads ( |
label |
a string to be added to the columns to identify the condition |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
a GRanges object with additional four metadata columns
the number of methylated reads
the total number of reads
the proportion methylated reads
the number of cytosines in the regions
Nicolae Radu Zabet and Young Jun Kim
filterPMDs, computePMDs,
PMDsNoiseFilterCG, and mergePMDsIteratively
# load the ONT methylation data data(ontSampleGRangesList) #load the PMDs in CG context. These PMDs were computed with minGap = 200. data(PMDsNoiseFilterCG) #retrive the number of reads in CG context in GM18501 PMDsNoiseFilterCGreadsCG <- analyseReadsInsideRegionsForConditionPMD( PMDsNoiseFilterCG[1:10], ontSampleGRangesList[["GM18501"]], context = "CG", label = "GM18501")# load the ONT methylation data data(ontSampleGRangesList) #load the PMDs in CG context. These PMDs were computed with minGap = 200. data(PMDsNoiseFilterCG) #retrive the number of reads in CG context in GM18501 PMDsNoiseFilterCGreadsCG <- analyseReadsInsideRegionsForConditionPMD( PMDsNoiseFilterCG[1:10], ontSampleGRangesList[["GM18501"]], context = "CG", label = "GM18501")
computeCoMethylatedPositions() calculates pairwise co-methylation between all cytosine sites
within each given region, using ONT methylation calls annotated to each site.
For each pair of cytosines within the same strand and PMD, it builds a 2x2 contingency table
reflecting the overlap state of reads (both methylated, only one methylated, or neither),
performs a statistical test (Fisher's exact by default), and reports FDR-adjusted p-values.
computeCoMethylatedPositions( methylationData, regions, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, alternative = "two.sided", test = "fisher", parallel = FALSE, BPPARAM = NULL, cores = NULL )computeCoMethylatedPositions( methylationData, regions, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, alternative = "two.sided", test = "fisher", parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
A |
regions |
A |
minDistance |
Minimum distance (in bp) between two cytosines to consider for co-methylation (default: 150). |
maxDistance |
Maximum distance (in bp) between two cytosines to consider (default: 1000). |
minCoverage |
Minimum read coverage required for both cytosines in a pair (default: 4). |
pValueThreshold |
FDR-adjusted p-value threshold for reporting significant co-methylation (default: 0.01). |
alternative |
indicates the alternative hypothesis and must be one of
|
test |
Statistical test to use for co-methylation ( |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
Compute Co-Methylation Positions within Regions (CMPs)
Pairwise tests are performed separately for each strand (+ and -) within each region. FDR correction is performed for all pairs within each region and strand.
A list of length equal to regions, where each entry is a GInteractions object
of significant cytosine pairs (by strand), annotated with:
number of reads methylated at both cytosines
number methylated at only first cytosine
number methylated at only second cytosine
number methylated at neither cytosines
The DNA strand ("+" or "-") on which the two CpGs reside.
The original region (from regions) containing
this cytosines pair, formatted in UCSC or IGV style, e.g. "chr1:1522971-1523970".
FDR-adjusted p-value for co-methylation association
Nicolae Radu Zabet and Young Jun Kim
readONTbam, computePMDs,
ontSampleGRangesList
## Not run: # load the ONT methylation data and PMD data data(ont_gr_GM18870_chr1_PMD_bins_1k) data(ont_gr_GM18870_chr1_sorted_bins_1k) # compute the co-methylations with Fisher's exact test coMetylationFisher <- computeCoMethylatedPositions( ont_gr_GM18870_chr1_sorted_bins_1k, regions = ont_gr_GM18870_chr1_PMD_bins_1k[1:4], minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "fisher", parallel = FALSE) # compute the co-methylations with Permuation test coMetylationPermutation <- computeCoMethylatedPositions( ont_gr_GM18870_chr1_sorted_bins_1k, regions = ont_gr_GM18870_chr1_PMD_bins_1k[1:4], minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "permutation", parallel = FALSE) # highly recommended to set as TRUE ## End(Not run)## Not run: # load the ONT methylation data and PMD data data(ont_gr_GM18870_chr1_PMD_bins_1k) data(ont_gr_GM18870_chr1_sorted_bins_1k) # compute the co-methylations with Fisher's exact test coMetylationFisher <- computeCoMethylatedPositions( ont_gr_GM18870_chr1_sorted_bins_1k, regions = ont_gr_GM18870_chr1_PMD_bins_1k[1:4], minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "fisher", parallel = FALSE) # compute the co-methylations with Permuation test coMetylationPermutation <- computeCoMethylatedPositions( ont_gr_GM18870_chr1_sorted_bins_1k, regions = ont_gr_GM18870_chr1_PMD_bins_1k[1:4], minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "permutation", parallel = FALSE) # highly recommended to set as TRUE ## End(Not run)
computeCoMethylatedRegions() calculates pairwise correlation statistics for methylation levels
across defined genomic regions (e.g., PMDs, Enhancer binding sites). For each region pair within the
specified distance range, the function computes per-read methylation proportions and
performs correlation testing (Pearson, Spearman, or Kendall). Pairs with strong correlations
(beyond user-defined thresholds) and significant p-values (FDR-adjusted) are returned.
computeCoMethylatedRegions( methylationData, regions, minDistance = 500, maxDistance = 50000, minCoverage = 4, pValueThreshold = 0.05, correlation_test = "pearson", minCorrelation = -0.5, maxCorrelation = 0.5, parallel = FALSE, BPPARAM = NULL, cores = NULL )computeCoMethylatedRegions( methylationData, regions, minDistance = 500, maxDistance = 50000, minCoverage = 4, pValueThreshold = 0.05, correlation_test = "pearson", minCorrelation = -0.5, maxCorrelation = 0.5, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
A |
regions |
A |
minDistance |
Minimum genomic distance (in base pairs) between two regions to be considered (default: 500). |
maxDistance |
Maximum genomic distance (in base pairs) between two regions (default: 50,000). |
minCoverage |
Minimum number of shared reads (per region pair) required to compute correlation (default: 4). |
pValueThreshold |
Significance threshold for FDR-adjusted p-values (default: 0.05). |
correlation_test |
Statistical method to compute correlation; must be one of |
minCorrelation |
Minimum allowed correlation value for a significant result (must be in between -1 and 0; default: -0.5). |
maxCorrelation |
Maximum allowed correlation value for a significant result (must be in between 0 and 1; default: 0.5). |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
Compute Co-Methylated Regions (CMRs)
The function first identifies all region pairs within the user-defined distance range. For each pair, it calculates methylation proportions per read across both regions, extracts common reads, and tests correlation using the selected method. FDR correction is applied globally across all region pairs.
A GInteractions object, where each row represents a significantly correlated pair of genomic regions
from the input regions. The anchors of each interaction correspond to original regions,
and their genomic coordinates are retained in the anchor1 and anchor2 slots.
Additionally, a genomic_position meta-column is included to indicate the original coordinate ranges
(in UCSC/IGV format) for each interaction, aiding downstream interpretation or visualisation.
Each interaction is annotated with:
Correlation coefficient between the two regions
Number of shared reads used for correlation
FDR-adjusted p-value for the correlation test
Nicolae Radu Zabet and Young Jun Kim
readONTbam, computePMDs,
ontSampleGRangesList
# Load methylation data and PMD regions data("ont_gr_GM18870_chr1_sorted_bins_1k") data("ont_gr_GM18870_chr1_PMD_bins_1k") # Compute highly correlated regions (Pearson) coMethylationRegion_pearson <- computeCoMethylatedRegions(ont_gr_GM18870_chr1_sorted_bins_1k, ont_gr_GM18870_chr1_PMD_bins_1k[1:5], minDistance = 500, maxDistance = 50000, minCoverage = 4, pValueThreshold = 0.05, correlation_test = "pearson", minCorrelation = -0.5, maxCorrelation = 0.5, parallel = FALSE, BPPARAM = NULL)# Load methylation data and PMD regions data("ont_gr_GM18870_chr1_sorted_bins_1k") data("ont_gr_GM18870_chr1_PMD_bins_1k") # Compute highly correlated regions (Pearson) coMethylationRegion_pearson <- computeCoMethylatedRegions(ont_gr_GM18870_chr1_sorted_bins_1k, ont_gr_GM18870_chr1_PMD_bins_1k[1:5], minDistance = 500, maxDistance = 50000, minCoverage = 4, pValueThreshold = 0.05, correlation_test = "pearson", minCorrelation = -0.5, maxCorrelation = 0.5, parallel = FALSE, BPPARAM = NULL)
This function computes the differentially methylated regions between two conditions.
computeDMRs( methylationData1, methylationData2, regions = NULL, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", lambda = 0.5, binSize = 100, test = "fisher", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )computeDMRs( methylationData1, methylationData2, regions = NULL, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", lambda = 0.5, binSize = 100, test = "fisher", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
regions |
a |
context |
the context in which the DMRs are computed ( |
method |
the method used to compute the DMRs ( |
windowSize |
the size of the triangle base measured in nucleotides.
This parameter is required only if the selected method is
|
kernelFunction |
a |
lambda |
numeric value required for the Gaussian filter
( |
binSize |
the size of the tiling bins in nucleotides. This parameter is
required only if the selected method is |
test |
the statistical test used to call DMRs ( |
pValueThreshold |
DMRs with p-values (when performing the statistical
test; see |
minCytosinesCount |
DMRs with less cytosines in the specified context
than |
minProportionDifference |
DMRs where the difference in methylation
proportion between the two conditions is lower than
|
minGap |
DMRs separated by a gap of at least |
minSize |
DMRs with a size smaller than |
minReadsPerCytosine |
DMRs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the DMRs stored as a GRanges object with the following
metadata columns:
a number indicating whether the region lost (-1) or gain (+1) methylation in condition 2 compared to condition 1.
the context in which the DMRs was computed ("CG",
"CHG" or "CHH").
the number of methylated reads in condition 1.
the total number of reads in condition 1.
the proportion methylated reads in condition 1.
the number of methylated reads in condition 2.
the total number reads in condition 2.
the proportion methylated reads in condition 2.
the number of cytosines in the DMR.
a string indicating whether the region lost ("loss")
or gained ("gain") methylation in condition 2 compared to condition 1.
the p-value (adjusted to control the false discovery rate with the Benjamini and Hochberg's method) of the statistical test when the DMR was called.
Nicolae Radu Zabet and Jonathan Michael Foonlan Tsang
filterDMRs, mergeDMRsIteratively,
analyseReadsInsideRegionsForCondition and
DMRsNoiseFilterCG
# load the methylation data data(methylationDataList) # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute the DMRs in CG context with noise_filter method DMRsNoiseFilterCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## Not run: # compute the DMRs in CG context with neighbourhood method DMRsNeighbourhoodCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "neighbourhood", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with bins method DMRsBinsCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "bins", binSize = 100, test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## End(Not run)# load the methylation data data(methylationDataList) # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute the DMRs in CG context with noise_filter method DMRsNoiseFilterCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## Not run: # compute the DMRs in CG context with neighbourhood method DMRsNeighbourhoodCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "neighbourhood", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with bins method DMRsBinsCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "bins", binSize = 100, test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## End(Not run)
This function computes the differentially methylated regions between replicates with two conditions.
computeDMRsReplicates( methylationData, condition = NULL, regions = NULL, context = "CG", method = "neighbourhood", binSize = 100, test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )computeDMRsReplicates( methylationData, condition = NULL, regions = NULL, context = "CG", method = "neighbourhood", binSize = 100, test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
the methylation data containing all the conditions for all the replicates. |
condition |
a vector of strings indicating the conditions for each
sample in |
regions |
a |
context |
the context in which the DMRs are computed ( |
method |
the method used to compute the DMRs |
binSize |
the size of the tiling bins in nucleotides. This parameter is
required only if the selected method is |
test |
the statistical test used to call DMRs ( |
pseudocountM |
numerical value to be added to the methylated reads before modelling beta regression. |
pseudocountN |
numerical value to be added to the total reads before modelling beta regression. |
pValueThreshold |
DMRs with p-values (when performing the statistical
test; see |
minCytosinesCount |
DMRs with less cytosines in the specified context
than |
minProportionDifference |
DMRs where the difference in methylation
proportion between the two conditions is lower than
|
minGap |
DMRs separated by a gap of at least |
minSize |
DMRs with a size smaller than |
minReadsPerCytosine |
DMRs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the DMRs stored as a GRanges object with the following
metadata columns:
a number indicating whether the region lost (-1) or gain (+1) methylation in condition 2 compared to condition 1.
the context in which the DMRs was computed ("CG",
"CHG" or "CHH").
the number of methylated reads in condition 1.
the total number of reads in condition 1.
the proportion methylated reads in condition 1.
the number of methylated reads in condition 2.
the total number reads in condition 2.
the proportion methylated reads in condition 2.
the number of cytosines in the DMR.
a string indicating whether the region lost ("loss")
or gained ("gain") methylation in condition 2 compared to condition 1.
the p-value (adjusted to control the false discovery rate with the Benjamini and Hochberg's method) of the statistical test when the DMR was called.
Alessandro Pio Greco and Nicolae Radu Zabet
## Not run: # starting with data joined using joinReplicates data("syntheticDataReplicates") # compute the DMRs in CG context with neighbourhood method # creating condition vector condition <- c("a", "a", "b", "b") # computing DMRs using the neighbourhood method DMRsReplicatesNeighbourhood <- computeDMRsReplicates(methylationData = syntheticDataReplicates, condition = condition, regions = NULL, context = "CHH", method = "neighbourhood", test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## End(Not run)## Not run: # starting with data joined using joinReplicates data("syntheticDataReplicates") # compute the DMRs in CG context with neighbourhood method # creating condition vector condition <- c("a", "a", "b", "b") # computing DMRs using the neighbourhood method DMRsReplicatesNeighbourhood <- computeDMRsReplicates(methylationData = syntheticDataReplicates, condition = condition, regions = NULL, context = "CHH", method = "neighbourhood", test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) ## End(Not run)
This function computes the coverage for bisulfite sequencing data. It
returns a vector with the proportion (or raw count) of cytosines that
have the number of reads higher or equal than a vector of specified
thresholds.
computeMethylationDataCoverage( methylationData, regions = NULL, context = "CG", breaks = NULL, proportion = TRUE )computeMethylationDataCoverage( methylationData, regions = NULL, context = "CG", breaks = NULL, proportion = TRUE )
methylationData |
the methylation data stored as a |
regions |
a |
context |
the context in which the DMRs are computed ( |
breaks |
a |
proportion |
a |
a vector with the proportion (or raw count) of cytosines that
have the number of reads higher or equal than the threshold values specified
in the breaks vector.
Nicolae Radu Zabet and Jonathan Michael Foonlan Tsang
plotMethylationDataCoverage,
methylationDataList
# load the methylation data data(methylationDataList) # compute coverage in CG context breaks <- c(1,5,10,15) coverage_CG_wt <- computeMethylationDataCoverage(methylationDataList[["WT"]], context="CG", breaks=breaks)# load the methylation data data(methylationDataList) # compute coverage in CG context breaks <- c(1,5,10,15) coverage_CG_wt <- computeMethylationDataCoverage(methylationDataList[["WT"]], context="CG", breaks=breaks)
This function computes the correlation of the methylation levels as a
function of the distances between the Cytosines. The function returns a
vector with the correlation of methylation levels at distance equal to
a vector of specified thresholds.
computeMethylationDataSpatialCorrelation( methylationData, regions = NULL, context = "CG", distances = NULL )computeMethylationDataSpatialCorrelation( methylationData, regions = NULL, context = "CG", distances = NULL )
methylationData |
the methylation data stored as a |
regions |
a |
context |
the context in which the correlation is computed ( |
distances |
a |
a vector with the correlation of the methylation levels for
Cytosines located at distances specified in the distances
vector.
Nicolae Radu Zabet
plotMethylationDataSpatialCorrelation,
methylationDataList
# load the methylation data data(methylationDataList) # compute spatial correlation in CG context distances <- c(1,5,10,15) correlation_CG_wt <- computeMethylationDataSpatialCorrelation(methylationDataList[["WT"]], context="CG", distances=distances)# load the methylation data data(methylationDataList) # compute spatial correlation in CG context distances <- c(1,5,10,15) correlation_CG_wt <- computeMethylationDataSpatialCorrelation(methylationDataList[["WT"]], context="CG", distances=distances)
This function computes the low resolution profiles for the bisulfite sequencing data.
computeMethylationProfile( methylationData, region, windowSize = floor(width(region)/500), context = "CG" )computeMethylationProfile( methylationData, region, windowSize = floor(width(region)/500), context = "CG" )
methylationData |
the methylation data stored as a |
region |
a |
windowSize |
a |
context |
the context in which the DMRs are computed ( |
a GRanges object with equal sized tiles of the
region. The object consists of the following metadata
the number of methylated reads.
the total number of reads.
the proportion of methylated reads.
the number of cytosines in the regions
the context ("CG", "CHG" or "CHH").
Nicolae Radu Zabet and Jonathan Michael Foonlan Tsang
plotMethylationProfileFromData,
plotMethylationProfile, methylationDataList
# load the methylation data data(methylationDataList) # the region where to compute the profile region <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute low resolution profile in 20 Kb windows lowResProfileWTCHH <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 20000, context = "CHH") ## Not run: # compute low resolution profile in 10 Kb windows lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") ## End(Not run)# load the methylation data data(methylationDataList) # the region where to compute the profile region <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute low resolution profile in 20 Kb windows lowResProfileWTCHH <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 20000, context = "CHH") ## Not run: # compute low resolution profile in 10 Kb windows lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") ## End(Not run)
This function computes the distribution of a subset of regions
(GRanges object) over a large region (GRanges
object)
computeOverlapProfile( subRegions, largeRegion, windowSize = floor(width(largeRegion)/500), binary = TRUE, cores = 1 )computeOverlapProfile( subRegions, largeRegion, windowSize = floor(width(largeRegion)/500), binary = TRUE, cores = 1 )
subRegions |
a |
largeRegion |
a |
windowSize |
The |
binary |
a value indicating whether to count 1 for each overlap or to compute the width of the overlap |
cores |
the number of cores used to compute the DMRs. |
a GRanges object with equal sized tiles of the regions.
The object has one metadata file score which represents: the number of
subRegions overlapping with the tile, in the case of binary = TRUE,
and the width of the subRegions overlapping with the tile , in the case of
binary = FALSE.
Nicolae Radu Zabet
plotOverlapProfile, filterDMRs,
computeDMRs and mergeDMRsIteratively
# load the methylation data data(methylationDataList) # load the DMRs in CG context data(DMRsNoiseFilterCG) # the coordinates of the area to be plotted largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute overlaps distribution hotspots <- computeOverlapProfile(DMRsNoiseFilterCG, largeRegion, windowSize = 10000, binary = FALSE)# load the methylation data data(methylationDataList) # load the DMRs in CG context data(DMRsNoiseFilterCG) # the coordinates of the area to be plotted largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute overlaps distribution hotspots <- computeOverlapProfile(DMRsNoiseFilterCG, largeRegion, windowSize = 10000, binary = FALSE)
This function computes the partially methylated domains between pre-set min and max proportion values.
computePMDs( methylationData, regions = NULL, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", lambda = 0.5, binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )computePMDs( methylationData, regions = NULL, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", lambda = 0.5, binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
the methylation data in condition
(see |
regions |
a |
context |
the context in which the PMDs are computed ( |
method |
Character string specifying the algorithm for PMD detection.
If |
windowSize |
the size of the triangle base measured in nucleotides.
This parameter is required only if the selected method is
|
kernelFunction |
a |
lambda |
numeric value required for the Gaussian filter
( |
binSize |
the size of the tiling bins in nucleotides. This parameter is
required only if the selected method is |
minCytosinesCount |
PMDs with less cytosines in the specified context
than |
minMethylation |
Numeric [0,1]; minimum mean methylation proportion. |
maxMethylation |
Numeric [0,1]; maximum mean methylation proportion. |
minGap |
PMDs separated by a gap of at least |
minSize |
PMDs with a size smaller than |
minReadsPerCytosine |
PMDs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the PMDs stored as a GRanges object with the following
metadata columns:
the context in which the PMDs was computed ("CG",
"CHG" or "CHH").
the number of methylated reads.
the total number of reads.
the proportion methylated reads filtered between
minMethylation and maxMethylation.
the number of cytosines in the PMDs.
Nicolae Radu Zabet, Jonathan Michael Foonlan Tsang and Young Jun Kim
readONTbam, filterPMDs, mergePMDsIteratively,
analyseReadsInsideRegionsForConditionPMD and
PMDsNoiseFilterCG
# load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute the PMDs in CG context with noise_filter method PMDsNoiseFilterCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", windowSize = 100, method = "noise_filter", kernelFunction = "triangular", lambda = 0.5, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) ## Not run: # compute the PMDs in CG context with neighbourhood method PMDsNeighbourhoodCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "neighbourhood" minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with bins method PMDsBinsCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "bins", binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) ## End(Not run)# load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute the PMDs in CG context with noise_filter method PMDsNoiseFilterCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", windowSize = 100, method = "noise_filter", kernelFunction = "triangular", lambda = 0.5, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) ## Not run: # compute the PMDs in CG context with neighbourhood method PMDsNeighbourhoodCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "neighbourhood" minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with bins method PMDsBinsCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "bins", binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) ## End(Not run)
This function computes the variance methylated domains between pre-set min and max proportion values.
computeVMDs( methylationData, regions = NULL, context = "CG", binSize = 100, minCytosinesCount = 4, sdCutoffMethod = "per.high", percentage = 0.05, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )computeVMDs( methylationData, regions = NULL, context = "CG", binSize = 100, minCytosinesCount = 4, sdCutoffMethod = "per.high", percentage = 0.05, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
the methylation data in condition
(see |
regions |
a |
context |
the context in which the VMDs are computed ( |
binSize |
the size of the tiling bins in nucleotides. This parameter is
required only if the selected method is |
minCytosinesCount |
VMDs with less cytosines in the specified context
than |
sdCutoffMethod |
Character string specifying how to determine the cutoff for filtering VMDs based on their methylation variance (standard deviation). Available options are:
This allows either quantile-based filtering or automatic detection of variance thresholds based on distribution shape. |
percentage |
Numeric cutoff used when |
minGap |
VMDs separated by a gap of at least |
minSize |
VMDs with a size smaller than |
minReadsPerCytosine |
VMDs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the VMDs stored as a GRanges object with the following
metadata columns:
the context in which the VMDs was computed ("CG",
"CHG" or "CHH").
the number of methylated reads.
the total number of reads.
the proportion from total methylated reads.
the number of cytosines in the VMDs.
mean value comparing per‐read proportions
standard deviation comparing per‐read proportions
weighted mean value comparing per‐read proportions
weighted standard deviation comparing per‐read proportions
Nicolae Radu Zabet, Jonathan Michael Foonlan Tsang and Young Jun Kim
readONTbam, filterVMDs
and analyseReadsInsideRegionsForCondition
## Not run: # load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the VMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,1E6+6E5)) # compute the VMDs in CG context with bins method VMDsBinsCG <- computeVMDs(ontSampleGRangesList[["GM18501"]], regions = NULL, context = "CG", binSize = 100, minCytosinesCount = 4, sdCutoffMethod = "EDE.high", percentage = 0.05, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = 1) ## End(Not run)## Not run: # load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the VMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,1E6+6E5)) # compute the VMDs in CG context with bins method VMDsBinsCG <- computeVMDs(ontSampleGRangesList[["GM18501"]], regions = NULL, context = "CG", binSize = 100, minCytosinesCount = 4, sdCutoffMethod = "EDE.high", percentage = 0.05, minGap = 200, minSize = 50, minReadsPerCytosine = 4, parallel = FALSE, BPPARAM = NULL, cores = 1) ## End(Not run)
Supports both Bisulfite Sequencing (Bismark CX reports) and Oxford Nanopore Sequencing (MM/ML tags) for per-site methylation calling. Identifies differentially methylated regions between two samples in CG and non-CG contexts.
For bisulfite data, the input is Bismark CX report files and the output
is a list of DMRs stored as a GRanges object.
readsM — count of modified reads per site
readsN — total same-strand coverage per site
For Nanopore data, the input is an indexed ONT BAM with calling
readONTbam function in this package and the output is a
GRanges augmented with metadata columns:
ONT_Cm — comma-delimited read-indices called “modified”
ONT_C — comma-delimited read-indices covering but not modified
readsM — count of modified reads per site
readsN — total same-strand coverage per site
The most important functions in the DMRcaller are:
readBismarkreads the Bismark CX report files in a
GRanges object.
readBismarkPoolReads multiple CX report files and pools them together.
saveBismarksaves the methylation data stored in a
GRanges object into a Bismark CX report file.
selectCytosineEnumerates cytosine positions in a BSgenome reference, optionally filtering by methylation context (CG/CHG/CHH), chromosome and genomic region.
readONTbamLoads an Oxford Nanopore BAM (with MM/ML tags), decodes per‐C modification probabilities and counts modified vs. unmodified reads per site.
poolMethylationDatasetspools together multiple methylation datasets.
poolTwoMethylationDatasetspools together two methylation datasets.
computeMethylationDataCoverageComputes the coverage for the bisulfite sequencing data.
plotMethylationDataCoveragePlots the coverage for the bisulfite sequencing data.
computeMethylationDataSpatialCorrelationComputes the correlation between methylation levels as a function of the distances between the Cytosines.
plotMethylationDataSpatialCorrelationPlots the correlation of methylation levels for Cytosines located at a certain distance apart.
computeMethylationProfileComputes the low resolution profiles for the bisulfite sequencing data at certain locations.
plotMethylationProfilePlots the low resolution profiles for the bisulfite sequencing data at certain locations.
plotMethylationProfileFromDataPlots the low resolution profiles for the loaded bisulfite sequencing data.
computeDMRsComputes the differentially methylated regions between two conditions.
filterDMRsFilters a list of (potential) differentially methylated regions.
mergeDMRsIterativelyMerge DMRs iteratively.
analyseReadsInsideRegionsForConditionAnalyse reads inside regions for condition.
plotLocalMethylationProfilePlots the methylation profile at one locus for the bisulfite sequencing data.
computeOverlapProfileComputes the distribution of a set of subregions on a large region.
plotOverlapProfilePlots the distribution of a set of subregions on a large region.
getWholeChromosomesComputes the GRanges objects with each chromosome as an element from the methylationData.
joinReplicatesMerges two GRanges objects with single reads columns. It is necessary to start the analysis of DMRs with biological replicates.
computeDMRsReplicatesComputes the differentially methylated regions between two conditions with multiple biological replicates.
selectCytosineEnumerates cytosine positions in a BSgenome reference.
readONTbamLoads an ONT BAM (MM/ML tags), decodes per‐C modification probabilities, and counts modified vs. unmodified reads per site.
computePMDsPartitions the genome into PMDs via three methods ("noise_filter", "neighbourhood", "bins").
filterPMDsFilters a set of PMDs by methylation level and read depth.
mergePMDsIterativelyMerge PMDs while preserving statistical significance.
analyseReadsInsideRegionsForConditionPMDCounts reads in each PMD for one condition.
computeVMDsComputes the variance methylated domains between pre-set min and max proportion values
filterVMRsONTFilters VMRs with ONT-specific variance tests and CI filters
computeCoMethylatedPositionsComputes pairwise co‐methylation between Cytosine sites within regions.
computeCoMethylatedRegionsComputes pairwise co-methylation statistics between regions.
Nicolae Radu Zabet [email protected], Jonathan Michael Foonlan Tsang [email protected], Alessandro Pio Greco [email protected], Young Jun Kim [email protected]
Maintainer: Nicolae Radu Zabet [email protected]
See vignette("rd", package = "DMRcaller") for an overview
of the package.
## Not run: # load the methylation data data(methylationDataList) library(BSgenome.Hsapiens.UCSC.hg38) # All cytosines in hg38: gr_all <- selectCytosine() # Only CpG sites on chr1 and chr2: gr_chr1_2 <- selectCytosine(context="CG", chr=c("chr1","chr2")) # CHH sites in a specific region on chr3: my_region <- GRanges("chr3", IRanges(1e6, 1e6 + 1e5)) gr_region <- selectCytosine(context="CHH", chr="chr3", region=my_region) # set the bam file directory bam_path <- system.file("extdata", "scanBamChr1Random5.bam", package="DMRcaller") # read ONTbam file (chromosome 1 only) in CG context with BSgenome.Hsapiens.UCSC.hg38 ONTSampleGRanges <- readONTbam(bamfile = bam_path, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.50,genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = "chr1", region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL) # plot the low resolution profile at 5 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 5000, autoscale = TRUE, context = c("CG", "CHG", "CHH"), labels = LETTERS) # compute low resolution profile in 10 Kb windows in CG context lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") lowResProfileCG <- GRangesList("WT" = lowResProfileWTCG, "met1-3" = lowResProfileMet13CG) # compute low resolution profile in 10 Kb windows in CHG context lowResProfileWTCHG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CHG") lowResProfileMet13CHG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CHG") lowResProfileCHG <- GRangesList("WT" = lowResProfileWTCHG, "met1-3" = lowResProfileMet13CHG) # plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(2,1)) plotMethylationProfile(lowResProfileCG, autoscale = FALSE, labels = LETTERS[1], title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) plotMethylationProfile(lowResProfileCHG, autoscale = FALSE, labels = LETTERS[2], title="CHG methylation on Chromosome 3", col=c("#0072B2", "#56B4E9"), pch = c(16,2), lty = c(3,2)) # plot the coverage in all three contexts plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = 1:15, regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) # plot the correlation of methylation levels as a function of distance plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], distances = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute the DMRs in CG context with noise_filter method DMRsNoiseFilterCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with neighbourhood method DMRsNeighbourhoodCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "neighbourhood", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with bins method DMRsBinsCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "bins", binSize = 100, test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # load the gene annotation data data(GEs) # select the genes genes <- GEs[which(GEs$type == "gene")] # the regions where to compute the DMRs genes <- genes[overlapsAny(genes, regions)] # filter genes that are differntially methylated in the two conditions DMRsGenesCG <- filterDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], potentialDMRs = genes, context = "CG", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, cores = 1) # merge the DMRs DMRsNoiseFilterCGLarger <- mergeDMRsIteratively(DMRsNoiseFilterCG, minGap = 500, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") # select the genes genes <- GEs[which(GEs$type == "gene")] # the coordinates of the area to be plotted chr3Reg <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(510000,530000)) # load the DMRs in CG context data(DMRsNoiseFilterCG) DMRsCGlist <- list("noise filter"=DMRsNoiseFilterCG, "neighbourhood"=DMRsNeighbourhoodCG, "bins"=DMRsBinsCG, "genes"=DMRsGenesCG) # plot the CG methylation par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotLocalMethylationProfile(methylationDataList[["WT"]], methylationDataList[["met1-3"]], chr3Reg, DMRsCGlist, c("WT", "met1-3"), GEs, windowSize=100, main="CG methylation") hotspotsHypo <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "loss")], region, windowSize=2000, binary=TRUE, cores=1) hotspotsHyper <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "gain")], region, windowSize=2000, binary=TRUE, cores=1) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), GRangesList("Chr3"=hotspotsHyper), names=c("loss", "gain"), title="CG methylation") # loading synthetic data data("syntheticDataReplicates") # creating condition vector condition <- c("a", "a", "b", "b") # computing DMRs using the neighbourhood method DMRsReplicatesNeighbourhood <- computeDMRsReplicates(methylationData = methylationData, condition = condition, regions = NULL, context = "CHH", method = "neighbourhood", test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute the PMDs in CG context with noise_filter method PMDsNoiseFilterCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", windowSize = 100, method = "noise_filter", kernelFunction = "triangular", lambda = 0.5, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with neighbourhood method PMDsNeighbourhoodCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "neighbourhood" minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with bins method PMDsBinsCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "bins", binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are partially methylated in the two conditions PMDsGenesCG <- filterPMDs(ontSampleGRangesList[["GM18501"]], potentialPMDs = transcript, context = "CG", minMethylation = 0.4, maxMethylation = 0.6, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1) # load the PMDs in CG context they were computed with minGap = 200 data(PMDsNoiseFilterCG) # merge the PMDs PMDsNoiseFilterCGLarger <- mergePMDsIteratively(PMDsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) # set genomic coordinates where to compute PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute PMDs and remove gaps smaller than 200 bp PMDsNoiseFilterCG200 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 200, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0Merged200 <- mergePMDsIteratively(PMDsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) #check that all newley computed PMDs are identical print(all(PMDsNoiseFilterCG200 == PMDsNoiseFilterCG0Merged200)) #retrive the number of reads in CG context in GM18501 PMDsNoiseFilterCGreadsCG <- analyseReadsInsideRegionsForConditionPMD( PMDsNoiseFilterCG[1:10], ontSampleGRangesList[["GM18501"]], context = "CG", label = "GM18501") # load the PMD data data(PMDsBinsCG) # compute the co-methylations with Fisher's exact test coMetylationFisher <- computeCoMethylatedPositions( ontSampleGRangesList[[1]], regions = PMDsBinsCG, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "fisher", parallel = FALSE) # compute the co-methylations with Permuation test coMetylationPermutation <- computeCoMethylatedPositions( ontSampleGRangesList[[1]], regions = PMDsBinsCG, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "permutation", parallel = FALSE) # highly recommended to set as TRUE # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are differntially methylated in the two conditions VMRsGenesCG <- filterVMRsONT(ontSampleGRangesList[["GM18501"]], ontSampleGRangesList[["GM18876"]], potentialVMRs = transcript, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.01, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = TRUE) # parallel recommended ## End(Not run)## Not run: # load the methylation data data(methylationDataList) library(BSgenome.Hsapiens.UCSC.hg38) # All cytosines in hg38: gr_all <- selectCytosine() # Only CpG sites on chr1 and chr2: gr_chr1_2 <- selectCytosine(context="CG", chr=c("chr1","chr2")) # CHH sites in a specific region on chr3: my_region <- GRanges("chr3", IRanges(1e6, 1e6 + 1e5)) gr_region <- selectCytosine(context="CHH", chr="chr3", region=my_region) # set the bam file directory bam_path <- system.file("extdata", "scanBamChr1Random5.bam", package="DMRcaller") # read ONTbam file (chromosome 1 only) in CG context with BSgenome.Hsapiens.UCSC.hg38 ONTSampleGRanges <- readONTbam(bamfile = bam_path, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.50,genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = "chr1", region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL) # plot the low resolution profile at 5 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 5000, autoscale = TRUE, context = c("CG", "CHG", "CHH"), labels = LETTERS) # compute low resolution profile in 10 Kb windows in CG context lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") lowResProfileCG <- GRangesList("WT" = lowResProfileWTCG, "met1-3" = lowResProfileMet13CG) # compute low resolution profile in 10 Kb windows in CHG context lowResProfileWTCHG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CHG") lowResProfileMet13CHG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CHG") lowResProfileCHG <- GRangesList("WT" = lowResProfileWTCHG, "met1-3" = lowResProfileMet13CHG) # plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(2,1)) plotMethylationProfile(lowResProfileCG, autoscale = FALSE, labels = LETTERS[1], title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) plotMethylationProfile(lowResProfileCHG, autoscale = FALSE, labels = LETTERS[2], title="CHG methylation on Chromosome 3", col=c("#0072B2", "#56B4E9"), pch = c(16,2), lty = c(3,2)) # plot the coverage in all three contexts plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = 1:15, regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) # plot the correlation of methylation levels as a function of distance plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], distances = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute the DMRs in CG context with noise_filter method DMRsNoiseFilterCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with neighbourhood method DMRsNeighbourhoodCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "neighbourhood", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # compute the DMRs in CG context with bins method DMRsBinsCG <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "bins", binSize = 100, test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # load the gene annotation data data(GEs) # select the genes genes <- GEs[which(GEs$type == "gene")] # the regions where to compute the DMRs genes <- genes[overlapsAny(genes, regions)] # filter genes that are differntially methylated in the two conditions DMRsGenesCG <- filterDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], potentialDMRs = genes, context = "CG", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, cores = 1) # merge the DMRs DMRsNoiseFilterCGLarger <- mergeDMRsIteratively(DMRsNoiseFilterCG, minGap = 500, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") # select the genes genes <- GEs[which(GEs$type == "gene")] # the coordinates of the area to be plotted chr3Reg <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(510000,530000)) # load the DMRs in CG context data(DMRsNoiseFilterCG) DMRsCGlist <- list("noise filter"=DMRsNoiseFilterCG, "neighbourhood"=DMRsNeighbourhoodCG, "bins"=DMRsBinsCG, "genes"=DMRsGenesCG) # plot the CG methylation par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotLocalMethylationProfile(methylationDataList[["WT"]], methylationDataList[["met1-3"]], chr3Reg, DMRsCGlist, c("WT", "met1-3"), GEs, windowSize=100, main="CG methylation") hotspotsHypo <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "loss")], region, windowSize=2000, binary=TRUE, cores=1) hotspotsHyper <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "gain")], region, windowSize=2000, binary=TRUE, cores=1) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), GRangesList("Chr3"=hotspotsHyper), names=c("loss", "gain"), title="CG methylation") # loading synthetic data data("syntheticDataReplicates") # creating condition vector condition <- c("a", "a", "b", "b") # computing DMRs using the neighbourhood method DMRsReplicatesNeighbourhood <- computeDMRsReplicates(methylationData = methylationData, condition = condition, regions = NULL, context = "CHH", method = "neighbourhood", test = "betareg", pseudocountM = 1, pseudocountN = 2, pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1) # load the ONT methylation data data(ontSampleGRangesList) # the regions where to compute the PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute the PMDs in CG context with noise_filter method PMDsNoiseFilterCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", windowSize = 100, method = "noise_filter", kernelFunction = "triangular", lambda = 0.5, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with neighbourhood method PMDsNeighbourhoodCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "neighbourhood" minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # compute the PMDs in CG context with bins method PMDsBinsCG <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "bins", binSize = 100, minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minGap = 200, minSize = 50, minReadsPerCytosine = 4, cores = 1, parallel = FALSE) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are partially methylated in the two conditions PMDsGenesCG <- filterPMDs(ontSampleGRangesList[["GM18501"]], potentialPMDs = transcript, context = "CG", minMethylation = 0.4, maxMethylation = 0.6, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1) # load the PMDs in CG context they were computed with minGap = 200 data(PMDsNoiseFilterCG) # merge the PMDs PMDsNoiseFilterCGLarger <- mergePMDsIteratively(PMDsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) # set genomic coordinates where to compute PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute PMDs and remove gaps smaller than 200 bp PMDsNoiseFilterCG200 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 200, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0Merged200 <- mergePMDsIteratively(PMDsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) #check that all newley computed PMDs are identical print(all(PMDsNoiseFilterCG200 == PMDsNoiseFilterCG0Merged200)) #retrive the number of reads in CG context in GM18501 PMDsNoiseFilterCGreadsCG <- analyseReadsInsideRegionsForConditionPMD( PMDsNoiseFilterCG[1:10], ontSampleGRangesList[["GM18501"]], context = "CG", label = "GM18501") # load the PMD data data(PMDsBinsCG) # compute the co-methylations with Fisher's exact test coMetylationFisher <- computeCoMethylatedPositions( ontSampleGRangesList[[1]], regions = PMDsBinsCG, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "fisher", parallel = FALSE) # compute the co-methylations with Permuation test coMetylationPermutation <- computeCoMethylatedPositions( ontSampleGRangesList[[1]], regions = PMDsBinsCG, minDistance = 150, maxDistance = 1000, minCoverage = 4, pValueThreshold = 0.01, test = "permutation", parallel = FALSE) # highly recommended to set as TRUE # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are differntially methylated in the two conditions VMRsGenesCG <- filterVMRsONT(ontSampleGRangesList[["GM18501"]], ontSampleGRangesList[["GM18876"]], potentialVMRs = transcript, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.01, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = TRUE) # parallel recommended ## End(Not run)
A GRangesList object containing the DMRs between Wild Type (WT) and
met1-3 mutant (met1-3) in Arabidopsis thaliana
(see methylationDataList). The DMRs were computed on the first
1 Mbp from Chromosome 3 with noise filter method using a triangular kernel
and a windowSize of 100 bp
The GRanges element contain 11 metadata columns;
see computeDMRs
filterDMRs, computeDMRs,
analyseReadsInsideRegionsForCondition
and mergeDMRsIteratively
This function extracts GC sites in the genome
extractGC(methylationData, genome, contexts = c("ALL", "CG", "CHG", "CHH"))extractGC(methylationData, genome, contexts = c("ALL", "CG", "CHG", "CHH"))
methylationData |
the methylation data stored as a |
genome |
a BSgenome with the DNA sequence of the organism |
contexts |
the context in which the DMRs are computed ( |
the a subset of methylationData consisting of all GC sites.
Ryan Merritt
## Not run: # load the genome sequence if(!require("BSgenome.Athaliana.TAIR.TAIR9", character.only = TRUE)){ if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("BSgenome.Athaliana.TAIR.TAIR9") } library(BSgenome.Athaliana.TAIR.TAIR9) # load the methylation data data(methylationDataList) methylationDataWTGpCpG <- extractGC(methylationDataList[["WT"]], BSgenome.Athaliana.TAIR.TAIR9, "CG") ## End(Not run)## Not run: # load the genome sequence if(!require("BSgenome.Athaliana.TAIR.TAIR9", character.only = TRUE)){ if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("BSgenome.Athaliana.TAIR.TAIR9") } library(BSgenome.Athaliana.TAIR.TAIR9) # load the methylation data data(methylationDataList) methylationDataWTGpCpG <- extractGC(methylationDataList[["WT"]], BSgenome.Athaliana.TAIR.TAIR9, "CG") ## End(Not run)
This function verifies whether a set of pottential DMRs (e.g. genes, transposons, CpG islands) are differentially methylated or not.
filterDMRs( methylationData1, methylationData2, potentialDMRs, context = "CG", test = "fisher", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, parallel = FALSE, BPPARAM = NULL, cores = NULL )filterDMRs( methylationData1, methylationData2, potentialDMRs, context = "CG", test = "fisher", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
potentialDMRs |
a |
context |
the context in which the DMRs are computed ( |
test |
the statistical test used to call DMRs ( |
pValueThreshold |
DMRs with p-values (when performing the statistical
test; see |
minCytosinesCount |
DMRs with less cytosines in the specified context
than |
minProportionDifference |
DMRs where the difference in methylation
proportion between the two conditions is lower than
|
minReadsPerCytosine |
DMRs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
a GRanges object with 11 metadata columns that contain
the DMRs; see computeDMRs.
Nicolae Radu Zabet
DMRsNoiseFilterCG, computeDMRs,
analyseReadsInsideRegionsForCondition
and mergeDMRsIteratively
# load the methylation data data(methylationDataList) # load the gene annotation data data(GEs) #select the genes genes <- GEs[which(GEs$type == "gene")] # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) genes <- genes[overlapsAny(genes, regions)] # filter genes that are differntially methylated in the two conditions DMRsGenesCG <- filterDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], potentialDMRs = genes, context = "CG", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, cores = 1)# load the methylation data data(methylationDataList) # load the gene annotation data data(GEs) #select the genes genes <- GEs[which(GEs$type == "gene")] # the regions where to compute the DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) genes <- genes[overlapsAny(genes, regions)] # filter genes that are differntially methylated in the two conditions DMRsGenesCG <- filterDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], potentialDMRs = genes, context = "CG", test = "score", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, cores = 1)
This function verifies whether a set of potential PMDs (e.g. genes, transposons, CpG islands) are partially methylated or not.
filterPMDs( methylationData, potentialPMDs, context = "CG", minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minReadsPerCytosine = 3, parallel = FALSE, BPPARAM = NULL, cores = NULL )filterPMDs( methylationData, potentialPMDs, context = "CG", minCytosinesCount = 4, minMethylation = 0.4, maxMethylation = 0.6, minReadsPerCytosine = 3, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
the methylation data in condition
(see |
potentialPMDs |
a |
context |
the context in which the PMDs are computed ( |
minCytosinesCount |
PMDs with less cytosines in the specified context
than |
minMethylation |
Numeric [0,1]; minimum mean methylation proportion. |
maxMethylation |
Numeric [0,1]; maximum mean methylation proportion. |
minReadsPerCytosine |
PMDs with the average number of reads lower than
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
a GRanges object with 5 metadata columns that contain
the PMDs; see computePMDs.
Nicolae Radu Zabet and Young Jun Kim
PMDsNoiseFilterCG, computePMDs,
analyseReadsInsideRegionsForCondition
and mergePMDsIteratively
# load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are partially methylated in the two conditions PMDsGenesCG <- filterPMDs(ontSampleGRangesList[["GM18501"]], potentialPMDs = transcript, context = "CG", minMethylation = 0.4, maxMethylation = 0.6, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1)# load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are partially methylated in the two conditions PMDsGenesCG <- filterPMDs(ontSampleGRangesList[["GM18501"]], potentialPMDs = transcript, context = "CG", minMethylation = 0.4, maxMethylation = 0.6, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1)
This function verifies whether a set of potential VMDs (e.g. genes, transposons, CpG islands) are variance methylated or not.
filterVMDs( methylationData, potentialVMDs, context = "CG", minCytosinesCount = 4, minReadsPerCytosine = 3, sdCutoffMethod = "per.high", percentage = 0.05, parallel = FALSE, BPPARAM = NULL, cores = NULL )filterVMDs( methylationData, potentialVMDs, context = "CG", minCytosinesCount = 4, minReadsPerCytosine = 3, sdCutoffMethod = "per.high", percentage = 0.05, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData |
the methylation data in condition
(see |
potentialVMDs |
a |
context |
the context in which the VMDs are computed ( |
minCytosinesCount |
VMDs with less cytosines in the specified context
than |
minReadsPerCytosine |
VMDs with the average number of reads lower than
|
sdCutoffMethod |
Character string specifying how to determine the cutoff for filtering VMDs based on their methylation variance (weighted standard deviation). Available options are:
This allows either quantile-based filtering or automatic detection of variance thresholds based on distribution shape. |
percentage |
Numeric cutoff used when |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
a GRanges object with 9 metadata columns that contain
the VMDs; see computeVMDs.
Nicolae Radu Zabet and Young Jun Kim
computeVMDs
and analyseReadsInsideRegionsForCondition
# load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the VMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are variance methylated in the two conditions VMDsGenesCG <- filterVMDs(ontSampleGRangesList[["GM18501"]], potentialVMDs = transcript, context = "CG", sdCutoffMethod = "per.high", percentage = 0.05, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1)# load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the VMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are variance methylated in the two conditions VMDsGenesCG <- filterVMDs(ontSampleGRangesList[["GM18501"]], potentialVMDs = transcript, context = "CG", sdCutoffMethod = "per.high", percentage = 0.05, minCytosinesCount = 4, minReadsPerCytosine = 3, cores = 1)
Filter VMRs with ONT-specific variance tests and CI filters
filterVMRsONT( methylationData1, methylationData2, potentialVMRs, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = FALSE, BPPARAM = NULL, cores = NULL )filterVMRsONT( methylationData1, methylationData2, potentialVMRs, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.4, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = FALSE, BPPARAM = NULL, cores = NULL )
methylationData1 |
A |
methylationData2 |
A |
potentialVMRs |
A |
context |
Character string specifying cytosine context ("CG", "CHG", or "CHH"). |
pValueThreshold |
Numeric p-value threshold (0<value<1) for both Wilcoxon and F-tests after FDR adjustment. |
minCytosinesCount |
Integer minimum number of cytosines per region. |
minProportionDifference |
Numeric minimum methylation difference between conditions (0<value<1). |
minReadsPerCytosine |
Integer minimum average coverage per cytosine. |
ciExcludesOne |
Logical; if |
varRatioFc |
Optional; numeric fold-change cutoff on variance ratio
(e.g., 2 for twofold variance difference). Regions with variance ratio
outside |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed |
This function verifies whether a set of potential VMRs (e.g., genes, transposons, CpG islands) are differentially methylated or not in ONT data, adding per-read Wilcoxon and F-tests on per-site proportions, confidence interval filtering, and optional variance-fold change cutoffs.
For each potential VMR, per-site methylation proportions are aggregated per read,
then a two-sample Wilcoxon rank-sum test compares means (wilcox_pvalue), and
an F-test compares variances (f_pvalue). You may further filter by requiring
the 95
apply a fold-change cutoff on the variance ratio (varRatioFc).
A GRanges with the same ranges as regions, plus these metadata:
total methylated reads in condition 1
total reads in condition 1
methylation proportion (sumReadsM1/sumReadsN1)
variance of per-read methylation proportions in condition 1
total methylated reads in condition 2
total reads in condition 2
methylation proportion (sumReadsM2/sumReadsN2)
variance of per-read methylation proportions in condition 2
number of cytosines observed in each region
FDR adjusted p-value from Wilcoxon rank-sum test comparing per-read proportions
FDR adjusted p-value from F-test comparing variances of per-read proportions
Ratio of variances (variance1 / variance2)
Full htest object returned by wilcox.test
Full htest object returned by var.test
a number indicating whether the region lost (-1) or gain (+1) methylation in condition 2 compared to condition 1.
a string indicating whether the region lost ("loss")
or gained ("gain") methylation in condition 2 compared to condition 1.
logical; TRUE if region passed the wilcox.test
logical; TRUE if region passed the var.test
Nicolae Radu Zabet and Young Jun Kim
readONTbam,
computePMDs, computeCoMethylatedPositions,
ontSampleGRangesList, GEs_hg38
## Not run: # load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are differntially methylated in the two conditions VMRsGenesCG <- filterVMRsONT(ontSampleGRangesList[["GM18501"]], ontSampleGRangesList[["GM18876"]], potentialVMRs = transcript, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.01, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = TRUE) # parallel recommended ## End(Not run)## Not run: # load the ONT methylation data data(ontSampleGRangesList) # load the gene annotation data data(GEs_hg38) # select the transcript transcript <- GEs_hg38[which(GEs_hg38$type == "transcript")] # the regions where to compute the PMDs regions <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) transcript <- transcript[overlapsAny(transcript, regions)] # filter genes that are differntially methylated in the two conditions VMRsGenesCG <- filterVMRsONT(ontSampleGRangesList[["GM18501"]], ontSampleGRangesList[["GM18876"]], potentialVMRs = transcript, context = "CG", pValueThreshold = 0.01, minCytosinesCount = 4, minProportionDifference = 0.01, minReadsPerCytosine = 3, ciExcludesOne = TRUE, varRatioFc = NULL, parallel = TRUE) # parallel recommended ## End(Not run)
A GRanges object containing the annotation of the Arabidopsis thaliana
A GRanges object
The object was created by calling import.gff3 function
from rtracklayer package for
ftp://ftp.arabidopsis.org/Maps/gbrowse_data/TAIR10/TAIR10_GFF3_genes_transposons.gff
A GRanges object containing the annotation of the Homo sapiens (hg38)
A GRanges object
The object was created by loading the gtf file from
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.refGene.gtf.gz.
and concatenated by range in 1.5 ~ 2 Mbp from Chromosome 1
Returns a GRanges object spanning from the first cytocine until
the last one on each chromosome
getWholeChromosomes(methylationData)getWholeChromosomes(methylationData)
methylationData |
the methylation data stored as a |
a GRanges object will all chromosomes.
Nicolae Radu Zabet
# load the methylation data data(methylationDataList) # get all chromosomes chromosomes <- getWholeChromosomes(methylationDataList[["WT"]])# load the methylation data data(methylationDataList) # get all chromosomes chromosomes <- getWholeChromosomes(methylationDataList[["WT"]])
This function joins together data that come from biological replicates to perform analysis
joinReplicates(methylationData1, methylationData2, usecomplete = FALSE)joinReplicates(methylationData1, methylationData2, usecomplete = FALSE)
methylationData1 |
the methylation data stored as a |
methylationData2 |
the methylation data stored as a |
usecomplete |
Boolean, determine wheter, when the two dataset differ for number of cytosines, if the smaller dataset should be added with zero reads to match the bigger dataset. |
returns a GRanges object containing multiple metadata
columns with the reads from each object passed as parameter
Alessandro Pio Greco and Nicolae Radu Zabet
# load the methylation data data(methylationDataList) # Joins the wildtype and the mutant in a single object joined_data <- joinReplicates(methylationDataList[["WT"]], methylationDataList[["met1-3"]], FALSE)# load the methylation data data(methylationDataList) # Joins the wildtype and the mutant in a single object joined_data <- joinReplicates(methylationDataList[["WT"]], methylationDataList[["met1-3"]], FALSE)
This function takes a list of DMRs and attempts to merge DMRs while keeping the new DMRs statistically significant.
mergeDMRsIteratively( DMRs, minGap, respectSigns = TRUE, methylationData1, methylationData2, context = "CG", minProportionDifference = 0.4, minReadsPerCytosine = 4, pValueThreshold = 0.01, test = "fisher", alternative = "two.sided", parallel = FALSE, BPPARAM = NULL, cores = NULL )mergeDMRsIteratively( DMRs, minGap, respectSigns = TRUE, methylationData1, methylationData2, context = "CG", minProportionDifference = 0.4, minReadsPerCytosine = 4, pValueThreshold = 0.01, test = "fisher", alternative = "two.sided", parallel = FALSE, BPPARAM = NULL, cores = NULL )
DMRs |
the list of DMRs as a |
minGap |
DMRs separated by a gap of at least |
respectSigns |
logical value indicating whether to respect the sign when joining DMRs. |
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
context |
the context in which the DMRs are computed ( |
minProportionDifference |
two adjacent DMRs are merged only if the
difference in methylation proportion of the new DMR is higher than
|
minReadsPerCytosine |
two adjacent DMRs are merged only if the number of
reads per cytosine of the new DMR is higher than |
pValueThreshold |
two adjacent DMRs are merged only if the p-value of
the new DMR (see |
test |
the statistical test used to call DMRs ( |
alternative |
indicates the alternative hypothesis and must be one of
|
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the reduced list of DMRs as a GRanges object;
e.g. see computeDMRs
Nicolae Radu Zabet
filterDMRs, computeDMRs,
analyseReadsInsideRegionsForCondition and
DMRsNoiseFilterCG
# load the methylation data data(methylationDataList) #load the DMRs in CG context they were computed with minGap = 200 data(DMRsNoiseFilterCG) #merge the DMRs DMRsNoiseFilterCGLarger <- mergeDMRsIteratively(DMRsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") ## Not run: #set genomic coordinates where to compute DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute DMRs and remove gaps smaller than 200 bp DMRsNoiseFilterCG200 <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 1, minProportionDifference = 0.4, minGap = 200, minSize = 0, minReadsPerCytosine = 1, cores = 1) DMRsNoiseFilterCG0 <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 1, minProportionDifference = 0.4, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) DMRsNoiseFilterCG0Merged200 <- mergeDMRsIteratively(DMRsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") #check that all newley computed DMRs are identical print(all(DMRsNoiseFilterCG200 == DMRsNoiseFilterCG0Merged200)) ## End(Not run)# load the methylation data data(methylationDataList) #load the DMRs in CG context they were computed with minGap = 200 data(DMRsNoiseFilterCG) #merge the DMRs DMRsNoiseFilterCGLarger <- mergeDMRsIteratively(DMRsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") ## Not run: #set genomic coordinates where to compute DMRs regions <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute DMRs and remove gaps smaller than 200 bp DMRsNoiseFilterCG200 <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 1, minProportionDifference = 0.4, minGap = 200, minSize = 0, minReadsPerCytosine = 1, cores = 1) DMRsNoiseFilterCG0 <- computeDMRs(methylationDataList[["WT"]], methylationDataList[["met1-3"]], regions = regions, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", test = "score", pValueThreshold = 0.01, minCytosinesCount = 1, minProportionDifference = 0.4, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) DMRsNoiseFilterCG0Merged200 <- mergeDMRsIteratively(DMRsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, methylationDataList[["WT"]], methylationDataList[["met1-3"]], context = "CG", minProportionDifference=0.4, minReadsPerCytosine = 1, pValueThreshold=0.01, test="score",alternative = "two.sided") #check that all newley computed DMRs are identical print(all(DMRsNoiseFilterCG200 == DMRsNoiseFilterCG0Merged200)) ## End(Not run)
This function takes a list of PMDs and attempts to merge PMDs while keeping the new PMDs statistically significant.
mergePMDsIteratively( PMDs, minGap = 200, respectSigns = TRUE, methylationData, context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, parallel = FALSE, BPPARAM = NULL, cores = NULL )mergePMDsIteratively( PMDs, minGap = 200, respectSigns = TRUE, methylationData, context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, parallel = FALSE, BPPARAM = NULL, cores = NULL )
PMDs |
the list of PMDs as a |
minGap |
PMDs separated by a gap of at least |
respectSigns |
logical value indicating whether to respect the sign when joining PMDs. |
methylationData |
the methylation data in GRanges
(see |
context |
the context in which the PMDs are computed ( |
minReadsPerCytosine |
two adjacent PMDs are merged only if the number of
reads per cytosine of the new DMR is higher than |
minMethylation |
Numeric [0,1]; minimum mean methylation proportion. |
maxMethylation |
Numeric [0,1]; maximum mean methylation proportion. |
parallel |
Logical; run in parallel if |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
the reduced list of PMDs as a GRanges object;
e.g. see computePMDs
Nicolae Radu Zabet and Young Jun Kim
filterPMDs, computePMDs,
analyseReadsInsideRegionsForCondition and
PMDsNoiseFilterCG
# load the ONT methylation data data(ontSampleGRangesList) # load the PMDs in CG context they were computed with minGap = 200 data(PMDsNoiseFilterCG) # merge the PMDs PMDsNoiseFilterCGLarger <- mergePMDsIteratively(PMDsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) ## Not run: # set genomic coordinates where to compute PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute PMDs and remove gaps smaller than 200 bp PMDsNoiseFilterCG200 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 200, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0Merged200 <- mergePMDsIteratively(PMDsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) #check that all newley computed PMDs are identical print(all(PMDsNoiseFilterCG200 == PMDsNoiseFilterCG0Merged200)) ## End(Not run)# load the ONT methylation data data(ontSampleGRangesList) # load the PMDs in CG context they were computed with minGap = 200 data(PMDsNoiseFilterCG) # merge the PMDs PMDsNoiseFilterCGLarger <- mergePMDsIteratively(PMDsNoiseFilterCG[1:100], minGap = 500, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) ## Not run: # set genomic coordinates where to compute PMDs chr1_ranges <- GRanges(seqnames = Rle("chr1"), ranges = IRanges(1E6+5E5,2E6)) # compute PMDs and remove gaps smaller than 200 bp PMDsNoiseFilterCG200 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 200, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0 <- computePMDs(ontSampleGRangesList[["GM18501"]], regions = chr1_ranges, context = "CG", method = "noise_filter", windowSize = 100, kernelFunction = "triangular", minCytosinesCount = 1, minMethylation = 0.4, maxMethylation = 0.6, minGap = 0, minSize = 0, minReadsPerCytosine = 1, cores = 1) PMDsNoiseFilterCG0Merged200 <- mergePMDsIteratively(PMDsNoiseFilterCG0, minGap = 200, respectSigns = TRUE, ontSampleGRangesList[["GM18501"]], context = "CG", minReadsPerCytosine = 4, minMethylation = 0.4, maxMethylation = 0.6, cores = 1) #check that all newley computed PMDs are identical print(all(PMDsNoiseFilterCG200 == PMDsNoiseFilterCG0Merged200)) ## End(Not run)
A GRangesList object containing the methylation data at each cytosine
location in the genome in Wild Type (WT) and met1-3 mutant (met1-3) in
Arabidopsis thaliana. The data only contains the first 1 Mbp from Chromosome 3.
The GRanges elements contain four metadata columns
the context in which the DMRs are computed ("CG",
"CHG" or "CHH").
the number of methylated reads.
the total number of reads.
the specific context of the cytosine (H is replaced by the actual nucleotide).
Each element was created by by calling readBismark
function on the CX report files generated by Bismark
http://www.bioinformatics.babraham.ac.uk/projects/bismark/
for http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM980986 dataset
in the case of Wild Type (WT) and
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM981032
dataset in the case of met1-3 mutant (met1-3).
Partially methylated domains called on chr1 in GM18870 cells called in 1Kb
bins with computePMDs function.
The GRanges elements contain seven metadata columns:
the context in which the PMDs was computed ("CG",
"CHG" or "CHH").
the number of methylated reads.
the total number of reads.
the proportion methylated reads filtered between
minMethylation and maxMethylation.
the number of cytosines in the PMDs.
data from https://genome.cshlp.org/content/34/11/2061.
A GRanges object containing cytosine sites, annotated with
per-site ONT methylation calls
The GRanges elements contain four
additional metadata columns:
comma-delimited read‐indices called modified
comma-delimited read‐indices covering but unmodified
integer count of modified reads per site
integer count of same‐strand reads covering each site
data from https://genome.cshlp.org/content/34/11/2061.
A GRangesList object containing the methylation data at each cytosine
location in the genome in GM18501 and GM18876 B-Lymphocyte cell lines in
Homo sapiens. The data only contains the 1.5 ~ 2 Mbp from Chromosome 1.
The GRanges elements contain six metadata columns
the context in which the DMRs are computed "CG".
the specific context of the cytosine (H is replaced by the actual nucleotide).
comma-delimited read‐indices called modified
comma-delimited read‐indices covering but unmodified
the number of methylated reads.
the total number of reads.
Each element was created by calling bam files with readONTbam
function which in DMRcaller package.
The sample pod5 files were from the nanopore dataset from 1000 genome project
https://pmc.ncbi.nlm.nih.gov/articles/PMC10942501/.
https://s3.amazonaws.com/1000g-ont/index.html?prefix=pod5_data/GM18501_R9/
.pod5 files in the case of GM18501 and
https://s3.amazonaws.com/1000g-ont/index.html?prefix=pod5_data/GM18876_R9/
.pod5 files in the case of GM18876 cell line.
For base-calling and alignment, run dorado, ver.0.9.6
https://github.com/nanoporetech/dorado?tab=readme-ov-file#dna-models
to generate the bam files with [email protected] as basecalling model.
This function plots the methylation profile at one locus for the bisulfite
sequencing data.The points on the graph represent methylation proportion of
individual cytosines, their colour which sample they belong to and the
intesity of the the colour how many reads that particular cytosine had. This
means that darker colors indicate stronger evidence that the corresponding
cytosine has the corresponding methylation proportion, while lighter colors
indicate a weaker evidence. The solid lines represent the smoothed profiles
and the intensity of the line the coverage at the corresponding position
(darker colors indicate more reads while lighter ones less reads). The boxes
on top represent the DMRs, where a filled box will represent a DMR which
gained methylation while a box with a pattern represent a DMR that lost
methylation. The DMRs need to have a metadafield "regionType" which
can be either "gain" (where there is more methylation in condition 2
compared to condition 1) or "loss" (where there is less methylation in
condition 2 compared to condition 1). In case this metadafield is missing all
DMRs are drawn using a filled box. Finally, we also allow annotation of the
DNA sequence. We represent by a black boxes all the exons, which are joined
by a horizontal black line, thus, marking the full body of the gene. With
grey boxes we mark the transposable elements. Both for genes and transposable
elements we plot them over a mid line if they are on the positive strand and
under the mid line if they are on the negative strand.
plotLocalMethylationProfile( methylationData1, methylationData2, region, DMRs = NULL, conditionsNames = NULL, gff = NULL, windowSize = 150, context = "CG", labels = NULL, col = NULL, main = "", plotMeanLines = TRUE, plotPoints = TRUE )plotLocalMethylationProfile( methylationData1, methylationData2, region, DMRs = NULL, conditionsNames = NULL, gff = NULL, windowSize = 150, context = "CG", labels = NULL, col = NULL, main = "", plotMeanLines = TRUE, plotPoints = TRUE )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
region |
a |
DMRs |
a |
conditionsNames |
the names of the two conditions. This will be used to plot the legend. |
gff |
a |
windowSize |
the size of the triangle base used to smooth the average methylation profile. |
context |
the context in which the DMRs are computed ( |
labels |
a |
col |
a |
main |
a |
plotMeanLines |
a |
plotPoints |
a |
Invisibly returns NULL
Nicolae Radu Zabet
# load the methylation data data(methylationDataList) # load the gene annotation data data(GEs) #select the genes genes <- GEs[which(GEs$type == "gene")] # the coordinates of the area to be plotted chr3Reg <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(510000,530000)) # load the DMRs in CG context data(DMRsNoiseFilterCG) DMRsCGlist <- list("noise filter"=DMRsNoiseFilterCG) # plot the CG methylation par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotLocalMethylationProfile(methylationDataList[["WT"]], methylationDataList[["met1-3"]], chr3Reg, DMRsCGlist, c("WT", "met1-3"), GEs, windowSize=100, main="CG methylation")# load the methylation data data(methylationDataList) # load the gene annotation data data(GEs) #select the genes genes <- GEs[which(GEs$type == "gene")] # the coordinates of the area to be plotted chr3Reg <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(510000,530000)) # load the DMRs in CG context data(DMRsNoiseFilterCG) DMRsCGlist <- list("noise filter"=DMRsNoiseFilterCG) # plot the CG methylation par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotLocalMethylationProfile(methylationDataList[["WT"]], methylationDataList[["met1-3"]], chr3Reg, DMRsCGlist, c("WT", "met1-3"), GEs, windowSize=100, main="CG methylation")
This function plots the coverage for the bisulfite sequencing data.
plotMethylationDataCoverage( methylationData1, methylationData2 = NULL, breaks, regions = NULL, conditionsNames = NULL, context = "CG", proportion = TRUE, labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = FALSE )plotMethylationDataCoverage( methylationData1, methylationData2 = NULL, breaks, regions = NULL, conditionsNames = NULL, context = "CG", proportion = TRUE, labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = FALSE )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
breaks |
a |
regions |
a |
conditionsNames |
a vector of character with the names of the conditions
for |
context |
the context in which the DMRs are computed ( |
proportion |
a |
labels |
a |
col |
a |
pch |
the R symbols used to plot the data. It needs to contain a minimum
of 2 symbols per condition. If not or if |
lty |
the line types used to plot the data. It needs to contain a
minimum of 2 line types per condition. If not or if |
contextPerRow |
a |
This function plots the proportion of cytosines in a specific context that have at least a certain number of reads (x-axis)
Invisibly returns NULL
Nicolae Radu Zabet
computeMethylationDataCoverage,
methylationDataList
# load the methylation data data(methylationDataList) # plot the coverage in CG context par(mar=c(4, 4, 3, 1)+0.1) plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## Not run: # plot the coverage in all three contexts plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = 1:15, regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## End(Not run)# load the methylation data data(methylationDataList) # plot the coverage in CG context par(mar=c(4, 4, 3, 1)+0.1) plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## Not run: # plot the coverage in all three contexts plotMethylationDataCoverage(methylationDataList[["WT"]], methylationDataList[["met1-3"]], breaks = 1:15, regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), proportion = TRUE, labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## End(Not run)
This function plots the correlation of methylation levels for Cytosines located at a certain distance apart.
plotMethylationDataSpatialCorrelation( methylationData1, methylationData2 = NULL, distances, regions = NULL, conditionsNames = NULL, context = "CG", labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = FALSE, log = "" )plotMethylationDataSpatialCorrelation( methylationData1, methylationData2 = NULL, distances, regions = NULL, conditionsNames = NULL, context = "CG", labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = FALSE, log = "" )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
distances |
a |
regions |
a |
conditionsNames |
a vector of character with the names of the conditions
for |
context |
the context in which the DMRs are computed ( |
labels |
a |
col |
a |
pch |
the R symbols used to plot the data. It needs to contain a minimum
of 2 symbols per condition. If not or if |
lty |
the line types used to plot the data. It needs to contain a
minimum of 2 line types per condition. If not or if |
contextPerRow |
a |
log |
a |
This function plots the proportion of cytosines in a specific context that have at least a certain number of reads (x-axis)
Invisibly returns NULL
Nicolae Radu Zabet
computeMethylationDataSpatialCorrelation,
methylationDataList
# load the methylation data data(methylationDataList) # plot the spatial correlation in CG context par(mar=c(4, 4, 3, 1)+0.1) plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], methylationDataList[["met1-3"]], distances = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## Not run: # plot the spatial correlation in all three contexts plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], methylationDataList[["met1-3"]], distances = c(1,5,10,15,20,50,100,150,200,500,1000), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE, log="x") ## End(Not run)# load the methylation data data(methylationDataList) # plot the spatial correlation in CG context par(mar=c(4, 4, 3, 1)+0.1) plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], methylationDataList[["met1-3"]], distances = c(1,5,10,15), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE) ## Not run: # plot the spatial correlation in all three contexts plotMethylationDataSpatialCorrelation(methylationDataList[["WT"]], methylationDataList[["met1-3"]], distances = c(1,5,10,15,20,50,100,150,200,500,1000), regions = NULL, conditionsNames = c("WT","met1-3"), context = c("CG", "CHG", "CHH"), labels = LETTERS, col = NULL, pch = c(1,0,16,2,15,17), lty = c(4,1,3,2,6,5), contextPerRow = FALSE, log="x") ## End(Not run)
This function plots the low resolution profiles for the bisulfite sequencing data.
plotMethylationProfile( methylationProfiles, autoscale = FALSE, labels = NULL, title = "", col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5) )plotMethylationProfile( methylationProfiles, autoscale = FALSE, labels = NULL, title = "", col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5) )
methylationProfiles |
a |
autoscale |
a |
labels |
a |
title |
the plot title. |
col |
a |
pch |
the R symbols used to plot the data. |
lty |
the line types used to plot the data. |
Invisibly returns NULL
Nicolae Radu Zabet
plotMethylationProfileFromData,
computeMethylationProfile and methylationDataList
# load the methylation data data(methylationDataList) # the region where to compute the profile region <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute low resolution profile in 20 Kb windows lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 20000, context = "CG") lowResProfilsCG <- GRangesList("WT" = lowResProfileWTCG) #plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotMethylationProfile(lowResProfilsCG, autoscale = FALSE, title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) ## Not run: # compute low resolution profile in 10 Kb windows in CG context lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") lowResProfileCG <- GRangesList("WT" = lowResProfileWTCG, "met1-3" = lowResProfileMet13CG) # compute low resolution profile in 10 Kb windows in CHG context lowResProfileWTCHG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CHG") lowResProfileMet13CHG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CHG") lowResProfileCHG <- GRangesList("WT" = lowResProfileWTCHG, "met1-3" = lowResProfileMet13CHG) # plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(2,1)) plotMethylationProfile(lowResProfileCG, autoscale = FALSE, labels = LETTERS[1], title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) plotMethylationProfile(lowResProfileCHG, autoscale = FALSE, labels = LETTERS[2], title="CHG methylation on Chromosome 3", col=c("#0072B2", "#56B4E9"), pch = c(16,2), lty = c(3,2)) ## End(Not run)# load the methylation data data(methylationDataList) # the region where to compute the profile region <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) # compute low resolution profile in 20 Kb windows lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 20000, context = "CG") lowResProfilsCG <- GRangesList("WT" = lowResProfileWTCG) #plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(1,1)) plotMethylationProfile(lowResProfilsCG, autoscale = FALSE, title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) ## Not run: # compute low resolution profile in 10 Kb windows in CG context lowResProfileWTCG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CG") lowResProfileMet13CG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CG") lowResProfileCG <- GRangesList("WT" = lowResProfileWTCG, "met1-3" = lowResProfileMet13CG) # compute low resolution profile in 10 Kb windows in CHG context lowResProfileWTCHG <- computeMethylationProfile(methylationDataList[["WT"]], region, windowSize = 10000, context = "CHG") lowResProfileMet13CHG <- computeMethylationProfile( methylationDataList[["met1-3"]], region, windowSize = 10000, context = "CHG") lowResProfileCHG <- GRangesList("WT" = lowResProfileWTCHG, "met1-3" = lowResProfileMet13CHG) # plot the low resolution profile par(mar=c(4, 4, 3, 1)+0.1) par(mfrow=c(2,1)) plotMethylationProfile(lowResProfileCG, autoscale = FALSE, labels = LETTERS[1], title="CG methylation on Chromosome 3", col=c("#D55E00","#E69F00"), pch = c(1,0), lty = c(4,1)) plotMethylationProfile(lowResProfileCHG, autoscale = FALSE, labels = LETTERS[2], title="CHG methylation on Chromosome 3", col=c("#0072B2", "#56B4E9"), pch = c(16,2), lty = c(3,2)) ## End(Not run)
This function plots the low resolution profiles for all bisulfite sequencing data.
plotMethylationProfileFromData( methylationData1, methylationData2 = NULL, regions = NULL, conditionsNames = NULL, context = "CG", windowSize = NULL, autoscale = FALSE, labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = TRUE )plotMethylationProfileFromData( methylationData1, methylationData2 = NULL, regions = NULL, conditionsNames = NULL, context = "CG", windowSize = NULL, autoscale = FALSE, labels = NULL, col = NULL, pch = c(1, 0, 16, 2, 15, 17), lty = c(4, 1, 3, 2, 6, 5), contextPerRow = TRUE )
methylationData1 |
the methylation data in condition 1
(see |
methylationData2 |
the methylation data in condition 2
(see |
regions |
a |
conditionsNames |
the names of the two conditions. This will be used to plot the legend. |
context |
a |
windowSize |
a |
autoscale |
a |
labels |
a |
col |
a |
pch |
the R symbols used to plot the data It needs to contain a minimum
of 2 symbols per condition. If not or if |
lty |
the line types used to plot the data. It needs to contain a
minimum of 2 line types per condition. If not or if |
contextPerRow |
a |
Invisibly returns NULL
Nicolae Radu Zabet
plotMethylationProfile,
computeMethylationProfile and methylationDataList
# load the methylation data data(methylationDataList) #plot the low resolution profile at 10 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 20000, autoscale = TRUE, context = c("CHG")) ## Not run: #plot the low resolution profile at 5 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 5000, autoscale = TRUE, context = c("CG", "CHG", "CHH"), labels = LETTERS) ## End(Not run)# load the methylation data data(methylationDataList) #plot the low resolution profile at 10 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 20000, autoscale = TRUE, context = c("CHG")) ## Not run: #plot the low resolution profile at 5 Kb resolution par(mar=c(4, 4, 3, 1)+0.1) plotMethylationProfileFromData(methylationDataList[["WT"]], methylationDataList[["met1-3"]], conditionsNames=c("WT", "met1-3"), windowSize = 5000, autoscale = TRUE, context = c("CG", "CHG", "CHH"), labels = LETTERS) ## End(Not run)
This function plots the distribution of a set of subregions on a large region.
plotOverlapProfile( overlapsProfiles1, overlapsProfiles2 = NULL, names = NULL, labels = NULL, col = NULL, title = "", logscale = FALSE, maxValue = NULL )plotOverlapProfile( overlapsProfiles1, overlapsProfiles2 = NULL, names = NULL, labels = NULL, col = NULL, title = "", logscale = FALSE, maxValue = NULL )
overlapsProfiles1 |
a |
overlapsProfiles2 |
a |
names |
a |
labels |
a |
col |
a |
title |
the title of the plot. |
logscale |
a |
maxValue |
a maximum value in a region. Used for the colour scheme. |
Invisibly returns NULL.
Nicolae Radu Zabet
computeOverlapProfile, filterDMRs,
computeDMRs and mergeDMRsIteratively
# load the methylation data data(methylationDataList) # load the DMRs in CG context data(DMRsNoiseFilterCG) # the coordinates of the area to be plotted largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute overlaps distribution hotspotsHypo <- computeOverlapProfile(DMRsNoiseFilterCG, largeRegion, windowSize = 10000, binary = FALSE) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), names = c("hypomethylated"), title = "CG methylation") ## Not run: largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) hotspotsHypo <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "loss")], largeRegion, windowSize=2000, binary=TRUE, cores=1) hotspotsHyper <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "gain")], largeRegion, windowSize=2000, binary=TRUE, cores=1) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), GRangesList("Chr3"=hotspotsHyper), names=c("loss", "gain"), title="CG methylation") ## End(Not run)# load the methylation data data(methylationDataList) # load the DMRs in CG context data(DMRsNoiseFilterCG) # the coordinates of the area to be plotted largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E5)) # compute overlaps distribution hotspotsHypo <- computeOverlapProfile(DMRsNoiseFilterCG, largeRegion, windowSize = 10000, binary = FALSE) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), names = c("hypomethylated"), title = "CG methylation") ## Not run: largeRegion <- GRanges(seqnames = Rle("Chr3"), ranges = IRanges(1,1E6)) hotspotsHypo <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "loss")], largeRegion, windowSize=2000, binary=TRUE, cores=1) hotspotsHyper <- computeOverlapProfile( DMRsNoiseFilterCG[(DMRsNoiseFilterCG$regionType == "gain")], largeRegion, windowSize=2000, binary=TRUE, cores=1) plotOverlapProfile(GRangesList("Chr3"=hotspotsHypo), GRangesList("Chr3"=hotspotsHyper), names=c("loss", "gain"), title="CG methylation") ## End(Not run)
A GRangesList object containing the PMDs between GM18501 and
GM18876 B-Lymphocyte cell lines in Homo sapiens
(see ontSampleGRangesList). The PMDs were computed on the
1.5 ~ 2 Mbp from Chromosome 1 with bins method using binSize of 1 kbp
The GRanges element contain 5 metadata columns;
see computePMDs
filterPMDs, computePMDs,
analyseReadsInsideRegionsForConditionPMD
and mergePMDsIteratively
A GRangesList object containing the PMDs between GM18501 and
GM18876 B-Lymphocyte cell lines in Homo sapiens
(see ontSampleGRangesList). The PMDs were computed on the
1.5 ~ 2 Mbp from Chromosome 1 with noise filter method using a triangular kernel
and a windowSize of 100 bp
The GRanges element contain 5 metadata columns;
see computePMDs
filterPMDs, computePMDs,
analyseReadsInsideRegionsForConditionPMD
and mergePMDsIteratively
This function pools together multiple methylation datasets.
poolMethylationDatasets(methylationDataList)poolMethylationDatasets(methylationDataList)
methylationDataList |
a |
the methylation data stored as a GRanges
object with four metadata columns (see methylationDataList).
If the Granges are from ONT datasets, its have six metedata columns include as
ONT_Cm and ONT_C (see readONTbam).
Nicolae Radu Zabet updated by Young Jun Kim
# load methylation data object data(methylationDataList) # pools the two datasets together pooledMethylationData <- poolMethylationDatasets(methylationDataList)# load methylation data object data(methylationDataList) # pools the two datasets together pooledMethylationData <- poolMethylationDatasets(methylationDataList)
This function pools together two methylation datasets.
poolTwoMethylationDatasets( methylationData1, methylationData2, sample1_name = NULL, sample2_name = NULL )poolTwoMethylationDatasets( methylationData1, methylationData2, sample1_name = NULL, sample2_name = NULL )
methylationData1 |
a |
methylationData2 |
a |
sample1_name |
the label used for sample 1. |
sample2_name |
the label used for sample 2. |
the methylation data stored as a GRanges
object with four metadata columns (see methylationDataList).
If the Granges are from ONT datasets, its have six metedata columns include as
ONT_Cm and ONT_C (see readONTbam).
Nicolae Radu Zabet updated by Young Jun Kim
# load methylation data object data(methylationDataList) # save the two datasets together pooledMethylationData <- poolTwoMethylationDatasets(methylationDataList[[1]], methylationDataList[[2]])# load methylation data object data(methylationDataList) # save the two datasets together pooledMethylationData <- poolTwoMethylationDatasets(methylationDataList[[1]], methylationDataList[[2]])
This function takes as input a CX report file produced by Bismark
and returns a GRanges object with four metadata columns
The file represents the bisulfite sequencing methylation data.
readBismark(file)readBismark(file)
file |
The filename (including path) of the methylation (CX report generated by Bismark) to be read. |
the methylation data stored as a GRanges
object with four metadata columns (see methylationDataList).
Nicolae Radu Zabet and Jonathan Michael Foonlan Tsang
# load methylation data object data(methylationDataList) # save the one datasets into a file saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report") # load the data methylationDataWT <- readBismark("chr3test_a_thaliana_wt.CX_report") #check that the loading worked all(methylationDataWT == methylationDataList[["WT"]])# load methylation data object data(methylationDataList) # save the one datasets into a file saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report") # load the data methylationDataWT <- readBismark("chr3test_a_thaliana_wt.CX_report") #check that the loading worked all(methylationDataWT == methylationDataList[["WT"]])
This function takes as input a vector of CX report file produced by Bismark
and returns a GRanges object with four metadata columns
(see methylationDataList). The file represents the pooled
bisulfite sequencing data.
readBismarkPool(files)readBismarkPool(files)
files |
The filenames (including path) of the methylation (CX report generated with Bismark) to be read |
the methylation data stored as a GRanges
object with four metadata columns (see methylationDataList).
Nicolae Radu Zabet and Jonathan Michael Foonlan Tsang
# load methylation data object data(methylationDataList) # save the two datasets saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report") saveBismark(methylationDataList[["met1-3"]], "chr3test_a_thaliana_met13.CX_report") # reload the two datasets and pool them filenames <- c("chr3test_a_thaliana_wt.CX_report", "chr3test_a_thaliana_met13.CX_report") methylationDataPool <- readBismarkPool(filenames)# load methylation data object data(methylationDataList) # save the two datasets saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report") saveBismark(methylationDataList[["met1-3"]], "chr3test_a_thaliana_met13.CX_report") # reload the two datasets and pool them filenames <- c("chr3test_a_thaliana_wt.CX_report", "chr3test_a_thaliana_met13.CX_report") methylationDataPool <- readBismarkPool(filenames)
readONTbam() takes an indexed Nanopore BAM file with MM/ML tags,
decodes each read’s per-C modification probabilities, and overlays
them on a GRanges of candidate cytosine sites.
It returns a copy of ref_gr augmented with:
ONT_Cm — comma-delimited read‐indices called “modified”
ONT_C — comma-delimited read‐indices covering but _not_ modified
readsM — count of modified reads at each site
readsN — total same-strand coverage at each site
You can either supply your own ref_gr (e.g.\ from selectCytosine())
or leave it NULL and pass context, chr, region
to build ref_gr on the fly.
readONTbam( bamfile, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.5, genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = NULL, region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL, cores = NULL )readONTbam( bamfile, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.5, genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = NULL, region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL, cores = NULL )
bamfile |
Path to an indexed ONT BAM with MM/ML tags. |
ref_gr |
A |
modif |
Character vector of MM codes to treat as “modified”
(e.g. |
prob_thresh |
Numeric in \[0,1\] — minimum ML probability to call a read “modified.” |
genome |
A |
context |
Sequence context for |
chr |
Chromosome names to restrict |
region |
A |
synonymous |
Logical (default: FALSE). If TRUE, include modified calls that match the specified context sequence (e.g. CGG), even if the site was previously excluded due to deletion or mismatch. For example, if a deletion occurs at position 234523, but the surrounding context still forms CGG, then the modified C at 234523 will be retained (Nmod=1). |
parallel |
Logical. If TRUE, automatically detect your system condition and decoding will use parallel threads via BiocParallel::. If FALSE (default), decoding is done serially. |
BPPARAM |
A |
cores |
Integer number of workers (must not exceed BPPARAM$workers). This value will automatically set as the maximum number of system workers, also able to set as manually. |
This function read and annotate ONT MM/ML tags against a cytosine reference
A GRanges of the same length as ref_gr, with four
additional metadata columns:
comma-delimited read‐indices called modified
comma-delimited read‐indices covering but unmodified
integer count of modified reads per site
integer count of same‐strand reads covering each site
Nicolae Radu Zabet and Young Jun Kim
selectCytosine, computeDMRs,
computePMDs, computeCoMethylatedPositions,
computeCoMethylatedRegions, filterVMRsONT,
ontSampleGRangesList, scanBamChr1Random5
## Not run: library(DMRcaller) library(BSgenome.Hsapiens.UCSC.hg38) # set the bam file directory bam_path <- system.file("extdata", "scanBamChr1Random5.bam", package="DMRcaller") # read ONTbam file (chromosome 1 only) in CG context with BSgenome.Hsapiens.UCSC.hg38 ONTSampleGRanges <- readONTbam(bamfile = bam_path, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.50,genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = "chr1", region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL) ## End(Not run)## Not run: library(DMRcaller) library(BSgenome.Hsapiens.UCSC.hg38) # set the bam file directory bam_path <- system.file("extdata", "scanBamChr1Random5.bam", package="DMRcaller") # read ONTbam file (chromosome 1 only) in CG context with BSgenome.Hsapiens.UCSC.hg38 ONTSampleGRanges <- readONTbam(bamfile = bam_path, ref_gr = NULL, modif = "C+m?", prob_thresh = 0.50,genome = BSgenome.Hsapiens.UCSC.hg38, context = "CG", chr = "chr1", region = NULL, synonymous = FALSE, parallel = FALSE, BPPARAM = NULL) ## End(Not run)
This function takes as input a GRanges object generated with
readBismark and saves the output to a file using
Bismark CX report format.
saveBismark(methylationData, filename)saveBismark(methylationData, filename)
methylationData |
the methylation data stored as a |
filename |
the filename where the data will be saved. |
Invisibly returns NULL
Nicolae Radu Zabet
# load methylation data object data(methylationDataList) # save one dataset to a file saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report")# load methylation data object data(methylationDataList) # save one dataset to a file saveBismark(methylationDataList[["WT"]], "chr3test_a_thaliana_wt.CX_report")
A .bam file containing the basecalling result of 5 sequences
containing with the MM, ML tag from dorado.
A .bam object
The object was created by base-calling and alignment by dorado, ver.0.9.6
https://github.com/nanoporetech/dorado?tab=readme-ov-file#dna-models
to generate the bam files with [email protected] as basecalling model
from GM18501 cell line https://s3.amazonaws.com/1000g-ont/index.html?prefix=pod5_data/GM18501_R9/.
To subset, call the bam file by scanBam function from
Rsamtools package and randomly select the 5 sequences.
and repackaging the subsetted list to the bam file for test running.
Constructs a GRanges of all cytosine positions
in the specified BSgenome (or BSgenome package name), optionally filtering
by methylation context ("CG", "CHG", "CHH"), by
chromosome, and by genomic region.
selectCytosine( genome = BSgenome.Hsapiens.UCSC.hg38, context = c("CG", "CHG", "CHH"), chr = NULL, region = NULL )selectCytosine( genome = BSgenome.Hsapiens.UCSC.hg38, context = c("CG", "CHG", "CHH"), chr = NULL, region = NULL )
genome |
A |
context |
A character vector of one or more methylation contexts to include:
|
chr |
An optional character vector of chromosome names to restrict the
enumeration. If |
region |
An optional |
This function enumerate cytosine positions in a BSgenome reference
A GRanges object with one 1-bp range per
cytosine, and two metadata columns:
A factor indicating the context ("CG",
"CHG", or "CHH").
Character or factor giving the
surrounding trinucleotide sequence (or NA for "CG").
Nicolae Radu Zabet and Young Jun Kim
readONTbam, ontSampleGRangesList
## Not run: library(BSgenome.Hsapiens.UCSC.hg38) # Only CpG sites on chr1 and chr2: gr_chr1_2 <- selectCytosine(context="CG", chr=c("chr1","chr2")) # CHH sites in a specific region on chr3: my_region <- GRanges("chr3", IRanges(1e6, 1e6 + 1e5)) gr_region <- selectCytosine(context="CHH", chr="chr3", region=my_region) ## End(Not run)## Not run: library(BSgenome.Hsapiens.UCSC.hg38) # Only CpG sites on chr1 and chr2: gr_chr1_2 <- selectCytosine(context="CG", chr=c("chr1","chr2")) # CHH sites in a specific region on chr3: my_region <- GRanges("chr3", IRanges(1e6, 1e6 + 1e5)) gr_region <- selectCytosine(context="CHH", chr="chr3", region=my_region) ## End(Not run)
A GRanges object containing simulated date for methylation in four
samples. The conditions assciated witch each sample are a, a, b and b.
A GRanges object containing multiple metadata
columns with the reads from each object passed as parameter
The object was created by calling joinReplicates
function.