Title: | Functions to conduct quality control analysis in methylation data |
---|---|
Description: | Functions for analyzing methylation data. Some functions are specific to single-cell methylation data, but most can be used with any methylation data. The highlight of this workflow is the comprehensive quality control report. |
Authors: | Divy Kangeyan <[email protected]> |
Maintainer: | Divy Kangeyan <[email protected]> |
License: | GPL-2 |
Version: | 1.27.0 |
Built: | 2025-01-17 05:44:44 UTC |
Source: | https://github.com/bioc/scmeth |
Plot the bisulfite conversion rate for each sample based on the pheno data in the bs object
bsConversionPlot(bs)
bs |
bsseq object |
Plot showing bisulfite conversion rate for each sample
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
bsConversionPlot(bs)
Provides coverage metrics for each sample by chromosome
chromosomeCoverage(bs)
bs |
bsseq object |
Matrix of chromosome coverage, with columns and rows indicating the samples and the chromosomes respectively
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
chromosomeCoverage(bs)
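The shape of the returned matrix can be illustrated with a small base-R sketch. This is not the scmeth implementation: the toy coverage matrix, the `chrom` vector, and the rule that a CpG counts as covered when it has at least one read are all assumptions made for illustration.

```r
# Toy data (hypothetical): 6 CpG loci on two chromosomes, 3 cells.
chrom <- c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2")
cov <- matrix(c(1, 0, 2,
                0, 0, 1,
                3, 1, 0,
                0, 2, 0,
                1, 1, 1,
                0, 0, 0),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("cell1", "cell2", "cell3")))

# Assumption: a CpG is "covered" in a cell when at least one read overlaps it.
covered <- cov > 0

# Count covered CpGs per chromosome; rows are chromosomes, columns are samples.
chromCov <- rowsum(covered * 1L, group = chrom)
print(chromCov)
```

The result has one row per chromosome and one column per cell, matching the layout described above.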
Provides Coverage for each cell in a library pool
coverage(bs, subSample = 1e+06, offset = 50000)
bs |
bsseq object |
subSample |
number of CpGs to subsample. Default value is 1000000. |
offset |
Number of CpGs to offset when subsampling. Default value is 50000, i.e. the first 50000 CpGs are ignored in subsampling. |
vector of coverage for the cells in bs object
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
coverage(bs)
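How `subSample` and `offset` interact can be sketched in base R on a toy coverage matrix. The helper `subsampleCoverage` is hypothetical and is not the scmeth code. It assumes that a contiguous block of `subSample` CpGs is taken after skipping the first `offset` CpGs, and that all CpGs are used when the data set has no more than `offset` rows.

```r
set.seed(1)
# Toy data (hypothetical): 200 CpGs x 3 cells, 1 = covered, 0 = not covered.
cov <- matrix(rbinom(200 * 3, size = 1, prob = 0.3), nrow = 200,
              dimnames = list(NULL, paste0("cell", 1:3)))

# Hypothetical helper: skip the first `offset` CpGs, then count covered CpGs
# per cell within the next `subSample` CpGs.
subsampleCoverage <- function(cov, subSample = 1e6, offset = 50000) {
  n <- nrow(cov)
  if (n <= offset) {
    # Assumed fallback: the data set is too small to skip `offset` rows.
    rows <- seq_len(n)
  } else {
    rows <- seq(offset + 1, min(n, offset + subSample))
  }
  colSums(cov[rows, , drop = FALSE] > 0)
}

covVec <- subsampleCoverage(cov, subSample = 50, offset = 100)
print(covVec)
```

Here rows 101 to 150 are used, so the result is a named vector with one coverage count per cell.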
Provides coverage by CpG density. CpG density is defined as the number of CpGs observed in a region of a given length in base pairs (the window length).
cpgDensity(bs, organism, windowLength = 1000, small = FALSE)
bs |
bsseq object |
organism |
scientific name of the organism of interest, e.g. Mmusculus or Hsapiens |
windowLength |
Length of the window, in base pairs, used to calculate the density. Default value is 1000. |
small |
Indicator for a small dataset. CpG density is calculated in a more memory-efficient way for large datasets, while a different, quicker method is used for small datasets. Default value is FALSE. |
Data frame with sample name and coverage in repeat masker regions
library(BSgenome.Hsapiens.NCBI.GRCh38)
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
cpgDensity(bs, Hsapiens, 1000, small=TRUE)
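The definition of CpG density as a count of CpGs per window can be illustrated with a minimal base-R sketch. The sequence, the window length of 10, and the use of fixed, non-overlapping windows are assumptions for illustration; scmeth itself works on a BSgenome object and may bin differently.

```r
# Toy sequence (hypothetical) and a short window so the result is easy to check.
dnaSeq <- "ACGTCGCGAATTCGGGCCCGATATCGCG"
windowLength <- 10L

# Start position of every "CG" dinucleotide in the sequence.
cpgPos <- as.integer(gregexpr("CG", dnaSeq, fixed = TRUE)[[1]])

# Assign each CpG to a fixed, non-overlapping window and count CpGs per window.
nWindows <- ceiling(nchar(dnaSeq) / windowLength)
windowIdx <- (cpgPos - 1L) %/% windowLength + 1L
density <- tabulate(windowIdx, nbins = nWindows)

# Density of the window that each individual CpG falls into.
densityPerCpG <- density[windowIdx]
print(density)
```

Grouping the covered CpGs of each cell by `densityPerCpG` would then give coverage by CpG density.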
In single-cell analysis, the overwhelming majority of CpGs have binary methylation. Due to errors in sequencing and amplification, however, many CpGs show non-binary methylation. This function therefore categorizes non-binary CpGs as methylated if the methylation is above 0.8 and as unmethylated if the methylation is below 0.2.
cpgDiscretization(bs, subSample = 1e+06, offset = 50000, coverageVec = NULL)
bs |
bsseq object |
subSample |
number of CpGs to subsample. Default value is 1000000. |
offset |
Number of CpGs to offset when subsampling. Default value is 50000, i.e. the first 50000 CpGs are ignored in subsampling. |
coverageVec |
If the coverage vector has already been calculated, provide it to speed up the process |
meth: discretized methylation matrix
discard: total number of removed CpGs from each sample
Percentage of CpGs discarded compared to the total number of CpGs
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
cpgDiscretization(bs)
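The thresholding rule described for this function can be sketched in base R on a toy methylation matrix. The helper `discretize` is hypothetical and is not the scmeth implementation. It assumes that CpGs with methylation between 0.2 and 0.8 (inclusive) are the ones discarded, and that CpGs without coverage (`NA`) are left untouched.

```r
# Toy methylation proportions (hypothetical): 3 CpGs x 3 cells, NA = no coverage.
meth <- matrix(c(1.00, 0.00, 0.90,
                 0.50, 0.10, NA,
                 0.85, 0.30, 0.00),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("cell1", "cell2", "cell3")))

# Hypothetical helper: call CpGs above `upper` methylated (1), CpGs below
# `lower` unmethylated (0), and discard the ambiguous ones in between.
discretize <- function(meth, upper = 0.8, lower = 0.2) {
  out <- meth
  out[which(meth > upper)] <- 1
  out[which(meth < lower)] <- 0
  ambiguous <- !is.na(meth) & meth >= lower & meth <= upper
  out[ambiguous] <- NA
  list(meth = out, discard = colSums(ambiguous))
}

res <- discretize(meth)
print(res$meth)
print(res$discard)
```

In this toy example one CpG is discarded in each of the first two cells and none in the third.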
Downsample the CpG coverage matrix for saturation analysis
downsample(bs, dsRates = c(0.01, 0.02, 0.05, seq(0.1, 0.9, 0.1)), subSample = 1e+06, offset = 50000)
bs |
bsseq object |
dsRates |
Downsampling rates, i.e. the probability of sampling a single CpG. The default is a vector of probabilities ranging from 0.01 to 0.90: 0.01 0.02 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90. For a more continuous saturation curve, dsRates can be extended with additional sampling rates. |
subSample |
Number of CpGs to subsample. Default value is 1000000. |
offset |
Number of CpGs to offset when subsampling. Default value is 50000, i.e. the first 50000 CpGs are ignored in subsampling. |
Data frame with the CpG coverage for each sample at each sampling rate
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
scmeth::downsample(bs)
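The idea behind a saturation analysis can be sketched in base R. This is an illustration under stated assumptions, not the scmeth implementation: the toy coverage matrix and the helper `downsampleOnce` are hypothetical, and downsampling is modeled as binomial thinning, where each read is kept independently with probability equal to the sampling rate.

```r
set.seed(42)
# Toy read-count matrix (hypothetical): 100 CpGs x 3 cells.
cov <- matrix(rpois(300, lambda = 1), ncol = 3,
              dimnames = list(NULL, paste0("cell", 1:3)))

# Same rates as the documented default for dsRates.
dsRates <- c(0.01, 0.02, 0.05, seq(0.1, 0.9, 0.1))

# Hypothetical helper: keep each read with probability `rate`, then count
# the CpGs per cell that still have at least one read.
downsampleOnce <- function(cov, rate) {
  thinned <- matrix(rbinom(length(cov), size = cov, prob = rate),
                    nrow = nrow(cov), dimnames = dimnames(cov))
  colSums(thinned > 0)
}

# One row per sampling rate, one column per cell.
saturation <- t(sapply(dsRates, function(r) downsampleOnce(cov, r)))
rownames(saturation) <- dsRates
print(saturation)
```

Plotting each column against the sampling rate gives a saturation curve: a curve that has flattened suggests that additional sequencing would cover few new CpGs.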
Provides coverage metrics for each sample across the genomic features specified by the user
featureCoverage(bs, features, genomebuild)
bs |
bsseq object |
features |
List of genomic features, e.g. genes_exons, genes_introns, cpg_islands, cpg_shelves. Names are based on the annotatr package, so all features provided by annotatr are supported in this function. |
genomebuild |
reference alignment, e.g. mm10 or hg38 |
A data frame with genomic feature names and the number of CpGs covered in each feature
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
featureCoverage(bs, c('cpg_islands', 'cpg_shores'), 'hg38')
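Counting covered CpGs per genomic feature amounts to an interval overlap, which can be sketched in base R. The positions and feature coordinates below are made up, and a single chromosome is assumed; scmeth relies on annotatr annotations rather than a hand-written table like this one.

```r
# Toy data (hypothetical): positions of covered CpGs on one chromosome.
cpgPos <- c(5, 12, 18, 40, 55, 90)

# Hypothetical feature table with inclusive start and end coordinates.
features <- data.frame(name  = c("cpg_islands", "cpg_shores"),
                       start = c(1, 30),
                       end   = c(20, 60))

# Count the covered CpGs that fall inside each feature.
features$nCpG <- vapply(seq_len(nrow(features)), function(i) {
  sum(cpgPos >= features$start[i] & cpgPos <= features$end[i])
}, integer(1))

print(features[, c("name", "nCpG")])
```

The resulting data frame has one row per feature with its CpG count, similar in layout to the value described above.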
Plot the methylation at each position of the read to observe any biases in the methylation based on the read position
mbiasplot(dir = NULL, mbiasFiles = NULL)
dir |
directory name with mbias files |
mbiasFiles |
list of mbias files |
Returns a list containing the methylation across the read position in original top and original bottom strand both in forward and reverse reads for multiple samples
mbiasFile <- '2017-04-21_HG23KBCXY_2_AGGCAGAA_TATCTC_pe.M-bias.txt'
mbiasplot(mbiasFiles=system.file("extdata", mbiasFile, package='scmeth'))
Plot the methylation distribution for the cells in bsseq object
methylationDist(bs)
bs |
bsseq object |
mean methylation for each sample
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
methylationDist(bs)
Plot the mapped and unmapped reads
readmetrics(bs)
bs |
bsseq object |
Plot showing the mapped and unmapped read information for each cell
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
readmetrics(bs)
Provides Coverage metrics in the repeat masker region
repMask(bs, organism, genome)
bs |
bsseq object |
organism |
scientific name of the organism of interest, e.g. Mmusculus or Hsapiens |
genome |
reference alignment, e.g. mm10 or hg38 |
Data frame with sample name and coverage in repeat masker regions
library(BSgenome.Mmusculus.UCSC.mm10)
library(AnnotationHub)
load(system.file("extdata", 'bsObject.rda', package='scmeth'))
repMask(bs, Mmusculus, 'mm10')
This function uses most of the functions in this package to generate a report for the user
report(bsObj, outdirectory, organism, genome, mbiasDir = NULL, subSample = 1e+06, offset = 50000, small = FALSE)
bsObj |
bsseq object |
outdirectory |
name of the output directory where the report will be saved |
organism |
scientific name of the organism of interest, e.g. Mmusculus or Hsapiens |
genome |
Reference alignment, e.g. mm10 or hg38. The report will include graphics on read information. |
mbiasDir |
Optional argument giving either the name of the directory that contains the mbias files or a list of mbias files |
subSample |
Number of CpGs to subsample. Default value is 1000000. |
offset |
Number of CpGs to offset when subsampling. Default value is 50000, i.e. the first 50000 CpGs are ignored in subsampling. |
small |
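Indicator for a small dataset. CpG density is calculated in a more memory-efficient way for large datasets, while a different, quicker method is used for small datasets. |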
Report will be an html file
library(BSgenome.Hsapiens.NCBI.GRCh38)
directory <- system.file("extdata/bismark_data", package='scmeth')
bs <- HDF5Array::loadHDF5SummarizedExperiment(directory)
mbiasDirectory <- system.file("extdata", package='scmeth')
outDir <- system.file(package='scmeth')
report(bs, outDir, Hsapiens, 'hg38', mbiasDir=mbiasDirectory, small=TRUE)
scmeth: a package to conduct quality control analysis for methylation data. Most functions can be applied to both bulk and single-cell methylation data, while others are specific to single-cell methylation data. scmeth is especially customized to use the output of the FireCloud implementation of the methylation pipeline to produce a comprehensive quality control report.