Title: | Analysis tools for Single Molecule Footprinting (SMF) data |
---|---|
Description: | SingleMoleculeFootprinting provides functions to analyze Single Molecule Footprinting (SMF) data. Following the workflow exemplified in its vignette, the user will be able to perform basic data analysis of SMF data with minimal coding effort. Starting from an aligned bam file, we show how to perform quality controls over sequencing libraries, extract methylation information at the single molecule level accounting for the two possible kinds of SMF experiments (single enzyme or double enzyme), classify single molecules based on their patterns of molecular occupancy, and plot SMF information at a given genomic location. |
Authors: | Guido Barzaghi [aut, cre] , Arnaud Krebs [aut] , Mike Smith [ctb] |
Maintainer: | Guido Barzaghi <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0 |
Built: | 2024-10-31 05:58:16 UTC |
Source: | https://github.com/bioc/SingleMoleculeFootprinting |
For each TFBS, the genomic neighborhood defined by max_cluster_width will be scanned for adjacent TFBSs. The hits will be filtered for min_intersite_distance where, in case of overlapping TFBSs, the second TFBS will be arbitrarily dropped. These TFBSs plus the central "anchoring" one will define a TFBS cluster. This approach implies that the same TFBS can be employed to design multiple clusters in a sliding-window fashion.
Arrange_TFBSs_clusters( TFBSs, max_intersite_distance = 75, min_intersite_distance = 15, max_cluster_size = 6, max_cluster_width = 300, add.single.TFs = TRUE )
TFBSs |
GRanges object of TFBSs |
max_intersite_distance |
maximum allowed distance in base pairs between two TFBS centers for them to be considered part of the same cluster. Defaults to 75. |
min_intersite_distance |
minimum allowed distance in base pairs between two TFBS centers for them not to be discarded as overlapping. This parameter should be set according to the width of the bins used for later sorting. Defaults to 15. |
max_cluster_size |
maximum number of TFBSs to be contained in any given cluster. Defaults to 6 |
max_cluster_width |
maximum width of TFBS clusters in bps. Defaults to 300 |
add.single.TFs |
Whether to add the TFs not used to create TFBS.clusters to the list for sorting. Defaults to TRUE |
list with two elements: ClusterCoordinates (GRanges object of clusters coordinates) and ClusterComposition (GRangesList of sites for each cluster)
KLF4s = qs::qread(system.file("extdata", "KLF4_chr19.qs", package="SingleMoleculeFootprinting"))
Arrange_TFBSs_clusters(KLF4s)
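The anchor-based scan described for Arrange_TFBSs_clusters can be illustrated in base R on plain integer coordinates. This is only a sketch under stated assumptions, not the package's implementation: the helper cluster_around_anchor and the toy centers are hypothetical, the anchor is always kept, and max_intersite_distance and max_cluster_size are ignored.

```r
# Hypothetical helper (not part of SingleMoleculeFootprinting):
# builds one cluster around a single anchoring TFBS center.
cluster_around_anchor <- function(centers, anchor,
                                  max_cluster_width = 300,
                                  min_intersite_distance = 15) {
  # neighbors inside the window centred on the anchor (anchor excluded)
  near <- sort(centers[centers != anchor &
                         abs(centers - anchor) <= max_cluster_width / 2])
  kept <- integer(0)
  for (site in near) {
    # assumption: a site overlapping the anchor or the last kept site is dropped
    too_close_anchor <- abs(site - anchor) < min_intersite_distance
    too_close_prev <- length(kept) > 0 &&
      site - kept[length(kept)] < min_intersite_distance
    if (!too_close_anchor && !too_close_prev) kept <- c(kept, site)
  }
  sort(c(kept, anchor))
}

# toy TFBS centers on one chromosome
centers <- c(100L, 110L, 160L, 230L, 600L)

# every site acts as anchor once, so a site can appear in several clusters
clusters <- lapply(centers, cluster_around_anchor, centers = centers)
```

In this toy input the site at 160 ends up in the clusters anchored at 100, 110, 160 and 230, which is the sliding-window behaviour the description refers to.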
check bait capture efficiency. Expected to be ~70
BaitCapture(sampleFile, genome, baits, clObj = NULL)
sampleFile |
QuasR sample sheet |
genome |
BS genome |
baits |
GRanges object of bait coordinates. We provide an example through SingleMoleculeFootprintingData::EnrichmentRegions_mm10.rds() |
clObj |
cluster object to employ for parallel processing, created using the parallel::makeCluster function. Defaults to NULL |
bait capture efficiency
sampleFile = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
if(file.exists(sampleFile)){
  library(BSgenome.Mmusculus.UCSC.mm10)
  BaitRegions = SingleMoleculeFootprintingData::EnrichmentRegions_mm10.rds()
  BaitCapture(sampleFile = sampleFile, genome = BSgenome.Mmusculus.UCSC.mm10, baits = BaitRegions)
}
Summarize methylation inside sorting bins
BinMethylation(MethSM, Bin)
MethSM |
Single molecule matrix |
Bin |
IRanges object with absolute coordinates for single sorting bin. |
Reads covering bin with their summarized methylation status
library(IRanges)
library(GenomicRanges)
MethSM = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))[[2]]$SMF_MM_TKO_DE_
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))
motif_center_1 = start(IRanges::resize(TFBSs[1], 1, "center"))
motif_center_2 = start(IRanges::resize(TFBSs[2], 1, "center"))
SortingBins = c(
  GRanges("chr6", IRanges(motif_center_1-35, motif_center_1-25)),
  GRanges("chr6", IRanges(motif_center_1-7, motif_center_1+7)),
  GRanges("chr6", IRanges(motif_center_2-7, motif_center_2+7)),
  GRanges("chr6", IRanges(motif_center_2+25, motif_center_2+35))
)
binMethylationValues = BinMethylation(MethSM = MethSM, Bin = SortingBins[1])
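To make the idea of summarizing methylation inside a sorting bin concrete, here is a base R sketch on a small dense matrix. It rests on assumptions that the documentation above does not confirm: the toy coding (1 = methylated, 0 = unmethylated, NA = not covered), the rounded per-read mean, and the helper name summarize_bin_sketch are all illustrative.

```r
# Toy single molecule matrix: rows are reads, columns are cytosine coordinates.
# Assumed coding for this sketch: 1 = methylated, 0 = unmethylated, NA = not covered.
MethSM_toy <- matrix(
  c( 1,  1, 0, NA,
     0,  0, 1,  1,
    NA, NA, 1,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("read1", "read2", "read3"), c("101", "105", "120", "124"))
)

# Hypothetical helper: one summarized value per read covering the bin
summarize_bin_sketch <- function(MethSM, bin_start, bin_end) {
  coords <- as.integer(colnames(MethSM))
  in_bin <- coords >= bin_start & coords <= bin_end
  sub <- MethSM[, in_bin, drop = FALSE]
  # keep only reads covering at least one cytosine in the bin
  covered <- rowSums(!is.na(sub)) > 0
  # rounded mean methylation of the covered cytosines inside the bin
  round(rowMeans(sub[covered, , drop = FALSE], na.rm = TRUE))
}

bin_values <- summarize_bin_sketch(MethSM_toy, 100, 110)
```

Here read3 does not cover the bin and is therefore absent from the result, mirroring the "reads covering bin" wording of the return value.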
Can deal with multiple samples
CallContextMethylation( sampleFile, samples, genome, RegionOfInterest, coverage = 20, ConvRate.thr = NULL, returnSM = TRUE, clObj = NULL, verbose = FALSE )
sampleFile |
QuasR pointer file |
samples |
vector of unique sample names corresponding to the SampleName field from the sampleFile |
genome |
BSgenome |
RegionOfInterest |
GenomicRanges object representing the genomic region of interest |
coverage |
coverage threshold as integer for least number of reads to cover a cytosine for it to be carried over in the analysis. Defaults to 20. |
ConvRate.thr |
Conversion rate threshold. Double between 0 and 1, defaults to NULL. To skip this filtering step, set to NULL. For more information, check out the details section. |
returnSM |
whether to return the single molecule matrix, defaults to TRUE |
clObj |
cluster object for parallel processing of multiple samples. For now only used by qMeth call for bulk methylation. Should be the output of a parallel::makeCluster() call |
verbose |
whether to print out messages while executing. Defaults to FALSE |
The ConvRate.thr argument should be used with care as it could create biases (e.g. when only one C out of context is present) while generally only marginally cleaning up the data.
List with two elements: average methylation call (GRanges) and single molecule methylation call (matrix)
sampleFile = NULL
if(!is.null(sampleFile)){
  Methylation <- CallContextMethylation(
    sampleFile = sampleFile,
    samples = samples,
    genome = BSgenome.Mmusculus.UCSC.mm10,
    RegionOfInterest = RegionOfInterest,
    coverage = 20,
    returnSM = TRUE,
    ConvRate.thr = NULL,
    clObj = NULL
  )
}
Implementation performing an operation similar to rbind.fill.Matrix, but for columns
cbind.fill.Matrix(x, y)
x |
sparse matrix constructed using the function Matrix::sparseMatrix. Should have Dimnames and dims (e.g. when indexing drop=FALSE) |
y |
sparse matrix constructed using the function Matrix::sparseMatrix. Should have Dimnames and dims (e.g. when indexing drop=FALSE) |
N.b. only possible fill at the moment is 0
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
MethSM_1 = Methylation[[2]][[1]]
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
MethSM_2 = Methylation[[2]][[1]]
cbind.fill.Matrix(MethSM_1, MethSM_2)
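The zero-filled column binding can be pictured with ordinary dense matrices. The sketch below is an assumption about the behaviour (rows are matched by name and the union of row names is kept); cbind_fill_sketch is a hypothetical helper and does not operate on sparse matrices as the package function does.

```r
# Hypothetical dense analogue of a zero-filled column bind
cbind_fill_sketch <- function(x, y, fill = 0) {
  all_rows <- union(rownames(x), rownames(y))
  out <- matrix(fill,
                nrow = length(all_rows),
                ncol = ncol(x) + ncol(y),
                dimnames = list(all_rows, c(colnames(x), colnames(y))))
  # place each input in its own block of columns, matching rows by name
  out[rownames(x), seq_len(ncol(x))] <- x
  out[rownames(y), ncol(x) + seq_len(ncol(y))] <- y
  out
}

x <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
y <- matrix(5:8, nrow = 2, dimnames = list(c("r2", "r3"), c("c", "d")))
combined <- cbind_fill_sketch(x, y)
```

Rows present in only one input (r1 and r3 here) receive the fill value in the columns contributed by the other input, which is why both inputs need Dimnames.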
Collapse strands
CollapseStrands(MethGR, context)
MethGR |
GRanges object of average methylation |
context |
"GC" or "HCG". Broad because indicates just the directionality of collapse. |
MethGR with collapsed strands (everything turned to - strand)
# CollapseStrands(MethGR, "GC")
The idea here is that, regardless of context, calling getSeq on the coordinate of a cytosine (N.b. unstranded, that's the important bit) will return a "G" if the C is on the - strand and a "C" if it is on the + strand.
CollapseStrandsSM(MethSM, context, genome, chr)
MethSM |
Single molecule matrix |
context |
"GC" or "CG". Broad because indicates just the directionality of collapse. |
genome |
BSgenome |
chr |
Chromosome, MethSM doesn't carry this info |
Strand collapsed MethSM
# CollapseStrandsSM(MethSM, "GC", BSgenome.Mmusculus.UCSC.mm10, "chr19")
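The strand inference behind CollapseStrandsSM can be shown without a BSgenome. This sketch replaces getSeq with base R substring on a made-up reference string; the sequence and coordinates are illustrative assumptions, and the collapsing step itself is not reproduced.

```r
# Toy + strand reference sequence and 1-based coordinates of called cytosines
reference <- "ATGCCGATGGCA"
coords <- c(4L, 5L, 6L, 10L)

# Unstranded lookup: a "C" means the cytosine sits on the + strand,
# a "G" means it sits on the - strand
bases <- substring(reference, coords, coords)
strand <- ifelse(bases == "C", "+", ifelse(bases == "G", "-", NA))
```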
Collect bulk SMF data for later composite plotting
CollectCompositeData( sampleFile, samples, genome, TFBSs, window, coverage = 20, ConvRate.thr = NULL, cores = 1 )
sampleFile |
QuasR sampleFile |
samples |
vector of unique sample names corresponding to the SampleName field from the sampleFile |
genome |
BSgenome |
TFBSs |
GRanges object of TF binding sites to collect info for. We recommend employing 50 to 200 TFBSs. |
window |
window size to collect methylation information for |
coverage |
coverage threshold as integer for least number of reads to cover a cytosine for it to be carried over in the analysis. Defaults to 20. |
ConvRate.thr |
Conversion rate threshold. Double between 0 and 1, defaults to NULL. For more information, check out the details section. |
cores |
number of cores to use |
data.frame of bulk SMF info ready for plotting
sampleFile = NULL
if(!is.null(sampleFile)){
  CollectCompositeData(
    sampleFile = sampleFile,
    samples = samples,
    genome = BSgenome.Mmusculus.UCSC.mm10,
    TFBSs = TopMotifs,
    window = 1000,
    coverage = 20,
    ConvRate.thr = NULL,
    cores = 16
  ) -> CompositeData
}
Calculate colMeans after dropping zeros
colMeans_drop0(MethSM)
MethSM |
one single molecule sparse matrix |
colMeans (N.b. this is +1 based)
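A mean that drops zeros can be written in one line of base R. The sketch below uses a dense matrix and assumes, as an interpretation of the "+1 based" note, that 0 encodes a missing value, 1 an unmethylated cytosine and 2 a methylated one; the helper name is hypothetical.

```r
# Assumed coding for this sketch: 0 = not covered, 1 = unmethylated, 2 = methylated
m <- matrix(
  c(2, 1, 0,
    2, 0, 1,
    0, 1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("read", 1:3), c("c1", "c2", "c3"))
)

# Hypothetical helper: column means computed over non-zero entries only
col_means_drop0_sketch <- function(m) colSums(m) / colSums(m != 0)

means_plus1 <- col_means_drop0_sketch(m)
# removing the +1 offset gives a methylation rate between 0 and 1
meth_rate <- means_plus1 - 1
```

The same expression with rowSums in place of colSums gives the per-read counterpart.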
Monitor the methylation rate distribution in low coverage samples as compared to a high coverage "reference" one. Cytosines with similar methylation rates (as observed in the HighCoverage sample) are grouped into bins, and a single methylation rate value is computed for each bin.
CompositeMethylationCorrelation( LowCoverage, LowCoverage_samples, HighCoverage, HighCoverage_samples, bins = 50, returnDF = FALSE, returnPlot = TRUE, RMSE = TRUE, return_RMSE_DF = FALSE, return_RMSE_plot = TRUE )
LowCoverage |
Single GRanges object as returned by the CallContextMethylation function run with the coverage parameter set to 1. The object can also contain cytosines from multiple contexts. |
LowCoverage_samples |
Samples to use from the LowCoverage object. Either a string or a vector (for multiple samples). |
HighCoverage |
Single GRanges object as returned by CallContextMethylation function. The object can also contain cytosines from multiple contexts. |
HighCoverage_samples |
Single sample to use from HighCoverage. String |
bins |
The number of bins for which to calculate the "binned" methylation rate. Defaults to 50 |
returnDF |
Whether to return the data.frame used for plotting. Defaults to FALSE |
returnPlot |
Whether to return the plot. Defaults to TRUE |
RMSE |
Whether to calculate Root mean squared error (RMSE) of methylation rate distribution estimates for low coverage samples. Defaults to TRUE |
return_RMSE_DF |
Whether to return a data.frame of computed RMSE values. Defaults to FALSE |
return_RMSE_plot |
Whether to return a barplot of computed values. Defaults to TRUE |
# I don't have enough example data for this
# CompositeMethylationCorrelation(LowCoverage = LowCoverage$DGCHN,
#                                 LowCoverage_samples = LowCoverage_Samples,
#                                 HighCoverage = HighCoverage$DGCHN,
#                                 HighCoverage_samples = HighCoverage_samples[1],
#                                 returnDF = FALSE,
#                                 returnPlot = TRUE,
#                                 RMSE = TRUE,
#                                 return_RMSE_DF = FALSE,
#                                 return_RMSE_plot = TRUE)
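Since no example data ships for this function, the binning idea can be sketched on simulated counts in base R. Everything below is an assumption made for illustration: the simulated rates and counts, the pooling of reads within a bin, and the RMSE being taken between binned low and high coverage rates. It does not call the package.

```r
set.seed(1)
n <- 2000
high_rate <- runif(n)                                # per-cytosine rate, high coverage sample
low_T <- rpois(n, lambda = 2)                        # total reads per cytosine, low coverage sample
low_M <- rbinom(n, size = low_T, prob = high_rate)   # methylated reads, low coverage sample

# group cytosines by the rate observed in the high coverage sample
bins <- 50
bin_id <- cut(high_rate,
              breaks = seq(0, 1, length.out = bins + 1),
              include.lowest = TRUE, labels = FALSE)

# one methylation rate per bin: mean for the reference, pooled counts for low coverage
high_binned <- tapply(high_rate, bin_id, mean)
low_binned <- tapply(low_M, bin_id, sum) / tapply(low_T, bin_id, sum)

# root mean squared error between the two binned distributions
rmse <- sqrt(mean((low_binned - high_binned)^2, na.rm = TRUE))
```

Pooling reads within a bin is what lets cytosines covered by only one or two reads contribute to a stable rate estimate.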
Will use geom_point with <= 5000 points, geom_hex otherwise
CompositePlot(CompositeData, span = 0.1, TF)
CompositeData |
the output of the CollectCompositeData function |
span |
the span parameter to pass to geom_smooth |
TF |
string of TF name to use for plot title |
# CompositePlot(CompositeData = CompositeData, span = 0.1, TF = "Rest")
calculate sequencing library conversion rate on a chromosome of choice
ConversionRate(sampleFile, genome, chr = 19, cores = 1)
sampleFile |
QuasR sample sheet |
genome |
BS genome |
chr |
chromosome to calculate conversion rate on (default: 19) |
cores |
number of cores for parallel processing. Defaults to 1 |
# ConversionRate(sampleFile = sampleFile,
#                genome = BSgenome.Mmusculus.UCSC.mm10, chr = 19, cores = 1)
Filter Cs for coverage
CoverageFilter(MethGR, thr)
MethGR |
GRanges object of average methylation |
thr |
coverage threshold |
filtered MethGR
Relevant for genome-wide analyses
Create_MethylationCallingWindows( RegionsOfInterest, max_intercluster_distance = 1e+05, max_window_width = 5e+06, min_cluster_width = 600, genomic.seqlenghts, fix.window.size = FALSE, max.window.size = 500 )
RegionsOfInterest |
TFBS cluster coordinates analogous to ClusterCoordinates object returned by Arrange_TFBSs_clusters function |
max_intercluster_distance |
maximum distance between two consecutive TFBS clusters for them to be grouped in the same window |
max_window_width |
upper limit to window width. This value should be adjusted according to the user's system as it determines the amount of memory used in the later context methylation call |
min_cluster_width |
lower limit to window width. Corresponds to the scenario when a window contains a single TFBS cluster. |
genomic.seqlenghts |
used to fix the windows spanning over chromosome edges. To be fetched by GenomeInfoDb::seqlengths() or equivalent. |
fix.window.size |
Defaults to FALSE. When TRUE, overrides arguments max_intercluster_distance and max_window_width and produces windows containing a fixed number of TFBS_clusters. |
max.window.size |
Max number of TFBS_clusters per window. Used only when fix.window.size is TRUE. N.b.: window size could be slightly higher than passed value if RegionsOfInterest overlap |
GRanges object of window coordinates to be used for more efficient calls of CallContextMethylation
KLF4s = qs::qread(system.file("extdata", "KLF4_chr19.qs", package="SingleMoleculeFootprinting"))
Create_MethylationCallingWindows(RegionsOfInterest = KLF4s)
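The grouping of consecutive clusters into calling windows can be sketched on plain start and end coordinates. This is a simplified illustration under assumptions, not the package's algorithm: group_into_windows is a hypothetical helper, it handles a single chromosome, and it ignores min_cluster_width, genomic.seqlenghts and fix.window.size.

```r
# Hypothetical helper: merge consecutive regions into windows
group_into_windows <- function(starts, ends,
                               max_intercluster_distance = 1e5,
                               max_window_width = 5e6) {
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  window_id <- integer(length(starts))
  current <- 1L
  window_start <- starts[1]
  window_id[1] <- current
  for (i in seq_along(starts)[-1]) {
    gap <- starts[i] - ends[i - 1]
    width_if_added <- ends[i] - window_start + 1
    # open a new window when the gap or the resulting width is too large
    if (gap > max_intercluster_distance || width_if_added > max_window_width) {
      current <- current + 1L
      window_start <- starts[i]
    }
    window_id[i] <- current
  }
  data.frame(start = as.vector(tapply(starts, window_id, min)),
             end = as.vector(tapply(ends, window_id, max)))
}

# three toy clusters: the first two are close, the third is far away
windows <- group_into_windows(starts = c(1000, 50000, 400000),
                              ends = c(1300, 50300, 400300))
```

Fewer, wider windows mean fewer methylation calls downstream, at the cost of more memory per call, which is the trade-off the max_window_width argument controls.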
Detect type of experiment
DetectExperimentType(Samples)
Samples |
SampleNames field from QuasR sampleFile |
CacheDir = ExperimentHub::getExperimentHubOption(arg = "CACHE")
sampleFile = paste0(CacheDir, "/NRF1Pair_sampleFile.txt")
samples = suppressMessages(unique(readr::read_delim(sampleFile, delim = "\t")[[2]]))
DetectExperimentType(samples)
Recalculate *_T and *_M values in MethGR object after filtering reads e.g. for conversion rate
filter_reads_from_MethGR(MethGR, MethSM, MethSM_filtered, sampleIndex)
MethGR |
GRanges object of methylation call |
MethSM |
Single Molecule methylation matrix |
MethSM_filtered |
Single Molecule methylation matrix after filtering reads |
sampleIndex |
index for sample to treat. It serves as a correspondence between the index of the SM matrix and the order samples appear in the elementMetadata() columns |
MethGR with recalculated counts
Calculate reads conversion rate
FilterByConversionRate(MethSM, chr, genome, thr)
MethSM |
as comes out of the func GetSingleMolMethMat |
chr |
Chromosome, MethSM doesn't carry this info |
genome |
BSgenome |
thr |
Double between 0 and 1. Threshold below which to filter reads. |
Filtered MethSM
library(BSgenome.Mmusculus.UCSC.mm10)
MethSM = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))[[2]]$SMF_MM_TKO_DE_
FilterByConversionRate(MethSM, chr = "chr19", genome = BSgenome.Mmusculus.UCSC.mm10, thr = 0.8)
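Read-level conversion rate filtering, followed by the recalculation of per-cytosine counts that filter_reads_from_MethGR performs, can be sketched on a toy matrix. The coding (1 = methylated, 0 = unmethylated, NA = not covered) and the out_of_context vector are assumptions for illustration; in the package the context is derived from the genome sequence.

```r
# rows = reads, columns = cytosine coordinates
MethSM_toy <- matrix(
  c(0,  0, 1, 1,
    1,  1, 1, 0,
    0, NA, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("read1", "read2", "read3"), c("10", "14", "21", "30"))
)

# assumed for this sketch: columns lying outside any MTase target context
out_of_context <- c(TRUE, TRUE, FALSE, FALSE)

# conversion rate per read: share of out-of-context cytosines read as unmethylated
conv_rate <- 1 - rowMeans(MethSM_toy[, out_of_context, drop = FALSE], na.rm = TRUE)

# discard reads below the threshold
thr <- 0.8
MethSM_filtered <- MethSM_toy[conv_rate >= thr, , drop = FALSE]

# recompute total (T) and methylated (M) counts per cytosine from surviving reads
T_counts <- colSums(!is.na(MethSM_filtered))
M_counts <- colSums(MethSM_filtered, na.rm = TRUE)
```

Note that read3 has a single covered out-of-context cytosine, so its conversion rate rests on one observation; this is the kind of bias the ConvRate.thr details warn about.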
Filter Cytosines in context
FilterContextCytosines(MethGR, genome, context)
MethGR |
GRanges object of average methylation |
genome |
BSgenome |
context |
Context of interest (e.g. "GC", "CG",..) |
filtered GRanges object
library(BSgenome.Mmusculus.UCSC.mm10)
MethGR = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))[[1]]
FilterContextCytosines(MethGR, BSgenome.Mmusculus.UCSC.mm10, "NGCNN")
Utility function to perform the dplyr full_join operation on GRanges object
full.join.granges(MethGR1, MethGR2)
MethGR1 |
Methylation GRanges as output by the CallContextMethylation() function |
MethGR2 |
Methylation GRanges as output by the CallContextMethylation() function |
Get QuasRprj
GetQuasRprj(sampleFile, genome)
sampleFile |
QuasR pointer file |
genome |
BSgenome |
library(BSgenome.Mmusculus.UCSC.mm10)
CacheDir <- ExperimentHub::getExperimentHubOption(arg = "CACHE")
sampleFile = paste0(CacheDir, "/NRF1Pair_sampleFile.txt")
QuasRprj = GetQuasRprj(sampleFile, BSgenome.Mmusculus.UCSC.mm10)
Used internally as the first step in CallContextMethylation
GetSingleMolMethMat(QuasRprj, range, sample)
QuasRprj |
QuasR project object as returned by calling the QuasR function qAlign on previously aligned data |
range |
GenomicRanges object representing the genomic region of interest |
sample |
One of the sample names as reported in the SampleName field of the QuasR sampleFile provided to qAlign. N.b. all the files with the passed sample name will be used to call methylation |
List of single molecule methylation matrices (all cytosines), one per sample
library(BSgenome.Mmusculus.UCSC.mm10)
library(IRanges)
library(GenomicRanges)
CacheDir <- ExperimentHub::getExperimentHubOption(arg = "CACHE")
sampleFile = paste0(CacheDir, "/NRF1Pair_sampleFile.txt")
sample = suppressMessages(readr::read_delim(sampleFile, delim = "\t")[[2]])
QuasRprj = GetQuasRprj(sampleFile, BSgenome.Mmusculus.UCSC.mm10)
range = GRanges("chr6", IRanges(88106000, 88106500))
GetSingleMolMethMat(QuasRprj, range, sample)
Inner utility for LowCoverageMethRateDistribution
GRanges_to_DF(GRanges_obj)
GRanges_obj |
GRanges object as returned by CallContextMethylation function |
Perform Hierarchical clustering on single reads
HierarchicalClustering(MethSM)
MethSM |
Single molecule methylation matrix |
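Hierarchical clustering of single reads can be illustrated with the stats functions that ship with R. The distance measure and linkage used below (Euclidean, "ward.D2") are assumptions chosen for the sketch and may differ from what the package does; the toy matrix is made up.

```r
set.seed(42)
# toy reads-by-cytosines matrix: two groups of reads with opposite patterns
group_a <- matrix(rep(c(1, 1, 1, 0, 0, 0), each = 5), nrow = 5)
group_b <- matrix(rep(c(0, 0, 0, 1, 1, 1), each = 5), nrow = 5)
MethSM_toy <- rbind(group_a, group_b)
rownames(MethSM_toy) <- paste0("read", 1:10)

# shuffle the reads so the clustering has something to recover
shuffled <- MethSM_toy[sample(nrow(MethSM_toy)), ]

# cluster reads on their methylation patterns
hc <- hclust(dist(shuffled), method = "ward.D2")

# reads reordered so that similar molecules sit next to each other
ordered_reads <- shuffled[hc$order, ]

# cut the tree into two groups of reads
groups <- cutree(hc, k = 2)
```

Reordering rows by hc$order is what makes identical or near-identical molecules, such as duplicated reads, visible as contiguous blocks in a single molecule plot.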
Calculate Root mean squared error (RMSE) of methylation rate distribution estimates for low coverage samples
LowCoverageMethRate_RMSE(BinnedMethRate)
BinnedMethRate |
data.frame as returned by GRanges_to_DF function. |
Utility function to remove cytosines whose MTase target genomic context is affected by SNPs
MaskSNPs( Methylation, CytosinesToMask, MaskSMmat = FALSE, SampleStringMatch = list(Cast = "_CTKO", Spret = "_STKO"), Experiment )
Methylation |
as output by the CallContextMethylation() function |
CytosinesToMask |
GRanges specifying the coordinate of the cytosines to discard. |
MaskSMmat |
whether the parameter Methylation includes single molecule matrixes |
SampleStringMatch |
list of per-sample string matches that are used to uniquely identify the relevant column for each species in the Methylation object. Defaults to list(Cast = "_CTKO", Spret = "_STKO") |
Experiment |
as detected by the DetectExperimentType() function. Should be either "DE" or "NO" |
Methylation = qs::qread(system.file("extdata", "Methylation_2.qs", package="SingleMoleculeFootprinting"))
CytosinesToMask = qs::qread(system.file("extdata", "cytosines_to_mask.qs", package="SingleMoleculeFootprinting"))
MaskSNPs(Methylation = Methylation,
         CytosinesToMask = CytosinesToMask,
         MaskSMmat = FALSE,
         SampleStringMatch = list(Cast = "_CTKO", Spret = "_STKO"),
         Experiment = "DE") -> Methylation_masked
Compute MethGR from MethSM
MethSM.to.MethGR(MethSM, chromosome)
MethSM |
internal CallContextMethylation |
chromosome |
string |
Utility for HighCoverage_MethRate_SampleCorrelation
panel.cor(x, y, digits = 2, prefix = "", cex.cor)
x |
x variable |
y |
y variable |
digits |
number of digits |
prefix |
string |
cex.cor |
graphical param |
Utility for HighCoverage_MethRate_SampleCorrelation
panel.hist(x, ...)
x |
data for hist |
... |
data for hist |
Utility for HighCoverage_MethRate_SampleCorrelation
panel.jet(...)
... |
data for lower pairs panel |
Inner utility for LowCoverageMethRateDistribution
Plot_LowCoverageMethRate(Plotting_DF)
Plotting_DF |
data.frame as returned by GRanges_to_DF function. |
Produce barplot of RMSE values calculated for methylation rate distribution estimates of low coverage samples
Plot_LowCoverageMethRate_RMSE(RMSE_DF)
RMSE_DF |
data.frame as returned by the LowCoverageMethRate_RMSE function |
Plot average methylation
PlotAvgSMF( MethGR, MethSM = NULL, RegionOfInterest, SortedReads = NULL, ShowContext = FALSE, TFBSs = NULL, SNPs = NULL, SortingBins = NULL )
MethGR |
Average methylation GRanges obj |
MethSM |
Single molecule matrix(es) |
RegionOfInterest |
GRanges interval to plot |
SortedReads |
List of sorted reads, needs to be passed along with the parameter MethSM. If both are passed, only counts relevant to sorting will be plotted |
ShowContext |
TRUE or FALSE (default). Causes the genomic context of the plotted cytosines to be displayed as the dot shape |
TFBSs |
GRanges object of transcription factor binding sites to include in the plot. Assumed to be already subset. Also assumed that the tf names are under the column "TF" |
SNPs |
GRanges object of SNPs to visualize. Assumed to be already subset. Assumed to have the reference and alternative sequences respectively under the columns "R" and "A" |
SortingBins |
GRanges object of sorting bins (absolute) coordinate to visualize |
library(GenomicRanges)
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
PlotAvgSMF(MethGR = Methylation[[1]], RegionOfInterest = RegionOfInterest, TFBSs = TFBSs)
Plot single molecule stack
PlotSingleMoleculeStack(MethSM, RegionOfInterest)
MethSM |
Single molecule methylation matrix |
RegionOfInterest |
GRanges interval to plot |
library(GenomicRanges)
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
PlotSingleMoleculeStack(MethSM = Methylation[[2]], RegionOfInterest = RegionOfInterest)
Plot SMF data at single site
PlotSingleSiteSMF( Methylation, RegionOfInterest, ShowContext = FALSE, TFBSs = NULL, SNPs = NULL, SortingBins = NULL, SortedReads = NULL, sorting.strategy = "None" )
Methylation |
Context methylation object as returned by CallContextMethylation function |
RegionOfInterest |
GRanges interval to plot |
ShowContext |
TRUE or FALSE (default). Causes the genomic context of the plotted cytosines to be displayed as the dot shape |
TFBSs |
GRanges object of transcription factor binding sites to include in the plot. Assumed to be already subset. Also assumed that the tf names are under the column "TF" |
SNPs |
GRanges object of SNPs to visualize. Assumed to be already subset. Assumed to have the reference and alternative sequences respectively under the columns "R" and "A" |
SortingBins |
GRanges object of sorting bins (absolute) coordinate to visualize |
SortedReads |
Defaults to NULL, in which case will plot unsorted reads. Sorted reads object as returned by SortReads function or "HC" to perform hierarchical clustering |
sorting.strategy |
One of "classical", "custom", "hierarchical.clustering" or "None" (default). Determines how to display reads. For details check documentation from PlotSM function. |
library(GenomicRanges)
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
PlotSingleSiteSMF(Methylation = Methylation, RegionOfInterest = RegionOfInterest, SortedReads = SortedReads, TFBSs = TFBSs)
adds the convenience of arranging reads before plotting
PlotSM( MethSM, RegionOfInterest, sorting.strategy = "classical", SortedReads = NULL )
MethSM |
Single molecule methylation matrix |
RegionOfInterest |
GRanges interval to plot |
sorting.strategy |
One of "classical" (default), "custom", "hierarchical.clustering" or "None". Set to "classical" for classical one-TF/TF-pair sorting (as described in Sönmezer et al, MolCell, 2021); this should be passed along with the argument SortedReads set to the sorted reads object as returned by the SortReads function. If set to "custom", SortedReads should be a list with one item per sample (corresponding to MethSM). If set to "hierarchical.clustering", the function will perform hierarchical clustering in place on a subset of reads; this is useful to check for duplicated reads in amplicon sequencing experiments. If set to "None", it will plot unsorted reads. The argument sorting.strategy always takes priority over the argument SortedReads in determining how reads are displayed. |
SortedReads |
Defaults to NULL, in which case will plot unsorted reads. Sorted reads object as returned by SortReads function |
library(GenomicRanges)
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
PlotSM(MethSM = Methylation[[2]], RegionOfInterest = RegionOfInterest, SortedReads = SortedReads)
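To give a feel for what the "hierarchical.clustering" sorting strategy amounts to, the following base-R sketch reorders the rows of a small toy read-by-cytosine matrix with hclust. The toy matrix, its 0/1 encoding and all variable names are assumptions made for illustration; this is not the code PlotSM runs internally.

```r
# Toy single-molecule matrix: rows are reads, columns are cytosines.
# Assumed encoding for this sketch: 0 = unmethylated, 1 = methylated.
toy_reads <- rbind(
  read1 = c(1, 1, 0, 0),
  read2 = c(0, 0, 1, 1),
  read3 = c(1, 1, 0, 0),
  read4 = c(0, 0, 1, 1)
)
colnames(toy_reads) <- c("100", "110", "120", "130")

# Cluster reads on their pairwise (Euclidean) distances and reorder the rows,
# so that identical or similar molecules end up next to each other.
read_clustering <- hclust(dist(toy_reads))
ordered_reads <- toy_reads[read_clustering$order, , drop = FALSE]
print(ordered_reads)
```

Because identical molecules have distance zero, they are placed in adjacent rows, which is what makes this kind of ordering handy for spotting duplicated reads.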
Performs an operation similar to plyr::rbind.fill.matrix, but for sparse matrices (sparseMatrix).
rbind.fill.Matrix(x, y)
x |
sparse matrix constructed using the function Matrix::sparseMatrix. Should have Dimnames and dims (e.g. use drop=FALSE when indexing) |
y |
sparse matrix constructed using the function Matrix::sparseMatrix. Should have Dimnames and dims (e.g. use drop=FALSE when indexing) |
N.b. the only fill value currently supported is 0.
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
MethSM_1 = Methylation[[2]][[1]]
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
MethSM_2 = Methylation[[2]][[1]]
rbind.fill.Matrix(MethSM_1, MethSM_2)
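The idea of a zero-filled row bind can be sketched on toy data with the Matrix package alone. The helper rbind_fill_sparse below is hypothetical and only illustrates the concept under the assumption that both inputs carry row and column names; it is not the package's implementation.

```r
library(Matrix)

# Hypothetical helper: pad each matrix with all-zero columns for the column
# names it lacks, align the column order, then stack the rows.
rbind_fill_sparse <- function(x, y) {
  all_cols <- union(colnames(x), colnames(y))
  pad <- function(m) {
    missing_cols <- setdiff(all_cols, colnames(m))
    if (length(missing_cols) > 0) {
      filler <- sparseMatrix(
        i = integer(0), j = integer(0), x = numeric(0),
        dims = c(nrow(m), length(missing_cols)),
        dimnames = list(rownames(m), missing_cols)
      )
      m <- cbind(m, filler)
    }
    m[, all_cols, drop = FALSE]
  }
  rbind(pad(x), pad(y))
}

# Two toy matrices covering partially overlapping cytosines.
x <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 2), dims = c(2, 2),
                  dimnames = list(c("read1", "read2"), c("100", "105")))
y <- sparseMatrix(i = 1, j = 2, x = 2, dims = c(1, 2),
                  dimnames = list("read3", c("105", "110")))

merged <- rbind_fill_sparse(x, y)
print(merged)
```

Entries for cytosines that a matrix did not originally contain are simply left at zero, in line with the note that 0 is the only fill value.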
Calculate rowMeans after dropping zeros
rowMeans_drop0(MethSM)
MethSM |
one single molecule sparse matrix |
rowMeans (N.b. this is +1 based)
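A row mean that ignores zeros can be sketched as the row sum divided by the number of nonzero entries per row. The helper name row_means_nonzero and the encoding used in the toy matrix (0 = not covered, 1 = unmethylated, 2 = methylated) are assumptions made for this illustration, not statements about the package internals.

```r
library(Matrix)

# Assumed encoding for this sketch: 0 = cytosine not covered by the read,
# 1 = unmethylated, 2 = methylated.
toy_meth <- sparseMatrix(
  i = c(1, 1, 2),
  j = c(1, 2, 3),
  x = c(1, 2, 2),
  dims = c(2, 3),
  dimnames = list(c("read1", "read2"), c("100", "105", "110"))
)

# Hypothetical helper: sum of the stored values divided by the count of
# nonzero entries in each row (rows without nonzero entries give NaN).
row_means_nonzero <- function(m) {
  Matrix::rowSums(m) / Matrix::rowSums(m != 0)
}

means <- row_means_nonzero(toy_meth)
print(means)      # offset by +1 under the assumed encoding
print(means - 1)  # subtracting 1 gives a per-read methylation fraction
```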
Single TF state quantification bar
SingleTFStateQuantificationPlot(SortedReads)
SortedReads |
Sorted reads as returned by SortReadsBySingleTF |
Hard-coded interpretation of biological states from single TF sorting
SingleTFStates()
list of states
SingleTFStates()
Sort reads by single TF
SortReads(MethSM, BinsCoordinates, coverage = NULL)
MethSM |
Single molecule matrix |
BinsCoordinates |
IRanges object of absolute coordinates for sorting bins |
coverage |
integer. Minimum number of reads covering all sorting bins for sorting to be performed |
list of sorted reads
library(IRanges)
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBS = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
bins = list(c(-35,-25), c(-15,15), c(25,35))
TFBS_center = start(TFBS) + (end(TFBS)-start(TFBS))/2
BinsCoordinates = IRanges(
  start = c(TFBS_center+bins[[1]][1], TFBS_center+bins[[2]][1], TFBS_center+bins[[3]][1]),
  end = c(TFBS_center+bins[[1]][2], TFBS_center+bins[[2]][2], TFBS_center+bins[[3]][2])
)
SortedReads = SortReads(Methylation[[2]]$SMF_MM_TKO_DE_, BinsCoordinates, coverage = 20)
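The general idea behind bin-based sorting can be illustrated in base R: average each read's methylation within every bin, round to a binary digit, and paste the digits into a pattern string. The sketch below assumes a dense toy matrix with a 0/1/NA encoding and a hypothetical TFBS centered at position 1000; the helper sort_toy_reads is invented for illustration and is not the SortReads implementation.

```r
# Toy read-by-cytosine matrix; column names are genomic coordinates.
# Assumed encoding for this sketch: 0 = unmethylated, 1 = methylated, NA = not covered.
toy_reads <- rbind(
  read1 = c(1, 1, 0, 0, 1, 1),
  read2 = c(1, 1, 1, 1, 1, 1),
  read3 = c(0, 0, 0, 0, 0, NA)
)
colnames(toy_reads) <- c(968, 972, 995, 1005, 1028, 1032)

# Absolute bin coordinates around a hypothetical TFBS center, using the same
# relative bins as the SortReads example.
tfbs_center <- 1000
bins <- list(c(-35, -25), c(-15, 15), c(25, 35))
bin_starts <- tfbs_center + vapply(bins, `[`, numeric(1), 1)
bin_ends   <- tfbs_center + vapply(bins, `[`, numeric(1), 2)

# Hypothetical helper: one binary digit per bin, pasted into a pattern string.
sort_toy_reads <- function(reads, starts, ends) {
  positions <- as.numeric(colnames(reads))
  per_bin <- matrix(NA_real_, nrow(reads), length(starts),
                    dimnames = list(rownames(reads), NULL))
  for (k in seq_along(starts)) {
    in_bin <- positions >= starts[k] & positions <= ends[k]
    per_bin[, k] <- round(rowMeans(reads[, in_bin, drop = FALSE], na.rm = TRUE))
  }
  patterns <- apply(per_bin, 1, paste, collapse = "")
  split(rownames(reads), patterns)
}

sorted_toy <- sort_toy_reads(toy_reads, bin_starts, bin_ends)
print(sorted_toy)
```

The result is a list of read names keyed by pattern, which is the same general shape as a list of sorted reads for one sample.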
Wrapper to SortReads for single TF case
SortReadsBySingleTF( MethSM, TFBS, bins = list(c(-35, -25), c(-15, 15), c(25, 35)), coverage = 20 )
MethSM |
Single molecule matrix list as returned by CallContextMethylation |
TFBS |
Transcription factor binding site to use for sorting, passed as a GRanges object of length 1 |
bins |
list of 3 relative bin coordinates. Defaults to list(c(-35,-25), c(-15,15), c(25,35)). bins[[1]] represents the upstream bin, with coordinates relative to the start of the TFBS. bins[[2]] represents the TFBS bin, with coordinates relative to the center of the TFBS. bins[[3]] represents the downstream bin, with coordinates relative to the end of the TFBS. |
coverage |
integer. Minimum number of reads covering all sorting bins for sorting to be performed. Defaults to 20 |
List of reads sorted by single TF
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
The function starts from a list of single TFBSs, arranges them into clusters, calls methylation at the sites of interest and outputs sorted reads.
SortReadsBySingleTF_MultiSiteWrapper(
  sampleFile,
  samples,
  genome,
  coverage = 20,
  ConvRate.thr = NULL,
  CytosinesToMask = NULL,
  TFBSs,
  max_interTF_distance = 1e+05,
  max_window_width = 5e+06,
  min_cluster_width = 600,
  fix.window.size = FALSE,
  max.window.size = NULL,
  sorting_coverage = 30,
  bins = list(c(-35, -25), c(-15, 15), c(25, 35)),
  cores = 1
)
sampleFile |
QuasR pointer file |
samples |
samples to use, from the SampleName field of the sampleFile |
genome |
BSgenome |
coverage |
coverage threshold as integer for least number of reads to cover a cytosine for it to be carried over in the analysis. Defaults to 20. |
ConvRate.thr |
Conversion rate threshold. Double between 0 and 1, defaults to NULL. To skip this filtering step, set to NULL. For more information, check out the details section. |
CytosinesToMask |
CytosinesToMask object. Passed to MaskSNPs function |
TFBSs |
GRanges object of transcription factor binding sites coordinates |
max_interTF_distance |
maximum distance between two consecutive TFBSs for them to be grouped in the same window |
max_window_width |
upper limit to window width. This value should be adjusted according to the user's system as it determines the amount of memory used in the later context methylation call |
min_cluster_width |
lower limit to window width. Corresponds to the scenario when a window contains a single TFBS. |
fix.window.size |
defaults to FALSE. Passed to Create_MethylationCallingWindows function. |
max.window.size |
defaults to NULL. Passed to Create_MethylationCallingWindows function. |
sorting_coverage |
integer. Minimum number of reads covering all sorting bins for sorting to be performed. Defaults to 30. |
bins |
list of 3 relative bin coordinates. Defaults to list(c(-35,-25), c(-15,15), c(25,35)). bins[[1]] represents the upstream bin, with coordinates relative to the start of the most upstream TFBS. bins[[2]] represents all the TFBS bins, with coordinates relative to the center of each TFBS. bins[[3]] represents the downstream bin, with coordinates relative to the end of the most downstream TFBS. |
cores |
number of cores to use for parallel processing of multiple Methylation Calling Windows (i.e. groupings of adjacent TFBS clusters) |
list where
[[1]] is the TFBSs GRanges object describing the coordinates of the TFBSs used to sort single molecules
[[2]] is a list of SortedReads nested per TFBS_cluster and sample
[[3]] is a tibble reporting the count (and frequency) of reads per state, sample and TFBS cluster
sampleFile = NULL
if(!is.null(sampleFile)){
  SortReadsBySingleTF_MultiSiteWrapper(
    sampleFile = sampleFile,
    samples = samples,
    genome = BSgenome.Mmusculus.UCSC.mm10,
    coverage = 20,
    ConvRate.thr = NULL,
    CytosinesToMask = NULL,
    TFBSs = KLF4s,
    max_interTF_distance = NULL,
    max_window_width = NULL,
    min_cluster_width = NULL,
    fix.window.size = TRUE,
    max.window.size = 50,
    cores = 4
  ) -> sorting_results
}
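The role of max_interTF_distance can be pictured with a small base-R sketch that groups TFBS centers into windows whenever consecutive sites lie no further apart than a maximum distance. The helper group_into_windows and the toy coordinates are hypothetical; the actual window construction is delegated to Create_MethylationCallingWindows and may apply further rules (such as max_window_width and min_cluster_width) that are not modeled here.

```r
# Hypothetical helper: assign a window index to each TFBS center.
# A new window starts whenever the gap to the previous site exceeds max_distance.
group_into_windows <- function(centers, max_distance) {
  ordering <- order(centers)
  gaps <- diff(centers[ordering])
  window_id <- cumsum(c(TRUE, gaps > max_distance))
  window_id[order(ordering)]  # return the ids in the original input order
}

tfbs_centers <- c(1000, 1500, 250000, 250300, 900000)
window_ids <- group_into_windows(tfbs_centers, max_distance = 1e5)
print(split(tfbs_centers, window_ids))
```

Grouping nearby sites this way means that methylation is called once per window rather than once per site, which is presumably why the window limits are tied to memory usage.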
Wrapper to SortReads for TF cluster case
SortReadsByTFCluster( MethSM, TFBS_cluster, bins = list(c(-35, -25), c(-7, 7), c(25, 35)), coverage = 30 )
MethSM |
Single molecule matrix list as returned by CallContextMethylation |
TFBS_cluster |
Transcription factor binding sites to use for sorting, passed as a GRanges object of length > 1 |
bins |
list of 3 relative bin coordinates. Defaults to list(c(-35,-25), c(-7,7), c(25,35)). bins[[1]] represents the upstream bin, with coordinates relative to the start of the most upstream TFBS. bins[[2]] represents all the TFBS bins, with coordinates relative to the center of each TFBS. bins[[3]] represents the downstream bin, with coordinates relative to the end of the most downstream TFBS. |
coverage |
integer. Minimum number of reads covering all sorting bins for sorting to be performed. Defaults to 30 |
List of reads sorted by TF cluster
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBS_cluster = TFBSs)
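The bin layout described for the bins argument can be worked through on a toy cluster: the upstream bin is anchored to the start of the most upstream TFBS, one bin is centered on each TFBS, and the downstream bin is anchored to the end of the most downstream TFBS. The data frame, coordinates and the helper cluster_bins below are hypothetical and only mirror that description; they are not taken from the package code.

```r
# Hypothetical TFBS cluster with two sites, given as start/end coordinates.
tfbs_cluster <- data.frame(start = c(1000, 1040), end = c(1010, 1050))
bins <- list(c(-35, -25), c(-7, 7), c(25, 35))

# Hypothetical helper returning one row per sorting bin.
cluster_bins <- function(cluster, bins) {
  cluster <- cluster[order(cluster$start), ]
  centers <- cluster$start + (cluster$end - cluster$start) / 2
  first_start <- min(cluster$start)  # start of the most upstream TFBS
  last_end <- max(cluster$end)       # end of the most downstream TFBS
  data.frame(
    bin = c("upstream", paste0("TFBS_", seq_along(centers)), "downstream"),
    start = c(first_start + bins[[1]][1], centers + bins[[2]][1], last_end + bins[[3]][1]),
    end = c(first_start + bins[[1]][2], centers + bins[[2]][2], last_end + bins[[3]][2])
  )
}

toy_bins <- cluster_bins(tfbs_cluster, bins)
print(toy_bins)
```

Under this layout a cluster of n sites yields n + 2 bins, so the number of possible binary patterns grows as 2^(n + 2).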
The function starts from a list of single TFBSs, arranges them into clusters, calls methylation at the sites of interest and outputs sorted reads.
SortReadsByTFCluster_MultiSiteWrapper(
  sampleFile,
  samples,
  genome,
  coverage = 20,
  ConvRate.thr = 0.8,
  CytosinesToMask = NULL,
  TFBSs,
  max_intersite_distance = 75,
  min_intersite_distance = 15,
  max_cluster_size = 10,
  max_cluster_width = 300,
  add.single.TFs = TRUE,
  max_intercluster_distance = 1e+05,
  max_window_width = 5e+06,
  min_cluster_width = 600,
  fix.window.size = FALSE,
  max.window.size = NULL,
  sorting_coverage = 30,
  bins = list(c(-35, -25), c(-7, 7), c(25, 35)),
  cores = 1
)
sampleFile |
QuasR pointer file |
samples |
samples to use, from the SampleName field of the sampleFile |
genome |
BSgenome |
coverage |
coverage threshold as integer for least number of reads to cover a cytosine for it to be carried over in the analysis. Defaults to 20. |
ConvRate.thr |
Conversion rate threshold. Double between 0 and 1, defaults to 0.8. To skip this filtering step, set to NULL. For more information, check out the details section. |
CytosinesToMask |
CytosinesToMask object. Passed to MaskSNPs function |
TFBSs |
GRanges object of transcription factor binding sites coordinates |
max_intersite_distance |
maximum allowed distance in base pairs between two TFBS centers for them to be considered part of the same cluster. Defaults to 75. |
min_intersite_distance |
minimum allowed distance in base pairs between two TFBS centers for them not to be discarded as overlapping. This parameter should be set according to the width of the bins used for later sorting. Defaults to 15. |
max_cluster_size |
maximum number of TFBSs to be contained in any given cluster. Defaults to 10 |
max_cluster_width |
maximum cluster width in bp. Defaults to 300 |
add.single.TFs |
whether to add to output the TFBSs that didn't make it into clusters. Defaults to TRUE |
max_intercluster_distance |
maximum distance between two consecutive TFBS clusters for them to be grouped in the same window |
max_window_width |
upper limit to window width. This value should be adjusted according to the user's system as it determines the amount of memory used in the later context methylation call |
min_cluster_width |
lower limit to window width. Corresponds to the scenario when a window contains a single TFBS cluster. |
fix.window.size |
defaults to FALSE. Passed to Create_MethylationCallingWindows function. |
max.window.size |
defaults to NULL. Passed to Create_MethylationCallingWindows function. |
sorting_coverage |
integer. Minimum number of reads covering all sorting bins for sorting to be performed. Defaults to 30. |
bins |
list of 3 relative bin coordinates. Defaults to list(c(-35,-25), c(-7,7), c(25,35)). bins[[1]] represents the upstream bin, with coordinates relative to the start of the most upstream TFBS. bins[[2]] represents all the TFBS bins, with coordinates relative to the center of each TFBS. bins[[3]] represents the downstream bin, with coordinates relative to the end of the most downstream TFBS. |
cores |
number of cores to use for parallel processing of multiple Methylation Calling Windows (i.e. groupings of adjacent TFBS clusters) |
list where
[[1]] is the TFBS_Clusters object describing the coordinates and composition of the TFBS clusters used to sort single molecules
[[2]] is a list of SortedReads nested per TFBS_cluster and sample
[[3]] is a tibble reporting the count (and frequency) of reads per state, sample and TFBS cluster
sampleFile = NULL
if(!is.null(sampleFile)){
  SortReadsByTFCluster_MultiSiteWrapper(
    sampleFile = sampleFile,
    samples = samples,
    genome = BSgenome.Mmusculus.UCSC.mm10,
    coverage = 20,
    ConvRate.thr = NULL,
    CytosinesToMask = NULL,
    TFBSs = KLF4s,
    max_intercluster_distance = NULL,
    max_window_width = NULL,
    min_cluster_width = NULL,
    fix.window.size = TRUE,
    max.window.size = 50,
    cores = 4
  ) -> sorting_results
}
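The effect of min_intersite_distance can be sketched as a greedy filter over sorted TFBS centers: a site is kept only if it lies at least the minimum distance away from the previously kept one, so the second of two overlapping sites is dropped. The helper drop_overlapping and the toy coordinates are hypothetical, and whether the boundary case is inclusive is an assumption of this sketch rather than documented package behavior.

```r
# Hypothetical helper: keep a TFBS center only if it lies at least
# min_distance away from the previously kept center.
drop_overlapping <- function(centers, min_distance) {
  if (length(centers) == 0) {
    return(centers)
  }
  centers <- sort(centers)
  kept <- centers[1]
  for (center in centers[-1]) {
    if (center - kept[length(kept)] >= min_distance) {
      kept <- c(kept, center)
    }
  }
  kept
}

tfbs_centers <- c(1000, 1008, 1030, 1040, 1100)
kept_centers <- drop_overlapping(tfbs_centers, min_distance = 15)
print(kept_centers)
```

Tying the minimum distance to the width of the TFBS bin keeps the sorting bins of neighboring sites from overlapping, which matches the advice given for this parameter.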
Convenience for calculating state frequencies
StateQuantification(SortedReads, states)
SortedReads |
List of sorted reads (can be multiple samples) as returned by either read sorting function (SortReads, SortReadsBySingleTF, SortReadsByTFCluster) |
states |
states reporting the biological interpretation of patterns, as returned by either the SingleTFStates or TFPairStates functions. If NULL (default), frequencies are returned without biological interpretation. |
tibble with state frequency information
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBS_cluster = TFBSs)
StateQuantification(SortedReads = SortedReads, states = TFPairStates())
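As a rough picture of what a state quantification involves, the base-R sketch below counts reads per pattern, maps patterns to state labels and computes frequencies. The toy sorted reads, the state labels and their pattern assignments, and the helper quantify_states are all assumptions made for illustration; the package function returns a tibble and may organize its output differently.

```r
# Toy sorted reads for one sample: read names keyed by binary pattern.
toy_sorted <- list(
  "101" = c("read1", "read4"),
  "111" = c("read2"),
  "000" = c("read3", "read5", "read6"),
  "010" = c("read7")
)

# Hypothetical state definitions for a three-bin sort; both the labels and
# the pattern assignments are assumptions of this sketch.
toy_states <- list(
  bound = "101",
  accessible = "111",
  closed = c("000", "100", "001")
)

# Hypothetical helper: count reads per state and derive frequencies.
quantify_states <- function(sorted_reads, states) {
  counts <- lengths(sorted_reads)
  lookup <- setNames(rep(names(states), lengths(states)),
                     unlist(states, use.names = FALSE))
  state <- unname(lookup[names(counts)])
  state[is.na(state)] <- "unassigned"  # patterns not listed in any state
  per_state <- tapply(counts, state, sum)
  data.frame(
    state = names(per_state),
    count = as.vector(per_state),
    frequency = as.vector(per_state) / sum(counts)
  )
}

state_table <- quantify_states(toy_sorted, toy_states)
print(state_table)
```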
Wraps around the StateQuantification function for the single TF case.
StateQuantificationBySingleTF(SortedReads)
SortedReads |
List of sorted reads (can be multiple samples) as returned by SortReadsBySingleTF (or SortReads run with analogous parameters) |
tibble with state frequency information
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
StateQuantificationBySingleTF(SortedReads = SortedReads)
Wraps around the StateQuantification function for the TF pair case.
StateQuantificationByTFPair(SortedReads)
SortedReads |
List of sorted reads (can be multiple samples) as returned by SortReadsByTFCluster run for clusters of size 2 (or SortReads run with analogous parameters) |
tibble with state frequency information
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBS_cluster = TFBSs)
StateQuantificationByTFPair(SortedReads = SortedReads)
Plot states quantification bar
StateQuantificationPlot(SortedReads, states)
SortedReads |
Sorted reads object as returned by SortReads function |
states |
either SingleTFStates() or TFPairStates() |
Bar plot quantifying states
library(GenomicRanges)
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
StateQuantificationPlot(SortedReads = SortedReads, states = SingleTFStates())
Inner utility for LowCoverageMethRateDistribution
SubsetGRangesForSamples(GRanges_obj, Samples)
GRanges_obj |
GRanges object as returned by CallContextMethylation function |
Samples |
vector of sample names as they appear in the SampleName field of the QuasR sampleFile |
TF pair state quantification bar
TFPairStateQuantificationPlot(SortedReads)
SortedReads |
Sorted reads as returned by SortReadsByTFCluster |
Design states for TF pair case
TFPairStates()
list of states
TFPairStates()