Package 'SingleMoleculeFootprinting'

Title: Analysis tools for Single Molecule Footprinting (SMF) data
Description: SingleMoleculeFootprinting is an R package providing functions to analyze Single Molecule Footprinting (SMF) data. Following the workflow exemplified in its vignette, the user will be able to perform basic data analysis of SMF data with minimal coding effort. Starting from an aligned bam file, we show how to perform quality controls over sequencing libraries, extract methylation information at the single molecule level accounting for the two possible kind of SMF experiments (single enzyme or double enzyme), classify single molecules based on their patterns of molecular occupancy, plot SMF information at a given genomic location
Authors: Guido Barzaghi [aut, cre] , Arnaud Krebs [aut] , Mike Smith [ctb]
Maintainer: Guido Barzaghi <[email protected]>
License: GPL-3
Version: 1.13.0
Built: 2024-06-30 06:33:16 UTC
Source: https://github.com/bioc/SingleMoleculeFootprinting

Help Index


Bait capture efficiency

Description

check bait capture efficiency. Expected to be ~70

Usage

BaitCapture(sampleSheet, genome, baits, clObj = NULL)

Arguments

sampleSheet

QuasR sample sheet

genome

BS genome

baits

Full path to bed file containing bait coordinates. If chromosome names are in e.g. "1" format, they'll be temporarily converted to "chr1"

clObj

cluster object to emply for parallel processing created using the parallel::makeCluster function. Defaults to NULL

Value

bait capture efficiency

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    # DO NOT RUN
    # clObj = parallel::makeCluster(5)
    # BaitRegions = SingleMoleculeFootprintingData::EnrichmentRegions_mm10.rds()
    # BaitCaptureEfficiency = BaitCapture(sampleSheet = Qinput, genome = BSgenome.Mmusculus.UCSC.mm10, baits = BaitRegions, clObj=clObj)
    # parallel::stopCluster(clObj)
}

Summarize methylation inside sorting bins

Description

Summarize methylation inside sorting bins

Usage

BinMethylation(MethSM, TFBS, bin)

Arguments

MethSM

Single molecule matrix

TFBS

Transcription factor binding site to use for sorting, passed as a GRanges object of length 1

bin

vector of two integers representing the coordinate of a bin relative to the center of the TFBS

Value

Reads covering bin with their summarized methylation status

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))

    binMethylationValues = BinMethylation(MethSM = Methylation[[2]], TFBS = TFBSs, bin = c(-15,15))
}

Call Context Methylation

Description

Can deal with multiple samples

Usage

CallContextMethylation(
  sampleSheet,
  sample,
  genome,
  range,
  coverage = 20,
  ConvRate.thr = 0.2,
  verbose = TRUE
)

Arguments

sampleSheet

QuasR pointer file

sample

for now this works for sure on one sample at the time only

genome

BSgenome

range

GenimocRange representing the genomic region of interest

coverage

coverage threshold. Defaults to 20.

ConvRate.thr

Convesion rate threshold. Double between 0 and 1, defaults to 0.2

verbose

TRUE/FALSE

Value

List with two Granges objects: average methylation call (GRanges) and single molecule methylation call (matrix)

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)
}

Collapse strands

Description

Collapse strands

Usage

CollapseStrands(MethGR, context, verbose = TRUE)

Arguments

MethGR

Granges obj of average methylation

context

"GC" or "CG". Broad because indicates just the directionality of collapse.

verbose

TRUE/FALSE

Value

MethGR with collapsed strands (everything turned to - strand)


Collapse strands in single molecule matrix

Description

The idea here is that (regardless of context) if a C is on the - strand, calling getSeq on that coord (N.b. unstranded, that's the important bit) will give a "G', a "C" if it's a + strand.

Usage

CollapseStrandsSM(MethSM, context, genome, chr, verbose = TRUE)

Arguments

MethSM

Single molecule matrix

context

"GC" or "CG". Broad because indicates just the directionality of collapse.

genome

BSgenome

chr

Chromosome, MethSM doesn't carry this info

verbose

TRUE/FALSE

Value

Strand collapsed MethSM


Conversion rate

Description

calculate sequencing library conversion rate on a chromosome of choice

Usage

ConversionRate(sampleSheet, genome, chr = 19, clObj = NULL)

Arguments

sampleSheet

QuasR sample sheet

genome

BS genome

chr

chromosome to calculate conversion rate on (default: 19)

clObj

cluster object to emply for parallel processing created using the parallel::makeCluster function. Defaults to NULL

Value

Conversion rate

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    # DO NOT RUN
    # clObj = parallel::makeCluster(5)
    # ConversionRatePrecision = ConversionRate(sampleSheet = Qinput, genome = BSgenome.Mmusculus.UCSC.mm10, chr = 19, clObj = clObj)
    # parallel::stopCluster(clObj)
}

Filter Cs for coverage

Description

Filter Cs for coverage

Usage

CoverageFilter(MethGR, thr)

Arguments

MethGR

Granges obj of average methylation

thr

converage threshold

Value

filtered MethGR


Detect type of experiment

Description

Detect type of experiment

Usage

DetectExperimentType(Samples, verbose = TRUE)

Arguments

Samples

SampleNames field from QuasR sampleSheet

verbose

TRUE/FALSE

Value

String indicating the type of experiment detected

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")

if(file.exists(Qinput)){
    sample = readr::read_delim(Qinput, delim = "\t")$SampleName
    ExpType = DetectExperimentType(sample)
}

Calculate reads conversion rate

Description

Calculate reads conversion rate

Usage

FilterByConversionRate(MethSM, chr, genome, thr = 0.2, verbose = TRUE)

Arguments

MethSM

as comes out of the func GetSingleMolMethMat

chr

Chromosome, MethSM doesn't carry this info

genome

BSgenome

thr

Double between 0 and 1. Threshold above which to filter reads. Defaults to 0.2

verbose

TRUE/FALSE

Value

Filtered MethSM

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    sample = readr::read_delim(Qinput, delim = "\t")$SampleName
    range = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    MethSM = GetSingleMolMethMat(QuasRprj, range, sample)
    MethSM = FilterByConversionRate(MethSM, chr = "chr6", genome = BSgenome.Mmusculus.UCSC.mm10, thr = 0.8)
}

Filter Cytosines in context

Description

Filter Cytosines in context

Usage

FilterContextCytosines(MethGR, genome, context)

Arguments

MethGR

Granges obj of average methylation

genome

BSgenome

context

Context of interest (e.g. "GC", "CG",..)

Value

filtered Granges obj

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    Samples = readr::read_delim(Qinput, delim = "\t")$SampleName
    sample = Samples[1]
    range = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    MethGR = QuasR::qMeth(QuasRprj[grep(sample, Samples)], mode="allC", range, collapseBySample = TRUE, keepZero = TRUE)
    FilterContextCytosines(MethGR, BSgenome.Mmusculus.UCSC.mm10, "NGCNN")
}

Fixing overhang before stand collapsing

Description

Fixing overhang before stand collapsing

Usage

FixOverhang(MethGR, context, which)

Arguments

MethGR

Granges obj of average methylation

context

context

which

"Top"|"Bottom"

Value

MethGR with fixed overhang


Get QuasRprj

Description

Get QuasRprj

Usage

GetQuasRprj(sampleSheet, genome)

Arguments

sampleSheet

QuasR pointer file

genome

BSgenome

Value

QuasR project object as returned by QuasR::qAlign function

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)
}

Get Single Molecule methylation matrix

Description

Used internally as the first step in CallContextMethylation

Usage

GetSingleMolMethMat(QuasRprj, range, sample)

Arguments

QuasRprj

QuasR project object as returned by calling the QuasR function qAling on previously aligned data

range

GenimocRange representing the genomic region of interest

sample

One of the sample names as reported in the SampleName field of the QuasR pointer file provided to qAlign. N.b. all the files with the passed sample name will be used to call methylation

Value

Single molecule methylation matrix (all Cytosines)

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    sample = readr::read_delim(Qinput, delim = "\t")$SampleName
    range = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    MethSM = GetSingleMolMethMat(QuasRprj, range, sample)
}

Perform Hierarchical clustering on single reads

Description

Perform Hierarchical clustering on single reads

Usage

HierarchicalClustering(MethSM)

Arguments

MethSM

Single molecule methylation matrix

Value

Single molecule matrix after hierarchical clustering


Design states for single TF case

Description

Design states for single TF case

Usage

OneTFstates()

Value

list of states


Plot average methylation

Description

Plot average methylation

Usage

PlotAvgSMF(MethGR, range, TFBSs)

Arguments

MethGR

Average methylation GRanges obj

range

GRanges interval to plot

TFBSs

GRanges object of transcription factor binding sites to include in the plot. Assumed to be already subset.

Value

Average SMF signal at single site

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))

    PlotAvgSMF(MethGR = Methylation[[1]], range = Region_of_interest, TFBSs = TFBSs)
}

Plot single molecule stack

Description

Plot single molecule stack

Usage

PlotSingleMoleculeStack(MethSM, range)

Arguments

MethSM

Single molecule methylation matrix

range

GRanges interval to plot

Value

Single molecule plot


Plot SMF data at single site

Description

Plot SMF data at single site

Usage

PlotSingleSiteSMF(
  ContextMethylation,
  sample,
  range,
  SortedReads = NULL,
  TFBSs,
  saveAs = NULL
)

Arguments

ContextMethylation

Context methylation object as returned by CallContextMethylation function

sample

one sample as reported in the SampleName files of the QuasR sampleSheet

range

GRange interval to plot

SortedReads

Defaults to NULL, in which case will plot unsorted reads. Sorted reads object as returned by SortReads function or "HC" to perform hierarchical clustering

TFBSs

GRange or GRangesList of transcription factor binding sites to add to the plot. If SortedReads are passed, the format of TFBSs (GRanges vs GRangesList) will be used to determie if single molecules were sorted based on one or multiple TFs

saveAs

Full path to pdf file to save plot to. Defaults to NULL, in which case will only display

Value

Single site plot including average SMF signal, single molecules stack and state quantification plot

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))
    SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBSs = TFBSs)

    PlotSingleSiteSMF(ContextMethylation = Methylation,
                      sample = MySample,
                      range = Region_of_interest,
                      SortedReads = SortedReads,
                      TFBSs = TFBSs,
                      saveAs = NULL)
}

Wrapper for PlotSingleMoleculeStack function

Description

adds the convenience of arranging reads before plotting

Usage

PlotSM(MethSM, range, SortedReads = NULL)

Arguments

MethSM

Single molecule methylation matrix

range

GRanges interval to plot

SortedReads

Defaults to NULL, in which case will plot unsorted reads. Sorted reads object as returned by SortReads function or "HC" to perform hierarchical clustering

Value

Single molecule stack plot

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    PlotSM(MethSM = Methylation[[2]], range = Region_of_interest)
}

Intersample correlation

Description

pair plot of sample correlations

Usage

SampleCorrelation(samples, context, CellType)

Arguments

samples

Avg methylation object. Can also be set to "Example" to produce plot using example data of the kind specified by the @param CellType

context

one of "AllCs", "DGCHN", "NWCGW". The first should be chosen for TKO experiments. For experiments carried on WT cells, we recommend checking both the "DGCHN" and "NWCGW" contexts by running this function once per context.

CellType

Cell type to compare your samples to. At the moment, this can be one of "ES", "NP", "TKO".

Value

Inter-sample correlation plot


Single TF state quantification bar

Description

Single TF state quantification bar

Usage

SingleTFStateQuantificationPlot(states, OrderedReads)

Arguments

states

as returned by OneTFstates function

OrderedReads

Reads ordered by states

Value

single TF state quantification plot


Sort reads by single TF

Description

Sort reads by single TF

Usage

SortReads(MethSM, TFBS, BinsCoord, SortByCluster)

Arguments

MethSM

Single molecule matrix

TFBS

Transcription factor binding site to use for sorting

BinsCoord

list of 3 bin coordinates relative to the center of the TFBS.

SortByCluster

T/F

Value

list of sorted reads

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))
    BinsCoord = list(c(-35,-25), c(-15,15), c(25,35))

    SortedReads = SortReads(Methylation[[2]], TFBSs, BinsCoord, SortByCluster = FALSE)
}

Wrapper to SortReads for single TF case

Description

Wrapper to SortReads for single TF case

Usage

SortReadsBySingleTF(MethSM, TFBS)

Arguments

MethSM

Single molecule matrix

TFBS

Transcription factor binding site to use for sorting, passed as a GRanges object of length 1

Value

List of reads sorted by single TF

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))

    SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
}

Wrapper to SortReads for TF cluster case

Description

Wrapper to SortReads for TF cluster case

Usage

SortReadsByTFCluster(MethSM, TFBSs)

Arguments

MethSM

Single molecule matrix

TFBSs

Transcription factor binding sites to use for sorting, passed as a GRanges object of length > 1

Value

List of reads sorted by TF cluster

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))

    SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBSs = TFBSs)
}

Plot states quantification bar

Description

Plot states quantification bar

Usage

StateQuantificationPlot(SortedReads)

Arguments

SortedReads

Sorted reads object as returned by SortReads function

Value

Bar plot quantifying states

Examples

Qinput = paste0(tempdir(), "/NRF1Pair_Qinput.txt")
library(BSgenome.Mmusculus.UCSC.mm10)

if(file.exists(Qinput)){
    QuasRprj = GetQuasRprj(Qinput, BSgenome.Mmusculus.UCSC.mm10)

    MySample = readr::read_delim(Qinput, delim = "\t")$SampleName[1]
    Region_of_interest = GRanges(seqnames = "chr6", ranges = IRanges(start = 88106000, end = 88106500), strand = "*")

    Methylation = CallContextMethylation(sampleSheet = Qinput,
                                     sample = MySample,
                                     genome = BSgenome.Mmusculus.UCSC.mm10,
                                     range = Region_of_interest,
                                     coverage = 20,
                                     ConvRate.thr = 0.2)

    TFBSs = GenomicRanges::GRanges("chr6", IRanges(c(88106253), c(88106263)), strand = "-")
    elementMetadata(TFBSs)$name = c("NRF1")
    names(TFBSs) = c(paste0("TFBS_", c(4305216)))

    SortedReads = SortReadsByTFCluster(MethSM = Methylation[[2]], TFBSs = TFBSs)
    StateQuantificationPlot(SortedReads = SortedReads)
}

TF pair state quantification bar

Description

TF pair state quantification bar

Usage

TFPairStateQuantificationPlot(states, OrderedReads)

Arguments

states

as returned by TFpairStates function

OrderedReads

Reads ordered by states

Value

TF pair state quantification plot


Design states for TF pair case

Description

Design states for TF pair case

Usage

TFpairStates()

Value

list of states