Title: | R package for analysing TFBSs in DNA sequences |
---|---|
Description: | 'motifcounter' provides motif matching, motif counting and motif enrichment functionality based on position frequency matrices. The main features of the packages include the utilization of higher-order background models and accounting for self-overlapping motif matches when determining motif enrichment. The background model allows to capture dinucleotide (or higher-order nucleotide) composition adequately which may reduced model biases and misleading results compared to using simple GC background models. When conducting a motif enrichment analysis based on the motif match count, the package relies on a compound Poisson distribution or alternatively a combinatorial model. These distribution account for self-overlapping motif structures as exemplified by repeat-like or palindromic motifs, and allow to determine the p-value and fold-enrichment for a set of observed motif matches. |
Authors: | Wolfgang Kopp [aut, cre] |
Maintainer: | Wolfgang Kopp <[email protected]> |
License: | GPL-2 |
Version: | 1.31.0 |
Built: | 2024-11-27 04:41:50 UTC |
Source: | https://github.com/bioc/motifcounter |
The package provides functions for determining the positions of motif hits as well as motif hit enrichment for a given position frequency matrix (PFM) in a DNA sequence of interest. The following examples guides you through the main functions of the 'motifcounter' package.
For an analysis with 'motifcounter',
the user is required to provide 1) a PFM,
2) a DNA sequence which is used to estimate
a background model (see link{readBackground}
),
3) a DNA sequence of interest that shall be scanned for motif hits
(can be the same as the one used for point 2),
and 4) (optionally) a desired false positive probability of motif hits in
random DNA sequences (see motifcounterOptions
).
Package: | motifcounter |
Type: | Package |
Version: | 1.0 |
Date: | 2016-11-04 |
License: | GPL-2 |
Wolfgang Kopp
Maintainer: Wolfgang Kopp <[email protected]>
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Estimate an order-1 background model order = 1 bg = readBackground(seqs, order) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Normalize the motif # Normalization is sometimes necessary to prevent zeros in # the motif motif = normalizeMotif(motif) # Use subset of the sequences seqs = seqs[1:10] # Optionally, set the false positive probability #alpha=0.001 # is also the default #motifcounterOptions(alpha) # Investigate the per-position and per-strand scores in a given sequence scores = scoreSequence(seqs[[1]], motif, bg) # Investigate the per-position and per-strand motif hits in a given sequence hits = motifHits(seqs[[1]], motif, bg) # Determine the average score profile across a set of sequences scores = scoreProfile(seqs, motif, bg) # Determine the average motif hit profile across a set of sequences hits = motifHitProfile(seqs, motif, bg) # Determine the empirical score distribution scoreHistogram(seqs, motif, bg) # Determine the theoretical score distribution in random sequences scoreDist(motif, bg) # Determine the motif hit enrichment in a set of DNA sequences # 1. Use the compound Poisson approximation # and scan only a single strand for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = TRUE, method = "compound") # Determine the motif hit enrichment in a set of DNA sequences # 2. Use the compound Poisson approximation # and scan both strands for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "compound") # Determine the motif hit enrichment in a set of DNA sequences # 3. Use the combinatorial model # and scan both strands for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "combinatorial")
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Estimate an order-1 background model order = 1 bg = readBackground(seqs, order) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Normalize the motif # Normalization is sometimes necessary to prevent zeros in # the motif motif = normalizeMotif(motif) # Use subset of the sequences seqs = seqs[1:10] # Optionally, set the false positive probability #alpha=0.001 # is also the default #motifcounterOptions(alpha) # Investigate the per-position and per-strand scores in a given sequence scores = scoreSequence(seqs[[1]], motif, bg) # Investigate the per-position and per-strand motif hits in a given sequence hits = motifHits(seqs[[1]], motif, bg) # Determine the average score profile across a set of sequences scores = scoreProfile(seqs, motif, bg) # Determine the average motif hit profile across a set of sequences hits = motifHitProfile(seqs, motif, bg) # Determine the empirical score distribution scoreHistogram(seqs, motif, bg) # Determine the theoretical score distribution in random sequences scoreDist(motif, bg) # Determine the motif hit enrichment in a set of DNA sequences # 1. Use the compound Poisson approximation # and scan only a single strand for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = TRUE, method = "compound") # Determine the motif hit enrichment in a set of DNA sequences # 2. Use the compound Poisson approximation # and scan both strands for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "compound") # Determine the motif hit enrichment in a set of DNA sequences # 3. Use the combinatorial model # and scan both strands for motif hits result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "combinatorial")
Objects of this class serve as a container that holds parameters for the Background model.
A Background model is constructed
via readBackground
.
station
Stationary probabilities
trans
Transition probabilities
counts
k-mer counts
order
Background model order
This function approximates the distribution of the clump sizes.
clumpSizeDist(maxclump, overlap, method = "kopp")
clumpSizeDist(maxclump, overlap, method = "kopp")
maxclump |
Maximal clump size |
overlap |
An Overlap object. |
method |
String that defines which method shall be invoked: 'pape' or 'kopp' (see description). Default: method = 'kopp'. |
The clump size distribution can be determined in two alternative ways:
A re-implemented version of the algorithm that was described in Pape et al. Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands. 2008 can be invoked using method='pape'.
An improved approximation of the clump size distribution uses more appropriate statistical assumptions concerning overlapping motif hits and that can be used with order-d background models as well. The improved version is used by default with method='kopp'.
List containing
Distribution of the clump size
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Use 100 individual sequences of length 150 bp each seqlen = rep(150, 100) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the compound Poisson distribution dist = motifcounter:::clumpSizeDist(20, op)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Use 100 individual sequences of length 150 bp each seqlen = rep(150, 100) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the compound Poisson distribution dist = motifcounter:::clumpSizeDist(20, op)
This function approxmiates the distribution of the number of motif hits. To this end, it sums over all combinations of obtaining k hits in a random sequence of a given length using an efficient dynamic programming algorithm.
combinatorialDist(seqlen, overlap)
combinatorialDist(seqlen, overlap)
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
overlap |
An Overlap object. |
This function is an alternative to compoundPoissonDist
which requires fixed-length sequences and currently
only supports the computation of
the distribution of the number of hits when both
DNA strands are scanned for motif hits.
List containing
Distribution of the number of hits
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Use 2 sequences of length 100 bp each seqlen = rep(100, 2) # Computes the combinatorial distribution of the number of motif hits dist = motifcounter:::combinatorialDist(seqlen, op)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Use 2 sequences of length 100 bp each seqlen = rep(100, 2) # Computes the combinatorial distribution of the number of motif hits dist = motifcounter:::combinatorialDist(seqlen, op)
This function approximates the distribution of the number of motif hits that emerges from a random DNA sequence of a given length.
compoundPoissonDist(seqlen, overlap, method = "kopp")
compoundPoissonDist(seqlen, overlap, method = "kopp")
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
overlap |
An Overlap object. |
method |
String that defines which method shall be invoked: 'pape' or 'kopp' (see description). Default: method = 'kopp'. |
The distribution can be determined in two alternative ways:
A re-implemented version of the algorithm that was described in Pape et al. Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands. 2008 can be invoked using method='pape'. The main purpose of this implementation concerns benchmarking an improved approximation. In contrast to the original model, this implementation can be used with general order-d Markov models.
We provide an improved compound Poisson approximation that
uses more appropriate statistical assumptions concerning
overlapping motif hits and that can be used with order-d
background models as well. The improved version is used by default
with method='kopp'.
Note: Only method='kopp' supports the computation
of the distribution of the number of motif hits w.r.t. scanning
a single DNA strand (see probOverlapHit
).
List containing
Distribution of the number of hits
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Use 100 individual sequences of length 150 bp each seqlen = rep(150, 100) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = TRUE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Use 100 individual sequences of length 150 bp each seqlen = rep(150, 100) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = TRUE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist)
This function leverages a Markov model in order to determine the clump start probability. The computation depends on the selected false positive probability for calling motif matches 'alpha' and the pre-determined overlapping match probabilities 'beta'.
computeClumpStartProb(overlap)
computeClumpStartProb(overlap)
overlap |
An Overlap object. |
The general idea of the method relies on the fact that for the stationary distribution of the Markov model, motif matches must be observed with probability 'alpha'. Hence, the clump start probability 'tau' is optimized to achieve that goal.
The R interface is only used for the purpose of testing the correctness of the model.
Clump start probability 'tau'
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the clump start probability dist = motifcounter:::computeClumpStartProb(op)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the clump start probability dist = motifcounter:::computeClumpStartProb(op)
This function generates a random DNAString of a given length by sampling from the background model.
generateDNAString(len, bg)
generateDNAString(len, bg)
len |
Integer length of the sequence |
bg |
A Background object |
A DNAString object
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Generate a 1 kb random sequence motifcounter:::generateDNAString(1000, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Generate a 1 kb random sequence motifcounter:::generateDNAString(1000, bg)
This function generates a DNAStringSet-object of the given individual sequence lengths by sampling from the background model.
generateDNAStringSet(seqlen, bg)
generateDNAStringSet(seqlen, bg)
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
bg |
A Background object |
A DNAStringSet object
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Generate random sequences of various lengths motifcounter:::generateDNAStringSet(10:50, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Generate random sequences of various lengths motifcounter:::generateDNAStringSet(10:50, bg)
Accessor to slot alpha
getAlpha(obj)
getAlpha(obj)
obj |
An Overlap object |
alpha slot
Accessor to slot beta
getBeta(obj)
getBeta(obj)
obj |
An Overlap object |
beta slot
Accessor to slot beta3p
getBeta3p(obj)
getBeta3p(obj)
obj |
An Overlap object |
beta3p slot
Accessor to slot beta
getBeta5p(obj)
getBeta5p(obj)
obj |
An Overlap object |
beta5p slot
Accessor to slot counts
getCounts(obj)
getCounts(obj)
obj |
A Background object |
counts slot
Accessor to slot gamma
getGamma(obj)
getGamma(obj)
obj |
An Overlap object |
gamma slot
Accessor to slot order
getOrder(obj)
getOrder(obj)
obj |
A Background object |
order slot
Accessor to slot singlestranded
getSinglestranded(obj)
getSinglestranded(obj)
obj |
An Overlap object |
singlestranded slot
Accessor to slot station
getStation(obj)
getStation(obj)
obj |
A Background object |
station slot
Accessor to slot trans
getTrans(obj)
getTrans(obj)
obj |
A Background object |
trans slot
This function computes the per-position motif matches in a given DNA strand.
hitStrand(seq, pfm, bg, threshold = NULL)
hitStrand(seq, pfm, bg, threshold = NULL)
seq |
A DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
threshold |
Score threshold for calling motif matches. If NULL, the threshold will determined from alpha. |
The function returns the per-position scores for the given strand. If the sequence is too short, it contains an empty vector.
Vector of motif hits on the given strand
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::hitStrand(seqs[[1]], motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::hitStrand(seqs[[1]], motif, bg)
The function returns a vector containing the lengths of each sequence contained in a set of sequences. Sequences containing 'N' or 'n' are skipped from the analysis and are set to length zero.
lenSequences(seqs)
lenSequences(seqs)
seqs |
A DNAStringSet object |
A vector containing the lengths of each individual sequences
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Retrieve sequence lengths motifcounter:::lenSequences(seqs)
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Retrieve sequence lengths motifcounter:::lenSequences(seqs)
This function implements the Markov model for producing motif matches. The function takes a state probability vector and uses the transition probabilities in order to obtain the state probability at the next time point. This function is used used to determine the stationary distribution of the states.
markovModel(overlap, nsteps = 1)
markovModel(overlap, nsteps = 1)
overlap |
An Overlap object. |
nsteps |
Number of state transitions to perform |
The R interface is only used for the purpose of testing the correctness of the model.
List containing
State probability distribution after the given number of steps
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the state probabilities of the Markov model # (default: after one step) dist = motifcounter:::markovModel(op)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Compute overlap probabilities op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the state probabilities of the Markov model # (default: after one step) dist = motifcounter:::markovModel(op)
This function checks if the PFM x background combination is valid. The function throws an error if this is not the case.
motifAndBackgroundValid(pfm, bg)
motifAndBackgroundValid(pfm, bg)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
None
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Check validity motifcounter:::motifAndBackgroundValid(motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Check validity motifcounter:::motifAndBackgroundValid(motif, bg)
This function sets some global parameters for the 'motifcounter' package.
motifcounterOptions(alpha = 0.001, gran = 0.1, ncores = 1)
motifcounterOptions(alpha = 0.001, gran = 0.1, ncores = 1)
alpha |
Numeric False positive probabililty for calling motif hits by chance. Default: alpha = 0.001 |
gran |
Numeric score granularity which is used for discretizing the score range. Default: gran = 0.1 |
ncores |
Interger number of cores used for parallel processing, if openMP is available. Default: ncores = 1 |
alpha=0.001 amounts to calling one motif hit per strand by chance in a sequence of length 1000 bp. Decreasing gran will increase number of discrete bins that represent the real-valued score range. This will yield more a accurate score distribution due to less discretization noise, however, it incurs an increase of the computational burden.
None
# Prescribe motifcounter Options motifcounterOptions(alpha = 0.001, gran = 0.1, ncores = 1)
# Prescribe motifcounter Options motifcounterOptions(alpha = 0.001, gran = 0.1, ncores = 1)
This function determines whether a given motif is enriched in a given DNA sequences.
motifEnrichment(seqs, pfm, bg, singlestranded = FALSE, method = "compound")
motifEnrichment(seqs, pfm, bg, singlestranded = FALSE, method = "compound")
seqs |
A DNAStringSet or DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
singlestranded |
Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE. |
method |
String that defines whether to use the 'compound' Poisson approximation' or the 'combinatorial' model. Default: method='compound'. |
Enrichment is tested by comparing the observed number of motif hits against a theoretical distribution of the number of motif hits in random DNA sequences. Optionally, the theoretical distribution of the number of motif hits can be evaluated by either a 'compound Poisson model' or the 'combinatorial model'. Additionally, the enrichment test can be conducted with respect to scanning only the forward strand or both strands of the DNA sequences. The latter option is only available for the 'compound Poisson model'
List that contains
P-value for the enrichment test
Fold-enrichment with respect to the expected number of hits
compoundPoissonDist
, combinatorialDist
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # 1 ) Motif enrichment test w.r.t. scanning a *single* DNA strand # based on the 'Compound Poisson model' result = motifEnrichment(seqs, motif, bg, singlestranded = TRUE, method = "compound") # 2 ) Motif enrichment test w.r.t. scanning *both* DNA strand # based on the 'Compound Poisson model' result = motifEnrichment(seqs, motif, bg, method = "compound") # 3 ) Motif enrichment test w.r.t. scanning *both* DNA strand # based on the *combinatorial model* result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "combinatorial")
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # 1 ) Motif enrichment test w.r.t. scanning a *single* DNA strand # based on the 'Compound Poisson model' result = motifEnrichment(seqs, motif, bg, singlestranded = TRUE, method = "compound") # 2 ) Motif enrichment test w.r.t. scanning *both* DNA strand # based on the 'Compound Poisson model' result = motifEnrichment(seqs, motif, bg, method = "compound") # 3 ) Motif enrichment test w.r.t. scanning *both* DNA strand # based on the *combinatorial model* result = motifEnrichment(seqs, motif, bg, singlestranded = FALSE, method = "combinatorial")
This function computes the per-position average motif hit profile across a set of fixed-length DNA sequences. It can be used to reveal positional constraints of TFBSs.
motifHitProfile(seqs, pfm, bg)
motifHitProfile(seqs, pfm, bg)
seqs |
A DNAStringSet or DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
List containing
Per-position average forward strand motif hits
Per-position average reverse strand motif hits
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) seqs = seqs[1:10] # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the motif hit profile motifHitProfile(seqs, motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) seqs = seqs[1:10] # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the motif hit profile motifHitProfile(seqs, motif, bg)
This function determines per-position motif hits in a given DNA sequence.
motifHits(seq, pfm, bg, threshold = NULL)
motifHits(seq, pfm, bg, threshold = NULL)
seq |
A DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
threshold |
Score threshold for calling motif matches. If NULL, the threshold will determined from alpha. |
List containing
Per-position motif hits on the forward strand
Per-position motif hits on the reverse strand
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seq = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seq, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Determine the motif hits motifHits(seq[[1]], motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seq = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seq, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Determine the motif hits motifHits(seq[[1]], motif, bg)
This function checks if the PFM is valid. The function throws an error if the R matrix does not represent a PFM.
motifValid(pfm)
motifValid(pfm)
pfm |
An R matrix that represents a position frequency matrix |
None
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Check validity motifcounter:::motifValid(motif)
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Check validity motifcounter:::motifValid(motif)
This function normalizes a PFM and optionally adds pseudo-evidence to each entry of the matrix.
normalizeMotif(pfm, pseudo = 0.01)
normalizeMotif(pfm, pseudo = 0.01)
pfm |
An R matrix that represents a position frequency matrix |
pseudo |
Small numeric pseudo-value that is added to each entry in the PFM in order to ensure strictly positive entries. Default: pseudo = 0.01 |
A normalized PFM
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Normalize motif new_motif = normalizeMotif(motif)
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Normalize motif new_motif = normalizeMotif(motif)
This function counts the number of motif hits that are found in a given set of DNA sequences.
numMotifHits(seqs, pfm, bg, singlestranded = FALSE)
numMotifHits(seqs, pfm, bg, singlestranded = FALSE)
seqs |
A DNAStringSet or DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
singlestranded |
Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE. |
Optionally, it can be used to count motif hits on one or both strands, respectively.
A list containing
Number of individual sequences
Vector of individual sequence lengths
Vector of the number of hits in each individual sequence
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Count motif hits both strands noc = motifcounter:::numMotifHits(seqs, motif, bg) noc$numofhits # Count motif hits on a single strand noc = motifcounter:::numMotifHits(seqs, motif, bg, singlestranded = TRUE) noc$numofhits
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Count motif hits both strands noc = motifcounter:::numMotifHits(seqs, motif, bg) noc$numofhits # Count motif hits on a single strand noc = motifcounter:::numMotifHits(seqs, motif, bg, singlestranded = TRUE) noc$numofhits
Objects of this class serve as a container that holds parameters for the overlapping hit probabilities.
An Overlap object is constructed via the probOverlapHit
alpha
Scalar numeric significance level to call motif matches
beta
Numeric vector of principal overlapping hit probabilities on the same strand.
beta3p
Numeric vector of principal overlapping hit probabilities with 3'-overlap.
beta5p
Numeric vector of principal overlapping hit probabilities with 5'-overlap.
gamma
Numeric vector of marginal overlapping hit probabilities.
singlestranded
logical flag to indicate whether one or both strands are scanned for motif matches.
This function computes a set of self-overlapping probabilites for a motif and background model.
probOverlapHit(pfm, bg, singlestranded = FALSE)
probOverlapHit(pfm, bg, singlestranded = FALSE)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
singlestranded |
Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE. |
The 'gamma's are determined based on two-dimensional score distributions (similar as described in Pape et al. 2008), however, they are computed based on an order-d background model. On the other hand, the 'beta's represent overlapping hit probabilities that were corrected for intermediate hits.
An Overlap object
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute overlapping hit probabilities for scanning both DNA strands op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Compute overlapping hit probabilities for scanning a single DNA strand op = motifcounter:::probOverlapHit(motif, bg, singlestranded = TRUE)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute overlapping hit probabilities for scanning both DNA strands op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Compute overlapping hit probabilities for scanning a single DNA strand op = motifcounter:::probOverlapHit(motif, bg, singlestranded = TRUE)
Given a set of DNA sequences and an order, this function estimates an order-d Markov model which is used to characterize random DNA sequences.
readBackground(seqs, order = 1)
readBackground(seqs, order = 1)
seqs |
A DNAStringSet object |
order |
Order of the Markov models that shall be used as the background model. Default: order = 1. |
A Background object
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Estimate an order-1 Markov model bg = readBackground(seqs, 1)
# Load sequences file = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(file) # Estimate an order-1 Markov model bg = readBackground(seqs, 1)
This function computes the reverse complement of a given PFM.
revcompMotif(pfm)
revcompMotif(pfm)
pfm |
An R matrix that represents a position frequency matrix |
Reverse complemented PFM
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Reverse complement motif revcompmotif = motifcounter:::revcompMotif(motif)
# Load motif motiffile = system.file("extdata", "x1.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Reverse complement motif revcompmotif = motifcounter:::revcompMotif(motif)
This function computes the score distribution for the given PFM and background. The Score distribution is computed based on an efficient dynamic programming algorithm.
scoreDist(pfm, bg)
scoreDist(pfm, bg)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
List that contains
Vector of scores
Score distribution
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score distribution dp = scoreDist(motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score distribution dp = scoreDist(motif, bg)
This function computes the score distribution for a given PFM and a background model.
scoreDistBf(pfm, bg)
scoreDistBf(pfm, bg)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
The result of this function is identical to scoreDist
,
however, the method employs a less efficient algorithm that
enumerates all DNA sequences of the length of the motif.
This function is only used for debugging and testing purposes
and might require substantial computational
resources for long motifs.
List containing
Vector of scores
Score distribution
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score distribution dp = motifcounter:::scoreDistBf(motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score distribution dp = motifcounter:::scoreDistBf(motif, bg)
This function estimates the empirical score distribution on a set of randomly generated DNA sequences based on the background model. This function is only used for benchmarking analysis.
scoreDistEmpirical(pfm, bg, seqlen, nsim)
scoreDistEmpirical(pfm, bg, seqlen, nsim)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
nsim |
Integer number of random samples. |
List containing
Vector of scores
Score distribution
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compoute the empirical score distribution in # sequences of length 1kb using 1000 samples motifcounter:::scoreDistEmpirical(motif, bg, seqlen = 1000, nsim = 1000)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compoute the empirical score distribution in # sequences of length 1kb using 1000 samples motifcounter:::scoreDistEmpirical(motif, bg, seqlen = 1000, nsim = 1000)
This function computes the empirical score distribution for a given set of DNA sequences.
scoreHistogram(seqs, pfm, bg)
scoreHistogram(seqs, pfm, bg)
seqs |
A DNAStringSet or DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
It can be used to compare the empirical score
distribution against the theoretical one (see scoreDist
).
List containing
Vector of scores
Score distribution
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the empirical score histogram scoreHistogram(seqs, motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the empirical score histogram scoreHistogram(seqs, motif, bg)
This function computes the empirical score distribution by normalizing the observed score histogram for a given sequence.
scoreHistogramSingleSeq(seq, pfm, bg)
scoreHistogramSingleSeq(seq, pfm, bg)
seq |
A DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
List containing
Vector of scores
Score distribution
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::scoreHistogramSingleSeq(seqs[[1]], motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::scoreHistogramSingleSeq(seqs[[1]], motif, bg)
This function computes the per-position and per-strand average score profiles across a set of DNA sequences. It can be used to reveal positional constraints of TFBSs.
scoreProfile(seqs, pfm, bg)
scoreProfile(seqs, pfm, bg)
seqs |
A DNAStringSet or DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
List containing
Vector of per-position average forward strand scores
Vector of per-position average reverse strand scores
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score profile scoreProfile(seqs, motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score profile scoreProfile(seqs, motif, bg)
This function computes the per-position and per-strand score in a given DNA sequence.
scoreSequence(seq, pfm, bg)
scoreSequence(seq, pfm, bg)
seq |
A DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
List containing
Vector of scores on the forward strand
Vector of scores on the reverse strand
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores scoreSequence(seqs[[1]], motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores scoreSequence(seqs[[1]], motif, bg)
This function computes the per-position score in a given DNA strand.
scoreStrand(seq, pfm, bg)
scoreStrand(seq, pfm, bg)
seq |
A DNAString object |
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
The function returns the per-position scores for the given strand. If the sequence is too short, it contains an empty vector.
Vector of scores on the given strand
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::scoreStrand(seqs[[1]], motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the per-position and per-strand scores motifcounter:::scoreStrand(seqs[[1]], motif, bg)
This function computes the score threshold for a desired false positive probability 'alpha'.
scoreThreshold(pfm, bg)
scoreThreshold(pfm, bg)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
Note that the returned alpha usually differs slightly
from the one that is prescribed using
motifcounterOptions
, because
of the discrete nature of the sequences.
List containing
Score threshold
False positive probability
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score threshold motifcounter:::scoreThreshold(motif, bg)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Compute the score threshold motifcounter:::scoreThreshold(motif, bg)
This function returns the current false positive level for calling motif hits in random sequences.
sigLevel()
sigLevel()
The returned value is usually slightly smaller than the prescribed 'alpha' in 'motifcounterOptions', because of the discrete nature of sequences.
False positive probability
motifcounter:::sigLevel()
motifcounter:::sigLevel()
This function repeatedly simulates random DNA sequences according to the background model and subsequently counts the number of k-clump occurrences, where denotes the clump size. This function is only used for benchmarking analysis.
simulateClumpSizeDist(pfm, bg, seqlen, nsim = 10, singlestranded = FALSE)
simulateClumpSizeDist(pfm, bg, seqlen, nsim = 10, singlestranded = FALSE)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
nsim |
Integer number of random samples. |
singlestranded |
Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE. |
A List that contains
Empirical distribution of the clump sizes
compoundPoissonDist
,combinatorialDist
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Study the clump size frequencies in one sequence of length 1 Mb seqlen = 1000000 # scan both strands simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen) # scan a single strand simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen, singlestranded = TRUE)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Study the clump size frequencies in one sequence of length 1 Mb seqlen = 1000000 # scan both strands simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen) # scan a single strand simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen, singlestranded = TRUE)
This function repeatedly simulates random DNA sequences according to the background model and subsequently counts how many motif hits occur in them. Thus, this function gives rise to the empirical distribution of the number of motif hits. This function is only used for benchmarking analysis.
simulateNumHitsDist(pfm, bg, seqlen, nsim, singlestranded = FALSE)
simulateNumHitsDist(pfm, bg, seqlen, nsim, singlestranded = FALSE)
pfm |
An R matrix that represents a position frequency matrix |
bg |
A Background object |
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
nsim |
Integer number of random samples. |
singlestranded |
Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE. |
A List that contains
Empirical distribution of the number of motif hits
compoundPoissonDist
,combinatorialDist
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Study the counts in one sequence of length 150 bp seqlen = rep(150, 1) # Compute empirical distribution of the number of motif hits # by scanning both strands using 100 samples simc = motifcounter:::simulateNumHitsDist(motif, bg, seqlen, nsim = 100, singlestranded = FALSE) # Compute empirical distribution of the number of motif hits # by scanning a single strand using 100 samples simc = motifcounter:::simulateNumHitsDist(motif, bg, seqlen, nsim = 100, singlestranded = TRUE)
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load background bg = readBackground(seqs, 1) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Study the counts in one sequence of length 150 bp seqlen = rep(150, 1) # Compute empirical distribution of the number of motif hits # by scanning both strands using 100 samples simc = motifcounter:::simulateNumHitsDist(motif, bg, seqlen, nsim = 100, singlestranded = FALSE) # Compute empirical distribution of the number of motif hits # by scanning a single strand using 100 samples simc = motifcounter:::simulateNumHitsDist(motif, bg, seqlen, nsim = 100, singlestranded = TRUE)