Title: | Predicting binding site consensus from ranked DNA sequences |
---|---|
Description: | Functions and classes for de novo prediction of transcription factor binding consensus by heuristic search |
Authors: | Adam Ameur <[email protected]> |
Maintainer: | Adam Ameur <[email protected]> |
License: | GPL-2 |
Version: | 1.69.0 |
Built: | 2024-12-19 02:51:14 UTC |
Source: | https://github.com/bioc/BCRANK |
This function implements an algorithm for detection of short DNA
sequences that are overrepresented in some part of a ranked list of
sequences. Starting from some initial consensus DNA sequence coded in
IUPAC symbols, the method uses a heuristic search to improve the
consensus until a local optimum is found. Individual predicted binding
sites can be reported by the function matchingSites.
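The search strategy described above can be sketched in a few lines of base R. This is only an illustration of a greedy local search over consensus sequences, not BCRANK's implementation: the names `iupac5`, `to_regex`, `toy_score` and `local_search` are hypothetical, the alphabet is restricted to A, C, G, T and N for brevity, and the toy score (match fraction in the top half of the list minus the bottom half) is an assumption that stands in for the real bcrank score.

```r
# Hypothetical sketch of a greedy local search; not BCRANK's code.
# Reduced alphabet (assumption): the four bases plus the wildcard N.
iupac5 <- c(A = "A", C = "C", G = "G", T = "T", N = "[ACGT]")

# Translate a consensus string into a regular expression.
to_regex <- function(consensus) {
  paste(iupac5[strsplit(consensus, "")[[1]]], collapse = "")
}

# Toy score (assumption, not the bcrank score): fraction of matching
# sequences in the top half of the ranking minus the bottom half.
toy_score <- function(consensus, seqs) {
  hit <- grepl(to_regex(consensus), seqs)
  half <- seq_len(length(seqs) %/% 2)
  mean(hit[half]) - mean(hit[-half])
}

# Try every single-position substitution, keep any that improves the
# score, and stop after a full pass without improvement.
local_search <- function(start, seqs) {
  best <- start
  best_score <- toy_score(best, seqs)
  repeat {
    improved <- FALSE
    for (pos in seq_len(nchar(best))) {
      for (sym in names(iupac5)) {
        cand <- best
        substr(cand, pos, pos) <- sym
        s <- toy_score(cand, seqs)
        if (s > best_score) {
          best <- cand
          best_score <- s
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }
  list(consensus = best, score = best_score)
}

# Made-up ranked sequences, most interesting first.
ranked <- c("TTCACGTGAC", "GGCACGTGTT", "ACACGTGAAA",
            "TTTTGGGCCC", "AAGGTTCCAA", "CCCTTTAAAG")
res <- local_search("NNNNNN", ranked)
```

The result is only locally optimal: no single substitution improves it, but a different start guess may end somewhere else, which is why several restarts are useful.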
bcrank(fafile, startguesses=c(), restarts=10, length=10, reorderings=500, silent=FALSE, plot.progress=FALSE, do.search=TRUE, use.P1=FALSE, use.P2=TRUE, strip.desc=TRUE)
fafile |
a ranked fasta file containing DNA sequences. |
startguesses |
a character vector with consensus sequences in IUPAC coding to be used as starting sequences in the search. If empty, random start guesses will be generated. |
restarts |
number of restarts of the algorithm when using random start guesses. |
length |
length of random start guess. |
reorderings |
number of random reorderings of the DNA sequences performed when calculating score. |
silent |
reports progress status if FALSE. |
plot.progress |
if TRUE, the progress is displayed in a plot. |
do.search |
if FALSE, no search is performed. In that case the start guesses are assigned with scores and reported as results. |
use.P1 |
Use penalty for bases other than A,C,G,T. |
use.P2 |
Use penalty for motifs matching repetitive sequences. |
strip.desc |
Ignored (always treated as TRUE). |
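To give a feel for what random reorderings contribute to a score, the base R sketch below compares a statistic on the observed ranking with the same statistic on shuffled rankings. It is an assumption-laden illustration, not the bcrank score: the match vector is made up, and `rank_stat` and `p_emp` are hypothetical names.

```r
set.seed(1)

# Made-up 0/1 match vector over a ranked list; matches sit near the top.
match_vec <- c(1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# Toy statistic (assumption): sum of the ranks of the matching sequences.
# Small values mean the matches are concentrated at the top of the list.
rank_stat <- function(v) sum(which(v == 1))

reorderings <- 500
observed <- rank_stat(match_vec)

# Recompute the statistic on randomly reordered match vectors.
shuffled <- replicate(reorderings, rank_stat(sample(match_vec)))

# Fraction of reorderings that are at least as top-heavy as the observed one.
p_emp <- mean(shuffled <= observed)
```

More reorderings give a more stable estimate at the cost of run time, which is the trade-off the reorderings argument exposes.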
The method returns an object of class BCRANKresult-class.
Adam Ameur, [email protected]
Ameur, A., Rada-Iglesias, A., Komorowski, J., Wadelius, C. Identification of candidate regulatory SNPs by combination of transcription factor binding site prediction, SNP genotyping and haploChIP. Nucleic Acids Res, 2009, 37(12):e85.
matchingSites, BCRANKresult-class
## Load the package
library(BCRANK)

## Load example fasta file
fastaFile <- system.file("Exfiles/USF1_small.fa", package = "BCRANK")

## Run BCRANK
## Not run: BCRANKout <- bcrank(fastaFile, restarts=20)

## Load precomputed results so the lines below run without the search
data(BCRANKout)

## Show BCRANK results
toptable(BCRANKout)

## The top scoring result
topMotif <- toptable(BCRANKout,1)

## Plot BCRANK search path
plot(topMotif)

## Position Weight Matrix
pwm(topMotif, normalize=FALSE)
Holds the bcrank score for one IUPAC consensus sequence. Several objects
of this class are collected in a BCRANKsearch-class object.
Objects are not intended to be created directly but as a result of
running bcrank.
consensus: consensus sequence in IUPAC coding
bcrankScore: bcrank score for the consensus
matchVec: vector with 0's (no match) and 1's (match) of same length as the ranked DNA sequences
consensus, signature(object = "BCRANKmatch"): Returns the consensus sequence.
bcrankScore, signature(object = "BCRANKmatch"): Returns the bcrank score.
matchVec, signature(object = "BCRANKmatch"): Returns a vector with 0's (no match) and 1's (match) of same length and order as the ranked DNA sequences.
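For readers less familiar with S4, the slot and accessor layout described above can be mimicked with a small stand-alone class. This is a sketch only: `ToyMatch`, `toyConsensus` and `toyMatchVec` are hypothetical names chosen so that nothing here collides with the package, and the validity check is an assumption about what a sensible match vector looks like.

```r
library(methods)

# Hypothetical class mirroring the three documented slots.
setClass("ToyMatch",
  slots = c(consensus = "character",
            bcrankScore = "numeric",
            matchVec = "numeric"),
  validity = function(object) {
    # Assumption: a match vector holds only 0's and 1's.
    if (all(object@matchVec %in% c(0, 1))) TRUE
    else "matchVec must contain only 0's and 1's"
  })

# Accessor generics and methods, one per slot of interest.
setGeneric("toyConsensus", function(object) standardGeneric("toyConsensus"))
setMethod("toyConsensus", "ToyMatch", function(object) object@consensus)

setGeneric("toyMatchVec", function(object) standardGeneric("toyMatchVec"))
setMethod("toyMatchVec", "ToyMatch", function(object) object@matchVec)

# Made-up values for illustration.
m <- new("ToyMatch", consensus = "CACGTG", bcrankScore = 0.8,
         matchVec = c(1, 0, 1))
toyConsensus(m)
sum(toyMatchVec(m))  # number of ranked sequences that match
```

Going through accessors rather than reading slots with `@` keeps calling code independent of how the object stores its data.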
Adam Ameur, [email protected]
Results from running bcrank on USF1 whole genome ChIP-chip
data for the human liver cell line HepG2.
data(BCRANKout)
Data from whole genome ChIP-chip experiments on human liver cell line HepG2. (Rada-Iglesias, A., et al. 2007)
Rada-Iglesias, A., et al. (2007) Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Research, Accepted
Holds the results from running bcrank. Contains a number of
BCRANKsearch-class objects, one for each restart of the bcrank search.
fname: the name of the fasta file used for running bcrank.
toplist: a list of BCRANKsearch-class objects, ranked by their scores.
funCall: the function call that was made to bcrank.
nrSeqs: number of sequences in the fasta input file.
restarts: number of restarts used in the bcrank search.
fname, signature(object = "BCRANKresult"): Returns the fasta file name.
toptable, signature(object = "BCRANKresult", i=NULL): If i is NULL, returns a data frame containing consensus and score for the results for each restart of the bcrank search. Otherwise, the i'th BCRANKsearch-class object in the toplist is returned.
Adam Ameur, [email protected]
Holds the whole search path from a single bcrank run. Each individual
search step is stored in a BCRANKmatch-class object. Several objects of
this class are collected in a BCRANKresult-class object.
Objects are not intended to be created directly but as a result of
running bcrank.
searchPath: a collection of BCRANKmatch-class objects, containing all bcrank search steps from a start guess to a locally optimal solution.
final: a BCRANKmatch-class object for the highest scoring consensus sequence (locally optimal solution) in this bcrank run.
finalPWM: position weight matrix for the highest scoring consensus sequence.
finalNrMatch: number of occurrences of the final consensus sequence in the fasta input file.
nrIterations: number of iterations required to move from the start guess to the final solution in this bcrank run.
searchPath, signature(object = "BCRANKsearch", i=NULL): If i is NULL, returns a data frame containing consensus and score for the whole search path. Otherwise, the i'th BCRANKmatch-class object in the search path is returned.
pwm, signature(object = "BCRANKsearch", normalize=TRUE): Returns the position weight matrix (pwm) for the highest scoring consensus in this bcrank run. Matrix entries are between 0 and 1 when normalize is TRUE. When FALSE, the number of matching sequences is reported.
plot, signature(x = "BCRANKsearch", y = "missing"): A plot method for the searchPath.
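The difference between a normalized and an unnormalized position weight matrix can be shown with a small base R example. It assumes that the matrix is a per-position count of bases over the matching sites, which is the usual construction but is not spelled out here; the sites are made up and `bases`, `counts` and `normalized` are hypothetical names.

```r
# Made-up sites, all of the same length, standing in for motif matches.
sites <- c("CACGTG", "CATGTG", "CACGTG", "CACGTT")
bases <- c("A", "C", "G", "T")

# One row per site, one column per motif position.
site_chars <- do.call(rbind, strsplit(sites, ""))

# Count each base at each position (rows: bases, columns: positions).
counts <- apply(site_chars, 2, function(col) {
  tabulate(match(col, bases), nbins = length(bases))
})
rownames(counts) <- bases

# Normalized version: divide each column by its total so entries lie in [0, 1].
normalized <- sweep(counts, 2, colSums(counts), "/")

counts
normalized
```

The count matrix keeps the information about how many sites support each position, while the normalized matrix is easier to compare between motifs with different numbers of matches.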
Adam Ameur, [email protected]
bcrank, BCRANKmatch-class, BCRANKresult-class
This function reports all occurrences of a consensus sequence in a fasta file. It can be used to extract transcription factor binding sites predicted by BCRANK or other motif search methods.
matchingSites(fafile, motifSequence, revComp=TRUE, strip.desc=TRUE)
fafile |
a ranked fasta file containing DNA sequences. |
motifSequence |
a character vector in IUPAC coding representing a DNA sequence. |
revComp |
set to TRUE if the reverse complement should also be matched. |
strip.desc |
Ignored (always treated as TRUE). |
Returns a data frame with positions, strand and DNA sequence for the matching sites.
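The kind of scan that produces such a data frame can be sketched in base R as follows. This is not the code behind matchingSites: `find_sites`, `rev_comp` and the column names are hypothetical, the sequences are made up and held in a character vector rather than read from a fasta file, and the site is reported as it reads on the forward strand.

```r
# IUPAC nucleotide codes as regular expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Reverse complement of an IUPAC string.
rev_comp <- function(x) {
  comp <- chartr("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical scanner: returns one row per occurrence, or NULL if none.
find_sites <- function(seqs, motif, revComp = TRUE) {
  scan_strand <- function(pattern, strand) {
    regex <- paste(iupac[strsplit(pattern, "")[[1]]], collapse = "")
    # A lookahead makes overlapping occurrences show up as separate hits.
    hits <- gregexpr(paste0("(?=", regex, ")"), seqs, perl = TRUE)
    do.call(rbind, lapply(seq_along(seqs), function(i) {
      start <- as.integer(hits[[i]])
      if (start[1] == -1) return(NULL)
      data.frame(seq = i, start = start, strand = strand,
                 site = substring(seqs[i], start, start + nchar(pattern) - 1),
                 stringsAsFactors = FALSE)
    }))
  }
  out <- scan_strand(motif, "+")
  # Skip the second scan for palindromic motifs to avoid duplicate rows.
  if (revComp && rev_comp(motif) != motif) {
    out <- rbind(out, scan_strand(rev_comp(motif), "-"))
  }
  out
}

seqs <- c("GGTCACGTGACC", "AAAATTTTCCCC", "TTCACATGAA")
sites <- find_sites(seqs, "CAYGTG")
sites
```

With revComp = FALSE only the forward strand is scanned, so the hit in the third sequence, which matches only the reverse complement of the motif, would not be reported.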
Adam Ameur, [email protected]
Ameur, A., Rada-Iglesias, A., Komorowski, J., Wadelius, C. Identification of candidate regulatory SNPs by combination of transcription factor binding site prediction, SNP genotyping and haploChIP. Nucleic Acids Res, 2009, 37(12):e85.