Title: | Profile score distributions |
---|---|
Description: | Regularization and score distributions for position count matrices. |
Authors: | Paal O. Westermark |
Maintainer: | Paal O. Westermark <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.35.0 |
Built: | 2024-10-31 03:38:30 UTC |
Source: | https://github.com/bioc/profileScoreDist |
backgroundDist returns the background distribution of a profile object.
backgroundDist(x)
x |
A ProfileDist object. |
This is a generic function.
The background distribution vector.
anObject <- ProfileDist()
backgroundDist(anObject)
Computes the discretized score distribution of a position count matrix (PCM) or a position weight matrix (PWM), using the method described by Rahmann et al.
computeScoreDist(motif, gc, granularity = 0.01, unit = "nat")
motif |
A matrix representing a PCM or PWM; each column a position and each row a base corresponding to A, C, G, T. This order is assumed, unless the rows are correspondingly named in a different order. |
gc |
A scalar giving the GC fraction to assume. |
granularity |
The granularity of the discretization, defaults to 0.01. |
unit |
The logarithm unit of the score computed from the PCM or PWM, can be "nat" (default, natural logarithm), "bit" (base 2), or "dit" (base 10). |
A ProfileDist object.
Rahmann, S., Mueller, T., and Vingron, M. (2003). On the power of profiles for transcription factor binding site detection. Stat Appl Genet Mol Biol 2, Article7.
data(INR)
thedist <- computeScoreDist(regularizeMatrix(INR), 0.5)
plotDist(thedist)
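To make the idea of a discretized score distribution concrete, here is a minimal base-R sketch. It is not the package's implementation: the toy matrix, the helper name `score_dist`, the way the background is derived from the GC fraction, and the plain position-by-position convolution are all assumptions chosen for illustration.

```r
# Toy position count matrix (hypothetical data): rows A, C, G, T; columns are positions.
pcm <- matrix(c(8, 1, 1,
                1, 7, 1,
                1, 1, 7,
                2, 3, 3),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

gc <- 0.5            # assumed GC fraction
granularity <- 0.01  # width of one score bin

# Assumption: background base probabilities follow directly from the GC fraction.
bg <- c(A = (1 - gc) / 2, C = gc / 2, G = gc / 2, T = (1 - gc) / 2)

# Per-position base frequencies (signal model) and log-odds scores in nats.
# Dividing by log(2) or log(10) would give bits or dits instead.
freq <- sweep(pcm, 2, colSums(pcm), "/")
pwm <- log(freq / bg)

# Discretize the scores to integer multiples of the granularity.
ipwm <- round(pwm / granularity)

# Distribution of the total score when position j emits base b with
# probability prob[b, j]; built by convolving one position at a time.
score_dist <- function(ipwm, prob, granularity) {
  dist <- 1     # before any position: score 0 with probability 1
  offset <- 0   # integer score that dist[1] corresponds to
  for (j in seq_len(ncol(ipwm))) {
    s <- ipwm[, j]
    new_dist <- numeric(length(dist) + max(s) - min(s))
    for (b in seq_along(s)) {
      idx <- seq_along(dist) + (s[b] - min(s))
      new_dist[idx] <- new_dist[idx] + dist * prob[b, j]
    }
    dist <- new_dist
    offset <- offset + min(s)
  }
  list(scores = (offset + seq_along(dist) - 1) * granularity, prob = dist)
}

signal <- score_dist(ipwm, freq, granularity)
background <- score_dist(ipwm, matrix(bg, nrow = 4, ncol = ncol(pcm)), granularity)
```

In this sketch `signal$prob` and `background$prob` each sum to one over the shared score grid, which is the kind of information the signal and background distributions of a ProfileDist object carry.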
The position count matrix for the initiator (INR) core promoter element. This matrix was obtained from the JASPAR public domain database, but was originally published by P. Bucher (1990); in that publication (and elsewhere) it was termed Cap signal.
INR
A matrix with named rows corresponding to the counts for each of the four nucleotides.
The position count matrix for INR.
http://jaspar.genereg.net
Bucher, P. (1990). Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. Journal of Molecular Biology 212, 563–578.
Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.-Y., Chou, A., Ienasescu, H., et al. (2014). JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Research 42, D142–D147.
plotDist creates a rudimentary plot of signals and backgrounds.
plotDist(x)
x |
A ProfileDist object. |
This is a generic function.
The scores vector.
data(INR)
thedist <- computeScoreDist(regularizeMatrix(INR), 0.5)
plotDist(thedist)
This class represents signal and background score distributions for a profile.
## S4 method for signature 'ProfileDist'
show(object)
## S4 method for signature 'ProfileDist'
score(x)
## S4 method for signature 'ProfileDist'
signalDist(x)
## S4 method for signature 'ProfileDist'
backgroundDist(x)
## S4 method for signature 'ProfileDist'
plotDist(x)
object |
A ProfileDist object for the |
x |
A ProfileDist object. |
A ProfileDist object.
show: Shows useful information
score: Accessor for the scores
signalDist: Accessor for the signal distribution
backgroundDist: Accessor for the background distribution
plotDist: Simple plot method for signal and background distributions
f: Signal distribution
g: Background distribution
Scores: Scores for the distributions
ProfileDist(f=numeric, g=numeric, Scores=numeric)
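As an illustration of the data structure described here, the following sketch defines a stand-alone S4 class with the same three slots. The class name `ToyProfileDist` and the accessor `toySignalDist` are hypothetical and are not part of the package; the length check is likewise an assumption about what a sensible object looks like.

```r
library(methods)

# Hypothetical stand-in for a class holding signal (f), background (g),
# and the score grid (Scores) they are defined on.
setClass("ToyProfileDist",
         representation(f = "numeric", g = "numeric", Scores = "numeric"),
         validity = function(object) {
           # Assumption: both distributions share the score grid.
           if (length(object@f) != length(object@Scores) ||
               length(object@g) != length(object@Scores)) {
             return("f, g and Scores must have the same length")
           }
           TRUE
         })

# Hypothetical accessor in the style of the accessors listed above.
setGeneric("toySignalDist", function(x) standardGeneric("toySignalDist"))
setMethod("toySignalDist", "ToyProfileDist", function(x) x@f)

obj <- new("ToyProfileDist",
           f = c(0.2, 0.8), g = c(0.7, 0.3), Scores = c(-1, 1))
toy_signal <- toySignalDist(obj)
```

Keeping the scores and both distributions in one object means that any accessor or plot method can rely on the three vectors being aligned.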
Carries out the regularization suggested by Rahmann et al. Each column of the regularized matrix is a linear combination of the corresponding column of the non-regularized matrix and rho, the overall base distribution across all positions. The weighting of the linear combination is determined by the parameter E in a non-trivial way; see Rahmann et al. for more information. The default value E=1.5 usually works well.
regularizeMatrix(motif, E = 1.5)
motif |
A position count matrix; each column a position and each row a base corresponding to A, C, G, T. This order is assumed, unless the rows are correspondingly named in a different order. |
E |
Weighting parameter between 0 and 3 for the regularization. |
The regularized matrix
Rahmann, S., Mueller, T., and Vingron, M. (2003). On the power of profiles for transcription factor binding site detection. Stat Appl Genet Mol Biol 2, Article7.
data(INR)
regularizeMatrix(INR)
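The linear-combination idea can be sketched in base R as follows. This is only an illustration: the toy matrix is made up, and the fixed mixing weight `alpha` is a hypothetical placeholder, whereas the package derives the weighting from E as described by Rahmann et al. The sketch also works on frequencies, and the scaling of the matrix returned by regularizeMatrix() may differ.

```r
# Toy position count matrix with zero counts (hypothetical data);
# rows A, C, G, T; columns are positions.
pcm <- matrix(c(10, 0, 2,
                 0, 9, 2,
                 1, 1, 6,
                 1, 2, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# rho: overall base distribution pooled over all positions.
rho <- rowSums(pcm) / sum(pcm)

# Per-position base frequencies of the non-regularized matrix.
freq <- sweep(pcm, 2, colSums(pcm), "/")

# Hypothetical fixed weight; NOT the E-dependent weighting of Rahmann et al.
alpha <- 0.1

# Each column becomes a mix of its own frequencies and rho.
regularized <- (1 - alpha) * freq + alpha * rho
```

Because rho is strictly positive here, every entry of `regularized` is positive, so log-odds scores computed from it stay finite even where the original counts were zero.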
Computes score cutoffs for a PWM or a PCM, given distributions as calculated with computeScoreDist(). Cutoffs can be computed for a given false discovery rate (FDR), for a given false negative rate (FNR), and for the optimal tradeoff between the two, in the sense that c*FDR = FNR for some c that the user may choose.
scoreDistCutoffs(scoreDist, n, m = 1, c = 1, cutoff = 0.01)
scoreDist |
A ProfileDist object, as computed by computeScoreDist(). |
n |
The number of scores considered for the given PWM. If one sequence is considered and a score is computed for all overlapping windows of the same length as the PWM, this will be the length of the sequence, minus the PWM length plus 1. If scanning a sequence and its reverse complement too, this number must be further multiplied by two. The number forms the basis for the FDR, since this is a multiple testing problem. |
m |
The number of true positives assumed for computing the FNR. |
c |
A factor expressing how much more important the FDR is compared to the FNR, when computing the tradeoff cutoff that considers both FDR and FNR. See Rahmann et al. for details. |
cutoff |
The FDR and FNR considered, typically 0.01 or 0.05. |
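The arithmetic for n described above can be written out as a small helper. The function name `count_windows` and its argument names are hypothetical and not part of the package.

```r
# Number of scored windows when sliding a PWM along one sequence.
count_windows <- function(seq_length, pwm_width, both_strands = FALSE) {
  stopifnot(seq_length >= pwm_width)
  n <- seq_length - pwm_width + 1   # overlapping windows on one strand
  if (both_strands) n <- 2 * n      # also scanning the reverse complement
  n
}

n_single <- count_windows(1000, 8)        # 993
n_double <- count_windows(1000, 8, TRUE)  # 1986
```

A value computed this way would then be passed as the n argument of scoreDistCutoffs().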
A list with elements:
Score cutoff for FDR = cutoff
Score cutoff for FNR = cutoff
Score cutoff for c*FDR = FNR
Rahmann, S., Mueller, T., and Vingron, M. (2003). On the power of profiles for transcription factor binding site detection. Stat Appl Genet Mol Biol 2, Article7.
data(INR)
thedist <- computeScoreDist(regularizeMatrix(INR), 0.5)
scoreDistCutoffs(thedist, n=2000, cutoff=0.05)
signalDist returns the signal distribution of a profile object.
signalDist(x)
x |
A ProfileDist object. |
This is a generic function.
The signal distribution vector.
anObject <- ProfileDist()
signalDist(anObject)