Title: | Visualising oligonucleotide patterns and motif occurrences across a set of sorted sequences |
---|---|
Description: | Visualising oligonucleotide patterns and sequence motifs occurrences across a large set of sequences centred at a common reference point and sorted by a user defined feature. |
Authors: | Vanja Haberle <[email protected]> |
Maintainer: | Vanja Haberle <[email protected]> |
License: | GPL-3 |
Version: | 1.39.0 |
Built: | 2024-11-30 04:52:06 UTC |
Source: | https://github.com/bioc/seqPattern |
Visualising oligonucleotide patterns and sequence motifs occurrences across a large set of sequences centred at a common reference point and sorted by a user defined feature.
Package: | SeqPattern |
Type: | Package |
Version: | 1.0 |
Date: | 2014-12-05 |
License: | GPL-3 |
Depends: | R (>= 3.0.1), methods |
Vanja Haberle
Maintainer: Vanja Haberle <[email protected]>
Finds positions of specified sequence patterns in a list of sequences of the same length ordered by a provided index. Sequence patterns can be consensus sequences of variable length and can contain IUPAC ambiguity code. Position of each pattern occurrence is specified in two-dimensional matrix, i.e. the first coordinate provides the ordinal number of the sequence and the second coordinate gives the position within the sequence where the pattern occurs.
getPatternOccurrenceList(regionsSeq, patterns, seqOrder = c(1:length(regionsSeq)), useMulticore = FALSE, nrCores = NULL)
getPatternOccurrenceList(regionsSeq, patterns, seqOrder = c(1:length(regionsSeq)), useMulticore = FALSE, nrCores = NULL)
regionsSeq |
A |
patterns |
Character vector specifying one or more DNA sequence patterns (oligonucleotides). IUPAC ambiguity codes can be used and will match any letter in the subject that is associated with the code. |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
useMulticore |
Logical, should multicore be used. |
nrCores |
Number of cores to use when |
This function uses the matchPattern
function to find
occurrences of given sequence patterns in a set of input sequences.
Input sequences must all be of the same length and are ordered according to
the index provided in the seqOrder
argument, creating a n * m
matrix, where n
is the number of sequences and m
is the length
of the sequences. Positions of pattern matches in the resulting matrix are
returned as two-dimensional coordinates.
The function returns a named list with one element for each sequence pattern
specified in the patterns
argument. Each element of the list is a
data.frame
with positions of the corresponding pattern in the set of
input sequences. The input sequences of the same length are sorted according
to the index in seqOrder
argument and the positions of pattern
matches in the resulting n * m
matrix (where n
is the number
of sequences and m
is the length of the sequence) are provided. The
sequence
column in the data.frame provides the ordinal number of the
sequence in the ordered list of sequences and the position
column
provides the start position of the pattern match within that sequence.
Vanja Haberle
plotPatternDensityMap
motifScanHits
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth # dinucleotide patterns patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters, patterns = c("TA", "GC"), seqOrder = order(promoterWidth)) names(patternsOccurrence) head(patternsOccurrence[["GC"]]) # motif consensus sequence patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", seqOrder = order(promoterWidth)) names(patternsOccurrence) head(patternsOccurrence[["TATAWAWR"]])
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth # dinucleotide patterns patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters, patterns = c("TA", "GC"), seqOrder = order(promoterWidth)) names(patternsOccurrence) head(patternsOccurrence[["GC"]]) # motif consensus sequence patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", seqOrder = order(promoterWidth)) names(patternsOccurrence) head(patternsOccurrence[["TATAWAWR"]])
Finds positions of sequence motif hits above a specified threshold in a list of
sequences of the same length ordered by a provided index. Motif is specified by
a position weight matrix (PWM) that contains estimated probability of base b at
position i and is usually constructed via call to PWM
function.
Position of each motif hit is specified in two-dimensional matrix, i.e.
the first coordinate provides the ordinal number of the sequence and the second
coordinate gives the position within the sequence where the motif occurs.
motifScanHits(regionsSeq, motifPWM, minScore = "80%", seqOrder = c(1:length(regionsSeq)))
motifScanHits(regionsSeq, motifPWM, minScore = "80%", seqOrder = c(1:length(regionsSeq)))
regionsSeq |
A |
motifPWM |
A numeric matrix representing the Position Weight Matrix (PWM), such as
returned by |
minScore |
The minimum score for counting a motif hit. Can be given as a character
string containing a percentage (e.g. |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
This function uses the matchPWM
function to find matches to
given motif in a set of input sequences. Only matches above specified
minScore
are considered as hits. Input sequences must all be of the
same length and are ordered according to the index provided in the
seqOrder
argument, creating a n * m
matrix, where n
is
the number of sequences and m
is the length of the sequences.
Positions of motif hits in the resulting matrix are returned as
two-dimensional coordinates.
The function returns a data.frame
with positions of the motif hits in
the set of input sequences. The input sequences of the same length are
sorted according to the index in seqOrder
argument and the positions
of motif hits in the resulting n * m
matrix (where n
is the
number of sequences and m
is the length of the sequence) are
provided. The sequence
column in the data.frame provides the ordinal
number of the sequence in the ordered list of sequences and the
position
column provides the start position of the motif hit within
that sequence.
Vanja Haberle
plotMotifDensityMap
getPatternOccurrenceList
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) motifOccurrence <- motifScanHits(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", seqOrder = order(promoterWidth)) head(motifOccurrence)
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) motifOccurrence <- motifScanHits(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", seqOrder = order(promoterWidth)) head(motifOccurrence)
Provides motif scanning scores along the full length of a sequence for a list
of sequences of the same length ordered by a provided index. Motif is specified
by a position weight matrix (PWM) that contains estimated probability of base b
at position i and is usually constructed via call to PWM
function.
Scanning scores are returned in the form of a two-dimensional matrix, where the
rows are sequences ordered by the specified index and the columns are relative
positions within the sequence. Each cell in the matrix contains the score of
the specified motif in the given sequence starting at the given position. The
resulting matrix can be used to visualise motif occurrences and their strength
in an ordered set of sequences centered at a common reference point.
motifScanScores(regionsSeq, motifPWM, seqOrder = c(1:length(regionsSeq)), asPercentage = TRUE)
motifScanScores(regionsSeq, motifPWM, seqOrder = c(1:length(regionsSeq)), asPercentage = TRUE)
regionsSeq |
A |
motifPWM |
A numeric matrix representing the Position Weight Matrix (PWM), such as
returned by |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
asPercentage |
Logical, should the scores represent percentage of the maximal motif PWM
score ( |
This function uses the PWMscoreStartingAt
function to get
scores for a given motif starting at each position (nucleotide) in a set of
input sequences. Input sequences must all be of the same length and are
ordered according to the index provided in the seqOrder
argument,
creating an n * m
matrix, where n
is the number of sequences
and m
is the length of the sequences. Each cell in the matrix
contains the score of the specified motif in the given sequence starting at
the given position.
The function returns a matrix
with motif scanning scores for each
position in the set of input sequences.
Vanja Haberle
plotMotifScanScores
motifScanHits
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) motifScores <- motifScanScores(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, seqOrder = order(promoterWidth), asPercentage = TRUE) dim(motifScores) motifScores[1:10,1:10]
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) motifScores <- motifScanScores(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, seqOrder = order(promoterWidth), asPercentage = TRUE) dim(motifScores) motifScores[1:10,1:10]
Plots density of motif occurrences in an ordered set of sequences of the same length in the form of a two dimensional map centered at a common reference position. Motif is specified by a position weight matrix (PWM) that contains estimated probability of base b at position i, and only motif hits above specified threshold are taken into account and plotted.
plotMotifDensityMap(regionsSeq, motifPWM, minScore = "80%", seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, nBin = NULL, bandWidth = NULL, color = "blue", transf = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "DensityMap", plotWidth = 2000, plotHeight = 2000)
plotMotifDensityMap(regionsSeq, motifPWM, minScore = "80%", seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, nBin = NULL, bandWidth = NULL, color = "blue", transf = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "DensityMap", plotWidth = 2000, plotHeight = 2000)
regionsSeq |
A |
motifPWM |
A numeric matrix representing the Position Weight Matrix (PWM), such as
returned by |
minScore |
The minimum score for counting a motif hit. Can be given as a character
string containing a percentage (e.g. |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
flankUp , flankDown
|
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
|
nBin |
Numeric vector with two values containing the number of equally spaced
points in each direction over which the density is to be estimated. The
first value specifies number of bins along x-axis, i.e. along the
nucleotides in the sequence, and the second value specifies the number
of bins along y-axis, i.e. across ordered input sequences. The
values are passed on to the |
bandWidth |
Numeric vector of length 2, containing the bandwidth to be used in each
coordinate direction. The first value specifies the bandwidth along the
x-axis, i.e. along the nucleotides in the sequence, and the
second value specifies the bandwidth along y-axis, i.e. across
ordered input sequences. The values are passed on to the
|
color |
Character specifying the color palette for the density plot. One of the
following color palettes can be specified: |
transf |
The function mapping the density scale to the color scale. See Details. |
xTicks |
Character vector of labels to be placed at the tick-marks on x-axis.
The default |
xTicksAt |
Numeric vector of positions of the tick-marks on the x-axis. The values
can range from 1 (the position of the first base-pair in the sequence)
to input sequence length. The default |
xLabel |
The label for the x-axis. The default is no label, i.e. empty string. |
yTicks |
Character vector of labels to be placed at the tick-marks on y-axis.
The default |
yTicksAt |
Numeric vector of positions of the tick-marks on the y-axis. The values
can range from 1 (the position of the last sequence on the bottom of the
plot) to input sequence length (the position of the first sequence on
the top of the plot). The default |
yLabel |
The label for the y-axis. The default is no label, i.e. empty string. |
cexAxis |
The magnification to be used for axis annotation. |
plotScale |
Logical, should the scale bar be plotted in the lower left corner of the plot. |
scaleLength |
The length of the scale bar to be plotted. Used only when
|
scaleWidth |
The width of the line for the scale bar. Used only when
|
addReferenceLine |
Logical, should the vertical dashed line be drawn at the reference point. |
plotColorLegend |
Logical, should the color legend for the pattern density be plotted. If
|
outFile |
Character vector specifying the base name of the output plot file. The
final name of the plot file for each pattern will be
|
plotWidth , plotHeight
|
Width and height of the density plot in pixels. |
The function produces a PNG file in the working directory, visualising density of the motif occurrence above specified threshold in the set of ordered input sequences.
Vanja Haberle
Haberle et al. (2014) Two independent transcription initiation codes overlap on vertebrate core promoters, Nature 507:381-385.
motifScanHits
plotPatternDensityMap
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifDensityMap(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "red")
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifDensityMap(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "red")
Plots average profile of motif occurrence for a set of input sequences of the same length. Motif is specified by a position weight matrix (PWM) that contains estimated probability of base b at position i, and only motif hits above specified threshold are taken into account.
plotMotifOccurrenceAverage(regionsSeq, motifPWM, minScore = "80%", flankUp = NULL, flankDown = NULL, smoothingWindow = 1, color = "black", xLabel = "Distance to reference point (bp)", yLabel = "Relative frequency", cexAxis = 1, addReferenceLine = TRUE, plotLegend = FALSE, cexLegend = 1, add = FALSE, ...)
plotMotifOccurrenceAverage(regionsSeq, motifPWM, minScore = "80%", flankUp = NULL, flankDown = NULL, smoothingWindow = 1, color = "black", xLabel = "Distance to reference point (bp)", yLabel = "Relative frequency", cexAxis = 1, addReferenceLine = TRUE, plotLegend = FALSE, cexLegend = 1, add = FALSE, ...)
regionsSeq |
A |
motifPWM |
A numeric matrix representing the Position Weight Matrix (PWM), such as
returned by |
minScore |
The minimum score for counting a motif hit. Can be given as a character
string containing a percentage (e.g. |
flankUp , flankDown
|
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
|
smoothingWindow |
Integer specifying the size of a window (in base-pairs) to be used for smoothing the signal. Default value of 1 corresponds to no smoothing. |
color |
Value specifying the color for plotting. |
xLabel , yLabel
|
Character strings for x and y axis labels, respectively. |
cexAxis |
The magnification to be used for axis annotation relative to the current
setting of |
addReferenceLine |
Logical, should the vertical dashed line be drawn at the reference point. |
plotLegend |
Logical, should the legend be plotted at the top. |
cexLegend |
The magnification to be used for legend relative to the current setting
of |
add |
Logical, should the pattern occurrence profiles be added to the existing plot. |
... |
Further arguments to be passed to |
The function finds all hits matching the motif above the specified score threshold in the set of input sequences and plots an average profile reflecting the occurrence of the motif across input sequences.
Vanja Haberle
motifScanHits
plotMotifDensityMap
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifOccurrenceAverage(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", flankUp = 400, flankDown = 600, smoothingWindow = 3)
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifOccurrenceAverage(regionsSeq = zebrafishPromoters, motifPWM = TBPpwm, minScore = "85%", flankUp = 400, flankDown = 600, smoothingWindow = 3)
Plots heatmap of motif scanning scores for a set of sorted sequences of the same length in the form of a two dimensional map centered at a common reference position. Motif is specified by a position weight matrix (PWM) that contains estimated probability of base b at position i, and the percentage of the maximal PWM score is plotted for each position (nucleotide) in each sequence.
plotMotifScanScores(regionsSeq, motifPWM, seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "MotifScanningScores.png", plotWidth = 2000, plotHeight = 2000)
plotMotifScanScores(regionsSeq, motifPWM, seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "MotifScanningScores.png", plotWidth = 2000, plotHeight = 2000)
regionsSeq |
A |
motifPWM |
A numeric matrix representing the Position Weight Matrix (PWM), such as
returned by |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
flankUp , flankDown
|
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
|
xTicks |
Character vector of labels to be placed at the tick-marks on x-axis.
The default |
xTicksAt |
Numeric vector of positions of the tick-marks on the x-axis. The values
can range from 1 (the position of the first base-pair in the sequence)
to input sequence length. The default |
xLabel |
The label for the x-axis. The default is no label, i.e. empty string. |
yTicks |
Character vector of labels to be placed at the tick-marks on y-axis.
The default |
yTicksAt |
Numeric vector of positions of the tick-marks on the y-axis. The values
can range from 1 (the position of the last sequence on the bottom of the
plot) to input sequence length (the position of the first sequence on
the top of the plot). The default |
yLabel |
The label for the y-axis. The default is no label, i.e. empty string. |
cexAxis |
The magnification to be used for axis annotation. |
plotScale |
Logical, should the scale bar be plotted in the lower left corner of the plot. |
scaleLength |
The length of the scale bar to be plotted. Used only when
|
scaleWidth |
The width of the line for the scale bar. Used only when
|
addReferenceLine |
Logical, should the vertical dashed line be drawn at the reference point. |
plotColorLegend |
Logical, should the color legend for the scanning score be plotted on the right side of the plot. |
outFile |
Character vector specifying the base name of the output plot file. The
final name of the plot file for each pattern will be
|
plotWidth , plotHeight
|
Width and height of the density plot in pixels. |
The function produces a PNG file in the working directory, visualising motif scanning scores in the set of ordered input sequences.
Vanja Haberle
motifScanScores
plotPatternDensityMap
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifScanScores(regionsSeq=zebrafishPromoters, motifPWM = TBPpwm, seqOrder=order(promoterWidth), flankUp = 400, flankDown = 600)
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth load(system.file("data", "TBPpwm.RData", package="seqPattern")) plotMotifScanScores(regionsSeq=zebrafishPromoters, motifPWM = TBPpwm, seqOrder=order(promoterWidth), flankUp = 400, flankDown = 600)
Plots density of sequence pattern occurrences in an ordered set of sequences of the same length in the form of a two dimensional map centered at a common reference position. Multiple sequence patterns can be processed at once and one plot per pattern will be created with the same color scale across all plots, allowing visual density comparison across different patterns.
plotPatternDensityMap(regionsSeq, patterns, seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, nBin = NULL, bandWidth = NULL, color = "blue", transf = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addPatternLabel = TRUE, cexLabel = 8, labelCol = "black", addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "PatternDensityMap", plotWidth = 2000, plotHeight = 2000, useMulticore = FALSE, nrCores = NULL)
plotPatternDensityMap(regionsSeq, patterns, seqOrder = c(1:length(regionsSeq)), flankUp = NULL, flankDown = NULL, nBin = NULL, bandWidth = NULL, color = "blue", transf = NULL, xTicks = NULL, xTicksAt = NULL, xLabel = "", yTicks = NULL, yTicksAt = NULL, yLabel = "", cexAxis = 8, plotScale = TRUE, scaleLength = NULL, scaleWidth = 15, addPatternLabel = TRUE, cexLabel = 8, labelCol = "black", addReferenceLine = TRUE, plotColorLegend = TRUE, outFile = "PatternDensityMap", plotWidth = 2000, plotHeight = 2000, useMulticore = FALSE, nrCores = NULL)
regionsSeq |
A |
patterns |
Character vector specifying one or more DNA sequence patterns (oligonucleotides). IUPAC ambiguity codes can be used and will match any letter in the subject that is associated with the code. |
seqOrder |
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
|
flankUp , flankDown
|
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
|
nBin |
Numeric vector with two values containing the number of equally spaced
points in each direction over which the density is to be estimated. The
first value specifies number of bins along x-axis, i.e. along the
nucleotides in the sequence, and the second value specifies the number
of bins along y-axis, i.e. across ordered input sequences. The values
are passed on to the |
bandWidth |
Numeric vector of length 2, containing the bandwidth to be used in each
coordinate direction. The first value specifies the bandwidth along the
x-axis, i.e. along the nucleotides in the sequence, and the
second value specifies the bandwidth along y-axis, i.e. across
ordered input sequences. The values are passed on to the
|
color |
Character specifying the color palette for the density plot. One of the
following color palettes can be specified: |
transf |
The function mapping the density scale to the color scale. See Details. |
xTicks |
Character vector of labels to be placed at the tick-marks on x-axis.
The default |
xTicksAt |
Numeric vector of positions of the tick-marks on the x-axis. The values
can range from 1 (the position of the first base-pair in the sequence)
to input sequence length. The default |
xLabel |
The label for the x-axis. The default is no label, i.e. empty string. |
yTicks |
Character vector of labels to be placed at the tick-marks on y-axis.
The default |
yTicksAt |
Numeric vector of positions of the tick-marks on the y-axis. The values
can range from 1 (the position of the last sequence on the bottom of the
plot) to input sequence length (the position of the first sequence on
the top of the plot). The default |
yLabel |
The label for the y-axis. The default is no label, i.e. empty string. |
cexAxis |
The magnification to be used for axis annotation. |
plotScale |
Logical, should the scale bar be plotted in the lower left corner of the plot. |
scaleLength |
The length of the scale bar to be plotted. Used only when
|
scaleWidth |
The width of the line for the scale bar. Used only when
|
addPatternLabel |
Logical, should the pattern label be written in the upper left corner of the plot. |
cexLabel |
The magnification to be used for pattern label. |
labelCol |
The color to be used for pattern label and scale bar. |
addReferenceLine |
Logical, should the vertical dashed line be drawn at the reference point. |
plotColorLegend |
Logical, should the color legend for the pattern density be plotted. If
|
outFile |
Character vector specifying the base name of the output plot file. The
final name of the plot file for each pattern will be
|
plotWidth , plotHeight
|
Width and height of the density plot(s) in pixels. |
useMulticore |
Logical, should multicore be used. |
nrCores |
Number of cores to use when |
The function produces PNG files in the working directory, visualising density of patterns occurrence in the set of ordered input sequences. One file/plot per specified pattern is created.
Vanja Haberle
Haberle et al. (2014) Two independent transcription initiation codes overlap on vertebrate core promoters, Nature 507:381-385.
getPatternOccurrenceList
plotMotifDensityMap
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth # dinucleotide patterns plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = c("TA", "GC"), seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "blue") # motif consensus sequence plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "cyan")
library(GenomicRanges) load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth # dinucleotide patterns plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = c("TA", "GC"), seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "blue") # motif consensus sequence plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600, color = "cyan")
Plots average profile of sequence pattern occurrence for a set of input sequences of the same length. Multiple sequence patterns can be processed at once and visualised in the same plot, allowing comparison across different patterns.
plotPatternOccurrenceAverage(regionsSeq, patterns, flankUp = NULL, flankDown = NULL, smoothingWindow = 1, color = rainbow(length(patterns)), xLabel = "Distance to reference point (bp)", yLabel = "Relative frequency", cexAxis = 1, addReferenceLine = TRUE, plotLegend = TRUE, cexLegend = 1, useMulticore = FALSE, nrCores = NULL, add = FALSE, ...)
plotPatternOccurrenceAverage(regionsSeq, patterns, flankUp = NULL, flankDown = NULL, smoothingWindow = 1, color = rainbow(length(patterns)), xLabel = "Distance to reference point (bp)", yLabel = "Relative frequency", cexAxis = 1, addReferenceLine = TRUE, plotLegend = TRUE, cexLegend = 1, useMulticore = FALSE, nrCores = NULL, add = FALSE, ...)
regionsSeq |
A |
patterns |
Character vector specifying one or more DNA sequence patterns (oligonucleotides). IUPAC ambiguity codes can be used and will match any letter in the subject that is associated with the code. |
flankUp , flankDown
|
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
|
smoothingWindow |
Integer specifying the size of a window (in base-pairs) to be used for smoothing the signal. Default value of 1 corresponds to no smoothing. |
color |
A vector of values specifying the colors for plotting. Number of colors must match the number of patterns that should be plotted. |
xLabel , yLabel
|
Character strings for x and y axis labels, respectively. |
cexAxis |
The magnification to be used for axis annotation relative to the current
setting of |
addReferenceLine |
Logical, should the vertical dashed line be drawn at the reference point. |
plotLegend |
Logical, should the legend be plotted at the top. |
cexLegend |
The magnification to be used for legend relative to the current setting
of |
useMulticore |
Logical, should multicore be used. |
nrCores |
Number of cores to use when |
add |
Logical, should the pattern occurrence profiles be added to the existing plot. |
... |
Further arguments to be passed to |
The function finds all hits matching the specified patterns in the set of input sequences and plots one average profile per pattern reflecting the occurrence of patterns across sequences.
Vanja Haberle
getPatternOccurrenceList
plotPatternDensityMap
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) # dinucleotide patterns plotPatternOccurrenceAverage(regionsSeq = zebrafishPromoters, patterns = c("AT", "TA", "CG", "GC"), flankUp = 400, flankDown = 600, smoothingWindow = 3, color = c("gold", "darkred", "forestgreen", "navy")) # motif consensus sequence plotPatternOccurrenceAverage(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", flankUp = 400, flankDown = 600, smoothingWindow = 3, color = "gray")
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern")) # dinucleotide patterns plotPatternOccurrenceAverage(regionsSeq = zebrafishPromoters, patterns = c("AT", "TA", "CG", "GC"), flankUp = 400, flankDown = 600, smoothingWindow = 3, color = c("gold", "darkred", "forestgreen", "navy")) # motif consensus sequence plotPatternOccurrenceAverage(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR", flankUp = 400, flankDown = 600, smoothingWindow = 3, color = "gray")
This is a matrix
object representing a position-weight matrix
(PWM) for TATA-box binding protein motif. The scores in the matrix correspond
to log2 ratio between probability of observing base b (rows) at position i
(columns) and background probability of base b. This PWM has been derived using
position frequency matrix from Jaspar database. It is intended to be used as an
input data in running examples from seqPattern
package help pages.
data(TBPpwm)
data(TBPpwm)
A matrix
object
A matrix
object
This is a DNAStringSet
object that contains sequence of 1000
randomly selected promoters active in 24 hpf zebrafish Danio rerio
embryos. The data is taken from Nepal et al. Genome Research 2013, and
represents regions flanking 400 bp upstream and 600 bp downstream of the
dominant TSS detected by Cap analysis of gene expression (CAGE). It is intended
to be used as an input data in running examples from seqPattern
package help pages.
data(zebrafishPromoters)
data(zebrafishPromoters)
A DNAStringSet
object
A DNAStringSet
object
Nepal et al. (2013) Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis, Genome Research 23(11):1938-1950.
This is a data.frame
object that contains genomic coordinates of
12078 promoters active in zebrafish Danio rerio embryos at 24 hours past
fertilisation. For each promoter additional information is provided, including
position of the dominant (most frequently used) TSS position, number of CAGE
tags per million supporting that promoter and the interquantile width of the
promoter (width of the central region containing >= 80% of the CAGE tags). The
data is taken from Nepal et al. Genome Research 2013, and it is intended
to be used for running examples from seqPattern
package vignette.
data(zebrafishPromoters24h)
data(zebrafishPromoters24h)
A data.frame
object
A data.frame
object
Nepal et al. (2013) Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis, Genome Research 23(11):1938-1950.