Title: | An R package for RNA visualization and analysis |
---|---|
Description: | A package for RNA basepair analysis, including the visualization of basepairs as arc diagrams for easy comparison and annotation of sequence and structure. Arc diagrams can additionally be projected onto multiple sequence alignments to assess basepair conservation and covariation, with numerical methods for computing statistics for each. |
Authors: | Daniel Lai, Irmtraud Meyer <[email protected]> |
Maintainer: | Daniel Lai <[email protected]> |
License: | GPL-3 |
Version: | 1.35.0 |
Built: | 2024-11-04 06:05:21 UTC |
Source: | https://github.com/bioc/R4RNA |
An R package for RNA visualization and analysis
# Read input data predicted <- readHelix(system.file("extdata", "helix.txt", package = "R4RNA")) known <- readVienna(system.file("extdata", "vienna.txt", package = "R4RNA")) sequence <- as.character(readBStringSet(system.file("extdata", "fasta.txt", package = "R4RNA"))) plotHelix(predicted) pval.coloured <- colourByValue(predicted, log = TRUE, get = TRUE) plotDoubleHelix(pval.coloured, known, scale = FALSE) plotOverlapHelix(pval.coloured, known) cov.coloured <- colourByCovariation(known, sequence, get = TRUE) plotCovariance(sequence, cov.coloured) plotDoubleCovariance(cov.coloured, pval.coloured, sequence, conflict.filter = "grey") plotOverlapCovariance(pval.coloured, known, sequence, grid = TRUE, conflict.filter = "grey", legend = FALSE, any = TRUE) # List of all functions ls("package:R4RNA") # use example() and help() for more details on each function
# Read input data predicted <- readHelix(system.file("extdata", "helix.txt", package = "R4RNA")) known <- readVienna(system.file("extdata", "vienna.txt", package = "R4RNA")) sequence <- as.character(readBStringSet(system.file("extdata", "fasta.txt", package = "R4RNA"))) plotHelix(predicted) pval.coloured <- colourByValue(predicted, log = TRUE, get = TRUE) plotDoubleHelix(pval.coloured, known, scale = FALSE) plotOverlapHelix(pval.coloured, known) cov.coloured <- colourByCovariation(known, sequence, get = TRUE) plotCovariance(sequence, cov.coloured) plotDoubleCovariance(cov.coloured, pval.coloured, sequence, conflict.filter = "grey") plotOverlapCovariance(pval.coloured, known, sequence, grid = TRUE, conflict.filter = "grey", legend = FALSE, any = TRUE) # List of all functions ls("package:R4RNA") # use example() and help() for more details on each function
Functions to compute covariation, percent identity conservation, and percent canonical basepairs given a multiple sequence alignment and optionally a secondary structure. Statistics can be computed for a single base, basepair, helix or entire alignment.
baseConservation(msa, pos) basepairConservation(msa, pos.5p, pos.3p) basepairCovariation(msa, pos.5p, pos.3p) basepairCanonical(msa, pos.5p, pos.3p) helixConservation(helix, msa) helixCovariation(helix, msa) helixCanonical(helix, msa) alignmentConservation(msa) alignmentCovariation(msa, helix) alignmentCanonical(msa, helix) alignmentPercentGaps(msa)
baseConservation(msa, pos) basepairConservation(msa, pos.5p, pos.3p) basepairCovariation(msa, pos.5p, pos.3p) basepairCanonical(msa, pos.5p, pos.3p) helixConservation(helix, msa) helixCovariation(helix, msa) helixCanonical(helix, msa) alignmentConservation(msa) alignmentCovariation(msa, helix) alignmentCanonical(msa, helix) alignmentPercentGaps(msa)
helix |
A helix data.frame |
msa |
A multiple sequence alignment. Can be either a |
pos , pos.5p , pos.3p
|
Positions of bases or basepairs for which statistics shall be calculated for. |
Conservation values have a range of [0, 1], where 0 is the absence of primary sequence conservation (all bases different), and 1 is full primary sequence conservation (all bases identical).
Canonical values have a range of [0, 1], where 0 is a complete lack of basepair potential, and 1 indicates that all basepairs are valid
Covariation values have a range of [-2, 2], where -2 is a complete lack of basepair potential and sequence conservation, 0 is complete sequence conservation regardless of basepairing potential, and 2 is a complete lack of sequence conservation but maintaining full basepair potential.
helix
values are average of base/basepair values, and the
alignment
values are averages of helices or all columns depending
on whether the helix
argument is required.
alignmentPercentGaps
simply returns the percentage of nucleotides
that are gaps in a sequence for each sequence of the alignment.
baseConservation
, basepairConservation
,
basepairCovariation
, basepairCanonical
,
alignmentConservation
, alignmentCovariation
, and
alignmentCanonical
return a single decimal value.
helixConservation
, helixCovariation
, helixCanonical
return a list of values whose length equals the number of rows in helix
.
alignmentPercentGaps
returns a list of values whose length equals
the number of sequences in the multiple sequence alignment.
Jeff Proctor, Daniel Lai
data(helix) baseConservation(fasta, 9) basepairConservation(fasta, 9, 18) basepairCovariation(fasta, 9, 18) basepairCanonical(fasta, 9, 18) helixConservation(helix, fasta) helixCovariation(helix, fasta) helixCanonical(helix, fasta) alignmentConservation(fasta) alignmentCovariation(fasta, helix) alignmentCanonical(fasta, helix) alignmentPercentGaps(fasta)
data(helix) baseConservation(fasta, 9) basepairConservation(fasta, 9, 18) basepairCovariation(fasta, 9, 18) basepairCanonical(fasta, 9, 18) helixConservation(helix, fasta) helixCovariation(helix, fasta) helixCanonical(helix, fasta) alignmentConservation(fasta) alignmentCovariation(fasta, helix) alignmentCanonical(fasta, helix) alignmentPercentGaps(fasta)
Calculates the frequency of each basepair in a given helix structure. Internally, breaks helices into basepairs, and returns a structure of unique basepairs, where the values is its frequency, regardless of original value.
basepairFrequency(helix)
basepairFrequency(helix)
helix |
A helix data.frame |
A helix data.frame of unique basepairs of length 1, with the frequency of appearance as its value, sorted by decreasing value.
Daniel Lai
data(helix) basepairFrequency(helix)
data(helix) basepairFrequency(helix)
Given a helix data frame, expands a helix of arbitrary length into helices of length 1 (i.e. basepairs). Also does the reverse operation of clustering consecutive basepairs (or helices), and merging/collapsing them into a single helix.
expandHelix(helix) collapseHelix(helix, number = FALSE)
expandHelix(helix) collapseHelix(helix, number = FALSE)
helix |
A helix data frame. |
number |
Indicates presence of a column in the helix data frame titled exactly 'number', which will be used to unique identify basepairs belonging to the same helix. Only basepairs from the same helix as identified by the number will be collapsed together. |
During the expansion, basepairs expanded from a single helix will all be assigned the value of the originating helix (the same goes for all other columns besides i, j, and length). During collapsing, only helices/basepairs of equal value will be grouped together. The ordering of collapsed helices returned will be sorted by value (increasing order). For any other columns besides i, j, length and value, values will be obtained from the corresponding columns of the outer most basepair.
Returns a helix data frame.
Daniel Lai
# Create helix data frame helix <- data.frame(2, 8, 3, 0.5) helix[2, ] <- c(5, 15, 4, -0.5) helix <- as.helix(helix) helix$colour <- c("red", "blue") # Before expansion print(helix) # After expansion print(expanded <- expandHelix(helix)) # Collapse back (sorted by value) print(collapseHelix(expanded))
# Create helix data frame helix <- data.frame(2, 8, 3, 0.5) helix[2, ] <- c(5, 15, 4, -0.5) helix <- as.helix(helix) helix$colour <- c("red", "blue") # Before expansion print(helix) # After expansion print(expanded <- expandHelix(helix)) # Collapse back (sorted by value) print(collapseHelix(expanded))
Functions to coerce a structure into a helix data frame, and to check whether a structure is a valid helix data frame. A helix data frame is a data frame, so any structure coercible into a data.frame can become a helix data frame.
as.helix(x, length) is.helix(x)
as.helix(x, length) is.helix(x)
x |
Structure to coerce. Should be a structure coercible into a standard
R data.frame structure for |
length |
The length of the RNA sequence containing the helices. |
as.helix
takes in a data.frame and coerces it into a
helix data frame acceptable by other R4RNA functions. This mainly involves
setting specific column names and casting to specific types.
is.helix
returns a boolean.
as.helix
returns helix data frame with valid input.
Daniel Lai
# Not a valid helix data frame helix <- data.frame(c(1, 2, 3), seq(10, 20, length.out = 3), 5, runif(3)) is.helix(helix) warnings() # Formatted into a helix data frame helix <- as.helix(helix) is.helix(helix)
# Not a valid helix data frame helix <- data.frame(c(1, 2, 3), seq(10, 20, length.out = 3), 5, runif(3)) is.helix(helix) warnings() # Formatted into a helix data frame helix <- as.helix(helix) is.helix(helix)
Functions to generate colours for helices by various rules, including integer counts, value ranges, percent identity covariation, conservation, percentage canonical basepair, basepair frequency, and non-pseudoknotted groups.
colourByCount(helix, cols, counts, get = FALSE) colourByValue(helix, cols, breaks, get = FALSE, log = FALSE, include.lowest = TRUE, ...) colourByBasepairFrequency(helix, cols, get = TRUE) colourByUnknottedGroups(helix, cols, get = TRUE) colourByCovariation(helix, msa, cols, get = FALSE) colourByConservation(helix, msa, cols, get = FALSE) colourByCanonical(helix, msa, cols, get = FALSE) defaultPalette()
colourByCount(helix, cols, counts, get = FALSE) colourByValue(helix, cols, breaks, get = FALSE, log = FALSE, include.lowest = TRUE, ...) colourByBasepairFrequency(helix, cols, get = TRUE) colourByUnknottedGroups(helix, cols, get = TRUE) colourByCovariation(helix, msa, cols, get = FALSE) colourByConservation(helix, msa, cols, get = FALSE) colourByCanonical(helix, msa, cols, get = FALSE) defaultPalette()
helix |
A helix data frame to be coloured. |
cols |
An array of characters (or numbers) representing a set of colours to
colour |
counts |
An array of integers the same length as |
breaks |
An integer number of intervals to break the ‘value’ column of
|
get |
If TRUE, returns the input |
log |
If TRUE, will breaks values into even log10 space intervals, useful when values are p-values. |
include.lowest |
Whether the lowest interval should include the lowest value, passed to
|
... |
Additional arguments passed to |
msa |
A multiple sequence alignment. Can be either a |
colourByCount
assigns colours indepenent of the helix input's value
column, and instead operates over the number of helices (i.e. rows).
colourByValue
uses cut
to assign each of the helices
to an interval based on its value.
colourByCovariation
, colourByConservation
, and
colourByCanonical
, colour helices according to compensatory mutations
(or covariation), percentage identity conservation, and percentage canonical
basepair repsectively, relative to the multiple sequence alignment provided.
colourByBasepairFrequency
colours each basepair according to the
number of times it appear in the input, regardless of its value.
colourByUnknottedGroups
greedily partitions the basepairs into non-
pseudoknotted groups, and assigns a colour to each.
All “colourBy” functions return a list of colours when get =
FALSE
, and a helix with a col
column if get = TRUE
. In both
bases, the returned object has attributes “legend” and “fill”,
showing the mapping between interval (in legend) and colour (in fill),
which can as eponymous arguments legend
.
defaultPalette
returns the default list of colours.
Daniel Lai
data(helix) known$col <- colourByCount(known) plotHelix(known) plotHelix(colourByValue(helix, log = TRUE, get = TRUE)) cov <- colourByCovariation(known, fasta, get = TRUE) plotCovariance(fasta, cov) legend("topleft", legend = attr(cov, "legend"), fill = attr(cov, "fill"), title = "Covariation")
data(helix) known$col <- colourByCount(known) plotHelix(known) plotHelix(colourByValue(helix, log = TRUE, get = TRUE)) cov <- colourByCovariation(known, fasta, get = TRUE) plotCovariance(fasta, cov) legend("topleft", legend = attr(cov, "legend"), fill = attr(cov, "fill"), title = "Covariation")
Converts dot bracket vienna format to and from helix format. It should be noted that the allows structures of vienna is a subset of those allowed in the helix format. Thus, conversion from vienna to helix will yield the identical structure, while conversion from helix to vienna may result in the loss of certain basepairs (mainly those that are conflicting). Pseudoknots are supported in both directions of conversion with limitations.
viennaToHelix(vienna, value = NA, palette = NA) helixToVienna(helix) helixToConnect(helix) helixToBpseq(helix)
viennaToHelix(vienna, value = NA, palette = NA) helixToVienna(helix) helixToConnect(helix) helixToBpseq(helix)
vienna |
A string containing only a vienna dot bracket structure, with balanced brackets. Allowable brackets are (, <, [, {, A, B, C, and D (where upper-case alphabets are paired with lower-case alphabets). |
value |
A numerical value to assign to all helices. |
palette |
A list of colour names for up to 8 colours that will be used to colour brackets of type (, <, [, {, A, B, C, and D, respectively. |
helix |
A helix data.frame. |
viennaToHelix
will ignore any non dot-bracket characters prior
to parsing, so the resultant length will be shorter than expected if invalid
characters are included.
If the colour palette is less than the number of supported brackets, it will simply cycle through the list. To explicitly prevent the colouring/ display of specific bracket type, colour it “NA”.
For helixToVienna
, pseudoknotted basepairs will be assigned
different bracket types. As there are only 8 supported bracket types,
any basepair pseudonotted deeper than 8 levels will be excluded from the
output. Additionally, vienna format is unable to respresent conflicting
basepairs, so conflicting basepairs will also be excluded. For both types
of exclusion, those at the bottom of the helix data.frame will always be
excluded in favour of keeping helices higher on the data.frame table.
helixToConnect
and helixToBpseq
will convert a
non-conflicting helix data.frame into connect or bpseq format
repsectively, provided the helix structure has a “sequence” attribute
containing a single nucleotide sequence of the structure.
viennaToHelix
returns a helix data.frame. helixToVienna
returns a character string of basepairs in the Vienna helix format.
helixToConnect
and HelixTpBpseq
return data.frames in the
connect and bpseq formats, respectively.
Daniel Lai
# viennaToHelix demonstrating ALL valid bracket symbols dot_bracket <- ".....(<[{.....ABCD.....}]>).....dcba....." parsed <- viennaToHelix(dot_bracket, -31.5) print(parsed) vienna <- helixToVienna(parsed) print(vienna) # Colouring the brackets by bracket type colour <- c("red", "orange", "yellow", "green", "lightblue", "blue", "purple", "black") double.rainbow <- viennaToHelix(dot_bracket, 0, colour) plotHelix(double.rainbow)
# viennaToHelix demonstrating ALL valid bracket symbols dot_bracket <- ".....(<[{.....ABCD.....}]>).....dcba....." parsed <- viennaToHelix(dot_bracket, -31.5) print(parsed) vienna <- helixToVienna(parsed) print(vienna) # Colouring the brackets by bracket type colour <- c("red", "orange", "yellow", "green", "lightblue", "blue", "purple", "black") double.rainbow <- viennaToHelix(dot_bracket, 0, colour) plotHelix(double.rainbow)
Given a multiple sequence alignment and a corresponding secondary structure, nucleotides in the sequence alignment will be coloured according to the basepairing and conservation status, where green is the most commonly observed valid basepair in the column, dark blue being valid covariation (i.e. mutation into another valid basepair), cyan is one-sided mutation that retains the basepair, and red is a mutation where the basepair has been lost.
plotCovariance(msa, helix, arcs = TRUE, add = FALSE, grid = FALSE, text = FALSE, legend = TRUE, species = 0, base.colour = FALSE, palette = NA, flip = FALSE, grid.col = "white", grid.lwd = 0, text.cex = 0.5, text.col = "white", text.font = 2, text.family = "sans", species.cex = 0.5, species.col = "black", species.font = 2, species.family = "mono", shape = "circle", conflict.cutoff = 0.01, conflict.lty = 2, conflict.col = NA, pad = c(0, 0, 0, 0), y = 0, x = 0, ...) plotDoubleCovariance(top.helix, bot.helix, top.msa, bot.msa = top.msa, add = FALSE, grid = FALSE, species = 0, legend = TRUE, pad = c(0, 0, 0, 0), ...) plotOverlapCovariance(predict.helix, known.helix, msa, bot.msa = TRUE, overlap.cutoff = 1, miss = "black", add = FALSE, grid = FALSE, species = 0, legend = TRUE, pad = c(0, 0, 0, 0), ...)
plotCovariance(msa, helix, arcs = TRUE, add = FALSE, grid = FALSE, text = FALSE, legend = TRUE, species = 0, base.colour = FALSE, palette = NA, flip = FALSE, grid.col = "white", grid.lwd = 0, text.cex = 0.5, text.col = "white", text.font = 2, text.family = "sans", species.cex = 0.5, species.col = "black", species.font = 2, species.family = "mono", shape = "circle", conflict.cutoff = 0.01, conflict.lty = 2, conflict.col = NA, pad = c(0, 0, 0, 0), y = 0, x = 0, ...) plotDoubleCovariance(top.helix, bot.helix, top.msa, bot.msa = top.msa, add = FALSE, grid = FALSE, species = 0, legend = TRUE, pad = c(0, 0, 0, 0), ...) plotOverlapCovariance(predict.helix, known.helix, msa, bot.msa = TRUE, overlap.cutoff = 1, miss = "black", add = FALSE, grid = FALSE, species = 0, legend = TRUE, pad = c(0, 0, 0, 0), ...)
msa , top.msa , bot.msa
|
A multiple sequence alignment. Can be either a
|
helix , top.helix , bot.helix , predict.helix , known.helix
|
A helix data.frame with a structure corresponding to See |
arcs |
TRUE if the structure should be plotted as arcs. Arcs may be styled
with styling columns, see example and |
add |
TRUE if graphical elements are to be added to an existing device, else
a new plotting device is created with |
grid |
TRUE if the multiple sequence alignment is to be drawn as a grid of bases, else the multiple sequence alignment is drawn as equidistant horizontal lines. |
text |
Only applicable when grid is TRUE. TRUE if the grid is to be filled with nucleotide character. |
legend |
TRUE if legend are to be shown. |
species |
If a number greater than 0 is given, then species names for the multiple sequence alignment will be printed along the left side. This name is typically the entire header lines of FASTA entries. The number specifies the start position relative to the left edge of the multiple sequence alignment). |
base.colour |
TRUE if bases are to be coloured by nucleotide instead of basepair conservation. |
palette |
A list of colour names to override the default colour palette. When base.colour is TRUE, the first 6 colours will be used for colouring bases A, U, G, C, - (gap), and ? (everything else), respectively. When base.colour is FALSE, the first 7 colours will be used for colouring conserved basepairs, covarying basepairs, one-sided conserved basepairs, invalid basepairs, unpaired bases, gaps, and bases/pairs with ambiguous bases, resepctively. If the palette is shorter than the expected length, the palette will simply cycle. “NA” is a valid colour, that will effectively plot nothing. |
flip |
If TRUE, the entire plot will be flipped upside down. Note that this is not a perfect mirror image about the horizon. |
grid.col , grid.lwd
|
The colour and line width of the borders displayed when |
text.cex , text.col , text.font , text.family
|
cex, col, family and font for the text displayed via the |
species.cex , species.col , species.font , species.family
|
cex, col, family and font for the species text displayed via the
|
shape |
One of "circle", "triangle", or "square", specifying the shape of the arcs. |
conflict.lty , conflict.col , conflict.cutoff
|
Determines the line type (style) and colour to be used for conflicting
basepairs. By default, conflicting helices are drawn as dotted lines
( |
miss |
The colour for unpredicted arcs in overlapping diagrams, see
|
overlap.cutoff |
Decimal between 0 and 1 indicating the percentage of basepairs within a helix that have to be overlapping for the entire helix to count as overlapping. Default is 1, or 100 |
pad |
A four integer array passed to |
x , y
|
Coordinates for the left bottom corner of the plot. Useful for manually positioning and overlapping figure elements. |
... |
In For |
Not intended to return a value, will plot to GUI or file if specific.
Daniel Lai
data(helix) # Basic covariance plot plotCovariance(fasta, known, cex = 0.8, lwd = 1.5) # Grid mode plotCovariance(fasta, known, grid = TRUE, text = FALSE, cex = 0.8) # Global style and nucleotide colouring plotCovariance(fasta, known, grid = TRUE, text = FALSE, base.colour = TRUE) # Styling indivual helices with styling columns known$col <- c("red", "blue") plotCovariance(fasta, known, lwd = 2, cex = 0.8) # Use in combination with colourBy functions cov <- colourByCovariation(known, fasta, get = TRUE) plotCovariance(fasta, cov) legend("topleft", legend = attr(cov, "legend"), fill = attr(cov, "fill"), title = "Covariation")
data(helix) # Basic covariance plot plotCovariance(fasta, known, cex = 0.8, lwd = 1.5) # Grid mode plotCovariance(fasta, known, grid = TRUE, text = FALSE, cex = 0.8) # Global style and nucleotide colouring plotCovariance(fasta, known, grid = TRUE, text = FALSE, base.colour = TRUE) # Styling indivual helices with styling columns known$col <- c("red", "blue") plotCovariance(fasta, known, lwd = 2, cex = 0.8) # Use in combination with colourBy functions cov <- colourByCovariation(known, fasta, get = TRUE) plotCovariance(fasta, cov) legend("topleft", legend = attr(cov, "legend"), fill = attr(cov, "fill"), title = "Covariation")
Creates a blank plotting canvas with the given dimensions, along with functions to find best values for the canvas dimensions.
blankPlot(width, top, bottom, pad = c(0, 0, 0, 0), scale = TRUE, scale.lwd = 1, scale.col = "#DDDDDD", scale.cex = 1, debug = FALSE, png = NA, pdf = NA, factor = ifelse(!is.na(png), 8, 1/9), no.par = FALSE, asp = 1,...) maxHeight(helix)
blankPlot(width, top, bottom, pad = c(0, 0, 0, 0), scale = TRUE, scale.lwd = 1, scale.col = "#DDDDDD", scale.cex = 1, debug = FALSE, png = NA, pdf = NA, factor = ifelse(!is.na(png), 8, 1/9), no.par = FALSE, asp = 1,...) maxHeight(helix)
width |
A number indicating the horizontal width of the blank plot. |
top , bottom
|
The maximum and minimum values vertically to be displayed in the plot. |
pad |
An array of 4 integers, specifying the pixels of whitespace to pad beyond the dimensions given by top, bottom, and width. Four number corresponding to padding on the bottom, left, top and right, respectively. Default is c(0, 0, 0, 0). |
scale |
If TRUE, inserts a scale on the plot. |
scale.lwd , scale.col , scale.cex
|
Allows manual modification of the scale's line width and colour, respectively. |
png , pdf
|
If one or the other is set to a filename, a file in png or pdf format
will be produced respectively. If both are set to non-NA values,
|
factor |
The scaling factor used to produce plots of png or pdf format. Should
be set so after multiplication of the |
debug |
If TRUE, frames the boundaries of the intended plotting space in red, used to determine if inputs produce expected output area. Also outputs to STDIN dimensions of the plot. |
no.par |
Suppresses the internal call to |
asp |
Controls and aspect ratio of the plot, defaultly set to 1, set to NA to disable completely. |
... |
Additional arguments passed to |
helix |
A helix data.frame |
blankPlot
creates a blank plot with the given dimensions, with
minimal margins around the plot and no axis or labels. If more control
is required, using plot
directly would be more efficient.
maxHeight
returns the height that the highest helix would require,
and can be used to determine top
and bottom
for
blankPlot
.
maxHeight
returns a numeric integer.
Daniel Lai
# Create helix and obtain height helix <- as.helix(data.frame(1, 37, 12, 0.5)) height <- maxHeight(helix) print(height) # Use height to create properly sized plot width <- attr(helix, "length") blankPlot(width, height, 0) # Add helix to plot plotHelix(helix, add = TRUE)
# Create helix and obtain height helix <- as.helix(data.frame(1, 37, 12, 0.5)) height <- maxHeight(helix) print(height) # Use height to create properly sized plot width <- attr(helix, "length") blankPlot(width, height, 0) # Add helix to plot plotHelix(helix, add = TRUE)
This data set contains two sets of helices and a multiple sequence alignment.
The two sets of helices are helices
and known
which are
helices predicted to occur for RNA sequence RF00458 by the program TRANSAT,
and experimentally proposed structure of the same sequence, respectively.
fasta
is the seed homologues for the multiple sequence alignment
obtained from the RFAM database.
data(helix)
data(helix)
helix
and known
are 4 column data frames, where columns
i and j denote the left-most and right-most basepairs, the
length is the number of consecutive basepairs the helix
contains, and the value is assigned to each helix on a row.
fasta
is an array of named characters of length 7.
fasta
is an array of strings, helix
and known
are
data.frames in “helix” format.
Wiebe NJ, Meyer IM. (2010) TRANSAT– method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures. PLoS Comput Biol. 6(6):e1000823.
Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR, Bateman A. (2011) Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 39(Database issue):D141-5.
Breaks down input helices into basepairs, and assigns each basepair to a numbered group such that basepairs in each group are non-pseudoknotted relative to all other basepairs within the same group.
The algorithm is greedy and thus will not find the best combination of basepairs to minimize the number of groups.
unknottedGroups(helix)
unknottedGroups(helix)
helix |
A helix data.frame. |
An array of integers dictating the groups of each helix. Will only
correspond to the input helix structure if the input had helices of length 1
(e.g. output of expandHelix
).
Daniel Lai
data(helix) known$group <- unknottedGroups(known) print(known)
data(helix) known$group <- unknottedGroups(known) print(known)
Given a helix data frame, checks if helices are conflicting, duplicating, or overlapping, and returns an array of numeric values, where 0 is FALSE and 1 is TRUE. Values in between 0 and 1 occur when a single helix has multiple basepairs with different values, the number observed in this case is the mean of the basepair values within the helix. See details for exact definition of the three types of events.
isConflictingHelix(helix) isDuplicatingHelix(helix) isOverlappingHelix(helix, query)
isConflictingHelix(helix) isDuplicatingHelix(helix) isOverlappingHelix(helix, query)
helix |
A helix data frame |
query |
For |
Helices of length greater than 1 are internally expanded into basepairs of length 1, after which the following conditions are evaluated:
A conflicting basepair is one where at least one of its two positions is used by either end of another basepair.
A duplicating basepair is one where both of its positions are used by both ends of another basepair.
An overlapping basepair is one in helix
where both of its
positions are used by both ends of another basepair in the query
structure.
In the case of conflicting and duplicating basepairs, for a
set of basepairs that satisfies this condition, the basepair situation
highest on the data frame will be exempt from the condition. i.e. Say 5
basepairs are all duplicates of each other, the top 1 will return FALSE,
while the bottom 4 will return TRUE. This assumes some significant meaning
to the ordering of rows prior to using this function. This is to be used
with which
to filter out basepairs that satisfy these conditions,
leaving a set of basepairs free of these events.
If the original input had helices greater than length 1, then after applying all of the above, TRUE is treated as 1, FALSE as 0, and the average of values from each basepair is taken as the value for the helix in question.
Returns an array of numerics corresponding to each row of helix
, giving
the average conditional status of the helix, where 0 signifying all basepairs
are FALSE, and 1 where all basepairs are TRUE.
Daniel Lai
data(helix) conflicting <- isConflictingHelix(helix) duplicating <- isDuplicatingHelix(helix) # Nonsensical covariation plot plotCovariance(fasta, helix) # Plot nonconflicting helices plotCovariance(fasta, helix[(!conflicting & !duplicating), ]) # Similar result plotCovariance(fasta, helix, conflict.col = "lightgrey")
data(helix) conflicting <- isConflictingHelix(helix) duplicating <- isDuplicatingHelix(helix) # Nonsensical covariation plot plotCovariance(fasta, helix) # Plot nonconflicting helices plotCovariance(fasta, helix[(!conflicting & !duplicating), ]) # Similar result plotCovariance(fasta, helix, conflict.col = "lightgrey")
Sequence, floor and ceiling operations in log 10 space.
logseq(from, to, length.out) logfloor(x) logceiling(x)
logseq(from, to, length.out) logfloor(x) logceiling(x)
from , to
|
Positive non-zero values to start and end sequence, respectively. |
length.out |
The number of elements the resulting sequence should containg. If absent, function will attempt to generate numbers factors of 10 apart. |
x |
A value to round. |
logseq
returns an array numbers evenly distanced in log10-space.
logfloor
and logceiling
return a value that is 10 raised
to an integer number.
Daniel Lai
logseq(1e-10, 1e3) logseq(1e-10, 1e3, length.out = 10) logceiling(2.13e-6) logfloor(2.13e-6)
logseq(1e-10, 1e3) logseq(1e-10, 1e3, length.out = 10) logceiling(2.13e-6) logfloor(2.13e-6)
Plots a helix data frame as an arc diagram, with styling possible with properly named additional columns on the data frame.
plotHelix(helix, x = 0, y = 0, flip = FALSE, line = FALSE, arrow = FALSE, add = FALSE, shape = "circle", ...) plotDoubleHelix(top, bot, line = TRUE, arrow = FALSE, add = FALSE, ...) plotOverlapHelix(predict, known, miss = "black", line = TRUE, arrow = FALSE, add = FALSE, overlap.cutoff = 1, ...) plotArcs(i, j, length, x = 0, y = 0, flip = FALSE, shape = "circle", ...) plotArc(i, j, x = 0, y = 0, flip = FALSE, shape = "circle", ...)
plotHelix(helix, x = 0, y = 0, flip = FALSE, line = FALSE, arrow = FALSE, add = FALSE, shape = "circle", ...) plotDoubleHelix(top, bot, line = TRUE, arrow = FALSE, add = FALSE, ...) plotOverlapHelix(predict, known, miss = "black", line = TRUE, arrow = FALSE, add = FALSE, overlap.cutoff = 1, ...) plotArcs(i, j, length, x = 0, y = 0, flip = FALSE, shape = "circle", ...) plotArc(i, j, x = 0, y = 0, flip = FALSE, shape = "circle", ...)
helix , top , bot , predict , known
|
Helix data.frames, with the four mandatory columns. Any other column
will be considered a styling column, and will be used for styling the
helix. See |
x , y
|
The coordinate of the left bottom corner of the plot, useful for manually positioning figure elements. |
flip |
If TRUE, flips the arcs upside down about the y-axis. |
line |
If TRUE, a horizontal line representing the sequence is plotted. |
arrow |
If TRUE, an arrow is played on the right end of the line. |
add |
If TRUE, graphical elements are added to the active plot device, else a new plot device is created for the plot. |
shape |
One of "circle", "triangle", or "square", specifying the shape of the arcs. |
miss |
The colour for unpredicted arcs in overlapping diagrams, see
|
overlap.cutoff |
Decimal between 0 and 1 indicating the percentage of basepairs within a helix that have to be overlapping for the entire helix to count as overlapping. Default is 1, or 100 |
i , j
|
The starting and ending position of the arc along the x-axis |
length |
The total number of arcs to draw by incrementing i and decrementing j. Used to draw helices. |
... |
Any additional parameters passed to |
plotHelix
creates a arc diagram with all arcs on top,
plotDoubleHelix
creates a diagram with arcs on the top and bottom.
plotOverlapHelix
is slight trickier, and given two structures
predict
and known
, plots the predicted helices that are known
on top, predicted helices that are not known on the bottom, and finally plots
unpredicted helices on top in the colour defined by miss
.
plotArc
and plotArcs
are the core functions that make everything
work, and may be used for extreme fine-tuning and customization.
Not intended to return a value, will plot to GUI or file if specific.
Daniel Lai
data(helix) # Plot helix plain plotHelix(known) # Apply global appearance options plotHelix(known, line = TRUE, arrow = TRUE, col = "blue", lwd = 1.5) # Add extra column with styling options known$lty <- 1:4 known$lwd <- 1:2 known$col <- c(rgb(1, 0, 0), "orange", "yellow", "#00FF00", 4, "purple") plotHelix(known) # Manually colour helices according to value helix$col <- "red" helix$col[which(helix$value < 1e-3)] <- "orange" helix$col[which(helix$value < 1e-4)] <- "green" helix$col[which(helix$value < 1e-5)] <- "blue" plotHelix(helix) # Automatically creating a similar plot with legend coloured <- colourByValue(helix, log = TRUE, get = TRUE) plotHelix(coloured, line = TRUE, arrow = TRUE) legend("topleft", legend = attr(coloured, "legend"), fill = attr(coloured, "fill"), title = "P-value", text.col = "black") # Plot both helices with styles plotDoubleHelix(helix, known) # Overlap helix plotOverlapHelix(helix, known)
data(helix) # Plot helix plain plotHelix(known) # Apply global appearance options plotHelix(known, line = TRUE, arrow = TRUE, col = "blue", lwd = 1.5) # Add extra column with styling options known$lty <- 1:4 known$lwd <- 1:2 known$col <- c(rgb(1, 0, 0), "orange", "yellow", "#00FF00", 4, "purple") plotHelix(known) # Manually colour helices according to value helix$col <- "red" helix$col[which(helix$value < 1e-3)] <- "orange" helix$col[which(helix$value < 1e-4)] <- "green" helix$col[which(helix$value < 1e-5)] <- "blue" plotHelix(helix) # Automatically creating a similar plot with legend coloured <- colourByValue(helix, log = TRUE, get = TRUE) plotHelix(coloured, line = TRUE, arrow = TRUE) legend("topleft", legend = attr(coloured, "legend"), fill = attr(coloured, "fill"), title = "P-value", text.col = "black") # Plot both helices with styles plotDoubleHelix(helix, known) # Overlap helix plotOverlapHelix(helix, known)
Reads in secondary structure text files into a helix data frame.
readHelix(file) readConnect(file) readVienna(file, palette = NA) readBpseq(file)
readHelix(file) readConnect(file) readVienna(file, palette = NA) readBpseq(file)
file |
A text file in connect format, see details for format specifications. |
palette |
Used to colour basepairs by bracket type. See
|
Helix: Files start with a header line beginning with # followed by the sequence length, followed by a four-column tab-delimited table (with column names), where each row corresponds to a helix in the structure. The four columns are i and j for the left-most and right-most basepair positions respectively, the length of the helix (converging inwards from i and j, and finally an arbitrary value assigned to the helix.
Vienna: Dot-bracket notation from Vienna package programs, where each structure consists of matched brackets for basepairs and periods for unbased pairs. Valid brackets are (, , [, <, A, B, C, D matched with ), , ], >, a, b, c, d, respectively. An energy value can be appended to the end of any dot-bracket structure. The function will accept slight variations of the format, including those with FASTA-like headers (in which case line breaks are allows), and those without FASTA-like headers (in which case line breaks are NOT allowed), with both types allowing for a preceding (NOT following) nucleotide sequence for the structure. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure, with respectively energy values (if specified).
Connect: Output from mfold and other programs, this format is expected to be a text file beginning with a header line that starts with the sequence length, with an optional Energy/dG value, followed by a six-column tab-delimited table where columns 1 and 5 denote the position that are basepaired (unpaired when column 5 is 0). Other columns are ignored, but for completeness, column 2 is the nucleotide, column 3 and 4 are the positions of the bases left and right of the base specified in column 1 respectively (with 0 denoting non-existance), and column 6 a copy of column 1. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure. All helices will be assigned the energy value extracted from their respective structure header lines.
Bpseq: Format used by the Gutell Lab's Comparative RNA Website. The file may optionally begin several header lines (e.g. Filename, Organism, Accession, etc.), followed by a 3-column tab-delimited table for the structure, where column 1 is the base position, base 2 is the nucleotide base, and column 3 is the paired position (0 if unpaired). Certain pieces of header information will be parsed and returned as attributes of the output data frame. Multiple structures can be within a single file, returned as a single helix data frame, with attributes set to those of the first entry.
Returns a helix format data frame.
Daniel Lai, Jeff Proctor
file <- system.file("extdata", "helix.txt", package = "R4RNA") helix <- readHelix(file) head(helix) file <- system.file("extdata", "connect.txt", package = "R4RNA") connect <- readConnect(file) head(connect) message("Note connect data assigns structure energy level to all basepairs") file <- system.file("extdata", "vienna.txt", package = "R4RNA") vienna <- readVienna(file) head(vienna) message("Note vienna data assigns structure energy level to all basepairs") file <- system.file("extdata", "bpseq.txt", package = "R4RNA") bpseq <- readBpseq(file) head(bpseq) message("Note bpseq data has no value assigned to basepairs")
file <- system.file("extdata", "helix.txt", package = "R4RNA") helix <- readHelix(file) head(helix) file <- system.file("extdata", "connect.txt", package = "R4RNA") connect <- readConnect(file) head(connect) message("Note connect data assigns structure energy level to all basepairs") file <- system.file("extdata", "vienna.txt", package = "R4RNA") vienna <- readVienna(file) head(vienna) message("Note vienna data assigns structure energy level to all basepairs") file <- system.file("extdata", "bpseq.txt", package = "R4RNA") bpseq <- readBpseq(file) head(bpseq) message("Note bpseq data has no value assigned to basepairs")
Calculates a score that indicates how badly a set of basepairs (i.e. a secondary structure) fits with a sequence. A perfect fit is a structure where all basepairs form valid basepairs (A:U, G:C, G:U, and equivalents) and has a score of 0. Each basepair that forms a non-canonical pairing or pairs to gaps increases the score by 1, and each base-pair with a single-sided gap increases the score by 2.
structureMismatchScore(msa, helix, one.gap.penalty = 2, two.gap.penalty = 2, invalid.penalty = 1)
structureMismatchScore(msa, helix, one.gap.penalty = 2, two.gap.penalty = 2, invalid.penalty = 1)
msa |
A multiple sequence alignment. Can be either a |
helix |
A helix data.frame |
one.gap.penalty |
Penalty score for basepairs with one of the bases being a gap |
two.gap.penalty |
Penalty score for basepairs with both bases being a gaps |
invalid.penalty |
Penalty score for non-canonical basepairs |
Returns an array of mismatch scores.
Jeff Proctor, Daniel Lai
data(helix) mismatch <- structureMismatchScore(fasta, known) # Sort by increasing mismatch sorted_fasta <- fasta[order(mismatch)]
data(helix) mismatch <- structureMismatchScore(fasta, known) # Sort by increasing mismatch sorted_fasta <- fasta[order(mismatch)]
Write out a helix data frame into a text file into the four-column tab-delimited format with proper header and column names.
writeHelix(helix, file = stdout())
writeHelix(helix, file = stdout())
helix |
A helix data frame. |
file |
A character string pointing to a file path, or a file connection. Defaults to the console. |
No value returned, will write to STDOUT or specific file location.
Daniel Lai
# Create helix data frame helix <- data.frame(2, 8, 3, 0.5) helix[2, ] <- c(5, 15, 4, -0.5) helix <- as.helix(helix) writeHelix(helix)
# Create helix data frame helix <- data.frame(2, 8, 3, 0.5) helix[2, ] <- c(5, 15, 4, -0.5) helix <- as.helix(helix) writeHelix(helix)