| Title: | Implementation of the dot bracket annotations with Biostrings |
|---|---|
| Description: | The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package. |
| Authors: | Felix G.M. Ernst [aut, cre] (ORCID: <https://orcid.org/0000-0001-5064-0928>) |
| Maintainer: | Felix G.M. Ernst <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.29.0 |
| Built: | 2026-05-30 09:44:30 UTC |
| Source: | https://github.com/bioc/Structstrings |
convertAnnotation converts a type of dot bracket annotation into
another. This only works if the original bracket type is present and the
target bracket type is not.

    convertAnnotation(x, from, to)

    ## S4 method for signature 'DotBracketString'
    convertAnnotation(x, from, to)

    ## S4 method for signature 'DotBracketStringSet'
    convertAnnotation(x, from, to)

    ## S4 method for signature 'DotBracketStringSetList'
    convertAnnotation(x, from, to)

| Argument | Description |
|---|---|
| x | a DotBracketString* object |
| from | Which annotation type should be converted? Must be one of the following values: |
| to | Into which annotation type should the selected one be converted? Must be one of the following values: |

The modified input object, a DotBracketString* object.

    str <- "((.))..[[..]]...{{..}}......."
    dbs <- DotBracketString(str)
    convertAnnotation(dbs, 1L, 2L)

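The rule stated above (the source bracket type must be present and the target type must be absent) can be sketched in base R on a plain character string. `convert_brackets` is a hypothetical helper, not part of Structstrings. It takes each bracket type as a two-character string rather than an integer code, because the mapping from codes to bracket types is not listed here.

```r
# Hypothetical helper for illustration only; not the package implementation.
convert_brackets <- function(x, from, to) {
  chars      <- strsplit(x, "", fixed = TRUE)[[1]]
  from_chars <- strsplit(from, "", fixed = TRUE)[[1]]
  to_chars   <- strsplit(to, "", fixed = TRUE)[[1]]
  # The bracket type to convert must occur in the string ...
  if (!any(chars %in% from_chars)) {
    stop("bracket type '", from, "' is not present")
  }
  # ... and the target type must not, otherwise two types would be merged.
  if (any(chars %in% to_chars)) {
    stop("bracket type '", to, "' is already present")
  }
  # chartr() maps each character of 'from' to the matching character of 'to'.
  chartr(from, to, x)
}

convert_brackets("((.))..[[..]]", from = "()", to = "<>")
```

Refusing the conversion when the target type already exists avoids merging two bracket types that were distinct before the call.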
The DotBracketDataFrame and DotBracketDFrame objects are derived
from the DataFrame and
DFrame classes.
DotBracketDataFrame implements the concept and can be used to implement
backends other than the in-memory one provided by DotBracketDFrame.
The DotBracketDataFrameList is implemented analogously and is also
available as CompressedSplitDotBracketDataFrameList. Since the names
are quite long, the following shortcut functions are available for object
creation: DBDF, DBDFL and SDBDFL.
The DotBracketDataFrame can only contain 5 columns, which are named
pos, forward, reverse, character and base.
The last two columns are optional. The type of the first three has to be
integer, whereas the fourth is a character column and the fifth is an
XStringSet column.
Upon creation and modification, the validity of the contained base pairing information is checked. If the information is not correct, an error is thrown.

    DotBracketDataFrame(..., row.names = NULL)

    DBDF(...)

    DotBracketDataFrameList(...)

    DBDFL(...)

    SplitDotBracketDataFrameList(..., compress = TRUE, cbindArgs = FALSE)

    SDBDFL(..., compress = TRUE, cbindArgs = FALSE)

| Argument | Description |
|---|---|
| ... | for |
| row.names | See |
| compress | If |
| cbindArgs | If |

a DotBracketDataFrame* object.

    # Manual creation
    df <- DataFrame(pos = c(1,2,3,4,5,6),
                    forward = c(6,5,0,0,2,1),
                    reverse = c(1,2,0,0,5,6))
    # Either works
    dbdf <- as(df,"DotBracketDataFrame")
    dbdf <- DotBracketDataFrame(df)
    # With multiple input DataFrames a SplitDotBracketDataFrameList is returned
    dbdfl <- DotBracketDataFrame(df,df,df,df)
    # Creation from a DotBracketString object is probably more common
    data("dbs", package = "Structstrings")
    dbdfl <- getBasePairing(dbs)
    # Elements are returned as DotBracketDataFrames
    dbdfl[[1]]

The DotBracketString extends the
BString class. The
DotBracketStringSet and DotBracketStringSetList classes are
implemented accordingly.
The alphabet consists of the letters
(, ), ., <, >, [, ],
{ and }, which describe base pairing between positions. The
. letter describes an unpaired position. The number of opening and
closing letters needs to be equal within a DotBracketString for it to be a
valid dot bracket annotation. This is checked upon creation and modification
of the object.
The objects can also be created using the shorter function names DB,
DBS and DBSL.
Currently, there is no distinction in base pairing strength between the different bracket types.
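As a minimal sketch of the validity rule described above, the following base R function checks that a plain character string uses only the dot bracket alphabet and that every bracket type has as many opening as closing letters. `is_balanced` is a hypothetical helper, not the package's validity method. It tests only the counting rule stated here and does not check the order of the brackets.

```r
# Hypothetical helper for illustration only; not the package implementation.
is_balanced <- function(x) {
  chars     <- strsplit(x, "", fixed = TRUE)[[1]]
  open_chr  <- c("(", "<", "[", "{")
  close_chr <- c(")", ">", "]", "}")
  # Only letters of the dot bracket alphabet are allowed.
  if (!all(chars %in% c(open_chr, close_chr, "."))) {
    return(FALSE)
  }
  # For each bracket type, the opening and closing counts must agree.
  all(vapply(seq_along(open_chr), function(i) {
    sum(chars == open_chr[i]) == sum(chars == close_chr[i])
  }, logical(1)))
}

is_balanced("((.))..[[..]]...{{..}}..<<..>>")
is_balanced("((.)")
```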

    DotBracketString(x = "", start = 1, nchar = NA)

    DB(x = character(), start = 1, nchar = NA)

    DotBracketStringSet(x = character())

    DBS(x = character())

    DotBracketStringSetList(..., use.names = TRUE)

    DBSL(..., use.names = TRUE)

    ## S4 method for signature 'DotBracketString'
    alphabet(x)

    ## S4 method for signature 'DotBracketString'
    encoding(x)

| Argument | Description |
|---|---|
| x | |
| start | |
| nchar | |
| ... | |
| use.names | |

a DotBracketString* object.

    str <- "((.))..[[..]]...{{..}}..<<..>>"
    db <- DotBracketString(str)
    dbs <- DotBracketStringSet(c("structure1" = str, "structure2" = str))
    dbsl <- DotBracketStringSetList(list(first = dbs, second = dbs))

readDotBracketStringSet and writeDotBracketStringSet are
functions to read and write dot bracket strings from/to a file. Since the
<> bracket type conflicts with the fasta format, saving to a fastq file is
sometimes the only option. Saving a string with a <> bracket type to a
fasta file will throw an error.
The functions use the underlying Biostrings infrastructure and share
most of its parameters. For more detail, see the corresponding Biostrings
documentation.
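In the fasta format, a line starting with > begins a new record, which is presumably why the <> bracket type cannot be stored there. The following base R sketch picks a file format for a character vector of dot bracket strings by testing for these letters. `choose_format` is a hypothetical helper, not part of Structstrings.

```r
# Hypothetical helper for illustration only; not the package implementation.
choose_format <- function(x) {
  # TRUE for every string that contains a "<" or ">" letter
  has_angle <- grepl("[<>]", x)
  # Fall back to fastq as soon as one string uses the <> bracket type.
  if (any(has_angle)) "fastq" else "fasta"
}

choose_format(c("((..))", "[[..]]"))
choose_format(c("((..))", "<<..>>"))
```

The returned value could then be passed on as the format argument when writing.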

    readDotBracketStringSet(
      filepath,
      format = "fasta",
      nrec = -1L,
      skip = 0L,
      seek.first.rec = FALSE,
      use.names = TRUE,
      with.qualities = FALSE
    )

    writeDotBracketStringSet(
      x,
      filepath,
      append = FALSE,
      compress = FALSE,
      format = "fasta",
      ...
    )

    saveDotBracketStringSet(
      x,
      objname,
      dirpath = ".",
      save.dups = FALSE,
      verbose = TRUE
    )

| Argument | Description |
|---|---|
| filepath | The file name when writing, or the file name(s) when reading. |
| format | "fasta" or "fastq" |
| nrec | Single integer. The maximum number of records to read in. Negative values are ignored. |
| skip | Single non-negative integer. The number of records of the data file(s) to skip before beginning to read in records. |
| seek.first.rec, with.qualities, compress, ..., use.names, objname, dirpath, save.dups, verbose | Have a look |
| x | A DotBracketStringSet object |
| append | |

readDotBracketStringSet returns a DotBracketStringSet
object, writeDotBracketStringSet returns NULL invisibly.

    data("dbs", package = "Structstrings")
    file <- tempfile()
    # both work, since a DotBracketStringSet is a BStringSet
    writeXStringSet(dbs,file)
    writeDotBracketStringSet(dbs,file)
    # to get a DotBracketStringSet back directly, use readDotBracketStringSet()
    dbs2 <- readDotBracketStringSet(file)

getBasePairing converts a dot bracket annotation from a
DotBracketString into a base pair table as
DotBracketDataFrame. Base pairing is indicated by corresponding
numbers in the forward and reverse columns.
getDotBracket converts the dot bracket annotation from a
DotBracketDataFrame into a DotBracketString. If
the character column is populated, the information from this column
will be used. If this is not desired, set force = TRUE. However,
be aware that this will result in a dot bracket annotation which does not
necessarily match the original dot bracket string it may have been
created from. It is rather the dot bracket string with the lowest number of
different loops, and it will use the different dot bracket annotations one
after another. Example: (((<<<>>>))) will be returned as
(((((()))))). (((<<<)))>>> will be returned as
(((<<<)))>>>, and ((([[[)))]]] will be returned as
(((<<<)))>>>.
getLoopIndices converts the dot bracket annotation from a
DotBracketString or DotBracketDataFrame into a
LoopIndexList.

    getBasePairing(x, compress = TRUE, return.sequence = FALSE)

    getDotBracket(x, force = FALSE)

    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'DotBracketString'
    getBasePairing(x)

    ## S4 method for signature 'DotBracketStringSet'
    getBasePairing(x, compress = TRUE)

    ## S4 method for signature 'DotBracketDataFrame'
    getDotBracket(x, force = FALSE)

    ## S4 method for signature 'DotBracketDataFrameList'
    getDotBracket(x, force = FALSE)

    ## S4 method for signature 'SimpleSplitDotBracketDataFrameList'
    getDotBracket(x, force = FALSE)

    ## S4 method for signature 'CompressedSplitDotBracketDataFrameList'
    getDotBracket(x, force = FALSE)

    ## S4 method for signature 'DotBracketString'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'DotBracketStringSet'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'DotBracketDataFrame'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'DotBracketDataFrameList'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'SimpleSplitDotBracketDataFrameList'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

    ## S4 method for signature 'CompressedSplitDotBracketDataFrameList'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

| Argument | Description |
|---|---|
| x | a |
| compress | |
| return.sequence | if the input is a |
| force | |
| bracket.type | |
| warn.type.drops | |

getBasePairing:
The result is a DotBracketDataFrame with the following columns:
pos, forward, reverse, character (and optionally the base column). If a
position is unpaired, forward and reverse will be 0; otherwise they will
match the base paired positions.
getLoopIndices: returns a LoopIndexList.

    data("dbs", package = "Structstrings")
    # conversion
    dbdf <- getBasePairing(dbs)
    # ... and the round trip
    dbs <- getDotBracket(dbdf)
    # loop indices per bracket type
    loopids <- getLoopIndices(dbs)
    # choose the bracket type manually, if necessary
    loopids <- getLoopIndices(dbs, bracket.type = 1L)
    # do not show a warning if multiple bracket types are present
    loopids <- getLoopIndices(dbs, bracket.type = 1L, warn.type.drops = FALSE)

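To illustrate how a base pair table relates to a dot bracket string, here is a minimal base R sketch that matches brackets with one stack per bracket type. `base_pair_table` is a hypothetical helper and not the package's C implementation. The column convention (forward holds the partner position, reverse the position itself, both 0 when unpaired) is an assumption taken from the hand-built DataFrame example for DotBracketDataFrame, and the values returned by getBasePairing may be laid out differently.

```r
# Hypothetical helper for illustration only; not the package implementation.
base_pair_table <- function(x) {
  chars     <- strsplit(x, "", fixed = TRUE)[[1]]
  open_chr  <- c("(", "<", "[", "{")
  close_chr <- c(")", ">", "]", "}")
  n <- length(chars)
  forward <- integer(n)  # 0 marks an unpaired position
  reverse <- integer(n)
  # One stack of still-open positions per bracket type.
  stacks <- rep(list(integer(0)), length(open_chr))
  for (i in seq_len(n)) {
    type <- match(chars[i], open_chr)
    if (!is.na(type)) {
      stacks[[type]] <- c(stacks[[type]], i)
      next
    }
    type <- match(chars[i], close_chr)
    if (!is.na(type)) {
      s <- stacks[[type]]
      if (length(s) == 0L) stop("unmatched closing bracket at position ", i)
      j <- s[length(s)]               # most recent open position of this type
      stacks[[type]] <- s[-length(s)]
      # Assumed convention: forward = partner position, reverse = own position.
      forward[j] <- i; reverse[j] <- j
      forward[i] <- j; reverse[i] <- i
    }
  }
  if (any(lengths(stacks) > 0L)) stop("unmatched opening bracket")
  data.frame(pos = seq_len(n), forward = forward, reverse = reverse,
             character = chars, stringsAsFactors = FALSE)
}

base_pair_table("((..))")
```

Keeping a separate stack per bracket type lets pairs of different types cross each other, as in (((<<<)))>>>.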
With loop indices, base pairing information can be represented by giving each base pair a number and increasing/decreasing it with each opened/closed base pair. This information can be used for further analysis of the represented structure.
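The simplest reading of this description is a nesting depth that goes up with every opening bracket and down after every closing bracket. The base R sketch below computes this depth for the () bracket type only. `nesting_depth` is a hypothetical helper, and the numbering it produces is an assumption: the indices returned by getLoopIndices may be numbered differently, for example by not reusing a number for a later, separate helix.

```r
# Hypothetical helper for illustration only; not the package implementation.
nesting_depth <- function(x) {
  chars    <- strsplit(x, "", fixed = TRUE)[[1]]
  is_open  <- chars == "("
  is_close <- chars == ")"
  # Depth after each position has been processed.
  depth <- cumsum(is_open) - cumsum(is_close)
  # A closing bracket still belongs to the pair it closes, so it keeps the
  # depth it had before being closed.
  as.integer(depth + is_close)
}

nesting_depth("(((.))).")
```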

    LoopIndexList(...)

| Argument | Description |
|---|---|
| ... | the |

a LoopIndexList object.

    # if the object is created manually, make sure it contains valid structure
    # information. Otherwise an error is thrown.
    lil <- LoopIndexList(list(c(1L,2L,3L,3L,3L,2L,1L,0L,5L,6L,6L,5L),
                              c(1L,2L,2L,2L,2L,2L,1L,0L,5L,6L,6L,5L)))

The Structstrings package implements the widely used dot bracket
annotation for storing base pairing information in structured RNA. For
example, it is used in the ViennaRNA package (Lorenz et al. 2011), the
tRNAscan-SE software (Lowe et al. 1997) and the tRNAdb (Jühling et al. 2009).
Structstrings uses the infrastructure provided by the
Biostrings package and derives the
DotBracketString and related classes from the equivalent
BString classes. From these, base pair tables can be produced for
in-depth analysis. For this purpose the DotBracketDataFrame
class is derived from the DataFrame class. In addition, the loop
IDs of the base pairs can be retrieved as a LoopIndexList, a
derivative of the IntegerList class. Generally, the package checks
the validity of the dot bracket annotation automatically.
The conversion of the DotBracketString to the base pair table
and the loop indices is implemented in C for efficiency. The C implementation
is to a large extent inspired by the
ViennaRNA package.
This package was developed as a requirement for the tRNA package.
However, other projects might benefit as well, so it was split off and
improved upon.
Please refer to the Structstrings vignette for an example of how to use the package.
Felix G M Ernst [aut,cre]
Lorenz, Ronny; Bernhart, Stephan H.; Höner zu Siederdissen, Christian; Tafer, Hakim; Flamm, Christoph; Stadler, Peter F.; Hofacker, Ivo L. (2011): "ViennaRNA Package 2.0". Algorithms for Molecular Biology 6:26. doi:10.1186/1748-7188-6-26
Lowe, T.M.; Eddy, S.R. (1997): "tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence". Nucleic Acids Research 25: 955-964. doi:10.1093/nar/25.5.955
Jühling, Frank; Mörl, Mario; Hartmann, Roland K.; Sprinzl, Mathias; Stadler, Peter F.; Pütz, Joern (2009): "TRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes." Nucleic Acids Research 37 (suppl_1): D159–D162. doi:10.1093/nar/gkn772.
Example data for using the Structstrings package

    data(dbs)
    data(nseq)

object of class DotBracketStringSet and
DNAStringSet
An object of class DNAStringSet of length 299.
sequence and dot bracket annotation of tRNAscan-SE output for
*S. cerevisiae* imported using
tRNAscanImport. The example file
is part of the tRNAscanImport package.
Analogous to Biostrings, there are a few objects which should only be
used internally, but may be of use to other package developers.
Otherwise take care.

    DOTBRACKET_CHAR_VALUES

    DOTBRACKET_ALPHABET

    STRUCTURE_NEUTRAL_CHR

    STRUCTURE_OPEN_CHR

    STRUCTURE_CLOSE_CHR

    ## S4 replacement method for signature 'DotBracketDataFrame'
    x[i, j, ...] <- value

    ## S4 replacement method for signature 'CompressedSplitDotBracketDataFrameList'
    colnames(x) <- value

    ## S4 method for signature 'DotBracketString'
    seqtype(x)

    ## S4 method for signature 'DotBracketString'
    subseq(x, start = NA, end = NA, width = NA)

    ## S4 replacement method for signature 'DotBracketString'
    subseq(x, start = NA, end = NA, width = NA) <- value

    ## S4 replacement method for signature 'DotBracketStringSet'
    subseq(x, start = NA, end = NA, width = NA) <- value

| Argument | Description |
|---|---|
| seqtype, x, start, end, width, value, i, j, ... | used internally |

DOTBRACKET_CHAR_VALUES: an integer vector of length 9 containing the integer values
of the dotbracket alphabet
DOTBRACKET_ALPHABET: a character vector of length 9 containing the single
characters of the dotbracket alphabet
STRUCTURE_NEUTRAL_CHR: a character vector of length 1 containing the character for
unpaired positions
STRUCTURE_OPEN_CHR: a character vector of length 4 containing the opening
characters of the dotbracket alphabet
STRUCTURE_CLOSE_CHR: a character vector of length 4 containing the closing
characters of the dotbracket alphabet

    DOTBRACKET_CHAR_VALUES
    DOTBRACKET_ALPHABET
    STRUCTURE_NEUTRAL_CHR
    STRUCTURE_OPEN_CHR
    STRUCTURE_CLOSE_CHR

    # the replace method for a DotBracketDataFrame had to be reimplemented
    # because of the requirement of columns for a DotBracketDataFrameList and
    # DotBracketDataFrame
    data("dbs", package = "Structstrings")
    dbdfl <- getBasePairing(dbs)
    # Elements are returned as DotBracketDataFrames
    dbdf <- dbdfl[[1]]
    dbdfl[[1]] <- dbdf
    dbdfl[1] <- dbdfl[1]

The StructuredXStringSet class can be used to store structure
information alongside RNA sequences. The class behaves like the
QualityScaledXStringSet
classes.
Please note that this does not check for validity regarding base pairing capabilities.

    StructuredRNAStringSet(x, structure)

    dotbracket(x)

    dotbracket(x) <- value

    ## S4 method for signature 'StructuredXStringSet'
    dotbracket(x)

    ## S4 replacement method for signature 'StructuredXStringSet'
    dotbracket(x) <- value

    readStructuredRNAStringSet(
      filepath,
      nrec = -1L,
      skip = 0L,
      seek.first.rec = FALSE,
      use.names = TRUE
    )

    writeStructuredXStringSet(x, filepath, append = FALSE, compress = FALSE, ...)

    ## S4 method for signature 'StructuredXStringSet'
    getBasePairing(x, compress = TRUE, return.sequence = FALSE)

    ## S4 method for signature 'StructuredXStringSet'
    getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

| Argument | Description |
|---|---|
| x | For the |
| structure, value | |
| use.names, type, filepath, nrec, skip, seek.first.rec, append, ... | |
| compress | |
| return.sequence | |
| bracket.type | |
| warn.type.drops | See |

The dotbracket function allows access to the included
DotBracketStringSet.
A StructuredRNAStringSet object.

    str <- DotBracketStringSet("(())")
    seq <- RNAStringSet("AGCU")
    sdbs <- StructuredRNAStringSet(seq,str)