Title: | Implementation of the dot bracket annotations with Biostrings |
---|---|
Description: | The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package. |
Authors: | Felix G.M. Ernst [aut, cre] |
Maintainer: | Felix G.M. Ernst <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.23.1 |
Built: | 2024-12-02 03:44:06 UTC |
Source: | https://github.com/bioc/Structstrings |
convertAnnotation
converts a type of dot bracket annotation into
another. This only works if the original bracket type is present and the
target bracket type is not.
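The precondition described above can be illustrated with a minimal base R sketch. The helper `convert_brackets` is hypothetical and not part of Structstrings; it works on a plain character string and only mirrors the rule that the source bracket type must be present and the target bracket type must be absent.

```r
# Hypothetical helper (not a Structstrings function): swap one bracket pair
# for another in a plain dot bracket string.
convert_brackets <- function(x, from = c("(", ")"), to = c("<", ">")) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # the source bracket type must be present ...
  if (!any(chars %in% from)) stop("bracket type 'from' is not present")
  # ... and the target bracket type must not be
  if (any(chars %in% to)) stop("bracket type 'to' is already present")
  # translate opening to opening and closing to closing letter
  chartr(paste(from, collapse = ""), paste(to, collapse = ""), x)
}

converted <- convert_brackets("((.))..[[..]]...{{..}}.......")
```

In this sketch a conversion into a bracket type that already occurs is rejected, because the two sets of base pairs could no longer be told apart afterwards.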
convertAnnotation(x, from, to)

## S4 method for signature 'DotBracketString'
convertAnnotation(x, from, to)

## S4 method for signature 'DotBracketStringSet'
convertAnnotation(x, from, to)

## S4 method for signature 'DotBracketStringSetList'
convertAnnotation(x, from, to)
x |
a |
from |
which annotation type should be converted? Must be one of the
following values: |
to |
Into which annotation type should the selected one be converted?
Must be one of the following values:
|
The modified input object, a DotBracketString*
object.
str <- "((.))..[[..]]...{{..}}......."
dbs <- DotBracketString(str)
convertAnnotation(dbs, 1L, 2L)
The DotBracketDataFrame and DotBracketDFrame objects are derived from the DataFrame and DFrame classes. DotBracketDataFrame implements the concept and can be used to implement backends other than the in-memory one provided by DotBracketDFrame.
The DotBracketDataFrameList is implemented analogously and is also available as CompressedSplitDotBracketDataFrameList. Since the names are quite long, the following shortcut functions are available for object creation: DBDF, DBDFL and SDBDFL.
The DotBracketDataFrame can only contain 5 columns, which are named pos, forward, reverse, character and base. The last two columns are optional. The first three must be of type integer, whereas the fourth is a character column and the fifth is an XStringSet column.
Upon creation and modification, the validity of the contained base pairing information is checked. If the information is not correct, an error is thrown.
DotBracketDataFrame(..., row.names = NULL)

DBDF(...)

DotBracketDataFrameList(...)

DBDFL(...)

SplitDotBracketDataFrameList(..., compress = TRUE, cbindArgs = FALSE)

SDBDFL(..., compress = TRUE, cbindArgs = FALSE)
... |
for |
row.names |
See |
compress |
If |
cbindArgs |
If |
a DotBracketDataFrame*
object.
# Manual creation
df <- DataFrame(pos = c(1,2,3,4,5,6),
                forward = c(6,5,0,0,2,1),
                reverse = c(1,2,0,0,5,6))
# Either works
dbdf <- as(df,"DotBracketDataFrame")
dbdf <- DotBracketDataFrame(df)
# With multiple input DataFrames a SplitDotBracketDataFrameList is returned
dbdfl <- DotBracketDataFrame(df,df,df,df)
# Creation from a DotBracketString object is probably more common
data("dbs", package = "Structstrings")
dbdfl <- getBasePairing(dbs)
# Elements are returned as DotBracketDataFrames
dbdfl[[1]]
The DotBracketString extends the BString class. The DotBracketStringSet and DotBracketStringSetList classes are implemented accordingly.
The alphabet consists of the letters (, ), ., <, >, [, ], { and }, which describe base pairing between positions. The . letter describes an unpaired position. The number of opening and closing letters needs to be equal within a DotBracketString for it to be a valid dot bracket annotation. This is checked upon creation and modification of the object.
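The validity rule can be sketched in base R as follows. The function `is_valid_dot_bracket` is a hypothetical illustration, not the check the package runs internally. It also assumes, beyond what is stated above, that a closing letter may never appear before its opening partner.

```r
# Hypothetical validator in base R; Structstrings performs its own checks.
is_valid_dot_bracket <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  open  <- c("(", "<", "[", "{")
  close <- c(")", ">", "]", "}")
  # only letters of the dot bracket alphabet are allowed
  if (!all(chars %in% c(open, close, "."))) return(FALSE)
  for (i in seq_along(open)) {
    is_open  <- chars == open[i]
    is_close <- chars == close[i]
    # rule from the text: equal number of opening and closing letters
    if (sum(is_open) != sum(is_close)) return(FALSE)
    # assumption of this sketch: never close more pairs than were opened
    if (any(cumsum(is_open - is_close) < 0)) return(FALSE)
  }
  TRUE
}

is_valid_dot_bracket("((.))..[[..]]...{{..}}..<<..>>")
```

Each bracket type is checked on its own, so crossing pairs of different types, as in pseudoknots, are accepted by this sketch.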
The objects can also be created using the shorter function names DB, DBS and DBSL.
Currently, there is no distinction in base pairing strength between the different bracket types.
DotBracketString(x = "", start = 1, nchar = NA)

DB(x = character(), start = 1, nchar = NA)

DotBracketStringSet(x = character())

DBS(x = character())

DotBracketStringSetList(..., use.names = TRUE)

DBSL(..., use.names = TRUE)

## S4 method for signature 'DotBracketString'
alphabet(x)

## S4 method for signature 'DotBracketString'
encoding(x)
x |
|
start |
|
nchar |
|
... |
|
use.names |
|
a DotBracketString*
object.
str <- "((.))..[[..]]...{{..}}..<<..>>"
db <- DotBracketString(str)
dbs <- DotBracketStringSet(c("structure1" = str, "structure2" = str))
dbsl <- DotBracketStringSetList(list(first = dbs, second = dbs))
readDotBracketStringSet and writeDotBracketStringSet are functions to read and write dot bracket strings from/to file. Since the <> bracket type conflicts with the fasta format, saving to a fastq file is sometimes the only option. Saving a string with a <> bracket type to a fasta file will throw an error.
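One way to handle this restriction is to choose the file format from the content before writing. The helper `choose_format` below is a hypothetical base R sketch that works on plain character strings; it only returns a value suitable for a format argument and does not write any file itself.

```r
# Hypothetical helper: pick "fastq" as soon as any structure uses the <>
# bracket type, otherwise keep the "fasta" default.
choose_format <- function(structures) {
  if (any(grepl("[<>]", structures))) "fastq" else "fasta"
}

choose_format(c("((..))", "<<..>>"))
```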
The functions use the underlying Biostrings infrastructure and share most of its parameters. For more details, see the corresponding Biostrings documentation.
readDotBracketStringSet(
  filepath,
  format = "fasta",
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE,
  with.qualities = FALSE
)

writeDotBracketStringSet(
  x,
  filepath,
  append = FALSE,
  compress = FALSE,
  format = "fasta",
  ...
)

saveDotBracketStringSet(
  x,
  objname,
  dirpath = ".",
  save.dups = FALSE,
  verbose = TRUE
)
filepath |
The file name, when writing, or file name(s) when reading. |
format |
"fasta" or "fastq" |
nrec |
Single integer. The maximum number of records to read in. Negative values are ignored. |
skip |
Single non-negative integer. The number of records of the data file(s) to skip before beginning to read in records. |
seek.first.rec , with.qualities , compress , ... , use.names , objname , dirpath , save.dups , verbose
|
Have a look |
x |
A DotBracketStringSet object |
append |
|
readDotBracketStringSet returns a DotBracketStringSet object, writeDotBracketStringSet returns NULL invisibly.
data("dbs", package = "Structstrings")
file <- tempfile()
# both work since a DotBracketStringSet is a BStringSet
writeXStringSet(dbs,file)
writeDotBracketStringSet(dbs,file)
# to immediately return a DotBracketStringSet use readDotBracketStringSet()
dbs2 <- readDotBracketStringSet(file)
getBasePairing converts a dot bracket annotation from a DotBracketString into a base pair table as DotBracketDataFrame. Base pairing is indicated by corresponding numbers in the forward and reverse columns.
getDotBracket converts the dot bracket annotation from a DotBracketDataFrame into a DotBracketString. If the character column is populated, the information from this column will be used. If this is not desired, set force = TRUE. However, be aware that this will result in a dot bracket annotation that does not necessarily match the original dot bracket string it may have been created from. It is rather the dot bracket string with the lowest number of different loops, and it will use the different dot bracket annotations one after another. Example: (((<<<>>>))) will be returned as (((((()))))). (((<<<)))>>> will be returned as (((<<<)))>>>, and ((([[[)))]]] will be returned as (((<<<)))>>>.
getLoopIndices converts the dot bracket annotation from a DotBracketString or DotBracketDataFrame into a LoopIndexList.
getBasePairing(x, compress = TRUE, return.sequence = FALSE)

getDotBracket(x, force = FALSE)

getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'DotBracketString'
getBasePairing(x)

## S4 method for signature 'DotBracketStringSet'
getBasePairing(x, compress = TRUE)

## S4 method for signature 'DotBracketDataFrame'
getDotBracket(x, force = FALSE)

## S4 method for signature 'DotBracketDataFrameList'
getDotBracket(x, force = FALSE)

## S4 method for signature 'SimpleSplitDotBracketDataFrameList'
getDotBracket(x, force = FALSE)

## S4 method for signature 'CompressedSplitDotBracketDataFrameList'
getDotBracket(x, force = FALSE)

## S4 method for signature 'DotBracketString'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'DotBracketStringSet'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'DotBracketDataFrame'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'DotBracketDataFrameList'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'SimpleSplitDotBracketDataFrameList'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)

## S4 method for signature 'CompressedSplitDotBracketDataFrameList'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)
x |
a |
compress |
|
return.sequence |
if the input is a |
force |
|
bracket.type |
|
warn.type.drops |
|
getBasePairing: The result is a DotBracketDataFrame with the following columns: pos, forward, reverse, character (and optionally the base column). If a position is unpaired, forward and reverse will be 0, otherwise they will match the base paired positions.
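How paired positions can be derived from a dot bracket string may be sketched with a stack per bracket type in base R. The function `pair_partners` and its single `partner` column are illustrative assumptions; the package's C implementation and the exact layout of its forward and reverse columns are not reproduced here.

```r
# Hypothetical base R sketch: for every position, find the position of its
# base pairing partner (0 if the position is unpaired).
pair_partners <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  open  <- c("(", "<", "[", "{")
  close <- c(")", ">", "]", "}")
  partner <- integer(length(chars))
  for (type in seq_along(open)) {
    stack <- integer(0)  # positions of still unmatched opening letters
    for (pos in seq_along(chars)) {
      if (chars[pos] == open[type]) {
        stack <- c(stack, pos)
      } else if (chars[pos] == close[type]) {
        if (length(stack) == 0L) stop("unbalanced annotation")
        opener <- stack[length(stack)]   # most recent opening letter
        stack <- stack[-length(stack)]
        partner[opener] <- pos
        partner[pos] <- opener
      }
    }
    if (length(stack) > 0L) stop("unbalanced annotation")
  }
  data.frame(pos = seq_along(chars), partner = partner, character = chars)
}

tbl <- pair_partners("((..))")
```

Handling each bracket type with its own stack lets crossing pairs of different types, such as "(([[))]]", resolve without interfering with each other.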
getLoopIndices: returns a LoopIndexList.
data("dbs", package = "Structstrings")
# conversion
dbdf <- getBasePairing(dbs)
# ... and the round trip
dbs <- getDotBracket(dbdf)
# loop indices per bracket type
loopids <- getLoopIndices(dbs)
# choose the bracket type manually, if necessary
loopids <- getLoopIndices(dbs, bracket.type = 1L)
# do not show warning if multiple bracket types are present
loopids <- getLoopIndices(dbs, bracket.type = 1L, warn.type.drops = FALSE)
With loop indices, base pairing information can be represented by giving each base pair a number and increasing/decreasing it with each opened/closed base pair. This information can be used for further analysis of the represented structure.
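The increase/decrease idea can be illustrated with a plain nesting depth counter in base R. This is a deliberately simplified sketch for the () bracket type only: the function `nesting_depth` is hypothetical, and the numbers stored in a LoopIndexList by the package may follow a different numbering scheme.

```r
# Hypothetical sketch: count up with each opened pair and down after each
# closed pair; unpaired positions inherit the current depth.
nesting_depth <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  opened <- chars == "("
  closed <- chars == ")"
  # an opening letter raises the depth at its own position, a closing
  # letter lowers it only for the positions that follow
  cumsum(opened) - cumsum(c(0L, closed[-length(closed)]))
}

depth <- nesting_depth("(((.))).")
```

Positions that share a value and are adjacent in the nesting belong to the same loop in this simplified view, and 0 marks positions outside of any base pair.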
LoopIndexList(...)
... |
the |
a LoopIndexList
object.
# If the object is created manually, make sure it contains valid structure
# information. Otherwise an error is thrown.
lil <- LoopIndexList(list(c(1L,2L,3L,3L,3L,2L,1L,0L,5L,6L,6L,5L),
                          c(1L,2L,2L,2L,2L,2L,1L,0L,5L,6L,6L,5L)))
The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. For example, it is used in the ViennaRNA package (Lorenz et al. 2011), the tRNAscan-SE software (Lowe et al. 1997) and the tRNAdb (Jühling et al. 2009).
Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the equivalent BString class. From these, base pair tables can be produced for in depth analysis. For this purpose the DotBracketDataFrame class is derived from the DataFrame class. In addition, the loop IDs of the base pairs can be retrieved as a LoopIndexList, a derivative of the IntegerList. Generally, the package automatically checks the validity of the dot bracket annotation.
The conversion of the DotBracketString to the base pair table and the loop indices is implemented in C for efficiency. The C implementation is to a large extent inspired by the ViennaRNA package.
This package was developed as a requirement for the tRNA package. However, other projects might benefit as well, so it was split off and improved upon.
Please refer to the Structstrings vignette for an example of how to work with and use the package.
Felix G M Ernst [aut,cre]
Lorenz, Ronny; Bernhart, Stephan H.; Höner zu Siederdissen, Christian; Tafer, Hakim; Flamm, Christoph; Stadler, Peter F.; Hofacker, Ivo L. (2011): "ViennaRNA Package 2.0". Algorithms for Molecular Biology 6:26. doi:10.1186/1748-7188-6-26
Lowe, T.M.; Eddy, S.R. (1997): "tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence". Nucl. Acids Res. 25: 955-964. doi:10.1093/nar/25.5.955
Jühling, Frank; Mörl, Mario; Hartmann, Roland K.; Sprinzl, Mathias; Stadler, Peter F.; Pütz, Joern (2009): "TRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes." Nucleic Acids Research 37 (suppl_1): D159–D162. doi:10.1093/nar/gkn772.
Example data for using the Structstrings package
data(dbs)

data(nseq)
object of class DotBracketStringSet and DNAStringSet
An object of class DNAStringSet of length 299.
sequence and dot bracket annotation of tRNAscan-SE output for *S. cerevisiae* imported using tRNAscanImport. The example file is part of the tRNAscanImport package.
Analogous to Biostrings, there are a few objects which should only be used internally but may be of use to other package developers. Otherwise take care.
DOTBRACKET_CHAR_VALUES

DOTBRACKET_ALPHABET

STRUCTURE_NEUTRAL_CHR

STRUCTURE_OPEN_CHR

STRUCTURE_CLOSE_CHR

## S4 replacement method for signature 'DotBracketDataFrame'
x[i, j, ...] <- value

## S4 replacement method for signature 'CompressedSplitDotBracketDataFrameList'
colnames(x) <- value

## S4 method for signature 'DotBracketString'
seqtype(x)

## S4 method for signature 'DotBracketString'
subseq(x, start = NA, end = NA, width = NA)

## S4 replacement method for signature 'DotBracketString'
subseq(x, start = NA, end = NA, width = NA) <- value

## S4 replacement method for signature 'DotBracketStringSet'
subseq(x, start = NA, end = NA, width = NA) <- value
seqtype , x , start , end , width , value , i , j , ...
|
used internally |
DOTBRACKET_CHAR_VALUES: an integer vector of length 9 containing the integer values of the dotbracket alphabet
DOTBRACKET_ALPHABET: a character vector of length 9 containing the single characters of the dotbracket alphabet
STRUCTURE_NEUTRAL_CHR: a character vector of length 1 containing the character for unpaired positions
STRUCTURE_OPEN_CHR: a character vector of length 4 containing the opening characters of the dotbracket alphabet
STRUCTURE_CLOSE_CHR: a character vector of length 4 containing the closing characters of the dotbracket alphabet
DOTBRACKET_CHAR_VALUES
DOTBRACKET_ALPHABET
STRUCTURE_NEUTRAL_CHR
STRUCTURE_OPEN_CHR
STRUCTURE_CLOSE_CHR

# the replace method for a DotBracketDataFrame had to be reimplemented
# because of the requirement of columns for a DotBracketDataFrameList and
# DotBracketDataFrame
data("dbs", package = "Structstrings")
dbdfl <- getBasePairing(dbs)
# Elements are returned as DotBracketDataFrames
dbdf <- dbdfl[[1]]
dbdfl[[1]] <- dbdf
dbdfl[1] <- dbdfl[1]
The StructuredXStringSet class can be used to store structure information alongside RNA sequences. The class behaves like the QualityScaledXStringSet classes.
Please note that this does not check for validity regarding base pairing capabilities.
StructuredRNAStringSet(x, structure)

dotbracket(x)

dotbracket(x) <- value

## S4 method for signature 'StructuredXStringSet'
dotbracket(x)

## S4 replacement method for signature 'StructuredXStringSet'
dotbracket(x) <- value

readStructuredRNAStringSet(
  filepath,
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE
)

writeStructuredXStringSet(x, filepath, append = FALSE, compress = FALSE, ...)

## S4 method for signature 'StructuredXStringSet'
getBasePairing(x, compress = TRUE, return.sequence = FALSE)

## S4 method for signature 'StructuredXStringSet'
getLoopIndices(x, bracket.type, warn.type.drops = TRUE)
x |
For the |
structure , value
|
|
use.names , type , filepath , nrec , skip , seek.first.rec , append , ...
|
|
compress |
|
return.sequence |
|
bracket.type |
|
warn.type.drops |
See |
The dotbracket function allows access to the included DotBracketStringSet.
a StructuredRNAStringSet
object.
str <- DotBracketStringSet("(())")
seq <- RNAStringSet("AGCU")
sdbs <- StructuredRNAStringSet(seq,str)