XNAString package

Introduction

The XNAString package aims to enable efficient manipulation of modified oligonucleotide sequences. It inherits some functionality from the Biostrings package. In contrast to Biostrings sequences, XNAString classes describe the base sequence, sugar and backbone in a single object. XNAString can represent single-stranded oligonucleotides, siRNAs, PNAs, shRNAs, gRNAs and synthetic mRNAs, and lets users apply sequence-manipulating Bioconductor packages in their analysis. XNAString can read and write HELM notation, compute alphabet frequencies, and align and match targets.

Methods and functions overview

All exported methods are listed in this section. They are divided into four tables:

  • XNAString methods

  • XNAStringSet methods

  • Both XNAString and XNAStringSet methods

  • Other functions

XNAString methods

XNAString class methods
function. description
XNAString Create XNAString object by passing at least base (sugar, name, backbone, target, conjugates and dictionary are optional).
XNAReverseComplement Take XNAString object and return reverse complement of base slot.
predictMfeStructure Take XNAString object and apply RNAfold_MFE function from ViennaRNA package on base slot (if single stranded molecule).
predictDuplexStructure Take XNAString object and apply RNAcofold_MFE function from ViennaRNA package on base slot (if double stranded molecule) or duplicate base and apply RNAcofold_MFE function from ViennaRNA (if single stranded molecule)
XNAStringToHelm Change XNAString/XNAStringSet object to HELM notation.

XNAStringSet methods

XNAStringSet class methods
function. description
XNAStringSet Create XNAStringSet object by passing XNAString objects list.
set2List Change XNAStringSet object to list of XNAString objects.
set2Dt Change XNAStringSet object to data.table.
dt2Set Change data.table (or data.frame) to XNAStringSet object
[ Extract single/multiple XNAString objects (XNAStringSet object returned) by passing index/indexes number or name/names.
[[ Extract single XNAString object (XNAString object returned) by passing index number or name.

Both XNAString and XNAStringSet methods

XNAString and XNAStringSet class methods
function. description
name / name<- Extract / overwrite name slot.
base / base<- Extract / overwrite base slot.
sugar / sugar<- Extract / overwrite sugar slot.
backbone / backbone<- Extract / overwrite backbone slot.
target / target<- Extract / overwrite target slot.
conjugate5 / conjugate5<- Extract / overwrite conjugate5 slot.
conjugate3 / conjugate3<- Extract / overwrite conjugate3 slot.
secondary_structure / secondary_structure<- Extract / overwrite secondary_structure slot.
duplex_structure / duplex_structure<- Extract / overwrite duplex_structure slot.
dictionary / dictionary<- Extract / overwrite dictionary slot.
compl_dictionary / compl_dictionary<- Extract / overwrite compl_dictionary slot.
XNAStringFromHelm Change HELM notation to XNAString/XNAStringSet object.
XNAPairwiseAlignment Inherited from Biostrings package. Solve global/local/ends-free alignment problems.
XNAMatchPattern Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a reference sequence (typically long). Support mismatches and indels.
XNAVmatchPattern Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a set of reference sequences. Support mismatches and indels.
XNAMatchPDict Inherited from Biostrings package. Find/count all the occurrences of a set of patterns in a reference sequence. Support a small number of mismatches.
XNAAlphabetFrequency Tabulate the letters and count frequency for nucleotides.
XNADinucleotideFrequency Tabulate the letters and count frequency for dinucleotides.
mimir2XnaDict Reformat mimir table to XNA dictionary standards

Other functions

Other functions
function. description
concatDict Concatenate custom HELM-symbol dictionary with built-in HELM-symbol xna_dictionary.
mimir2XnaDict Rewrite dictionary table to standard format.

XNAString class

XNAString class is subclass of Biostrings::BString and has 13 slots:

  • name (character)

  • base (character, RNAString, RNAStringSet, DNAString or DNAStringSet)

  • sugar (character)

  • backbone (character; if missing and base is character, the default is a string of repeated ‘X’; if missing and base is DNAString/RNAString, the default is all ‘O’)

  • target (DNAStringSet, DNAString or character)

  • conjugate5 (character)

  • conjugate3 (character)

  • secondary_structure (list - structure is character and mfe numeric)

  • duplex_structure (list - structure is character and mfe numeric)

  • dictionary (xna_dictionary default, data.table type)

  • compl_dictionary (complementary_bases default, data.table type)

  • default_sugar (character)

  • default_backbone (character)

The target, name and conjugate slots can be NA. If the backbone or the dictionaries are missing, default values are used.

Validation procedure requirements for XNAString objects:

  • slots type must reflect above requirements

  • length of sugar equals length of base

  • length of backbone is one element shorter than sugar and base

  • condition on available letters in base / sugar / backbone dictionary must be satisfied

  • base, sugar and backbone vectors have the same length, equal to 1 or 2; the target vector has length > 0; the name and conjugate vectors have length 1

  • length of default_sugar and default_backbone is 1, nchar is also 1 if not NA

Object can be created only when all validation procedure requirements are met.
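
The length rules above can be sketched in base R. This is only an illustration under stated assumptions, not the package's validity method: check_xna_lengths is a hypothetical helper, it takes base, sugar and backbone as plain character vectors, and it leaves out the dictionary and slot-type checks.

```r
# Hypothetical helper (not part of XNAString): mirrors the length rules
# listed above for base, sugar and backbone given as character vectors.
check_xna_lengths <- function(base, sugar, backbone) {
  problems <- character(0)
  if (!(length(base) %in% c(1L, 2L))) {
    problems <- c(problems, "base must hold 1 or 2 strands")
  }
  if (length(sugar) != length(base) || length(backbone) != length(base)) {
    problems <- c(problems, "base, sugar and backbone must hold the same number of strands")
  } else {
    if (any(nchar(sugar) != nchar(base))) {
      problems <- c(problems, "sugar must be as long as base")
    }
    if (any(nchar(backbone) != nchar(base) - 1L)) {
      problems <- c(problems, "backbone must be one element shorter than base")
    }
  }
  # TRUE when every rule holds, otherwise the list of violated rules
  if (length(problems) == 0L) TRUE else problems
}

ok    <- check_xna_lengths("ATCG", "OOOO", "SOS")
sirna <- check_xna_lengths(c("ATCG", "TAGC"), c("OOOO", "OOOO"), c("SOS", "SOS"))
bad   <- check_xna_lengths("ATCG", "OOO", "SOSO")
```

Returning the list of violated rules rather than stopping at the first one makes it easier to see everything that is wrong with a candidate input.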

Object creation

Example 1 - basic XNAString object

If the base slot is passed as character and the sugar slot is passed as well, the default backbone is a string of repeated ‘X’ characters. The defaults for the target, secondary_structure and duplex_structure slots are created by applying the XNAReverseComplement, predictMfeStructure and predictDuplexStructure methods, respectively, to the base slot.

obj <- XNAString(base = 'ATCG', sugar = 'OOOO')
obj
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      OOOO
## backbone:   XXX
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

Example 2 - XNAString object with optional slots

obj <- XNAString(base = 'ATCG', 
                 sugar = 'OOOO', 
                 backbone = 'SOS', 
                 target = Biostrings::DNAStringSet('TAGC'), 
                 conjugate3 = "[5gn2c6]", 
                 name = "oligo1")
obj
## XNAString object
## name:       oligo1
## base:       ATCG
## sugar:      OOOO
## backbone:   SOS
## target:     TAGC
## conjugate5: NA
## conjugate3: [5gn2c6]
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

Example 3 - XNAString object with multiple targets

obj <- XNAString(base = 'ATCG', 
                 sugar = 'OOOO', 
                 backbone = 'SOS', 
                 target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      OOOO
## backbone:   SOS
## target:     TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0

Example 4 - XNAString for siRNA

obj <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OOOO', 'OOOO'),
                 backbone = c('SOS','SOS'),
                 target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name:       NA
## base:       ATCG, TAGC
## sugar:      OOOO, OOOO
## backbone:   SOS, SOS
## target:     TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure: 
## duplex_structure: ....&...., 0

Example 5 - DNAString/ RNAString as base slot

XNAString with DNAString base should create default sugar all D, and backbone all O.
XNAString with RNAString base should create default sugar all R, and backbone all O.
d1 <- Biostrings::DNAString(x = "ACGATCG")

obj <- XNAString(base = d1)
obj
## XNAString object
## name:       NA
## base:       ACGATCG
## sugar:      DDDDDDD
## backbone:   OOOOOO
## target:     CGATCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: .((((((&.)))))), -7.6

Example 6 - XNAString with optional default_backbone and default_sugar slots

XNAString(base = 'ATCG')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      DDDD
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = 'ATCG', default_sugar = 'F', default_backbone = 'X')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      FFFF
## backbone:   XXX
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = Biostrings::DNAString('ATCG'))
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      DDDD
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAString(base = Biostrings::DNAString('ATCG'),  default_backbone = 'O', default_sugar = 'R')
## XNAString object
## name:       NA
## base:       ATCG
## sugar:      RRRR
## backbone:   OOO
## target:     CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
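
The outputs above suggest that the defaults are built by repeating a single character: one sugar per base and one linkage between neighbouring bases. A minimal base R sketch of that idea follows; default_strings is a hypothetical helper, and the fallback values 'D' and 'O' are an assumption read off the first output in this example, not taken from the package source.

```r
# Hypothetical helper (not part of XNAString): builds default sugar and
# backbone strings for a single-stranded base sequence.
# Assumption: 'D' and 'O' are the fallbacks, as the output above suggests.
default_strings <- function(base, default_sugar = "D", default_backbone = "O") {
  n <- nchar(base)
  list(
    sugar    = strrep(default_sugar, n),          # one sugar per base
    backbone = strrep(default_backbone, n - 1L)   # one linkage per neighbouring pair
  )
}

plain  <- default_strings("ATCG")
custom <- default_strings("ATCG", default_sugar = "F", default_backbone = "X")
```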

Methods

Example 1 - public slots getter and setter

Public slots setter - once an object is created, the setter enables users to modify all slots.
New input has to satisfy the validation procedure (e.g. the sugar must be of the same length as the base).
obj <- XNAString(base = 'ATCG', sugar = 'FODL', backbone = 'SOO')
base(obj)
## [1] "ATCG"
base(obj) <- 'CTGA'
base(obj)
## [1] "CTGA"

Example 2 - XNAReverseComplement (GU wobbles)

XNAReverseComplement method takes XNAString object as an input and finds reverse complement for base slot.
This method is used to create default target slot. Each object has compl_dictionary slot (if not passed while creating an object - complementary_bases table used as default). All bases from dictionary must be present also in compl_dictionary. If that is not the case, target slot default is empty. It is also possible to return multiple targets with XNAReverseComlement method. It can be done by passing custom complementary_bases dictionary and using Iupac symbols. E.g. by adding symbol ‘R’ to custom complementary_bases dictionary, there are two possibile bases: ‘G’ and ‘A’ (see the example below). More symbols and corresponding bases (Iupac standards): * symbol “W” - bases “A”, “T”

  • symbol “S” - bases “G”, “C”

  • symbol “M” - bases “A”, “C”

  • symbol “K” - bases “G”, “T”

  • symbol “R” - bases “A”, “G”

  • symbol “Y” - bases “C”, “T”

  • symbol “B” - bases “C”, “G”, “T”

  • symbol “D” - bases “A”, “G”, “T”

  • symbol “H” - bases “A”, “C”, “T”

  • symbol “V” - bases “A”, “C”, “G”

  • symbol “N” - bases “A”, “C”, “G”, “T”

XNAString::complementary_bases
##      base target compl_target
##    <char> <char>       <char>
## 1:      A      T            A
## 2:      C      G            C
## 3:      G      C            G
## 4:      T      A            T
## 5:      E      G            C
## 6:      U      A            T
obj1 <- XNAString(base = "ACEGTTGGT",
                    sugar = 'FODDDDDDD',
                    conjugate3 = 'TAG')
XNAString::XNAReverseComplement(obj1)
## [1] "ACCAACGGT"
custom_compl_dict <-
  rbind(
    XNAString::complementary_bases[seq(1, 5),],
    data.table::data.table(
      base = 'U',
      target = 'R',
      compl_target = 'T'
    )
  )
obj2 <- XNAString(base = "ACGCUUA",
                  sugar = 'DDDDDDD',
                  compl_dictionary = custom_compl_dict)
obj2
## XNAString object
## name:       NA
## base:       ACGCUUA
## sugar:      DDDDDDD
## backbone:   XXXXXX
## target:     TAAGCGT, TGAGCGT, TAGGCGT, TGGGCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: ..((...&..))..., -2.1
# create custom complementary dictionary with complementary bases coded as IUPAC
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence 
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence 

compl_dict
##      base target compl_target
##    <char> <char>       <char>
## 1:      A      T            A
## 2:      C      G            C
## 3:      G      Y            G
## 4:      T      R            T
## 5:      E      G            C
## 6:      U      R            T
xna <- XNAString::XNAString(base = "ACGTACGT", sugar = "DDDDDDDD",compl_dictionary = compl_dict)
xna
## XNAString object
## name:       NA
## base:       ACGTACGT
## sugar:      DDDDDDDD
## backbone:   XXXXXXX
## target:     ACGTACGT, GCGTACGT, ATGTACGT, GTGTACGT, ACGTGCGT, GCGTGCGT, ATGTGCGT, GTGTGCGT, ACGTATGT, GCGTATGT, ATGTATGT, GTGTATGT, ACGTGTGT, GCGTGTGT, ATGTGTGT, GTGTGTGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ........, 0
## duplex_structure: ((((((((&)))))))), -9.8
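
How one ambiguous target symbol turns into several concrete targets can be sketched in base R. The code below is an illustration only and not the package's implementation: expand_reverse_complement is a hypothetical helper, the iupac list restates the symbol table given above, and compl restates the base-to-target mapping of custom_compl_dict as a named vector.

```r
# IUPAC symbols and the concrete bases they stand for (from the list above)
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  W = c("A", "T"), S = c("G", "C"), M = c("A", "C"), K = c("G", "T"),
  R = c("A", "G"), Y = c("C", "T"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper (not part of XNAString)
expand_reverse_complement <- function(base, compl) {
  letters_in_base <- strsplit(base, "")[[1]]
  # complement every letter, then reverse the order
  symbols <- rev(unname(compl[letters_in_base]))
  # each symbol contributes one or more concrete bases
  choices <- unname(iupac[symbols])
  # enumerate all combinations and paste each one into a sequence
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  unname(apply(grid, 1, paste0, collapse = ""))
}

# base -> target mapping of custom_compl_dict, where U pairs with R (A or G)
compl <- c(A = "T", C = "G", G = "C", T = "A", E = "G", U = "R")
targets <- expand_reverse_complement("ACGCUUA", compl)
```

Two ambiguous positions with two options each give four targets here; the count grows multiplicatively, which is why the eight-base example above already lists sixteen.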

Example 3 - predictMfeStructure

The predictMfeStructure method uses the RNAfold_MFE function from the ViennaRNA C library, which handles only standard bases. The method takes an XNAString object and applies RNAfold_MFE to the base slot if base is a DNAString or RNAString. If base is a character string (non-standard bases allowed), RNAfold_MFE works for the letters A, C, G, T, E and U through the built-in complementary_bases dictionary (e.g. the letter E is changed to C before folding). If the base slot includes other letters, their translation to compl_target must be included in the compl_dictionary slot; otherwise predictMfeStructure returns an empty string. The predictMfeStructure method is also used to create the default secondary_structure slot.

obj <- XNAString(base = 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA', 
                   sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
XNAString::predictMfeStructure(obj)
## $structure
## [1] ".....((..((.....))..)).........."
## 
## $mfe
## [1] -4.5
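
The translation step described above (non-standard letters replaced by their compl_target before folding) can be sketched in base R. This is a simplified illustration, not the package's code: to_standard_bases is a hypothetical helper, and the named vector restates the compl_target column of the built-in complementary_bases table shown earlier.

```r
# compl_target column of the built-in complementary_bases table
compl_target <- c(A = "A", C = "C", G = "G", T = "T", E = "C", U = "T")

# Hypothetical helper (not part of XNAString): returns "" when some letter
# has no translation, i.e. when there is nothing that could be folded.
to_standard_bases <- function(base, compl_target) {
  letters_in_base <- strsplit(base, "")[[1]]
  if (!all(letters_in_base %in% names(compl_target))) {
    return("")
  }
  paste0(compl_target[letters_in_base], collapse = "")
}

translated   <- to_standard_bases("ACEGTTGGT", compl_target)  # E becomes C
untranslated <- to_standard_bases("ACZG", compl_target)       # Z is unknown
```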

Example 4 - predictDuplexStructure - double stranded molecule

The predictDuplexStructure method uses the RNAcofold_MFE function from the ViennaRNA C library, which handles only standard bases. The method takes an XNAString object and applies RNAcofold_MFE to the base slot if base is a DNAStringSet or RNAStringSet. If base is a character vector (non-standard bases allowed), RNAcofold_MFE works for the letters A, C, G, T, E and U through the built-in complementary_bases dictionary (e.g. the letter E is changed to C before folding). If the base slot includes other letters, their translation to compl_target must be included in a manually passed compl_dictionary slot; otherwise predictDuplexStructure returns an empty string. The predictDuplexStructure method is also used to create the default duplex_structure slot.

obj <- XNAString(base = Biostrings::DNAStringSet(c('GAGAGGGAACCAGGCAGGGACCGCAGACAACA', 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA')))

XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
## 
## $mfe
## [1] -14.5

Example 5 - predictDuplexStructure - single stranded molecule

The RNAcofold_MFE function needs two sequences, so if the molecule is single stranded, the base sequence is duplicated within the predictDuplexStructure method.

obj <- XNAString(base = Biostrings::DNAString('GAGAGGGAACCAGGCAGGGACCGCAGACAACA'), 
                 sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
  
XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
## 
## $mfe
## [1] -14.5
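
The duplication rule and the layout of the returned structure string can be sketched in base R. duplex_strands is a hypothetical helper, not a package function; the dot-bracket string is the one printed for base 'ATCG' in the object creation examples, where "&" separates the two strands.

```r
# Hypothetical helper (not part of XNAString): a single strand is paired
# with a copy of itself, two strands are used as they are.
duplex_strands <- function(base) {
  if (length(base) == 1L) rep(base, 2L) else base
}

strands <- duplex_strands("ATCG")

# Split a duplex dot-bracket string at "&" to get one part per strand
parts <- strsplit("....&....", "&", fixed = TRUE)[[1]]
```

Each part of the structure string has the same length as the strand it describes, which is a quick sanity check when reading duplex_structure output.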

XNAStringSet class

XNAStringSet class consists of XNAString objects given as a list. Validation procedure checks if all objects are of XNAString class.

Object creation

Example 1 - basic XNAStringSet object as a list of XNAString objects

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAStringSet_obj
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA   ATCG   FODD      XXX   CGAT         NA         NA
## 2:     NA   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
## 2:             ...., 0

Example 2 - XNAStringSet consists of XNAString objects for siRNA

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'), 
                 sugar = c('OOOO', 'OOOO'), 
                 backbone = c('SOS','SOS'), 
                 target = Biostrings::DNAStringSet(c('TAGC', 'TACC')))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'), 
                 sugar = c('OOOO', 'OOOO'), 
                 target = Biostrings::DNAStringSet(c('CCGC', 'TATG')))

XNAStringSet_siRNA <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAStringSet_siRNA
## XNAStringSet object
##      name       base      sugar backbone     target conjugate5 conjugate3
##    <list>     <list>     <list>   <list>     <list>     <list>     <list>
## 1:     NA ATCG, TAGC OOOO, OOOO SOS, SOS TAGC, TACC         NA         NA
## 2:     NA GGCG, TATC OOOO, OOOO XXX, XXX CCGC, TATG         NA         NA
##    secondary_structure
##                 <list>
## 1:                    
## 2:

Example 3 - XNAStringSet object created by passing vectors

XNAStringSet(base = c("ATGCT","TGCAT","ATATG"))
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA  ATGCT  DDDDD     OOOO  AGCAT         NA         NA
## 2:     NA  TGCAT  DDDDD     OOOO  ATGCA         NA         NA
## 3:     NA  ATATG  DDDDD     OOOO  CATAT         NA         NA
##    secondary_structure
##                 <list>
## 1:            ....., 0
## 2:            ....., 0
## 3:            ....., 0
XNAStringSet(base = c("ATGCT","TGCAT","ATATG"), default_sugar = 'R', default_backbone = 'X')
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA  ATGCT  RRRRR     XXXX  AGCAT         NA         NA
## 2:     NA  TGCAT  RRRRR     XXXX  ATGCA         NA         NA
## 3:     NA  ATATG  RRRRR     XXXX  CATAT         NA         NA
##    secondary_structure
##                 <list>
## 1:            ....., 0
## 2:            ....., 0
## 3:            ....., 0
XNAStringSet(base = c("ATGCT","TGCAT","ATATG"),sugar = c("DDDDD","DDDDD","DDDDD"))
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA  ATGCT  DDDDD     XXXX  AGCAT         NA         NA
## 2:     NA  TGCAT  DDDDD     XXXX  ATGCA         NA         NA
## 3:     NA  ATATG  DDDDD     XXXX  CATAT         NA         NA
##    secondary_structure
##                 <list>
## 1:            ....., 0
## 2:            ....., 0
## 3:            ....., 0
XNAStringSet(base= list(c('TT', 'GG'), 
                        c('TG', 'GT'), 
                        c('TG')), 
             sugar = list(c('FF', 'FO'), 
                          c('OO', 'OF'), 
                          c('OO')), 
             backbone =list(c('X', 'X'), 
                            c('X', 'X'), 
                            c('X')))
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA TT, GG FF, FO     X, X     AA         NA         NA
## 2:     NA TG, GT OO, OF     X, X     CA         NA         NA
## 3:     NA     TG     OO        X     CA         NA         NA
##    secondary_structure
##                 <list>
## 1:                    
## 2:                    
## 3:               .., 0

Example 4 - XNAStringSet object created by passing vectors and optional compl_dict (GU wobbles)

XNAStringSet(base= c('TT', 'GG'), sugar = c('FF', 'FF'))
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA     TT     FF        X     AA         NA         NA
## 2:     NA     GG     FF        X     CC         NA         NA
##    secondary_structure
##                 <list>
## 1:               .., 0
## 2:               .., 0
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence

XNAStringSet(base= c('TT', 'GG'), sugar = c('FF', 'FF'), compl_dict = compl_dict) 
## XNAStringSet object
##      name   base  sugar backbone       target conjugate5 conjugate3
##    <list> <list> <list>   <list>       <list>     <list>     <list>
## 1:     NA     TT     FF        X AA, GA, ....         NA         NA
## 2:     NA     GG     FF        X CC, TC, ....         NA         NA
##    secondary_structure
##                 <list>
## 1:               .., 0
## 2:               .., 0
# compl_dict is used only if target is not passed
XNAStringSet(base= c('TT', 'GG'), 
             sugar = c('FF', 'FF'), 
             target = c('AA', 'CC'), 
             compl_dict = compl_dict)
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA     TT     FF        X     AA         NA         NA
## 2:     NA     GG     FF        X     CC         NA         NA
##    secondary_structure
##                 <list>
## 1:               .., 0
## 2:               .., 0

Example 5 - XNAStringSet created from data.table

dt <- data.table::data.table(base = c('TT', 'GG'))
out1 <- dt2Set(dt)
out1
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA     TT     DD        O     AA         NA         NA
## 2:     NA     GG     DD        O     CC         NA         NA
##    secondary_structure
##                 <list>
## 1:               .., 0
## 2:               .., 0
dt <- data.table::data.table(base = c('TT', 'GG'), default_sugar = 'R', default_backbone = 'X')
out2 <- dt2Set(dt)
out2
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA     TT     DD        O     AA         NA         NA
## 2:     NA     GG     DD        O     CC         NA         NA
##    secondary_structure
##                 <list>
## 1:               .., 0
## 2:               .., 0
dt <- data.table::data.table(base= list(c('TT', 'GG'), 
                                        c('TG', 'GT'), 
                                        c('TG')), 
                            sugar = list(c('FF', 'FO'), 
                                         c('OO', 'OF'), 
                                         c('OO')), 
                            backbone =list(c('X', 'X'), 
                                           c('X', 'X'), 
                                           c('X')))
out3 <- dt2Set(dt)
out3
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1:     NA TT, GG FF, FO     X, X     AA         NA         NA
## 2:     NA TG, GT OO, OF     X, X     CA         NA         NA
## 3:     NA     TG     OO        X     CA         NA         NA
##    secondary_structure
##                 <list>
## 1:                    
## 2:                    
## 3:               .., 0

Methods

Methods which enable XNAStringSet extraction / modification:

  • set2List

  • extraction methods (“[” and “[[”)

  • set2Dt method

  • public slots setter/getter

  • XNAStringSet from data.table/data.frame

Example 1 - change XNAStringSet object to a list of XNAString objects

XNAString_obj1 <- XNAString(name = 'oligo1', base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(name = 'oligo2',base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
set2List(XNAStringSet_obj)
## [[1]]
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo1   ATCG   FODD      XXX   CGAT         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
## 
## [[2]]
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0

Example 2 - extract single XNAString object or part of XNAStringSet object

XNAStringSet_obj[2]
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
XNAStringSet_obj['oligo2']
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
XNAStringSet_obj[c(1,2)]
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo1   ATCG   FODD      XXX   CGAT         NA         NA
## 2: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
## 2:             ...., 0
XNAStringSet_obj[c('oligo1', 'oligo2')]
## XNAStringSet object
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo1   ATCG   FODD      XXX   CGAT         NA         NA
## 2: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA
##    secondary_structure
##                 <list>
## 1:             ...., 0
## 2:             ...., 0
XNAStringSet_obj[[2]]
## XNAString object
## name:       oligo2
## base:       TTCT
## sugar:      FOLL
## backbone:   XXX
## target:     AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
XNAStringSet_obj[['oligo2']]
## XNAString object
## name:       oligo2
## base:       TTCT
## sugar:      FOLL
## backbone:   XXX
## target:     AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
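
The difference between "[" and "[[" shown above follows the usual R convention for lists. As a loose analogy only (a plain named list stands in for an XNAStringSet, and no package code is involved):

```r
# A named list standing in for a set of two oligos
oligos <- list(oligo1 = "ATCG", oligo2 = "TTCT")

# "[" keeps the container: the result is still a list
subset_by_index <- oligos[2]
subset_by_name  <- oligos["oligo2"]

# "[[" returns the element itself
element_by_index <- oligos[[2]]
element_by_name  <- oligos[["oligo2"]]
```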

Example 3 - change XNAStringSet object to data.table

set2Dt(XNAStringSet_obj, slots =c('name', 'base', 'sugar', 'backbone', 'target', 'conjugate5', 'conjugate3'))
##      name   base  sugar backbone target conjugate5 conjugate3
##    <list> <list> <list>   <list> <char>     <list>     <list>
## 1: oligo1   ATCG   FODD      XXX   CGAT         NA         NA
## 2: oligo2   TTCT   FOLL      XXX   AGAA         NA         NA

Example 4 - public slots getter and setter

base(XNAStringSet_siRNA, 1)
## [1] "ATCG" "GGCG"
base(XNAStringSet_siRNA, 2)
## [1] "TAGC" "TATC"
base(XNAStringSet_siRNA, 1) <- c('CTGA', 'CCCC')
base(XNAStringSet_siRNA, 2) <- c('TTTT', 'TTTT')
XNAStringSet_siRNA
## XNAStringSet object
##      name       base      sugar backbone     target conjugate5 conjugate3
##    <list>     <list>     <list>   <list>     <list>     <list>     <list>
## 1:     NA CTGA, TTTT OOOO, OOOO SOS, SOS TAGC, TACC         NA         NA
## 2:     NA CCCC, TTTT OOOO, OOOO XXX, XXX CCGC, TATG         NA         NA
##    secondary_structure
##                 <list>
## 1:                    
## 2:
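
In the calls above, the second argument selects a strand position, not an object: index 1 addresses the first strand of every object in the set. A base R sketch of that behaviour, with a plain list of character vectors standing in for the set (get_strand and set_strand are hypothetical helpers, not package functions):

```r
# One character vector of strands per object, as in XNAStringSet_siRNA
bases <- list(c("ATCG", "TAGC"), c("GGCG", "TATC"))

# Hypothetical getter: i-th strand of every object
get_strand <- function(x, i) {
  vapply(x, function(strands) strands[i], character(1))
}

# Hypothetical setter: replace the i-th strand of every object
set_strand <- function(x, i, value) {
  Map(function(strands, v) { strands[i] <- v; strands }, x, value)
}

first  <- get_strand(bases, 1)
second <- get_strand(bases, 2)
bases  <- set_strand(bases, 1, c("CTGA", "CCCC"))
bases  <- set_strand(bases, 2, c("TTTT", "TTTT"))
```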

Example 5 - XNAStringSet from data.table

dt <- data.table::data.table(base= c('AEACACACACACAEAE', 'AAETGCTETGTATTTTTE'), 
                  sugar = c('LLLDDDDDDDDDDLLL', 'LLLDDDDDDDDDDDLLLL'), 
                  backbone =c('SSSSSSSSSSSSSSS', 'SSSSSSSSSSSSSSSSS'))
dt
##                  base              sugar          backbone
##                <char>             <char>            <char>
## 1:   AEACACACACACAEAE   LLLDDDDDDDDDDLLL   SSSSSSSSSSSSSSS
## 2: AAETGCTETGTATTTTTE LLLDDDDDDDDDDDLLLL SSSSSSSSSSSSSSSSS
dt2Set(dt)
## XNAStringSet object
##      name         base        sugar     backbone             target conjugate5
##    <list>       <list>       <list>       <list>             <char>     <list>
## 1:     NA AEACACAC.... LLLDDDDD.... SSSSSSSS....   GTGTGTGTGTGTGTGT         NA
## 2:     NA AAETGCTE.... LLLDDDDD.... SSSSSSSS.... GAAAAATACAGAGCAGTT         NA
##    conjugate3 secondary_structure
##        <list>              <list>
## 1:         NA        ............
## 2:         NA        ............

HELM notation to XNAString object

The methods in this section translate HELM notation to multistring notation and create an XNAString/XNAStringSet object.

Example 1 - ssRNA

helm <-
    "CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

XNAStringFromHelm(helm, name ='oligo1')
## XNAString object
## name:       oligo1
## base:       GAGAAGGCACAGAEGG
## sugar:      LLLLDDDDDDDDDLLL
## backbone:   XXXXXXXXXXXXXXX
## target:     CCGTCTGTGCCTTCTC
## conjugate5: [5gn2c6]
## conjugate3: 
## secondary_structure: ................, 0
## duplex_structure: ......((........&......))........, -2.1

Example 2 - siRNA

helm <-
"RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR](C)P.[mR](G)P.[fR](C)P.[mR](C)P.[fR](G)P.[mR](T)P.[fR](G)P.[mR](G)P.[fR](T)P.[mR](T)P.[fR](C)P.[mR](A)P.[fR](T)P.[mR](A)P.[fR](A)}|RNA2{[fR](T)P.[fR](T)P.[mR](A)P.[fR](T)P.[mR](G)P.[fR](A)P.[mR](A)P.[fR](C)P.[mR](C)P.[fR](A)P.[mR](C)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](A)P.[fR](G)P.[mR](G)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](G)}$RNA1,RNA2,2:pair-56:pair|RNA1,RNA2,5:pair-53:pair|RNA1,RNA2,8:pair-50:pair|RNA1,RNA2,11:pair-47:pair|RNA1,RNA2,14:pair-44:pair|RNA1,RNA2,17:pair-41:pair|RNA1,RNA2,20:pair-38:pair|RNA1,RNA2,23:pair-35:pair|RNA1,RNA2,26:pair-32:pair|RNA1,RNA2,29:pair-29:pair|RNA1,RNA2,32:pair-26:pair|RNA1,RNA2,35:pair-23:pair|RNA1,RNA2,38:pair-20:pair|RNA1,RNA2,41:pair-17:pair|RNA1,RNA2,44:pair-14:pair|RNA1,RNA2,47:pair-11:pair|RNA1,RNA2,50:pair-8:pair|RNA1,RNA2,53:pair-5:pair|RNA1,RNA2,56:pair-2:pair$$$V2.0"


XNAStringFromHelm(helm)
## XNAString object
## name:       NA
## base:       CCCCCGCCGTGGTTCATAA, TTATGAACCACGGCAGGGGCG
## sugar:      OOFOFOFOFOFOFOFOFOF, FFOFOFOFOFOFOFOFOFOFO
## backbone:   OOOOOOOOOOOOOOOOOO, OOOOOOOOOOOOOOOOOOOO
## target:     TTATGAACCACGGCGGGGG
## conjugate5: 
## conjugate3: 
## secondary_structure: 
## duplex_structure: ((((.(((((((((((((.&.))))))))))))).)))).., -32.900002

Example 3 - removed linker

helm <-
"CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](T)[sP].[LR](T)[sP].[LR](G)[sP].[dR](A)[sP].[dR](A)[sP].[dR](T)[sP].[dR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](T)[sP].[dR](G)[sP].[dR](G)[sP].[dR](A)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

XNAStringFromHelm(helm)
## XNAString object
## name:       NA
## base:       TTGAATAAGTGGATGT
## sugar:      LLLDDDDDDDDDDLLL
## backbone:   XXXXXXXXXXXXXXX
## target:     ACATCCACTTATTCAA
## conjugate5: [5gn2c6]
## conjugate3: 
## secondary_structure: ................, 0
## duplex_structure: ................&................, 0

Example 4 - create XNAStringSet from a vector of HELM strings

helm <- c("CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0",     
          "RNA1{[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$$$$V2.0")

XNAStringFromHelm(helm, name =c('oligo1', 'oligo2'))
## XNAStringSet object
##      name         base        sugar     backbone           target conjugate5
##    <list>       <list>       <list>       <list>           <char>     <list>
## 1: oligo1 GAGAAGGC.... LLLLDDDD.... XXXXXXXX.... CCGTCTGTGCCTTCTC   [5gn2c6]
## 2: oligo2 TGTGTGTG.... LLDDLLDD.... XXXXXXXX....  ACACACACACACACA           
##    conjugate3 secondary_structure
##        <list>              <list>
## 1:                   ............
## 2:                   ............

XNAString object to HELM notation

The methods in this section translate the multistring notation of an XNAString/XNAStringSet object to HELM notation. If the molecule is double stranded, pairing information is added as well.

Example 1 - single stranded molecule

obj <- XNAString(base = 'GAGTTACTTGCCAAET',
                 sugar = 'LLLDMDDDDDDDDLLL',
                 backbone = 'XXXXXXXXXXXXXX2')
  
XNAStringToHelm(obj)
## [1] "RNA1{[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[dR](T)[sP].[MOE](T)[sP].[dR](A)[sP].[dR](C)[sP].[dR](T)[sP].[dR](T)[sP].[dR](G)[sP].[dR](C)[sP].[dR](C)[sP].[dR](A)[sP].[LR](A)[sP].[LR]([5meC])[PS2].[LR](T)}$$$$V2.0"
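
The translation in Example 1 can be reproduced by hand with a small base R sketch. This is not the package implementation: sketch_to_helm is a hypothetical helper, it only handles single stranded molecules without conjugates, and its lookup vectors are copied from the default dictionary printed by concatDict.

```r
# Symbol -> HELM lookups, copied from the default dictionary (assumption:
# no custom dictionary is in use)
base_map     <- c(A = "(A)", C = "(C)", G = "(G)", T = "(T)",
                  U = "(U)", E = "([5meC])")
sugar_map    <- c(L = "[LR]", M = "[MOE]", R = "[R]", D = "[dR]",
                  O = "[mR]", F = "[fR]")
backbone_map <- c(O = "P", "2" = "[PS2]", A = "[Ppace]", X = "[sP]",
                  B = "[sPpace]", R = "[srP]", S = "[ssP]")

# Hypothetical helper: single stranded multistring -> HELM string
sketch_to_helm <- function(base, sugar, backbone) {
  b  <- strsplit(base, "")[[1]]
  s  <- strsplit(sugar, "")[[1]]
  bb <- strsplit(backbone, "")[[1]]
  # backbone is one symbol shorter: the last nucleotide has no linkage
  stopifnot(length(b) == length(s), length(bb) == length(b) - 1)
  linkers  <- c(unname(backbone_map[bb]), "")
  monomers <- paste0(sugar_map[s], base_map[b], linkers)
  paste0("RNA1{", paste(monomers, collapse = "."), "}$$$$V2.0")
}

helm_sketch <- sketch_to_helm("GAGTTACTTGCCAAET",
                              "LLLDMDDDDDDDDLLL",
                              "XXXXXXXXXXXXXX2")
helm_sketch
```

Each nucleotide becomes sugar, base and linkage in that order, and nucleotides are joined with a dot.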

Example 2 - double stranded molecule, base is DNAStringSet

obj <- XNAString(
    base = Biostrings::DNAStringSet(c("CCCC", "GGGG")),
    sugar = c("OOFO", "FFOF"),
    backbone = c("OOO", "OOO"),
    target = '',
    conjugate3 = "",
    conjugate5 = "")
  
XNAStringToHelm(obj)
## Warning in .call_fun_in_pwalign("pattern", ...): pattern() has moved from Biostrings to the pwalign package, and is formally
##   deprecated in Biostrings >= 2.75.1. Please call pwalign::pattern() to get rid
##   of this warning.
## Warning in .call_fun_in_pwalign("pattern", ...): pattern() has moved from Biostrings to the pwalign package, and is formally
##   deprecated in Biostrings >= 2.75.1. Please call pwalign::pattern() to get rid
##   of this warning.
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)}|RNA2{[fR](G)P.[fR](G)P.[mR](G)P.[fR](G)}$RNA1,RNA2,2:pair-11:pair|RNA1,RNA2,5:pair-8:pair|RNA1,RNA2,8:pair-5:pair|RNA1,RNA2,11:pair-2:pair$$$$V2.0"
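
The pairing section of this HELM string follows a regular pattern. Every nucleotide contributes three monomers (sugar, base, phosphate), so the base of nucleotide i sits at monomer position 3i - 1, and nucleotide i of the first strand pairs with nucleotide n - i + 1 of the second. The base R sketch below rebuilds the pairing section under the assumption of two fully complementary strands of equal length n. pair_section is a hypothetical helper, and the package may determine the pairs differently.

```r
# Hypothetical helper: pairing section for two fully complementary
# strands of equal length n (assumption, see text above)
pair_section <- function(n) {
  i <- seq_len(n)
  pos1 <- 3 * i - 1            # base monomer of nucleotide i in RNA1
  pos2 <- 3 * (n - i + 1) - 1  # base monomer of its partner in RNA2
  paste0("RNA1,RNA2,", pos1, ":pair-", pos2, ":pair", collapse = "|")
}

pairs_4mer <- pair_section(4)
pairs_4mer
```

For n = 4 this gives the positions 2-11, 5-8, 8-5 and 11-2 seen in the output above.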

Example 3 - double stranded molecule - pairing information added

obj <- XNAString(
    base = c("CCCCEGC", "UUAUGAT"),
    sugar = c("OOFOFOF", "FFOFOFO"),
    backbone = c("OOOOOO", "OOOOOO"),
    target = '',
    conjugate3 = "[5gn2c6]",
    conjugate5 = "")
  
XNAStringToHelm(obj)
## Warning in .call_fun_in_pwalign("pattern", ...): pattern() has moved from Biostrings to the pwalign package, and is formally
##   deprecated in Biostrings >= 2.75.1. Please call pwalign::pattern() to get rid
##   of this warning.
## Warning in .call_fun_in_pwalign("pattern", ...): pattern() has moved from Biostrings to the pwalign package, and is formally
##   deprecated in Biostrings >= 2.75.1. Please call pwalign::pattern() to get rid
##   of this warning.
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR]([5meC])P.[mR](G)P.[fR](C)}|CHEM1{[5gn2c6]}|RNA2{[fR](U)P.[fR](U)P.[mR](A)P.[fR](U)P.[mR](G)P.[fR](A)P.[mR](T)}$RNA1,RNA2,2:pair-20:pair|RNA1,RNA2,5:pair-17:pair|RNA1,RNA2,8:pair-14:pair|RNA1,RNA2,11:pair-11:pair|RNA1,RNA2,14:pair-8:pair|RNA1,RNA2,17:pair-5:pair|RNA1,RNA2,20:pair-2:pair$$$$V2.0"

Alignment and matching functions

The methods in this section are inherited from the Biostrings package. All options of the Biostrings methods:

  • pairwiseAlignment

  • matchPattern

  • vmatchPattern

  • matchPDict

are available in the XNAString package as well. XNAString renames these methods by adding the XNA prefix at the beginning (for example, matchPattern becomes XNAMatchPattern).

XNAPairwiseAlignment for target sequence

Target sequence is used as pattern.

Example: global alignment

obj <- XNAString(base = 'ATCGATATATATACACATGTATGATG',
                 sugar = 'OOOODDDDDDDDDDDDDDDDDDDDDD',
                 target = DNAStringSet(c('TAGCTATATATATGTGTACATACTAC', 'TAGCTAGATATATGTGTACATACTAC')))

subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'
substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()
## Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved from Biostrings to the pwalign
##   package, and is formally deprecated in Biostrings >= 2.75.1. Please call
##   pwalign::nucleotideSubstitutionMatrix() to get rid of this warning.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)
## Global PairwiseAlignmentsSingleSubject (1 of 2)
## pattern: --------------------------TAGCTATA...CATACTAC--------------------------
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## score: -202
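
The score of -202 can be checked by hand. The sketch below assumes the defaults of nucleotideSubstitutionMatrix (match = 1, mismatch = 0) and of pairwiseAlignment (gapOpening = 10, gapExtension = 4), with each gap costing gapOpening + gapExtension * length. The variable names are illustrative only.

```r
aln_pattern <- "TAGCTATATATATGTGTACATACTAC"   # first target of obj
aln_subject <- paste0("ATCGATATATATACACATGTATGATG",
                      "TAGCTATATATATGTGTACATACTAC",
                      "ATCGATATATATACACATGTATGATG")

# Where the pattern sits inside the subject (exact match)
aln_start <- as.integer(regexpr(aln_pattern, aln_subject, fixed = TRUE))
left_gap  <- aln_start - 1
right_gap <- nchar(aln_subject) - (aln_start + nchar(aln_pattern) - 1)

# Assumed default penalties: opening 10, extension 4 per gapped position
gap_cost <- function(len) 10 + 4 * len

# 26 matches at +1 each, minus the two flanking gaps in the pattern
global_score <- nchar(aln_pattern) * 1 - gap_cost(left_gap) - gap_cost(right_gap)
global_score
```

A global alignment must span the whole subject, so the two 26-letter flanks are charged as gaps. This is why the score is strongly negative even though the target matches perfectly.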

Example: local alignment

substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()
## Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved from Biostrings to the pwalign
##   package, and is formally deprecated in Biostrings >= 2.75.1. Please call
##   pwalign::nucleotideSubstitutionMatrix() to get rid of this warning.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, type = "local", substitutionMatrix = substitutionMatrix)
## Local PairwiseAlignmentsSingleSubject (1 of 2)
## pattern:  [1] TAGCTATATATATGTGTACATACTAC
## subject: [27] TAGCTATATATATGTGTACATACTAC
## score: 26

XNAMatchPattern for target sequence

Target sequence is used as pattern. If more than one target is present in the XNAString object, the first one is used by default. The user can specify which target sequence should be taken as the pattern (target.number parameter).

Example: default matching of first target

XNAString::XNAMatchPattern(pattern = obj, subject = subject)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
##       start end width
##   [1]    27  52    26 [TAGCTATATATATGTGTACATACTAC]

Example: match target selected by user

XNAString::XNAMatchPattern(pattern = obj, subject = subject, target.number = 2)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views: NONE

Example: match pattern selected by user with 1 mismatch

XNAString::XNAMatchPattern(pattern = obj, subject = subject, target.number = 2, max.mismatch = 1)
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
##       start end width
##   [1]    27  52    26 [TAGCTATATATATGTGTACATACTAC]
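
The effect of max.mismatch can be illustrated with a plain sliding-window comparison. The base R sketch below is not how Biostrings implements matching. hamming_starts is a hypothetical helper that counts mismatching positions in every window of the subject and keeps the windows at or below the threshold.

```r
# Hypothetical helper: start positions where pattern matches subject
# with at most max_mismatch substitutions (no insertions or deletions)
hamming_starts <- function(pattern, subject, max_mismatch = 0) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  starts <- seq_len(length(s) - length(p) + 1)
  mismatches <- vapply(
    starts,
    function(i) sum(s[i:(i + length(p) - 1)] != p),
    integer(1)
  )
  starts[mismatches <= max_mismatch]
}

second_target <- "TAGCTAGATATATGTGTACATACTAC"
match_subject <- paste0("ATCGATATATATACACATGTATGATG",
                        "TAGCTATATATATGTGTACATACTAC",
                        "ATCGATATATATACACATGTATGATG")

exact_hits   <- hamming_starts(second_target, match_subject, 0)
relaxed_hits <- hamming_starts(second_target, match_subject, 1)
```

The second target differs from the subject window at a single position (G instead of T at position 7 of the target), so it is found only once one mismatch is allowed.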

XNAVmatchPattern for multiple subjects

subject <- c('ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG', 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTGCGACTACATCGATATATATACACATGTATGATG')

XNAString::XNAVmatchPattern(pattern = obj, subject)
## MIndex object of length 2
## [[1]]
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        27        52        26
## 
## [[2]]
## IRanges object with 0 ranges and 0 metadata columns:
##        start       end     width
##    <integer> <integer> <integer>

XNAMatchPDict for multiple targets

Only one subject is allowed. Results are returned for all targets.

subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'

XNAString::XNAMatchPDict(pdict = obj, subject = subject)
## MIndex object of length 2
## [[1]]
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        27        52        26
## 
## [[2]]
## IRanges object with 0 ranges and 0 metadata columns:
##        start       end     width
##    <integer> <integer> <integer>

AlphabetFrequency for nucleotides

Letters are tabulated and their occurrence frequency is calculated for nucleotides. There are 6 arguments:

  • obj (either XNAString or XNAStringSet object)

  • slot (‘base’, ‘sugar’ or ‘backbone’)

  • letters (frequency is checked just for these letters. If empty, letters from the object’s dictionary are taken as the default.)

  • matrix_nbr (1 is the default. If 1, the first element of the slot is used; if 2, the second element)

  • as.prob (default FALSE)

  • base_only (default FALSE. If TRUE, ‘A’, ‘C’, ‘G’, ‘T’, ‘other’ are tabulated)

Example 1: XNAString object

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('AEEE'),
                           sugar = c('FFOO'),
                           target = DNAStringSet('TTT'))

XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base')
##      A    C    E    G    T    U   
## [1,] 1.00 0.00 3.00 0.00 0.00 0.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', as.prob = TRUE)
##      A    C    E    G    T    U   
## [1,] 0.25 0.00 0.75 0.00 0.00 0.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', base_only = TRUE)
##     A     C     G     T other 
##  1.00  0.00  0.00  0.00  3.00
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base', letters = c('A', 'C'))
##      A    C   
## [1,] 1.00 0.00
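
The counts above can be reproduced with a few lines of base R. This is only a sketch of the tabulation idea, not the package code. letter_counts is a hypothetical helper, and the alphabet is taken from the base symbols of the default dictionary.

```r
# Hypothetical helper: count each letter of a fixed alphabet in one string
letter_counts <- function(x, letters, as_prob = FALSE) {
  chars  <- strsplit(x, "")[[1]]
  # factor() with explicit levels keeps zero counts for absent letters
  counts <- table(factor(chars, levels = letters))
  counts <- setNames(as.numeric(counts), letters)
  if (as_prob) counts / sum(counts) else counts
}

base_alphabet <- c("A", "C", "E", "G", "T", "U")
freq_counts <- letter_counts("AEEE", base_alphabet)
freq_probs  <- letter_counts("AEEE", base_alphabet, as_prob = TRUE)
```

With as_prob = TRUE the counts are divided by their sum, which gives 0.25 for A and 0.75 for E.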

Example 2: XNAString object, double base and sugar

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('AAEC', 'ECTA'),
                           sugar = c('FFOO', 'DDLM'))

XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'sugar', matrix_nbr =  2)
##      D    F    L    M    O    R   
## [1,] 2.00 0.00 1.00 1.00 0.00 0.00

Example 3: XNAStringSet object, single base, sugar and backbone

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'base')
##      A    C    E    G    T    U   
## [1,] 1.00 1.00 0.00 1.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00

Example 4: XNAStringSet object, double base, sugar and backbone

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OFOO', 'ODDF'),
                 backbone = c('SOS','SOS'))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
                 sugar = c('OOOO', 'OOFO'))

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
##      D    F    L    M    O    R   
## [1,] 2.00 1.00 0.00 0.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00

AlphabetFrequency for dinucleotides

Letters are tabulated and their occurrence frequency is calculated for dinucleotides. There are 6 arguments:

  • obj (either XNAString or XNAStringSet object)

  • slot (‘base’, ‘sugar’ or ‘backbone’)

  • double_letters (frequency is checked just for these double letters. If empty, all possible double letters from the object’s dictionary are taken as the default.)

  • matrix_nbr (1 is the default. If 1, the first element of the slot is used; if 2, the second element)

  • as.prob (default FALSE)

  • base_only (default FALSE. If TRUE, all possible double letters composed of: ‘A’, ‘C’, ‘G’, ‘T’ are tabulated)

Example 1: XNAString object

xnastring_obj <- XNAString(name = 'oligo1',
                           base = c('GCGC'),
                           sugar = c('FODL'),
                           target = DNAStringSet('TTTT'))

XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base')
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', as.prob = TRUE)
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', base_only = TRUE)
##    AA    CA    GA    TA    AC    CC    GC    TC    AG    CG    GG    TG    AT 
##  0.00  0.00  0.00  0.00  0.00  0.00  2.00  0.00  0.00  1.00  0.00  0.00  0.00 
##    CT    GT    TT other 
##  0.00  0.00  0.00  0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', double_letters = c('GC', 'CU'))
##      GC   CU  
## [1,] 2.00 0.00
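
Dinucleotides are counted over overlapping windows of width two, so a sequence of length n contributes n - 1 dinucleotides. The base R sketch below illustrates this and the column order of the output. dinuc_counts is a hypothetical helper, not the package implementation.

```r
# Hypothetical helper: count overlapping two-letter windows in one string
dinuc_counts <- function(x, double_letters) {
  n <- nchar(x)
  windows <- substring(x, 1:(n - 1), 2:n)   # "GCGC" -> "GC", "CG", "GC"
  counts  <- table(factor(windows, levels = double_letters))
  setNames(as.numeric(counts), double_letters)
}

# All double letters of an alphabet, first letter varying fastest,
# which reproduces the column order AA, CA, EA, ... shown above
base_symbols <- c("A", "C", "E", "G", "T", "U")
all_pairs <- as.vector(outer(base_symbols, base_symbols, paste0))

selected_counts <- dinuc_counts("GCGC", c("GC", "CU"))
full_counts     <- dinuc_counts("GCGC", all_pairs)
```

For 'GCGC' the three windows are GC, CG and GC, hence the counts 2 and 1 and the probabilities 0.67 and 0.33.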

Example 2: XNAString object with custom dictionary

my_dict <- data.table::data.table(type = c(rep('base', 3), rep('sugar', 2), rep('backbone', 2)),
                                  symbol = c('G', 'E', 'A', 'F', 'O', 'S', 'B'))

xnastring_obj <- XNAString(
  base = c('AGGE', 'EEEA'),
  sugar = c('FFFO', 'OOOO'),
  backbone = c('SBS', 'SBS'),
  dictionary = my_dict
)



XNAString::XNADinucleotideFrequency(obj = xnastring_obj,
                                    slot = 'sugar',
                                    matrix_nbr =  2)
##      FF   OF   FO   OO  
## [1,] 0.00 0.00 0.00 3.00

Example 3: XNAStringSet object, single base, sugar and backbone

XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'base')
##      AA   CA   EA   GA   TA   UA   AC   CC   EC   GC   TC   UC   AE   CE   EE  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
##      GE   TE   UE   AG   CG   EG   GG   TG   UG   AT   CT   ET   GT   TT   UT  
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
##      AU   CU   EU   GU   TU   UU  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00

Example 4: XNAStringSet object, double base, sugar and backbone

XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
                 sugar = c('OFOO', 'ODDF'),
                 backbone = c('SOS','SOS'))

XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
                 sugar = c('OOOO', 'OOFO'))

XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))

XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
##      DD   FD   LD   MD   OD   RD   DF   FF   LF   MF   OF   RF   DL   FL   LL  
## [1,] 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
##      ML   OL   RL   DM   FM   LM   MM   OM   RM   DO   FO   LO   MO   OO   RO  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
##      DR   FR   LR   MR   OR   RR  
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00

Other functions

Example 1: mimir2XnaDict

dt <- data.table::data.table(HELM = c("([PPG])", "[fR]", "[srP]"),
                  TS_BASE_SEQ = c("F", NA, NA),
                  TS_SUGAR_SEQ = c(NA, NA, 'F'),
                  TS_BACKBONE_SEQ = c(NA, 'S', NA))
dt
##       HELM TS_BASE_SEQ TS_SUGAR_SEQ TS_BACKBONE_SEQ
##     <char>      <char>       <char>          <char>
## 1: ([PPG])           F         <NA>            <NA>
## 2:    [fR]        <NA>         <NA>               S
## 3:   [srP]        <NA>            F            <NA>
mimir2XnaDict(dt, 'TS_BASE_SEQ', 'TS_SUGAR_SEQ', 'TS_BACKBONE_SEQ')
##       HELM     type symbol
##     <char>   <char> <char>
## 1: ([PPG])     base      F
## 2:   [srP]    sugar      F
## 3:    [fR] backbone      S
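
The reshaping performed here is a wide-to-long conversion: each of the three symbol columns becomes a value of type, and rows with NA in that column are dropped. A base R sketch of the same idea follows. It uses a plain data.frame and illustrative names, and is not the package implementation.

```r
wide_dict <- data.frame(
  HELM            = c("([PPG])", "[fR]", "[srP]"),
  TS_BASE_SEQ     = c("F", NA, NA),
  TS_SUGAR_SEQ    = c(NA, NA, "F"),
  TS_BACKBONE_SEQ = c(NA, "S", NA),
  stringsAsFactors = FALSE
)

# Which column feeds which dictionary type
type_cols <- c(base = "TS_BASE_SEQ", sugar = "TS_SUGAR_SEQ",
               backbone = "TS_BACKBONE_SEQ")

# One block of rows per type, keeping only non-missing symbols
long_dict <- do.call(rbind, lapply(names(type_cols), function(type) {
  symbols <- wide_dict[[type_cols[[type]]]]
  keep <- !is.na(symbols)
  data.frame(HELM = wide_dict$HELM[keep],
             type = rep(type, sum(keep)),
             symbol = symbols[keep],
             stringsAsFactors = FALSE)
}))
long_dict
```

The rows come out grouped by type (base, sugar, backbone), which matches the order of the output above.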

Example 2: concatDict

my_dict <- data.table::data.table(HELM = c('[[B]]'),
                                   type = c('base'),
                                   symbol = c('B'))
concatDict(my_dict)
##         HELM     type symbol
##       <char>   <char> <char>
##  1:    [[B]]     base      B
##  2:        P backbone      O
##  3:    [PS2] backbone      2
##  4:  [Ppace] backbone      A
##  5:     [sP] backbone      X
##  6: [sPpace] backbone      B
##  7:    [srP] backbone      R
##  8:    [ssP] backbone      S
##  9:      (A)     base      A
## 10:      (C)     base      C
## 11:      (G)     base      G
## 12:      (T)     base      T
## 13: ([5meC])     base      E
## 14:      (U)     base      U
## 15:     [LR]    sugar      L
## 16:    [MOE]    sugar      M
## 17:      [R]    sugar      R
## 18:     [dR]    sugar      D
## 19:     [mR]    sugar      O
## 20:     [fR]    sugar      F
##         HELM     type symbol

Example 3: concatDict with duplicates

my_dict <- data.table::data.table(HELM = c('[[U]]'),
                                   type = c('base'),
                                   symbol = c('U'))
concatDict(my_dict)
## Error in concatDict(my_dict): There is at least one duplicated symbol for the same type.
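
The error arises because the symbol U is already defined for type base in the default dictionary, whereas B in Example 2 was accepted: B exists only as a backbone symbol there. A base R sketch of such a check is shown below. has_duplicate_symbol is a hypothetical helper, and the default dictionary is retyped from the concatDict output above.

```r
# type/symbol pairs of the default dictionary, as printed by concatDict
default_pairs <- data.frame(
  type   = c(rep("backbone", 7), rep("base", 6), rep("sugar", 6)),
  symbol = c("O", "2", "A", "X", "B", "R", "S",
             "A", "C", "G", "T", "E", "U",
             "L", "M", "R", "D", "O", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if a symbol would occur twice within one type
has_duplicate_symbol <- function(custom) {
  combined <- rbind(custom[, c("type", "symbol")], default_pairs)
  anyDuplicated(combined) > 0
}

dup_b <- has_duplicate_symbol(data.frame(type = "base", symbol = "B",
                                         stringsAsFactors = FALSE))
dup_u <- has_duplicate_symbol(data.frame(type = "base", symbol = "U",
                                         stringsAsFactors = FALSE))
```

Uniqueness is checked on the combination of type and symbol, so the same letter may be reused across base, sugar and backbone.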

Exemplary workflows

Transform data saved in HELM notation to an XNAString object. Then, for a given subject, find the global alignment and match the target slot (used as the pattern) to the subject.


helm <-
    "CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"

xna_obj <- XNAStringFromHelm(helm, name ='oligo1')
xna_obj
#> XNAString object
#> name:       oligo1
#> base:       GAGAAGGCACAGAEGG
#> sugar:      LLLLDDDDDDDDDLLL
#> backbone:   XXXXXXXXXXXXXXX
#> target:     CCGTCTGTGCCTTCTC
#> conjugate5: [5gn2c6]
#> conjugate3: 
#> secondary_structure: ................, 0
#> duplex_structure: ......((........&......))........, -2.1

subject <- 'ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG'
substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()
#> Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved from Biostrings to the pwalign
#>   package, and is formally deprecated in Biostrings >= 2.75.1. Please call
#>   pwalign::nucleotideSubstitutionMatrix() to get rid of this warning.
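# Note: `obj` is the two-target object created in the alignment section above, not `xna_obj`;
# the output below therefore shows the first target of `obj`.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)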
#> Global PairwiseAlignmentsSingleSubject (1 of 2)
#> pattern: TAG----------------CTATATATATGTGTACATACTAC
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> score: -68

XNAString::XNAMatchPattern(pattern = xna_obj, subject = subject)
#> Views on a 42-letter BString subject
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> views:
#>       start end width
#>   [1]    16  31    16 [CCGTCTGTGCCTTCTC]

Create an XNAStringSet object by passing a list of bases. Then check the alphabet frequency and dinucleotide frequency for this object.



base <- list(c('ATCGATAT', 'ATCGATAT'),  c('TGGGGGTGC', 'ATCGGGAT'), c('CCCTAGTA'))
                            
set_obj <- XNAStringSet(base = base)
set_obj
#> XNAStringSet object
#>      name         base        sugar     backbone    target conjugate5
#>    <list>       <list>       <list>       <list>    <char>     <list>
#> 1:     NA ATCGATAT.... DDDDDDDD.... OOOOOOO,....  ATATCGAT         NA
#> 2:     NA TGGGGGTG.... DDDDDDDD.... OOOOOOOO.... GCACCCCCA         NA
#> 3:     NA     CCCTAGTA     DDDDDDDD      OOOOOOO  TACTAGGG         NA
#>    conjugate3 secondary_structure
#>        <list>              <list>
#> 1:         NA                    
#> 2:         NA                    
#> 3:         NA         ........, 0

XNAAlphabetFrequency(obj = set_obj, slot = 'base')
#>      A    C    E    G    T    U   
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 0.00 1.00 0.00 6.00 2.00 0.00
#> [3,] 2.00 3.00 0.00 1.00 2.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', matrix_nbr = 2)
#>      A    C    E    G    T    U   
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 2.00 1.00 0.00 3.00 2.00 0.00
#> [3,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', as.prob = TRUE)
#>      A    C    E    G    T    U   
#> [1,] 0.38 0.12 0.00 0.12 0.38 0.00
#> [2,] 0.00 0.11 0.00 0.67 0.22 0.00
#> [3,] 0.25 0.38 0.00 0.12 0.25 0.00

XNADinucleotideFrequency(obj = set_obj, slot = 'base', double_letters = c('AT', 'GA', 'GT'))
#>      AT   GA   GT  
#> [1,] 3.00 1.00 0.00
#> [2,] 0.00 0.00 1.00
#> [3,] 0.00 0.00 1.00
XNADinucleotideFrequency(obj = set_obj, slot = 'base', base_only = TRUE)
#>      AA   CA   GA   TA   AC   CC   GC   TC   AG   CG   GG   TG   AT   CT   GT  
#> [1,] 0.00 0.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 3.00 0.00 0.00
#> [2,] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 4.00 2.00 0.00 0.00 1.00
#> [3,] 0.00 0.00 0.00 2.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 1.00
#>      TT   other
#> [1,] 0.00 0.00 
#> [2,] 0.00 0.00 
#> [3,] 0.00 0.00

Create XNAStringSet object with allowed GU wobbles in target

# create a custom complementary dictionary with complementary bases coded as IUPAC symbols
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence 
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence 

compl_dict
#>      base target compl_target
#>    <char> <char>       <char>
#> 1:      A      T            A
#> 2:      C      G            C
#> 3:      G      Y            G
#> 4:      T      R            T
#> 5:      E      G            C
#> 6:      U      R            T

xna <- XNAString::XNAStringSet(base = c("ACGTACG", "CCCGTAC", "AATACTT"), compl_dict = compl_dict)
xna
#> XNAStringSet object
#>      name    base   sugar backbone       target conjugate5 conjugate3
#>    <list>  <list>  <list>   <list>       <list>     <list>     <list>
#> 1:     NA ACGTACG DDDDDDD   OOOOOO CGTACGT,....         NA         NA
#> 2:     NA CCCGTAC DDDDDDD   OOOOOO GTACGGG,....         NA         NA
#> 3:     NA AATACTT DDDDDDD   OOOOOO AAGTATT,....         NA         NA
#>    secondary_structure
#>                 <list>
#> 1:          ......., 0
#> 2:          ......., 0
#> 3:          ......., 0
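
The first target printed for each oligo is consistent with the following reading: the base is complemented with the custom dictionary, reversed, and each IUPAC ambiguity code (Y = C or T, R = A or G) is expanded into all concrete sequences. The base R sketch below illustrates that reading. It is an assumption about the behaviour, not the package code, and wobble_targets is a hypothetical helper.

```r
# Custom complement table from compl_dict above (base -> target)
wobble_compl <- c(A = "T", C = "G", G = "Y", T = "R", E = "G", U = "R")

# Concrete letters behind each code used in the target column
iupac_expand <- list(T = "T", G = "G", Y = c("C", "T"), R = c("A", "G"))

# Hypothetical helper: all concrete targets for one base sequence
wobble_targets <- function(base) {
  # complement each letter, then reverse to get the target orientation
  codes <- rev(unname(wobble_compl[strsplit(base, "")[[1]]]))
  # one column per position, one row per combination of choices
  grid <- expand.grid(unname(iupac_expand[codes]), stringsAsFactors = FALSE)
  unname(apply(grid, 1, paste, collapse = ""))
}

targets_1 <- wobble_targets("ACGTACG")
targets_3 <- wobble_targets("AATACTT")
```

For 'ACGTACG' the ambiguous target is YGTRYGT, which expands to eight sequences. The first of them, CGTACGT, is the ordinary reverse complement shown in the table.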

Session info

Here is the output of sessionInfo() on the system on which this document was compiled running pandoc 3.2.1:

#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] Biostrings_2.75.3   GenomeInfoDb_1.43.2 XVector_0.47.1     
#>  [4] IRanges_2.41.2      S4Vectors_0.45.2    BiocGenerics_0.53.3
#>  [7] generics_0.1.3      XNAString_1.15.0    pander_0.6.5       
#> [10] knitr_1.49          BiocStyle_2.35.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.37.0 rjson_0.2.23               
#>  [3] xfun_0.49                   bslib_0.8.0                
#>  [5] htmlwidgets_1.6.4           Biobase_2.67.0             
#>  [7] lattice_0.22-6              vctrs_0.6.5                
#>  [9] tools_4.4.2                 bitops_1.0-9               
#> [11] curl_6.0.1                  parallel_4.4.2             
#> [13] Matrix_1.7-1                data.table_1.16.4          
#> [15] BSgenome_1.75.0             lifecycle_1.0.4            
#> [17] GenomeInfoDbData_1.2.13     stringr_1.5.1              
#> [19] compiler_4.4.2              Rsamtools_2.23.1           
#> [21] codetools_0.2-20            htmltools_0.5.8.1          
#> [23] sys_3.4.3                   buildtools_1.0.0           
#> [25] sass_0.4.9                  RCurl_1.98-1.16            
#> [27] yaml_2.3.10                 crayon_1.5.3               
#> [29] jquerylib_0.1.4             BiocParallel_1.41.0        
#> [31] cachem_1.1.0                DelayedArray_0.33.3        
#> [33] abind_1.4-8                 parallelly_1.41.0          
#> [35] digest_0.6.37               stringi_1.8.4              
#> [37] formattable_0.2.1           future_1.34.0              
#> [39] listenv_0.9.1               restfulr_0.0.15            
#> [41] maketools_1.3.1             fastmap_1.2.0              
#> [43] grid_4.4.2                  cli_3.6.3                  
#> [45] SparseArray_1.7.2           magrittr_2.0.3             
#> [47] S4Arrays_1.7.1              XML_3.99-0.17              
#> [49] future.apply_1.11.3         UCSC.utils_1.3.0           
#> [51] pwalign_1.3.1               rmarkdown_2.29             
#> [53] httr_1.4.7                  globals_0.16.3             
#> [55] matrixStats_1.4.1           evaluate_1.0.1             
#> [57] GenomicRanges_1.59.1        BiocIO_1.17.1              
#> [59] rtracklayer_1.67.0          rlang_1.1.4                
#> [61] Rcpp_1.0.13-1               glue_1.8.0                 
#> [63] BiocManager_1.30.25         jsonlite_1.8.9             
#> [65] R6_2.5.1                    MatrixGenerics_1.19.0      
#> [67] GenomicAlignments_1.43.0    zlibbioc_1.52.0