Development of XNAString package aims at enabling efficient manipulation of modified oligonucleotide sequences. The package inherits some of the functionalities from Biostrings package. In contrary to Biostrings sequences, XNAString classes allow for description of base sequence, sugar and backbone in a single object. XNAString is able to capture single stranded oligonucleotides, siRNAs, PNAs, shRNAs, gRNAs and synthetic mRNAs, and enable users to apply sequence-manipulating Bioconductor packages to their analysis. XNAString can read and write a HELM notation, compute alphabet frequency, align and match targets.
All exported methods are listed in this section. They are divided into four tables:
XNAString methods
XNAStringSet methods
Both XNAString and XNAStringSet methods
Other functions
function. | description |
---|---|
XNAString | Create XNAString object by passing at least base (sugar, name, backbone, target, conjugates and dictionary are optional). |
XNAReverseComplement | Take XNAString object and return reverse complement of base slot. |
predictMfeStructure | Take XNAString object and apply RNAfold_MFE function from ViennaRNA package on base slot (if single stranded molecule). |
predictDuplexStructure | Take XNAString object and apply RNAcofold_MFE function from ViennaRNA package on base slot (if double stranded molecule) or duplicate base and apply RNAcofold_MFE function from ViennaRNA (if single stranded molecule) |
XNAStringToHelm | Change XNAString/XNAStringSet object to helm notation. |
function. | description |
---|---|
XNAStringSet | Create XNAStringSet object by passing XNAString objects list. |
set2List | Change XNAStringSet object to list of XNAString objects. |
set2Dt | Change XNAStringSet object to data.table. |
dt2Set | Change data.table (or data.frame) to XNAStringSet object |
[ | Extract single/multiple XNAString objects (XNAStringSet object returned) by passing index/indexes number or name/names. |
[[ | Extract single XNAString object (XNAString object returned) by passing index number or name. |
function. | description |
---|---|
name / name<- | Extract / overwrite name slot. |
base / base<- | Extract / overwrite base slot. |
sugar / sugar<- | Extract / overwrite sugar slot. |
backbone / backbone<- | Extract / overwrite backbone slot. |
target / target<- | Extract / overwrite target slot. |
conjugate5 / conjugate5<- | Extract / overwrite conjugate5 slot. |
conjugate3 / conjugate3<- | Extract / overwrite conjugate3 slot. |
secondary_structure / secondary_structure<- | Extract / overwrite secondary_structure slot. |
duplex_structure / duplex_structure<- | Extract / overwrite duplex_structure slot. |
dictionary / dictionary<- | Extract / overwrite dictionary slot. |
compl_dictionary / compl_dictionary<- | Extract / overwrite compl_dictionary slot. |
XNAStringFromHelm | Change helm notation to XNAString/XNAStringSet object. |
XNAPairwiseAlignment | Inherited from Biostrings package. Solve global/local/ends-free alignment problems. |
XNAMatchPattern | Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a reference sequence (typically long). Support mismatches and indels. |
XNAVmatchPattern | Inherited from Biostrings package. Find/count all the occurrences of a given pattern (typically short) in a set of reference sequences. Support mismatches and indels. |
XNAMatchPDict | Inherited from Biostrings package. Find/count all the occurrences of a set of patterns in a reference sequence. Support a small number of mismatches. |
XNAAlphabetFrequency | Tabulate the letters and count frequency for nucleotides. |
XNADinucleotideFrequency | Tabulate the letters and count frequency for dinucleotides. |
mimir2XnaDict | Reformat mimir table to XNA dictionary standards |
function. | description |
---|---|
concatDict | Concatanate custom HELM-symbol dictionary with built-in HELM-symbol xna_dictionary. |
mimir2XnaDict | Rewrite dictionary table to standard format. |
XNAString class is subclass of Biostrings::BString and has 13 slots:
name (character)
base (character, RNAString, RNAStringSet, DNAString or DNAStringSet)
sugar (character)
backbone (character, if missing and base is character default string is created by replicating character ‘X’, if missing and base is DNAString/RNAString, backbone all ‘O’)
target (DNAStringSet, DNAString or character)
conjugate5 (character)
conjugate3 (character)
secondary_structure (list - structure is character and mfe numeric)
duplex_structure (list - structure is character and mfe numeric)
dictionary (xna_dictionary default, data.table type)
compl_dictionary (complementary_bases default, data.table type)
default_sugar (character)
default_backbone (character)
Target, name and conjugate slots can be NA. If backbone or dictionaries missing, default values in use.
Validation procedure requirements for XNAString objects:
slots type must reflect above requirements
length of sugar equals base
length of backbone is one element shorter than sugar and base
condition on available letters in base / sugar / backbone dictionary must be satisfied
length of base, sugar and backbone vectors is the same and is equal 1 or 2, length of target vector >0, length of name and conjugates vectors equal 1
length of default_sugar and default_backbone is 1, nchar is also 1 if not NA
Object can be created only when all validation procedure requirements are met.
If base slot is passed as character and sugar slot passed as well, default backbone is a replication of character ‘X’. Target / secondary_structure / duplex_structure slot default is created by XNAReverseComplement / predictMfeStructure / predictDuplexStructure method applied on base slot.
## XNAString object
## name: NA
## base: ATCG
## sugar: OOOO
## backbone: XXX
## target: CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
obj <- XNAString(base = 'ATCG',
sugar = 'OOOO',
backbone = 'SOS',
target = Biostrings::DNAStringSet('TAGC'),
conjugate3 = "[5gn2c6]",
name = "oligo1")
obj
## XNAString object
## name: oligo1
## base: ATCG
## sugar: OOOO
## backbone: SOS
## target: TAGC
## conjugate5: NA
## conjugate3: [5gn2c6]
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
obj <- XNAString(base = 'ATCG',
sugar = 'OOOO',
backbone = 'SOS',
target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name: NA
## base: ATCG
## sugar: OOOO
## backbone: SOS
## target: TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
obj <- XNAString(base = c('ATCG', 'TAGC'),
sugar = c('OOOO', 'OOOO'),
backbone = c('SOS','SOS'),
target = Biostrings::DNAStringSet(c('TAGC', 'TATC')))
obj
## XNAString object
## name: NA
## base: ATCG, TAGC
## sugar: OOOO, OOOO
## backbone: SOS, SOS
## target: TAGC, TATC
## conjugate5: NA
## conjugate3: NA
## secondary_structure:
## duplex_structure: ....&...., 0
XNAString with DNAString base should create default sugar all D, and backbone all O.
XNAString with RNAString base should create default sugar all R, and backbone all O.
## XNAString object
## name: NA
## base: ACGATCG
## sugar: DDDDDDD
## backbone: OOOOOO
## target: CGATCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: .((((((&.)))))), -7.6
## XNAString object
## name: NA
## base: ATCG
## sugar: DDDD
## backbone: OOO
## target: CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
## XNAString object
## name: NA
## base: ATCG
## sugar: FFFF
## backbone: XXX
## target: CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
## XNAString object
## name: NA
## base: ATCG
## sugar: DDDD
## backbone: OOO
## target: CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
## XNAString object
## name: NA
## base: ATCG
## sugar: RRRR
## backbone: OOO
## target: CGAT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
Public slots setter - once an object is created, the setter enables users to modify all slots.
New input has to satisfy validation procedures (e.g. the sugar must be of the same length as base)
## [1] "ATCG"
## [1] "CTGA"
XNAReverseComplement method takes XNAString object as an input and
finds reverse complement for base slot.
This method is used to create default target slot. Each object has
compl_dictionary slot (if not passed while creating an object -
complementary_bases table used as default). All bases from dictionary
must be present also in compl_dictionary. If that is not the case,
target slot default is empty. It is also possible to return multiple
targets with XNAReverseComlement method. It can be done by passing
custom complementary_bases dictionary and using Iupac symbols. E.g. by
adding symbol ‘R’ to custom complementary_bases dictionary, there are
two possibile bases: ‘G’ and ‘A’ (see the example below). More symbols
and corresponding bases (Iupac standards): * symbol “W” - bases “A”,
“T”
symbol “S” - bases “G”, “C”
symbol “M” - bases “A”, “C”
symbol “K” - bases “G”, “T”
symbol “R” - bases “A”, “G”
symbol “Y” - bases “C”, “T”
symbol “B” - bases “C”, “G”, “T”
symbol “D” - bases “A”, “G”, “T”
symbol “H” - bases “A”, “C”, “T”
symbol “V” - bases “A”, “C”, “G”
symbol “N” - bases “A”, “C”, “G”, “T”
## base target compl_target
## <char> <char> <char>
## 1: A T A
## 2: C G C
## 3: G C G
## 4: T A T
## 5: E G C
## 6: U A T
obj1 <- XNAString(base = "ACEGTTGGT",
sugar = 'FODDDDDDD',
conjugate3 = 'TAG')
XNAString::XNAReverseComplement(obj1)
## [1] "ACCAACGGT"
custom_compl_dict <-
rbind(
XNAString::complementary_bases[seq(1, 5),],
data.table::data.table(
base = 'U',
target = 'R',
compl_target = 'T'
)
)
obj2 <- XNAString(base = "ACGCUUA",
sugar = 'DDDDDDD',
compl_dictionary = custom_compl_dict)
obj2
## XNAString object
## name: NA
## base: ACGCUUA
## sugar: DDDDDDD
## backbone: XXXXXX
## target: TAAGCGT, TGAGCGT, TAGGCGT, TGGGCGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ......., 0
## duplex_structure: ..((...&..))..., -2.1
#create custom complementary dictonary with complementary bases coded as IUPAC
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence
compl_dict
## base target compl_target
## <char> <char> <char>
## 1: A T A
## 2: C G C
## 3: G Y G
## 4: T R T
## 5: E G C
## 6: U R T
xna <- XNAString::XNAString(base = "ACGTACGT", sugar = "DDDDDDDD",compl_dictionary = compl_dict)
xna
## XNAString object
## name: NA
## base: ACGTACGT
## sugar: DDDDDDDD
## backbone: XXXXXXX
## target: ACGTACGT, GCGTACGT, ATGTACGT, GTGTACGT, ACGTGCGT, GCGTGCGT, ATGTGCGT, GTGTGCGT, ACGTATGT, GCGTATGT, ATGTATGT, GTGTATGT, ACGTGTGT, GCGTGTGT, ATGTGTGT, GTGTGTGT
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ........, 0
## duplex_structure: ((((((((&)))))))), -9.8
predictMfeStructure - this method uses RNAfold_MFE function from C library ViennaRNA, this function can handle only standard bases. Takes XNAString object and applies RNAfold_MFE function on base slot if base is DNAString or RNAString. If base is character (not-standard bases allowed), then RNAfold_mfe works for A, C, G, T, E, U letters with built-in complementary_bases dictionary (e.g. E letter is changed to C before applying predictMfeStructure function). If base slot includes other letters, their translation to compl_target must be included in compl_dict slot. Otherwise predictMfeStructure returns empty string. predictMfeStructure method is used to create default secondary_structure slot.
obj <- XNAString(base = 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA',
sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
XNAString::predictMfeStructure(obj)
## $structure
## [1] ".....((..((.....))..)).........."
##
## $mfe
## [1] -4.5
predictDuplexStructure - this method uses RNAcofold_MFE function from C library ViennaRNA, this function can handle only standard bases. predictDuplexStructure takes XNAString object and applies RNAcofold_MFE function on base slot if base is DNAStringSet or RNAStringSet. If base is character vector (not-standard bases allowed), then RNAcofold_mfe works for A, C, G, T, E, U letters with built-in complementary_bases dictionary (e.g. E letter is changed to C before applying predictDuplexStructure function). If base slot includes other letters, their translation to compl_target must be included in compl_dict slot passed manually. Otherwise predictDuplexStructure returns empty string. predictDuplexStructure method is used to create default duplex_structure slot.
obj <- XNAString(base = Biostrings::DNAStringSet(c('GAGAGGGAACCAGGCAGGGACCGCAGACAACA', 'GAGAGGGAACCAGGCAGGGACCGCAGACAACA')))
XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
##
## $mfe
## [1] -14.5
RNAcofold_MFE function needs two sequences, so if molecule is sinlge stranded, base sequence is duplicated within the predictDuplexStructure method.
obj <- XNAString(base = Biostrings::DNAString('GAGAGGGAACCAGGCAGGGACCGCAGACAACA'),
sugar = 'FODLMFODLMFODLMFODLMFODLMFFFFFFF')
XNAString::predictDuplexStructure(obj)
## $structure
## [1] ".....((..((..((.((..((..........&.....))..))..)).))..)).........."
##
## $mfe
## [1] -14.5
XNAStringSet class consists of XNAString objects given as a list. Validation procedure checks if all objects are of XNAString class.
XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAStringSet_obj
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA ATCG FODD XXX CGAT NA NA
## 2: NA TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## 2: ...., 0
XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
sugar = c('OOOO', 'OOOO'),
backbone = c('SOS','SOS'),
target = Biostrings::DNAStringSet(c('TAGC', 'TACC')))
XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
sugar = c('OOOO', 'OOOO'),
target = Biostrings::DNAStringSet(c('CCGC', 'TATG')))
XNAStringSet_siRNA <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAStringSet_siRNA
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <list> <list> <list>
## 1: NA ATCG, TAGC OOOO, OOOO SOS, SOS TAGC, TACC NA NA
## 2: NA GGCG, TATC OOOO, OOOO XXX, XXX CCGC, TATG NA NA
## secondary_structure
## <list>
## 1:
## 2:
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA ATGCT DDDDD OOOO AGCAT NA NA
## 2: NA TGCAT DDDDD OOOO ATGCA NA NA
## 3: NA ATATG DDDDD OOOO CATAT NA NA
## secondary_structure
## <list>
## 1: ....., 0
## 2: ....., 0
## 3: ....., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA ATGCT RRRRR XXXX AGCAT NA NA
## 2: NA TGCAT RRRRR XXXX ATGCA NA NA
## 3: NA ATATG RRRRR XXXX CATAT NA NA
## secondary_structure
## <list>
## 1: ....., 0
## 2: ....., 0
## 3: ....., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA ATGCT DDDDD XXXX AGCAT NA NA
## 2: NA TGCAT DDDDD XXXX ATGCA NA NA
## 3: NA ATATG DDDDD XXXX CATAT NA NA
## secondary_structure
## <list>
## 1: ....., 0
## 2: ....., 0
## 3: ....., 0
XNAStringSet(base= list(c('TT', 'GG'),
c('TG', 'GT'),
c('TG')),
sugar = list(c('FF', 'FO'),
c('OO', 'OF'),
c('OO')),
backbone =list(c('X', 'X'),
c('X', 'X'),
c('X')))
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT, GG FF, FO X, X AA NA NA
## 2: NA TG, GT OO, OF X, X CA NA NA
## 3: NA TG OO X CA NA NA
## secondary_structure
## <list>
## 1:
## 2:
## 3: .., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT FF X AA NA NA
## 2: NA GG FF X CC NA NA
## secondary_structure
## <list>
## 1: .., 0
## 2: .., 0
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence
XNAStringSet(base= c('TT', 'GG'), sugar = c('FF', 'FF'), compl_dict = compl_dict)
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <list> <list> <list>
## 1: NA TT FF X AA, GA, .... NA NA
## 2: NA GG FF X CC, TC, .... NA NA
## secondary_structure
## <list>
## 1: .., 0
## 2: .., 0
# compl_dict in use only if target empty
XNAStringSet(base= c('TT', 'GG'),
sugar = c('FF', 'FF'),
target = c('AA', 'CC'),
compl_dict = compl_dict)
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT FF X AA NA NA
## 2: NA GG FF X CC NA NA
## secondary_structure
## <list>
## 1: .., 0
## 2: .., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT DD O AA NA NA
## 2: NA GG DD O CC NA NA
## secondary_structure
## <list>
## 1: .., 0
## 2: .., 0
dt <- data.table::data.table(base = c('TT', 'GG'), default_sugar = 'R', default_backbone = 'X')
out2 <- dt2Set(dt)
out2
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT DD O AA NA NA
## 2: NA GG DD O CC NA NA
## secondary_structure
## <list>
## 1: .., 0
## 2: .., 0
dt <- data.table::data.table(base= list(c('TT', 'GG'),
c('TG', 'GT'),
c('TG')),
sugar = list(c('FF', 'FO'),
c('OO', 'OF'),
c('OO')),
backbone =list(c('X', 'X'),
c('X', 'X'),
c('X')))
out3 <- dt2Set(dt)
out3
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: NA TT, GG FF, FO X, X AA NA NA
## 2: NA TG, GT OO, OF X, X CA NA NA
## 3: NA TG OO X CA NA NA
## secondary_structure
## <list>
## 1:
## 2:
## 3: .., 0
Methods which enable XNAStringSet extraction / modification:
set2List
extraction methods (“[” and “[[”)
set2Dt method
public slots setter/getter
XNAStringSet from data.table/data.frame
XNAString_obj1 <- XNAString(name = 'oligo1', base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(name = 'oligo2',base = 'TTCT', sugar = 'FOLL')
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
set2List(XNAStringSet_obj)
## [[1]]
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo1 ATCG FODD XXX CGAT NA NA
## secondary_structure
## <list>
## 1: ...., 0
##
## [[2]]
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo2 TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo2 TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo2 TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo1 ATCG FODD XXX CGAT NA NA
## 2: oligo2 TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## 2: ...., 0
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo1 ATCG FODD XXX CGAT NA NA
## 2: oligo2 TTCT FOLL XXX AGAA NA NA
## secondary_structure
## <list>
## 1: ...., 0
## 2: ...., 0
## XNAString object
## name: oligo2
## base: TTCT
## sugar: FOLL
## backbone: XXX
## target: AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
## XNAString object
## name: oligo2
## base: TTCT
## sugar: FOLL
## backbone: XXX
## target: AGAA
## conjugate5: NA
## conjugate3: NA
## secondary_structure: ...., 0
## duplex_structure: ....&...., 0
set2Dt(XNAStringSet_obj, slots =c('name', 'base', 'sugar', 'backbone', 'target', 'conjugate5', 'conjugate3'))
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <char> <list> <list>
## 1: oligo1 ATCG FODD XXX CGAT NA NA
## 2: oligo2 TTCT FOLL XXX AGAA NA NA
## [1] "ATCG" "GGCG"
## [1] "TAGC" "TATC"
base(XNAStringSet_siRNA, 1) <- c('CTGA', 'CCCC')
base(XNAStringSet_siRNA, 2) <- c('TTTT', 'TTTT')
XNAStringSet_siRNA
## XNAStringSet object
## name base sugar backbone target conjugate5 conjugate3
## <list> <list> <list> <list> <list> <list> <list>
## 1: NA CTGA, TTTT OOOO, OOOO SOS, SOS TAGC, TACC NA NA
## 2: NA CCCC, TTTT OOOO, OOOO XXX, XXX CCGC, TATG NA NA
## secondary_structure
## <list>
## 1:
## 2:
dt <- data.table::data.table(base= c('AEACACACACACAEAE', 'AAETGCTETGTATTTTTE'),
sugar = c('LLLDDDDDDDDDDLLL', 'LLLDDDDDDDDDDDLLLL'),
backbone =c('SSSSSSSSSSSSSSS', 'SSSSSSSSSSSSSSSSS'))
dt
## base sugar backbone
## <char> <char> <char>
## 1: AEACACACACACAEAE LLLDDDDDDDDDDLLL SSSSSSSSSSSSSSS
## 2: AAETGCTETGTATTTTTE LLLDDDDDDDDDDDLLLL SSSSSSSSSSSSSSSSS
## XNAStringSet object
## name base sugar backbone target conjugate5
## <list> <list> <list> <list> <char> <list>
## 1: NA AEACACAC.... LLLDDDDD.... SSSSSSSS.... GTGTGTGTGTGTGTGT NA
## 2: NA AAETGCTE.... LLLDDDDD.... SSSSSSSS.... GAAAAATACAGAGCAGTT NA
## conjugate3 secondary_structure
## <list> <list>
## 1: NA ............
## 2: NA ............
Methods written in this section translate HELM notation to multistring notation and create XNAString/XNAStringSet object.
helm <-
"CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"
XNAStringFromHelm(helm, name ='oligo1')
## XNAString object
## name: oligo1
## base: GAGAAGGCACAGAEGG
## sugar: LLLLDDDDDDDDDLLL
## backbone: XXXXXXXXXXXXXXX
## target: CCGTCTGTGCCTTCTC
## conjugate5: [5gn2c6]
## conjugate3:
## secondary_structure: ................, 0
## duplex_structure: ......((........&......))........, -2.1
helm <-
"RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR](C)P.[mR](G)P.[fR](C)P.[mR](C)P.[fR](G)P.[mR](T)P.[fR](G)P.[mR](G)P.[fR](T)P.[mR](T)P.[fR](C)P.[mR](A)P.[fR](T)P.[mR](A)P.[fR](A)}|RNA2{[fR](T)P.[fR](T)P.[mR](A)P.[fR](T)P.[mR](G)P.[fR](A)P.[mR](A)P.[fR](C)P.[mR](C)P.[fR](A)P.[mR](C)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](A)P.[fR](G)P.[mR](G)P.[fR](G)P.[mR](G)P.[fR](C)P.[mR](G)}$RNA1,RNA2,2:pair-56:pair|RNA1,RNA2,5:pair-53:pair|RNA1,RNA2,8:pair-50:pair|RNA1,RNA2,11:pair-47:pair|RNA1,RNA2,14:pair-44:pair|RNA1,RNA2,17:pair-41:pair|RNA1,RNA2,20:pair-38:pair|RNA1,RNA2,23:pair-35:pair|RNA1,RNA2,26:pair-32:pair|RNA1,RNA2,29:pair-29:pair|RNA1,RNA2,32:pair-26:pair|RNA1,RNA2,35:pair-23:pair|RNA1,RNA2,38:pair-20:pair|RNA1,RNA2,41:pair-17:pair|RNA1,RNA2,44:pair-14:pair|RNA1,RNA2,47:pair-11:pair|RNA1,RNA2,50:pair-8:pair|RNA1,RNA2,53:pair-5:pair|RNA1,RNA2,56:pair-2:pair$$$V2.0"
XNAStringFromHelm(helm)
## XNAString object
## name: NA
## base: CCCCCGCCGTGGTTCATAA, TTATGAACCACGGCAGGGGCG
## sugar: OOFOFOFOFOFOFOFOFOF, FFOFOFOFOFOFOFOFOFOFO
## backbone: OOOOOOOOOOOOOOOOOO, OOOOOOOOOOOOOOOOOOOO
## target: TTATGAACCACGGCGGGGG
## conjugate5:
## conjugate3:
## secondary_structure:
## duplex_structure: ((((.(((((((((((((.&.))))))))))))).)))).., -32.900002
helm <-
"CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](T)[sP].[LR](T)[sP].[LR](G)[sP].[dR](A)[sP].[dR](A)[sP].[dR](T)[sP].[dR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](T)[sP].[dR](G)[sP].[dR](G)[sP].[dR](A)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"
XNAStringFromHelm(helm)
## XNAString object
## name: NA
## base: TTGAATAAGTGGATGT
## sugar: LLLDDDDDDDDDDLLL
## backbone: XXXXXXXXXXXXXXX
## target: ACATCCACTTATTCAA
## conjugate5: [5gn2c6]
## conjugate3:
## secondary_structure: ................, 0
## duplex_structure: ................&................, 0
helm <- c("CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0",
"RNA1{[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[dR](T)[sP].[dR](G)[sP].[LR](T)[sP].[LR](G)[sP].[LR](T)}$$$$V2.0")
XNAStringFromHelm(helm, name =c('oligo1', 'oligo2'))
## XNAStringSet object
## name base sugar backbone target conjugate5
## <list> <list> <list> <list> <char> <list>
## 1: oligo1 GAGAAGGC.... LLLLDDDD.... XXXXXXXX.... CCGTCTGTGCCTTCTC [5gn2c6]
## 2: oligo2 TGTGTGTG.... LLDDLLDD.... XXXXXXXX.... ACACACACACACACA
## conjugate3 secondary_structure
## <list> <list>
## 1: ............
## 2: ............
Methods written in this section translate multistring notation from XNAString/XNAStringSet object to HELM notation. If molecule is double stranded also pairing information is added.
obj <- XNAString(base = 'GAGTTACTTGCCAAET',
sugar = 'LLLDMDDDDDDDDLLL',
backbone = 'XXXXXXXXXXXXXX2')
XNAStringToHelm(obj)
## [1] "RNA1{[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[dR](T)[sP].[MOE](T)[sP].[dR](A)[sP].[dR](C)[sP].[dR](T)[sP].[dR](T)[sP].[dR](G)[sP].[dR](C)[sP].[dR](C)[sP].[dR](A)[sP].[LR](A)[sP].[LR]([5meC])[PS2].[LR](T)}$$$$V2.0"
obj <- XNAString(
base = Biostrings::DNAStringSet(c("CCCC", "GGGG")),
sugar = c("OOFO", "FFOF"),
backbone = c("OOO", "OOO"),
target = '',
conjugate3 = "",
conjugate5 = "")
XNAStringToHelm(obj)
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)}|RNA2{[fR](G)P.[fR](G)P.[mR](G)P.[fR](G)}$RNA1,RNA2,2:pair-11:pair|RNA1,RNA2,5:pair-8:pair|RNA1,RNA2,8:pair-5:pair|RNA1,RNA2,11:pair-2:pair$$$$V2.0"
obj <- XNAString(
base = c("CCCCEGC", "UUAUGAT"),
sugar = c("OOFOFOF", "FFOFOFO"),
backbone = c("OOOOOO", "OOOOOO"),
target = '',
conjugate3 = "[5gn2c6]",
conjugate5 = "")
XNAStringToHelm(obj)
## [1] "RNA1{[mR](C)P.[mR](C)P.[fR](C)P.[mR](C)P.[fR]([5meC])P.[mR](G)P.[fR](C)}|CHEM1{[5gn2c6]}|RNA2{[fR](U)P.[fR](U)P.[mR](A)P.[fR](U)P.[mR](G)P.[fR](A)P.[mR](T)}$RNA1,RNA2,2:pair-20:pair|RNA1,RNA2,5:pair-17:pair|RNA1,RNA2,8:pair-14:pair|RNA1,RNA2,11:pair-11:pair|RNA1,RNA2,14:pair-8:pair|RNA1,RNA2,17:pair-5:pair|RNA1,RNA2,20:pair-2:pair$$$$V2.0"
Methods in this section are inherited from Biostrings package. All options in Biostrings methods:
pairwiseAlignment
matchPattern
vmatchPattern
matchPDict
are available in XNAStrng package as well. XNAString package renamed these mathods by adding XNA prefix in the beginning.
Target sequence is used as pattern.
obj <- XNAString(base = 'ATCGATATATATACACATGTATGATG',
sugar = 'OOOODDDDDDDDDDDDDDDDDDDDDD',
target = DNAStringSet(c('TAGCTATATATATGTGTACATACTAC', 'TAGCTAGATATATGTGTACATACTAC')))
subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'
## Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved to the pwalign package. Please
## call pwalign::nucleotideSubstitutionMatrix() to get rid of this
## warning.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)
## Global PairwiseAlignmentsSingleSubject (1 of 2)
## pattern: --------------------------TAGCTATA...CATACTAC--------------------------
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## score: -202
## Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved to the pwalign package. Please
## call pwalign::nucleotideSubstitutionMatrix() to get rid of this
## warning.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, type = "local", substitutionMatrix = substitutionMatrix)
## Local PairwiseAlignmentsSingleSubject (1 of 2)
## pattern: [1] TAGCTATATATATGTGTACATACTAC
## subject: [27] TAGCTATATATATGTGTACATACTAC
## score: 26
Target sequence is used as pattern. If more then one target is present in XNAString object, first is used as default. User can specify which target sequence should be taken as pattern (target.number parameter).
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
## start end width
## [1] 27 52 26 [TAGCTATATATATGTGTACATACTAC]
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views: NONE
## Views on a 78-letter BString subject
## subject: ATCGATATATATACACATGTATGATGTAGCTATA...CATACTACATCGATATATATACACATGTATGATG
## views:
## start end width
## [1] 27 52 26 [TAGCTATATATATGTGTACATACTAC]
Only one subject is allowed. Results created for all targets.
subject <- 'ATCGATATATATACACATGTATGATGTAGCTATATATATGTGTACATACTACATCGATATATATACACATGTATGATG'
XNAString::XNAMatchPDict(pdict = obj, subject = subject)
## MIndex object of length 2
## [[1]]
## IRanges object with 1 range and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## [1] 27 52 26
##
## [[2]]
## IRanges object with 0 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
Letters are tabulated and occurence frequency calculated for nucleotides. There are 6 arguments: * obj (either XNAString and XNAStringSet object) * slot (‘base’, ‘sugar’ or ‘backbone’) * letters (frequency checked just for these letters. If empty, letters from object’s dictionary taken as the default. ) * matrix_nbr (1 is the default. If 1 - first slot’s element is use, if 2 - 2nd element in slot) * as.prob (default FALSE) * base_only (default FALSE. If TRUE, ‘A’, ‘C’, ‘G’, ‘T’, ‘other’ are tabulated)
xnastring_obj <- XNAString(name = 'oligo1',
base = c('AEEE'),
sugar = c('FFOO'),
target = DNAStringSet('TTT'))
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'base')
## A C E G T U
## [1,] 1.00 0.00 3.00 0.00 0.00 0.00
## A C E G T U
## [1,] 0.25 0.00 0.75 0.00 0.00 0.00
## A C G T other
## 1.00 0.00 0.00 0.00 3.00
## A C
## [1,] 1.00 0.00
xnastring_obj <- XNAString(name = 'oligo1',
base = c('AAEC', 'ECTA'),
sugar = c('FFOO', 'DDLM'))
XNAString::XNAAlphabetFrequency(obj = xnastring_obj, slot = 'sugar', matrix_nbr = 2)
## D F L M O R
## [1,] 2.00 0.00 1.00 1.00 0.00 0.00
XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'base')
## A C E G T U
## [1,] 1.00 1.00 0.00 1.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00
XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
sugar = c('OFOO', 'ODDF'),
backbone = c('SOS','SOS'))
XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
sugar = c('OOOO', 'OOFO'))
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAString::XNAAlphabetFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
## D F L M O R
## [1,] 2.00 1.00 0.00 0.00 1.00 0.00
## [2,] 0.00 1.00 0.00 0.00 3.00 0.00
Letters are tabulated and occurence frequency calculated for dinucleotides. There are 6 arguments:
obj (either XNAString and XNAStringSet object)
slot (‘base’, ‘sugar’ or ‘backbone’)
double_letters (frequency checked just for these double letters. If empty, all possible double letters from object’s dictionary taken as the default.)
matrix_nbr (1 is the default. If 1 - first slot’s element is use, if 2 - 2nd element in slot)
as.prob (default FALSE)
base_only (default FALSE. If TRUE, all possible double letters composed of: ‘A’, ‘C’, ‘G’, ‘T’ are tabulated)
xnastring_obj <- XNAString(name = 'oligo1',
base = c('GCGC'),
sugar = c('FODL'),
target = DNAStringSet('TTTT'))
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base')
## AA CA EA GA TA UA AC CC EC GC TC UC AE CE EE
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00
## GE TE UE AG CG EG GG TG UG AT CT ET GT TT UT
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## AU CU EU GU TU UU
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## AA CA EA GA TA UA AC CC EC GC TC UC AE CE EE
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.67 0.00 0.00 0.00 0.00 0.00
## GE TE UE AG CG EG GG TG UG AT CT ET GT TT UT
## [1,] 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## AU CU EU GU TU UU
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## AA CA GA TA AC CC GC TC AG CG GG TG AT
## 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00
## CT GT TT other
## 0.00 0.00 0.00 0.00
XNAString::XNADinucleotideFrequency(obj = xnastring_obj, slot = 'base', double_letters = c('GC', 'CU'))
## GC CU
## [1,] 2.00 0.00
my_dict <- data.table::data.table(type = c(rep('base', 3), rep('sugar', 2), rep('backbone', 2)),
symbol = c('G', 'E', 'A', 'F', 'O', 'S', 'B'))
xnastring_obj <- XNAString_obj1 <- XNAString(
base = c('AGGE', 'EEEA'),
sugar = c('FFFO', 'OOOO'),
backbone = c('SBS', 'SBS'),
dictionary = my_dict
)
XNAString::XNADinucleotideFrequency(obj = xnastring_obj,
slot = 'sugar',
matrix_nbr = 2)
## FF OF FO OO
## [1,] 0.00 0.00 0.00 3.00
XNAString_obj1 <- XNAString(base = 'ATCG', sugar = 'FODD')
XNAString_obj2 <- XNAString(base = 'TTCT', sugar = 'FOLL')
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'base')
## AA CA EA GA TA UA AC CC EC GC TC UC AE CE EE
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## GE TE UE AG CG EG GG TG UG AT CT ET GT TT UT
## [1,] 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
## AU CU EU GU TU UU
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAString_obj1 <- XNAString(base = c('ATCG', 'TAGC'),
sugar = c('OFOO', 'ODDF'),
backbone = c('SOS','SOS'))
XNAString_obj2 <- XNAString(base = c('GGCG', 'TATC'),
sugar = c('OOOO', 'OOFO'))
XNAStringSet_obj <- XNAStringSet(objects = list(XNAString_obj1, XNAString_obj2))
XNAString::XNADinucleotideFrequency(obj = XNAStringSet_obj, slot = 'sugar', matrix_nbr = 2)
## DD FD LD MD OD RD DF FF LF MF OF RF DL FL LL
## [1,] 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## ML OL RL DM FM LM MM OM RM DO FO LO MO OO RO
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00
## DR FR LR MR OR RR
## [1,] 0.00 0.00 0.00 0.00 0.00 0.00
## [2,] 0.00 0.00 0.00 0.00 0.00 0.00
dt <- data.table::data.table(HELM = c("([PPG])", "[fR]", "[srP]"),
TS_BASE_SEQ = c("F", NA, NA),
TS_SUGAR_SEQ = c(NA, NA, 'F'),
TS_BACKBONE_SEQ = c(NA, 'S', NA))
dt
## HELM TS_BASE_SEQ TS_SUGAR_SEQ TS_BACKBONE_SEQ
## <char> <char> <char> <char>
## 1: ([PPG]) F <NA> <NA>
## 2: [fR] <NA> <NA> S
## 3: [srP] <NA> F <NA>
## HELM type symbol
## <char> <char> <char>
## 1: ([PPG]) base F
## 2: [srP] sugar F
## 3: [fR] backbone S
my_dict <- data.table::data.table(HELM = c('[[B]]'),
type = c('base'),
symbol = c('B'))
concatDict(my_dict)
## HELM type symbol
## <char> <char> <char>
## 1: [[B]] base B
## 2: P backbone O
## 3: [PS2] backbone 2
## 4: [Ppace] backbone A
## 5: [sP] backbone X
## 6: [sPpace] backbone B
## 7: [srP] backbone R
## 8: [ssP] backbone S
## 9: (A) base A
## 10: (C) base C
## 11: (G) base G
## 12: (T) base T
## 13: ([5meC]) base E
## 14: (U) base U
## 15: [LR] sugar L
## 16: [MOE] sugar M
## 17: [R] sugar R
## 18: [dR] sugar D
## 19: [mR] sugar O
## 20: [fR] sugar F
## HELM type symbol
helm <-
"CHEM1{[5gn2c6]}|RNA1{P.[dR](C)P.[dR](A)P.[LR](G)[sP].[LR](A)[sP].[LR](G)[sP].[LR](A)[sP].[dR](A)[sP].[dR](G)[sP].[dR](G)[sP].[dR](C)[sP].[dR](A)[sP].[dR](C)[sP].[dR](A)[sP].[dR](G)[sP].[dR](A)[sP].[LR]([5meC])[sP].[LR](G)[sP].[LR](G)}$CHEM1,RNA1,1:R2-1:R1$$$V2.0"
xna_obj <- XNAStringFromHelm(helm, name ='oligo1')
xna_obj
#> XNAString object
#> name: oligo1
#> base: GAGAAGGCACAGAEGG
#> sugar: LLLLDDDDDDDDDLLL
#> backbone: XXXXXXXXXXXXXXX
#> target: CCGTCTGTGCCTTCTC
#> conjugate5: [5gn2c6]
#> conjugate3:
#> secondary_structure: ................, 0
#> duplex_structure: ......((........&......))........, -2.1
subject <- 'ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG'
substitutionMatrix <- Biostrings::nucleotideSubstitutionMatrix()
#> Warning in .call_fun_in_pwalign("nucleotideSubstitutionMatrix", ...): nucleotideSubstitutionMatrix() has moved to the pwalign package. Please
#> call pwalign::nucleotideSubstitutionMatrix() to get rid of this
#> warning.
XNAString::XNAPairwiseAlignment(pattern = obj, subject = subject, substitutionMatrix = substitutionMatrix)
#> Global PairwiseAlignmentsSingleSubject (1 of 2)
#> pattern: TAG----------------CTATATATATGTGTACATACTAC
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> score: -68
XNAString::XNAMatchPattern(pattern = xna_obj, subject = subject)
#> Views on a 42-letter BString subject
#> subject: ATCGATATATATACACCGTCTGTGCCTTCTCACTACATCGAG
#> views:
#> start end width
#> [1] 16 31 16 [CCGTCTGTGCCTTCTC]
base <- list(c('ATCGATAT', 'ATCGATAT'), c('TGGGGGTGC', 'ATCGGGAT'), c('CCCTAGTA'))
set_obj <- XNAStringSet(base = base)
set_obj
#> XNAStringSet object
#> name base sugar backbone target conjugate5
#> <list> <list> <list> <list> <char> <list>
#> 1: NA ATCGATAT.... DDDDDDDD.... OOOOOOO,.... ATATCGAT NA
#> 2: NA TGGGGGTG.... DDDDDDDD.... OOOOOOOO.... GCACCCCCA NA
#> 3: NA CCCTAGTA DDDDDDDD OOOOOOO TACTAGGG NA
#> conjugate3 secondary_structure
#> <list> <list>
#> 1: NA
#> 2: NA
#> 3: NA ........, 0
XNAAlphabetFrequency(obj = set_obj, slot = 'base')
#> A C E G T U
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 0.00 1.00 0.00 6.00 2.00 0.00
#> [3,] 2.00 3.00 0.00 1.00 2.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', matrix_nbr = 2)
#> A C E G T U
#> [1,] 3.00 1.00 0.00 1.00 3.00 0.00
#> [2,] 2.00 1.00 0.00 3.00 2.00 0.00
#> [3,] 0.00 0.00 0.00 0.00 0.00 0.00
XNAAlphabetFrequency(obj = set_obj, slot = 'base', as.prob = TRUE)
#> A C E G T U
#> [1,] 0.38 0.12 0.00 0.12 0.38 0.00
#> [2,] 0.00 0.11 0.00 0.67 0.22 0.00
#> [3,] 0.25 0.38 0.00 0.12 0.25 0.00
XNADinucleotideFrequency(obj = set_obj, slot = 'base', double_letters = c('AT', 'GA', 'GT'))
#> AT GA GT
#> [1,] 3.00 1.00 0.00
#> [2,] 0.00 0.00 1.00
#> [3,] 0.00 0.00 1.00
XNADinucleotideFrequency(obj = set_obj, slot = 'base', base_only = TRUE)
#> AA CA GA TA AC CC GC TC AG CG GG TG AT CT GT
#> [1,] 0.00 0.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 3.00 0.00 0.00
#> [2,] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 4.00 2.00 0.00 0.00 1.00
#> [3,] 0.00 0.00 0.00 2.00 0.00 2.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 1.00
#> TT other
#> [1,] 0.00 0.00
#> [2,] 0.00 0.00
#> [3,] 0.00 0.00
#create custom complementary dictonary with complementary bases coded as IUPAC
compl_dict <- XNAString::complementary_bases
compl_dict[base == "G"]$target <- "Y"
compl_dict[base == "T"]$target <- "R" # if you have T in your base sequence
compl_dict[base == "U"]$target <- "R" # if you have U in your base sequence
compl_dict
#> base target compl_target
#> <char> <char> <char>
#> 1: A T A
#> 2: C G C
#> 3: G Y G
#> 4: T R T
#> 5: E G C
#> 6: U R T
xna <- XNAString::XNAStringSet(base = c("ACGTACG", "CCCGTAC", "AATACTT"), compl_dict = compl_dict)
xna
#> XNAStringSet object
#> name base sugar backbone target conjugate5 conjugate3
#> <list> <list> <list> <list> <list> <list> <list>
#> 1: NA ACGTACG DDDDDDD OOOOOO CGTACGT,.... NA NA
#> 2: NA CCCGTAC DDDDDDD OOOOOO GTACGGG,.... NA NA
#> 3: NA AATACTT DDDDDDD OOOOOO AAGTATT,.... NA NA
#> secondary_structure
#> <list>
#> 1: ......., 0
#> 2: ......., 0
#> 3: ......., 0
Here is the output of sessionInfo()
on the system on
which this document was compiled running pandoc 3.2.1:
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] Biostrings_2.75.0 GenomeInfoDb_1.43.0 XVector_0.46.0
#> [4] IRanges_2.41.0 S4Vectors_0.44.0 BiocGenerics_0.53.1
#> [7] generics_0.1.3 XNAString_1.15.0 pander_0.6.5
#> [10] knitr_1.48 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.36.0 rjson_0.2.23
#> [3] xfun_0.48 bslib_0.8.0
#> [5] htmlwidgets_1.6.4 Biobase_2.67.0
#> [7] lattice_0.22-6 vctrs_0.6.5
#> [9] tools_4.4.1 bitops_1.0-9
#> [11] curl_5.2.3 parallel_4.4.1
#> [13] Matrix_1.7-1 data.table_1.16.2
#> [15] BSgenome_1.75.0 lifecycle_1.0.4
#> [17] GenomeInfoDbData_1.2.13 stringr_1.5.1
#> [19] compiler_4.4.1 Rsamtools_2.22.0
#> [21] codetools_0.2-20 htmltools_0.5.8.1
#> [23] sys_3.4.3 buildtools_1.0.0
#> [25] sass_0.4.9 RCurl_1.98-1.16
#> [27] yaml_2.3.10 crayon_1.5.3
#> [29] jquerylib_0.1.4 BiocParallel_1.41.0
#> [31] cachem_1.1.0 DelayedArray_0.33.1
#> [33] abind_1.4-8 parallelly_1.38.0
#> [35] digest_0.6.37 stringi_1.8.4
#> [37] formattable_0.2.1 future_1.34.0
#> [39] listenv_0.9.1 restfulr_0.0.15
#> [41] maketools_1.3.1 fastmap_1.2.0
#> [43] grid_4.4.1 cli_3.6.3
#> [45] SparseArray_1.6.0 magrittr_2.0.3
#> [47] S4Arrays_1.6.0 XML_3.99-0.17
#> [49] future.apply_1.11.3 UCSC.utils_1.2.0
#> [51] pwalign_1.3.0 rmarkdown_2.28
#> [53] httr_1.4.7 globals_0.16.3
#> [55] matrixStats_1.4.1 evaluate_1.0.1
#> [57] GenomicRanges_1.59.0 BiocIO_1.17.0
#> [59] rtracklayer_1.66.0 rlang_1.1.4
#> [61] Rcpp_1.0.13 glue_1.8.0
#> [63] BiocManager_1.30.25 jsonlite_1.8.9
#> [65] R6_2.5.1 MatrixGenerics_1.19.0
#> [67] GenomicAlignments_1.43.0 zlibbioc_1.52.0