Title: | Working with modified nucleotide sequences |
---|---|
Description: | Nucleotide modifications in a nucleotide sequence are usually represented by special characters drawn from a number of sources. This makes them challenging to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally so that the infrastructure of the Biostrings package can be used. For this, the ModRNAString and ModDNAString classes and their derivatives are implemented, together with functions to construct and modify these objects despite the encoding issues. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well. |
Authors: | Felix G.M. Ernst [aut, cre] , Denis L.J. Lafontaine [ctb, fnd] |
Maintainer: | Felix G.M. Ernst <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.23.0 |
Built: | 2024-10-30 08:23:23 UTC |
Source: | https://github.com/bioc/Modstrings |
These functions follow the same principle as the
Biostrings
functions. Please be
aware that the matrices can become quite large, since the alphabet of
ModString
objects contains more letters.
## S4 method for signature 'ModDNAString'
hasOnlyBaseLetters(x)

## S4 method for signature 'ModRNAString'
hasOnlyBaseLetters(x)

## S4 method for signature 'ModDNAString'
alphabetFrequency(x, as.prob = FALSE, baseOnly = FALSE)

## S4 method for signature 'ModRNAString'
alphabetFrequency(x, as.prob = FALSE, baseOnly = FALSE)

## S4 method for signature 'ModDNAStringSet'
alphabetFrequency(x, as.prob = FALSE, collapse = FALSE, baseOnly = FALSE)

## S4 method for signature 'ModRNAStringSet'
alphabetFrequency(x, as.prob = FALSE, collapse = FALSE, baseOnly = FALSE)

## S4 method for signature 'MaskedModString'
alphabetFrequency(x, as.prob = FALSE, ...)

## S4 method for signature 'ModStringViews'
letterFrequency(x, letters, OR = "|", as.prob = FALSE, ...)

## S4 method for signature 'MaskedModString'
letterFrequency(x, letters, OR = "|", as.prob = FALSE)

## S4 method for signature 'ModStringSet'
consensusMatrix(x, as.prob = FALSE, shift = 0L, width = NULL, baseOnly = FALSE)

## S4 method for signature 'ModDNAStringSet'
consensusString(x, threshold = 0.25, shift = 0L, width = NULL)

## S4 method for signature 'ModRNAStringSet'
consensusString(x, threshold = 0.25, shift = 0L, width = NULL)

## S4 method for signature 'ModStringViews'
consensusString(x, threshold, shift = 0L, width = NULL)
x |
a |
as.prob |
|
baseOnly |
|
collapse |
|
... |
See |
letters |
See |
OR |
See |
shift |
See |
width |
See |
threshold |
Since the ambiguityMap is fixed to |
a matrix with the results (letter x pos).
mod <- ModDNAString(paste(alphabet(ModDNAString()), collapse = ""))
mod
hasOnlyBaseLetters(mod)
alphabetFrequency(mod)
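To get a feel for how much larger the resulting matrices are, the alphabet sizes and matrix dimensions can be compared directly. The following is a minimal sketch, assuming the Modstrings package (and with it Biostrings) is installed; the variable names `n_plain`, `n_mod`, `set` and `cm` are illustrative only, and the block does nothing when the package is missing.

```r
# Sketch only: the body runs when Modstrings is installed (assumption).
if (requireNamespace("Modstrings", quietly = TRUE)) {
  library(Modstrings)

  # Alphabet sizes: plain RNA versus RNA including modification letters
  n_plain <- length(alphabet(RNAString()))
  n_mod   <- length(alphabet(ModRNAString()))
  print(c(plain = n_plain, modified = n_mod))

  # Two short sequences, written as in the other examples of this manual
  set <- ModRNAStringSet(c("ACGU7", "ACGU7"))

  # letter x position matrix over the alphabet of the set
  cm <- consensusMatrix(set)
  print(dim(cm))

  # baseOnly = TRUE is part of the documented signature and restricts
  # the result to the base letters
  print(dim(consensusMatrix(set, baseOnly = TRUE)))
}
```

If only the unmodified letters matter for a downstream step, passing `baseOnly = TRUE` is probably the simplest way to keep the result compact.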
The functions are implemented as defined in the Biostrings package. Have
a look at the MaskedXString
class.
## S4 method for signature 'MaskedModString'
seqtype(x)
x |
a |
a MaskedModString
object.
# Mask positions
mask <- Mask(mask.width=5, start=c(2), width=c(3))
mr <- ModRNAString("ACGU7")
mr
masks(mr) <- mask
mr

# Invert masks
mr <- gaps(mr)
mr

# Drop the mask
masks(mr) <- NULL
mr
A ModDNAString
object allows DNA sequences with modified nucleotides
to be stored and manipulated.
ModDNAString(x = "", start = 1, nchar = NA)
x |
the input as a |
start |
the position in the character vector to use as start position in
the |
nchar |
the width of the character vector to use in the
|
The ModDNAString class contains the virtual ModString
class,
which is itself based on the XString
class. Therefore, functions for working with XString
classes are
inherited.
The alphabet
of the ModDNAString class consists of the
non-extended IUPAC codes "A,G,C,T,N", the gap letter "-", the hard masking
letter "+", the not available letter "." and letters for individual
modifications: alphabet(ModDNAString())
.
Since the special characters are encoded differently depending on the OS and
encoding settings of the R session, it is not always possible to enter a DNA
sequence containing modified nucleotides via the R console. The most
convenient solution for this problem is to use the function
modifyNucleotides
and modify an existing DNAString or
ModDNAString object.
A ModDNAString
object can be converted into a DNAString
object
using the DNAString()
constructor. Modified nucleotides are
automatically converted into their base nucleotides.
If a modified DNA nucleotide you want to work with is not part of the alphabet, please let us know.
a ModDNAString
object
# Constructing ModDNAString containing an m6A
md1 <- ModDNAString("AGCT`")
md1

# the alphabet of the ModDNAString class
alphabet(md1)
# due to encoding issues the shortNames can also be used
shortName(md1)
# due to encoding issues the nomenclature can also be used
nomenclature(md1)

# convert to DNAString
d1 <- DNAString(md1)
d1
modifyNucleotides
modifies a nucleotide in a sequence (or set
of sequences) based on the type of modification provided. It checks the
identity of the base nucleotide to be modified.
modifyNucleotides(x, at, mod, nc.type = "short",
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'ModString'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'ModStringSet'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'DNAString'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'RNAString'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'DNAStringSet'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)

## S4 method for signature 'RNAStringSet'
modifyNucleotides(x, at, mod, nc.type = c("short", "nc"),
                  stop.on.error = TRUE, verbose = FALSE)
x |
a |
at |
the location where the modification should be made. The same input as in the original If x is a If x is a rectangular |
mod |
The modification short name or nomenclature If If |
nc.type |
the type of nomenclature to be used. Either "short" or "nc".
"Short" for m3C would be "m3C", "nc" for m3C would be "3C". (
|
stop.on.error |
For |
verbose |
See |
the input ModString
or ModStringSet
object with the changes applied
# modify nucleotides in a ModDNAString
seq <- ModDNAString("AGTC")
seq
mseq1 <- modifyNucleotides(seq,c(1,2,4),c("1mA","7mG","3mC"))
mseq1

# This fails since m7G requires a G at the selected position in the sequence
## Not run: 
mseq <- modifyNucleotides(seq,c(3),c("7mG"))
## End(Not run)

# modify nucleotides in a ModRNAString
seq <- ModRNAString("AGUC")
seq
mseq1 <- modifyNucleotides(seq,c(1,2,4),c("m1A","m7G","m3C"))
mseq1

# This fails since m7G requires a G at the selected position in the sequence
## Not run: 
mseq <- modifyNucleotides(seq,c(3),c("m7G"))
## End(Not run)
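The nc.type argument described above selects how the modification is named. As a small sketch, assuming Modstrings is installed and that "3C" is the nomenclature counterpart of the short name "m3C" (as stated in the argument description), both calls below should address the same modification; the names `by_short` and `by_nc` are illustrative.

```r
# Sketch only: the body runs when Modstrings is installed (assumption).
if (requireNamespace("Modstrings", quietly = TRUE)) {
  library(Modstrings)

  seq <- ModRNAString("AGUC")

  # modification given by its short name (nc.type = "short" is the default)
  by_short <- modifyNucleotides(seq, 4, "m3C")

  # the same modification given by its nomenclature
  by_nc <- modifyNucleotides(seq, 4, "3C", nc.type = "nc")

  # compare the two results
  print(by_short == by_nc)
}
```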
A ModRNAString
object allows RNA sequences with modified nucleotides
to be stored and manipulated.
ModRNAString(x = "", start = 1, nchar = NA)
x |
the input as a |
start |
the position in the character vector to use as start position in
the |
nchar |
the width of the character vector to use in the
|
The ModRNAString class contains the virtual ModString
class,
which is itself based on the XString
class. Therefore, functions for working with XString
classes are
inherited.
The alphabet of the ModRNAString class consists of the non-extended IUPAC
codes "A,G,C,U", the gap letter "-", the hard masking letter "+", the not
available letter "." and letters for individual modifications:
alphabet(ModRNAString())
.
Since the special characters are encoded differently depending on the OS and
encoding settings of the R session, it is not always possible to enter an RNA
sequence containing modified nucleotides via the R console. The most
convenient solution for this problem is to use the function
modifyNucleotides
and modify an existing RNAString or
ModRNAString object.
A ModRNAString
object can be converted into an RNAString
object
using the RNAString()
constructor. Modified nucleotides are
automatically converted into their base nucleotides.
If a modified RNA nucleotide you want to work with is not part of the alphabet, please let us know.
a ModRNAString
object
# Constructing ModRNAString containing an m6A and a dihydrouridine
mr1 <- ModRNAString("AGCU`D")
mr1

# the alphabet of the ModRNAString class
alphabet(mr1)
# due to encoding issues the shortNames can also be used
shortName(mr1)
# due to encoding issues the nomenclature can also be used
nomenclature(mr1)

# convert to RNAString
r1 <- RNAString(mr1)
r1
The virtual ModString
class derives from the XString
virtual
class. Like its parent and its children, it is used for storing sequences of
characters. However, the XString
/BString
class requires single
byte characters as the letters of the input sequences. The ModString
extends this capability to multi-byte characters by encoding these characters
into single-byte characters using a dictionary for internal conversion. It
also takes care of different encoding behavior of operating systems.
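The idea of a dictionary-based conversion can be illustrated in base R. This is only a conceptual sketch and not the package's internal implementation: the dictionary `dict`, its single-byte codes and the helper names `encode` and `decode` are all made up for illustration.

```r
# Hypothetical dictionary: multi-byte letters (names) and the single-byte
# stand-ins used for storage (values). The codes are invented for this sketch.
dict <- c("1", "2")
names(dict) <- c("\u03c8", "\u03bb")   # Greek small psi and lambda

# Translate multi-byte letters into their single-byte codes
encode <- function(x, dict) {
  chartr(paste(names(dict), collapse = ""), paste(dict, collapse = ""), x)
}

# Translate the single-byte codes back into the original letters
decode <- function(x, dict) {
  chartr(paste(dict, collapse = ""), paste(names(dict), collapse = ""), x)
}

s   <- "AG\u03c8C\u03bbU"        # six letters, two of them multi-byte
enc <- encode(s, dict)

nchar(s, type = "chars")          # number of letters
nchar(enc, type = "bytes")        # one byte per letter after encoding
identical(decode(enc, dict), s)   # the round trip restores the input
```

A fixed one-to-one table keeps the number of stored bytes equal to the number of letters, which is what a single-byte container needs.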
The ModDNAString
and ModRNAString
classes derive
from the ModString
class and use the functionality to store nucleotide
sequences containing modified nucleotides. To describe modified RNA and DNA
nucleotides with a single letter, special characters are commonly used, e.g.
from the Greek alphabet, which are multi-byte characters.
The ModString
class is virtual and it cannot be directly used to
create an object. Please have a look at ModDNAString
and
ModRNAString
for the specific alphabets of the individual
classes.
Nucleotide modifications in a nucleotide sequence are usually represented
by special characters drawn from a number of sources. This makes them
challenging to work with in R and the Biostrings
package. The
Modstrings
package implements this functionality for RNA and DNA
sequences containing modified nucleotides by translating the characters
internally so that the infrastructure of the Biostrings
package can be used. For this, the ModRNAString
and ModDNAString
classes and
their derivatives are implemented, together with functions to construct and
modify these objects despite the encoding issues. In addition, the conversion
from sequences to list-like location information (and the reverse operation)
is implemented as well.
A good place to start would be the vignette and the man page for the
ModStringSet
objects.
The alphabets for the modifications used in this package are based on the compilation of RNA modifications by http://modomics.genesilico.pl by the Bujnicki lab and DNA modifications https://dnamod.hoffmanlab.org by the Hoffman lab. Both alphabets were modified to remove some incompatible characters.
Felix G M Ernst [aut,cre] and Denis L.J. Lafontaine [ctb]
Analogous to Biostrings,
there are a few functions that should only
be used internally. Use them with care otherwise.
## S4 method for signature 'ModDNAString'
seqtype(x)

## S4 method for signature 'ModRNAString'
seqtype(x)

## S4 replacement method for signature 'ModString'
seqtype(x) <- value

## S4 method for signature 'ModString'
XString(seqtype, x, start = NA, end = NA, width = NA)

## S4 replacement method for signature 'ModStringSet'
seqtype(x) <- value

## S4 method for signature 'ModStringSet'
XStringSet(seqtype, x, start = NA, end = NA, width = NA, use.names = TRUE)

data(modsRNA)

data(modsDNA)

data(MOD_RNA_DICT_MODOMICS)

data(MOD_RNA_DICT_TRNADB)
seqtype , x , start , end , width , use.names , value
|
used internally |
An object of class DFrame
with 162 rows and 9 columns.
An object of class DFrame
with 47 rows and 5 columns.
An object of class DFrame
with 170 rows and 3 columns.
An object of class DFrame
with 60 rows and 3 columns.
an XString* object
The ModStringSet
class is a container for storing a set of
ModString
objects. It follows the same principles as the
other XStringSet
objects.
As usual the ModStringSet
containers derive directly from the
XStringSet
virtual class.
The ModStringSet
class is in itself a virtual class with two types of
derivatives:
ModDNAStringSet
ModRNAStringSet
Each class can only be converted to its parent DNAStringSet
or
RNAStringSet
. The modified nucleotides will be converted to their
original nucleotides.
Please note that, due to encoding issues, not all modifications can be instantiated directly from the console. The vignette contains a comprehensive explanation and examples for working around the problem.
ModDNAStringSet(
  x = character(),
  start = NA,
  end = NA,
  width = NA,
  use.names = TRUE
)

ModRNAStringSet(
  x = character(),
  start = NA,
  end = NA,
  width = NA,
  use.names = TRUE
)
x |
Either a character vector (with no NAs), or a ModString, ModStringSet or ModStringViews object. |
start , end , width
|
Either NA, a single integer, or an integer vector of the same length as x specifying how x should be "narrowed" (see ?narrow for the details). |
use.names |
TRUE or FALSE. Should names be preserved? |
a ModStringSet
object.
# Constructing ModDNAStringSet containing an m6A
m1 <- ModDNAStringSet(c("AGCT`","AGCT`"))
m1

# converting to DNAStringSet

# Constructing ModRNAStringSet containing an m6A
m2 <- ModRNAStringSet(c("AGCU`","AGCU`"))
m2
Functions to read/write a ModStringSet object from/to a file.
readModDNAStringSet(
  filepath,
  format = "fasta",
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE,
  with.qualities = FALSE
)

readModRNAStringSet(
  filepath,
  format = "fasta",
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE,
  with.qualities = FALSE
)

writeModStringSet(
  x,
  filepath,
  append = FALSE,
  compress = FALSE,
  compression_level = NA,
  format = "fasta",
  ...
)
filepath , format , nrec , skip , seek.first.rec , use.names , with.qualities , append , compress , compression_level , ...
|
See |
x |
A |
A ModStringSet
of the defined type.
seqs <- paste0(paste(alphabet(ModDNAString()), collapse = ""),
               c("A","G","T"))
seqs
set <- ModDNAStringSet(seqs)
set
file <- tempfile()
writeModStringSet(set, file)
read <- readModDNAStringSet(file)
read
title
ModDNAStringSetList(..., use.names = TRUE)

ModRNAStringSetList(..., use.names = TRUE)
... |
|
use.names |
|
a ModStringSetList
object.
mrseq <- c("ACGU7","ACGU7","ACGU7","ACGU7")
mrseq

# Example: construction of ModStringSetList from ModString objects
mr <- ModRNAString("ACGU7")
mr
mrs <- ModRNAStringSet(list(mr,mr,mr,mr))
mrs
mrsl <- ModRNAStringSetList(mrs,mrs)
mrsl

# Example: construction of ModStringSetList from mixed sources
mrsl2 <- ModRNAStringSetList(mrs,mrseq)
mrsl2
Like XStringViews,
ModStringViews
is the basic container for storing a set of views on
the same sequence (this time a ModString
object).
## S4 method for signature 'ModString'
Views(subject, start = NULL, end = NULL, width = NULL, names = NULL)
subject , start , end , width , names
|
See |
For the details have a look at the
XStringViews
class.
a ModStringViews
object.
seq <- ModDNAString("AGC6AGC6")
seq
v <- Views(seq, start = 3:1, end = 6:8)
v
title
QualityScaledModDNAStringSet(x, quality)

QualityScaledModRNAStringSet(x, quality)

readQualityScaledModDNAStringSet(
  filepath,
  quality.scoring = c("phred", "solexa", "illumina"),
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE
)

readQualityScaledModRNAStringSet(
  filepath,
  quality.scoring = c("phred", "solexa", "illumina"),
  nrec = -1L,
  skip = 0L,
  seek.first.rec = FALSE,
  use.names = TRUE
)

writeQualityScaledModStringSet(
  x,
  filepath,
  append = FALSE,
  compress = FALSE,
  compression_level = NA
)
x |
For the For |
quality |
A
|
filepath , nrec , skip , seek.first.rec , use.names , append , compress , compression_level
|
|
quality.scoring |
Specify the quality scoring used in the FASTQ file.
Must be one of "phred" (the default), "solexa", or "illumina". If set to
"phred" (or "solexa" or "illumina"), the qualities will be stored in a
|
a QualityScaledModDNAStringSet
or
QualityScaledModRNAStringSet
object
seq <- ModRNAString("AGCU7")
seq
qseq <- PhredQuality(paste0(rep("!", length(seq)), collapse = ""))
qseq
qset <- QualityScaledModRNAStringSet(seq, qseq)
qset
replaceLetterAt
replaces a letter in a ModString
object
with a new letter. In contrast to modifyNucleotides
it does not
check the letter to be replaced for its identity, it just replaces it and
behaves exactly like the
## S4 method for signature 'ModString'
replaceLetterAt(x, at, letter, verbose = FALSE)

## S4 method for signature 'ModStringSet'
replaceLetterAt(x, at, letter, verbose = FALSE)
x |
a |
at |
the location where the replacement should be made. The same input as in If x is a If x is a rectangular |
letter |
The new letters. The same input as in If x is a If x is a rectangular |
verbose |
See |
the input ModString
or ModStringSet
object with the changes applied
# Replacing the last two letters in a ModDNAString
seq1 <- ModDNAString("AGTC")
seq1
seq2 <- replaceLetterAt(seq1,c(3,4),"CT")
seq2

# Now containing an m3C
seq2 <- replaceLetterAt(seq1,c(3,4),ModDNAString("/T"))
seq2

# Replacing the last two letters in a set of sequences
set1 <- ModDNAStringSet(c("AGTC","AGTC"))
set1
set2 <- replaceLetterAt(set1,
                        matrix(rep(c(FALSE,FALSE,TRUE,TRUE),2),
                               nrow = 2,
                               byrow = TRUE),
                        c("CT","CT"))
set2
Since the one letter nomenclature for RNA and DNA modification differs depending on the source, a translation to a common alphabet is necessary.
sanitizeInput
exchanges characters based on a dictionary. The dictionary is
expected to be a DataFrame
with two columns, mods_abbrev
and
short_name
. Based on the short_name
, the characters in the
input are converted from the values of mods_abbrev
into the ones
from alphabet
.
Only different values will be searched for and exchanged.
sanitizeFromModomics
and sanitizeFromtRNAdb
use a predefined
dictionary, which is built in.
sanitizeInput(input, dictionary)

sanitizeFromModomics(input)

sanitizeFromtRNAdb(input)
input |
a |
dictionary |
a DataFrame containing at least two columns
|
the modified character
vector compatible for constructing a
ModString
object.
# Modomics
chr <- "AGC@"
# Error since the @ is not in the alphabet
## Not run: 
seq <- ModRNAString(chr)
## End(Not run)
seq <- ModRNAString(sanitizeFromModomics(chr))
seq

# tRNAdb
chr <- "AGC+"
# No error but the + has a different meaning in the alphabet
## Not run: 
seq <- ModRNAString(chr)
## End(Not run)
seq <- ModRNAString(sanitizeFromtRNAdb(chr))
seq
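The dictionary-based exchange described for sanitizeInput can be sketched in base R. This is a conceptual illustration under stated assumptions, not the package's code: plain data.frames stand in for the DataFrame dictionaries, the short names "modX" and "modY" and all letters are invented, and `sanitize_sketch` is a hypothetical helper.

```r
# Stand-in for the target alphabet: short name -> letter expected downstream.
# All entries are invented for this sketch.
target <- data.frame(short_name  = c("modX", "modY"),
                     mods_abbrev = c("#", "7"),
                     stringsAsFactors = FALSE)

# Dictionary of the external source: same short names, partly other letters
source_dict <- data.frame(short_name  = c("modX", "modY"),
                          mods_abbrev = c("@", "7"),
                          stringsAsFactors = FALSE)

sanitize_sketch <- function(input, dictionary, target) {
  # pair source and target letters via the shared short name
  idx  <- match(dictionary$short_name, target$short_name)
  keep <- !is.na(idx)
  from <- dictionary$mods_abbrev[keep]
  to   <- target$mods_abbrev[idx[keep]]
  # only letters that actually differ need to be exchanged
  differ <- from != to
  if (!any(differ)) return(input)
  chartr(paste(from[differ], collapse = ""),
         paste(to[differ], collapse = ""),
         input)
}

res <- sanitize_sketch("AGC@U7", source_dict, target)
res
```

Matching on the short name rather than on the letter itself is what lets two sources with conflicting one-letter codes be mapped onto one alphabet.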
XString
and a GRanges
object

With combineIntoModstrings
and separate,
ModString objects can be constructed and deconstructed in an interactive
session while avoiding
problematic encoding issues. In addition, modification information can be
transferred from/to tabular data with these functions.
combineIntoModstrings
expects seqnames(gr)
or names(gr)
to match the available names(x)
. Only entries with strand
information *
or +
are used.
separate
when used with a GRanges
/GRangesList
object
will return an object of the same type, but with modifications separated. For
example an element with mod = "m1Am"
will be returned as two elements
with mod = c("Am","m1A")
. The reverse operation is available via
combineModifications()
.
removeIncompatibleModifications
filters incompatible modifications from
a GRanges
or GRangesList
. incompatibleModifications()
returns the logical vector used for this operation.
separate(x, nc.type = "short")

combineIntoModstrings(x, gr, with.qualities = FALSE, quality.type = "Phred",
                      stop.on.error = TRUE, verbose = FALSE, ...)

combineModifications(gr, ...)

incompatibleModifications(gr, x, ...)

removeIncompatibleModifications(gr, x, ...)

## S4 method for signature 'ModString'
separate(x, nc.type = c("short", "nc"))

## S4 method for signature 'ModStringSet'
separate(x, nc.type = c("short", "nc"))

## S4 method for signature 'GRanges'
separate(x)

## S4 method for signature 'GRangesList'
separate(x)

## S4 method for signature 'XString,GRanges'
combineIntoModstrings(x, gr, with.qualities = FALSE, quality.type = "Phred",
                      stop.on.error = TRUE, verbose = FALSE, ...)

## S4 method for signature 'XStringSet,GRangesList'
combineIntoModstrings(x, gr, with.qualities = FALSE, quality.type = "Phred",
                      stop.on.error = TRUE, verbose = FALSE, ...)

## S4 method for signature 'XStringSet,GRanges'
combineIntoModstrings(x, gr, with.qualities = FALSE, quality.type = "Phred",
                      stop.on.error = TRUE, verbose = FALSE, ...)

## S4 method for signature 'GRanges'
combineModifications(gr)

## S4 method for signature 'GRangesList'
combineModifications(gr)

## S4 method for signature 'GRanges,XString'
incompatibleModifications(gr, x)

## S4 method for signature 'GRanges,XStringSet'
incompatibleModifications(gr, x)

## S4 method for signature 'GRangesList,XStringSet'
incompatibleModifications(gr, x)

## S4 method for signature 'GRanges,XString'
removeIncompatibleModifications(gr, x)

## S4 method for signature 'GRanges,XStringSet'
removeIncompatibleModifications(gr, x)

## S4 method for signature 'GRangesList,XStringSet'
removeIncompatibleModifications(gr, x)
x |
For For |
nc.type |
the type of nomenclature to be used. Either "short" or "nc".
"Short" for m3C would be "m3C", "nc" for m3C would be "3C". (
|
gr |
a GRanges object |
with.qualities |
|
quality.type |
the type of |
stop.on.error |
For |
verbose |
For |
... |
|
for separate
a GRanges
object and for
combineIntoModstrings
a ModString*
object or a
QualityScaledModStringSet
, if with.qualities = TRUE
.
library(GenomicRanges)
# ModDNAString
seq <- ModDNAString(paste(alphabet(ModDNAString()), collapse = ""))
seq
gr <- separate(seq)
gr
seq2 <- combineIntoModstrings(as(seq,"DNAString"),gr)
seq2
seq == seq2
# ModRNAString
seq <- ModRNAString(paste(alphabet(ModRNAString()), collapse = ""))
seq
gr <- separate(seq)
gr
# Separating RNA modifications
gr <- gr[1]
separate(gr)
# ... and combine them again (both operations work only on a subset of
# modifications)
combineModifications(separate(gr))

# handling incompatible modifications
seq <- RNAString("AGCU")
gr <- GRanges(c("chr1:1:+","chr1:2:+"),mod="m1A")
incompatibleModifications(gr,seq)
# removeIncompatibleModifications(gr,seq)
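The round trip between a modified sequence and tabular location information can be illustrated without any Bioconductor objects. The following base R sketch only mirrors the idea behind separate and combineIntoModstrings; the lookup table `mods`, its letters "1" and "2", the names "modG" and "modU", and the three helper functions are hypothetical.

```r
# Hypothetical lookup: modification letter, its base letter and a short name
mods <- data.frame(letter = c("1", "2"),
                   base   = c("G", "U"),
                   name   = c("modG", "modU"),
                   stringsAsFactors = FALSE)

# Replace every modification letter by its base letter
to_base <- function(x, mods) {
  chartr(paste(mods$letter, collapse = ""), paste(mods$base, collapse = ""), x)
}

# Sequence -> table of positions and modification names
separate_sketch <- function(x, mods) {
  chars <- strsplit(x, "")[[1]]
  pos   <- which(chars %in% mods$letter)
  data.frame(pos = pos,
             mod = mods$name[match(chars[pos], mods$letter)],
             stringsAsFactors = FALSE)
}

# Base sequence + table -> modified sequence, checking the base identity first
combine_sketch <- function(base_seq, loc, mods) {
  chars <- strsplit(base_seq, "")[[1]]
  row   <- match(loc$mod, mods$name)
  stopifnot(!anyNA(row), all(chars[loc$pos] == mods$base[row]))
  chars[loc$pos] <- mods$letter[row]
  paste(chars, collapse = "")
}

modified <- "AC1U2"
loc      <- separate_sketch(modified, mods)
plain    <- to_base(modified, mods)
restored <- combine_sketch(plain, loc, mods)
loc
restored == modified
```

The check inside `combine_sketch` plays the role of an incompatibility filter: a modification is only placed where the underlying base letter matches.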
The alphabet()
, shortName()
, fullName()
and
nomenclature()
functions return the letters, names and associated
abbreviations for the type of ModString. alphabet()
returns the normal
letters and modification letters, whereas shortName()
,
fullName()
and nomenclature()
return results for modifications
only.
shortName(x)

fullName(x)

nomenclature(x)

## S4 method for signature 'ModString'
alphabet(x, baseOnly = FALSE)

## S4 method for signature 'ModStringSet'
alphabet(x, baseOnly = FALSE)

## S4 method for signature 'ModString'
shortName(x)

## S4 method for signature 'ModStringSet'
shortName(x)

## S4 method for signature 'ModString'
fullName(x)

## S4 method for signature 'ModStringSet'
fullName(x)

## S4 method for signature 'ModString'
nomenclature(x)

## S4 method for signature 'ModStringSet'
nomenclature(x)
x |
a |
baseOnly |
|
a character vector.
alphabet(ModDNAString())
shortName(ModDNAString())
nomenclature(ModDNAString())